CineFX (NV30) Inside
August 31, 2003 / by Demirug / page 3 of 7
The Shader Core (cont.)
After passing through all ten stages, the quad (split into four pixels) either moves on to the next units or, if the current instruction requires another pass, is sent back to the entry of the shader core (internal shader core loopback). In the loopback case, the four scalars of the output register are used as inputs to stage 4 (the first multiplexer), while both texture coordinates are discarded. If the quad leaves the shader core, the texture coordinates are passed on to the TMUs, while the output register is stored in a FIFO buffer that runs parallel to the TMUs.
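The routing decision at the end of the pipeline can be sketched in a few lines of Python. This is only an illustrative model of the description above, not the actual hardware logic: the names `CoreOutput`, `route`, `tmu_queue` and `fifo` are hypothetical, and the remaining-pass counter is an assumed bookkeeping device.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

Vec4 = Tuple[float, float, float, float]


@dataclass
class CoreOutput:
    """What one pixel of the quad carries after the last stage (hypothetical model)."""
    output_register: Vec4          # the four scalars of the output register
    tex_coords: Tuple[tuple, tuple]  # both texture coordinates


def route(result: CoreOutput,
          passes_remaining: int,
          tmu_queue: Deque[tuple],
          fifo: Deque[Vec4]) -> Optional[Vec4]:
    """Return the stage 4 inputs on loopback, or None if the pixel leaves the core."""
    if passes_remaining > 0:
        # Loopback: the output register feeds stage 4 (first multiplexer);
        # both texture coordinates are discarded.
        return result.output_register
    # Leaving the core: texture coordinates go to the TMUs, the output
    # register waits in a FIFO that runs parallel to the TMUs.
    tmu_queue.append(result.tex_coords)
    fifo.append(result.output_register)
    return None
```

Calling `route` with `passes_remaining` greater than zero leaves both queues untouched and hands back the register contents; with zero it fills both queues and returns `None`.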
The following table shows the number of shader core passes necessary to execute certain PS instructions.
Instruction | Passes | Comment |
nop | 1 | |
mov | 1 | |
add | 1 | |
mad | 1 | |
mul | 1 | |
rcp | 1 | 1/x |
rsq16 | 1 | 1/sqrt(x) FP16 |
rsq32 | 2 | 1/sqrt(x) FP32 |
dp3 | 1 | |
dp4 | 1 | |
min | 1 | |
max | 1 | |
slt | 1 | |
sge | 1 | |
exp | 1 | |
log | 1 | |
frc | 1 | |
lit | 1 | |
dst | 1 | |
lrp | 2 | |
texcoord | 1 | |
texkill | 1 | |
2d texture read | 1 | Function can be performed twice per clock. |
cube texture read | 1 | Function can be performed twice per clock. |
3d texture read | 1 | |
texbem | 2 | From here on instructions needed for NV25/PS 1.1-1.3 compatibility. |
texbeml | 3 | |
texbemproj | 2 | |
texreg2ar | 1 | |
texreg2gb | 1 | |
texm3x2pad | 1 | |
texm3x2tex | 2 | |
texm3x2depth | 2 | |
texm3x3pad | 1 | |
texm3x3tex | 2 | |
texm3x3cube | 3 | |
texreg2rgb | 1 | |
texreg2rgbcube | 1 | |
texdp3 | 2 | |
texdp3tex | 2 | |
texdp3depth | 2 | |
texbrdf | 1 | |
texm3x3spec | 7 | |
2D and cubemap texture accesses are special cases as two of them can be executed per pass.
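To get a feel for these numbers, the table can be turned into a rough pass estimator. The sketch below is a simplification under stated assumptions: it simply adds up the per-instruction passes from the table and assumes that any two 2D or cube texture reads in a shader can share one pass, regardless of their position or data dependencies. The instruction names `tex2d`, `texcube` and `tex3d` are hypothetical labels for the three texture read rows, and only a subset of the table is included.

```python
import math
from typing import Iterable

# Passes per instruction, copied from a subset of the table above.
PASSES = {
    "nop": 1, "mov": 1, "add": 1, "mad": 1, "mul": 1,
    "rcp": 1, "rsq16": 1, "rsq32": 2, "dp3": 1, "dp4": 1,
    "lrp": 2, "tex3d": 1,
    "texbem": 2, "texbeml": 3, "texm3x3cube": 3, "texm3x3spec": 7,
}

# Hypothetical names for the 2D and cube texture reads,
# of which two fit into a single pass.
PAIRABLE = {"tex2d", "texcube"}


def estimate_passes(instructions: Iterable[str]) -> int:
    """Estimate shader core passes for an instruction list.

    Assumption: 2D/cube reads pair up freely (two per pass); every other
    instruction costs exactly its table value and nothing else overlaps.
    """
    instructions = list(instructions)
    pairable = sum(1 for name in instructions if name in PAIRABLE)
    others = sum(PASSES[name] for name in instructions if name not in PAIRABLE)
    return others + math.ceil(pairable / 2)
```

For instance, `estimate_passes(["tex2d", "tex2d", "mad"])` yields 2 under these assumptions: one pass for the two paired texture reads and one for the `mad`. Real shaders may come out differently, since the model ignores ordering and dependencies.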