What's the use of a pixel shader 3.0 chip today?
Unlike nVidia, ATi has decided to do without pixel shader 3.0 in its new flagship, the R420. The effects of this decision can naturally be predicted only to a limited extent. As it will take some time until both competitors are officially presented, we want to bridge the time until the launches with a little thought experiment on the effects of the technical (not feature-related) differences between pixel shader 2.0 and 3.0.
As with every thought experiment, things may in the end turn out completely different from what you expected. You should therefore not make a purchase decision blindly on the basis of such technical analyses; at best they can supplement tests carried out on available hardware.
As with every new technology, the enthusiasm evoked by tech demos is followed by disillusionment. This happens mainly when it becomes clear that the tech demos shown at launch will not be followed for some time by games using the same technology. In the end, all that remains is the basic performance of the new card in current or upcoming titles.
Armed with this knowledge from the past, you may easily conclude that the same will happen to the new pixel shader 3.0 technology. ATi's last attempt to take a leading position, with pixel shader 1.4, was not necessarily a big success. ATi's decision to do without pixel shader 3.0 in the R420 chip is therefore understandable. If a new technology will only be put to use in the long term, it is definitely right not to spend one's money thoughtlessly on every new feature.
But with every new technology there is also the question of medium- or short-term advantages. This creates an unusual situation with pixel shader 3.0: the capability that the hardware needs in order to be pixel shader 3.0 compliant can also be used to accelerate techniques that go back to DirectX 7. Naturally the driver plays an important role here, as it has to adapt these older techniques (or rather the game code written with them) to make optimal use of the new possibilities.
The key to this acceleration is "dynamic branching". Until now, the same sequence of operations was always carried out for all pixels belonging to the same polygon. Pixel shader 3.0 hardware makes it possible to execute different instructions for each pixel, depending on the situation. The number of operations per pixel can therefore differ. The legitimate question now is how such a capability can accelerate the calculation of pixel effects that were developed without it in mind.
The reason is simple: there were already situations in the past in which such a capability would have been useful. As it was not available, developers had to resort to a trick. They simply calculated the result of every possibility and then chose the right one for each pixel. When a driver recognises such a trick in the game code, it can change the pixel calculation so that, for each pixel, only the possibility that is finally chosen for that pixel is calculated. The clock cycles that would have been spent computing the unused alternatives are saved.
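To make the difference concrete, here is a minimal Python sketch of the trick and of its branched replacement. The function names, the two example paths and the `choose` callback are purely illustrative assumptions; they do not model any real driver or shader API.

```python
# Illustrative sketch only: every name in this block is hypothetical.

def shade_select(pixel, paths, chooser):
    """Old trick: evaluate every path, then keep the one that applies."""
    results = {name: fn(pixel) for name, fn in paths.items()}
    return results[chooser(pixel)], len(paths)  # (result, paths evaluated)


def shade_branch(pixel, paths, chooser):
    """Dynamic branching: evaluate only the path that is actually chosen."""
    return paths[chooser(pixel)](pixel), 1  # (result, paths evaluated)


# Two made-up alternatives for a pixel value in the range 0..255.
PATHS = {
    "lit": lambda p: min(255, p + 40),
    "shadow": lambda p: p // 2,
}


def choose(pixel):
    """Made-up per-pixel condition deciding which path applies."""
    return "lit" if pixel >= 128 else "shadow"
```

Both functions return the same result for every pixel. Only the amount of work differs, and in the select variant it grows with the number of alternatives.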
• Alpha-test acceleration:
The alpha test is an old means of preventing a pixel from being written to the frame buffer. If the alpha value of a pixel does not fulfil the configurable condition, the pixel is discarded. This technique is used to render finely detailed structures such as fences. Hardware without "dynamic branching" is always forced to calculate the pixel completely before the test.
Pixel shader 3.0 hardware could now carry out the test as soon as the final alpha value is available. If the test fails, the remaining calculation can be cancelled immediately, as the pixel colour is no longer needed at all. On the whole this only makes sense if calculating the pixel colour takes more clock cycles than calculating the alpha value. An example would be the combination of bump mapping and alpha test: if an alpha value that is read in first fails the test, sampling and applying the bump map could be skipped completely.
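The saving can be illustrated with a small instruction-count model in Python. The costs `ALPHA_COST` and `COLOUR_COST`, the "greater or equal" comparison and the sample alpha values are assumptions chosen for illustration, not measurements of any real chip.

```python
# Assumed per-pixel costs in instructions (hypothetical numbers).
ALPHA_COST = 1   # fetching the final alpha value
COLOUR_COST = 6  # e.g. bump mapping and the rest of the colour calculation


def cost_late_test(alpha, alpha_ref):
    """Without dynamic branching: the pixel is always calculated in full."""
    return ALPHA_COST + COLOUR_COST


def cost_early_test(alpha, alpha_ref):
    """With dynamic branching: stop as soon as the alpha test fails."""
    if alpha < alpha_ref:  # assumed test: pass if alpha >= alpha_ref
        return ALPHA_COST
    return ALPHA_COST + COLOUR_COST


def total_cost(cost_fn, alphas, alpha_ref):
    """Sum the instruction count over a list of per-pixel alpha values."""
    return sum(cost_fn(a, alpha_ref) for a in alphas)
```

For a fence-like texture in which half of the pixels fail the test, for instance the alpha values `[0.0, 0.2, 0.9, 1.0]` with a reference of `0.5`, the early test only pays the full cost for the two visible pixels.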
• Alpha-blending acceleration:
Alpha blending is related to the alpha test. In contrast to the alpha test, it is not only a question of "to be or not to be"; there are also intermediate steps. Here the new pixel value is combined with the value already in the frame buffer. Depending on the blend function used, there are alpha values here too for which the new pixel value has no effect on the final image. For these cases the same optimization is possible as with the alpha test.
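As a sketch of this case, assume the common blend function "source times alpha plus destination times one minus alpha". With an alpha of zero the result is simply the old frame buffer value, so the colour calculation can be skipped. The function names and the `compute_colour` callback are hypothetical.

```python
def blend(src, dst, alpha):
    """Assumed blend function: src * alpha + dst * (1 - alpha), per channel."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))


def shade_and_blend(dst, alpha, compute_colour):
    """Skip the (expensive) colour calculation when it cannot be visible."""
    if alpha == 0.0:
        return dst  # the new pixel contributes nothing to the final image
    return blend(compute_colour(), dst, alpha)
```

Which alpha values are "invisible" depends on the blend function in use. For other functions the check would have to be adapted accordingly.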
Alpha test and alpha blending are quite old techniques that were possible long before shader technology was introduced with DirectX 8. For this reason these optimizations can even be applied to games that do not use shaders at all.
• "Texkill" - The pixelkiller
With the DirectX 8 shaders, a new way of "killing" pixels was introduced. The shader instruction texkill makes it possible to suppress the writing of individual pixels to the frame buffer if a certain condition is met. As with the alpha test, the calculation is continued even though the chip no longer needs the result.
This is a good opportunity to put "dynamic branching" to use. Whenever the condition is met, all calculations that follow the texkill instruction can be skipped. In addition, the driver can try to reorder the shader program so that the texkill instruction lies as close to the beginning as possible. The performance gain of this method depends on two factors:
- 1. The share of pixels that are discarded.
- 2. The number of operations saved per pixel.
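How a driver might move a texkill instruction towards the beginning can be sketched as follows. This is a simplified illustration under strong assumptions: instructions are hypothetical `(op, dest, sources)` tuples, every register is written at most once, and the mnemonics are only labels. It is not a description of how any real driver works.

```python
def hoist_texkill(program):
    """Move texkill as early as its data dependencies allow.

    Assumes each register is written at most once and that the program
    contains exactly one texkill instruction.
    """
    kill_index = next(i for i, ins in enumerate(program) if ins[0] == "texkill")
    # Map each written register to the index of the instruction writing it.
    writers = {ins[1]: i for i, ins in enumerate(program) if ins[1] is not None}

    # Collect every instruction that texkill depends on, directly or indirectly.
    needed = set()
    stack = list(program[kill_index][2])
    while stack:
        reg = stack.pop()
        i = writers.get(reg)  # input registers have no writer and are skipped
        if i is not None and i not in needed:
            needed.add(i)
            stack.extend(program[i][2])

    head = [program[i] for i in sorted(needed)]
    tail = [ins for i, ins in enumerate(program)
            if i not in needed and i != kill_index]
    return head + [program[kill_index]] + tail


# Made-up example shader: texkill only depends on the first texture read.
EXAMPLE = [
    ("texld", "r0", ("t0",)),        # base texture
    ("texld", "r1", ("t1",)),        # bump map
    ("dp3", "r2", ("r1", "v0")),     # lighting term
    ("mul", "r3", ("r0", "r2")),     # combine
    ("texkill", None, ("r0",)),      # kill decision
    ("mov", "oC0", ("r3",)),         # write colour
]
```

In the example, texkill moves from the fifth position to the second, so that four instructions instead of one lie behind the decision and can be skipped for a killed pixel.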
The following diagram shows, by way of example, the potential savings for some shaders. The vertical axis shows the relative speed, with 100 percent corresponding to the speed of hardware that is not pixel shader 3.0 capable. The horizontal axis shows the share of pixels that are discarded. For every shader two numbers are given: the first specifies the length of the shader, the second the number of instructions that must at least be executed before it is known whether the pixel can be discarded. 08:01 therefore denotes a shader with 8 instructions that decides after the first instruction whether the pixel is drawn.
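The relationship plotted in such a diagram can be approximated with a simple model. The sketch below assumes that speed is inversely proportional to the average number of executed instructions and that branching itself costs nothing; both are simplifying assumptions, and the function is not taken from the data behind the diagram.

```python
def relative_speed(length, decide_after, discarded_share):
    """Speed relative to hardware that always runs the full shader (1.0 = 100 %).

    length          -- total number of instructions in the shader
    decide_after    -- instructions executed before the kill decision is known
    discarded_share -- fraction of pixels that are discarded (0.0 to 1.0)
    """
    average_cost = ((1.0 - discarded_share) * length
                    + discarded_share * decide_after)
    return length / average_cost
```

In this model the 08:01 shader yields no gain when no pixel is discarded and approaches eight times the speed as the discarded share approaches 100 percent. Real hardware will fall short of this, since branching is not free.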
Of course the diagram shows only the savings (in raw pixel shader power) for pixels whose shader program contains a texkill instruction. The share of affected pixels can, however, vary greatly from application to application. Games with a high share of "texkill pixels" show good savings potential thanks to "dynamic branching" on pixel shader 3.0 hardware.