Inside ATI's R420
May 4, 2004 / by aths / Page 2 of 4
Because the pixelshader 3.0 spec includes a minimum instruction count of 512, every pixelshader 2.X is limited to a maximum of 512 instructions. (3.0 includes all "caps" of 2.X; that is why 2.X is not allowed to exceed the 3.0 minimum requirements.) The R420 can deliver more: up to 512 texture operations plus up to 512 math instructions. (R360 is able to handle 32 tex plus 64 math.) ATI counts vector3 and scalar instructions separately, which together with the texture operations results in 3 * 512 = 1536 instructions according to ATI. As we stated, DirectX 9 limits this to 512 instructions (while every instruction can have its modifier).
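The arithmetic behind the two figures can be sketched in a few lines of Python. This is only an illustration of the counting described above; the constant and function names are hypothetical, and the numbers are simply the ones quoted in the text.

```python
# Figures quoted in the text; all names here are hypothetical.
R420_TEX_OPS = 512             # texture operations
R420_VEC3_OPS = 512            # vector3 math instructions
R420_SCALAR_OPS = 512          # scalar math instructions, counted separately by ATI
PS_2X_MAX_INSTRUCTIONS = 512   # cap on any 2.X shader, per the text


def ati_instruction_count():
    """Total as ATI counts it: texture + vector3 + scalar."""
    return R420_TEX_OPS + R420_VEC3_OPS + R420_SCALAR_OPS


def fits_ps_2x(instruction_count):
    """True if a shader of this length stays within the 2.X limit."""
    return instruction_count <= PS_2X_MAX_INSTRUCTIONS
```

Here `ati_instruction_count()` reproduces ATI's 3 * 512 figure, while `fits_ps_2x` reflects the limit that actually applies under DirectX 9.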
Temporary register count was raised from 12 to 32. This, by the way, is the one and only improvement over Nvidia's original CineFX engine (featuring 22 temporary registers). More temps allow bigger blocks of texture sampling, which comes in handy due to the dependent read limit of four. (Any texture operation followed by an arithmetic instruction counts against this dependent read limit. A GeForce FX+ has no such limit.) The whole story is a bit more complicated, because for best performance, depending on the actual hardware, a given number of used temps should not be exceeded even if the spec allows it. But for the record: R420 DirectX9 performance is very good indeed.
Please don't confuse shader target profiles with shader versions. With version 9.0c, there are four DirectX9 pixelshader profiles for hardware-accelerated rendering: 2_0 (for Radeon 9500 - 9800), 2_A (for any GeForce FX), 2_B (for Radeon X800) and 3_0 (for GeForce 6800). Any pixelshader hardware that improves on the minimum 2.0 spec is considered a version of 2.x, as long as the hardware does not already meet the 3.0 spec. In other words, both R420's and NV30's pixelshader versions are called 2.x, while NV30's feature set is, in the end, much more advanced than R420's.
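To make the distinction concrete, here is a minimal lookup table in Python that maps each profile named above to the shader version it belongs to and the hardware it targets. The table and function names are hypothetical; the entries come only from the list in the text.

```python
# profile -> (shader version, target hardware), as listed in the text.
# PROFILES and shader_version are hypothetical names for illustration.
PROFILES = {
    "2_0": ("2.0", "Radeon 9500 - 9800"),
    "2_A": ("2.x", "GeForce FX"),
    "2_B": ("2.x", "Radeon X800"),
    "3_0": ("3.0", "GeForce 6800"),
}


def shader_version(profile):
    """Return the shader version a given target profile falls under."""
    return PROFILES[profile][0]
```

Two different profiles (2_A and 2_B) share the same version label, which is exactly why a profile and a version should not be treated as the same thing.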
At this time, there is no need for any gamer to care about Shader Model 3.0. Of course, many developers are eager to play with this shader model today, but the effects possible with pure 2.0 have not even remotely been exploited yet. Considering the large installed 2.0 hardware base, the next generation of games will concentrate on these specifications, and add SM 3.0 support as a goodie at most.
The F-Buffer was improved: its size can now be adjusted dynamically. While the drivers have to support this, it is not merely a driver feature like "Temporal" AA; it required changes to the hardware, too.
The HyperZ subsystem (now called HyperZ HD) works a bit more efficiently than previous revisions. More importantly, though, the full HyperZ feature set is still available with some quads deactivated. (In this context, a quad means a bundle of four pipelines.) RV360's (Radeon 9600) performance would be about 10% higher with a fully working HyperZ similar to HyperZ HD. Possible future entry-level or mainstream versions of the R420 (with 4 or 8 pipes) can take advantage of the full HyperZ efficiency, increasing performance compared to RV360.
In the near future, there will be two versions of X800 available: the X800 Pro with 12 pixelshader and 6 vertexshader pipelines, and the X800 XT with 16 pixelshader and also 6 vertexshader pipelines. In fact, it is the same chip. The XT has to be a fully functional chip and must tolerate higher clock speeds, while the Pro can be either a fully functional chip with a lower maximum clock speed or a chip with a single defective quad pipeline. So, ATI can use more dies (instead of selecting only the chips that meet the XT requirements), and the user has the choice between very good performance at a lower price or maximum performance at high-end prices. Nvidia's strategy is the same, showing that both IHVs are trying to increase their yield rates.
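The selection rule described above can be sketched as a small Python function. This is only an illustration: the names are hypothetical, and it assumes that every chip offered as a Pro meets the Pro clock speed and that dies with more than one defective quad are not sold as either model.

```python
QUADS_ON_DIE = 4     # 16 pixel pipelines = 4 quads
PIPES_PER_QUAD = 4   # a quad is a bundle of four pipelines


def bin_chip(working_quads, meets_xt_clock):
    """Return (model, active pixel pipelines), or None if neither model fits.

    Hypothetical helper: assumes any chip binned as Pro reaches the Pro clock.
    """
    if working_quads == QUADS_ON_DIE and meets_xt_clock:
        return ("X800 XT", QUADS_ON_DIE * PIPES_PER_QUAD)
    if working_quads >= QUADS_ON_DIE - 1:
        # Fully functional but slower, or exactly one defective quad.
        return ("X800 Pro", (QUADS_ON_DIE - 1) * PIPES_PER_QUAD)
    return None
```

A fully working die that misses the XT clock and a die with one bad quad both end up as a 12-pipeline Pro, which is how more dies become sellable.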
While Nvidia uses dedicated logic for its video encoding and decoding acceleration, ATI uses only the pixelshaders to do the necessary calculations. As Nvidia announced a fully hardware-accelerated video codec (which is not yet working with current drivers), ATI announced this feature, too. ATI is known to announce some video features as generally working, while they are in fact dependent on proprietary software support. Once again, we will have to wait and see.
What was kept
R420 got the version 2.0 vertex shaders from the R300 range of chips, but more of them (6 as compared to 4; together with the higher clock rates, vertex power is roughly doubled). Also, anisotropic filtering is exactly the same as in R300. This is a pity, of course. While R300's AF is an improvement in texture quality compared to R200 (on R200, some image quality aspects with AF are lower than in no-AF situations), it is far from the best possible improvement. We will talk about such trade-offs in a future column.
Regarding antialiasing, all the well-known modes are back, with no additional options such as 8x. Because ATI's 6x AA has a better edge resolution than Nvidia's 8xS, this is no real drawback. (8xS also has an extreme impact on performance, while 6x on today's Radeon range is an option worthy of consideration in quite a number of games. On the other hand, the big supersampling part of 8xS in particular improves some aspects of the image quality compared to pure multisampling on Radeon.) 6x AA gives very good smoothing of polygon edges. As before, R420 utilizes so-called gamma-corrected downfiltering with a fixed gamma value of 2.2. This often improves the visible reduction of jaggies, while it worsens the aliasing artefacts in some rare cases. Adjustable gamma for the downfiltering could remove this problem, but costs more transistors (that's why it's not in the chip).
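The idea of gamma-corrected downfiltering can be illustrated with a short Python sketch. It is not a description of the actual hardware: it assumes sample values normalized to the range 0 to 1 and a pure power-law curve with the fixed gamma of 2.2 mentioned above, and the function names are hypothetical.

```python
GAMMA = 2.2  # fixed gamma value quoted in the text


def downfilter_naive(samples):
    """Plain average of the stored (gamma-encoded) sample values."""
    return sum(samples) / len(samples)


def downfilter_gamma(samples, gamma=GAMMA):
    """Average in linear light, then re-encode with the same gamma.

    Assumes samples are normalized to [0, 1] and follow a pure power law.
    """
    linear = [s ** gamma for s in samples]
    mean = sum(linear) / len(linear)
    return mean ** (1.0 / gamma)
```

For an edge pixel where three of six samples are white (1.0) and three are black (0.0), the naive average is 0.5, while the gamma-corrected result is roughly 0.73. On a display with a gamma near 2.2 the latter is closer to half the light output, which is why edges tend to look smoother; on a display with a clearly different gamma, the fixed value can make things worse.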
"Centroid sampling" solves some problems associated with multisampling AA. This feature is required by Pixelshader 3.0. (Which is somewhat odd, as centroid sampling doesn't have anything to do with pixelshader calculations per se. It's a rule how to sample textures at edges of polygons.) As "Centroid sampling" is already available in R300, DirectX 9.0c will expose this feature for Shader Model 2-compliant hardware.