Multisampling Anti-Aliasing: A Closeup View
So what's the point? - Why multisampling is faster
The anti-aliasing quality of the GeForce3 wasn't groundbreaking; in all fairness, that title belongs to the Voodoo5. The big advantage in favour of the GeForce3 was speed. We'll talk about quality later and turn to speed first, because speed is the big advantage of multisampling anti-aliasing.
GeForce3 doesn't calculate a texture value for every subpixel. This saves both fillrate and bandwidth. It saves a lot more fillrate than bandwidth, though, because the bandwidth savings come mainly from reduced texture sampling, and those samples are mostly served from the texture cache anyway. And although only a single colour value is generated per pixel, it still has to be stored separately for every subpixel.
The Z test is performed simultaneously for all subpixels of a pixel. Now here's the tricky part: the extra Z fillrate that anti-aliasing would require is thus saved right away, but for now the Z bandwidth remains as high as it was.
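The bookkeeping in the two paragraphs above can be sketched as a simple per-pixel cost model. The operation counts are our own simplification for illustration, not vendor data:

```python
def per_pixel_costs(samples: int, mode: str) -> dict:
    """Rough per-pixel operation counts for multisampling vs supersampling
    (a simplified model, not a description of the actual hardware)."""
    if mode == "msaa":
        shading = 1            # colour computed only once per pixel
    elif mode == "ssaa":
        shading = samples      # supersampling shades every subpixel
    else:
        raise ValueError(mode)
    return {
        "texture_samples": shading,   # texture fetches follow the shading rate
        "colour_writes":   samples,   # each subpixel stores its own colour
        "z_ops":           samples,   # Z is tested for all subpixels (in parallel)
    }
```

Comparing `per_pixel_costs(4, "msaa")` with `per_pixel_costs(4, "ssaa")` shows where the fillrate saving comes from, and also why the colour and Z storage traffic stays per-subpixel in both cases.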
As a consequence, 4x anti-aliasing is quite slow, simply because there isn't enough memory bandwidth, despite certain optimisations. The high anti-aliasing performance compared to the competition is mainly due to a lot of raw power. Though beaten by the GeForce2 Ultra in theoretical fillrate, the GeForce3 showed its true strength in 32-bit mode and devastated every other 3D gamer card available at the time.
Unlike the GeForce2, the GeForce3 can make use of its 4x2 architecture in 32-bit mode as well, and this additional power naturally benefits anti-aliasing speed too. Multisampling technology is only one component of high anti-aliasing speed. Another important part is the handling of the aforementioned Z bandwidth, which anti-aliasing stresses heavily. GeForce models since the GeForce3 are capable of Z compression, which increases speed in general but pays off even more with multisampling anti-aliasing.
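Why Z compression pays off more under MSAA can be illustrated with a toy model. The assumption (our simplification, not NVIDIA's documented scheme) is that all subsamples of a pixel covered by one triangle lie on the same Z plane, so a compressed block needs to store roughly one Z value's worth of data per pixel instead of one per subsample:

```python
def z_traffic_bytes(pixels: int, samples: int, compressed: bool) -> int:
    """Toy estimate of Z-buffer traffic: 4 bytes per Z value, read + write.
    The 'compressed' branch is a hypothetical idealisation."""
    if compressed:
        per_pixel = 4            # subsamples share one plane: constant cost
    else:
        per_pixel = 4 * samples  # one Z value per subsample
    return pixels * per_pixel * 2  # read + write
```

In this model the saving is 1:1 without anti-aliasing, 2:1 with 2x and 4:1 with 4x MSAA, which matches the tendency described above: the more subsamples, the more Z compression helps.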
We won't deprive you of some numbers: we tested Villagemark at 1400x1050x32 on a GeForce4 clocked at 300/350 MHz (core/RAM).
Multisampling anti-aliasing performance
This table shows two things: first, MSAA alone isn't sufficient for attractive anti-aliasing speeds. Second, in 4x MSAA mode Z compression has more effect than early Z, and that's especially interesting because our GeForce4's memory was overclocked, which increases the available memory bandwidth. Even so, Z compression yields a significant performance boost. Without anti-aliasing, Z compression increases overall speed by only 3%, with 2x MSAA already by 9%, and with 4x MSAA by a full 13%. The results scale differently depending on the benchmark, but the tendency is obvious: Z compression is crucial for fast MSAA. The FX 5200 does not possess this feature and is therefore not a card we can recommend for anti-aliasing.
Differences with HRAA and AccuView: What has been improved for GeForce4 Ti?
Since the GeForce4, anti-aliasing and anisotropic filtering have been marketed under a compound name: AccuView on GeForce4, Intellisample on GeForce FX, and Intellisample HCT on GeForce FX 5900. The AccuView technology was adopted unaltered, so quality hasn't changed either way. AccuView itself is based on the GeForce3's HRAA ("High Resolution Anti-Aliasing"), but was improved. The changes concern speed as well as quality. First, quality: the GeForce3 has somewhat more awkward subpixel positions:
GeForce3 on the left, GeForce4 Ti (and FX) on the right.
This picture shows the positions of the subpixels (red), the spot used for texture sample coordinate calculation (blue) and the temporary lines in the triangle setup (green).
Why do GeForce4 and FX sample "better" than GeForce3? The average distance from the subpixels to the texel position (where the colour value is sampled, and which applies to all subpixels of the pixel) is shorter than on GeForce3. Thus the average colour error per subpixel is smaller. In addition, the subpixels are nicely centered rather than offset toward the upper left-hand corner, so the geometry sampling is more natural.
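The distance argument can be checked numerically. The coordinates below are illustrative only (the real patterns are only known from the diagrams above): a grid offset toward the upper-left corner versus a centred grid, with the texture sample taken at the pixel centre in both cases:

```python
from math import dist  # Euclidean distance, Python 3.8+

def avg_distance(subpixels, texel):
    """Mean distance from each subpixel to the texture sample position."""
    return sum(dist(p, texel) for p in subpixels) / len(subpixels)

# Hypothetical 4x patterns in a unit pixel; texel sampled at the centre.
gf3 = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]          # offset upper-left
gf4 = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]  # centred grid
texel = (0.5, 0.5)

assert avg_distance(gf4, texel) < avg_distance(gf3, texel)
```

With these example coordinates the centred pattern's average distance is about 0.354 pixel versus about 0.427 for the offset pattern, which is the sense in which the average colour error per subpixel shrinks.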
There are also differences in performance. The early Z test, which discards occluded pixels before they are shaded, doesn't work on GeForce3 when MSAA is active. That's why performance already drops significantly with 2x MSAA. The GeForce4 Ti is generally somewhat reluctant about early Z, but with newer drivers, tweak programs such as aTuner can activate this feature, and it then also works with anti-aliasing.
There's another difference, though it only concerns the 2x mode in fullscreen 3D. While the GeForce3 always uses conventional downfiltering, GeForce4 and later defer the filtering of the multisampling buffers until the RAMDAC scanout. This saves the read/write bandwidth for the downsampling buffer, but it only improves performance as long as the framerate doesn't drop below 1/3 of the display refresh rate. The NV35 (GeForce FX 5900) also uses this method with 4x MSAA; the break-even there is 3/5 of the refresh rate. But where does the NV35 draw the power for 4x MSAA from?
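The break-even points just mentioned fall out of a back-of-the-envelope bandwidth argument. Conventional downfiltering reads all samples and writes one filtered value per frame, and the filtered buffer is then read once per scanout; filter-on-scanout instead reads all samples on every scanout. Equating the two traffic rates for framerate f and refresh rate r gives (samples + 1)·f + r = samples·r, i.e. f = (samples − 1)/(samples + 1) · r:

```python
from fractions import Fraction

def breakeven(samples: int) -> Fraction:
    """Framerate, as a fraction of the refresh rate, above which
    filter-on-scanout uses less bandwidth than conventional downfiltering.

    Conventional: (samples reads + 1 write) per frame, + 1 read per scanout.
    On-scanout:   samples reads per scanout.
    Setting the two equal and solving for the framerate gives:
    """
    return Fraction(samples - 1, samples + 1)

print(breakeven(2))  # 1/3  -> 2x MSAA on GeForce4
print(breakeven(4))  # 3/5  -> 4x MSAA on NV35
```

This simple model reproduces both figures from the text: 1/3 of the refresh rate for 2x mode and 3/5 for the NV35's 4x mode.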