Multisampling Anti-Aliasing: A Closeup View
May 22, 2003 / by aths / page 3 of 8 / translated by 3DCenter Translation Team
GeForce multisampling: done the quick way
The fundamental idea behind multisampling is to sample only one texture value per pixel, even though several subpixels are generated. In other words: while supersampling renders the whole frame at a higher resolution, multisampling does so only for edges, not for textures. Because sampling makes up the biggest part of the texturing pipeline, a relatively simple extension allows several multisampling anti-aliasing samples to be generated in a single pipeline. So how exactly do the nVidia chips handle that? Because these chips are more widespread, our example addresses the GeForce4 Ti and FX. The implementation differences compared to the GeForce3 will be addressed later.
The purpose of the triangle setup is to supply the pixel pipelines with work. For multisampling anti-aliasing (MSAA), the GeForce performs the triangle setup at doubled resolution: two triangle lines are generated per screen line. The higher internal resolution applies to the columns as well, of course. An example:
For efficiency reasons, the triangle setup of modern graphics cards no longer works with lines but with blocks. The GeForce uses blocks of 2x2 pixels, which are also used for the early Z test. Such a block contains 4 pixels, so it follows naturally that each of the 4 GeForce pipelines is responsible for one of the pixels.
In this block, a quarter of the pixel in the bottom right-hand corner is covered by the triangle.
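The coverage test for such a block can be illustrated with a minimal Python sketch. The subsample positions (an ordered 2x2 grid at quarter-pixel offsets), the edge-function test, the y-down screen coordinates, and the names `edge`, `covered` and `block_coverage` are all assumptions made for illustration; they are not taken from nVidia's hardware.

```python
# Subsample offsets inside a pixel: an ordered 2x2 grid, i.e. the pixel
# rasterised at doubled resolution (assumed positions, y grows downwards).
# Order: upper left, upper right, lower left, lower right.
SUBSAMPLES = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def edge(a, b, p):
    """Signed area test: tells on which side of the edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered(tri, p):
    """True if point p lies inside triangle tri (either winding order)."""
    a, b, c = tri
    e0, e1, e2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def block_coverage(tri, x0, y0):
    """Per-pixel coverage masks for the 2x2 block whose top-left pixel is (x0, y0)."""
    masks = {}
    for py in range(y0, y0 + 2):
        for px in range(x0, x0 + 2):
            masks[(px, py)] = [covered(tri, (px + sx, py + sy)) for sx, sy in SUBSAMPLES]
    return masks


# A made-up triangle that only reaches into the bottom right-hand pixel.
triangle = [(1.4, 1.0), (2.0, 1.0), (2.0, 1.6)]
masks = block_coverage(triangle, 0, 0)
```

With this made-up triangle, only one of the four subsamples of pixel (1, 1) is covered, and the other three pixels of the block receive no coverage at all.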
Each pipeline now gets one pixel. In this example, three pipelines stay idle, because only one of the 4 pixels is inside the polygon. This kind of efficiency loss is inevitable, but with a line-based architecture that processes 4x1 "blocks", the overall loss would be even greater. Generally, the smaller the rendered triangles, the higher the risk of idle pipelines. Complex geometry, e.g. with very smooth curves, creates a lot of small triangles, so it is worthwhile to optimise the triangle setup especially for this kind of situation.
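To make the comparison between 2x2 blocks and 4x1 "blocks" concrete, here is a small Python sketch. It assumes that blocks are aligned to a fixed grid and that every touched block occupies all four pipelines for one pass; the function name `utilisation` and the pixel list are hypothetical.

```python
def utilisation(covered_pixels, block_w, block_h, pipelines=4):
    """Fraction of pipeline slots that do useful work for a given block shape."""
    # Every block that contains at least one covered pixel must be processed.
    blocks = {(x // block_w, y // block_h) for x, y in covered_pixels}
    return len(covered_pixels) / (len(blocks) * pipelines)


# Pixels covered by a small triangle (made-up data).
small_triangle = [(3, 3), (3, 4), (4, 4), (3, 5), (4, 5), (5, 5)]

square_blocks = utilisation(small_triangle, 2, 2)  # 3 blocks touched
line_blocks = utilisation(small_triangle, 4, 1)    # 5 blocks touched
```

For this particular triangle the 2x2 layout keeps half of the pipeline slots busy, while the 4x1 layout manages only 30 percent. Other triangle shapes and alignments give different numbers, so this is an illustration, not a measurement.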
At this point we should mention that the blocks are generated in steps. First, the triangle is divided into coarse blocks to obtain data for calculating the Z values later on. These are then divided again into smaller blocks, which go through the early Z test and, if at least partially visible, are finally rendered. The LOD calculation for MIP mapping is done per 2x2 block as well. This is still accurate enough and saves a bit of precious silicon real estate.
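A per-block LOD calculation could look roughly like the following Python sketch. It uses the common textbook approach of estimating the texel footprint from the differences between neighbouring pixels in the block; whether the GeForce computes it exactly this way is an assumption, and `block_lod` is a hypothetical name.

```python
import math


def block_lod(uv):
    """One MIP level of detail shared by all four pixels of a 2x2 block.

    uv[row][col] holds the (u, v) texture coordinate, in texels, of each pixel.
    """
    (u00, v00), (u10, v10) = uv[0]
    (u01, v01), _ = uv[1]
    # Texel distance covered by one pixel step in x and in y.
    dx = math.hypot(u10 - u00, v10 - v00)
    dy = math.hypot(u01 - u00, v01 - v00)
    rho = max(dx, dy)
    # Magnification (rho <= 1) stays on the base level.
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0


# One pixel step covers 4 texels in each direction (made-up coordinates).
minified = block_lod([[(0.0, 0.0), (4.0, 0.0)], [(0.0, 4.0), (4.0, 4.0)]])
# One pixel step covers only half a texel.
magnified = block_lod([[(0.0, 0.0), (0.5, 0.0)], [(0.0, 0.5), (0.5, 0.5)]])
```

Because the differences are taken once per block, only one LOD value has to be computed for four pixels, which is the saving the text refers to.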
The main reason for using blocks is the increased rendering efficiency compared to line-based processing. Long bursts are desirable to make full use of the memory interface, and in this respect block-wise rendering is dramatically better than line-based rendering. Cache hit rates are also significantly better when using blocks, which in turn reduces pressure on the memory controller.
Enough of our excursion into possible ways of optimising this process; let's follow the subpixels through a pipeline.
The pipeline generates one texture sample which applies to all subpixels. With multisampling, each subpixel has its own framebuffer. Since in our example only one subpixel lies within the polygon, the texture sample is written only to the framebuffer which belongs to the subpixel "upper right-hand corner". There are two possible ways for "empty" subpixels within the pixel to occur: either they don't belong to the polygon, or the Z test showed that they are occluded.
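The write step can be sketched in Python as follows. The buffer layout, the subpixel order (index 1 standing for "upper right"), the convention that a smaller Z value is closer, and the names `make_buffers` and `write_pixel` are assumptions for illustration only.

```python
import math

SAMPLES = 4  # subpixels per pixel; index 1 is taken to be "upper right" (assumed order)


def make_buffers(width, height, clear_color=(0, 0, 0)):
    """Create one colour buffer and one Z buffer per subpixel."""
    color = [[[clear_color] * width for _ in range(height)] for _ in range(SAMPLES)]
    depth = [[[math.inf] * width for _ in range(height)] for _ in range(SAMPLES)]
    return color, depth


def write_pixel(color, depth, x, y, texel, z_values, coverage):
    """Store a single texture sample in every subpixel that survives both tests."""
    for s in range(SAMPLES):
        if not coverage[s]:
            continue  # subpixel does not belong to the polygon
        if z_values[s] >= depth[s][y][x]:
            continue  # subpixel is occluded (smaller Z = closer, assumed)
        color[s][y][x] = texel
        depth[s][y][x] = z_values[s]


color, depth = make_buffers(2, 2)
# Only the upper right subpixel of pixel (1, 1) lies inside the polygon.
write_pixel(color, depth, 1, 1, (200, 120, 40), [0.5] * SAMPLES,
            [False, True, False, False])
```

Note that the texture sample `texel` is computed once and merely copied into the surviving subpixels; only the coverage and Z tests run per subpixel.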
Each subpixel has its "own" framebuffer; these buffers are also called multisample buffers.
"Downfiltering" reads the colour values from all four framebuffers and averages them to determine the final colour of the pixel. That's also how the smoothing effect is achieved.
Multisample buffers would allow several other effects besides anti-aliasing, although that would require a lot more than just 4 buffers. The issue here is not really memory consumption but performance. By the way, it is functionally irrelevant whether the buffers are separate or interleaved. The addressing logic would have to be different, but the visual result would be the same.
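The downfiltering step and the remark about buffer layout can be sketched together in Python. The two addressing functions, the single grey value per subpixel, and all names (`planar_index`, `interleaved_index`, `store`, `downfilter`) are hypothetical; the sketch only assumes a plain average over four subpixels, as described above.

```python
SAMPLES = 4  # four subpixels per pixel


def planar_index(x, y, s, width, height):
    """Separate buffers: all pixels of sample 0, then all of sample 1, and so on."""
    return s * width * height + y * width + x


def interleaved_index(x, y, s, width, height):
    """Interleaved buffers: the four samples of one pixel are stored side by side."""
    return (y * width + x) * SAMPLES + s


def store(samples, index, width, height):
    """Lay out samples[(x, y, s)] -> grey value in linear memory using `index`."""
    memory = [0.0] * (width * height * SAMPLES)
    for (x, y, s), value in samples.items():
        memory[index(x, y, s, width, height)] = value
    return memory


def downfilter(memory, index, width, height):
    """Average the four subpixel values of every pixel into one final value."""
    return [
        [
            sum(memory[index(x, y, s, width, height)] for s in range(SAMPLES)) / SAMPLES
            for x in range(width)
        ]
        for y in range(height)
    ]


# Made-up 2x1 image: the left pixel has one white subpixel, the right is fully white.
samples = {(0, 0, 1): 255.0}
samples.update({(1, 0, s): 255.0 for s in range(SAMPLES)})

planar = downfilter(store(samples, planar_index, 2, 1), planar_index, 2, 1)
interleaved = downfilter(store(samples, interleaved_index, 2, 1), interleaved_index, 2, 1)
```

Both layouts yield the same downfiltered image: the edge pixel ends up at a quarter of full brightness, the fully covered pixel stays white. Only the address calculation differs.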