Inside nVidia NV40

April 14, 2004 / by aths / page 5 of 6

Improved Multisampling

At last, NV40 uses a "rotated grid" for 4x antialiasing. There is a common misunderstanding here: you have to not only rotate the grid, but also scale it. We already discussed this subject in previous articles. Take a peek at what a sparse grid (for 4x, it is also rotated) looks like here (in German only, but the images really should be enough). We also have an in-depth article about multisampling for further information (also in German only).
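A minimal sketch of the geometry, assuming 4x ordered-grid samples at ±1/4 pixel from the pixel centre, a rotation angle of arctan(1/2) (about 26.6°) and a scale factor of √5/2. These values are our illustration, not nVidia's documented NV40 sample positions. Rotation alone already gives four different x (and y) positions, but only the extra scaling spreads them evenly across the pixel, one per quarter:

```python
import math

def transform(points, angle, scale):
    """Rotate offsets (relative to the pixel centre, in pixels) by `angle` radians, then scale them."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]

def distinct_coords(points, axis):
    """Sorted distinct coordinates along one axis (0 = x, 1 = y), rounded to hide float noise."""
    return sorted({round(p[axis], 6) for p in points})

# Assumed 4x ordered grid: a 2x2 pattern, samples a quarter pixel from the centre.
ORDERED_4X = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
ANGLE = math.atan2(1, 2)   # tan(angle) = 1/2, about 26.57 degrees
SCALE = math.sqrt(5) / 2   # about 1.118: stretches the rotated grid

rotated_only = transform(ORDERED_4X, ANGLE, 1.0)
sparse = transform(ORDERED_4X, ANGLE, SCALE)

print(distinct_coords(ORDERED_4X, 0))    # only 2 distinct x positions
print(distinct_coords(rotated_only, 0))  # 4 positions, bunched around the centre
print(distinct_coords(sparse, 0))        # 4 evenly spaced positions
```

With the scaling, the samples sit at 1/8, 3/8, 5/8 and 7/8 of the pixel width on each axis, so the pattern continues evenly into neighbouring pixels. Rotated only, they are about 0.22 pixels apart inside the pixel but about 0.33 pixels apart across each pixel border.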

Actually, nVidia's press material is rather inaccurate; for example, the picture describing the 4x ordered grid mask is wrong. That's one of the reasons we don't use a single screenshot from the official papers :).

With its new 4x multisampling, nVidia gets close to ATI's 4x antialiasing quality. However, nVidia doesn't offer so-called "gamma-corrected" downfiltering. While nVidia is right in claiming that "gamma correction" would not produce completely accurate output, ATI's downfiltering often produces visibly better antialiasing. ATI can also offer 6x. This is not a big improvement over 4x (for that, you'd need an 8x sparse grid; nVidia's 8x AA is quite inefficient because its grid is not fully "sparse", and it includes some supersampling). Of course, ATI's 6x is better than 4x. In quality terms, ATI remains the antialiasing king. Still, the GeForce 6800 Ultra's multisampling should be good enough for every quality-loving hardcore gamer.
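To illustrate what "gamma-corrected" downfiltering means, here is a minimal sketch that resolves one 4x-multisampled pixel on a black/white edge. It assumes a plain power-law display gamma of 2.2, and that assumption is exactly the weak point nVidia refers to: if the real monitor's gamma differs, the "corrected" result is off as well. The function names are hypothetical:

```python
GAMMA = 2.2  # assumed power-law display gamma; real displays and sRGB deviate from this

def resolve_naive(samples):
    """Average the stored (gamma-encoded) sample values directly."""
    return sum(samples) / len(samples)

def resolve_gamma_corrected(samples, gamma=GAMMA):
    """Decode samples to linear light, average them, then re-encode for display."""
    linear = [s ** gamma for s in samples]
    return (sum(linear) / len(linear)) ** (1 / gamma)

# A 4x AA pixel on a white/black edge: two samples covered by white, two by black.
edge_pixel = [1.0, 1.0, 0.0, 0.0]

print(resolve_naive(edge_pixel))            # stored value of a plain box filter
print(resolve_gamma_corrected(edge_pixel))  # stored value after gamma-corrected resolve
```

The plain average stores 0.5, which a gamma-2.2 display shows at only about 22% brightness (0.5^2.2 ≈ 0.218), so half-covered edge pixels look too dark and the edge gradient appears uneven. The gamma-corrected resolve stores about 0.73, which shows up at 50% brightness, but only on a display that actually matches the assumed gamma.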

The driver can change the sample positions (on an 8x8 grid), so 4x ordered grid multisampling is still possible. Oddly enough, nVidia also still offers the Quincunx modes (a technique that blurs the entire screen image).
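A small sketch can check that an 8x8 position grid holds both kinds of pattern. It assumes that sample positions are lattice points k/8 of the pixel (k = 0 to 7), measured from the pixel's corner. The sparse pattern below uses offsets of ±1/8 and ±3/8 pixel from the centre and is for illustration only; it is not nVidia's actual sample table. The helper `to_grid` is hypothetical:

```python
from fractions import Fraction

GRID = 8  # assumed: positions are lattice points k/8 of the pixel, k = 0..7

def to_grid(offsets):
    """Convert sample offsets (relative to the pixel centre, in pixels) to integer
    8x8 lattice positions; raise ValueError if a sample is not exactly on the lattice."""
    cells = []
    for x, y in offsets:
        gx = (Fraction(x) + Fraction(1, 2)) * GRID
        gy = (Fraction(y) + Fraction(1, 2)) * GRID
        if gx.denominator != 1 or gy.denominator != 1:
            raise ValueError(f"offset {(x, y)} is not on the {GRID}x{GRID} grid")
        cells.append((int(gx), int(gy)))
    return cells

ORDERED_4X = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
SPARSE_4X = [(0.125, 0.375), (0.375, -0.125), (-0.375, 0.125), (-0.125, -0.375)]

print(to_grid(ORDERED_4X))  # 2 distinct columns, 2 distinct rows
print(to_grid(SPARSE_4X))   # 4 distinct columns, 4 distinct rows
```

Since both patterns land on the lattice, the same hardware can produce either one simply by loading different positions.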

Do we need Pixelshader 3.0?

Pixelshader 3.0 is 2.X with every CAP (capability, meaning optional features here) set to its maximum, plus support for dynamic branching (i.e. conditional jumps in the shader code). So is Pixelshader 3.0 a big leap, or just another step in the evolution of realtime 3D graphics?

Many of our readers focus strictly on "usability for games." To be honest, branching is expensive on the GeForce 6800 Ultra. Because early-out is supported, though, Pixelshader 3.0 can speed up future games and (with patches) existing ones. Whether it actually does depends on the individual case. We expect Pixelshader 3.0 to be used in the not-too-distant future. Fortunately, this shader version is easy to implement. Even if ATI's new part does not support PS3.0, there is a good chance that Pixelshader 3.0 will be used anyway. Take a look at what Pixelshader 3.0 could do for you (German only; we're working on an English translation).
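A toy cost model shows how branching can be expensive even though early-out can save work. It assumes, hypothetically, that pixels are shaded in lockstep batches, so a batch can only skip the expensive part of a shader when every pixel in it takes the early-out. The batch size and costs below are made-up numbers, not NV40 figures:

```python
def shading_cost(needs_work, batch_size, test_cost, work_cost, use_branch=True):
    """Toy SIMD cost model (hypothetical): pixels run in lockstep batches of `batch_size`.
    Without a branch, every batch pays `work_cost`. With an early-out branch, every batch
    pays `test_cost` for the condition, plus `work_cost` if ANY of its pixels needs the work."""
    total = 0
    for start in range(0, len(needs_work), batch_size):
        batch = needs_work[start:start + batch_size]
        if use_branch:
            total += test_cost + (work_cost if any(batch) else 0)
        else:
            total += work_cost
    return total

coherent = [True] * 4 + [False] * 12          # the lit pixels share one batch
scattered = [i % 4 == 0 for i in range(16)]   # one lit pixel in every batch

print(shading_cost(coherent, 4, 1, 10, use_branch=False))  # no branch
print(shading_cost(coherent, 4, 1, 10))                    # early-out pays off
print(shading_cost(scattered, 4, 1, 10))                   # branch makes it slower
```

With the lit pixels clustered, the branch cuts the cost from 40 to 14 units. When they are scattered across every batch, it raises the cost to 44, because each batch pays for the test and for the full work.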

As always, nVidia delivers features first and performance second. This is of course good for the marketing department, which can promote the same feature twice: first when it comes out, and again when performance is good enough for "real life" use. Think of 32-bit rendering (TNT, TNT2), T&L (GeForce256, GeForce2 GTS) or Vertexshader (GeForce3, GeForce4 Ti). Throughout the briefings, we heard a lot of praise for the great features nVidia already delivers today, but we have to face the fact that the marketing department may not be telling us the whole story. They are trying to sell their product, after all. It's their job.

In short: the GeForce 6800 (Ultra) will not be a good choice for playing "true Shader 3.0 games." While it is the only product supporting Shader Model 3.0 today, future generations (NV50+) will execute 3.0 shaders much faster.

Do we need Pixelshader 3.0 now? Let's have a look at the features Pixelshader 3.0 provides:

All-purpose precision with FP32
Many math operations, huge instruction count
Conditional branching
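Addressing is one reason full FP32 precision matters everywhere in the pipeline. FP16 has only 10 explicit mantissa bits, so between 0.5 and 1.0 its representable values are 2^-11 apart, which is two texels of a 4096-texel-wide texture. The sketch below rounds all normalized texel-centre coordinates through FP16 and FP32 using Python's `struct` half-precision (`e`) and single-precision (`f`) formats. The texture width is chosen only for illustration:

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE 754 half precision (FP16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to IEEE 754 single precision (FP32) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

WIDTH = 4096  # texture width in texels, chosen for illustration
centres = [(i + 0.5) / WIDTH for i in range(WIDTH)]  # normalized texel-centre coordinates

distinct_fp16 = len({to_fp16(u) for u in centres})
distinct_fp32 = len({to_fp32(u) for u in centres})

print(distinct_fp16, distinct_fp32)  # FP16 merges neighbouring texels; FP32 keeps all 4096
```

Near the right edge, neighbouring texel centres collapse onto the same FP16 coordinate, while FP32 keeps every one of them apart.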

If we consider the GPU a coprocessor, this is a major step towards a general-purpose coprocessor. Even if branching is expensive (i.e. has a serious impact on performance), you can use it if you want or need to. You are also free to stay with Shader Model 2.0 (or 2.X), which NV40 renders extremely fast.

Pixelshader 3.0 is an option. nVidia provides this option now, ATI does not. It's always good to have options.

Vertexshader 3.0

To simplify, Vertexshader 3.0 is CineFX's Vertexshader 2.X with texture sampling. This texture sampling is not "free". To help the vertex engine hide the latency caused by texture fetches, there should be texture-independent instructions right after the texture access. Also, while MIP mapping is supported, filtering is not. In other words, the vertexshader can only do point sampling. It is of course possible to filter bilinearly or trilinearly by doing some extra calculations and some extra texture samples in the vertexshader. We expect texture filtering support in the vertexshader TMUs with the next generation (NV50).
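The workaround mentioned above, filtering by hand from point samples, can be sketched as follows. The texture is a plain list of rows, coordinates are normalized to [0, 1] with clamping at the edges, and the caller supplies the MIP chain. All names are hypothetical. A real vertexshader would do the same arithmetic with four (bilinear) or eight (trilinear) point-sampled fetches:

```python
import math

def point_sample(tex, s, t):
    """Nearest-texel fetch, the only filter the vertex texture units offer.
    tex is a list of rows; s, t are normalized coordinates; edges are clamped."""
    h, w = len(tex), len(tex[0])
    x = min(max(math.floor(s * w), 0), w - 1)
    y = min(max(math.floor(t * h), 0), h - 1)
    return tex[y][x]

def lerp(a, b, f):
    """Linear interpolation between a and b with weight f."""
    return a + (b - a) * f

def bilinear(tex, s, t):
    """Bilinear filtering built from four point samples plus three lerps."""
    h, w = len(tex), len(tex[0])
    x, y = s * w - 0.5, t * h - 0.5           # position relative to texel centres
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0                   # fractional weights
    s0, s1 = (x0 + 0.5) / w, (x0 + 1.5) / w   # centres of the two neighbouring columns
    t0, t1 = (y0 + 0.5) / h, (y0 + 1.5) / h   # centres of the two neighbouring rows
    top = lerp(point_sample(tex, s0, t0), point_sample(tex, s1, t0), fx)
    bottom = lerp(point_sample(tex, s0, t1), point_sample(tex, s1, t1), fx)
    return lerp(top, bottom, fy)

def trilinear(mips, s, t, lod):
    """Trilinear filtering: bilinear on two adjacent MIP levels, blended by the LOD fraction."""
    level = min(max(math.floor(lod), 0), len(mips) - 1)
    nxt = min(level + 1, len(mips) - 1)
    frac = min(max(lod - level, 0.0), 1.0)
    return lerp(bilinear(mips[level], s, t), bilinear(mips[nxt], s, t), frac)

tex = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear(tex, 0.5, 0.5))                        # centre of a 2x2 texture: average
print(trilinear([tex, [[1.5]]], 0.25, 0.25, 0.5))     # halfway between MIP levels 0 and 1
```
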

NV40 delivers a two-pass displacement mapping solution. While this is nice, it is not the final step. Even though displacement mapping is a great technology, and we are really eager to see it in games, we don't expect it even in the second-next generation of games (counting Doom3 and HL2 as the next generation, and maybe Unreal 3 as the second-next), and certainly not during the 6800 Ultra's lifespan.
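The basic displacement mapping step, moving each vertex along its normal by a height read from a texture, is sketched below. This is a generic single-pass illustration using a point-sampled vertex texture fetch, not a description of NV40's two-pass solution, whose details we don't cover here. The function and parameter names are hypothetical:

```python
import math

def point_sample(height_map, s, t):
    """Nearest-texel height lookup with edge clamping (no filtering)."""
    h, w = len(height_map), len(height_map[0])
    x = min(max(math.floor(s * w), 0), w - 1)
    y = min(max(math.floor(t * h), 0), h - 1)
    return height_map[y][x]

def displace(vertices, normals, texcoords, height_map, scale):
    """Return the vertices moved along their (unit) normals by scale * height."""
    out = []
    for (px, py, pz), (nx, ny, nz), (s, t) in zip(vertices, normals, texcoords):
        d = scale * point_sample(height_map, s, t)
        out.append((px + nx * d, py + ny * d, pz + nz * d))
    return out

# Usage: a flat quad (normal +z) displaced by a 2x2 height map.
heights = [[0.0, 1.0], [2.0, 3.0]]
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
up = [(0.0, 0.0, 1.0)] * 4
uv = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
displaced = displace(quad, up, uv, heights, 0.5)
print(displaced)
```
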

But without such a hardware-accelerated feature, there will never be such a game. The same applies to Pixelshader 3.0. Of course, at this time it is not a must-have, but it will be used later. Developers can start by experimenting with all the nifty new features. For a real market leader it is important to deliver not only what gamers need now, but also what developers would like to have. Texture access from the vertexshader is limited on NV40, but it is possible. The pixelshader also offers full 32-bit floating point precision, so the GeForce 6800 Ultra brings a sophisticated shader solution that, with some effort, is also usable for physics calculations. In the future, parts of game engines other than graphics may run on your 3D accelerator.

Since CineFX, GeForce's pixelshader has been able to render to a vertexbuffer. Now the vertexshader can also read textures, so there is a two-way ("duplex") channel for data transfer.

Most importantly, vertexshader power was heavily increased: clock for clock, it is twice that of previous high-end GeForce products! This is not a waste of transistors. Take the early-Z pass, for example: this extra pass doubles the geometry workload. Such tremendous vertexshading power still needs a fast CPU. While more vertex performance allows more (and therefore smaller) triangles, the developer has to take the size of the triangles into account. We already mentioned the issues with quad-based rendering. The solution is simple: triangles should be as small as necessary, but as large as the desired detail allows. Such "geometry LOD" (level of detail) could be provided by an advanced displacement mapping technique.
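The quad problem can be put into numbers with a small sketch. It rasterizes a triangle by testing pixel centres against its three edge functions, then compares the pixels the triangle covers with the number of 2x2 quads it touches, since every touched quad occupies all four pixel slots. The rasterizer is deliberately simplified (no top-left fill rule; pixels exactly on an edge count as inside), and the function names are hypothetical:

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign tells which side of the line a->b the point p lies on (0 = on it)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered_pixels(tri, width, height):
    """Pixels whose centres lie inside (or on an edge of) the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []  # degenerate triangle
    pixels = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            ws = (edge(x1, y1, x2, y2, px, py),
                  edge(x2, y2, x0, y0, px, py),
                  edge(x0, y0, x1, y1, px, py))
            # Inside if no edge function has the opposite sign of the triangle's area.
            if all(w * area >= 0 for w in ws):
                pixels.append((x, y))
    return pixels

def quad_efficiency(tri, width, height):
    """Share of the pixel slots in all touched 2x2 quads that the triangle actually covers."""
    pixels = covered_pixels(tri, width, height)
    if not pixels:
        return 0.0
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels) / (4 * len(quads))

small = ((1.0, 1.0), (2.5, 1.0), (1.0, 2.5))    # covers a single pixel centre
large = ((0.0, 0.0), (16.0, 0.0), (0.0, 16.0))  # covers half of a 16x16 area
print(quad_efficiency(small, 4, 4), quad_efficiency(large, 16, 16))
```

For the single-pixel triangle, only one of the four slots of its quad does useful work (25%), while the large triangle stays above 90%. This is why ever-smaller triangles can waste pixel shading power, no matter how much vertex power is available.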

Overall, Shader Model 3.0 is a nice next step in the evolution. Compared to ATI's shader model, it is far more advanced. The average gamer of course does not need these features today. Developers, though, will be very pleased with their new toys.

2 April, San Jose, Hotel Valencia: nVidia's Editor's Day. Much of the information in this article was presented to us there.