Inside nVidia NV40
April 14, 2004 / by aths / page 6 of 6
An architecture to scale
To be honest, the GeForce 6800 Ultra is not a card for everyone: It is too big, too fast, and therefore too expensive. It also wants power like nobody's business. Only the AGP card will ship with two power plugs; the PCI-E version will get by with a reduced count (that is, only one).
nVidia was clever enough to focus strongly on scalability while developing the NV40. The GeForce 6800 Ultra includes 16 pipes, the GeForce 6800 still 12. Further NV40-based designs with 8, 4 or even only 2 pipes are also possible. The vertexshader is an array of 6 units with 4 pointsampling TMUs in total. A single vertexshader with just one TMU (the other 3 can then be "virtual" TMUs) will also meet the VS 3.0 requirements.
GeForce3 and 4 are not scalable. GeForce FX was better designed to fit different needs with the same architecture. NV40 takes this a step further: Even the circuits for future mobile versions are already included (but not used), because it's easier to cut something out than to integrate new functions into a given design.
This is one of the points we really like about NV40's design: One architecture, to find them all and bind them. NV40 has the most advanced feature set today. The version with 16 pipes is extremely fast, and we expect the 8-pipe version to be on par with the FX 5950 Ultra (provided the 8-pipe version gets a 256-bit DDR interface, that is). Of course, let's not rejoice too early: we have to wait and benchmark those parts once they are available. nVidia should launch such mid-range products as soon as possible.
Even a small 4-pipe chip based on NV40 could suit the needs of the occasional gamer (as long as it is not a 64-bit version). While the FX 5200 was really disappointing for gamers, and the FX 5600 wasn't that much better, we hope that the smaller versions of NV40 will be suitable for most casual gamers.
Of course, we cannot expect the same from a hypothetical version with only 2 pipes. A 2-pipe GPU is, by the way, still a quad-based design, but it takes two clock cycles to perform a given operation for the entire quad. Two pipes cannot deliver the performance even casual gamers require. But NV40's architecture comes with more than just 3D features.
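To illustrate the quad timing mentioned above, here is a minimal Python sketch. It assumes that a quad is a 2x2 block of four pixels and that each pipe handles one pixel per clock for a given operation. The function names are hypothetical, and the model deliberately ignores everything else that affects real throughput (clock speed, memory bandwidth, shader length).

```python
import math

# Assumption: a quad is a 2x2 block of four pixels.
QUAD_PIXELS = 4


def cycles_per_quad_op(pipes: int) -> int:
    """Clocks needed to apply one operation to every pixel of a quad (idealized)."""
    if pipes < 1:
        raise ValueError("pipes must be positive")
    # With fewer pipes than pixels in a quad, the quad is processed in several passes.
    return math.ceil(QUAD_PIXELS / pipes)


def quads_in_parallel(pipes: int) -> int:
    """Number of whole quads that can be worked on at the same time (idealized)."""
    if pipes < 1:
        raise ValueError("pipes must be positive")
    return max(1, pipes // QUAD_PIXELS)


if __name__ == "__main__":
    # Pipe counts mentioned in the article: 16, 12, 8, 4 and 2.
    for pipes in (16, 12, 8, 4, 2):
        print(pipes, "pipes:",
              cycles_per_quad_op(pipes), "clock(s) per quad operation,",
              quads_in_parallel(pipes), "quad(s) in parallel")
```

Under these assumptions, a 2-pipe part needs two clocks per quad operation, while designs with 4 or more pipes finish a quad operation in one clock and differ only in how many quads they handle at once.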
Howard Marks: This nice scientist is responsible for finding hardware errors. nVidia uses state-of-the-art tools for this job.
The GeForce 6800 Ultra includes a video codec in hardware. That means the GPU takes load off the CPU during video decoding or even compression. This of course requires driver support. The video engine itself uses its own integer logic, but the pixelshaders can be used as well. The architecture even supports different clock speeds for the 2D and 3D engines, which is good for mobile versions. nVidia also implemented some interesting things to solve problems with wrongly marked DVDs, i.e. to eliminate interlacing artifacts. We will have to see if this works out as promised.
"Overlay" is a common technique to show scalable video content onscreen. We don't have all the details yet, but nVidia promised to fix the gamma correction issue - previous GeForce boards are not able to gamma-correct the overlay output.
nView, nVidia's marketing term for dual screen support, is still there. We cannot be more specific as of yet, but the new nView will offer some additional features compared to its current incarnations. (We don't know yet whether these may be driver improvements only.)
It seems as if it's not decided yet whether the full video engine will also be included in very small, NV40-based spin-offs.
The effects possible with Pixel- and Vertexshader 2.0 are not fully used today. Quick jump to the future. It is 2007 AD. While every modern game needs DirectX-Next hardware to turn everything on, DirectX9 (with shader model 2.0) is still the entry tech level. Second-generation DirectX-Next hardware delivers both the feature set and the power for graphical realities previously unthought of. There is a reasonable number of games with tremendous geometry detail, and we get offset mapping (a technique to fake displacement mapping) with self-shadowing, as well as image-based lighting with medium dynamic range rendering. Now, the GeForce 6800 Ultra opens the door to this world.
NV40 is extremely well designed for "today's next-generation games", e.g. Doom 3 and Half-Life 2, and we expect very high performance there. For today's games, have a look at our benchmark results.
NV40's architecture is also well designed to scale from the low end to the high end. One single chip cannot satisfy the whole market, nor the whole gamers' community. nVidia's new design offers different performance levels, but always with all the latest features.
Editor's personal conclusion
nVidia did its job very well indeed. But there is one point I am personally put off by: Poor anisotropic quality, comparable to ATI's. It is a different technique, but brings similar overall texture "quality." Our time was too short to offer an in-depth analysis right now, but we will investigate this matter as soon as we get another card.
While we have the full "trilinear band" back now, NV40 does not produce full isotropic quality. In fact, the GeForce4 is the last GPU from nVidia where today's drivers produce full standard trilinear quality. On both the GeForce FX and the GeForce 6800 Ultra, it is "optimized".
If you buy a card such as the 6800 Ultra, you naturally want to have the option of full quality on maximized settings. My Ti 4600 is fast enough for most of today's games and provides very good texture quality I don't want to miss. Simply put, a rather old Radeon 9500+ is also "fast enough" and delivers texture "quality" similar to that of the GeForce 6800 Ultra. Nearly any scene is fully textured, so a good filtering technique improves the entire image. Basically every pixel depends on it.
Besides this issue, the NV40 is both faster and more feature-complete than I expected. It's also incredibly complex: with about 220 million transistors it certainly can be called a great engineering feat. FP-filtering is something I have been looking forward to for some time now, and even if it may not be really usable during the NV40's lifespan, it's great to see the first steps have been taken.
The 6800 Ultra comes with fullspeed FP32 pixelshading, the latest and best feature set in a long time, and performance galore. I just insist on having the option of the fullest possible texture quality, too.
We want to thank the following parties for making this article possible:
The 3DCenter Forums: RLZ, Xmas, zeckensack (in alphabetical order) and many others for helpful input, Demirug for insider information plus his Pixelshader 3.0 article, and nggalai for his strong support, especially regarding the English version of this article.
nVidia: nVidia Germany for the NV40-briefing in Munich, the team responsible for the "Editor's Day" in San Jose, David Kirk for his time and invaluable input, and all the other employees at nVidia for their kind answers to our many questions.