nVidia NV40, NV41, NV45 & Co. Information
There are lots of rumours around nVidia's next-generation graphics chip, the NV40, but absolutely nothing has been confirmed officially yet. Unfortunately, some speculations have also gained rumour status, although nothing has been said on those matters, not even unofficially. Today we want to sum up our knowledge about the NV40, along with some brand-new information which has not been published anywhere else yet.
To put the NV40 project in context: the zero in the codename indicates an entirely new architecture in terms of features, as has always been the case with nVidia (NV10 = GeForce1 with DirectX7 features, NV20 = GeForce3 with DirectX8 features, NV30 = GeForceFX with DirectX9 Shader 2.0 features). The NV40 will still be a DirectX9 chip (not DirectX9.1), but will raise the shader specification to the 3.0 level.
According to nVidia's original plans, a new architecture was to be released every year, with a matching refresh chip every half year. But nVidia could only stick to this schedule with the Riva TNT 1/2 and GeForce 1/2; already the GeForce3, and even more so the GeForceFX, were delayed heavily, which of course also affected the succeeding projects. Due to the delay of the GeForceFX, the NV40 chip was finally planned for fall 2003.
This plan did not work out either, so nVidia released the NV38 (GeForceFX 5950), a second, initially unplanned refresh of the original NV30 chip (GeForceFX 5800 /Ultra). When the NV38 was launched, it was still unclear when the NV40 was due, but we now know that nVidia will present the NV40 either at CeBIT (18th - 24th of March) or at the Game Developers Conference (22nd - 26th of March), while the first purchasable NV40 graphics cards can be expected around the end of April or the beginning of May.
Now to the technological changes of the NV40 compared to the NV38: it is highly probable that the pixel shader architecture has been improved heavily, and that the shaders work much more efficiently than on GeForceFX cards. There are two ways to achieve this: enlarging the temp register files (to minimise idle cycles in the pipeline), and splitting the Vector4 calculation units into Vector3 + scalar units (the Register Combiners and the R300 shaders gain a big plus in performance by doing exactly that). Of course these alterations cost transistors, but the gained performance would more than justify that. Therefore we are pretty sure that the NV40 will receive changes along these lines.
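Why such a split pays off can be illustrated with a toy scheduling model (purely our own illustration, not nVidia's actual design; the op widths and the greedy pairing rule are assumptions):

```python
def cycles_monolithic(op_widths):
    """A single Vector4 ALU: every op takes one cycle, no matter
    how many of the four components it actually uses."""
    return len(op_widths)

def cycles_split(op_widths):
    """Split vec3 + scalar units: a width-2/3 op and a width-1 op can
    retire in the same cycle; a full width-4 op occupies both units.
    Assumes all ops are independent (real schedulers must also honour
    data dependencies)."""
    vec_ops    = sum(1 for w in op_widths if 2 <= w <= 3)
    scalar_ops = sum(1 for w in op_widths if w == 1)
    full_ops   = sum(1 for w in op_widths if w == 4)
    paired   = min(vec_ops, scalar_ops)            # co-issued pairs
    leftover = (vec_ops - paired) + (scalar_ops - paired)
    return full_ops + paired + leftover

# A typical mix: two vec3 ops, two scalar ops, one full vec4 op.
ops = [3, 1, 3, 1, 4]
print(cycles_monolithic(ops))  # 5 cycles on a monolithic Vector4 ALU
print(cycles_split(ops))       # 3 cycles with vec3+scalar co-issue
```

Shader workloads are full of this pattern (e.g. a vec3 colour operation alongside a scalar fog or alpha term), which is why the R300 profits so much from it.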
It is likely that the NV40 has at least double the number of calculation units for the pixel shader compared to the NV38. Architectural improvements and this widening together would mean a massive increase in pixel shader performance, at least for shader versions 2.0 and 2.X. Shader version 3.0 will be supported, but we expect that the use of jump commands in 3.0 will slow processing down quite strongly. The reason: up to now, four pixels were rendered together (either simultaneously or in a row), but these dependencies cannot be exploited in pixel shader 3.0, because the commands can be different for each pixel.
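The penalty can be sketched with a minimal cost model (a hypothetical illustration, assuming a 2x2 pixel quad running in lockstep, where on divergence both branch sides must be executed with the inactive pixels masked out):

```python
def quad_branch_cost(taken_flags, then_cost, else_cost):
    """Cycle cost of an if/else for a 2x2 quad executed in lockstep.
    taken_flags: four booleans, one per pixel, True if that pixel
    takes the 'then' side. If all four pixels agree, only one side
    runs; if they diverge, both sides must be executed in sequence."""
    if all(taken_flags):
        return then_cost
    if not any(taken_flags):
        return else_cost
    return then_cost + else_cost   # divergence: pay for both paths

# All four pixels agree -> the jump actually saves work:
print(quad_branch_cost([True] * 4, 10, 6))          # 10
# One pixel disagrees -> the whole quad pays for both sides:
print(quad_branch_cost([True, False, True, True], 10, 6))  # 16
```

So a 3.0 shader full of per-pixel jumps can easily end up slower than a flat 2.X shader that simply computes everything.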
The spec term "pipelines x TMUs" is probably outdated with the NV40, but we think that with multitexturing the results are similar to an 8x2 architecture. It is also likely that 16 Z/stencil tests can be done per cycle, which of course would help in games that use a separate Z pass (Doom III). We ask you, though, to remain highly cautious about the "8x2 architecture" part.
The NV40 is assumed to have 175 million transistors, which seems believable for a new architecture, bearing current growth rates in mind. In our opinion this offers enough space for additional pixel processors and texture mapping units. But this would pretty much use up all the resources, so that beyond the performance increase and the support of pixel and vertex shader 3.0, not many new features should be expected. There are also some points which lead one to believe that the NV40 was altered during development (instead of simply building the original design and testing it). Whether all that is enough to beat the R420 and R423, the next-gen chips from ATi, still has to be proven.
To increase overall performance it is not enough just to multiply arithmetic power. Fillrate remains as important as ever, which is also why we speculate about an 8x2 architecture in multitexturing (with up to 16 textures per cycle). But to make use of such an enormous fillrate, a gigantic bandwidth is required: we think nVidia should have no problems with 600 MHz or even faster DDR2 memory. This would also follow nVidia's tradition of always using the fastest memory available. It is even possible that the Ultra version will have memory with up to 800 MHz physical clock speed, but nothing is certain in this case.
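A quick back-of-the-envelope calculation shows what such a configuration would deliver (the pipe, TMU, and clock figures are our speculation from above, not confirmed specs):

```python
def fillrate_mtexels(pipes, tmus_per_pipe, core_mhz):
    """Peak texel fillrate in MTexels/s: one bilinear texel per TMU
    per clock."""
    return pipes * tmus_per_pipe * core_mhz

def bandwidth_gbs(bus_bits, mem_mhz_physical, data_rate=2):
    """Memory bandwidth in GB/s for a DDR-type memory: the bus width
    in bytes, times two transfers per physical clock."""
    bytes_per_clock = bus_bits // 8 * data_rate
    return mem_mhz_physical * 1e6 * bytes_per_clock / 1e9

# Speculated 8x2 configuration at an assumed 500 MHz core clock:
print(fillrate_mtexels(8, 2, 500))        # 8000 MTexels/s
# 256-bit bus with 600 MHz physical DDR memory:
print(bandwidth_gbs(256, 600))            # 38.4 GB/s
# ...and with the rumoured 800 MHz Ultra memory:
print(bandwidth_gbs(256, 800))            # 51.2 GB/s
```

For comparison, the GeForceFX 5950 Ultra reaches roughly 30 GB/s, so even the 600 MHz figure would be a clear step up.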
The TMUs are likely able to generate one bilinear sample per cycle each, as has been the case before. Trilinear and anisotropic filtering should also be implemented the way they are on the GeForceFX (while S3's DeltaChrome can filter trilinearly from a single MIP map, which is probably the way to do it in the long term). Whether nVidia sticks to 8x AF, or whether the NV40 implements 16x AF or even higher, remains unclear.
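To make the cost argument concrete: a bilinear sample is a weighted average of a 2x2 texel block, and a trilinear sample blends two such bilinear samples from adjacent MIP levels, which is why it takes two cycles on a one-bilinear-sample-per-cycle TMU. A minimal sketch (single-channel texels for simplicity):

```python
def bilinear(texels, u, v):
    """One bilinear sample: weighted average of a 2x2 texel block.
    texels[y][x] holds the four neighbouring texels; (u, v) are the
    fractional coordinates inside that block, each in [0, 1]."""
    top = texels[0][0] * (1 - u) + texels[0][1] * u
    bot = texels[1][0] * (1 - u) + texels[1][1] * u
    return top * (1 - v) + bot * v

def trilinear(mip_a, mip_b, u, v, lod_frac):
    """Trilinear: blend bilinear samples from two adjacent MIP levels.
    On hardware producing one bilinear sample per TMU per cycle this
    costs two cycles -- the saving that 'brilinear' shortcuts chase
    by only blending near the MIP transition."""
    return (bilinear(mip_a, u, v) * (1 - lod_frac)
            + bilinear(mip_b, u, v) * lod_frac)
```

Anisotropic filtering multiplies this further: 8x AF can require up to eight such (bi- or trilinear) samples per pixel, hence the strong pressure to "optimise".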
In our view, the GeForceFX's 8x AF in quality mode is still up to date. Of course a non-"optimised", GeForce4-like 16x AF would be interesting, to allow a direct comparison with the Radeon. Whether the new chip will permit full trilinear filtering is also unknown, but we are worried that nVidia will continue to force "brilinear" filtering, although the hardware would be capable of full trilinear filtering.
A side note concerning AF: quality "by the book" is extremely demanding, which is why nVidia (and S3 with the DeltaChrome) accept optimisations at certain angles at 4x AF or higher. Compared to the strong angle dependency of ATi's AF this can nearly be disregarded, though; we accept a certain weakness at 45° angles in the higher modes. The "optimisations" can most probably be activated in the NV40 again, although this appears pointless to us, since more quality is burnt than performance is gained. Well, we are used to nVidia's chips developing evolutionarily.
Another point about which there is no information so far is anti-aliasing. Facing the good 6x sparse-grid mask which ATi has been offering since the R300 (Radeon 9500/9700), an improvement in this area is crucial for the GeForce series. We know that nVidia is introducing at least one new anti-aliasing mode with the NV40, but unfortunately we do not know how to classify it: similar quality at better performance, or better edge smoothing.
It is probable that the NV40 will still be marketed as a GeForce. Though parts of the nVidia team wanted a new name, the marketing department objected. Hence the GeForceFX series continues, according to our information still with a four-digit number suffix beginning with a "6".
A summary of what we know (or think we know) about the NV40 at the moment:
- nVidia NV40
- 175 million transistors, manufactured in 130nm by IBM
- 8x2 architecture, but 16 Z/stencil tests per cycle
- DirectX 9.0 architecture, supports shaders 3.0
- Pixel shaders doubled in number compared to NV38, and more efficient
- 256 bit memory interface, supports DDR1, GDDR2, GDDR3
- Host interface: AGP x8
- Exact clockspeeds: unknown; estimated 500-600 MHz core and 600-800 MHz memory
- Improvements for anti-aliasing: (at least) one new mode, though its subpixel mask is still unknown
- Improvements for anisotropic filtering: unknown
- Presentation: CeBIT or GDC, end of March
- Market entry: end of April or beginning of May 2004
- Official name: GeForce FX 6XXX
We dare make one prognosis about the NV40: at least in terms of performance, nVidia will excel once more. Whether that is enough to beat the strong competition, we will have to wait until GDC or CeBIT to find out, because with the R420 and R423, ATi is also expected to present new high-end graphics chips.