Inside nForce
By Van Smith
Date: July 16, 2001
About a month and a half ago, InQuest Market Research published an article I wrote on the NVIDIA (nasdaq: nvda) nForce core logic controller. The nForce is a highly integrated chipset targeted for the AMD Athlon line of processors. Expected to ship this fall, the nForce will debut roughly at the same time as AMD's "Palomino" enhanced desktop Athlon.
The role of the CPU in a computer is often popularly compared with the brain's job in a human. Extending this analogy, core logic controllers, sometimes referred to as chipsets or glue logic, have been traditionally the heart of computers, pumping data to the CPU and various subsystems. For years, however, chipsets have been evolving beyond this role to incorporate other key components such as advanced I/O controllers, sound and even graphics.
NVIDIA's marketing positions the nForce as no mere chipset, and indeed the nForce goes well beyond any previous core logic offering. In addition to what we have come to expect from a chipset, the nForce brings powerful integrated 3D graphics, industry-leading sound, StreamThrough architecture and, in the nForce 420, a standard-setting dual channel DDR SDRAM memory controller.
===================================
Graphics Performance
As is evident from my InQuest preview, I am very excited about the nForce line of chipsets. For the first time ever, an integrated 3D controller should be able to deliver 3D gaming performance at high resolution that is satisfactory for many users. The nForce 220 delivers 3D gaming on par with a GeForce2 MX200, while the nForce 420 is about 30% faster. For the hardcore gamer, graphics can be upgraded via an AGP 4X card to exploit the latest technologies.
For business applications the integrated graphics of the nForce should be roughly on par with any other solution, although an external AGP card will likely improve performance slightly by reducing video controller bandwidth contention for system memory.
===================================
Bandwidth
The nForce is able to produce simply prodigious bandwidth numbers. The term "bandwidth" refers to the rate of flow of data, in this case from main memory to the CPU. Thanks in equal measure to the Dynamic Adaptive Speculative Pre-processor (DASP) and to its 4.2GB/s dual channel DDR SDRAM memory interface, the nForce 420 can reach over 300% (3x) of the performance of a comparable AMD760 system on certain commonly used bandwidth tests.
For most typical operations, CPU bandwidth is limited by factors such as level 2 (L2) cache line size (often simply called "line size"), latency (or lateness) to access main memory, and the memory to chipset bandwidth. The Athlon has a 64-byte line size which means for typical operations the Athlon grabs chunks of data 64 bytes at a time. To fill a line, the CPU makes a request for data but has to wait until the data starts flowing from main memory. This initial latency greatly reduces effective bandwidth.
Once the requested data starts flowing, DDR SDRAM is able to supply data roughly at whatever its rated bandwidth is. For PC2100, this rate is 2.1GB/s, which exactly matches the bandwidth of the 64-bit memory bus of the Athlon it is mated with. The nForce 420 supports two channels of PC2100, bringing memory-to-chipset bandwidth up to an unprecedented 4.2GB/s. However, the initial latencies mentioned above are so great that practical throughput would be diminished close to parity with single channel DDR systems if NVIDIA had not addressed other variables.
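The 2.1GB/s and 4.2GB/s figures follow from simple bus arithmetic. The short Python sketch below reproduces them; the 133.33MHz clock and the two transfers per clock are the standard PC2100 (DDR266) parameters rather than numbers quoted in this article, and the function name is only illustrative.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock

# PC2100: 64-bit wide, 133.33 MHz clock, data moved on both clock edges (DDR).
pc2100 = peak_bandwidth_mb_s(64, 133.33, 2)

# The nForce 420 runs two such channels side by side.
dual_channel = 2 * pc2100

print(f"PC2100, single channel: {pc2100:.0f} MB/s")
print(f"nForce 420, dual channel: {dual_channel:.0f} MB/s")
```

These are peak rates only. They say nothing about the initial latency that each cache line fill must pay.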
Bandwidth can be improved by increasing the line size, but line size is fixed by CPU architecture. And increasing line size has its drawbacks. The Pentium 4 has a 128-byte line size, which provides a direct benefit for bandwidth since the initial latency hit is diluted. However, increasing line size can reduce bus utilization efficiency, and this appears to be a problem at times for the P4.
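A rough model makes the interplay of latency, line size and peak bandwidth concrete. The sketch below assumes that every line fill pays the full initial latency and that fills never overlap, which is a deliberate simplification. The 100ns latency is a hypothetical round number chosen for illustration, not a measurement of any chipset.

```python
def effective_bandwidth_mb_s(line_bytes, latency_ns, peak_mb_s):
    """Sustained rate when each cache line fill waits out the initial latency.

    Simplified model: one fill in flight at a time, no overlap between fills.
    """
    # 1 MB/s is 0.001 bytes per nanosecond, so time = bytes * 1000 / (MB/s).
    transfer_ns = line_bytes * 1000.0 / peak_mb_s
    # bytes per nanosecond, scaled back to MB/s
    return line_bytes / (latency_ns + transfer_ns) * 1000.0

LATENCY_NS = 100.0  # hypothetical initial access latency, for illustration only

single = effective_bandwidth_mb_s(64, LATENCY_NS, 2133.0)   # one PC2100 channel
dual = effective_bandwidth_mb_s(64, LATENCY_NS, 4266.0)     # two PC2100 channels
wide = effective_bandwidth_mb_s(128, LATENCY_NS, 2133.0)    # 128-byte line

print(f"64-byte line, single channel: {single:.0f} MB/s")
print(f"64-byte line, dual channel:   {dual:.0f} MB/s")
print(f"128-byte line, single channel: {wide:.0f} MB/s")
```

Under these assumptions, doubling the peak rate helps far less than doubling the line size, because the fixed latency dominates each fill. That is why dual channel DDR alone would land close to parity with single channel systems.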
This leaves trimming initial memory access latencies as the only hope of improving CPU bandwidth. Unfortunately, there are simple rules of physics that limit how low these latencies can be, and that is why chipsets from AMD, VIA and SiS provide bandwidth generally within 10% of each other.
However, the nForce does significantly reduce main memory access latencies for streaming operations. NVIDIA manages this by incorporating the DASP, a forwardly speculative engine. The DASP tries to anticipate simple, forward memory access patterns and preemptively pull this data into its own integrated cache. NVIDIA is very tight-lipped about design details of this mechanism since it is key to differentiating their products. However, speculation about the DASP's cache size has been discussed elsewhere, with 64kB as the most common size mentioned. Tests run with a diagnostic program I wrote, COSBI's BandwidthBurn, seem to indicate that either the cache is only 32kB, or perhaps the cache is partitioned so that half is devoted to the CPU and half to the graphics unit.
Regardless, the results of combining the DASP with dual channel DDR SDRAM are spectacular. As we mentioned above, on some commonly used bandwidth tests, InQuest observed greater than 3x performance advantage for the nForce 420 compared with an identically configured AMD760 DDR SDRAM system. Although the results varied down to only a few percent depending on the test, by and large the nForce's bandwidth is jaw dropping.
Please note that to extract the maximum bandwidth benefit from the DASP, the dual channel DDR SDRAM nForce 420 will be necessary, along with an AGP graphics card. The nForce 420 has sufficient bandwidth to stay comfortably ahead of CPU memory streaming requests. Use of an AGP graphics card eliminates possible bandwidth contention from the integrated controller. The DASP also does not provide significant benefits for either random or reverse stride (backwards) memory access patterns.
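Since NVIDIA has not disclosed how the DASP works, the following Python sketch is only a toy next-line prefetcher, not NVIDIA's design. It illustrates why forward streaming benefits from speculation while reverse and random access patterns do not. The 32kB buffer (512 lines of 64 bytes), the prefetch depth of four lines and the oldest-first eviction are all assumptions made for the example.

```python
import random

LINE = 64  # bytes per line, matching the Athlon line size discussed above

def prefetch_hit_rate(addresses, buffer_lines=512, depth=4):
    """Fraction of accesses found in a toy next-line prefetch buffer.

    After every access the model speculatively fetches the next `depth`
    lines. The buffer holds `buffer_lines` lines and evicts oldest first.
    """
    buffer = {}  # dicts preserve insertion order, giving FIFO eviction
    hits = 0
    for addr in addresses:
        line = addr // LINE
        if line in buffer:
            hits += 1
        for ahead in range(1, depth + 1):
            nxt = line + ahead
            if nxt not in buffer:
                if len(buffer) >= buffer_lines:
                    buffer.pop(next(iter(buffer)))  # drop the oldest line
                buffer[nxt] = True
    return hits / len(addresses)

n = 10_000
forward = [i * LINE for i in range(n)]   # streaming, ascending addresses
reverse = forward[::-1]                  # same lines, descending order
rng = random.Random(0)
scattered = [rng.randrange(0, 1 << 30) for _ in range(n)]  # random over 1 GB

for name, pattern in [("forward", forward), ("reverse", reverse), ("random", scattered)]:
    print(f"{name:8s} hit rate: {prefetch_hit_rate(pattern):.4f}")
```

In this model, nearly every forward access is already waiting in the buffer, while the reverse stream never finds its data there because the speculation always runs in the wrong direction.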
Finally, it is important to point out that bandwidth is rarely a bottleneck in typical applications. However, there are a few specific tasks that demand high memory throughput like video and audio encoding, intense context switching, a handful of games like Quake III and matrix manipulation of large datasets. The nForce's great bandwidth should be evident in these tasks. For all other chores, latency is more important.
===================================
Latency
I have stressed in my chipset reviews that the two most important fundamental performance characteristics of a typical chipset are memory bandwidth and memory latency. From these two performance aspects, accurate predictions can be made about application level performance. Clearly, the nForce 420 has outstanding bandwidth. What about latency?
According to the application level benchmark results reported in the InQuest article, there is "very little difference" between the nForce and the AMD760 on tests that are not bandwidth sensitive. This suggests that the latency characteristics of the nForce are probably a little worse than those of the AMD760 -- significant enough to negate any advantages that the DASP might bring.
InQuest was working with early preproduction silicon, and memory access latencies are one of the areas most susceptible to change as a chipset nears release -- usually latencies are tightened, both in new silicon revisions and in BIOS updates. Therefore it is highly likely that the nForce will sport substantially better latency characteristics in shipping units. In fact, it is known that NVIDIA has already released a newer chipset revision than what InQuest tested, and reportedly latencies have been improved.
Interestingly, it is not beyond the realm of possibility that the nForce 220 might even beat the 420 in some application level tests. The reason for this is that an aggressively tuned speculation engine like DASP tends to increase access latencies for non-streaming access patterns. Since the 220 will almost certainly be tuned less aggressively, memory latencies could turn out to be lower than with the 420.
===================================
Xbox Lineage
The Microsoft Xbox promises to be the most powerful and exciting gaming console yet. With the help of Microsoft's deep pockets, the Xbox looks to be an almost guaranteed commercial success. As early as January of this year, the waiting list for the Xbox was already around thirty deep at a Software, Etc. here in Fayetteville, Arkansas. The waiting list required a deposit, so these people are evidently serious about their intentions to purchase the gaming system. The Xbox is not scheduled to become available until November 8, 2001 and will cost $299 -- extraordinarily inexpensive given the power and feature set of the system which includes a hard drive, DVD ROM drive, 64MB of 6.4GB/s DDR SDRAM, and broadband (Ethernet) connectivity.
But what really sets the Xbox apart is its custom NVIDIA core logic implementation. It is composed of two chips, labeled the "IGP" and "MCP," linked by an impressive 800 MB/s AMD HyperTransport interconnect. The IGP, or Integrated Graphics Processor, serves as the north bridge while the MCP, or Media and Communications Processor, is the south bridge.
The nForce is essentially this Xbox chipset ported to the PC. With the main difference of a slightly weaker graphics core in the nForce, the two products are virtually identical.
One potential benefit of a chipset having integrated graphics is that the AGP bus, notoriously problematic for motherboard vendors, can be internally bypassed, allowing transfers to occur well above AGP's current maximum transfer rate. The nForce effectively allows AGP 6x speeds to the internal graphics core.
One of the more exciting features of the MCP is the APU (Audio Processor Unit). The APU is a 4 billion operation per second (BOP) multi-DSP device that can simultaneously provide 192 hardware accelerated 2D voices and 64 hardware accelerated 3D voices. Real-time hardware accelerated effects include occlusions, reflections, reverberations and Doppler shifts.
Thanks to the high bandwidth AMD HyperTransport interconnect, the APU can render these effects directly to system memory.
The NVIDIA APU is also the first mainstream audio solution that can encode a Dolby Digital AC-3 audio stream in real-time. At a time when only a few competitors are just now providing AC-3 decoding, the APU has the muscle to perform such tasks as encoding the audio of a DirectSound compliant game into a 5.1 AC-3 stream. The game audio can then be played through a home theatre system, where the full benefit of the 3D audio processing can be enjoyed.
Not only is the NVIDIA APU clearly the most powerful integrated audio solution ever, but the APU has a feature set that exceeds that of Creative Labs' best offerings. Moreover, the APU is not saddled with having to reside on the PCI bus, opening up much more bandwidth, facilitating direct memory rendering and reducing the chances of resource contention.
NVIDIA's APU is a serious threat to all mainstream audio card makers, because few consumers will see any need to upgrade beyond it.
One of the more important design details of the nForce is the AMD HyperTransport interconnect linking the IGP and MCP. With traditional chipsets, this interconnect is the PCI bus, whose limited 133 MB/s of bandwidth must be shared with all I/O controllers and with the PCI expansion slots. With Fast Ethernet here and Gb Ethernet finally becoming affordable, ATA100 drives, IEEE1394b promising Gb/s transfer rates, and other I/O standards soaking up bandwidth, the PCI bus is overloaded as a chipset interconnect. While Intel and VIA have both introduced proprietary interconnects that double this bandwidth to 266 MB/s, these are inadequate steps.
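A back-of-the-envelope tally shows how quickly a shared 133 MB/s link fills up. The Python sketch below adds the peak rates of three of the I/O devices named above and compares the total against each interconnect. Treating all devices as busy at their peak at the same moment is a worst-case assumption, and the 32-bit, 33.33MHz PCI parameters are the standard figures rather than numbers taken from this article.

```python
# Standard PCI: 32-bit bus at 33.33 MHz, roughly 133 MB/s shared by everything.
PCI_MB_S = 32 / 8 * 33.33

# Peak rates in MB/s for devices named in the text (worst case: all at once).
devices = {
    "Fast Ethernet (100 Mb/s)": 100 / 8,
    "Gb Ethernet (1000 Mb/s)": 1000 / 8,
    "ATA100 drive": 100.0,
}
demand = sum(devices.values())

links = [
    ("PCI", PCI_MB_S),
    ("Intel/VIA proprietary link", 266.0),
    ("HyperTransport (nForce)", 800.0),
]

print(f"Combined peak demand: {demand:.1f} MB/s")
for name, capacity in links:
    print(f"{name}: {demand / capacity:.0%} of capacity")
```

Even before IEEE1394b or any PCI expansion cards are counted, this worst case exceeds what PCI can carry and leaves little headroom on a 266 MB/s link.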
With HyperTransport, NVIDIA chose a solution with longevity and flexibility, and the rewards are immediate. The nForce supports isochronous (real-time) streaming through the integrated Fast Ethernet controller. Even simple operations such as a disk to disk copy can be accelerated as well. Only SiS, with its 1.2GB/s "Multithreaded I/O Link" in its single chip core logic solutions, provides a higher bandwidth interconnect.
===================================
TCO Tango
One of the most onerous chores for consumers and IT departments alike is the need to provide maintenance for computer systems, primarily through driver upgrades. For IT departments, this maintenance is measured by "TCO" or "Total Cost of Ownership." For companies, the costs to maintain an ocean of PCs can greatly exceed the initial purchase price of these systems. For consumers, upgrading to a new driver is often akin to a game of Russian Roulette -- it is all too common for a new driver to hammer another part of the system or bring down the whole system altogether.
Highly integrated solutions such as the nForce -- actually there are no other integrated solutions as advanced and comprehensive as the nForce -- offer huge benefits in reducing maintenance headaches. Since driver packages can be distributed that address most of the system's functionality, the vast majority of device conflicts are eliminated. And instead of having to upgrade half a dozen or more drivers, with nForce one driver package upgrades every integrated component.
Businesses should pay close attention to the nForce. Not only will it be the most powerful integrated chipset ever introduced, but it should be able to reduce the all important TCO.
===================================
Market Outlook
Ace's Hardware has reported that nForce motherboards may be available as soon as August. NVIDIA has only stated that product availability will be this fall. The Santa Clara, California based graphics chip company has disclosed, however, that the launch date is contingent on both AMD Palomino and Windows XP availability. In fact, NVIDIA has even indicated that initial nForce OS support would be limited to Windows XP and ME. This casts some doubt on the August timeframe, and may mean that nForce motherboards will not reach market until September or even as late as October 25th, the Windows XP release date. [edit: The XP information is outdated. For clarification, please see related news item.]
Assessing market potential, in terms of technical merit there is no current product that can compete with the nForce in the segments NVIDIA is targeting. However, many other factors must be considered.
Examining typical business application performance, several non-integrated DDR SDRAM Athlon chipsets perform almost identically with the integrated nForce. The near-to-market SiS 735 appears to have even a slight edge in business application performance. However, with games, video and audio encoding, and in terms of innate home entertainment potential, the nForce is currently the most compelling integrated solution and will likely remain so well beyond its fall launch.
The nForce's use of DDR SDRAM gives it a distinct advantage among integrated chipsets. Typical SDRAM configurations cannot supply data fast enough to integrated graphics cores to provide acceptable 3D performance at mid and high resolutions. The nForce 420 has an even greater advantage thanks to its dual channel DDR SDRAM design, giving it about four times the memory streaming capacity of SDRAM based integrated chipsets. Adding to this advantage, the nForce also boasts by far the most powerful integrated graphics core of any integrated chipset.
VIA Technologies has made huge strides in both market share and product reliability (despite recent widely reported south bridge problems) over the last two years. Major OEMs tend to be very conservative about adopting new technology. Despite NVIDIA's amazing track record delivering industry dominating graphics controllers, it is still a novice chipset maker that has yet to deliver a chipset product to market. In this regard it is at a disadvantage relative to VIA, ALi, SiS and even AMD, which doesn't consider itself in the chipset business.
If NVIDIA can bring the nForce to market on time this fall without bugs and at the proper price points, expect the company to see rapid success in the DIY (Do It Yourself - enthusiasts that assemble their own systems) market with the adoption rate from smaller OEMs closely tracking DIY acceptance. Examining all other integrated chipsets, currently there are none that approach the nForce in terms of bandwidth, 3D graphics performance, audio capabilities and comprehensive feature set. As long as NVIDIA can execute, these qualities alone will almost guarantee success in the performance centric DIY market.
VIA et al. are no pushovers, but they have a new competitor -- and potentially a serious one. VIA may have focused more resources on its upcoming Pentium 4 DDR SDRAM chipset, which, according to insiders, delivers performance competitive with Intel's i850 RDRAM chipset.
Beleaguered Intel, on the other hand, has yet another attacking army to worry about. The nForce brings real value to the already attractive and successful AMD Athlon/Duron platforms. There is no integrated chipset available for the Intel Pentium 4 and certainly none as advanced as the nForce exists for it even on the horizon. Lack of a viable integrated chipset will place the Pentium 4 at a strategic disadvantage in competition with Athlon-nForce systems for low cost segments. Perhaps most acutely and critically, the P4 will find itself with comparatively poor traction in the business arena where the TCO benefits of integrated solutions are becoming more widely recognized. The nForce may be the springboard AMD has needed to penetrate the lucrative business segments which have thus far largely resisted the company's efforts.
Duron-nForce 220 systems might also soon vault AMD into budget segment dominance.
The caveat must be made that the nForce has not reached market yet and is still months away from doing so. Chipset development is a demanding task so wholesale endorsement must be made cautiously.
One last note: ATi is quietly working on an advanced integrated chipset. Through its ArtX subsidiary, ATi already has some experience with a dual channel SDRAM chipset, the ALi Aladdin 7 designed for the AMD K6-III+. Ironically, ATi now appears to have wedded itself to Intel. Perhaps ATi will be the source of an integrated P4 chipset that can begin to compare with nForce.
===================================
Conclusion
The NVIDIA nForce is an impressive product. With its DASP intelligent caching system and the 420's dual channel DDR SDRAM interface, nForce sets new bandwidth records. The integrated GeForce2 graphics core is the most powerful of any integrated solution by a factor of two or three at high resolutions.
The APU is in a league of its own among integrated audio processors. It is also arguably the most potent mainstream solution even when compared with audio cards costing as much as an entire nForce motherboard. The nForce APU is a dark cloud on the horizon for consumer level audio card vendors.
The AMD HyperTransport interconnect link enables the IGP and MCP to communicate with sufficient bandwidth for even the most demanding applications. Compared with the latest technologies from Intel and VIA, both reaching only 266 MB/s, the 800 MB/s of HyperTransport is a significant leap forward.
If the nForce has a weakness, it appears to be in business applications, but even in its worst suit the nForce manages a toss-up with its strongest competitors. Thanks to its exceptional feature set, and placed opposite a vacuum of P4 integrated solutions, the nForce could be the springboard that catapults AMD into corporate environments. The nForce 220 should also help AMD's Duron secure its place against Intel's budget offerings.
But the nForce is far from a guaranteed success. NVIDIA is an unknown quantity in the chipset business. If NVIDIA executes as flawlessly with the nForce as its respected and feared reputation in the graphics core business hints at, then expect rapid adoption among the DIY market with second tier OEMs closely tracking. Major OEMs tend to be very conservative in adopting new technology, so it may take several months before nForce gains traction among top-tier OEMs.
In terms of technical merit, the Athlon chipset market this fall looks to be dominated by nForce with perhaps the promising SiS 735 finding a foothold as well. In terms of market share, VIA will be hard to shake, but if NVIDIA and SiS both are able to produce chipsets in quantity and free of major bugs, these two upstarts could make rapid inroads.
===================================