AMD Announces Opteron
By Van Smith
Date: April 24, 2002
In a 3:30 PM Central webcast, Advanced Micro Devices christened its highly anticipated 64-bit enterprise-level server chip. Codenamed “Sledgehammer,” the new CPU will be marketed as the AMD “Opteron.” The AMD Opteron will also be available in high-end workstations.
The Opteron name, derived from the Latin word optimus for “best,” helps AMD position the upcoming enterprise CPU against Intel’s Itanium and Xeon server chips. The chipmaker states that the seamless dual 64-bit/32-bit operating modes of Opteron open up scaling possibilities from low-end Xeon-class servers to enterprise level Itanium-class applications.
Sophisticated Architecture
The original IBM PC used an Intel 8088 microprocessor. This chip spoke a 16-bit version of x86, a CPU language that has evolved over the years into a 32-bit adaptation used in modern Windows-based PCs. Today x86 is by far the most widely spoken CPU language and the amount of x86 programming expertise dwarfs what is available for competing instruction sets. Of course with the Microsoft Windows code base, the number of applications that run on x86 is staggering.
Years ago x86 CPUs were among the slowest families of microprocessors, vastly trailing their RISC competitors. Today, the RISC versus CISC arguments are moot, since x86 chips have absorbed many of the strongest design tenets developed elsewhere. In fact, x86 chips like the Athlon XP and the Intel Pentium 4 are among the fastest microprocessors available in any architecture.
However, there are areas where the x86 world has remained weak.
Until Hammer arrives, x86 chips will continue to suffer from antiquated bus architecture. Intel’s PIII and P4 use a simple shared bus where each CPU sips data from the same pipe. Although a shared bus is relatively inexpensive to deploy, bus contention becomes a severe problem as more CPUs are added. Consequently, Intel’s Xeon-class server solutions scale poorly when more than two CPUs are deployed in an SMP system.
Although the AMD Athlon has a much more efficient point-to-point bus borrowed from its DEC Alpha pedigree, Hammer-class CPUs are a quantum leap forward.
Deployed in 1-to-8-way servers (or 1-to-2-way workstations; a “way” is the number of CPUs in a system), each Opteron has its own dedicated pool of memory and communicates with the other CPUs in the system over dedicated high-speed HyperTransport links.
Unlike Intel Xeons, where each additional CPU gets a decreasing fraction of the fixed system bandwidth (bandwidth is the rate of flow of data), additional Opterons increase system bandwidth in an additive fashion. Therefore a 2-way Opteron system enjoys twice the system bandwidth of a 1-way Opteron system; an 8-way Opteron system has an eightfold bandwidth advantage over a 1-way Opteron system. For an Intel Xeon server, system memory bandwidth never increases beyond that of a 1-way Pentium 4 desktop.
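The scaling argument above can be put into a few lines of Python. This is only a rough sketch under our own assumptions: bandwidth is normalized so that one shared bus, or one Opteron memory controller, supplies 1.0 unit, and the function names are ours rather than anything published by AMD or Intel.

```python
from typing import Tuple


def shared_bus_bandwidth(cpus: int, bus_bandwidth: float = 1.0) -> Tuple[float, float]:
    """Return (system total, share per CPU) when every CPU sits on one shared bus."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    # The total is fixed by the bus; each added CPU only shrinks the share.
    return bus_bandwidth, bus_bandwidth / cpus


def per_cpu_memory_bandwidth(cpus: int, controller_bandwidth: float = 1.0) -> Tuple[float, float]:
    """Return (system total, share per CPU) when every CPU brings its own memory controller."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    # The total grows additively; each CPU keeps its own full controller.
    return controller_bandwidth * cpus, controller_bandwidth


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        shared_total, shared_each = shared_bus_bandwidth(n)
        numa_total, numa_each = per_cpu_memory_bandwidth(n)
        print(f"{n}-way: shared bus total={shared_total:.2f}, per CPU={shared_each:.3f} | "
              f"own controllers total={numa_total:.2f}, per CPU={numa_each:.2f}")
```

The model ignores HyperTransport traffic between CPUs and any real-world contention, so it shows the best-case trend only.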
Since each Opteron has an integrated DDR SDRAM memory controller with up to two channels, the latencies for operations on local memory are a small fraction of those seen in current x86 systems. Even when memory accesses occur to memory locations belonging to other CPUs, the HyperTransport protocols are so fast that average latencies are no worse than those experienced with today’s systems.
Because each chip has its own memory resources and does not have to fight for a single unified pool of memory, Opteron has what is known as a "NUMA" (Non-Uniform Memory Access) scheme. Opteron’s design removes some of the most critical barriers to performance and scalability that have plagued the x86 world for years. In fact, Opteron’s Hammer architecture is reminiscent of some supercomputer designs.
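To see why the mix of local and remote accesses matters in a NUMA design, consider the weighted average below. It is a minimal sketch, not an AMD figure: latencies are expressed relative to a current shared-bus x86 system (1.0), and the local value of 0.5 is a placeholder we picked purely for illustration, since AMD gave no numbers.

```python
def average_latency(local_fraction: float, local_latency: float, remote_latency: float) -> float:
    """Weighted average memory latency for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_latency + (1.0 - local_fraction) * remote_latency


# Hypothetical values, relative to a current shared-bus system (1.0).
LOCAL_LATENCY = 0.5   # placeholder for "a small fraction" of current latency
REMOTE_LATENCY = 1.0  # placeholder for "no worse than" current latency

if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        avg = average_latency(share, LOCAL_LATENCY, REMOTE_LATENCY)
        print(f"{share:.0%} local accesses -> relative latency {avg:.2f}")
```

Under these assumptions, code that keeps more of its data in local memory moves the average toward the lower local figure, while code that ignores placement is still no worse off than on a shared bus.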
The benefits of Opteron’s NUMA design will be most clearly seen in n-way systems (particularly enterprise-scale servers), but the core itself enjoys IPC (Instructions Per Clock cycle, a measure of computational efficiency) advantages over the Athlon XP. AMD states that users will typically see a 25% improvement in performance over the Athlon XP at any given clock speed thanks to reduced latencies from the integrated memory controller (20%) and other core improvements.
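AMD did not say how the 20% attributed to the memory controller combines with the other core improvements to reach 25%. The sketch below works the arithmetic both ways. It rests on our assumptions: that performance is IPC times clock speed, and that the two gains either multiply or simply add. The function names are ours.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """Performance relative to a baseline, modeled as IPC ratio times clock ratio."""
    return ipc_ratio * clock_ratio


def remaining_gain(total_gain: float, known_gain: float, multiplicative: bool = True) -> float:
    """Gain left for 'other improvements' once a known contribution is removed."""
    if multiplicative:
        # (1 + known) * (1 + other) = 1 + total
        return (1.0 + total_gain) / (1.0 + known_gain) - 1.0
    # known + other = total
    return total_gain - known_gain


if __name__ == "__main__":
    total, memory_controller = 0.25, 0.20
    print(f"If gains multiply: other improvements = {remaining_gain(total, memory_controller):.1%}")
    print(f"If gains add:      other improvements = {remaining_gain(total, memory_controller, False):.1%}")
    print(f"Same clock, 25% higher IPC -> {relative_performance(1.25):.2f}x the baseline")
```

Either way, the other core improvements would account for roughly four to five percentage points, with the memory controller supplying the bulk of the claimed gain.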
The Opteron will be available with L2 caches up to 1MB in size. All Opterons will have reduced-latency Translation Lookaside Buffers (TLBs) and improved branch prediction. 64-bit applications will also see performance gains thanks to the addition of many more general-purpose registers. Availability of these extra registers in 64-bit mode will accelerate 32-bit applications when run under 64-bit operating systems.
All Hammers will have full SSE2 support, thus removing the one remaining technical advantage of Intel processors.
Hammers will utilize Silicon-on-Insulator (SOI) technology that will enable the processors to run more efficiently, therefore consuming less power at any given clock speed. For the end user, SOI means higher clock speeds and cooler chips.
Chipset Support
As we have reported here before, third-party chipset support for Hammer is unprecedented, with some Clawhammer chipsets reaching silicon in June, well ahead of the platform’s introduction.
With Opteron, the level of chipset sophistication must increase to handle enterprise-scale workloads. Today AMD also discussed its set of Opteron core logic, which, with the AMD-8131, includes support for multiple PCI-X buses. Used primarily in top level servers and workstations, PCI-X is a new and much more powerful version of the familiar PCI parallel expansion interface.
Clawhammer Takes Athlon Moniker
While the Opteron took center stage, AMD also divulged that Opteron’s little brother, Clawhammer, will assume the Athlon moniker. This can be seen in the new AMD roadmap released today.
Because of the Clawhammer’s die size, we believe this chip will feature 512kB of level 2 cache like the Barton Athlon.
Opteron Server Demonstrated at WinHEC
At the recent WinHEC 2002 in Seattle, AMD provided closed-door demos of a dual-processing Opteron system serving multiple video streams to a Clawhammer-Athlon system networked via Gb Ethernet. Below are pictures from this demo.
The dual-processing AMD Opteron system.
The Opteron system had 1GB of registered DDR SDRAM and a Gb NIC.
Microsoft has developed a 64-bit version of Windows for Opteron.
The Athlon-Clawhammer system pulled multiple video streams from the Opteron server.
Microsoft Guru Throws Weight Behind Hammers
As we reported yesterday, Microsoft is committed to supporting AMD’s x86-64, the open standard instruction set native to Hammer. Key decision makers inside the software company have been so enamored with x86-64 that the software giant persuaded Intel to adopt the Hammer language.
One of the most avid proponents of x86-64 is the legendary Microsoft programmer responsible for the NT kernel. David Cutler, formerly of DEC, has reportedly voiced extremely strong preferences inside Microsoft for x86-64 over Intel IA-64. Allegedly, Cutler and Microsoft do not want to expend the extra resources to produce ongoing support of IA-64, an instruction set Cutler and others inside the software giant reportedly disdain as inferior.
Update: Opteron Teleconference Follow-Up
In addition to the information we reported in today’s featured article, AMD answered numerous follow-up questions from the media. Beyond revealing the Opteron brand name, AMD also disclosed that it is collaborating with Microsoft on x86-64 OS development. Tomorrow AMD will publicly demo an x86-64 .Net Server.
AMD did not reveal specifically which operating systems were being developed by Microsoft. AMD representatives also did not comment on whether there will be an x86-64 version of Windows XP Home at Clawhammer’s debut later this year.
AMD reiterated that it had a cross-license agreement with Intel that would allow Intel's Yamhill to be x86-64 compatible.
Multiprocessing-enabled Hammers will be marketed as Opteron. Desktop versions of Hammer will be sold as Athlons. There will be a Clawhammer Athlon and a Clawhammer Opteron.
Duron will be phased out in favor of Athlon going into 2003.
AMD’s NUMA architecture is so efficient that first-order NUMA optimizations are not required. However, extra performance can be gained when accessing remote nodes by taking NUMA behavior into account when coding.