Bits & Bytes: The P4 is Dead
By Van Smith
Date: May 13, 2004
Let's hope that
those Intel engineers who gave the thumbs up to Prescott don't
decide to take up bridge building, aircraft construction, and nuclear
power plant design. If they do, then we'll all need to walk
around in radiation-proof-scuba-equipped-skydiving outfits.
As it stands now, these silicon-wielding simulacra have managed to
craft the biggest catastrophe in the history of computing.
They've completely wrecked Intel's bread-and-butter desktop
business. They've killed the P4.
===================================
The Pentium 4 is Dead
Late last week Intel announced that
Tejas, the next Pentium 4 in line after Prescott, has been canceled
because of thermal issues. We exclusively disclosed the existence
of Tejas two years ago. Instead of Tejas, the world's largest
chipmaker will concentrate its efforts on bringing dual-cored Pentium-M
technology to the desktop sometime in 2005.
Faithful readers of these pages will find these developments awfully
familiar. We've been agonizingly enunciating the flaws with Intel's Pentium 4
architecture ever since Willamette's
introduction in 2000, and the very issues we raised were those that led to the
high-profile processor's ultimate demise. Moreover, the course of action
that we have been recommending for Intel is the direction the chipmaker is
finally following. (For background information, good places to
start are
here,
here and
here.)
In a nutshell, HyperPipelining is crazy. While some at Intel have
sung panegyric love songs about sixty-stage pipelines sweeping us
tenderly to 10GHz in a few short years, these pundits must have plugged
their ears to the head-splitting racket that thermal density and signal
integrity were raising.
Deep pipelines require high clock speeds to remain performance
competitive with shallower pipeline designs that do more work per clock
cycle. But higher frequencies demand more power and faster
transistors. Yet fast transistors are leaky transistors, and they
get leakier with newer, smaller process technologies. So deep
pipelines demand relatively more and more power as CPUs migrate to
each new process technology.
And higher frequencies magnify voltage transients, so Vcc has to be
bumped up to ensure signal integrity. But, worse yet, power
requirements rise with the square of voltage, so…
…you end up with Prescott, a new CPU on a smaller process with a
ridiculously deep pipeline that makes Prescott slower than its
predecessor, Northwood, at the same clock speed (despite having twice
the L2 cache - quite a feat in itself). Yet Prescott demands more
power than Northwood and has to dissipate its greater heat output
through a smaller die thus making thermal density rise egregiously.
Consequently, Prescott's >50% deeper pipeline is all for naught
because thermal (and probably signal quality) issues gate
gigahertz. The very sad thing for Intel is that, barring
process/circuit design miracles, Prescott won't be able to reach much
higher frequencies than Northwood.
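For reference, the first-order relation behind this argument is the standard expression for CMOS dynamic power, with activity factor α, switched capacitance C, supply voltage V and clock frequency f:

    P_{dynamic} \approx \alpha C V^{2} f

So a 10% bump in Vcc alone raises dynamic power by roughly 21% (1.1² = 1.21) before any frequency increase is even considered, and leakage power piles on top of that with each process shrink.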
Speaking of the process and circuit design folks, it's no secret that
Intel has some of the very best in the world. Ever since the
Willamette, the heat has been on them to save Intel's bacon. You
can bet that few tears will be shed among them over the demise of the
P4. With each process shrink, their jobs became more and more
impossibly difficult.
Although Prescott was no doubt a marketing-driven monstrosity, at least
one engineer in a place of authority who should have known better made
the boneheaded decision to give it a green light. Crazy, just
downright crazy.
===================================
Moving Out to Move Up
The future of desktop computing is the production and refinement of
small, out-of-order, superscalar processing cores and then scaling
performance upwards by placing several of these cores on a die.
Putting multiple cores on a die is easy to implement, guarantees
well-understood performance gains, and poses no surprises in the
thermal density or signal integrity arenas.
Processors today are rapidly approaching a limit to practical
computational efficiency. In fact, the AMD Athlon 64 is very near
that asymptote where increasing design complexity is not worth the
meager performance gains.
It is worth noting that, while Intel's Pentium-M and VIA's processors
are following AMD to this asymptote, the Pentium 4 was headed in the
opposite direction: as the thermal ceiling slips downward with each
process shrink, thereby normalizing clock speeds among CPU design
families, the P4's overall performance was getting shoved into the dirt.
Besides the addition of on-die memory controllers, in the near future
the only practical areas still open for improving processing efficiency
are the addition of more/wider SIMD/vector units and hardware
support for specific functionality, like VIA's AES implementation in
its C5P line of chips. Eventually, graphics core functionality
will also be moved on-die where arrays of vector units can be
dynamically allocated for either general purpose processing or graphics
rendering.
Small, efficient, modular cores have the added benefit of allowing a
processor vendor to produce a highly scalable product line targeting
everything from handheld gaming devices to behemoth NSA supercomputers.
While it is true that few of today's major software applications will
see any benefit from multi-cored CPUs, you can bet your bottom dollar
that this is going to change rapidly over the next few years for the
simple reason that adding multiple threads to applications is the only
direction to go to get significantly greater performance from emerging
processor designs. And besides, multithreaded applications are
relatively easy to write with modern programming tools.
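To make the point concrete, here is a bare-bones sketch of the pattern in Delphi, the same language as the timing code later in this column. Everything in it, from the program name down to the worker class and array, is invented for illustration; the technique shown is simply splitting a data-parallel loop between two TThread workers so a dual-core chip can chew on both halves at once.

program DualCoreSketch;
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils;

const
  N = 2000;
  gamma = 1.0001;

var
  FPArray : array[ 0..N - 1 ] of double;

type
  // Each worker multiplies its own slice of FPArray, so on a dual-core
  // CPU the two slices can be processed at the same time.
  TMulWorker = class( TThread )
  private
    FLo, FHi : integer;
  protected
    procedure Execute; override;
  public
    constructor Create( ALo, AHi : integer );
  end;

constructor TMulWorker.Create( ALo, AHi : integer );
begin
  FLo := ALo;
  FHi := AHi;
  inherited Create( False ); // start the thread immediately
end;

procedure TMulWorker.Execute;
var
  i : integer;
begin
  for i := FLo to FHi do
    FPArray[ i ] := FPArray[ i ] * gamma;
end;

var
  w1, w2 : TMulWorker;
  i : integer;
begin
  for i := low( FPArray ) to high( FPArray ) do
    FPArray[ i ] := Random;
  w1 := TMulWorker.Create( 0, ( N div 2 ) - 1 );  // first half of the array
  w2 := TMulWorker.Create( N div 2, N - 1 );      // second half of the array
  w1.WaitFor;
  w2.WaitFor;
  w1.Free;
  w2.Free;
  WriteLn( 'done: ', FPArray[ 0 ] : 0 : 6 );
end.

On a single-core processor the same program still runs correctly, it just time-slices between the two workers; on a dual-core part, each worker gets its own core and the halves of the array are processed in parallel.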
===================================
In Through the Out Door
We've always referred to Intel's Pentium-M, the chip maker's current
line of mobile processors derived from the Pentium III, as the design
that the Pentium 4 should have been. Perhaps Intel was listening
to us (well, we know that they were reading us) because the Pentium-M is
now filling the void left by the P4's collapse. Intel intends to drive
multi-core Pentium-M designs to the desktop next year. Bully for
them, because this is the spot-on right move.
Up until now, Intel has focused its Banias/Dothan (130nm and 90nm
Pentium-M, respectively) design efforts on optimizing for power
efficiency in order to maximize battery life in mobile
applications. Although Dothan's leakage power has gone up a lot
from Banias's, for the most part Pentium-M is a very strong mobile line
- and certainly much, much better for mobile products than Pentium 4
products ever were.
Because of this focus on low power consumption, Pentium-M's floating
point unit, one of the most power hungry parts of the CPU, has been
neglected. It's still a good FPU, but compared with the unit
outfitted to the Athlon 64 (or even the Athlon XP - its FPU is almost
identical to its 64-bit son's), it comes up wanting.
With little additional design effort Intel should be able to produce
desktop Pentium-M derivatives that match the clock speeds of AMD's
parts. Furthermore, bumping the front side bus up to 1GHz should
be no sweat. While this is not as effective as integrating the DRAM
controller on-die a la the Athlon 64, these two measures alone will
go a long way toward providing performance-competitive parts. On top
of this, Intel will likely enjoy a significant die size advantage.
However, as long as AMD's 90nm shrink of the Athlon 64 goes as
swimmingly as the company publicly maintains, they should enjoy a
growing performance lead over Intel for about two years. After
that, Intel will likely have refined the old PM-PIII core enough so
that it will be able to stand toe-to-toe with AMD's cores at any given
clock speed.
Of course, Intel would have already been there by now if it hadn't
wasted so much time, money and effort on the P4.
Now is AMD's time to execute on the production side and act swiftly and
decisively to grab market share that is ripe for the plucking.
However, if AMD squanders its lead, Intel could come back and
squash them like a bug.
Make no mistake, Intel has the resources to whip the Pentium-M into the
same performance league as the Athlon 64, but it will take about two
years to do so. After that, AMD, Intel and even VIA will have
small, modular, out-of-order, superscalar x86 cores that all perform
roughly in the same ballpark.
Beyond this convergence point awaits new territory and new business
challenges as x86 cores will swiftly become commodity items. At
this time Intel's fabrication strengths can play to its advantage, but
only if the chip giant is willing to suffer much lower margins than it
has historically enjoyed.
Only when competitive 3D graphics processing becomes integrated on-die
will sufficient product differentiation exist to, at least temporarily,
drive profit margins back up.
VIA, too, can capitalize on Intel's recent missteps. Dothan's
leakage power is quite high compared with Banias' or VIA
Antaur's. Moreover, VIA plans to have a 2GHz part of its own late
this year built with IBM's best 90nm SOI technology. While Antaur
will not come very close to the Dothan's overall performance, VIA's
SSE2 enhanced, security laden chips will offer very, very good
performance per Watt. The 90nm Antaur will also be an inexpensive
drop-in substitute for economical thin and light notebooks leveraging
the Pentium-M chipset infrastructure.
But to get back to Intel, as drastic as it was for the Santa Clara,
Kalifornia-based company to kill off its bread-and-butter line of
desktop processors, it was clearly the right move. Although the
chip peddler will suffer in the performance stakes for the next couple
of years, finally promoting its Pentium-M line over its swiftly
crashing NetBurst architecture is the best course of action for Intel
to take. The chip maker is unquestionably in a much stronger
position now, though near term market share might not always reflect
this.
Although the Pentium 4 (especially Prescott) was one of the biggest
missteps in computing history, we might all need to be grateful for
it. If Intel had followed AMD's lead instead of adopting the
marketing driven P4, Intel's process technology advantages might have
allowed it to create a chip that out-Athloned the Athlon, tipping the
talented but fragile rival into insolvency.
===================================
The Security Wildcard
Intel and AMD appear to be committed to the so-called "Trusted
Computing" road. If these measures are as egregiously invasive as
many fear them to be (try here, here, here, and here
for starters),
consumer backlash could drive people stampeding to VIA's chips which
take a completely antithetical approach to security.
VIA's CPUs provide extremely powerful security functions that allow the
end user to quickly, safely and easily encrypt/decrypt anything:
wireless data transmissions, email, voice communication, individual
files or even complete file systems. These security functions,
which will be expanded significantly in the 90nm Antaur, are exposed as
simple x86 instructions that often impose little, if any, performance
hit on concurrent processing.
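To give a concrete flavor of this "simple x86 instructions" approach, below is a minimal detection sketch in Delphi. It is not VIA's code: it assumes the Centaur extended CPUID leaves at $C0000000/$C0000001 and the feature-bit positions VIA's PadLock documentation describes (bit 2 for the hardware RNG, bit 6 for the AES engine), so verify those against the current programming guide before relying on them. Real code should also confirm the CPU vendor string before poking at vendor-specific CPUID leaves.

// Sketch: read VIA's Centaur extended CPUID feature flags.
function GetCentaurFeatureFlags : cardinal;
asm
  push ebx                  // CPUID clobbers EBX, which Delphi expects preserved
  mov  eax, $C0000000
  db   $0F, $A2             // CPUID: EAX returns the highest Centaur extended leaf
  cmp  eax, $C0000001
  jb   @@none               // leaf $C0000001 not supported on this CPU
  mov  eax, $C0000001
  db   $0F, $A2             // CPUID: feature flags come back in EDX
  mov  eax, edx             // function result is returned in EAX
  jmp  @@done
@@none:
  xor  eax, eax
@@done:
  pop  ebx
end;

// Usage sketch (bit positions assumed from VIA's documentation):
//   if ( GetCentaurFeatureFlags and ( 1 shl 6 ) ) <> 0 then
//     ; // Advanced Cryptography Engine (AES) instructions available
//   if ( GetCentaurFeatureFlags and ( 1 shl 2 ) ) <> 0 then
//     ; // hardware random number generator available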
Recently, VIA audaciously released a hardware accelerated WASTE-based
peer-to-peer encrypted file sharing client that also provides encrypted
chat features. What makes this move so stunning is that the
company released this Open Source program at a time when the FBI is
trying to quietly push through laws requiring snooping backdoors on
all
chat and voice-over-IP clients! Not only does this demonstrate
the company's commitment to personal liberty and privacy, but it shows
that VIA has a lot of guts as well.
VIA's security strategy is not without risks, however. By taking
a direction 180 degrees opposite of Trusted Computing, the company
could be left out in the cold if laws are passed mandating the use of
Trusted Computing compliant computers to access the Internet. As
insane as such laws may sound, there has already been a lot of work
done in this direction and, though Sen. Fritz Hollings' bill failed to
pass, more such efforts are slowly coming to a head.
The best outcome for personal privacy and liberty would be for all
parties to follow VIA's lead.
===================================
The Creature Maker Visits Austin
Speaking of liberty, an author popular with freedom loving Americans
visited Austin to give a lecture on taking back our country.
Speaking to a standing room only crowd, G. Edward Griffin's lecture
captured the rapt attention of a group of Americans concerned with the
Patriot Act, erosion of the Bill of Rights, biometrics, chip
implants,
corporate media propaganda, computerized voting fraud, increasing
acceptance of torture, declining diversity between major party
political candidates, orchestrated wars, imposition of a world
government, gun control, our country's migration towards becoming a
cashless society and other trends in our nation away from the concepts
of personal liberty and independence that made our country unique and
strong.
You can hear a portion of Mr.
Griffin's lecture here.
Mr. Griffin's most famous work is the riveting treatise on the Federal
Reserve entitled The Creature from Jekyll Island.
===================================
Didn't We Kill this Guy Already?
The grisly execution of American Nick Berg has been pinned on Jordanian
"al Qaeda leader" Abu Musab al-Zarqawi, yet, according to reports
released back in March, the U.S. killed the wooden-legged al-Zarqawi in
bombing attacks. It's certainly true that the reports of al-Zarqawi's
death were unverified, but how did the unfortunate Mr. Berg end up in
his killer's hands so quickly after he was released from apparently
unwarranted coalition custody?
===================================
Transmeta's Achilles' Heel
If you've ever used a Transmeta notebook, you have no doubt noticed how
sluggish Transmeta systems can be in everyday operation. The
reason for this is simple. There are severe translation latency
penalties paid under many, many conditions.
Transmeta processors are not native x86 CPUs,
but use an emulator to translate x86 instruction streams into native Transmeta
machine instructions.
First we'll look at the number of clock cycles necessary to multiply a
2,000-element, double-precision (floating point) array by an arbitrary
value. GetCycleCount is an assembly language wrapper for RDTSC.
The overhead of making two calls to GetCycleCount is included in all
data presented here (direct inlining yielded the same amount of
overhead). The code for the simple test follows.
function TfrmMathTest.FPUMulArray: int64;
var
  lStart, lStop : TRDTSCTimeStamp;
  i : integer;
begin
  if cboxRandomize.Checked then RandomizeValues;
  randomizearray( FPArray );
  lStart.int64 := GetCycleCount;
  for i := low( FPArray ) to high( FPArray ) do begin
    FPArray[ i ] := FPArray[ i ] * gamma;
  end; // for
  lStop.int64 := GetCycleCount;
  result := lStop.int64 - lStart.int64;
  UpdateValues;
end;
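Before looking at the results, a quick aside: an RDTSC wrapper of the kind GetCycleCount describes takes only a few lines of Delphi inline assembly. The following is a sketch, not the author's actual routine.

// Sketch of a GetCycleCount-style RDTSC wrapper. Delphi returns an Int64
// function result in EDX:EAX, which is exactly where RDTSC deposits the
// 64-bit time-stamp counter, so no further work is needed.
function GetCycleCount : int64;
asm
  db $0F, $31   // RDTSC opcode, for assemblers that do not know the mnemonic
end;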
Looking at the initial run, you can see that the
efficeon is
staggeringly poor on this test. The translation penalty that has to be paid each time a new
routine is run can be tremendous. In the chart below, we graph the number
of CPU clock cycles required to run the test discussed immediately above.
Fewer clock cycles (shorter bars) is better.
However, if you execute the same routine many
times, Transmeta processors sometimes cache translated code so that subsequent
runs do not suffer translation penalties. Under certain circumstances the
efficeon is able to cache the simple test above and if it is run again,
performance becomes much better.
The efficeon does not cache all translations, though. For instance,
no matter how many times we reran short, non-looping tests, we always
experienced translation penalties. The chart below illustrates an
example.
Translation penalties explain why Transmeta-based
computers are so very sluggish when working through everyday applications.
Because most benchmarks loop over the same routines many times, translation
penalties are diluted. This is why most benchmarks do not reflect real
world usage experience on Transmeta systems.
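The article's non-looping test code is not shown, but a sketch in the same spirit follows, using the GetCycleCount wrapper sketched earlier. The routine name and the arithmetic inside it are invented for illustration; the point is that with no loop to amortize the cost, a code-morphing CPU pays its translation penalty on essentially every cold invocation of a routine like this.

// Sketch only: a short, straight-line routine timed with GetCycleCount.
function ShortNonLoopingTest : int64;
var
  lStart, lStop : int64;
  a, b, c, d : double;
begin
  a := 1.0001;  b := 2.0002;  c := 3.0003;
  lStart := GetCycleCount;
  // a handful of straight-line FP operations; no loop to hide behind
  d := a * b + c;
  d := d / a - b;
  d := d * c + a;
  lStop := GetCycleCount;
  result := lStop - lStart;
  if d = 0.0 then          // reference d so the work is not optimized away
    result := -result;
end;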
===================================
New Notebook
I am a compulsive gadget geek. Fortunately for me, Kathy has a soft heart
and tolerates my binges. Last week I bought a new notebook, and it is a
brawny beast, especially for the $1,350 (with two years no interest) that
we paid. It is an eMachines M6809. Below are the specs copied directly
from the eMachines web site.
Display: 15.4" Widescreen TFT LCD WXGA (1280 x 800 max. resolution)
Operating System: Microsoft® Windows® XP Home Edition
CPU: Mobile AMD Athlon™ 64 3200+ Processor (64-bit architecture operating at 2.00 GHz; system bus uses HyperTransport™ technology operating at 1600 MHz; 1 MB L2 Cache)
Memory: 512 MB DDR SODIMM (PC 2700)
Hard Drive: 80 GB HDD
Optical Drives: DVD +/- RW Drive (Write Max: 2.4x DVD+R/RW, 2x DVD-R/RW, 6x CD-R, 10x CD-RW; Reads 24x CD, 8x DVD)
Media Reader: 6-in-1 Digital Media Manager (Compact Flash, Micro Drive, MultiMedia Card, Secure Digital (SD), Memory Stick, Memory Stick Pro)
Video: ATI® Mobility RADEON™ 9600 with 64 MB Video RAM
Sound: PC2001 Compliant AC '97 Audio; Built-in Stereo Speakers
Modem: 56K ITU V.92 Fax/Modem
Network: 802.11g Built-in Wireless (up to 54Mbps), 10/100Mbps built-in Ethernet
Pointing Device: Touchpad with Vertical Scroll Zone
Battery: 8-cell Lithium-ion (Li-ion)
Dimensions: 1.6"h x 14.0"w x 10.4"d
Weight: 7.5 lbs. (8.65 lbs. total travel weight)
Internet: AOL 3-month membership included
Ports/Other: 4 USB 2.0 ports, 1 IEEE 1394, 1 VGA External Connector, 1 S-Video Out, Microphone In, Headphone/Audio Out, 1 PCMCIA Slot (CardBus type I or type II)
Pre-Installed Software: Microsoft Works 7.0, Microsoft Money 2004, Encarta Online, Adobe® Acrobat® Reader™, Microsoft Media Player, Real Player, PowerDVD, Internet Explorer, Roxio Easy CD & DVD Creator (DVD Edition), BigFix®, MSN®, CompuServe®, AOL (with 3 months membership included), Norton AntiVirus 2004 (90-day complimentary subscription)
Yeah, it's real fast. If I get
some free time, I might write up a review. For the time being, I've been
very pleased.
===================================
Copyright 2004, Van Smith
===================================