
IBM J. RES. DEVELOP. VOL. 41 NO. 3 MAY 1997
Complex CPI, relativity, and adiabatics
In a previous subsection on complex CPI, the discussion was
limited to values of CPI in the first quadrant only. Now that
the discussion has turned to power, the other quadrants in
Figure 5 take on significant physical interest.
In particular, points in quadrants II and III have negative
real components. The only reasonable interpretation of such
points is that they represent the performance of a processor
that is running the program backward.
One possible interpretation of quadrants III and IV, which
have negative imaginary components, is that they represent a
new paradigm in circuit performance; in particular, they
represent processors that run faster than the speed of light.
According to simple relativistic theory, when the machine
runs faster than light, time moves backward relative to our
inertial frame of reference. According to this theory,
quadrants I and III are indistinguishable, since quadrant III
has the computation being run in reverse while time moves
backward. As such, quadrant III is uninteresting.
Quadrants II and IV are of real interest, particularly with
the recent advent of adiabatic computing. A processor that
can run adiabatically in quadrant II acts as a power source,
hence a perpetual motion machine. In quadrant IV, if a
machine enters an adiabatic realm, it becomes a black hole.
If this happens, it will change the world as we know it.
10. Conclusion
In this paper, several points were made that are antithetical
to some of the modern philosophy in processor
microarchitecture. These points are based on simple
observations relating to the machinations of electronic von
Neumann computers, which have been in existence since the
onset of this industry.
First, the most popular performance metric, IPC (instructions
per cycle), is the reciprocal of the metric that should be
used, CPI (cycles per instruction). This is primarily because
CPI is a simple dot product of a few numbers that any
experienced designer should have at his fingertips. It is
intuitive, and it makes for remarkably quick and remarkably
accurate estimates.
On the other hand, IPC does not yield to intuition. Instead,
it shrouds fundamental issues in mystery, and it has much of
the industry (and academia) running down blind corridors in a
state of general confusion.
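The dot-product view of CPI can be illustrated with a minimal
sketch. The event types, frequencies, and penalties below are
hypothetical values chosen for illustration; they are not
taken from the paper, and the helper name cpi is likewise an
assumption.

```python
# Illustrative only: the event types, frequencies, and penalties below
# are hypothetical values, not measurements from the paper.
EVENTS = ["base issue", "branch delay", "cache miss"]
events_per_instruction = [1.0, 0.20, 0.02]
cycles_per_event = [1.0, 2.0, 20.0]


def cpi(frequencies, penalties):
    """Return CPI as the dot product of event frequencies and penalties."""
    return sum(f * p for f, p in zip(frequencies, penalties))


total_cpi = cpi(events_per_instruction, cycles_per_event)
ipc = 1.0 / total_cpi

# In CPI, each event type contributes its own additive term.
for name, f, p in zip(EVENTS, events_per_instruction, cycles_per_event):
    print(f"{name:12s} contributes {f * p:.2f} cycles per instruction")
print(f"CPI = {total_cpi:.2f}, IPC = {ipc:.3f}")

# Halving the cache-miss frequency removes a fixed amount of CPI,
# but the resulting IPC change depends on every other term.
improved_cpi = cpi([1.0, 0.20, 0.01], cycles_per_event)
print(f"CPI drops by {total_cpi - improved_cpi:.2f}; "
      f"IPC rises by {1.0 / improved_cpi - ipc:.3f}")
```

Under these assumed numbers, the effect of a design change can
be read directly from one term of the CPI sum, whereas the
corresponding IPC change cannot be stated without knowing all
of the other terms.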
Second, the separability of CPI into three independent
components was demonstrated. The three components account for
the intrinsic work done by the computer, the pipeline
structure of the computer, and the memory hierarchy. It was
argued that a solid grasp of each of these three components
is necessary in understanding the performance of a
superscalar processor, because the scalar components are hard
bounds for the analogous superscalar components. Essentially,
the argument is that one must have a grasp of the simple case
before one can hope to understand the general case.
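The separability claim can be written compactly. The
following is only a sketch: it assumes that the three
components combine additively, and the subscripts are labels
chosen here for illustration rather than the paper's own
notation.

```latex
\[
  \mathrm{CPI} \;=\; \mathrm{CPI}_{\text{work}}
               \;+\; \mathrm{CPI}_{\text{pipeline}}
               \;+\; \mathrm{CPI}_{\text{memory}}
\]
```

Under this reading, each term can be estimated on its own and
the three estimates summed.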
Third, attention was focused on a trend in future
systems in which data bus utilizations cross a threshold
that will make queueing at the memory bus a limitation of
system performance. A new family of bus protocols that
can mitigate this effect was proposed. These protocols will
emerge in the coming decade because of the impending
delays due to queueing.
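The threshold behavior of bus utilization can be illustrated
with a textbook single-server queue. This is a sketch under
stated assumptions: the M/M/1 formula is a stand-in and may
differ from the model used in the paper's analysis, and the
function name and the transfer time of 8 cycles are
hypothetical.

```python
def mean_queueing_delay(utilization, service_cycles):
    """Mean wait in queue for an M/M/1 server: W_q = rho / (1 - rho) * S.

    Assumption: Poisson arrivals and exponential service times, which
    real memory-bus traffic only approximates.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization) * service_cycles


BUS_TRANSFER_CYCLES = 8.0  # hypothetical cycles to move one line over the bus

delays = {
    rho: mean_queueing_delay(rho, BUS_TRANSFER_CYCLES)
    for rho in (0.1, 0.3, 0.5, 0.7, 0.9)
}
for rho, delay in delays.items():
    print(f"utilization {rho:.1f}: about {delay:.1f} cycles of queueing")
```

In this model the delay grows slowly at low utilization and
steeply as utilization approaches 1, which is the kind of
threshold effect the paragraph above describes.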
Finally, an argument was made that power consumption will
drive the development of microarchitecture in the coming
decade, and that the aspects of a microarchitecture that
result in low power also result in high performance. This is
particularly true in CMOS, which is a wiring-driven
technology. This trend will cause the client
microarchitecture and the server microarchitecture to
converge.
*Trademark or registered trademark of International Business
Machines Corporation.
**Trademark or registered trademark of Standard Performance
Evaluation Corporation.
Received August 8, 1996; accepted for publication
February 19, 1997
P. G. EMMA