Moore, Dennard and Amdahl

This is my reseach and writing about Moore, Dennard and Amdahls laws and how they effect computing and software engineering. Mostly, I just find this stuff interesting.

We don’t throw away computer because they’re slow anymore, we throw them out because they’re broken or don’t have a feature.

Moore’s Law

the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years.

Dennard Scaling

as transistors get smaller, their power density stays constant, so that the power use stays in proportion with area; both voltage and current scale (downward) with length.

Has to do with power and efficiency growth as transistors get smaller and smaller, their power needs get less and less. So you can get equivalent performance with less power. this is a power law. This has started to breakdown though due to Static Power Dissipation.

Silvano Gai has a great write up on Dennard Scaling, that ventures into the practical as opposed to the theoretical.

Gai also has a book that looks really interesting: Building a Future-Proof Cloud Infrastructure: A Unified Architecture for Network, Security, and Storage Services.

Dennard scaling relates to Moore’s law by claiming that the performance per watt of computing grows exponentially at roughly the same rate. Dennard scaling, also known as MOSFET scaling, is based on a 1974 paper co-authored by Robert H. Dennard, a researcher at IBM. Dennard Scaling postulated that as transistors get smaller their power density stays constant, so that the power use stays in proportion with area. This allowed CPU manufacturers to raise clock frequencies from one generation to the next without significantly increasing overall circuit power consumption.

Static vs Dynamic Power Dissipation (loss)

Power is dissipated in a chip in 2 categories: dyanamic and static.

  • Dynamic Power dissipation: happens when a transistor switches, it’s the cost of doing business.
  • Static power dissipation: is all the other little things that can happen, unintentionally, which leads to heat dissipation. This static power dissipation is the core issues with the Moore’s law and Dennard Scaling law limitations.

threashold leakage current: it’s off, but it’s still dissipating power.

Dennard scaling relates to Moore’s law by claiming that the performance per watt of computing grows exponentially at roughly the same rate. Dennard scaling, also known as MOSFET scaling, is based on a 1974 paper co-authored by Robert H. Dennard, a researcher at IBM. Dennard Scaling postulated that as transistors get smaller their power density stays constant, so that the power use stays in proportion with area. This allowed CPU manufacturers to raise clock frequencies from one generation to the next without significantly increasing overall circuit power consumption.

He observed that voltage and current should be proportional to the linear dimensions of a transistor; thus, as transistors shrank, so did voltage and current. Because power is the product of voltage and current, power dropped with the square. On the other hand, the area of the transistors dropped with the square, and the transistor count increased with the square. The two phenomena compensated each other. Dennard scaling ended around 2004 because current and voltage couldn’t keep dropping while still maintaining the dependability of integrated circuits, and the leakage current and threshold voltage became the dominant factors in establishing a power baseline per transistor

The key effect of Dennard scaling was that as transistors got smaller the power density was constant – so if there was a reduction in a transistor’s linear size by 2, the power it used fell by 4 (with voltage and current both halving).

a scaling law which states roughly that, as transistors get smaller, their power density stays constant, so that the power use stays in proportion with area; both voltage and current scale (downward) with length.

as transistors shrank, so did necessary voltage and current; power is proportional to the area of the transistor.

as the size of the transistors shrunk, and the voltage was reduced, circuits could operate at higher, frequencies at the same power

Dennard scaling ignored the “leakage current” and “threshold voltage”, which establish a baseline of power per transistor.

Power Density?

Dennard Scaling Breakdown

Thermal Noise and Voltage

Became hard to scale the voltage due to thermal noise. As transisotrs got smaller, the voltage to signal them got smaller, but this cna only work up to a point. An electon at room temperature started to flip transistors, which makes an unreliable transistor.

An electron at room temperature has a certain amout of voltage which is about 25mV. Well the transistors started to get small enough that the voltage required to flip them was starting to collide with the ambient voltage of electroncs. 25mV. This made transistors start to turn on and off randomly.

\(kT/q = 25\text{mV at room temperature}\)

Amdahl’s Law

A formula for determining how much speedup can I get from parallelization?

Amdahl’s law is all about speedup by doing more work at the same time. Parallelization.

\(S(N)=\frac{1}{(1-P)+(P/N)}\)

Ew. Math. Gross.

The Amdahl’s Law inequality is sometimes treated as a performance model, instead of a performance limit. It’s not a very good performance model, since it ignores so many details, but as a limit it provides the same value as peak performance, that is, the performance (or speedup) that you cannot exceed. However, is it still a good limit? Should we still be comparing our performance against the speedup limit?

Amdahl and Heterogenous Computing

Heterogenous computing changes things. With heterogenous computing you’re saying that I don’t just have a CPU and I run all my workloads on that one chip. I have a GPU, so I can offload graphics processing to that. I have a Video encoder chip, so I can do video processing on that special chip.

Amdahl assumes you’re running your new multicore code, on the multiple of the same original chip. But maybe now you’re able to offload that computation to a special chip that solves that particular problem really well. Ex: GPU for graphics or a TPU for AI workloads.

Gustafson’s Law

Sun and Ni’s Law

Resources


Related Notes