Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Underwater World
--
Chapters
00:00 Question
03:12 Accepted answer (Score 145)
12:07 Answer 2 (Score 15)
14:29 Answer 3 (Score 9)
16:20 Answer 4 (Score 6)
18:18 Thank you
--
Full question
https://superuser.com/questions/797454/w...
Question links:
[The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software]: http://www.gotw.ca/publications/concurre...
Accepted answer links:
[this question]: https://electronics.stackexchange.com/qu...
[fuzzyhair2's answer]: https://electronics.stackexchange.com/a/...
[this AnandTech forum thread]: http://forums.anandtech.com/showthread.p...?
[Idontcare]: http://forums.anandtech.com/member.php?
[overclocking world record]: http://valid.canardpc.com/records.php
[instruction pipelining]: http://en.wikipedia.org/wiki/Instruction...
Answer 2 links:
[Pentium D]: http://en.wikipedia.org/wiki/Pentium_D
Answer 4 links:
[related answer]: https://superuser.com/a/260029/194694
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#cpu #cpuarchitecture
#avk47
ACCEPTED ANSWER
Score 145
Summary
Economics. It's cheaper and easier to design a CPU that has more cores than a higher clock speed, because:
- Significant increase in power usage. CPU power consumption rises rapidly as you increase the clock speed - you can double the number of cores operating at a lower speed within the thermal budget it takes to raise the clock speed by 25%, and quadruple them for 50%.
- There are other ways to increase sequential processing speed, and CPU manufacturers make good use of those.
I'm going to be drawing heavily on the excellent answers at this question on one of our sister SE sites. So go upvote them!
Clock speed limitations
There are a few known physical limitations to clock speed:
Transmission time
The time it takes for an electrical signal to traverse a circuit is limited by the speed of light. This is a hard limit, and there is no known way around it.[1] At gigahertz clocks, we are approaching this limit.
However, we are not there yet. 1 GHz means one nanosecond per clock tick. In that time, light can travel 30 cm. At 10 GHz, light can travel 3 cm. A single CPU core is about 5 mm wide, so we will run into these issues somewhere past 10 GHz.[2]
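As a sanity check on those numbers, here's the arithmetic as a minimal Python sketch (the 5 mm die width is the figure assumed above; real signals in wiring travel slower than c, per footnote 2):

```python
# Distance light travels in one clock tick, at various clock speeds.
C_VACUUM = 3.0e8  # speed of light in a vacuum, m/s

for ghz in (1, 3, 10, 30):
    tick_s = 1.0 / (ghz * 1e9)             # seconds per clock cycle
    mm_per_tick = C_VACUUM * tick_s * 1000  # millimetres per tick
    print(f"{ghz:>2} GHz: {mm_per_tick:.0f} mm per tick")

# 1 GHz -> 300 mm, 10 GHz -> 30 mm. Even in a vacuum, a signal can't
# cross a 5 mm core in one cycle beyond ~60 GHz; slower propagation in
# real wiring and non-straight paths pull that ceiling down a lot.
```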
Switching delay
It's not enough to merely consider the time it takes for a signal to travel from one end to another. We also need to consider the time it takes for a logic gate within the CPU to switch from one state to another! As we increase clock speed, this can become an issue.
Unfortunately, I'm not sure about the specifics, and cannot provide any numbers.
Apparently, pumping more power into it can speed up switching, but this leads to both power consumption and heat dissipation issues. Also, more power means you need bulkier conduits capable of handling it without damage.
Heat dissipation/power consumption
This is the big one. Quoting from fuzzyhair2's answer:
Recent processors are manufactured using CMOS technology. Every time there is a clock cycle, power is dissipated. Therefore, higher processor speeds mean more heat dissipation.
There are some lovely measurements in this AnandTech forum thread, and they even derived a formula for power consumption (which goes hand in hand with heat generated):
[Formula image - credit to Idontcare]
We can visualise this in the following graph:
[Graph: power consumption vs. clock speed - credit to Idontcare]
As you can see, power consumption (and heat generated) rises extremely rapidly as the clock speed is increased past a certain point. This makes it impractical to boundlessly increase clock speed.
The reason for the rapid increase in power usage is probably related to the switching delay - it's not enough to simply increase power proportional to the clock rate; the voltage must also be increased to maintain stability at higher clocks. This may not be completely correct; feel free to point out corrections in a comment, or make an edit to this answer.
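To make that concrete: the standard first-order relation for dynamic CMOS power is P ≈ C·V²·f (switched capacitance times voltage squared times frequency). Below is a minimal Python sketch of the tradeoff, assuming the simplified rule that voltage scales linearly with frequency; the 3 GHz / 65 W baseline is an invented figure purely for illustration:

```python
# Toy model: dynamic CMOS power P = k * V^2 * f, with voltage assumed
# to scale linearly with frequency (a simplification - real V/f curves
# come from measurements like those in the AnandTech thread).
def power_watts(freq_ghz, base_ghz=3.0, base_watts=65.0):
    v_scale = freq_ghz / base_ghz           # V rises with f (assumed)
    return base_watts * v_scale**2 * (freq_ghz / base_ghz)

one_fast = power_watts(3.0 * 1.25)          # one core, +25% clock
two_slow = 2 * power_watts(3.0 * 0.9)       # two cores, clocked 10% lower
print(f"one core at +25% clock:  {one_fast:.0f} W")   # ~127 W
print(f"two cores at -10% clock: {two_slow:.0f} W")   # ~95 W
# Doubling the cores at a modest down-clock fits inside the thermal
# budget that a mere 25% overclock would consume.
```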
More cores?
So why more cores? Well, I can't answer that definitively. You'd have to ask the folks at Intel and AMD. But you can see above that, with modern CPUs, at some point it becomes impractical to increase clock speed.
Yes, multicore also increases the power required and the heat dissipated. But it neatly avoids the transmission-time and switching-delay issues. And, as you can see from the graph, you can easily double the number of cores in a modern CPU with the same thermal overhead as a 25% increase in clock speed.
Some people have pushed clock speed anyway - the current overclocking world record is just shy of 9 GHz. But it is a significant engineering challenge to do so while keeping power consumption within acceptable bounds. The designers at some point decided that adding more cores to perform more work in parallel would provide a more effective boost to performance in most cases.
That's where the economics come in - it was likely cheaper (less design time, less complicated to manufacture) to go the multicore route. And it's easy to market - who doesn't love the brand new octa-core chip? (Of course, we know that multicore is pretty useless when the software doesn't make use of it...)
There is a downside to multicore: you need more physical space to put the extra cores. However, CPU process sizes are constantly shrinking, so there's plenty of space to put two copies of a previous design - the real tradeoff is not being able to create larger, more complex single cores. Then again, increasing core complexity is a bad thing from a design standpoint - more complexity means more mistakes/bugs and manufacturing errors. We seem to have found a happy medium with efficient cores that are simple enough not to take too much space.
We've already hit a limit with the number of cores we can fit on a single die at current process sizes. We might hit a limit of how far we can shrink things soon. So, what's next? Do we need more? That's difficult to answer, unfortunately. Anyone here a clairvoyant?
Other ways to improve performance
So, we can't increase the clock speed. And more cores have an additional disadvantage - namely, they only help when the software running on them can make use of them.
So, what else can we do? How are modern CPUs so much faster than older ones at the same clock speed?
Clock speed is really only a very rough approximation of the internal workings of a CPU. Not all components of a CPU work at that speed - some might operate once every two ticks, etc.
What's more significant is the number of instructions you can execute per unit of time. This is a far better measure of just how much a single CPU core can accomplish. Not all instructions take the same time, either; some will take one clock cycle, some will take three. Division, for example, is considerably slower than addition.
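Put another way, throughput is roughly instructions-per-cycle (IPC) times clock speed, so a lower-clocked core with higher IPC can win. A quick sketch - the IPC figures here are made up for the example:

```python
# Throughput = IPC (instructions per cycle) * clock speed.
# IPC values are invented purely for illustration.
old_cpu = {"clock_ghz": 3.8, "ipc": 1.0}   # high clock, low IPC
new_cpu = {"clock_ghz": 3.0, "ipc": 2.5}   # lower clock, higher IPC

for name, cpu in (("old", old_cpu), ("new", new_cpu)):
    giga_instr = cpu["clock_ghz"] * cpu["ipc"]
    print(f"{name}: {giga_instr:.1f} billion instructions/second")
# The "slower" 3.0 GHz chip executes roughly twice as many
# instructions per second as the 3.8 GHz one.
```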
So, we could make a CPU perform better by increasing the number of instructions it can execute per second. How? Well, you could make an instruction more efficient - maybe division now takes only two cycles. Then there's instruction pipelining. By breaking each instruction into multiple stages, it's possible to execute instructions "in parallel" - but each instruction still has a well-defined, sequential order relative to the instructions before and after it, so it doesn't require software support the way multicore does.
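Here's a toy model of that, assuming a simple 3-stage pipeline (fetch, decode, execute) where each stage takes one cycle:

```python
# With an S-stage pipeline, N instructions finish in N + S - 1 cycles
# instead of N * S, because each stage works on a different
# instruction during the same cycle.
STAGES = 3   # fetch, decode, execute

def cycles_unpipelined(n):
    return n * STAGES

def cycles_pipelined(n):
    return n + STAGES - 1

for n in (1, 10, 1000):
    print(f"{n:>4} instructions: "
          f"{cycles_unpipelined(n):>4} cycles unpipelined, "
          f"{cycles_pipelined(n):>4} pipelined")
# For long runs the pipeline approaches one instruction per cycle,
# without any change to the (sequential) software.
```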
There is another way: more specialised instructions. We've seen things like SSE, which provide instructions to process large amounts of data at one time. There are new instruction sets constantly being introduced with similar goals. These, again, require software support and increase complexity of the hardware, but they provide a nice performance boost. Recently, there was AES-NI, which provides hardware-accelerated AES encryption and decryption, far faster than a bunch of arithmetic implemented in software.
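Conceptually, an SSE-style instruction is like the batched loop below, touching several data elements per step. This is just a pure-Python analogy of the idea, not actual SIMD:

```python
# Scalar: one addition per step.
def add_scalar(a, b):
    return [x + y for x, y in zip(a, b)]

# "SIMD-style": process 4 elements per step, the way a 128-bit SSE
# register holds four 32-bit floats. (Analogy only - real SIMD does
# the 4 additions in a single hardware instruction.)
def add_batched(a, b, width=4):
    out = []
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i+width], b[i:i+width]))
    return out

a, b = list(range(8)), list(range(8))
assert add_scalar(a, b) == add_batched(a, b) == [0, 2, 4, 6, 8, 10, 12, 14]
```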
[1] Not without getting quite deep into theoretical quantum physics, anyway.
[2] It might actually be lower, since electrical-field propagation isn't quite as fast as the speed of light in a vacuum. Also, that's just for straight-line distance - it's likely that there's at least one path that's considerably longer than a straight line.
ANSWER 2
Score 15
Physics is physics. We can't keep packing more transistors into ever-smaller spaces forever. At some point it gets so small that you deal with weird quantum crap. At some point we can no longer pack twice as many transistors into the same space every couple of years (which is what Moore's law is about).
Raw clock speeds mean little on their own. My old Pentium M ran at about half the clock speed of a contemporary desktop CPU (and yet was in many respects faster) – and modern systems are barely above the clock speeds of systems from 10 years ago (and are clearly faster). Basically, "just" bumping up the clock speed does not give real performance gains in many cases. It may help in some single-threaded operations, but you're better off spending the design budget on better efficiency everywhere else.
Multiple cores let you do two or more things at once, so you don't need to wait for one thing to finish before starting the next. In the shorter term, you can simply pop two existing cores into the same package (for example the Pentium D, with its MCM, which was a transitional design) and you have a system that's twice as fast. Most modern implementations do share things like a memory controller, of course.
You can also build smarter in different ways. ARM does big.LITTLE - having four "weak" low-power cores working alongside four more powerful cores, so you have the best of both worlds. Intel lets you throttle down (for better power efficiency) or overclock specific cores (for better single-thread performance). As I remember, AMD does something similar with modules.
You can also move things onto the CPU die, like memory controllers (so you get lower latency) and IO-related functions (the modern CPU has no north bridge), as well as video (which is more important with laptops and AIW designs). It makes more sense to do these things than to "just" keep ramping up the clock speed.
At some point 'more' cores may not work – though GPUs have hundreds of cores.
Multicore, as such, lets computers work smarter in all these ways.
ANSWER 3
Score 9
Simple answer
The simplest answer to the question
Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?
is actually found within another part of your question:
I would expect the conclusion to be "therefore, we'll have to have bigger computers or run our programs on multiple computers."
In essence, having multiple cores is like having multiple "computers" in the same device.
Complex answer
A "core" is the part of the computer that actually processes instructions (adding, multiplying, "and"ing, etc). A core can only execute a single instruction at one time. If you want your computer to be "more powerful" there are two basic things you can do:
- Increase throughput (increase clock rate, decrease physical size, etc)
- Use more cores in the same computer
The physical limitations to #1 are primarily the need to dump the heat generated by the processing, and the speed at which a signal can propagate through the circuit. Once you split off some of those transistors into a separate core, you alleviate the heat issue to a large degree.
There's an important limitation to #2: you have to be able to split your problem up into multiple independent problems, and then combine the answer. On a modern personal computer, this isn't really a problem, as there are loads of independent problems all vying for computational time with the core(s) anyway. But when doing intensive computational problems, multiple cores only really help if the problem is amenable to concurrency.
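Here's a minimal sketch of that split-then-combine pattern using Python's standard multiprocessing module; the chunked sum stands in for any problem that divides into independent pieces:

```python
# Split a big sum into independent chunks, farm them out to
# worker processes (one per core), then combine the partial results.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    assert total == sum(range(n))  # same answer, computed in parallel
    print(total)
```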
ANSWER 4
Score 5
Long story short: Speeding up single cores has reached its limits, so we keep shrinking them and adding more of them, until this reaches its limits too or we can change to better materials (or achieve a fundamental breakthrough that overthrows the established tech, something like home-sized, actually working quantum computing).
I think this problem is multi-dimensional, and it will take some writing to paint a more complete picture:
- Physical limitations (imposed by actual physics): Like speed of light, quantum mechanics, all that.
- Manufacturing problems: How do we manufacture ever-smaller structures with the needed precision? Problems related to raw materials, the materials used to build circuits, etc., durability.
- Architectural problems: Heat, interference, power consumption, etc.
- Economical problems: What's the cheapest way to get more performance to the user?
- Usecases and user perception of performance.
There may be many more. A multipurpose CPU is trying to find a solution that combines all these factors (and more) into one mass-producible chip that fits 93% of the subjects on the market. As you see, the last point is the most crucial one: customer perception, which derives directly from the way the customer uses the CPU.
Ask yourself: what is your usual workload? Maybe 25 Firefox tabs, each playing some ads in the background, while you are listening to music, all while waiting for the build job you started some two hours ago to finish. That is a lot of work to be done, and still you want a smooth experience. But your CPU can handle ONE task at a time! One single thing. So what you do is split things up and make a looong queue, and everyone gets their own share, and all are happy. Except for you, because everything becomes laggy and not smooth at all.
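That queue is exactly what an OS scheduler builds: it time-slices one core across every runnable task. A toy round-robin sketch of why things feel laggy:

```python
# Round-robin time slicing: one core, many tasks, each gets a slice
# in turn. Every task's completion time stretches with the queue length.
from collections import deque

def round_robin(tasks, slice_ms=10):
    queue = deque(tasks.items())   # (name, remaining work in ms)
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(slice_ms, remaining)
        clock += run
        if remaining - run > 0:
            queue.append((name, remaining - run))
        else:
            print(f"{name} finished at {clock} ms")

round_robin({"music": 30, "browser": 30, "build": 30})
# Three 30 ms tasks on one shared core: the last one finishes at
# 90 ms, three times its own CPU time - the lag described above.
```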
So you speed your CPU up in order to do more operations in the same amount of time. But, as you said: heat and power consumption. And that's where we come to the raw-material part. Silicon becomes more conductive as it gets hotter, meaning more current flows through the material as it heats up. Transistors consume more power as you switch them faster. Also, high frequencies make crosstalk between short wires worse. So you see, the speed-things-up approach leads to a "meltdown". As long as we do not have better raw materials than silicon, or much better transistors, we are stuck where we are with single-core speed.
This gets us back to where we started: getting stuff done in parallel. Let's add another core. Now we can actually do two things at once. So let's cool things down a bit and just write software that can split its work over two less powerful but more functional cores. This approach has two main problems (besides needing time for the software world to adapt to it): 1. make the chip larger, or make each core smaller; 2. some tasks simply cannot be split into two parts that run simultaneously. Keep adding cores as long as you can shrink them, or make the chip larger and keep the heat problem at bay. Oh, and let's not forget the customer: if we change our use cases, the industries have to adapt. See all the shiny "new" things the mobile sector has come up with. That is why the mobile sector is considered so crucial, and everyone wants to get their hands on it.
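Problem 2 in that list is the classic cap on parallel speedup, usually quantified with Amdahl's law: speedup = 1 / ((1 - p) + p/n) for a parallelisable fraction p of the work on n cores. A quick sketch:

```python
# Amdahl's law: if only a fraction p of the work can run in
# parallel, n cores give at most 1 / ((1 - p) + p / n) speedup.
def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

for cores in (2, 4, 8, 64):
    print(f"{cores:>2} cores, 90% parallel: "
          f"{amdahl_speedup(0.9, cores):.2f}x speedup")
# Even with 64 cores, the serial 10% caps the speedup below 10x.
```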
Yes, this strategy WILL reach its limitations! And Intel knows it - that's why they say the future lies somewhere else. But they will keep doing it as long as it is cheap, effective, and doable.
Last but not least: physics. Quantum mechanics will limit chip shrinking. The speed of light is not a limit yet, since electrons cannot travel at the speed of light in silicon; they actually move much slower than that. Also, it is the impulse speed that puts the hard cap on the speed offered by a material. Just as sound travels faster in water than in air, electric impulses travel faster in, for example, graphene than in silicon. This leads back to raw materials. Graphene is great as far as its electrical properties go - it would make a much better material to build CPUs from - but unfortunately it is very hard to produce in large quantities.