Today I read that Graphcore, the AI chip maker from the UK, unveiled a new computer chip that packs a remarkable 60 billion transistors and almost 1,500 processing cores into a single chip. This Bristol-based “startup”, founded in 2016 and now valued at $2bn, is taking on Nvidia with a new chip designed specifically for running advanced AI algorithms.
Whatever the physical size of the chip itself, 60 billion transistors sounds like a lot. And it is. But what does this stunning transistor count mean in practice?
First of all, it’s worth remembering that while computing power has grown exponentially, the basic architecture of computer chips, such as central processing units (CPUs), hasn’t changed much during the past 65 years. In other words, the form, design, and implementation of chips have changed over time, but their fundamental operation remains almost unchanged. This will continue to be the case until we adopt a new processor architecture, for example through quantum processors.
A “traditional” processor is simply “the electronic circuitry within a computer that executes instructions that make up a computer program”, and traditional computing relies on zeros and ones. Transistors are essential in processors because they work as switches. A chip can contain hundreds of millions or even billions of transistors, each of which can be switched on or off individually. Since each transistor can be in one of two distinct states, it can represent one of two values: zero or one.
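As a toy illustration (plain Python, nothing chip-specific), each on/off switch corresponds to one bit, and a group of n bits can encode 2^n distinct values:

```python
# Each transistor acting as a switch stores one bit: off = 0, on = 1.
# A group of n such switches can represent 2**n distinct values.
def distinct_values(n_bits: int) -> int:
    return 2 ** n_bits

# An 8-bit register (8 switches) distinguishes 256 values:
print(distinct_values(8))   # 256

# The decimal number 42 as a pattern of 8 on/off switches:
print(format(42, "08b"))    # 00101010
```

This is why transistor count matters: every extra switch doubles the number of states a circuit can distinguish.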
The basic rule is that with more transistors, a processor can execute more complex instructions. That in turn brings several benefits, such as faster processing speeds and increased memory capacity.
Many are familiar with Moore’s law, usually heard in its simplified form: that processor speeds, or overall processing power for computers, will double every two years. In reality, Moore’s law is the observation that the number of transistors in a dense integrated circuit (IC, or a “chip”) doubles about every two years. Here’s what Wikipedia has to say about it:
The observation is named after Gordon Moore, the co-founder of Fairchild Semiconductor and CEO and co-founder of Intel, who in 1965 posited a doubling every year in the number of components per integrated circuit, and projected this rate of growth would continue for at least another decade. In 1975, looking forward to the next decade, he revised the forecast to doubling every two years, a compound annual growth rate (CAGR) of 41%. While Moore did not use empirical evidence in forecasting that the historical trend would continue, his prediction has held since 1975 and has since become known as a “law”.
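The 41% figure follows directly from the doubling period: if the count doubles every two years, then one year of growth is a factor of 2^(1/2). A quick sanity check:

```python
# Doubling every 2 years means each year multiplies the count by 2**(1/2),
# so the compound annual growth rate is 2**(1/2) - 1.
cagr = 2 ** (1 / 2) - 1
print(f"{cagr:.1%}")   # 41.4%

# Moore's original 1965 forecast, a doubling every single year,
# corresponds to a CAGR of 2**(1/1) - 1 = 100%.
print(2 ** (1 / 1) - 1)
```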
Let’s take a look at how the transistor count in processors has evolved up until 2019. Whereas chips in the 1970s only had a few thousand transistors, the 1 billion mark was hit in 2006 – and now we’re indeed packing 60 billion transistors into a chip.
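To see how closely that trajectory tracks Moore’s law, we can count the doublings between two anchor points. The starting chip here is my own choice, not from the article: the Intel 4004 from 1971, with roughly 2,300 transistors.

```python
import math

# Rough anchors: Intel 4004 (1971, ~2,300 transistors) vs. a
# 60-billion-transistor chip in 2020. Both counts are approximate.
start_count, start_year = 2_300, 1971
end_count, end_year = 60_000_000_000, 2020

doublings = math.log2(end_count / start_count)
years_per_doubling = (end_year - start_year) / doublings
print(f"{doublings:.1f} doublings, one every {years_per_doubling:.1f} years")
```

The result comes out to roughly one doubling every two years over five decades, which is remarkably close to Moore’s 1975 forecast.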
It seems clear that at some point we’ll hit physical limitations when it comes to transistor scaling, despite our best efforts to come up with new methods for packing stuff ever more densely onto a silicon wafer. In fact, processor architects reported as early as ten years ago that semiconductor advancement had slowed industry-wide below the pace predicted by Moore’s law. Brian Krzanich, former CEO of Intel, also noted that “our cadence today is closer to two and a half years than two”.
So, what can you do with 60 billion transistors packed into the new Graphcore chip, which they creatively call an “intelligence processing unit” (IPU)? The short answer is: you can run even more advanced deep learning algorithms, and faster than before. According to the article about Graphcore:
The second generation of AI chips are designed specifically to handle the very large machine-learning models that are being used for breakthroughs in image processing, natural language processing, and other fields. For instance, San Francisco AI research company OpenAI's latest language model, called GPT-3, takes in 175 billion different variables.
Graphcore also said that in benchmark tests, its new chips performed up to 16 times faster than those from Nvidia, whose graphics processing units (GPUs) are widely used in AI and machine learning (ML) solutions. GPUs turned out to be excellent for training deep learning models, and much of the progress we’ve made in AI/ML during the past 10 years has come thanks to cheaper and faster chips from Nvidia and others.
It’ll be exciting to see what kind of ML magic people will be able to do with these new Graphcore chips. One also has to wonder how long it will take before we’re able to pack 120 billion transistors into a chip. It might be that we’ll have to wait longer than those two and a half years mentioned by Krzanich.