Computer chips are a hot commodity. Nvidia is now one of the most valuable companies in the world, and the Taiwanese manufacturer of Nvidia’s chips, TSMC, has been called a geopolitical force. So it should come as no surprise that a growing number of hardware startups and established companies are trying to take a jewel or two from the crown.
Of these, Cerebras is one of the strangest. The company makes tortilla-sized computer chips packed with nearly a million processors, each connected to its own local memory. The processors are small but lightning fast because they don’t have to shuttle information to and from shared memory located far away. And the connections between processors, which in most supercomputers require linking individual chips across room-sized machines, are fast too.
This means the chips excel at certain tasks. Two recent preprint studies, one simulating molecules and the other training and running large language models, show that the wafer-scale advantage can be formidable. The chips outperformed Frontier, the world’s top supercomputer, in the former. They also showed that an AI model could run on a third of the usual energy without sacrificing performance.
Molecular Matrix
The materials from which we make things are a fundamental driver of technology. They open up new possibilities by overcoming old limits of strength or heat resistance. Take fusion energy. If scientists can prove it will work, the technology promises to be a new, clean source of energy. But liberating this energy requires materials that can withstand extreme conditions.
Scientists are using supercomputers to model how the metals that line fusion reactors might deal with the heat. These simulations zoom in on individual atoms and use the laws of physics to govern their movements and interactions on a large scale. Today’s supercomputers can model materials containing billions or even trillions of atoms with high accuracy.
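To make the mechanics concrete, here is a minimal, purely illustrative sketch of how such an atomistic simulation advances in time: pairwise forces between atoms are computed from a potential, then positions and velocities are updated in small steps. The Lennard-Jones potential, lattice setup, and time step below are generic textbook choices, not parameters from the study.

```python
import numpy as np

# Toy setup (illustrative only): 64 atoms on a cubic lattice, reduced units.
EPS, SIGMA = 1.0, 1.0   # Lennard-Jones energy and length scales
DT, STEPS, MASS = 1e-3, 100, 1.0

grid = np.arange(4) * 1.5 * SIGMA
pos = np.array(np.meshgrid(grid, grid, grid)).reshape(3, -1).T.astype(float)
vel = np.zeros_like(pos)

def forces(pos):
    """Pairwise Lennard-Jones forces between all atoms."""
    diff = pos[:, None, :] - pos[None, :, :]           # displacement vectors (N, N, 3)
    r2 = np.sum(diff**2, axis=-1) + np.eye(len(pos))   # pad diagonal to avoid divide-by-zero
    inv6 = (SIGMA**2 / r2) ** 3
    fmag = 24 * EPS * (2 * inv6**2 - inv6) / r2        # pair force magnitude divided by r
    np.fill_diagonal(fmag, 0.0)
    return np.sum(fmag[:, :, None] * diff, axis=1)

# Velocity-Verlet integration: the "laws of physics" stepping the atoms forward.
f = forces(pos)
for _ in range(STEPS):
    vel += 0.5 * DT * f / MASS
    pos += DT * vel
    f = forces(pos)
    vel += 0.5 * DT * f / MASS

print("kinetic energy after", STEPS, "steps:", 0.5 * MASS * np.sum(vel**2))
```

The per-step force calculation between nearby atoms is exactly the kind of tightly coupled, communication-heavy loop that limits how fast conventional supercomputers can run these simulations.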
But while the scope and quality of these simulations have advanced greatly over the years, their speed has stalled. Because of the way supercomputers are designed, they can only model so many interactions per second, and making the machines bigger only compounds the problem. This means the total simulated timespan of a molecular simulation has a hard practical limit.
Cerebras worked with Sandia, Lawrence Livermore, and Los Alamos National Laboratories to see if a wafer-scale chip could speed things up.
The team assigned one simulated atom to each processor. So they could quickly exchange information about their position, motion, and energy, processors modeling atoms that would be physically close in the real world were neighbors on the chip too. Depending on their properties at any given moment, atoms could hop between processors as they moved.
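As a rough illustration of that layout (not the actual chip geometry or the study’s code), the sketch below maps each atom to a cell in a 2D grid of processors based on its position, lists which neighboring processors its host must exchange data with, and shows an atom hopping to a new processor when it moves. The grid size, domain size, and mapping function are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a 100 x 100 grid of processors covering a square
# simulation domain, one atom per processor.
GRID = 100           # processors per side (illustrative, not the real chip layout)
DOMAIN = 50.0        # simulation box edge length, arbitrary units

def owner(xy):
    """Map an atom's (x, y) position to the processor cell that models it."""
    ij = np.floor(xy / DOMAIN * GRID).astype(int)
    return tuple(np.clip(ij, 0, GRID - 1))

def neighbors(cell):
    """Processor cells an atom's host exchanges position/energy data with."""
    i, j = cell
    return [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0) and 0 <= i + di < GRID and 0 <= j + dj < GRID]

# An atom starts on one processor...
atom = np.array([10.2, 30.7])
cell = owner(atom)

# ...and after moving, it may "hop" to an adjacent processor,
# which takes over modeling it.
atom += np.array([0.4, 0.0])
new_cell = owner(atom)
if new_cell != cell:
    print(f"atom migrated from processor {cell} to {new_cell}")
print("neighboring processors exchanging data:", len(neighbors(new_cell)))
```

Because each processor only ever talks to its immediate neighbors on the wafer, the communication stays local no matter how many atoms are simulated.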
The team modeled 800,000 atoms in three materials – copper, tungsten, and tantalum – that could be useful in fusion reactors. The results were striking: the tantalum simulations ran 179 times faster than on the Frontier supercomputer. That means the chip could compress a year’s worth of supercomputer work into a matter of days and extend the achievable length of a simulation from microseconds to milliseconds. It was also far more efficient at the task.
“I have been working in atomistic simulation of materials for more than 20 years. During that time, I have been involved in massive improvements in both the size and accuracy of the simulations. However, despite all this, we were not able to increase the actual simulation rate. The wall-clock time required to run simulations has barely budged in the past 15 years,” Sandia National Laboratories’ Aidan Thompson said in a statement. “With the Cerebras Wafer-Scale Engine, we can suddenly go hypersonic.”
Although the chip increases modeling speed, it can’t compete on scale. The number of simulated atoms is limited by the number of processors on the chip. Next steps include assigning multiple atoms to each processor and using new wafer-scale supercomputers that link 64 Cerebras systems together. The team estimates such machines could model up to 40 million tantalum atoms at speeds similar to those reported in the study.
AI light
While simulating the physical world may be a core competency for wafer-scale chips, they’ve always been focused on artificial intelligence. The latest AI models have grown exponentially, meaning the energy and cost of training and running them have exploded. Wafer-scale chips may be able to make AI more efficient.
In a separate study, researchers from Neural Magic and Cerebras worked to shrink Meta’s 7-billion-parameter Llama language model. To do this, they created what’s called a “sparse” AI model, in which many of the algorithm’s parameters are set to zero. In theory, these can be skipped, making the algorithm smaller, faster, and more efficient. But today’s leading AI chips, graphics processing units (GPUs), read algorithms in chunks, meaning they can’t skip every zeroed-out parameter.
Because memory is distributed across the wafer-scale chip, it can read each parameter individually and skip zeros wherever they occur. Even so, extremely sparse models usually don’t perform as well as dense ones. But here the team found a way to recover the lost performance with a little extra training. Their model maintained performance even with 70 percent of its parameters zeroed out. Running on the Cerebras chip, it drew a paltry 30 percent of the power and ran in a third of the time of the full-size model.
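The two ideas at play here, zeroing out most of a layer’s weights and then skipping those zeros at run time, can be sketched generically as below. This is not the Neural Magic and Cerebras method or their actual sparsity pattern; the layer size and the NumPy-based zero-skipping are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy dense weight matrix standing in for one layer of a language model.
W = rng.normal(size=(512, 512))
x = rng.normal(size=512)

# "Sparsify": zero out the 70 percent of weights with the smallest magnitude,
# mirroring the 70 percent sparsity level reported above.
threshold = np.quantile(np.abs(W), 0.70)
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

# A dense pass still multiplies every entry, zeros included.
dense_flops = W.size

# Hardware that can skip zeros only touches the surviving weights.
rows, cols = np.nonzero(W_sparse)
sparse_flops = len(rows)

# Zero-skipping matrix-vector product over the nonzero entries only.
y = np.zeros(W.shape[0])
np.add.at(y, rows, W_sparse[rows, cols] * x[cols])

print(f"multiply-adds: dense {dense_flops}, zero-skipping {sparse_flops}")
print("results match:", np.allclose(y, W_sparse @ x))
```

The point of the comparison is that dense hardware still pays for every multiply, zeros included, while hardware that can address individual weights only pays for the roughly 30 percent that survive pruning.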
Will Wafer-Scale Win?
While all of this is impressive, Cerebras is still niche. Nvidia’s more conventional chips remain firmly in control of the market. At least for now, that doesn’t seem to be changing. Companies have invested heavily in the expertise and infrastructure built around Nvidia.
But wafer-scale chips may continue to prove themselves in niche but crucial research applications. And the approach could become more common overall. The ability to make wafer-scale chips is only now being perfected. In a sign of what’s to come for the field as a whole, the world’s largest chipmaker, TSMC, recently said it is developing its wafer-scale capabilities. This could make the chips more common and more capable.
For their part, the team behind the molecular modeling work says the wafer-scale effect could be more dramatic. Like GPUs before them, adding wafer-scale chips to the supercomputing mix could yield formidable machines in the future.
“Future work will focus on extending the strong scaling efficiency demonstrated here to device-level deployments, potentially leading to an even bigger paradigm shift in the top 500 supercomputers than the one introduced by the GPU revolution,” the team wrote in their paper.
Image credit: Cerebras