Nuclear fusion is a process that releases energy by forcing the nuclei of two light atoms to merge into one. Scientists believe a fusion-based power plant could supply almost limitless energy while creating very little waste.
The problem: No one yet knows how to reliably create and confine the superhot plasma needed to trigger fusion.
“These problems all involve complex underlying equations,” Bhattacharjee says. “We need exascale simulations in order to be able to make predictions in these systems where there’s a lot going on.”
“Reaching exascale means achieving over 10x improvement in computing capacity, without increasing power consumption, in less than five years.”
Building an exascale computer isn’t simply a matter of adding more processing cores to a system, the classic path to more speed. At a certain point, computers stop being able to move data from memory to processors, and from processor to processor, fast enough to take advantage of the raw power.
One of the biggest challenges: moving data around a supercomputer uses a lot of power and generates enormous heat. An exascale system built with today’s technology would consume about 650 megawatts of power—or a little less than a small nuclear power plant generates.
Memory and processors together
Each processor in a computer requires a dedicated chunk of memory, called its working memory, to perform calculations. The communication bandwidth between the processor and its working memory has a big impact on overall system speed. Today’s systems generally house memory in components called DIMMs, or dual in-line memory modules, which connect to the processors via copper wiring in the circuit board. Exascale systems will need to move the vast majority of working memory far closer to the computing elements, in an arrangement called “co-packaged memory,” where the fastest memory tier is built into the physical package that also contains the processor chip.
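To see why that bandwidth matters, consider the classic STREAM “triad” benchmark, which measures how fast data can actually move between memory and processor. Below is an illustrative Python/NumPy sketch of the idea; NumPy’s temporary arrays add extra traffic, so the number it reports is only approximate:

```python
import time
import numpy as np

N = 20_000_000                  # ~160 MB per float64 array, far larger than any cache

a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

start = time.perf_counter()
np.add(b, scalar * c, out=a)    # STREAM "triad": a = b + scalar * c
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8         # three float64 arrays cross the memory bus
print(f"Effective bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

On most machines the result is a small fraction of what the processor’s arithmetic units could consume: the computation is limited by how fast bytes move, not by how fast the chip can add and multiply.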
The HPE approach is straightforward: Co-packaged components mean data travels only a short distance between processor and memory, rather than moving a longer distance between separated components. This silicon-connected memory replaces DIMMs and also eliminates the overhead of attaching routing information to the data to notify the system where that data is headed.
4 places data movement slows down today's computers
Silicon traffic cops
Data moves between processors and other components as they coordinate their work on big calculations. Inside a large-scale supercomputer this movement is directed by fabric routers, chips that act like traffic cops for data. Without efficient routing, data can be sent on a circuitous path or collide with other data trying to use the same path. Routers are also responsible for tolerating network failures by dynamically adjusting paths to go around faults.
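As an illustration of what a fabric router’s job amounts to, the hypothetical Python sketch below models a tiny six-node fabric and finds a shortest path with a breadth-first search, automatically detouring when a link is marked as failed. Real routers do this in silicon at nanosecond timescales, but the underlying idea is the same:

```python
from collections import deque

# A toy fabric: nodes are routers, edges are links (a 2x3 mesh here).
links = {
    "A": ["B", "D"], "B": ["A", "C", "E"], "C": ["B", "F"],
    "D": ["A", "E"], "E": ["B", "D", "F"], "F": ["C", "E"],
}

def route(src, dst, failed_links=frozenset()):
    """Breadth-first search for a shortest path that avoids failed links."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links[node]:
            if nxt not in visited and frozenset((node, nxt)) not in failed_links:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # destination unreachable

print(route("A", "F"))                                        # ['A', 'B', 'C', 'F']
print(route("A", "F", failed_links={frozenset(("B", "C"))}))  # detours: ['A', 'B', 'E', 'F']
```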
The inner topology
Routers are part of a broader data transmission fabric inside the system. The pattern in which these connections are laid out is called a topology, and the topology determines how efficiently data inside a system reaches its destination.
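To make that concrete, the sketch below uses the open-source networkx library to compare two toy topologies with the same number of routers: a simple ring and a six-dimensional hypercube. Neither is what a production exascale fabric actually uses; the point is how dramatically the wiring pattern changes the number of hops a message must take:

```python
import networkx as nx

ring = nx.cycle_graph(64)       # 64 routers, each linked to 2 neighbors
cube = nx.hypercube_graph(6)    # 2^6 = 64 routers, each linked to 6 neighbors

for name, g in [("ring", ring), ("hypercube", cube)]:
    print(f"{name:9s} worst-case hops: {nx.diameter(g):2d}  "
          f"average hops: {nx.average_shortest_path_length(g):.2f}")
```

The ring needs up to 32 hops to deliver a message between 64 routers; the hypercube never needs more than 6. The price is more links per router, which is exactly the kind of trade-off topology designers weigh.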
Optical components and fabric
Copper has served faithfully as the conductor of data—transmitted as an electrical signal—since the 1940s. At exascale computing speeds, though, copper’s shortcomings move to the forefront. The more data, the more electricity required to move it.
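A rough back-of-envelope calculation shows the scale of the issue. The per-bit energy figures below are assumed placeholders chosen for illustration, not measured values for any real link:

```python
# Back-of-envelope: energy to move one exabyte electrically vs. optically.
# Both pJ/bit figures are illustrative assumptions, not measured values.
PJ_PER_BIT_COPPER = 10.0    # assumed: long-reach electrical link
PJ_PER_BIT_OPTICAL = 1.0    # assumed: integrated photonic link

exabyte_bits = 8 * 10**18   # one exabyte, expressed in bits

for name, pj in [("copper", PJ_PER_BIT_COPPER), ("optical", PJ_PER_BIT_OPTICAL)]:
    joules = exabyte_bits * pj * 1e-12
    print(f"Moving 1 EB over {name}: {joules / 3.6e6:,.1f} kWh")
```

Whatever the exact figures, the relationship is linear: every extra byte moved costs energy, and an exascale system moves a staggering number of bytes.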
“In earlier times, there were two pillars of scientific discovery: experiment and theory. Now computer simulation is becoming the third pillar.”
In current optical architectures, each transmission cable needs its own VCSEL (vertical-cavity surface-emitting laser), and the system needs a huge number of cables to achieve a resilient optical fabric of interconnected components.
Wait, there’s more
Exascale computing, and data movement in particular, presents plenty of other challenges. For example, sometimes the most efficient solution is to not move data at all. Instead of copying a big block of data from one area of memory to be closer to a distant processor, an exascale system might sometimes just want to tell that processor where the data is located—giving it the right address—and let the processor order up a specific calculation based on that data without moving the data itself.
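Here is a minimal Python sketch of that idea. The memory pool, the address, and the remote_sum function are all hypothetical stand-ins: in a real system the call would be a message across the fabric, but the payload would still be just an address and an operation rather than the data itself:

```python
import numpy as np

# Toy "fabric-attached memory": a pool addressable by every processor.
memory_pool = {0x1000: np.random.rand(10_000_000)}  # ~80 MB block

def remote_sum(address):
    """Run the reduction where the data lives; ship back a single number."""
    return float(np.sum(memory_pool[address]))

# Instead of copying ~80 MB to the caller, send an 8-byte address plus
# an operation name, and receive an 8-byte result.
print(remote_sum(0x1000))
```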
A related challenge is resilience: in a system with many thousands of processors, occasional component failures are inevitable. To address this problem, Labs has designed a huge pool of energy-efficient nonvolatile memory that can hold frequent snapshots of work in progress across the whole system—a process known as “checkpointing.” If a processor fails, the system can quickly restore the last snapshot and proceed, rather than having to restart a complex computation from the beginning. For this purpose, HPE is taking advantage of many of the technologies that were prototyped in The Machine, a new system based on the Memory-Driven Computing concept.
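The pattern itself is simple. Below is a minimal Python sketch of checkpoint-and-restart that uses an ordinary file where the real system would use fabric-attached nonvolatile memory; the file name, loop, and state layout are all hypothetical:

```python
import pickle

CHECKPOINT = "simulation.ckpt"  # stand-in for a slot in nonvolatile memory

def save_checkpoint(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint():
    with open(CHECKPOINT, "rb") as f:
        return pickle.load(f)

state = {"step": 0, "value": 1.0}
try:
    state = load_checkpoint()          # resume from the last snapshot, if any
except FileNotFoundError:
    pass                               # first run: start from scratch

while state["step"] < 1_000_000:
    state["value"] *= 1.0000001        # stand-in for one unit of real work
    state["step"] += 1
    if state["step"] % 100_000 == 0:   # periodic snapshot of work in progress
        save_checkpoint(state)
```

Kill the process at any point and rerun it: it picks up from the last snapshot instead of step zero, which is exactly the behavior an exascale system needs when a processor dies mid-computation.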
“Simulation is the bridge between what we know today and what we can predict in the future. And simulation is what exascale computing is all about.”