In this edition of Voices of the Industry, Eric Jacobs, Chief Revenue Officer of Aligned, shares insights on how data centers can overcome the primary obstacle to densification – cooling.
Two decades ago, graphics processing units, or GPUs, were used primarily to accelerate the rendering of 3D graphics for applications such as video gaming. However, it wasn’t long before computer scientists realized that GPUs showed immense promise in solving some of the world’s most intractable computing problems. Thereafter, developers began to tap the power of GPUs to dramatically accelerate workloads in artificial intelligence (AI), machine learning (ML), and deep learning, as well as high performance computing (HPC).
A central processing unit (CPU) processes computations sequentially, not in parallel. Whereas a CPU may have as many as 16 or 17 processing cores, a GPU might have thousands, all running simultaneously and performing parallel operations on multiple sets of data, with each core focused on making an efficient calculation.
GPUs may be integrated into a computer’s CPU or offered as a discrete hardware unit. In fact, many of today’s deep learning technologies rely on GPUs working in conjunction with CPUs. While GPUs deliver an extraordinary amount of computational capability and acceleration of workloads, that added processing power comes at the cost of additional energy consumption and heat creation. Working together, CPUs and GPUs consume significantly more power and generate far more heat than most data centers were designed to accommodate.
A single GPU-accelerated server can produce more than 3 kW of heat, and depending on the application, some servers are packed with as many as 16 GPUs. Legacy data centers offer only one type of solution, 10 kW per rack, and therein lies the GPU density crisis: a challenge that extends from supporting densely packed GPU-accelerated servers to mission-critical testing environments, R&D laboratories, and the simulation and emulation testing platforms that are indispensable to the development of HPC, AI/ML and deep learning applications.
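The mismatch is easy to see with some back-of-the-envelope arithmetic. The short Python sketch below uses the two figures cited above (roughly 3 kW per server and 10 kW per rack); the function names and the count of ten servers in a fully populated rack are illustrative assumptions, not figures from the article or from any vendor.

```python
import math

SERVER_HEAT_KW = 3.0   # from the article: one GPU-accelerated server can exceed 3 kW
RACK_BUDGET_KW = 10.0  # from the article: the typical legacy allocation per rack

# Hypothetical: how many such servers would physically fit in one rack.
ASSUMED_SERVERS_PER_FULL_RACK = 10


def servers_per_rack(rack_budget_kw: float, server_heat_kw: float) -> int:
    """Whole servers that fit within a rack's power and cooling budget."""
    return math.floor(rack_budget_kw / server_heat_kw)


def rack_load_kw(server_count: int, server_heat_kw: float) -> float:
    """Heat load of a rack holding `server_count` identical servers."""
    return server_count * server_heat_kw


supported = servers_per_rack(RACK_BUDGET_KW, SERVER_HEAT_KW)
full_rack_kw = rack_load_kw(ASSUMED_SERVERS_PER_FULL_RACK, SERVER_HEAT_KW)

print(f"Servers a {RACK_BUDGET_KW:.0f} kW rack can support: {supported}")
print(f"Load of a fully populated rack (assumed): {full_rack_kw:.0f} kW")
```

Under these assumptions, a 10 kW rack supports only three such servers, while the same rack filled to its assumed physical capacity would need about three times its budget.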
AI: Changing the World and the Legacy Data Center
Whether a developer is designing a new System on Chip (SoC) device, autonomous vehicle technology, or deep learning-based image recognition platform for early cancer detection, these applications are first tested by emulators in laboratory environments that run a tremendous number of computations to verify, validate, and debug the design.
To succeed in the verification task, an emulator must provide a wide range of capacity and scalability to accommodate the next generations of designs. Moreover, as designs scale up, the emulation platform needs to scale upwards in capacity without compromising its performance, which is necessary to execute AI and ML frameworks and benchmarks. Emulation platforms, for example, have become essential to the design of next generation GPUs for the gaming market, as well as SoC units for the mobile computing and automotive markets.
Mission-critical testing and R&D laboratories that deploy these types of testing processes are typically high-density environments. And even as advances in AI and GPUs transform yet another corner of our world every day, they are also creating very steep technical challenges among the racks and aisles of the data center.
“Traditional data centers are over-challenged because many are configured at a static density. That is, the space layout, power and cooling systems are designed to support a certain density per square foot.” Eric Jacobs, Chief Revenue Officer of Aligned
Altering the density the footprint can support requires reconfiguring (and usually horizontally extending) the layout, including the power and cooling systems. While most data centers can support high-density servers, the only ways to ensure proper cooling are to over-provision, half-fill the racks, or spread them apart, all of which create stranded capacity, reduce efficiency and increase total cost of ownership (TCO). For these reasons, the foundational challenge of increasing power density within the data center is solving for cooling.
Solving the High-Density Dilemma
To solve for this challenge, data center infrastructure must be more adaptive, supporting density changes (both unknown and planned) with a flexible, non-exotic cooling solution and efficient cost structure.
To successfully address rising data center densification, it is key to understand that the data center cooling problem is actually a heat removal problem. Rather than relying on the traditional solution of pushing cold air into the data hall, data centers can now leverage advanced cooling technologies that capture and remove heat at its source. This approach takes far less energy than making outside air cold and blowing it into the data center to mix with the hot air. Such a cooling system should be able to accommodate both new data centers and retrofit facilities, improve the efficiency of existing infrastructure, and enable companies and laboratory environments to expand on demand.
For starters, this calls for a mechanical, electrical and plumbing (MEP) design that decouples space from power and can support density increases vertically within the rack or horizontally with additional rack positions, by supplementing the data hall with additional cooling units (in this case fan arrays) without impacting live load. For more effective use of data center space, infrastructure should also support higher watts per square foot of capacity without requiring expensive and space-hogging Computer Room Air Handlers (CRAHs). For example, Aligned’s cooling technology provides 350 kW of heat rejection in just four feet, compared to 38 feet of traditional CRAH units.
The key is utilizing tight airflow controls to maintain enough negative pressure in the hot aisles that both high-density and mixed-density racks can be cooled efficiently. A cooling system that dynamically adapts to IT loads and supports high, mixed, and variable rack densities in the same row, ranging anywhere from 1 to 50 kW, can provide a hyper-scalable and ultra-efficient environment.
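To put the footprint comparison above in per-foot terms, the sketch below divides the quoted 350 kW by each of the two lengths. It assumes that "feet" means linear feet of floor space and that both systems reject the same 350 kW; the function and variable names are illustrative.

```python
HEAT_REJECTION_KW = 350.0  # from the article
ALIGNED_FEET = 4.0         # from the article (assumed to be linear feet)
CRAH_FEET = 38.0           # from the article (assumed to be linear feet)


def kw_per_foot(heat_rejection_kw: float, feet: float) -> float:
    """Heat rejection delivered per foot of cooling equipment."""
    if feet <= 0:
        raise ValueError("feet must be positive")
    return heat_rejection_kw / feet


aligned_density = kw_per_foot(HEAT_REJECTION_KW, ALIGNED_FEET)
crah_density = kw_per_foot(HEAT_REJECTION_KW, CRAH_FEET)
footprint_ratio = CRAH_FEET / ALIGNED_FEET

print(f"Four-foot system: {aligned_density:.1f} kW per foot")
print(f"Traditional CRAH: {crah_density:.1f} kW per foot")
print(f"Footprint ratio:  {footprint_ratio:.1f}x")
```

On those assumptions the quoted figures work out to 87.5 kW per foot versus roughly 9.2 kW per foot, a 9.5-fold difference in footprint for the same heat rejection.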
Being able to effectively cool these high-density environments is a critical element. Just as critical, however, is the ability to efficiently expand them as your business and technology demand it. The way to address this is with a modular cooling system: as density increases, additional cooling units are added to increase cooling capacity on demand without impacting live load. With most legacy data center operators, a customer would need to build or purchase more data hall space if more power capacity were needed. With this alternative approach, a lab environment can densify the same footprint over time without stranding unused space or capacity. This would allow a testing and R&D laboratory environment to start at a lower density, perhaps 100 W per square foot, and in time densify within the same footprint to, for example, 400 W and upward to 1,000 W per square foot.
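A rough sketch of what that densification path means for cooling capacity follows. The three density steps come from the paragraph above; the 10,000-square-foot hall is hypothetical, and treating the 350 kW figure quoted earlier as the capacity of a single modular cooling unit is an assumption made purely for illustration. Redundancy and power losses are ignored.

```python
import math

HALL_AREA_SQFT = 10_000   # hypothetical data hall size
UNIT_CAPACITY_KW = 350.0  # assumption: one modular cooling unit rejects 350 kW
DENSITY_STEPS_W_PER_SQFT = [100, 400, 1000]  # from the article


def hall_load_kw(density_w_per_sqft: float, area_sqft: float) -> float:
    """Total IT heat load of the hall, in kW, at a given density."""
    return density_w_per_sqft * area_sqft / 1000.0


def cooling_units_needed(load_kw: float, unit_capacity_kw: float) -> int:
    """Whole cooling units required to reject `load_kw` (no redundancy)."""
    return math.ceil(load_kw / unit_capacity_kw)


# One (density, load, units) tuple per densification step.
plan = []
for density in DENSITY_STEPS_W_PER_SQFT:
    load = hall_load_kw(density, HALL_AREA_SQFT)
    plan.append((density, load, cooling_units_needed(load, UNIT_CAPACITY_KW)))

for density, load, units in plan:
    print(f"{density:>5} W/sq ft -> {load:>8.0f} kW -> {units} cooling units")
```

In this hypothetical hall the footprint never changes; only the number of cooling units grows, from 3 to 12 to 29, as density rises from 100 W to 1,000 W per square foot.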
This type of cooling design is also more efficient than traditional systems, delivering lower energy costs, utilizing far less (and sometimes zero) water, and lowering power usage effectiveness (PUE) ratings to industry-leading figures.
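For readers less familiar with the metric, PUE is the ratio of total facility energy to the energy consumed by IT equipment alone, so a value closer to 1.0 means less overhead for cooling and power distribution. The sketch below illustrates the calculation; every energy figure in it is hypothetical and does not describe any particular facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical figures for the same IT load under two cooling designs.
IT_ENERGY_KWH = 1000.0
LEGACY_OVERHEAD_KWH = 800.0     # hypothetical cooling and power overhead
EFFICIENT_OVERHEAD_KWH = 200.0  # hypothetical overhead with less cooling energy

legacy_pue = pue(IT_ENERGY_KWH + LEGACY_OVERHEAD_KWH, IT_ENERGY_KWH)
efficient_pue = pue(IT_ENERGY_KWH + EFFICIENT_OVERHEAD_KWH, IT_ENERGY_KWH)

print(f"Legacy design PUE:    {legacy_pue:.2f}")
print(f"Efficient design PUE: {efficient_pue:.2f}")
```

Because the IT load sits in the denominator, any reduction in cooling energy lowers PUE directly, which is why heat removal efficiency shows up so clearly in this metric.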
While the GPU has been the workhorse of AI — that wondrous technology that is quietly helping to make our cities safer and smarter, crop yields bigger, and medical diagnoses faster and more accurate — in recent years, a new wave of technologists has set out to design a sui generis computer chip that is optimized specifically to unlock AI’s limitless potential. That work, too, will best be performed in a data center test laboratory that is energy efficient, scalable and sustainable.
Eric Jacobs is Chief Revenue Officer of Aligned. Contact Aligned to learn more about their cooling technology that provides 350 kW of heat rejection in just four feet, compared to 38 feet of traditional CRAH units.