Artificial Intelligence (AI) is spreading into nearly every industry, enhancing our connectivity and convenience like never before, and this surge has driven an unprecedented demand for high-performance computing solutions. As a result, data centers, the backbone of these technological advancements, are facing unique challenges in adapting their infrastructure to handle these demanding workloads.
AI applications, spanning machine learning (ML) and deep learning algorithms, demand extensive computational power to process vast amounts of data and perform complex tasks. This computational intensity translates into significant heat generation within data centers, which gives advanced cooling technologies a pivotal role in supporting this evolution. Without the right thermal management in place, data centers won’t be able to deliver the computing power necessary to support the AI-driven digital transformation that we are witnessing today.
Rethinking Traditional Systems
Air cooling, once a standard for managing data center temperatures, is increasingly seen as insufficient in the face of modern high-density workload demands. Traditional air-cooling systems, while effective for earlier, less intensive workloads, can struggle to keep up with the heat generated by high-performance computing and AI applications. As servers and other equipment become more powerful and densely packed, the inefficiencies of air cooling, such as uneven temperature distribution and significant energy consumption, are becoming more pronounced.
This has led to a growing shift toward more advanced cooling solutions, like liquid cooling, which offer better thermal management and energy efficiency to support the next generation of data center infrastructure.
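One way to see why liquid becomes attractive at high densities is the basic heat-transport relation Q = ρ · V̇ · c_p · ΔT, which gives the volume of coolant that must flow to carry away a given heat load. The sketch below is illustrative only: the 50 kW rack load and the 10 K coolant temperature rise are assumed figures rather than values from any particular facility, and the fluid properties are approximate room-temperature textbook values for air and water (dielectric immersion fluids will differ).

```python
# Approximate room-temperature fluid properties (textbook values).
AIR = {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0}
WATER = {"density_kg_m3": 998.0, "cp_j_kg_k": 4186.0}


def required_flow_m3_s(heat_w, fluid, delta_t_k):
    """Volumetric flow needed to absorb heat_w with a delta_t_k temperature rise.

    Rearranges Q = rho * V_dot * cp * dT to solve for V_dot.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_kg_k"] * delta_t_k)


# Assumed, illustrative inputs: a 50 kW rack and a 10 K coolant temperature rise.
RACK_HEAT_W = 50_000.0
DELTA_T_K = 10.0

air_flow = required_flow_m3_s(RACK_HEAT_W, AIR, DELTA_T_K)
water_flow = required_flow_m3_s(RACK_HEAT_W, WATER, DELTA_T_K)

print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Water: {water_flow * 1000:.2f} L/s")
print(f"Air needs roughly {air_flow / water_flow:,.0f}x the volume flow of water")
```

Under these assumptions, air has to move several thousand times more volume than water to carry the same heat, which is one underlying reason fans and air handlers struggle as rack densities climb.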
It’s important to recognize that there is no “one size fits all” when it comes to cooling, so data center providers should design facilities to accommodate multiple types of cooling technologies within the same environment. And, whilst liquid cooling has emerged as the preeminent solution for addressing the thermal management challenges posed by AI workloads, air cooling systems will continue to be part of the data center infrastructure for the foreseeable future.
Liquid Cooling Techniques
By directly engaging with heat-producing components, liquid cooling systems offer superior efficiency and performance compared to their air-based counterparts. This approach not only enhances cooling effectiveness but also significantly reduces energy consumption and operational costs. When looking at liquid cooling, operators can consider:
Immersion Cooling: Immersion cooling involves submerging specially designed IT hardware, such as servers and graphics processing units (GPUs), in a dielectric fluid, such as mineral oil or synthetic coolant. The fluid absorbs heat directly from the components, providing efficient and direct cooling without the need for traditional air-cooled systems. This method can significantly enhance energy efficiency and reduce running costs, making it well suited to AI workloads that produce substantial heat.
Direct-to-Chip Cooling: Direct-to-chip cooling, typically implemented with cold plates and sometimes grouped with microfluidic cooling, delivers coolant directly to the heat-generating components of servers, such as central processing units (CPUs) and GPUs. This targeted approach maximizes heat transfer, removing heat at the source and improving overall performance and reliability. By directly cooling critical components, the direct-to-chip method helps to ensure that AI applications operate optimally, minimizing the risk of thermal throttling and hardware failures. This technology is essential for data centers managing high-density AI workloads.
A mix-and-match approach should be considered for thermal management, combining different types of solutions in order to:
- Optimize Efficiency: Each cooling technology has unique strengths and limitations, and different types of liquid cooling can be deployed in the same data center, or even the same hall. By combining immersion cooling, direct-to-chip cooling and/or air cooling, providers can leverage the benefits of each method to achieve optimal cooling efficiency.
- Address Varied Cooling Needs: AI workloads often consist of diverse hardware configurations with varying heat dissipation characteristics. A mix-and-match approach allows providers to customize cooling solutions based on specific workload demands.
- Enhance Scalability and Adaptability: As AI workloads evolve and data center requirements change, a flexible cooling infrastructure that supports scalability and adaptability becomes essential. Integrating multiple cooling technologies provides scalability options and facilitates future upgrades without compromising cooling performance.
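To make the mix-and-match idea concrete, the sketch below assigns a cooling method to each rack in a hall based on its power density. Everything in it is hypothetical: the thresholds, rack names and power figures are placeholders chosen purely for illustration, not industry guidance, and a real design would also weigh hardware compatibility, floor layout and cost.

```python
from dataclasses import dataclass

# Hypothetical thresholds in kW per rack, for illustration only.
AIR_MAX_KW = 20.0
DIRECT_TO_CHIP_MAX_KW = 80.0


@dataclass
class Rack:
    name: str
    power_kw: float


def choose_cooling(rack):
    """Pick a cooling method from rack power density using the assumed thresholds."""
    if rack.power_kw <= AIR_MAX_KW:
        return "air"
    if rack.power_kw <= DIRECT_TO_CHIP_MAX_KW:
        return "direct-to-chip"
    return "immersion"


# Hypothetical racks sharing one data hall.
racks = [
    Rack("storage-01", 8.0),
    Rack("inference-01", 45.0),
    Rack("training-01", 120.0),
]

plan = {rack.name: choose_cooling(rack) for rack in racks}

for name, method in plan.items():
    print(f"{name}: {method}")
```

Keeping the thresholds as named constants means the plan can be revisited as workloads and hardware change, which mirrors the scalability point above.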
Considerations
With innovation come inevitable challenges. One of the primary hurdles is the initial investment required to implement this advanced infrastructure. While liquid cooling offers substantial long-term benefits in terms of efficiency and performance, the upfront costs for installation and setup can be significant. Overcoming this barrier often involves careful consideration of the return on investment (ROI) and the potential for reduced operational expenses. Despite these challenges, the continual advancements in liquid cooling technology are driving its integration into modern data centers, promising enhanced thermal management and greater sustainability in the face of growing computational demands.
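A first-pass way to frame the ROI question is a simple payback period: the additional upfront cost of liquid cooling divided by the yearly operating savings it is expected to deliver. The sketch below uses made-up figures in unspecified currency units; it ignores financing costs, maintenance differences and changes in energy prices, so it is a starting point rather than a full business case.

```python
from typing import Optional


def simple_payback_years(extra_capex: float, annual_opex_savings: float) -> Optional[float]:
    """Years needed for operating savings to repay the extra upfront cost.

    Returns None when there are no savings, since the cost is never repaid.
    """
    if annual_opex_savings <= 0:
        return None
    return extra_capex / annual_opex_savings


# Hypothetical figures, for illustration only.
EXTRA_CAPEX = 1_200_000.0      # added cost of liquid cooling versus air
ANNUAL_SAVINGS = 300_000.0     # assumed yearly reduction in operating expenses

payback = simple_payback_years(EXTRA_CAPEX, ANNUAL_SAVINGS)
print(f"Simple payback: {payback:.1f} years")
```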
Another challenge is the complexity involved in designing and integrating liquid cooling systems. Unlike traditional air cooling, liquid cooling requires precise engineering to ensure that the system is both effective and reliable. The complexity increases with the need for custom solutions that fit specific data center layouts and equipment configurations. Scalability is also a crucial factor; as data centers expand and evolve, the liquid cooling infrastructure must be adaptable to accommodate growing demands and changes in technology. Addressing these complexities is essential for maximizing the benefits of liquid cooling while maintaining operational flexibility and efficiency.
The adoption of advanced liquid cooling technologies not only optimizes heat management and reuse but also contributes to reducing environmental impact by enhancing energy efficiency and enabling the integration of renewable energy sources into data center operations.
About the Author
David Watkins is the solutions director at VIRTUS Data Centres, heading up the Solutions Team that works with customers to provide customised solutions. He has been at VIRTUS since 2009, where he has previously held the roles of service delivery director and head of operations. David has a technical and commercial background and can often be found speaking about sustainability at data centre industry events as well as authoring articles on the topic, which he is passionate and knowledgeable about. Prior to joining VIRTUS, David spent more than 15 years at Unisys. His last role with the company was head of data centres UKMEA.