The idea of using liquids to cool IT hardware, exemplified by technologies such as cold plates and immersion cooling, is frequently hailed as the ultimate solution to the data center’s energy efficiency and sustainability challenges. If a data center replaces air cooling with direct liquid cooling (DLC), chilled water systems can operate at higher supply and return water temperatures, which are favorable for both year-round free cooling and waste heat recovery.
Indeed, there are some larger DLC system installations that use only dry coolers for heat rejection, and a few installations are integrated into heat reuse schemes. As supply chains remain strained and regulatory environments tighten, the attraction of leaner and more efficient data center infrastructure will only grow.
However, thermal trends in server silicon will challenge engineering assumptions, chiefly the DLC coolant design temperatures that ultimately underpin operators’ technical, economic and sustainability expectations of DLC. Some data center operators say the mix of technical and regulatory changes on the horizon is difficult to understand when planning future capacity expansions, and the evolution of data center silicon will only add to the complications.
Uptime Institute Intelligence has repeatedly noted the gradual but inescapable trend towards higher server power, barring a fundamental change in chip manufacturing technology (see Silicon heatwave: the looming change in data center climates). Not long ago, a typical enterprise server used less than 200 watts (W) on average, and stayed well below 400 W even when fully loaded. More recent high-performance dual-socket servers can reach 700 W to 800 W of thermal power, even when lightly configured with memory, storage and networking. In a few years, mainstream data center servers with high-performance configurations will require as much as 1 kilowatt (kW) of cooling, even without the addition of power-hungry accelerators.
The trend has two underlying drivers: semiconductor physics and server economics. First, even though the switching energy of semiconductor circuits is dropping, the energy gains are being outpaced by increases in the scale of integration. As a result, as semiconductor technology advances, the same area of silicon gradually consumes (and dissipates) ever more power. Chips are also increasing in size, compounding this effect.
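To make the first driver concrete, here is a back-of-envelope sketch. It assumes that power per unit of silicon area scales as switching energy per transition multiplied by the number of switching elements per unit area, with clock frequency and activity factor held constant. The per-generation factors in the example (switching energy falling to 0.8x, density rising 1.6x, die area growing 1.1x) are hypothetical, not measured data.

```python
def relative_power_density(energy_scale: float, density_scale: float, generations: int) -> float:
    """Power per unit of silicon area, relative to generation 0.

    Simplifying assumption: power ~ switching energy x switching elements per area,
    with frequency and activity factor unchanged between generations.
    """
    return (energy_scale * density_scale) ** generations


def relative_chip_power(energy_scale: float, density_scale: float,
                        area_scale: float, generations: int) -> float:
    """Whole-chip power relative to generation 0, including die-area growth."""
    density = relative_power_density(energy_scale, density_scale, generations)
    return density * area_scale ** generations


if __name__ == "__main__":
    # Hypothetical per-generation factors, for illustration only.
    ENERGY_SCALE = 0.8   # switching energy per transition drops 20%
    DENSITY_SCALE = 1.6  # 60% more switching elements per unit area
    AREA_SCALE = 1.1     # dies grow 10% per generation
    for gen in range(4):
        density = relative_power_density(ENERGY_SCALE, DENSITY_SCALE, gen)
        chip = relative_chip_power(ENERGY_SCALE, DENSITY_SCALE, AREA_SCALE, gen)
        print(f"gen {gen}: power density x{density:.2f}, chip power x{chip:.2f}")
```

If switching energy fell exactly as fast as density rose (a product of 1.0), power density would stay flat; the trend described above corresponds to a product above 1.0, and die growth multiplies it further.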
Second, many large server buyers prefer high-performance chips that can process larger software payloads faster, because these chips drive infrastructure efficiency and business value. For some, such as financial traders and cloud services providers, higher performance can translate more directly into revenue. In return for these benefits, IT customers are ready to pay hefty price premiums and accept that high-end chips are more power-hungry.
The escalation of silicon power is now supercharged by high demand for artificial intelligence (AI) training and other supercomputing workloads, which will make air cooling more costly. In high-performance servers, fan power can often account for 10% to 20% of total system power, on top of the higher silicon static power losses that result from operating near the upper temperature limit. Air cooling also reduces server density, because larger heat sinks and fans must be accommodated and more space must be left between the electronics.
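Why fan power climbs so quickly can be illustrated with two textbook relations: the airflow needed to carry away a heat load at a given air temperature rise, and the fan affinity law, under which fan power scales roughly with the cube of airflow for the same fan operating against the same system. The air properties are approximate and the 400 W and 800 W server loads are illustrative; real servers also change fan count, heat sinks and pressure drop, so treat the cube law as a rough intuition rather than a prediction.

```python
AIR_DENSITY_KG_M3 = 1.2      # approximate, sea level at around 20 °C
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat of air
M3S_TO_CFM = 2118.88         # cubic metres per second -> cubic feet per minute


def required_airflow_m3s(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove heat_w with an air temperature rise of delta_t_k."""
    return heat_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)


def fan_power_ratio(flow_ratio: float) -> float:
    """Fan affinity law: power scales with the cube of airflow (same fan, same system curve)."""
    return flow_ratio ** 3


if __name__ == "__main__":
    # Illustrative server heat loads with a fixed 15 K air temperature rise (assumptions).
    old_flow = required_airflow_m3s(400.0, 15.0)
    new_flow = required_airflow_m3s(800.0, 15.0)
    print(f"400 W server: {old_flow * M3S_TO_CFM:.0f} CFM")
    print(f"800 W server: {new_flow * M3S_TO_CFM:.0f} CFM")
    print(f"fan power multiplier: x{fan_power_ratio(new_flow / old_flow):.1f}")
```

Doubling the heat load at the same air temperature rise doubles the airflow, and under the affinity law that implies roughly eight times the fan power.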
In addition, air cooling may soon see restrictions in operating temperatures after nearly two decades of gradual relaxation of set points. In its 2021 Equipment thermal guidelines for data processing environments, US industry body ASHRAE created a new environmental class (H1) for high-density servers with a recommended maximum supply temperature of 22°C (71.6°F). This is a full 5°C (9°F) lower than the general guidelines (Classes A1 to A4), with a corresponding dip in data center energy efficiency (see New ASHRAE guidelines challenge efficiency drive).
Adopting DLC offers relief from the pressure of these trends. The superior thermal performance of liquids, whether water or engineered fluids, makes the job of removing several hundred watts of thermal energy from compact IT electronics more straightforward. Current top-of-the-line processors (up to 350 W thermal design power) and accelerators (up to 700 W on standard parts such as NVIDIA data center GPUs) can be effectively cooled even at high liquid coolant temperatures, allowing the facility water supply for the DLC system to run as high as 40°C (104°F), and even up to 45°C (113°F).
High facility water temperatures could enable the use of dry coolers in most climates, or alternatively allow the facility to offer valuable waste heat to a potential offtaker. The promise is attractive: much lower IT and facility fan power, no compressors (which also lowers capital and maintenance costs), and little to no water use for cooling. Today, several high-performance computing facilities with DLC systems take advantage of the heat-rejection or heat-reuse benefits of high water temperatures.
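Whether dry coolers alone suffice depends on how often the outdoor dry-bulb temperature plus the dry cooler's approach temperature stays at or below the required facility supply temperature. The sketch below simply counts those hours. The climate is a synthetic sinusoidal profile and the 8 K approach is an assumed figure; a real assessment would use hourly weather data for the site and the cooler manufacturer's ratings.

```python
import math

HOURS_PER_YEAR = 8760


def synthetic_hourly_temps(mean_c: float, annual_swing_c: float, daily_swing_c: float) -> list[float]:
    """Hypothetical climate built from annual and daily sine waves. Not real weather data."""
    temps = []
    for h in range(HOURS_PER_YEAR):
        annual = annual_swing_c * math.sin(2 * math.pi * h / HOURS_PER_YEAR - math.pi / 2)
        daily = daily_swing_c * math.sin(2 * math.pi * (h % 24) / 24 - math.pi / 2)
        temps.append(mean_c + annual + daily)
    return temps


def dry_cooler_fraction(temps_c: list[float], supply_c: float, approach_k: float) -> float:
    """Fraction of hours in which a dry cooler alone can deliver water at supply_c."""
    ok_hours = sum(1 for t in temps_c if t + approach_k <= supply_c)
    return ok_hours / len(temps_c)


if __name__ == "__main__":
    temps = synthetic_hourly_temps(mean_c=12.0, annual_swing_c=12.0, daily_swing_c=6.0)
    for supply in (32.0, 40.0):
        share = dry_cooler_fraction(temps, supply, approach_k=8.0)
        print(f"supply {supply:.0f} °C: dry coolers sufficient {share:.1%} of hours")
```

In this made-up climate, a 40°C supply can be met by dry coolers in every hour, while a 32°C supply needs supplementary heat rejection on the warmest summer hours.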
Achieving these benefits is not necessarily straightforward. Details of DLC system implementation, further increases in component thermal power, and temperature restrictions on some components all complicate matters.
The net effect of all these factors is clear: widespread deployment of DLC to enable virtually free heat rejection and heat reuse will remain aspirational in all but a few select cases where the facility infrastructure is designed around a specific liquid-cooled IT deployment.
There are too many moving parts to accurately assess the precise requirements of mainstream DLC systems in the next five years. What is clear, however, is that the very same forces that are pushing the data center industry towards liquid cooling will also challenge some of the engineering assumptions around its expected benefits.
Operators that are considering dedicated heat rejection for DLC installations will want to make sure they prepare the infrastructure for a gradual decrease in facility supply temperatures. They can achieve this by planning increased space for additional or larger heat rejection units — or by setting the water temperature conservatively from the outset.
Temperature set points are dictated not only by IT requirements but also by flow rate considerations, which have consequences for pipe and pump sizing. Operating close to temperature limits reduces the cooling capacity of coolant distribution units (CDUs), requiring either larger CDUs or more of them. Slim margins also mean any degradation or loss of cooling may have a near-immediate effect at full load: a cooling failure in water or single-phase dielectric cold-plate systems may leave less than 10 seconds of ride-through time.
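The flow-rate and ride-through points can be estimated with the basic heat balance Q = m·c_p·ΔT. The sketch below assumes water coolant with approximate properties and a crude lumped model in which, after a cooling failure, the coolant in the loop absorbs the full heat load until it warms by the available temperature margin. The 50 kW load, 20 L coolant volume and 3 K margin are hypothetical, and the model ignores the thermal mass of cold plates and piping.

```python
WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat of water
WATER_DENSITY_KG_PER_L = 1.0  # approximate; about 0.99 at typical loop temperatures


def flow_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Coolant flow needed to carry heat_w with a coolant temperature rise of delta_t_k."""
    mass_flow_kg_s = heat_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0


def ride_through_s(coolant_l: float, margin_k: float, heat_w: float) -> float:
    """Lumped estimate: seconds until the coolant warms by margin_k with no heat rejection."""
    return coolant_l * WATER_DENSITY_KG_PER_L * WATER_CP_J_PER_KG_K * margin_k / heat_w


if __name__ == "__main__":
    load_w = 50_000.0  # hypothetical rack-level heat load
    for rise_k in (10.0, 5.0):
        print(f"coolant rise {rise_k:.0f} K: {flow_l_per_min(load_w, rise_k):.0f} L/min")
    print(f"ride-through: {ride_through_s(20.0, 3.0, load_w):.1f} s")
```

Halving the allowable coolant temperature rise doubles the required flow, which is why operating near temperature limits pushes up pipe, pump and CDU sizes; a few kelvin of margin on a small coolant volume buys only seconds of ride-through.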
Today, facility water temperatures seem to be converging around 32°C (89.6°F), a good balance between facility efficiency, cooling capacity and support for a wide range of DLC systems. Site manuals for many water-cooled IT systems also specify the same limit. Although this is far higher than even the elevated water temperatures used for air-cooling systems, it still requires additional heat rejection infrastructure, in the form of either water evaporation or mechanical cooling. Whether lower temperatures will be needed as server processors approach 500 W, alongside large memory arrays and even higher power accelerators, will depend on a number of factors, but it is fair to assume the likely answer will be “yes”, despite the high cost of larger mechanical plants.
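A simplified steady-state thermal chain shows why rising chip power pushes the permissible facility water temperature down. It treats the path from facility water to the chip as a CDU approach temperature plus a single lumped thermal resistance (cold plate, interface material and coolant temperature rise) multiplied by chip power. The 85°C component limit, 0.08 K/W resistance and 3 K approach are illustrative assumptions, not figures for any specific product.

```python
def max_facility_supply_c(chip_power_w: float, t_limit_c: float,
                          r_th_k_per_w: float, cdu_approach_k: float) -> float:
    """Highest facility water supply temperature that keeps the chip at or below t_limit_c.

    Steady-state series model (a simplification):
        T_chip = T_facility + cdu_approach_k + chip_power_w * r_th_k_per_w
    """
    return t_limit_c - chip_power_w * r_th_k_per_w - cdu_approach_k


if __name__ == "__main__":
    # Illustrative values only; real limits and resistances vary by product.
    for power_w in (350.0, 500.0, 700.0):
        t_max = max_facility_supply_c(power_w, t_limit_c=85.0,
                                      r_th_k_per_w=0.08, cdu_approach_k=3.0)
        print(f"{power_w:.0f} W chip: facility water at most {t_max:.0f} °C")
```

Each extra watt of chip power costs r_th kelvin of facility water headroom, and a stricter component temperature limit lowers the result degree for degree.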
These considerations and limitations are mostly defined by water cold-plate systems. Single-phase immersion with forced convection, and two-phase coolants (probably in the form of cold-plate evaporators rather than immersion), offer alternative approaches to DLC that should help ease supply temperature restrictions. For the time being, water cold plates remain the most widely available and commonly deployed option, and mainstream data center operators will need to ensure their facilities meet the requirements of the IT systems that use them.
In many cases, Uptime Intelligence expects operators to opt for lower facility supply water temperatures for their DLC systems. This brings the benefits of lower pumping energy and fewer CDUs for the same cooling capacity, and is also more future-proof. Many operators have already opted for conservative water temperatures as they upgrade their facilities for a blend of air and liquid-cooled IT. Others will install DLC systems that are not connected to a water supply but are air-cooled using fans and large radiators.
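The two benefits named here can be sketched with simplified models. CDU capacity is approximated as UA multiplied by the difference between the IT-side (secondary) supply temperature and the facility supply temperature, a crude stand-in for a proper log-mean temperature difference or effectiveness rating. Pump power uses the affinity law (power roughly proportional to flow cubed for the same pump and system), with flow inversely proportional to the allowed coolant temperature rise for a fixed heat load. The UA value, temperatures and load are hypothetical.

```python
import math


def cdu_capacity_w(ua_w_per_k: float, secondary_supply_c: float, facility_supply_c: float) -> float:
    """Crude CDU capacity: UA x (secondary supply - facility supply).

    A real CDU rating uses LMTD or effectiveness-NTU; this is a simplification.
    """
    return ua_w_per_k * max(secondary_supply_c - facility_supply_c, 0.0)


def cdus_needed(load_w: float, ua_w_per_k: float,
                secondary_supply_c: float, facility_supply_c: float) -> int:
    """Number of identical CDUs needed to carry load_w under the crude capacity model."""
    capacity = cdu_capacity_w(ua_w_per_k, secondary_supply_c, facility_supply_c)
    if capacity <= 0:
        raise ValueError("facility water is too warm for this secondary supply temperature")
    return math.ceil(load_w / capacity)


def relative_pump_power(old_rise_k: float, new_rise_k: float) -> float:
    """Affinity law: flow ~ 1 / coolant rise for a fixed load, pump power ~ flow cubed."""
    return (old_rise_k / new_rise_k) ** 3


if __name__ == "__main__":
    # Hypothetical: 2 MW IT load, CDUs with UA = 200 kW/K, IT-side supply at 45 °C.
    for facility_c in (40.0, 32.0):
        n = cdus_needed(2_000_000.0, 200_000.0, 45.0, facility_c)
        print(f"facility water {facility_c:.0f} °C: {n} CDU(s)")
    print(f"pump power if coolant rise goes 5 K -> 10 K: x{relative_pump_power(5.0, 10.0):.3f}")
```

In practice the extra headroom from cooler facility water can be spent on either effect (more capacity per CDU, or a cooler IT-side supply that allows a larger coolant rise and lower flow), or split between them.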
The switch to liquid cooling for IT electronics offers a host of energy and compute performance benefits. However, expectations based on the past performance of DLC installations are unlikely to be met. The challenges of silicon thermal management will only become more difficult as new generations of high-power server and memory chips are developed, and stricter component temperature limits will push future maximum facility water temperatures to more conservative levels. For now, the vision of a lean data center cooling plant without either compressors or evaporative water consumption remains elusive.
The post Performance expectations of liquid cooling need a reality check appeared first on Website Host Review.