Data is growing so fast that current technologies may not be able to keep up. That’s why Microsoft is working on new storage technologies to house massive amounts of data in DNA and holograms. These storage technologies could disrupt the world of data centers, and Microsoft says they are closer than we think.
At the recent Ignite conference, Microsoft Azure Chief Technical Officer Mark Russinovich showed off working prototypes for data storage systems based on DNA and holography. These approaches need extensive engineering work to commercialize at scale, but Russinovich said real-world data growth makes a compelling case for these new technologies.
“The amount of data being generated is really escaping our grasp,” said Russinovich. “There are some types of data that we simply won’t be able to efficiently store with today’s technologies. This is where we’re exploring novel ways to store data very efficiently and at very high scale to close this gap.”
The implications for digital infrastructure are profound. Storing an exabyte of data currently requires two Azure data centers, each about the size of a Walmart, Russinovich said. DNA storage could house that exabyte in just one cubic centimeter of space.
“It’s sustainable, it’s organic, and it’s durable,” said Russinovich. “It lasts hundreds of thousands to millions of years. On Earth we’ve found 700,000-year-old biologic DNA. Stored under the right conditions, it can survive virtually forever.”
Holographic storage is not a new concept, but it has been difficult to execute in a practical way.
“It was invented in 1960s, but it’s never been successfully commercialized,” said Russinovich. “We actually believe that now is the right time to leverage this cool storage technology. We also believe Microsoft is in the best position to do so because we’ve been working on optical technologies in this area for many years on HoloLens, and the optical technologies inside of it are something that we can leverage for holographic storage.”
These new approaches to storage seem like science fiction. What’s clear is that Microsoft is willing to invest considerable resources in “moonshot” research that brings dramatic change to infrastructure – as has clearly been seen in Project Natick, its initiative to operate edge data centers beneath the sea.
Exploring DNA Storage
In traditional data storage systems, information is stored as a binary series of ones and zeros that are written magnetically onto spinning disks or tape, or with lasers onto DVDs or Blu-ray media. In a DNA system, data is stored in a liquid solution containing DNA, and “read” using systems that combine electronic and molecular components.
Microsoft’s DNA storage initiative uses synthetic DNA created by companies like Twist Bioscience, rather than repurposing DNA from humans or animals.
“You might be saying ‘so when I archive through DNA my family photos or my photos from my last vacation, what kind of creatures that can create?’” Russinovich said. “Fortunately, these things are going to be inert and not actually grow, so not to worry about weird creatures based off of your holiday photos walking around and attacking people.”
Microsoft has worked with researchers from the University of Washington to create a system that encodes data in synthetic DNA that is stored in a liquid solution and can then be read and used in data processing.
The automated DNA data storage system uses software that converts the ones and zeros of digital data into the building blocks of DNA, expressed as a string of As, Ts, Cs and Gs. Then it uses lab equipment to flow the necessary liquids and chemicals into a synthesizer that builds manufactured snippets of DNA and pushes them into a storage vessel.
When the system needs to retrieve the information, it adds other chemicals to prepare the DNA and uses microfluidic pumps to push the liquids into a system that “reads” the DNA sequences and converts them back to information that a computer can understand.
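The round trip from bits to bases and back can be illustrated with a toy codec. This is a minimal sketch, not the encoding used by Microsoft or the University of Washington: the two-bits-per-base table and the function names `encode` and `decode` are assumptions made for illustration, and a practical system would also add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping: each pair of bits becomes one DNA base.
# Real DNA storage codecs are more elaborate (error correction, sequence constraints).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a string of As, Cs, Gs and Ts (two bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

In this sketch every byte becomes four bases, so the synthesizer would be asked to build a strand four times as long as the byte count, and sequencing followed by `decode` recovers the original data.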
Here’s a video that explains the process in more detail.
It’s cool science. But in practice, working with DNA and solutions is ideal for laboratories, not data storage facilities.
“The process thus far has been incredibly manual,” said Luiz Ceze, a professor at the University of Washington’s Molecular Information Systems Lab. “It’s literally people moving around with pipettes in their hands. The only way we’re going to make DNA storage scale up to be useable and go mainstream is by automating it. What we’ve done is show that it’s possible to automate the entire process – from bits to molecules and back to bits.”
“We’ve actually gotten to the point where we’ve got a first fully automated DNA storage system,” Russinovich said in his presentation at Ignite. “It’s not just a proof of concept. We’ll take the binary data and synthesize the DNA molecule. We’ve got the storage and prep part inside that vial of DNA, and then we can sequence it out and get the resulting data all in one testbed.”
Here’s what it looks like:
Those beakers and flasks wouldn’t be at home in most data center environments. Microsoft is working on that.
“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service — bits are sent to a data center and stored there and then they just appear when the customer wants them,” said Microsoft principal researcher Karin Strauss. “To do that, we needed to prove that this is practical from an automation perspective.”
“It might be pointing to a new kind of system that has an electronic component and a molecular component, and allow us to build a system that integrates wet molecules and dry electronics and together they do something amazing,” said Ceze.
Russinovich believes the massive data storage potential of DNA makes a compelling case for the research.
“This is kind of the next step in showing that this technology really is viable, and there’s incredibly hard engineering problems to solve to make this scale, and really be cost efficient. But we’re well on our way down this path.”
Holographic Data Storage
For the current generation of data center professionals, holographic storage might seem an easier leap than DNA. In a holographic system, data carried by light beams is stored as an image inside a crystal of lithium niobate.
Holographic storage is capable of recording and reading millions of bits in parallel, enabling data transfer rates greater than those attained by traditional optical storage.
Project HSD is a collaboration between Microsoft Research Cambridge and Microsoft Azure to create a holographic storage system as a cloud-first design, building on recent advances in optical technologies such as smartphone cameras.
“To implement holographic storage, we need very high-resolution cameras,” said Russinovich. “If you take a look at the cameras coming out of commodity smartphones today, they’re up at the resolutions that we need – in the 10s of megapixels range – to commercialize a technology like this. We’ve actually made some really great progress.
“We’ve been able to read and write to unlock access rates that are comparable to hard disks,” he said. “We’ve also been able to leverage software compensation via deep learning to be able to read out with high degrees of accuracy, the data that’s been stored in the holographic storage. We’ve got proof of concepts now.”
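Why page-at-a-time reads matter can be seen with some back-of-envelope arithmetic. The sketch below is illustrative only: the function name `page_throughput_bytes_per_s` and every input value (10 million pixels per page, one bit per pixel, 100 pages read per second) are assumptions chosen to show the calculation, not figures reported by Microsoft.

```python
def page_throughput_bytes_per_s(pixels_per_page: int,
                                bits_per_pixel: int,
                                pages_per_second: float) -> float:
    """Raw read rate when a camera captures a whole data page at once."""
    bits_per_second = pixels_per_page * bits_per_pixel * pages_per_second
    return bits_per_second / 8  # 8 bits per byte


if __name__ == "__main__":
    # Assumed values for illustration: a 10-megapixel sensor,
    # 1 bit stored per pixel, 100 pages captured per second.
    rate = page_throughput_bytes_per_s(10_000_000, 1, 100)
    print(f"{rate / 1e6:.0f} MB/s")  # 125 MB/s
```

Under these assumed inputs, the raw rate scales linearly with sensor resolution and page rate, which is why camera resolution matters so much in Russinovich's remarks.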
Here’s a video that explains the details:
From a data center design perspective, holographic storage would be an easier adaptation than DNA, as it simply substitutes a different storage medium and perhaps new form factors.
These technologies are making rapid progress, but are still likely years away from implementation at scale. Even after new technologies are viable, they are adopted gradually over time, providing data center operators time to adapt designs and best practices.
As an example, cloud computing is among the most disruptive IT trends. Amazon Web Services launched its cloud platform in 2006, and 14 years on, Gartner estimates that about 30 percent of enterprise IT shops use some form of cloud storage. Effective liquid cooling solutions have been available for years, yet remain lightly adopted. Change comes most slowly to the most mission-critical areas of the data center, which certainly include storage.
But Microsoft’s interest in these technologies underscores the innovation premium of working with hyperscale data center customers, who drive a large and growing chunk of the data center business. Service providers targeting this market must reckon with the technologies these companies are embracing, and the specifications required to support them.
Here’s a look at the portion of Russinovich’s Ignite presentation where he discusses Microsoft’s approach to the future of storage.