DNA data storage is now a reality, thanks to research from the EMBL-European Bioinformatics Institute (EMBL-EBI). Researchers there have devised a process to encode data into DNA, offering the possibility of a storage material that can potentially last for hundreds of thousands of years without significant degradation. The method makes it possible to store hundreds of millions of hours of extremely high-definition video in just the amount of DNA that would fit in a small cup.
The process is detailed in a new article published January 23rd in the journal Nature.
“There is a lot of digital information in the world – about three zettabytes’ worth (that’s 3000 billion billion bytes) – and the constant influx of new digital content poses a real challenge for archivists.”
Storing data on hard disks is expensive and inefficient: they need a constant supply of electricity, and over time the machinery degrades and ultimately fails. Even the best archiving material currently available that doesn’t rely on a power supply degrades in only a decade or so.
“This is a growing problem in the life sciences, where massive volumes of data – including DNA sequences – make up the fabric of the scientific record.”
“We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it,” says Nick Goldman of EMBL-EBI. “It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.”
Decoding and ‘reading’ the DNA is actually pretty simple, but encoding and ‘writing’ the data has, until now, been something of a problem.
“There are two challenges: first, using current methods it is only possible to manufacture DNA in short strings. Secondly, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.”
So the researchers, Nick Goldman and co-author Ewan Birney, set out to address these problems.
“We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats. That way, you would have to have the same error on four different fragments for it to fail – and that would be very rare,” says Ewan Birney.
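To make the idea in that quote concrete, here is a minimal Python sketch of a repeat-free code split into overlapping, indexed fragments. It is an illustration under stated assumptions, not the researchers' actual scheme: the fixed six base-3 digits per byte, the function names, the fragment length and step, and the choice to keep the index as a plain integer beside each fragment (rather than writing it into the DNA) are all simplifications made for this example.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits

def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

def trits_to_dna(trits, start="A"):
    """Each digit picks one of the three bases that differ from the
    previous base, so the same letter can never appear twice in a row."""
    prev, out = start, []
    for t in trits:
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna, start="A"):
    """Inverse of trits_to_dna."""
    prev, trits = start, []
    for base in dna:
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def fragment(dna, length=100, step=25):
    """Cut overlapping windows; with step = length / 4 most bases land in
    four fragments. Odd-numbered fragments are stored reverse-complemented,
    i.e. 'going in the other direction'. Returns (index, sequence) pairs."""
    frags = []
    for i, pos in enumerate(range(0, len(dna), step)):
        piece = dna[pos:pos + length]
        frags.append((i, reverse_complement(piece) if i % 2 else piece))
    return frags

def reassemble(frags, step=25):
    """Put fragments back by index and take a majority vote per position."""
    votes = {}
    for i, piece in frags:
        if i % 2:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes.setdefault(i * step + offset, Counter())[base] += 1
    return "".join(votes[p].most_common(1)[0][0] for p in sorted(votes))

message = b"I have a dream"
dna = trits_to_dna(bytes_to_trits(message))
fragments = fragment(dna, length=20, step=5)
recovered = trits_to_bytes(dna_to_trits(reassemble(fragments, step=5)))
```

In this sketch a single misread base in one fragment is outvoted by the other copies that cover the same position, which is the intuition behind needing the same error on several fragments before decoding fails.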
“The new method requires synthesising DNA from the encoded information: enter Agilent Technologies, Inc, a California-based company that volunteered its services. Ewan Birney and Nick Goldman sent them encoded versions of: an .mp3 of Martin Luther King’s speech, ‘I Have a Dream’; a .jpg photo of EMBL-EBI; a .pdf of Watson and Crick’s seminal paper, ‘Molecular structure of nucleic acids’; a .txt file of all of Shakespeare’s sonnets; and a file that describes the encoding.”
“We downloaded the files from the Web and used them to synthesise hundreds of thousands of pieces of DNA – the result looks like a tiny piece of dust,” explains Emily Leproust of Agilent.
The sample was then sent to EMBL-EBI, and the DNA was sequenced and decoded without any errors.
“We’ve created a code that’s error tolerant using a molecular form we know will last in the right conditions for 10 000 years, or possibly longer,” notes Nick Goldman. “As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA.”
There are still some kinks to work out, but overall the technology looks very promising, especially as a way to cut the cost of storing large amounts of data. It should prove most cost-effective where accurate long-term storage is needed.
The researchers say that their next step is to perfect the coding process and to explore the steps needed to commercialize the technology.
Image Credits: EMBL Photolab; DNA via Wikimedia Commons