Scientists learn to store data on DNA molecules

Because the amount of data people generate and process continues to grow at a remarkable pace, storage technologies must keep up with the world’s ravenous appetite for information. While conventional hard drives have become ever cheaper, smaller and more capacious, researchers have been looking to alternative technologies to help shoulder the burden. Now the European Bioinformatics Institute (EBI), a research station in Cambridge, United Kingdom, reports that its scientists have successfully stored data on DNA molecules. Specifically, they encoded several files, including an MP3 file containing Martin Luther King Jr.’s “I Have a Dream” speech, an important paper on nucleic acids, and several Shakespearean sonnets.

In a press release, Dr. Nick Goldman, who works for the EBI, stated, “We already know that DNA is a robust way to store information because we can extract it from woolly mammoth bones, which date back tens of thousands of years, and make sense of it. It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.” DNA has been used for data storage for billions of years — every living organism’s basic genetic information is stored that way — and so it should prove a reliable long-term storage solution.

Only about 739.3 kilobytes were encoded onto the DNA (roughly one twelfth of a typical MP3-encoded song), but even that amount took sophisticated techniques and difficult work to achieve. As a writer for the Economist explains, “DNA uses four chemical ‘bases’ — adenosine (A), thymine (T), cytosine (C) and guanine (G) — to encode information. Previous approaches have often mapped the binary 1s and 0s used by computers directly onto these bases.” Direct mapping proved untenable, however, because it can produce runs of a single repeated base (for instance, AAAA), which DNA-sequencing machines easily misread.
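
To see why direct mapping is fragile, consider a minimal Python sketch. The two-bits-per-base table below is a hypothetical choice made for illustration; the article does not say which mapping earlier approaches used. The point is only that ordinary binary data, such as a stretch of zero bytes, turns into a long run of one base.

```python
from itertools import groupby

# Hypothetical 2-bits-per-base table, chosen for illustration only.
DIRECT_MAP = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_direct(data: bytes) -> str:
    """Map every pair of bits straight onto one DNA base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(DIRECT_MAP[bits[i:i + 2]] for i in range(0, len(bits), 2))


def longest_run(dna: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max((len(list(group)) for _, group in groupby(dna)), default=0)


# Two zero bytes become eight identical bases in a row.
strand = encode_direct(b"\x00\x00")
print(strand, longest_run(strand))
```

Any file containing padding, silence or blank regions would produce such runs, which is the misreading risk described above.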

Another difficulty is that artificial DNA can only be synthesized in short strands, so larger files have to be split across many of them. This, too, required an ingenious solution, as Dr. Goldman explained: “We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let’s break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn’t allow repeats.”
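
The ideas in Dr. Goldman’s description can be sketched in a few lines of Python. This is a simplified illustration, not the EBI team’s actual scheme: the fixed six base-3 digits per byte, the fragment length of 20, the step of 5 and all function names are assumptions made for the example. Each base-3 digit picks one of the three bases that differ from the previous base, so a repeat can never occur; the strand is then cut into overlapping, indexed fragments, with every other fragment stored as its reverse complement.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bytes_to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits, prev="A") -> str:
    """Each digit selects one of the three bases that differ from the last."""
    out = []
    for trit in trits:
        prev = [b for b in BASES if b != prev][trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, prev="A") -> list:
    """Invert trits_to_dna by replaying the same choices."""
    trits = []
    for base in dna:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    return trits


def trits_to_bytes(trits) -> bytes:
    """Rebuild bytes from groups of six base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def fragment(dna, length=20, step=5):
    """Cut into overlapping, indexed pieces; odd ones are reverse-complemented."""
    pieces = []
    for index, start in enumerate(range(0, len(dna), step)):
        piece = dna[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)
        pieces.append((index, piece))
    return pieces


def reassemble(pieces, step=5) -> str:
    """Use each fragment's index to put its bases back in position."""
    bases = {}
    for index, piece in pieces:
        if index % 2 == 1:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            bases[index * step + offset] = base
    return "".join(bases[i] for i in range(len(bases)))


message = b"I have a dream"
dna = trits_to_dna(bytes_to_trits(message))
pieces = fragment(dna)
restored = trits_to_bytes(dna_to_trits(reassemble(pieces)))
print(restored == message)
```

Because the fragments overlap, most positions in the strand appear in several fragments, so a damaged or lost fragment need not mean lost data; the index is what lets the pieces be reassembled in any order.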

Beyond its longevity (the scientists claim that the molecules used to store the information could remain readable for 10,000 years), DNA’s other great advantage as a storage medium is its density. One gram of DNA can store around 2.2 petabytes (2.2 million gigabytes), an incredible leap over current systems. Although the data can only be read back slowly and cost currently prevents DNA from being used on a massive scale, the density and longevity of the biological material make it well suited to long-term storage.
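
To put the density figure in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 petabyte = 10^15 bytes, 1 kilobyte = 1,000 bytes) and takes the 2.2 petabytes per gram quoted above at face value; the function name is made up for the example, and the result is an idealized figure rather than the mass of DNA actually synthesized in the experiment.

```python
BYTES_PER_PETABYTE = 10**15  # decimal units, assumed
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_GRAM = 2.2 * BYTES_PER_PETABYTE  # density quoted in the article


def grams_needed(num_bytes: float, bytes_per_gram: float = BYTES_PER_GRAM) -> float:
    """Idealized mass of DNA needed to hold num_bytes at the given density."""
    return num_bytes / bytes_per_gram


# 2.2 petabytes expressed in gigabytes (about 2.2 million).
print(BYTES_PER_GRAM / BYTES_PER_GIGABYTE)

# The 739.3 kilobytes from the experiment, expressed in nanograms of DNA.
print(grams_needed(739.3 * 1000) * 1e9)
```

At that density, the 739.3 kilobytes encoded in the experiment would correspond to roughly a third of a nanogram of DNA.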