Team:Shanghai City/Description

Description

Inspiration

For billions of years, DNA has carried and passed on the information of life. Now, with advances in DNA synthesis and sequencing, researchers have begun to use synthetic DNA for data storage and information exchange. Compared with hard-disk storage, DNA offers several advantages, including a very high information density (data bits per gram) and a much longer storage lifetime. Using different encoding strategies, researchers have stored images, text, movies, software, and even computer operating systems in synthetic DNA. As the world's data grows exponentially and may soon outstrip the capacity of conventional media, DNA as a high-density storage medium offers a possible solution for future data storage, although DNA storage technology still needs breakthroughs in reading and writing speed.

Beyond large-scale data storage, DNA also has potentially significant value for exchanging classified information, for example in industrial and military settings. There are three common ways to secure information exchanged in DNA: DNA cryptography, DNA steganography, and a combination of the two. DNA cryptography borrows encryption methods from digital information technology: with a particular key and a corresponding algorithm, readable information is translated into seemingly meaningless DNA content. Methods such as the Vigenère cipher, DNA-based PFC, AES, and RSA have been reviewed and compared for their advantages and disadvantages in DNA-based communication.
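
To make the idea of a key-based DNA cipher concrete, here is a minimal Python sketch of a Vigenère-style cipher over the four-letter DNA alphabet. It assumes the bases are numbered A=0, C=1, G=2, T=3 and that the key is itself a short DNA string; the function name `vigenere_dna` and the example key are hypothetical, and this toy is neither the team's protocol nor a secure cipher.

```python
# Toy Vigenère-style cipher over the DNA alphabet (illustrative assumption only).
BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def vigenere_dna(seq: str, key: str, decrypt: bool = False) -> str:
    """Shift each base of `seq` by the matching base of the repeating `key` (mod 4)."""
    sign = -1 if decrypt else 1
    out = []
    for i, base in enumerate(seq):
        shift = INDEX[key[i % len(key)]]
        out.append(BASES[(INDEX[base] + sign * shift) % 4])
    return "".join(out)


plain = "ATGCGTACGTTAGC"
key = "GATC"  # hypothetical shared key
cipher = vigenere_dna(plain, key)
print(cipher)                                    # GTCGATTGATGCAC
print(vigenere_dna(cipher, key, decrypt=True))   # ATGCGTACGTTAGC
```

Without the key, the ciphertext looks like any other DNA sequence; with it, each shift is simply undone.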

DNA steganography is the practice of hiding a truly meaningful piece of DNA among a large number of other DNA sequences that are meaningless or carry false meanings. Because DNA offers high information density, high complexity, and high randomness, steganography has important applications in confidential DNA communication. As early as 1999, Clelland et al. mixed a short information-carrying DNA strand with human genomic DNA. To decrypt it, a pair of primers (the key) was needed to amplify the corresponding information band by PCR, which was then sequenced to recover the real message. As with digital information security, more complex encryption and decryption methods can further improve security, for example public-key systems, more complex encoding schemes, or encoding based on DNA structures read out by gel electrophoresis. However, these methods have drawbacks: they make encryption less convenient to carry out and reduce the amount of information that can be stored.
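
The core logic of Clelland-style steganography can be sketched in silico. In this minimal Python model, which rests on several simplifying assumptions, the message is flanked by a forward primer site and the reverse complement of a reverse primer, then shuffled among random decoy strands that stand in for the genomic background. "PCR" is idealized as keeping only strands that carry both primer sites exactly; real amplification tolerates mismatches and copies templates rather than filtering them. All sequences and function names are hypothetical.

```python
import random

BASES = "ACGT"


def random_dna(n, rng):
    """Random decoy strand of length n."""
    return "".join(rng.choice(BASES) for _ in range(n))


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hide(message, fwd, rev, n_decoys, rng):
    """Flank the message with primer-binding sites and bury it among decoys."""
    secret = fwd + message + revcomp(rev)
    pool = [random_dna(len(secret), rng) for _ in range(n_decoys)]
    pool.append(secret)
    rng.shuffle(pool)
    return pool


def pcr_recover(pool, fwd, rev):
    """Idealized PCR: keep strands with both primer sites, then trim the sites off."""
    site = revcomp(rev)
    return [s[len(fwd):len(s) - len(site)]
            for s in pool if s.startswith(fwd) and s.endswith(site)]


# Hypothetical 20-nt primers (the "key") and message.
FWD = "GATCCTAGGCTACGATCGAA"
REV = "TTGCAGCTAGCATCGGATCC"
message = "ATGCGTACGTTAGC"

pool = hide(message, FWD, REV, n_decoys=1000, rng=random.Random(1999))
print(pcr_recover(pool, FWD, REV))  # the right primer pair recovers the message
print(pcr_recover(pool, REV, FWD))  # swapped primers find nothing
```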

DNA is a good medium for storing information

The main character in our invention is DNA, or deoxyribonucleic acid. You may be surprised: why not a high-tech hard drive, but this small, ordinary molecule that people mention every day? Yes, it is DNA. But do not underestimate this small polymer: in Jurassic Park, this little guy recreated a whole world of dinosaurs from nothing but the DNA preserved in a mosquito. Of course, recreating a Jurassic world from a mosquito's DNA is impossible. However, DNA's ability to store information is real. Guess what? According to one widely cited forecast, the world's total data will reach 40 ZB by 2020! Well, what is a ZB?

As the image above shows, 40 ZB equals 42,949,672,960 TB = 43,980,465,111,040 GB (in binary units)! Man, that's almost 44 trillion GB! What an impressive number. If we stored this data on 4 TB hard disk drives weighing 250 g each, we would need 10,737,418,240 HDDs, weighing 2,684,354,560 kg in total! As technology develops further, one day there will be no more space to store all this data! How about storing it in DNA? In principle, a quantity of DNA on the order of a kilogram could be enough.
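
The arithmetic above can be checked with a few lines of Python. The sketch assumes binary units (1 ZB = 1024³ TB, 1 TB = 1024 GB), which is what the quoted figures correspond to, and the same 4 TB, 250 g drives.

```python
# Binary units, matching the figures above: 1 ZB = 1024**3 TB = 1024**4 GB.
TB_PER_ZB = 1024 ** 3
GB_PER_ZB = 1024 ** 4

total_zb = 40
total_tb = total_zb * TB_PER_ZB
total_gb = total_zb * GB_PER_ZB

hdd_capacity_tb = 4  # capacity of one hard disk drive
hdd_mass_g = 250     # mass of one drive in grams

n_drives = total_tb // hdd_capacity_tb
mass_kg = n_drives * hdd_mass_g // 1000

print(f"{total_tb:,} TB = {total_gb:,} GB")
print(f"{n_drives:,} drives weighing {mass_kg:,} kg")
```

Running it reproduces the numbers quoted in the paragraph; integer arithmetic keeps the large values exact.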
DNA, or deoxyribonucleic acid, stores information as a code made up of four chemical bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Compared with HDDs and flash memory, DNA offers longer data retention, lower power consumption, and higher data density, which means a much larger storage capacity per unit mass.
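
A common starting point for turning data into the four bases is to store two bits per nucleotide. The Python sketch below applies a hypothetical mapping (00→A, 01→C, 10→G, 11→T) to UTF-8 bytes; it illustrates the principle and is not the encoding used by the team.

```python
# Hypothetical 2-bit mapping; real schemes choose their own.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_dna(text: str) -> str:
    """Encode text as UTF-8 bytes, then write each 2-bit pair as one base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna: str) -> str:
    """Reverse the mapping: bases -> bits -> bytes -> text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(text_to_dna("Hi"))  # CAGACGGC
```

Published schemes such as those of Goldman et al. and Erlich and Zielinski build on this idea with extra constraints, for example avoiding long runs of the same base, plus error correction.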

What is encryption?

Encryption is what keeps your personal data secure when you're shopping or banking online. It scrambles data like your credit card details and home address to ensure hackers can't misuse this information. Today, encryption involves powerful computers and some equally powerful brains. 

The history of encryption can be traced back to around 600 BC, when the ancient Spartans used a device called a scytale to send secret messages during battle. It consisted of a leather strip wrapped around a wooden rod. Once unwrapped, the letters on the strip were meaningless, so the recipient had to wrap the strip around a rod of the same diameter to read the message. If enemies intercepted the strip but had a rod of the wrong diameter, they could not read it.
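
The scytale is a transposition cipher: the letters are not changed, only reordered, and the rod's diameter acts as the key. The Python sketch below models the diameter as the number of letters that fit around the rod (`faces`, a hypothetical parameter) and pads short messages with `_`.

```python
import math


def scytale_encrypt(plain: str, faces: int, pad: str = "_") -> str:
    """Write the message along the rod, one row per face, then read the unwound strip."""
    cols = math.ceil(len(plain) / faces)
    text = plain.ljust(faces * cols, pad)
    rows = [text[r * cols:(r + 1) * cols] for r in range(faces)]
    # Consecutive letters on the strip sit on consecutive faces at the same position.
    return "".join(rows[r][c] for c in range(cols) for r in range(faces))


def scytale_decrypt(strip: str, faces: int, pad: str = "_") -> str:
    """Rewrap the strip on a rod with `faces` faces and read along each face."""
    cols = len(strip) // faces
    text = "".join(strip[c * faces + r] for r in range(faces) for c in range(cols))
    return text.rstrip(pad)


secret = scytale_encrypt("ATTACKATDAWN", faces=3)
print(secret)                            # ACDTKATAWATN
print(scytale_decrypt(secret, faces=3))  # ATTACKATDAWN
print(scytale_decrypt(secret, faces=4))  # AKWCAADTTTAN (wrong rod)
```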
In 1553, Giovan Battista Bellaso devised the first cipher to use a proper encryption key: an agreed-upon keyword that the recipient must know to decode the message. This idea underlies modern encryption technology.
Today, information is still encrypted with keys; computers simply make the schemes far more complex.

DNA is an option for secret communication

Since the Canadian company D-Wave demonstrated what it presented as the first commercial quantum computer in 2007, quantum computing has advanced rapidly, and the faster quantum computers become, the less time may be needed to break conventional encryption. Perhaps one day even Tony Stark's safe could be opened by a quantum computer. As a result, more and more scientists and policymakers are turning to biological encryption to protect sensitive information. Some people even fall back on the most conventional method of all: writing information on paper and storing it. Although these methods may seem safer than computer encryption, each of them can still be broken. To store information as safely as possible, we therefore developed a new method that combines all three approaches: computer encryption, DNA-based encryption, and storage on paper.

Common paper-based or electronic text can be intercepted and cracked, yet information security is critical for commercial interests and national security. Life has stored its information in DNA for billions of years, and DNA can equally be used to store and communicate any other information. DNA has a very high information density per unit mass and can be hidden on paper, where it is hard to find. Cas12a-assisted DNA steganography (CADS) relies on Cas12a's specific capture of the true primers: it allows the correct information-carrying DNA to be hidden among large amounts of junk and false DNA, and further strengthens the security of the key. CADS uses the trans-cleavage activity of the Cas12a ternary complex to cut the false primers in the system, while the true primers, protected by the complex, survive. After PCR amplification with these primers, the true DNA information can be recovered.
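
The selection logic of CADS can be caricatured in a few lines of Python. This is a deliberately abstract toy, not a biochemical model: a primer counts as "protected" if its sequence contains a hypothetical crRNA target, every other primer is treated as destroyed by trans-cleavage, and PCR is reduced to a prefix match on the templates. PAM requirements, strandedness, and kinetics are all ignored, and every sequence and name is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    tail: str     # 5' extension; here only the long true primer carries the crRNA target
    priming: str  # 3' region that anneals to an information strand

    @property
    def seq(self) -> str:
        return self.tail + self.priming


def cads_select(primers, crrna_target):
    """Keep primers bound by the (hypothetical) crRNA target; treat the rest as trans-cleaved."""
    return [p for p in primers if crrna_target in p.seq]


def amplify(templates, primers):
    """Idealized PCR: a template is amplified if it starts with a surviving priming region."""
    return [t for t in templates if any(t.startswith(p.priming) for p in primers)]


CRRNA_TARGET = "GACCTGAGTCGATCCAGTTA"  # hypothetical
TRUE_PRIMER = Primer(tail=CRRNA_TARGET, priming="GATCCTAGGCTACG")
FALSE_PRIMERS = [Primer("", "CCTTAGGACGTTAC"), Primer("", "AAGCTTCGATGCAT")]
TEMPLATES = [
    "GATCCTAGGCTACG" + "ATGCGTACGTTAGC",    # true information strand
    "CCTTAGGACGTTAC" + "TTTTGGGGCCCCAAAA",  # false information
    "AAGCTTCGATGCAT" + "ACACACACACACAC",    # junk
]

survivors = cads_select([TRUE_PRIMER, *FALSE_PRIMERS], CRRNA_TARGET)
print(amplify(TEMPLATES, survivors))  # only the true information strand remains
```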

Our Approach

Our iGEM team is dedicated to DNA encryption, information storage, and secret communication. The following four levels describe our project and compare it with previous technologies.
1. The first encryption level combines DNA encryption with computer technology. How do you translate information into a DNA sequence (A, T, C, G)? We mapped characters to DNA using a password table (codebook), optionally strengthened with computer-based encryption. We also designed a one-step 4W (who, where, when, what) information assembly method, similar to Golden Gate assembly.
2. The second level applies steganography. Clelland et al. first used DNA steganography to hide a meaningful piece of DNA in genomic DNA, which was then decoded by polymerase chain reaction (PCR) with a pair of primers (the key) followed by sequencing. We also tested this approach.
3. The third level encrypts or hides the secret key (here, the primers) itself. (1) Encryption: we apply asymmetric key encryption, similar to that used in computer encryption, to the two primers. (2) Steganography: building on the CADS method (long primers), we add more complex interference items, including single- and double-stranded DNA.
4. The fourth level is the preservation of the DNA. The DNA is stored on paper, making it harder for a thief to find and decipher. As far as we know, this is the first time DNA has been used for encryption on paper.
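
For the first level, the password-table idea and the ordered 4W assembly could be sketched as follows. The sketch assumes a secret codebook that assigns one DNA triplet to each character (A to Z, 0 to 9, and space), derived from a shared random seed, and it reserves four further triplets as junctions marking the who, where, when, and what fields. This is only a software analogy: real Golden Gate assembly joins fragments through 4-nt overhangs created by Type IIS enzymes, and none of these names come from the team's protocol.

```python
import itertools
import random
import string

ALPHABET = string.ascii_uppercase + string.digits + " "  # 37 symbols
FIELDS = ("who", "where", "when", "what")


def make_codebook(secret_seed):
    """Hypothetical secret codebook: one triplet per character plus 4 junction triplets."""
    codons = ["".join(c) for c in itertools.product("ACGT", repeat=3)]  # 64 triplets
    random.Random(secret_seed).shuffle(codons)
    char_to_codon = dict(zip(ALPHABET, codons))
    junctions = dict(zip(FIELDS, codons[len(ALPHABET):len(ALPHABET) + len(FIELDS)]))
    return char_to_codon, junctions


def assemble_4w(record, secret_seed):
    """Encode each field and join them in the fixed who/where/when/what order."""
    char_to_codon, junctions = make_codebook(secret_seed)
    parts = []
    for field in FIELDS:
        parts.append(junctions[field])
        parts.extend(char_to_codon[ch] for ch in record[field].upper())
    return "".join(parts)


def read_4w(dna, secret_seed):
    """Decode triplet by triplet; a junction triplet starts a new field."""
    char_to_codon, junctions = make_codebook(secret_seed)
    codon_to_char = {v: k for k, v in char_to_codon.items()}
    junction_to_field = {v: k for k, v in junctions.items()}
    record, current = {}, None
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon in junction_to_field:
            current = junction_to_field[codon]
            record[current] = ""
        else:
            record[current] += codon_to_char[codon]
    return record


MSG = {"who": "AGENT 7", "where": "SHANGHAI", "when": "0900", "what": "MEET AT NOON"}
dna = assemble_4w(MSG, secret_seed=42)
print(dna)
print(read_4w(dna, secret_seed=42))
```

Because junction triplets never encode a character, the receiver can split the fields unambiguously, but only if it holds the same seed.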
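
For the third level, the text does not specify which asymmetric scheme is applied to the primers. As one possible reading, the sketch below applies textbook RSA to a primer: the sequence is read as a base-4 number, encrypted with a public key (N, E), and decrypted only with the private exponent D. The two Mersenne primes, the example primer, and the function names are illustrative assumptions; unpadded RSA with a 92-bit modulus is not secure, and `pow(E, -1, phi)` needs Python 3.8 or later.

```python
# Toy textbook RSA on a primer sequence (hypothetical; no padding, not secure).
P = 2 ** 31 - 1  # Mersenne prime
Q = 2 ** 61 - 1  # Mersenne prime
N = P * Q        # public modulus
E = 65537        # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent

BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}
INT_TO_BASE = "ACGT"


def primer_to_int(seq):
    """Read the primer as a base-4 number."""
    value = 0
    for base in seq:
        value = value * 4 + BASE_TO_INT[base]
    return value


def int_to_primer(value, length):
    """Write a number back as a primer of the given length."""
    bases = []
    for _ in range(length):
        value, digit = divmod(value, 4)
        bases.append(INT_TO_BASE[digit])
    return "".join(reversed(bases))


def encrypt_primer(seq):
    """Anyone holding the public key (N, E) can encrypt."""
    m = primer_to_int(seq)
    if m >= N:
        raise ValueError("primer too long for this toy modulus")
    return pow(m, E, N)


def decrypt_primer(c, length):
    """Only the holder of D can recover the primer."""
    return int_to_primer(pow(c, D, N), length)


primer = "GATCCTAGGCTACGATCGAA"  # hypothetical 20-nt primer
cipher = encrypt_primer(primer)
print(cipher)
print(decrypt_primer(cipher, len(primer)))
```

The receiver also needs the primer length, because leading A bases encode as leading zeros.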

Key References

Li, S. Y.; Liu, J. K.; Zhao, G. P.; Wang, J., CADS: CRISPR/Cas12a-Assisted DNA Steganography for Securing the Storage and Transfer of DNA-Encoded Information. ACS Synthetic Biology 2018, 7 (4), 1174-1178.

References

1 Carlson, R. The changing economics of DNA synthesis. Nature Biotechnology 27, 1091 (2009).
2 Medini, D. et al. Microbiology in the post-genomic era. Nature Reviews Microbiology 6, 419-430 (2008).
3 Carr, P. A. & Church, G. M. Genome engineering. Nature Biotechnology 27, 1151-1162 (2009).
4 Bornholt, J. et al. A DNA-Based Archival Storage System. 637-649, doi:10.1145/2872362.2872397 (2016).
5 Castillo, M. From hard drives to flash drives to DNA drives. AJNR Am J Neuroradiol 35, 1-2, doi:10.3174/ajnr.A3482 (2014).
6 Cox, J. P. Long-term data storage in DNA. Trends in Biotechnology 19, 247-250 (2001).
7 Bancroft, C., Bowler, T., Bloom, B. & Clelland, C. T. Long-term storage of information in DNA. Science 293, 1763-1765 (2001).
8 Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science 355, 950-954, doi:10.1126/science.aaj2038 (2017).
9 Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628, doi:10.1126/science.1226355 (2012).
10 Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77-80, doi:10.1038/nature11875 (2013).
11 Ball, P. Material witness: Gene memories. Nat Mater 16, 393, doi:10.1038/nmat4887 (2017).
12 Marwan, S., Shawish, A. & Nagaty, K. DNA-based cryptographic methods for data hiding in DNA media. Biosystems 150, 110-118, doi:10.1016/j.biosystems.2016.08.013 (2016).
13 Brunet, T. D. Aims and methods of biosteganography. J Biotechnol 226, 56-64, doi:10.1016/j.jbiotec.2016.03.044 (2016).
14 Kar, N., Majumder, A., Saha, A., Deb, S. & Pal, M. C. Data security and cryptography based on DNA sequencing. International Journal of Information Technology & Computer Science (IJITCS) 10 (2013).
15 Gao, Q. BioCryptography. Journal of Applied Security Research 5, 306-325 (2010).
16 Clelland, C. T., Risca, V. & Bancroft, C. Hiding messages in DNA microdots. Nature 399, 533-534, doi:10.1038/21092 (1999).
17 Tanaka, K., Okamoto, A. & Saito, I. Public-key system using DNA as a one-way function for key distribution. Biosystems 81, 25-29, doi:10.1016/j.biosystems.2005.01.004 (2005).
18 Zakeri, B., Carr, P. A. & Lu, T. K. Multiplexed Sequence Encoding: A Framework for DNA Communication. PLoS One 11, e0152774, doi:10.1371/journal.pone.0152774 (2016).
19 Halvorsen, K. & Wong, W. P. Binary DNA nanostructures for data encryption. PLoS One 7, e44212, doi:10.1371/journal.pone.0044212 (2012).
20 Leier, A., Richter, C., Banzhaf, W. & Rauhe, H. Cryptography with DNA binary strands. Biosystems 57, 13-22 (2000).