Difference between revisions of "Team:Tsinghua-A/Encryption"

 
Line 10: Line 10:
 
<center style="height: 100%;">
 
<center style="height: 100%;">
 
<div class="top_pic" style="height: 100%;margin-top: 0;">
 
<div class="top_pic" style="height: 100%;margin-top: 0;">
<div style="background-image: url(https://static.igem.org/mediawiki/2019/1/13/T--Tsinghua-A--Main3.png);height: 100%;
+
<div style="background-image: url(https://static.igem.org/mediawiki/2019/0/0f/T--Tsinghua-A--MIAN3.png);height: 100%;
 
     width: 100%;background-attachment: fixed;background-size: cover;">
 
     width: 100%;background-attachment: fixed;background-size: cover;">
 
     </div>
 
     </div>

Latest revision as of 02:04, 10 December 2019

 

 

Introduction

For its high storage density and stability, we can expect that DNA storage will be mainly and widely used by enterprise users and governments. And they care most is safety.

In our project, we begin with the traditional chaotic encrpytion method, and explored encryption embedded in DNA channel encoding.

Chaotic Encryption: Theory

Symmetric-key cryptography

Symmetric-key cryptography system uses a same key for encryption and decryption. In such an algorithm, only those who own the key can encrypt plaintext or decrypt ciphertext. Once your key is hacked, hackers can decrypt ciphertext and get your data, too. However, instead of stealing the key itself, hackers have many other ways to infer you key when you are using a static key, such as known-plaintext attacks, chosen-plaintext attacks, differential cryptanalysis and linear cryptanalysis. For example, if hackers get several plaintext-ciphertext pairs, it's easy for them to infer your key and your encryption algorithm.

In order to prevent those attacks listed above, we build a dynamic symmetric-key cryptography system to encrypt our plaintext and protect DNA fountain process using two different algorithm. The word 'dynamic' means we use a different key in every encryption process, so that those attacks listed above are theoretically impossible.

Chaotic encryption algorithm

We use chaotic encryption algorithm to generate dynamic keys. Chaotic encryption algorithm is an application of the chaos theory. In math or physics, chaos usually refers to the behavior of dynamical systems that are highly sensitive to initial conditions. Small differences in initial conditions will cause widely diverging outcomes just like butterfly effect. Chaotic system has many great feature such as aperiodicity, high sensitivity and noise similarity. So there many similarities between chaotic maps and cryptographic systems.

Logistic map is a simple but useful chaotic map.

 

[Bifurcation diagramfor the logistic map. The attractor for any value of the parameter r is shown on the vertical line at that r. from wikipedia]

Logistic map can be used to generate key with a tuple of float (u,x). After tens of iterations, the key will be similar to a pseudo-random array.

The cyphertext will be the answer of key XOR plaintext.

 

How it works

Storage end

  1. get your FILE (txt, jpg or binary file) to be encrypted,

  2. choose a KEY-TUPLE (two float: u,x),

  3. generate KEY-LIST from KEY-TUPLE using Logistic map.

  4. XOR your FILE with KEY-LIST, get ENC-FILE

  5. dna-fountain ENC-FILE, get FT-ENC-FILE

  6. look up your KEY in KEY-MAP(a float-DNA map) to get KEY-DNA(DNA sequence, with a determined primer)

  7. pack FT-ENC-FILE with KEYDNA, get the DNA-PACKAGE

     

Receiving end

  1. from DNA-PACKAGE get KEY-DNA,
  2. look up KEY-DNA in KEY-MAP to get KEY-TUPLE
  3. de-fountain FT-ENC-FILE, get ENC-FILE
  4. generate KEY-LIST from KEY-TUPLE
  5. de-encrypt ENC-FILE with KEY-LIST, get FILE

 

Evaluation criterion of cryptography algorithm

Two method are used for evaluating the security of cryptography method.

Sensitivity: If the cyphertext can’t be decrypted by a wrong key which is very close to the real key, then we consider the algorithm is of high sensitivity.

Randomness: We use information entropy to measure the randomness of ciphertext. Higher the information entropy is, less information can be read from cyphertext.

Example

Take <lena.jpg> for an example:

We use (3.8765, 0.5678) as real key tuple (u,x), and the wrong key is (3.8765, 0.567800000000001)

 

 originalencrypteddecrypt with wrong keyrandom noise
information entropy7.39467.97137.96407.9890
2D information entropy10.300413.626113.664413.8275

 

Hackers can hardly get any information form cyphertext for its information entropy is very close to random noise image.

Without the key-tuple, hackers can only search out the whole key-tuple space to get the original file. If using 17 significant digits computational accuracy (default setting of python), we can have more than 4.3x10E33 key-tuples, which is totally unsolvable by brute-force key search attack"!

Encryption in the process of encoding

Source encryption methods like chaotic encryption can provide encryption function in DNA data storage, and entropy of the file can also be increased in the process, which is a compelling feature for encoding (as bio constrains like GC content are more likely to be met)

However, these methods are unlikely to solve the problem of long repeats in sequences, and can not endure noise in the channel, so they must be combined with channel encoding methods for practical DNA data storage, thus requires more computation. Can encryption be directly embedded in the encoding process, so the data can be encrypted just in the process of encoding with no extra effort?

Luckily for us, both DNA Fountain and DNA Mask provide embedded encryption.

DNA Fountain encryption

​ When we obtain a number of DNA droplets and wish to recover the original data, we have to know from which chunks the droplet come, and this is done by putting the same seed into the same random number generator to get the same chunk numbers. So if we want to encrypt the original file we can:

  • Encrypt the seed: Encrypt the seed (an integer) so the seed hackers get is not the real seed, so they can't get the correct chunk numbers if they don't know how the seed is encrypted.
  • Encrypt the random number generator: Even hackers can get the right seed, if they can't set up the same random number generator the chunk numbers will still be wrong. This can be done with multiple method, like using the traditional linear congruential way (A * x + C) %M to generate random number and set different random number generator with parameters A, C, M, adjusting the degree distribution, or abandon D random numbers generated first.

DNA Mask encryption

Decoding in DNA Mask requires the original mask used in encoding, so the mask can be used in encryption: if the wrong mask is used it will be impossible to obtain the original DNA from the masked DNA, thus the original file can not be retreived. As the mask is actually randomly generated, it will be hard to guess the same set of masks for a harker

Can we actually encode a file into DNA, encrypt it and retreive it back? Now let's move from our in silco encoding design to in vivo large scale DNA data storage experiment.