Difference between revisions of "Team:Tsinghua-A/Model"

Latest revision as of 02:03, 10 December 2019

Model

Limitation in length

Currently, large-scale DNA data storage relies on oligo pool chip synthesis technology, so a file must be divided into several data chunks, each encoded into an oligo of 100-200 bases. To recover the data, an index must be added to each data chunk so that the file can be reassembled from these fragments.
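The chunk-and-index idea can be sketched as follows. This is a minimal illustration, not the team's actual scheme: the function names, the 2-byte big-endian index, and the chunk size are all assumptions made for the example.

```python
def split_with_index(data: bytes, chunk_size: int, index_bytes: int = 2) -> list[bytes]:
    """Split data into chunks and prefix each with its position (hypothetical layout)."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        index = start // chunk_size
        # The index travels with the payload, so the order can be restored later.
        chunks.append(index.to_bytes(index_bytes, "big") + data[start:start + chunk_size])
    return chunks


def reassemble(chunks: list[bytes], index_bytes: int = 2) -> bytes:
    """Sort chunks by their index prefix and join the payloads."""
    ordered = sorted(chunks, key=lambda c: int.from_bytes(c[:index_bytes], "big"))
    return b"".join(c[index_bytes:] for c in ordered)
```

Because every chunk carries its own index, the fragments can be read back in any order, which matches how oligos come out of a sequencing run.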

Channel noise: Error within a sequence

Errors may occur within a sequence, including substitutions, insertions and deletions, during synthesis, decay and sequencing. So if a file is directly encoded to DNA with a base-bits mapping, it may not be fully recoverable after sequencing because of the noise introduced in the DNA data storage channel, as shown below.
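A small sketch of why a direct base-bits mapping is fragile. The mapping 00→A, 01→C, 10→G, 11→T is an assumption chosen for illustration; the page does not state which mapping was used.

```python
# Hypothetical 2-bits-per-base mapping, used only for this example.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map every pair of bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(seq: str) -> bytes:
    """Map bases back to bits, dropping any trailing partial byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


seq = encode(b"Hi")              # "CAGACGGC"
clean = decode(seq)              # b"Hi"
after_deletion = decode(seq[1:])  # b"!" : one lost base shifts every later bit
```

A single deletion shifts the reading frame, so every byte after the error is decoded incorrectly, whereas a substitution only damages the byte it falls in.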

Channel noise: Loss of sequence

Besides errors within a sequence, a whole sequence might also be lost, for a couple of reasons:

•Decay and PCR are sequence dependent, which leads to an uneven distribution

•Sampling and sequencing amount to drawing from a pool, so some sequences might be lost
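The "drawing from a pool" picture can be simulated with a few lines. This is a rough sketch under simple assumptions: each read is an independent draw with replacement, and the optional `weights` argument stands in for the uneven abundance caused by decay and PCR bias. The function name and all numbers are hypothetical.

```python
import random


def simulate_dropout(n_sequences, n_reads, weights=None, seed=0):
    """Return how many distinct sequences are never observed in n_reads draws."""
    rng = random.Random(seed)
    # Each read picks one sequence; weights=None means a perfectly even pool.
    reads = rng.choices(range(n_sequences), weights=weights, k=n_reads)
    return n_sequences - len(set(reads))


# Shallow sampling of an even pool: at least half of the sequences must be missing.
lost_shallow = simulate_dropout(n_sequences=100, n_reads=50)

# A sequence whose abundance has dropped to zero is lost no matter how deep we read.
lost_skewed = simulate_dropout(n_sequences=3, n_reads=1000, weights=[1, 1, 0])
```

Raising the read depth reduces dropout in an even pool, but it cannot recover sequences that were depleted before sampling, which is why the encoding itself has to tolerate missing sequences.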

Bio constraints

It is well known that sequences with high GC content and long repeats are hard to synthesize and sequence, so these biology-related patterns must be avoided in encoding. However, these patterns are frequently observed if binary files are transformed to bases directly:

•The majority of DNA sequences encoded from most files have GC content between 0.4 and 0.6 by nature, except for text files, which have extremely high GC content, as will be explained later.
•4/5 of the encoded sequences have repeats longer than 4 bases
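Both constraints are easy to check per sequence. The sketch below assumes that "repeats" means homopolymer runs (the same base repeated consecutively), which is one common reading but is not stated on the page. The 0.4-0.6 window and the run limit of 4 are taken from the text above, and the function names are hypothetical.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (seq must be non-empty)."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base (seq must be non-empty)."""
    return max(len(list(group)) for _, group in groupby(seq))


def fits_constraints(seq: str, gc_min=0.4, gc_max=0.6, max_run=4) -> bool:
    """True if the sequence passes both illustrative checks."""
    return gc_min <= gc_content(seq) <= gc_max and longest_run(seq) <= max_run
```

For example, `fits_constraints("ACGTACGT")` passes, while `"AAAAACGC"` fails on its run of five A's even though its GC content alone would not rule it out by much.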

In summary, to encode a file to DNA and perfectly recover it, the file must be split into small data chunks and encoded to DNA sequences that fit the bio constraints, and the encoding method must endure channel noise, both errors within a sequence and the loss of whole sequences.

In addition, some extra requirements for a data storage method should also be taken into account, like information density (how many bits can be stored in one base) and data security.
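Information density can be made concrete with a short calculation. With four bases, one base can carry at most log2(4) = 2 bits, and any bases spent on indexes or redundancy lower the net figure. The oligo length and overhead below are made-up numbers for illustration only.

```python
import math


def net_density(oligo_len: int, overhead_bases: int, bits_per_base: float = math.log2(4)) -> float:
    """Payload bits per synthesized base after subtracting non-payload bases."""
    return (oligo_len - overhead_bases) * bits_per_base / oligo_len


upper_bound = net_density(150, 0)    # 2.0 bits per base, no overhead at all
with_index = net_density(150, 15)    # 1.8 bits per base if 15 bases hold an index
```

Any scheme that avoids forbidden patterns or adds error correction trades some of this density for robustness.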

Can we put forward a method which fits all these requirements? Click here to see our methods.

@@ Line 1: / Line 1: @@
 {{Tsinghua-A}}
 <html>
+<head>
+	<link rel="stylesheet" href="https://2019.igem.org/Template:Tsinghua-A/CSSTest?action=raw&ctype=text/css">
+</head>
+<body>
+	<section style="background-size: 100%;height: 60em;margin-top: 0em;padding-top: 0em;">
+		<div style="height: 100%;">
+			<center style="height: 100%;">
+				<div class="top_pic" style="height: 100%;margin-top: 0;">
+					<div style="background-image: url(https://static.igem.org/mediawiki/2019/4/43/T--Tsinghua-A--MAIN1.png);height: 100%;
+    width: 100%;background-attachment: fixed;background-size: cover;">
+    				</div>
+				</div>
+			</center>
+		</div>
+	</section>
+	<section style="padding-top: 6em; padding-bottom: 50px; outline: none;">
+		<div class="my_container">
+			<div class="row">
+				<div class="right_part">
+					<p>&nbsp;</p>
+<p>&nbsp;</p>
+<html>
+<head>
+<meta charset='UTF-8'><meta name='viewport' content='width=device-width initial-scale=1'>
+<title>Model</title></head>
-<div class="clear"></div>
+<body><h3>Limitation in length </h3>
+<p>Currently, large-scale DNA data storage relies on oligo pool chip synthesis technology, so a file must be divided into several data chunks, each encoded into an oligo of 100-200 bases. To recover the data, an index must be added to each data chunk so that the file can be reassembled from these fragments.</p>
+<h3>Channel noise: Error within a sequence</h3>
-<div class="column full_size">
+<div class="demo" style="text-align: center;">
-<h1> Modeling</h1>
+	<img src="https://static.igem.org/mediawiki/2019/0/09/T--Tsinghua-A--Error1.png">
-<p>Mathematical models and computer simulations provide a great way to describe the function and operation of BioBrick Parts and Devices. Synthetic Biology is an engineering discipline, and part of engineering is simulation and modeling to determine the behavior of your design before you build it. Designing and simulating can be iterated many times in a computer before moving to the lab. This award is for teams who build a model of their system and use it to inform system design or simulate expected behavior in conjunction with experiments in the wetlab.</p>
 </div>
-<div class="clear"></div>
+<p>	Errors may occur within a sequence, including substitutions, insertions and deletions, during synthesis, decay and sequencing. So if a file is directly encoded to DNA with a base-bits mapping, it may not be fully recoverable after sequencing because of the noise introduced in the DNA data storage channel, as shown below.</p>
+<div class="demo" style="text-align: center;">
-<div class="column full_size">
+	<img src="https://static.igem.org/mediawiki/2019/1/1a/T--Tsinghua-A--Error2.png">
-<h3> Gold Medal Criterion #3</h3>
-<p>
-Convince the judges that your project's design and/or implementation is based on insight you have gained from modeling. This could be either a new model you develop or the implementation of a model from a previous team. You must thoroughly document your model's contribution to your project on your team's wiki, including assumptions, relevant data, model results, and a clear explanation of your model that anyone can understand.
-<br><br>
-The model should impact your project design in a meaningful way. Modeling may include, but is not limited to, deterministic, exploratory, molecular dynamic, and stochastic models. Teams may also explore the physical modeling of a single component within a system or utilize mathematical modeling for predicting function of a more complex device.
-</p>
 </div>
+<p>&nbsp;</p>
-<div class="column two_thirds_size">
+<h3>Channel noise: Loss of sequence</h3>
-<h3>Best Model Special Prize</h3>
+<p>&nbsp;</p>
+<div class="demo" style="text-align: center;">
-<p>
+	<img src="https://static.igem.org/mediawiki/2019/b/b5/T--Tsinghua-A--SequenceLost.png">
-To compete for the <a href="https://2019.igem.org/Judging/Awards">Best Model prize</a>, please describe your work on this page  and also fill out the description on the <a href="https://2019.igem.org/Judging/Judging_Form">judging form</a>. Please note you can compete for both the Gold Medal criterion #3 and the Best Model prize with this page.
-<br><br>
-You must also delete the message box on the top of this page to be eligible for the Best Model Prize.
-</p>
 </div>
+<p>	Besides errors within a sequence, a whole sequence might also be lost, for a couple of reasons:</p>
+<p><strong>•Decay and PCR</strong> are sequence dependent, which leads to an uneven distribution</p>
+<p><strong>•Sampling and sequencing</strong> amount to drawing from a pool, so some sequences might be lost</p>
+<h3>Bio constraints</h3>
+<p>It is well known that sequences with high GC content and long repeats are hard to synthesize and sequence, so these biology-related patterns must be avoided in encoding. However, these patterns are frequently observed if binary files are transformed to bases directly:</p>
+<ul>
+<li>The majority of DNA sequences encoded from most files have GC content between 0.4 and 0.6 by nature, except for text files, which have extremely high GC content, as will be explained later.</li>
+<li>4/5 of the encoded sequences have repeats longer than 4 bases  </li>
-<div class="column third_size">
-<div class="highlight decoration_A_full">
-<h3> Inspiration </h3>
-<p>
-Here are a few examples from previous teams:
-</p>
-<ul>
-<li><a href="https://2018.igem.org/Team:GreatBay_China/Model">2018 GreatBay China</a></li>
-<li><a href="https://2018.igem.org/Team:Leiden/Model">2018 Leiden</a></li>
-<li><a href="https://2016.igem.org/Team:Manchester/Model">2016 Manchester</a></li>
-<li><a href="https://2016.igem.org/Team:TU_Delft/Model">2016 TU Delft</li>
-<li><a href="https://2014.igem.org/Team:ETH_Zurich/modeling/overview">2014 ETH Zurich</a></li>
-<li><a href="https://2014.igem.org/Team:Waterloo/Math_Book">2014 Waterloo</a></li>
 </ul>
+<div class="demo" style="text-align: center;">
+	<img src="https://static.igem.org/mediawiki/2019/3/38/T--Tsinghua-A--Bio.png">
 </div>
-</div>
+<p>&nbsp;</p>
+<p>In summary, <strong>to encode a file to DNA and perfectly recover it</strong>, the file must be <strong>split into small data chunks</strong> and encoded to DNA sequences that fit the <strong>bio constraints</strong>, and the encoding method must <strong>endure channel noise</strong>, both errors within a sequence and the loss of whole sequences. </p>
+<p>In addition, some extra requirements for a data storage method should also be taken into account, like <strong>information density</strong> (how many bits can be stored in one base) and <strong>data security</strong>. </p>
+<p>Can we put forward a method which fits all these requirements? <a href = 'https://2019.igem.org/Team:Tsinghua-A/encode_decode' target = '_blank' >Click here </a>to see our methods.</p>
+</body>
 </html>