A more precise approximation for the multi-mismatch situation
How RNA combines with the target DNA
The RNA migrates randomly inside the cell and interacts with the target DNA. If they match, the corresponding enzyme is activated, which causes the corresponding DNA sequence to be inserted into the matching site. Theoretically, only DNA with an exact match should be cut, but miscutting does occur in this process. The current theoretical explanation is that, as the RNA moves inside the cell, segments of the RNA and segments of the DNA combine randomly. When bases match each other, the energy is reduced. Between time $t_0$ and time $t_0 + δt$ there is a non-zero probability that one more pair of bases pairs up. For a state in which n bases are already paired, the following can happen after $δt$.
There is a probability $p_0$ that the system remains in its current state (this probability depends on $δt$), a probability $p_{m1}$ (p minus 1) that the nth base pair unravels, leaving n-1 paired bases, and a probability $p_{a1}$ (p add 1) that the (n+1)th base binds, giving n+1 paired bases. These probabilities satisfy the normalization condition $p_0 + p_{a1} + p_{m1} = 1$. The relative values of $p_{m1}$ and $p_{a1}$ are determined by the binding energies of the sites involved: states of different energy are occupied with different probabilities, following the Boltzmann distribution.
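The single-step rule described here can be sketched in a few lines of Python. This is a minimal illustration, not the code used for our results: the function names, the `kT` parameter, and the choice to split the move probability $1 - p_0$ between the two neighbouring states in proportion to their Boltzmann weights are all assumptions made for the example.

```python
import math
import random

def step_probabilities(e_minus, e_plus, p_stay, kT=1.0):
    """Return (p_m1, p_0, p_a1) for one time step.

    e_minus, e_plus: energies of the states with n-1 and n+1 paired bases.
    p_stay: probability p_0 of staying at n paired bases.
    Assumption: the remaining probability 1 - p_stay is shared between the
    two neighbouring states in proportion to exp(-E / kT).
    """
    w_minus = math.exp(-e_minus / kT)
    w_plus = math.exp(-e_plus / kT)
    p_move = 1.0 - p_stay
    p_m1 = p_move * w_minus / (w_minus + w_plus)
    p_a1 = p_move * w_plus / (w_minus + w_plus)
    return p_m1, p_stay, p_a1

def step(n, probs, rng):
    """Advance the number of paired bases n by one time step."""
    p_m1, p_0, _p_a1 = probs
    u = rng.random()
    if u < p_m1:
        return n - 1          # the nth base pair unravels
    if u < p_m1 + p_0:
        return n              # nothing changes
    return n + 1              # the (n+1)th base binds

# Example: the n+1 state is lower in energy, so adding a base is favoured.
probs = step_probabilities(e_minus=0.0, e_plus=-2.0, p_stay=0.5)
print(probs, step(5, probs, random.Random(1)))
```

With a lower energy for the n+1 state, `p_a1` comes out larger than `p_m1`, which is the qualitative behaviour expected for a matched base.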
Why we can ignore transitions from n paired bases directly to n+2 paired bases
Within the time $δt$ there is also a probability $p_{m2}$ that the state drops by two base pairs and a probability $p_{a2}$ that it gains two base pairs. However, as long as $δt$ is small enough, $p_{m2}$, $p_{m3}$, etc. and $p_{a2}$, $p_{a3}$, etc. are higher-order small quantities compared with $p_{m1}$ and $p_{a1}$, so they can be neglected. Under this condition, the papers we cite readily give the probability for a single-point mismatch. When we turn to the probability of mismatching in the CRISPR system (see the unit-point matching part for the detailed analytical derivation), it is difficult to give an analytical solution when multiple sites do not match. Given our own capabilities, we chose a relatively simple method: a numerical solution.
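One way to see the order-of-magnitude argument is to assume, purely for illustration, that pairing and unpairing of a single base happen at constant rates $k_a$ and $k_m$ per unit time (these rates are introduced only for this sketch). To leading order in $δt$:

$$
p_{a1} = k_a\,\delta t + O(\delta t^2), \qquad
p_{m1} = k_m\,\delta t + O(\delta t^2), \qquad
p_0 = 1 - (k_a + k_m)\,\delta t + O(\delta t^2)
$$

$$
p_{a2} = O(\delta t^2), \qquad
p_{m2} = O(\delta t^2), \qquad
\frac{p_{a2}}{p_{a1}} = O(\delta t) \to 0 \quad \text{as } \delta t \to 0
$$

Under this assumption a two-base jump needs two single-base events inside the same interval, so its probability scales as $(δt)^2$ and becomes negligible next to the single-base terms when $δt$ is small.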
The numerical solution follows the idea above. A given RNA is matched against the target DNA, starting from each candidate site of the target DNA. Each matching attempt is iterated as described above, and the matching at each site is repeated N times (for a required accuracy of 0.1%, small-scale tests show that about $10^5$ repetitions are needed). If this is calculated directly, and considering that the amount of calculation in each repetition is proportional to the square of the RNA length L, the total amount of calculation reaches $n×N×L^2$ operations, where n here denotes the number of candidate sites rather than the number of paired bases. With n ~ $10^9$, N ~ $10^5$ and L ~ $10^1$ in the cell, this is about $10^{16}$ operations, which is unacceptable.
To reduce the amount of calculation so that the whole computation can run on a modest personal computer, we applied two optimizations. First, we consider a physical constraint: the sequences in the seed region must match each other, otherwise the RNA will almost certainly not bind (this probability is so close to 1 that we can ignore the sequences whose seed regions do not match). Second, both our numerical tests and the experiments done by our teammates show that, in practice, the off-target probability for regions with more than one mismatched site is extremely small, which is consistent with the literature we cite. Sequences with more than 4 mismatched sites are therefore ignored. With these restrictions the number of sequences to be calculated is greatly reduced (to about $10^{-8}$ of the original number), and numerical calculation can efficiently obtain the off-target probability.
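The filtering and repetition steps can be sketched as follows. This is a simplified illustration under stated assumptions, not our production code: the names `passes_filter` and `estimate_binding`, the seed length of 4, the placement of the seed at the start of the sequence, and the rule that every iteration either adds or removes one base (that is, $p_0 = 0$) are all hypothetical. Sequences are written as DNA strings, with the RNA represented by the DNA sequence it matches perfectly.

```python
import random

def passes_filter(rna_target, dna_site, seed_len=4, max_mismatches=4):
    """Keep a site only if the seed region matches exactly and the total
    number of mismatches is at most max_mismatches.
    Assumption: the seed is the first seed_len bases."""
    if rna_target[:seed_len] != dna_site[:seed_len]:
        return False
    mismatches = sum(a != b for a, b in zip(rna_target, dna_site))
    return mismatches <= max_mismatches

def estimate_binding(rna_target, dna_site, p_match=0.99, p_mismatch=0.05,
                     repeats=10_000, rng=None):
    """Monte Carlo estimate of the probability of reaching full pairing.

    Assumptions: the walk starts with one paired base, each iteration adds
    the next base with probability p_match or p_mismatch (depending on
    whether that base matches) and otherwise unravels the last pair, and it
    stops at zero paired bases or at full length.
    """
    rng = rng or random.Random(0)
    matches = [a == b for a, b in zip(rna_target, dna_site)]
    length = len(matches)
    hits = 0
    for _ in range(repeats):
        n = 1
        while 0 < n < length:
            p_add = p_match if matches[n] else p_mismatch
            n += 1 if rng.random() < p_add else -1
        hits += (n == length)
    return hits / repeats

target = "ACGTACGTAC"                      # hypothetical 10-base target
sites = ["ACGTACGTAC", "ACGTACGTAA", "TCGTACGTAC", "ACGTTGCATG"]
for site in sites:
    if passes_filter(target, site):        # skip sites the filter rules out
        print(site, estimate_binding(target, site, repeats=2000))
```

Only the sites that pass the filter are simulated, which is where the reduction in the number of sequences comes from. The number of repetitions is kept small here so that the example runs quickly.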
Note: To make the plots easier to read, the schematics use illustrative parameters rather than real experimental values.
The hypothetical sites and the corresponding matching-energy diagram used in the test are given below.
Schematic Diagram 1
The figure below shows how the off-target probability depends on the per-iteration transition probability for a mismatched base (the transition probability for a matched base is fixed at 0.99):
How to read the figure: first, when the transition probability for a mismatched base is close to 0, the off-target probability is also close to 0. This is the extreme case in which the RNA can bind the target DNA only where the bases match; since only matched bases are allowed, the off-target probability is zero. The figure also shows that the off-target probability depends continuously on the parameter, which eases the concern that a small change in the parameters could cause erratic changes in the result.
Schematic Diagram 2
The figure below shows how the off-target probability depends on the per-iteration transition probability for a matched base (the transition probability for a mismatched base is fixed at 0.05):
How to read the figure: the higher the transition probability for a matched base, the higher the off-target probability. In other words, the higher the binding energy of a matched base, the higher the off-target probability: a higher binding energy makes bases more likely to combine, which in turn makes off-target binding more likely. The figure again shows that the off-target probability depends continuously on the parameter, which eases the concern that a small change in the parameters could cause erratic changes in the result.
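The two trends described for the schematics can be reproduced qualitatively with a small parameter sweep. The sketch below does not use repeated random trials; it swaps in the standard closed-form absorption probability of a one-dimensional birth-death walk, under the assumptions that the walk starts with one paired base, that each iteration either adds or removes one base, and that the site has a single mismatch at the second base. The function name `bind_probability_exact` and the example site are hypothetical, and the numbers are not the ones plotted in the figures.

```python
def bind_probability_exact(matches, p_match, p_mismatch):
    """Probability that a walk started with one paired base reaches full
    pairing before dropping to zero paired bases.

    matches[i] is True when base i matches. With p_i the probability of
    adding base i and q_i = 1 - p_i, the result is
    1 / sum_{j=0}^{L-1} prod_{i=1}^{j} (q_i / p_i).
    """
    total = 1.0       # j = 0 term (empty product)
    product = 1.0
    for is_match in matches[1:]:
        p_add = p_match if is_match else p_mismatch
        if p_add == 0.0:
            return 0.0    # a base that can never be added blocks full pairing
        product *= (1.0 - p_add) / p_add
        total += product
    return 1.0 / total

# Hypothetical off-target site: 10 bases with one mismatch at index 1.
site = [True] * 10
site[1] = False

# Sweep the mismatch transition probability with the match value fixed at 0.99.
for p_mismatch in (0.0, 0.01, 0.05, 0.2, 0.5):
    print("p_mismatch", p_mismatch, bind_probability_exact(site, 0.99, p_mismatch))

# Sweep the match transition probability with the mismatch value fixed at 0.05.
for p_match in (0.6, 0.8, 0.9, 0.99):
    print("p_match", p_match, bind_probability_exact(site, p_match, 0.05))
```

In this simplified model the probability is exactly zero when the mismatch transition probability is zero, and it increases smoothly with either parameter, in line with the reading of both schematics.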
SJTU-BioX-Shanghai
Contact us: sjtuigem@gmail.com
Bio-X Institute, Shanghai Jiao Tong University, Dongchuan Rd. 800