
Modeling Metal-hunters: Brief description

Modeling our Project

Basic Idea

Here we provide a simple but efficient mathematical model to interpret our data. In this model, we try to exclude the influence of cross-interactions in order to obtain highly accurate results.

Assumption

Any abstract model requires proper assumptions to approximate a real system. Here are our basic assumptions.

Fluorescence intensity is proportional to the binding efficiency of the ion with the vector.
Every response curve fits the Hill equation.
When different kinds of ions are present in the detection system, they do not influence each other's response behaviour.

The first assumption is natural. Consider the whole response process as a computation module consisting of an input, a processing layer, and an output. The input is the number of ions. The processing layer produces fluorescent protein when ions successfully bind to the vector. The output is the amount of fluorescent protein. The only thing that affects the output is the probability of ion binding, which is determined by the input. On average, we have the following:

$I \propto H(x)$
Here $I$ refers to the intensity, $H(x)$ to the probability of binding, and $x$ to the concentration of the ion.

The second assumption lets us calculate the probability in assumption one; it is based on a simplified but well-established biochemical model. In short, the Hill equation describes the binding of ligands to macromolecules as a function of the ligand concentration. It can be written as:

$P_{active}=H_a(x)=\frac{x^n}{k_a^n+x^n}\\ P_{inhibitive}=H_i(x)=\frac{k_i^n}{k_i^n+x^n}$

To explain the physical meaning of these parameters, we briefly introduce the model. Suppose the “on” state of the processing layer requires $n$ bound ions; the corresponding chemical reaction can be written as:

$P+nI \rightleftharpoons nPI$
And the dissociation constant can be expressed as:

$K_d=\frac{[P][I]^n}{[nPI]}$
What we focus on is the probability of binding, which is the proportion of $[nPI]$ over all states. To simplify, we only consider two states: fully bound ($[nPI]$) and unbound ($[P]$). Therefore the probability is:

$P=\frac{[nPI]}{[P]+[nPI]}\\ =\frac{\frac{[P][I]^n}{K_d}}{[P]+\frac{[P][I]^n}{K_d}} \\=\frac{[I]^n}{K_d+[I]^n} \\=\frac{[I]^n}{k_a^n+[I]^n}$
Since $K_d$ is a positive constant, it can be rewritten in power form, $K_d=k_a^n$, which gives the last step. So the probability is a function of the ion concentration. This is exactly the second assumption.

Combining assumptions 1 and 2 gives the intensity as a function of concentration:

$I=I_{max}\frac{[I]^n}{k_a^n+[I]^n}$

$I_{max}$ is the maximum intensity the system can reach. Note that $I$ on the left-hand side is the intensity, while $[I]$ is the ion concentration.
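As a quick numerical illustration of the two Hill forms and of the intensity function, here is a minimal sketch in Python (not the Mathematica used elsewhere on this page). The function names and the parameter values ($I_{max}=5$, $k_a=10^{-3}$, $n=2$) are illustrative assumptions, not fitted values.

```python
def hill_active(x, k, n):
    """Activating Hill function: probability of binding at concentration x."""
    return x**n / (k**n + x**n)


def hill_inhibitive(x, k, n):
    """Inhibiting Hill function: the complement of the activating form."""
    return k**n / (k**n + x**n)


def intensity(x, i_max, k, n):
    """Intensity under assumptions 1 and 2: I = I_max * H_a(x)."""
    return i_max * hill_active(x, k, n)


if __name__ == "__main__":
    # Illustrative parameters (assumed): I_max = 5, k_a = 1e-3, n = 2
    for x in (1e-4, 1e-3, 1e-2):
        print(x, intensity(x, 5.0, 1e-3, 2))
```

At $x=k_a$ the intensity is exactly half of $I_{max}$, and for the same $k$ and $n$ the two Hill forms always sum to 1.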

The third assumption ensures an additive model system. Since each kind of ion has its own response curve for each detector, the final intensity is the sum of the intensities from all kinds of ions. This assumption ensures that the contribution of each ion depends only on its own concentration and is not influenced by other ions. Therefore, the intensity for the $i^{th}$ detector should be:

$I_i=\sum_j(I_{max})_{ij}H_{ij}(x_j)$
Here $H_{ij}$ is the response curve of the $j^{th}$ ion on the $i^{th}$ vector.
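The additive model can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the parameter values are those used in the two-variable example at the end of this page.

```python
def hill(x, k, n):
    """Activating Hill function x^n / (k^n + x^n)."""
    return x**n / (k**n + x**n)


def detector_intensities(x, i_max, k, n):
    """Return [I_1, ..., I_m] with I_i = sum_j (I_max)_ij * H_ij(x_j)."""
    return [
        sum(i_max[i][j] * hill(x[j], k[i][j], n[i][j]) for j in range(len(x)))
        for i in range(len(i_max))
    ]


# Rows are detectors, columns are ions (values from the example on this page).
I_MAX = [[5.0, 4.0], [3.0, 7.0]]
K = [[1e-3, 1e-1], [1e-2, 10**-3.5]]
N = [[2, 2], [2, 2]]

if __name__ == "__main__":
    print(detector_intensities([0.003, 0.0007], I_MAX, K, N))
```

Each row of the parameter tables describes one detector, so the off-diagonal entries are the cross-talk terms that the third assumption treats as simply additive.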

Model

Based on the analysis of the assumptions, the model system is clear. The intensity of the $i^{th}$ detector should be:

$I_i=\sum_j(I_{max})_{ij}H_{ij}(x_j)$
Here $H_{ij}$ is the response curve of the $j^{th}$ ion on the $i^{th}$ vector.

To analyze our data, we first determine all the coefficients in this expression and then use the resulting standard functions to calculate the actual concentrations in an unknown sample.

Determine Coefficients

During the experiment, we tested how the detectors respond to ions over a concentration gradient. To couple the experiment with the model, the fitting equation needs to be transformed.

Generally, $I=I_{max}H(x)$ can be rewritten as:

$\log(\frac{I}{I_{max}-I})=n\log(x)-n\log(k_a)$
In this form, we easily get a linear relation between the logarithm of the input concentration and the transformed output. The first question is how to find $I_{max}$ in this equation, because this value determines the reprocessed output data. The second question, given the large scale of our data, is how to ease the workload of processing it. To address both questions, we define the ratio between each output and the maximum of all outputs as the standard output, as follows:

$I_{output}=\{I_1,I_2,\cdots,I_n\}$

$I'_{output}=\{I_1',I_2',\cdots,I_n'\}\quad \text{where}\quad I_i'=\frac{I_i}{\max{I_{output}}}$

The elements of $I'_{output}$ fit the following equation:

$\log{\frac{{I_i'}\max{I_{output}}}{I_{max}-{I_i'}\max{I_{output}}}}=n\log{x_i}-n\log{k}$
We define the value of $\frac{I_{max}}{\max{I_{output}}}$ as a parameter $PI_{max}$. Writing $y_i'$ for the standard output $I_i'$, the equation we actually fit is the following:

$\log{\frac{y_i'}{PI_{max}-y_i'}}=n\log{x_i}-n\log{k}$
Using Mathematica, the fit is implemented as follows:

(* Raw detector outputs for the seven standards, 10^-10 ... 10^-4 *)
outputdata = {Output1, Output2, Output3, Output4, Output5, Output6, Output7};
(* Standard output: each value divided by the largest output *)
Processeddata = outputdata/Max[outputdata] // N;
(* Pair Log10 of each concentration with its standard output *)
data = Transpose[{Range[-10, -4], Processeddata}];
(* Solve the linearized Hill equation for y; x is Log10 of the concentration
   and PImax = Imax/Max[outputdata], as defined in the text *)
solu = Flatten[
   Solve[Log10[y/(PImax - y)] == n*x - n*logk, y]];
fitparameter = FindFit[data, y /. solu, {PImax, logk, n}, x]
fit = y /. solu /. fitparameter;
(* Data points in red, fitted curve on top *)
Show[ListPlot[data, PlotStyle -> Red], Plot[fit, {x, -11, 0}], 
 PlotRange -> {0, 1}]
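The linearized relation can also be checked on synthetic data. The sketch below is in Python rather than Mathematica and makes two simplifying assumptions that are not part of the procedure above: $PI_{max}$ is taken as known instead of being fitted, and the data are noise-free values generated from assumed parameters ($I_{max}=5$, $n=1$, $\log k=-7$). All function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for ys ~ slope * xs + intercept."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_hill_linearized(log_x, y, pi_max):
    """Fit n and log10(k) from log10(y/(PI_max - y)) = n*log10(x) - n*log10(k)."""
    z = [math.log10(v / (pi_max - v)) for v in y]
    n, intercept = linear_fit(log_x, z)
    return n, -intercept / n


def synthetic_example():
    # Assumed "true" parameters, used only to generate noise-free data.
    i_max, n_true, log_k_true = 5.0, 1.0, -7.0
    log_x = [float(e) for e in range(-10, -3)]  # 10^-10 ... 10^-4
    raw = [i_max / (1.0 + 10 ** (-n_true * (lx - log_k_true))) for lx in log_x]
    top = max(raw)
    y = [r / top for r in raw]   # standard outputs I'_i
    pi_max = i_max / top         # PI_max = I_max / max(I_output)
    return fit_hill_linearized(log_x, y, pi_max)


if __name__ == "__main__":
    print(synthetic_example())
```

With real measurements $PI_{max}$ is unknown, which is why the fit above treats it as a third parameter instead of using a straight-line regression.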

Data Analysis

In the last section, we obtained the response parameters of each detector. Now they will be used to analyze an unknown sample.

Denote the concentration of the $j^{th}$ ion in the sample by $X_j$.

For each detector:

$I_i=\sum_j(I_{max})_{ij}H_{ij}(X_j) \\ \text{for}\ i=1,2,3,\cdots,n$
where $\{(I_{max})_{ij}H_{ij}(x)\}$ has been determined for all $i,j$, and $\{I_i\}$ is the output of the unknown sample in the $i^{th}$ detector.

Now we have $n$ equations for $n$ variables, which should determine the values of all variables. Unfortunately, this is not a linear system and, more importantly, the technique we used to obtain a linear form in the last section cannot be transplanted here. A general way to solve such a nonlinear system is the so-called “Newton iteration method”.

New Form

First rewrite the model as:

$\sum_j(I_{max})_{ij}H_{ij}(X_j)-I_i=0$
Define:

$F(X)=\left(\begin{array}{c}F_1(X)\\F_2(X)\\\vdots\\F_n(X)\end{array}\right);\quad X=\left(\begin{array}{c}x_1\\x_2\\\vdots\\x_n\end{array}\right) \\ F_i(X)=\sum_j(I_{max})_{ij}H_{ij}(x_j)-I_i,\ \text{for}\ i=1,2,3,\cdots,n \\ \therefore F(X)=0$
Now calculate the Jacobian matrix:

$J(X)=\left(\begin{array}{cccc}\frac{\partial F_1(X)}{\partial x_1}&\frac{\partial F_1(X)}{\partial x_2}&\cdots&\frac{\partial F_1(X)}{\partial x_n}\\\frac{\partial F_2(X)}{\partial x_1}&\frac{\partial F_2(X)}{\partial x_2}&\cdots&\frac{\partial F_2(X)}{\partial x_n}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial F_n(X)}{\partial x_1}&\frac{\partial F_n(X)}{\partial x_2}&\cdots&\frac{\partial F_n(X)}{\partial x_n}\end{array}\right) \\ \frac{\partial F_i(X)}{\partial x_j}=\frac{\partial }{\partial x_j}\left(\sum_l(I_{max})_{il}H_{il}(x_l)-I_i\right)=(I_{max})_{ij}\frac{\partial }{\partial x_j}H_{ij}(x_j)$
Once the Jacobian matrix is determined, iterate:

$x^{k+1}=x^k-J^{-1}(x^k)F(x^k)$
where the superscript $k$ counts iterations (it is not a power). Theoretically we have

$\lim_{k\to+\infty}x^k=\left(\begin{array}{c}X_1\\X_2\\\vdots\\X_n\end{array}\right)$

This is exactly the Newton iteration method.
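The iteration can be sketched for any number of detectors in plain Python. This is a minimal illustration, not the team's implementation: the helper names are hypothetical, the linear solve is a small Gaussian elimination used instead of an explicit matrix inverse, and the parameters and starting point are taken from the two-variable example at the end of this page.

```python
def hill(x, k, n, v):
    """Hill response with maximum v: v * x^n / (k^n + x^n)."""
    return v * x**n / (k**n + x**n)


def dhill(x, k, n, v):
    """Derivative of the Hill response with respect to x."""
    return v * k**n * n * x ** (n - 1) / (k**n + x**n) ** 2


def solve_linear(a, b):
    """Solve a.z = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, size))
        z[r] = (aug[r][size] - tail) / aug[r][r]
    return z


def residuals(x, observed, params):
    """F_i(x) = sum_j H_ij(x_j) - I_i, with params[i][j] = (k, n, v)."""
    return [
        sum(hill(x[j], *params[i][j]) for j in range(len(x))) - observed[i]
        for i in range(len(observed))
    ]


def jacobian(x, params):
    """J_ij = dH_ij/dx_j evaluated at x_j."""
    return [
        [dhill(x[j], *params[i][j]) for j in range(len(x))]
        for i in range(len(params))
    ]


def newton(x0, observed, params, steps=9):
    """Apply x <- x - J(x)^-1 F(x) a fixed number of times."""
    x = list(x0)
    for _ in range(steps):
        step = solve_linear(jacobian(x, params), residuals(x, observed, params))
        x = [xi - si for xi, si in zip(x, step)]
    return x


# (k, n, v) per detector/ion pair, taken from the example on this page.
PARAMS = [
    [(1e-3, 2, 5.0), (1e-1, 2, 4.0)],
    [(1e-2, 2, 3.0), (10**-3.5, 2, 7.0)],
]
OBSERVED = [4.5, 6.0]

if __name__ == "__main__":
    print(newton([0.003, 0.0007], OBSERVED, PARAMS))
```

Solving the linear system $J\,s=F$ at each step avoids forming $J^{-1}$ explicitly, which is usually cheaper and numerically safer. A fixed number of steps is used here for brevity; a practical version would stop once the residuals fall below a tolerance.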

Notice

The iteration method does not always work well; this depends on the properties of the iteration function. For some nonlinear systems, the approximation is only useful in a very tiny neighborhood of the solution. The size of this neighborhood determines how much freedom there is in choosing the initial value. If the initial value is chosen far from the solution and the system is badly behaved, the iteration will perform poorly. Therefore, choosing a proper starting position and a proper initial value for the iteration algorithm is critical to the final result.

How to choose initial value

To get a good initial value for the iteration, a proper range for the solution should be guessed. Based on the biological properties of the detectors, the cross-talk between the target ion and non-target ions should be low. In the extreme case, the response of an ideal detector to non-target ions is a constant. So first remove all possible background, and then solve the equation with the target ion only.

That means solving for $x^0_i$ in the following:

$(I_{max})_{ii}H_{ii}(x^0_i)=I_i-\sum_{j \neq i}(I_{max})_{ij}$
for all $i=1,2,\cdots,n$.

Then we get the initial value for the iteration:

$x^0=\left(\begin{array}{c}x^0_1\\x^0_2\\\vdots\\x^0_n\end{array}\right)$
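Because each of these single-ion equations involves only one Hill function, it can be inverted in closed form: $x = k\,(y/(V-y))^{1/n}$ for a target value $0<y<V$. The Python sketch below illustrates this. The function names and the optional `background` argument are hypothetical, and the numbers are the ones quoted in the example at the end of this page, where no background is subtracted.

```python
def hill_inverse(y, v, k, n):
    """Concentration x solving v * x^n / (k^n + x^n) = y, for 0 < y < v."""
    if not 0.0 < y < v:
        raise ValueError("y must lie strictly between 0 and v")
    return k * (y / (v - y)) ** (1.0 / n)


def initial_guess(observed, diag_params, background=None):
    """Initial value x^0; diag_params[i] = (v, k, n) of detector i for its target ion.

    background[i] is the constant subtracted from detector i (assumed 0 if omitted).
    """
    if background is None:
        background = [0.0] * len(observed)
    return [
        hill_inverse(obs - bg, v, k, n)
        for obs, bg, (v, k, n) in zip(observed, background, diag_params)
    ]


if __name__ == "__main__":
    # Target-ion parameters and detector outputs from the example on this page.
    print(initial_guess([4.5, 6.0], [(5.0, 1e-3, 2), (7.0, 10**-3.5, 2)]))
```

The check on $0<y<V$ matters in practice: if the subtracted background is too large, or the reading exceeds the fitted maximum, the single-ion equation has no positive solution and a different starting point has to be chosen.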

How to choose data?

In our test, the sample can be diluted 10X or even 100X according to its quality. Therefore, we can choose the data that is most favourable for data processing. We want a robust approximation, meaning that a small perturbation of the input does not lead to a big change in the result, so regions with a large derivative should be avoided. Note that since the inputs of the algorithm are the detector results and the outputs are deduced from the response curves, the derivative that matters is that of the inverse function of the response curve, not of the response curve itself. This inverse derivative is small where the response curve is steep. Therefore, we choose the detector results with the biggest change as the “proper” data. Here the “change” is defined as the difference between adjacent data points.

Example

Here we provide an example for 2 variables:

(* Hill function with maximum V, constant k and coefficient n *)
H[n_, k_, x_, V_] := V*x^n/(k^n + x^n);
(* Derivative of the Hill function with respect to x *)
DH[n_, k_, x_, V_] := (V*k^n*n*x^(n - 1))/(k^n + x^n)^2;
(* Initial value, determined by
   Solve[5*x^2/((10^-3)^2 + x^2) == 4.5, x];
   Solve[7*x^2/((10^-3.5)^2 + x^2) == 6, x] *)
M = {{0.003, 0.0007}};

(* Newton method: nine iterations, each new iterate appended to M *)
For[i = 1, i < 10, i++,
 x1 = M[[i, 1]]; x2 = M[[i, 2]];
 g1 = H[2, 10^(-3), x1, 5] + H[2, 10^(-1), x2, 4] - 4.5;
 g2 = H[2, 10^(-2), x1, 3] + H[2, 10^(-3.5), x2, 7] - 6;
 {x1d, x2d} = {x1, x2} - 
   Inverse[{{DH[2, 10^(-3), x1, 5], DH[2, 10^(-1), x2, 4]},
      {DH[2, 10^(-2), x1, 3], DH[2, 10^(-3.5), x2, 7]}}].{g1, g2};
 AppendTo[M, {x1d, x2d}]];
Print[M[[10]]]

Result: {0.00299939, 0.000679022}

Compared to original: {0.003, 0.0007}

Good approximation.

Team:SBS NY/Model