Team:UPNAvarra Spain/Results


Results

Wet Lab
For our first participation in the iGEM competition, we constructed new expression vectors, each containing a sensitive promoter, an RBS, a chromoprotein and a transcription terminator. We used different promoter sequences to detect the presence of nitrate and heavy metals such as copper, mercury and cadmium (Figure 1). These plasmids, transformed into E. coli cells, make up our system of bacterial biosensors.





Figure 1. Construction of nitrate and heavy metal biosensor vectors.


For each construct, inductions at different concentrations were performed in order to standardize and optimize the expression of the chromoproteins. As can be seen in the pictures, chromoprotein expression rose gradually with the concentration of each contaminant (Figure 2).





Figure 2. Chromoprotein expression levels under increasing concentrations of each contaminant.


Note that the cadmium sensor showed no visible differences between concentrations (Figure 2). This might be due to the weak strength of the CadA promoter or, most likely, to the need for other regulatory elements. This sensor therefore has to be optimized before it can be used.

Mathematical Model
The mathematical model generated from the experimental results obtained in the lab might look like a classical regression model, but it is not. Normally, regression models are used to predict results from a priori unknown data. The underlying assumption is that the data is predictable, and the goal is to produce a mathematical model able to predict it.

In the present experiments, our goal is rather the reverse: we intend to prove that the data produced in the experiments is in fact learnable. Our focus is therefore not on building an as-complex-as-possible mathematical model able to predict future data, but on showing that the (imaging) data we gathered in the lab can be learned by simple regression models.

This somewhat non-standard idea has driven our experiments. In the data analysis part of the process, we attempted to learn the data we gathered using a simple regression model. We opted for a standard least-squares (linear) regression model, run on the dataset obtained in the imaging part. This dataset consists of the average RGB color in the colored part of the pellets obtained at different concentrations of each heavy metal or nitrate.
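The averaging step described above can be sketched as follows. This is only an illustration: the function name `mean_rgb`, the nested-list image layout and the boolean mask marking the "colored part" are assumptions, since the page does not describe how the pellet region was selected.

```python
def mean_rgb(pixels, mask):
    """Average (R, G, B) over the pixels where mask is True.

    pixels: rows of (r, g, b) tuples; mask: rows of booleans (assumed layout).
    """
    total = [0.0, 0.0, 0.0]
    count = 0
    for row_px, row_mask in zip(pixels, mask):
        for px, keep in zip(row_px, row_mask):
            if keep:
                for c in range(3):
                    total[c] += px[c]
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return tuple(t / count for t in total)


# Made-up 2x2 image; only the top row is treated as the colored part.
pixels = [[(10, 20, 30), (30, 40, 50)],
          [(0, 0, 0), (255, 255, 255)]]
mask = [[True, True],
        [False, False]]
print(mean_rgb(pixels, mask))  # (20.0, 30.0, 40.0)
```

Each pellet image would then contribute one such RGB triple to the regression dataset.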

For each color, we selected the channels that are of interest for the problem. That is:
  • Nitrate (blue): Red channel
  • Nitrate (yellow): Blue channel
  • Copper: Red and Green channels
  • Mercury: Red and Green channels
  • Cadmium: Red, Green and Blue channels
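The channel selection listed above could be expressed as a small lookup table. The dictionary keys and the helper `select_features` are hypothetical names chosen for this sketch; only the channel assignments come from the list.

```python
# Channels used per biosensor, as listed above (0 = Red, 1 = Green, 2 = Blue).
CHANNELS = {
    "nitrate_blue": (0,),
    "nitrate_yellow": (2,),
    "copper": (0, 1),
    "mercury": (0, 1),
    "cadmium": (0, 1, 2),
}


def select_features(sensor, rgb):
    """Keep only the channels used for the given sensor."""
    return [rgb[i] for i in CHANNELS[sensor]]


# Made-up average pellet color, not a measured value.
example_rgb = (120.0, 80.0, 200.0)
print(select_features("copper", example_rgb))  # [120.0, 80.0]
```

The length of the returned list is the dimensionality of the regression problem for that sensor.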


Hence, the problem ranges from 1D to 3D regression, depending on the contaminant considered in each experiment. For each experiment, 5 or 6 samples were used:




Figure 3. Modeling a Nitrate biosensor. A) Input data; B) Regression model.




Figure 4. Modeling a Nitrate biosensor. A) Input data; B) Regression model.




Figure 5. Modeling a Copper biosensor. A) Input data; B) Regression model.




Figure 6. Modeling a Mercury biosensor. A) Input data; B) Regression model.




Figure 7. Modeling a Cadmium biosensor. A) Input data; B) Regression model.


It can be seen that the data is easily learnable by linear regression models in the case of nitrate (Figures 3 and 4). The accuracy in learning the copper (Figure 5) and mercury (Figure 6) models is also rather good (note that those figures are 1D representations of 2D models, and hence do not display the linearity of the model itself). The model for cadmium (Figure 7), however, was not successful: as can be seen from the images, the changes in color with respect to the concentration are hardly perceptible. Note also that the quantity learned is the square root of the actual concentration, in order to better distribute the measurements over the testing range.
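A minimal sketch of this kind of fit, assuming an ordinary least-squares model with an intercept and the square root of the concentration as the target. The helper names (`solve`, `fit_ols`, `predict`) and the data are invented for illustration and are not the team's code or measurements.

```python
import math


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Least-squares weights [intercept, w1, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Synthetic, perfectly linear 1D example (one channel); values are made up.
concentrations = [0.0, 1.0, 4.0, 9.0, 16.0]
red = [[200.0], [170.0], [140.0], [110.0], [80.0]]
targets = [math.sqrt(c) for c in concentrations]  # learn sqrt(concentration)

weights = fit_ols(red, targets)
sqrt_pred = predict(weights, [125.0])
# Undo the square-root transform to recover a concentration estimate.
concentration_pred = max(0.0, sqrt_pred) ** 2
print(round(concentration_pred, 3))
```

The same functions accept two or three channels per sample, which covers the 2D (copper, mercury) and 3D (cadmium) cases.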

As can be seen from the models displayed above, the error in model training is rather small. Specifically, the average errors are the following:

Contaminant         Avg. Error   Testing concentration range
Nitrate (Blue)      0.18         [0,4]
Nitrate (Yellow)    0.17         [0,4]
Copper              1.88         [0,22.36]
Mercury             0.96         [0,12]
Cadmium             0.38         [0,10]
Table 1. Error in the model training.


Hence, the data appears learnable from the perspective of both the visual models (Figures 3-7) and the error measurements (Table 1). Even for cadmium the error seems to be low, but that is misleading, since the visual display in Figure 7 shows that the data is clearly not learnable.

Note that the data could be fit better using other types of regression, specifically higher-order regression. However, this would undermine our main point here, which is proving that the data is learnable by simple means.

Machine Learning Model
We have shown that the data can be learned by a simple linear regression model (OLS), but to obtain more accurate predictions the model should be trained on more data. These extra data are extracted from images of pellets induced with a known concentration of each specific substance.

We consider every channel for each specific substance: since we do not have a large amount of data, we consider that dimensionality reduction is not necessary to reduce the computational complexity of our model. Nevertheless, in order to present a complete study, we apply a dimensionality reduction (Principal Component Analysis) in the next section.

To test our model, we applied cross-validation. We divided our data into n = 4 partitions, each with 25% of the data. In each round, n-1 partitions are used as training data and the remaining one as test data. Every partition is selected once as the test data, so training and testing are done 4 times.
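The 4-fold scheme described above can be sketched as an index generator. How samples were assigned to partitions is not stated on this page, so the interleaved assignment used here, like the function name `k_fold_indices`, is an assumption.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    # Assumed assignment: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 24 samples, as for the nitrate and mercury datasets: 18 train / 6 test per round.
splits = list(k_fold_indices(24, k=4))
for train, test in splits:
    print(len(train), len(test))
```

In each round the model would be fit on the `train` indices and scored on the `test` indices, and the four scores averaged.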

For each specific substance we collected data from 4 different samplings of each concentration, taken under different imaging conditions. Hence, there are 24 data points for Nitrate (Blue), Nitrate (Yellow) and Mercury, with 6 different concentrations each, and 20 for Copper, with 5 different concentrations:



Figure 8. Modeling a Nitrate biosensor.


Figure 9. Modeling a Nitrate biosensor.


Figure 10. Modeling a Copper biosensor.


Figure 11. Modeling a Mercury biosensor.


Applying our model, the R2 (mean of the R2 over the 4 samplings) and the average error (difference between the actual value and the predicted value) are the following:

Contaminant         R2      Avg. Error   Testing concentration range
Nitrate (Blue)      0.930   0.289        [0,4]
Nitrate (Yellow)    0.949   0.236        [0,4]
Copper              0.833   2.520        [0,22.36]
Mercury             0.760   2.137        [0,15.5]
Table 2. Error in the model during cross validation.
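For reference, the two metrics reported in Table 2 could be computed as sketched below. Reading "average error" as the mean absolute difference between actual and predicted values is an assumption, and the numbers in the example are made up rather than taken from the experiments.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def average_error(actual, predicted):
    """Mean absolute difference (assumed definition of 'average error')."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values in square-root concentration units.
actual = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.5, 1.0, 2.0, 3.0, 3.5]
r2 = r2_score(actual, predicted)
err = average_error(actual, predicted)
print(round(r2, 3), round(err, 3))  # 0.95 0.2
```

Under cross-validation, both metrics would be evaluated on each test partition and then averaged over the 4 rounds.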


We can conclude that, provided with more data, our model could predict the concentration of each specific substance with reliable results. In our case, the errors measured (Table 2) are due to the lack of more concentration values uniformly distributed over the testing range. Nevertheless, we demonstrated that the data is easily learnable by linear regression models (Figures 8-11), which yield accurate predicted values.

Conclusion
As a general conclusion, we have shown that the data gathered in the experiment is learnable; hence the product (colored material) is sensitive to, and gradual with respect to, the contaminant concentration in water. This means that our biological system can be used to detect contamination in water, as well as to measure the severity of such contamination.


Contact us


equipo.igem@unavarra.es

Avenida de Pamplona 123, Mutilva
31192 Navarra, España
