What's the impact?

Proteins have become a staple in the iGEM community, but iGEM teams have few tools for understanding how their proteins behave at the atomic level. We wanted to give other teams a quantitative way to characterize each amino acid of their proteins. Understanding the dynamics of a protein allows for more informed protein engineering and use.

Figure 1: An example of a dynamic protein model.


What did we quantify?

To assist the dynamic characterization of proteins by other teams, we developed a methodology for calculating and aggregating Brownian motion measurements for each amino acid in a sequence. Brownian motion refers to the erratic movement of particles in a fluid. The measurement we chose was the Root Mean Square Fluctuation (RMSF), calculated over ten-picosecond intervals for every atom of a protein. RMSF measures how far each atom deviates, on average, from its mean position, and so reflects the torsion and flexion experienced by that atom. This makes RMSF a useful indicator of the flexibility of a protein and of the changes it undergoes in a realistic environment.
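To make the definition concrete, the following is a minimal sketch of the RMSF of a single atom over one interval. It is an illustration only, not the GROMACS implementation: the function name `rmsf` and the toy coordinates are hypothetical, and the frames are assumed to have already been aligned to a reference structure so that overall rotation and translation of the protein do not inflate the result.

```python
import math

def rmsf(positions):
    """Return the RMSF (nm) of one atom from a list of (x, y, z) positions in nm."""
    n = len(positions)
    # Time-averaged position of the atom over the interval.
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean squared deviation from that average position.
    msd = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return math.sqrt(msd)

# Hypothetical atom that oscillates by +/- 0.1 nm along x around x = 1.1 nm.
frames = [(1.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
print(round(rmsf(frames), 3))  # prints 0.1
```

An atom that never moves has an RMSF of zero, and larger values indicate a more flexible position in the structure.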

The RMSF data was calculated from a one-nanosecond Molecular Dynamics Simulation (MDS) run in GROMACS, a widely used MDS software package. This MDS supplies the data used to characterize the protein. Because the data is computational in origin, teams can generate it without extensive laboratory experience or equipment. Within GROMACS, the per-atom RMSF values are grouped by the amino acid they belong to, and an average over all atoms of each amino acid is taken. This was done so that the unit of observation is one that can be modified in the lab. These values are computed every ten picoseconds, resulting in one hundred measurements per amino acid. The measurements are then aggregated to form a unique time series for every amino acid. When every amino acid of the protein is plotted together, the result is a dense block of time series representing the overall trend of the protein. Below is the trend generated to characterize the total dynamics of ModGIX.

Figure 2: RMSF characterization of the ModGIX protein.
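The aggregation step described above (averaging per-atom RMSF over each amino acid, then collecting one value per ten-picosecond interval) can be sketched as follows. This is a hedged illustration rather than our actual pipeline: the function names, the `atom_residue` mapping, and the toy numbers are all hypothetical, and the per-atom values are assumed to have already been exported from the simulation software. With a one-nanosecond run and ten-picosecond intervals, `windows` would hold one hundred entries.

```python
from collections import defaultdict

def residue_average(atom_rmsf, atom_residue):
    """Average per-atom RMSF values (nm) over the atoms of each residue."""
    grouped = defaultdict(list)
    for value, res in zip(atom_rmsf, atom_residue):
        grouped[res].append(value)
    return {res: sum(vals) / len(vals) for res, vals in grouped.items()}

def residue_time_series(windows, atom_residue):
    """Build one RMSF time series per residue.

    windows: list of per-atom RMSF lists, one list per time interval.
    atom_residue: residue number of each atom, in the same atom order.
    """
    series = defaultdict(list)
    for atom_rmsf in windows:
        for res, avg in residue_average(atom_rmsf, atom_residue).items():
            series[res].append(avg)
    return dict(series)

# Hypothetical toy data: 4 atoms in 2 residues, 3 time intervals.
atom_residue = [1, 1, 2, 2]
windows = [
    [0.10, 0.20, 0.05, 0.07],
    [0.12, 0.18, 0.06, 0.06],
    [0.30, 0.10, 0.04, 0.08],
]
series = residue_time_series(windows, atom_residue)
for res, curve in series.items():
    print(res, [round(v, 3) for v in curve])
```

Each entry of `series` is the curve for one amino acid, so plotting every entry on the same axes gives the kind of dense block of time series described in the text.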

Figure 2 gives a complete view of the movement attributed to every amino acid of the ModGIX protein. Using these curves, a team can observe the upper and lower limits of their protein's overall dynamics. In the case of ModGIX, the upper limit for the bulk of the protein is a reasonable 0.3 nm, which is indicative of an active protein buffering the dynamics of a simulation. The lower limit is around 0.05 nm, which is extremely low and indicates that the protein is highly stable within its most stable regions. Viewing all of the curves at once is useful, but teams interested in specific amino acids may find the single amino acid curves more helpful. RMSF curves for a small selection of amino acids allow teams to characterize and identify amino acids that are believed to be problematic for the overall structure. Below are the dynamics of the 25th, 80th, and 90th amino acids of the 6GIX protein.

Figure 3: Molecular dynamics of the 25th, 80th, and 90th amino acids in the 6GIX protein.

These three amino acids demonstrate how isolating individual amino acids can offer unique insight into the subtle changes experienced by the protein. The general methodology for obtaining this measurement is available here.
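Reading the upper and lower limits off a set of curves, and pulling out a few amino acids for closer inspection, can also be done numerically. The sketch below assumes the curves are stored as a dictionary from residue number to a list of RMSF values in nm; the function names and the toy values are hypothetical and are not taken from our data.

```python
def dynamics_range(series):
    """Return (lowest, highest) RMSF in nm over all residues and time points."""
    values = [v for curve in series.values() for v in curve]
    return min(values), max(values)

def select_residues(series, residue_ids):
    """Pull out the curves of a few residues of interest."""
    return {res: series[res] for res in residue_ids}

def most_flexible(series, top=1):
    """Rank residues by mean RMSF, most flexible first."""
    ranked = sorted(
        series,
        key=lambda res: sum(series[res]) / len(series[res]),
        reverse=True,
    )
    return ranked[:top]

# Hypothetical curves for three residues (values in nm).
toy_series = {
    25: [0.05, 0.06, 0.05],
    80: [0.20, 0.30, 0.25],
    90: [0.10, 0.12, 0.11],
}
print(dynamics_range(toy_series))         # prints (0.05, 0.3)
print(most_flexible(toy_series, top=2))   # prints [80, 90]
print(select_residues(toy_series, [25]))  # prints {25: [0.05, 0.06, 0.05]}
```

Ranking by mean RMSF is only one possible choice; a team could equally rank by peak value or by variance over time, depending on what kind of flexibility is of interest.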



After characterizing a protein by its RMSF values, there are numerous possibilities for further analysis and modification. In our case, the measurement was used in the development of ModGIX, our modified chlorophyll-binding protein. ModGIX was developed using functional Principal Component Analysis (fPCA), which was made possible by the functional properties of the RMSF curves. The results were then clustered using an expectation-maximization algorithm. The clusters obtained allowed us to determine which amino acids contributed the highest variance from the crystal structure. This analysis was made possible by the measurement of RMSF. Read more about the development of ModGIX here.
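As a rough illustration of this kind of analysis, the sketch below scores each curve on its first principal component and then splits the scores into two groups with expectation maximization. It is a simplified stand-in and not the method used for ModGIX: it applies ordinary PCA to curves sampled on a common time grid (fPCA would normally first represent the curves with smooth basis functions), keeps only one component, and assumes exactly two clusters in a one-dimensional Gaussian mixture. All names and data are hypothetical.

```python
import math

def first_pc_scores(curves, iterations=200):
    """Score each sampled curve on the first principal component."""
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centred = [[c[j] - mean[j] for j in range(t)] for c in curves]
    v = [1.0] * t  # starting direction for power iteration
    for _ in range(iterations):
        # Multiply v by X^T X without forming the covariance matrix.
        proj = [sum(row[j] * v[j] for j in range(t)) for row in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all curves identical: no variance to explain
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(t)) for row in centred]

def em_two_clusters(x, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM; return 0/1 labels."""
    centre = sum(x) / len(x)
    spread = max(min_var, sum((xi - centre) ** 2 for xi in x) / len(x))
    mu, var, weight = [min(x), max(x)], [spread, spread], [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for xi in x:
            dens = [
                weight[k]
                * math.exp(-((xi - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weight[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(
                min_var,
                sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk,
            )
    return [0 if r[0] >= r[1] else 1 for r in resp]

# Hypothetical curves (nm): three rigid residues, then two flexible ones.
toy_curves = [
    [0.05, 0.06, 0.05, 0.06, 0.05],
    [0.06, 0.05, 0.06, 0.05, 0.06],
    [0.05, 0.05, 0.06, 0.06, 0.05],
    [0.28, 0.30, 0.31, 0.29, 0.30],
    [0.30, 0.29, 0.30, 0.31, 0.28],
]
labels = em_two_clusters(first_pc_scores(toy_curves))
print(labels)
```

On this toy data the three rigid curves fall into one cluster and the two flexible curves into the other. With real RMSF curves, the number of components and clusters would need to be chosen from the data rather than fixed in advance.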

This measurement also gave our team a metric for the stress exerted on the protein in our system, which allowed us to identify the amino acids responsible for the problematic flexibility. Through RMSF curves, teams are able to characterize their proteins based on atomic movements, thereby generating meaningful representations of their protein's dynamics. Opening this door allows teams to make informed modifications to proteins and develop a nanoscale understanding of their system.

This measurement has been successfully conducted for the characterization of our 6GIX protein (BBa_K3114006) and our ModGIX protein (BBa_K3114006).


Lemkul J.A. (2018). "From Proteins to Perturbed Hamiltonians: A Suite of Tutorials for the GROMACS-2018 Molecular Simulation Package, v1.0" Living J. Comp. Mol. Sci. In Press.

Abraham M.J., van der Spoel D., Lindahl E., Hess B., and the GROMACS development team (2018). GROMACS User Manual version,

Palm, D. M., Agostini, A., Averesch, V., Girr, P., Werwie, M., Takahashi, S., . . . Paulsen, H. (2018). Chlorophyll a/b binding-specificity in water-soluble chlorophyll protein. Nature Plants, 4(11), 920-929.

Páll, S., Abraham, M. J., Kutzner, C., Hess, B., Lindahl, E. (2015). Tackling exascale software challenges in molecular dynamics simulations with GROMACS. In: Markidis, S., Laure, E. (eds.), Solving Software Challenges for Exascale. Vol. 8759. Springer International Publishing Switzerland, London, 3–27.