Team:MADRID UCM/Model


MODELING

This page was created specifically for the modeling award. The information here consists of our answers to the key questions that define a good modeling project.

However, our aptamer folding software has many more features than those explained here. If you find this content interesting, we encourage you to explore our aptamer folding software in more depth on its dedicated webpage.

1 How impressive is the modeling?

One of the main challenges of the SELEX process is selecting the initial library of aptamer molecules, as well as the molecules that take part in each round. Traditional experimental approaches, such as crystallography or nuclear magnetic resonance, are unsuitable because of the enormous amount of time and money they require. That is why we decided to use computational folding, a technique that is becoming increasingly important. It remains challenging, however, because of the huge number of possible sequence combinations and because simulating three-dimensional structures is currently very computationally demanding.
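
To make the "huge number of possible sequence combinations" concrete: each position of a randomized region can hold one of four nucleotides, so a region of n nucleotides can encode 4^n distinct sequences. The short sketch below assumes a 40-nucleotide random region purely for illustration; this page does not state the library length actually used, and the function name is hypothetical.

```python
# Number of distinct sequences in a randomized nucleic-acid region:
# each position can be one of 4 nucleotides, so there are 4**n sequences.
def library_size(random_region_length: int) -> int:
    """Return how many distinct sequences a random region of this length can encode."""
    if random_region_length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** random_region_length


if __name__ == "__main__":
    # Illustrative length only (an assumption, not the team's library design).
    n = 40
    print(f"{n}-nt random region: {library_size(n):.3e} possible sequences")
```

For n = 40 this gives 4^40 = 2^80, roughly 1.2 × 10^24 sequences, which is far too many to fold and score one by one.
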
Several software tools already exist, such as AlphaFold (limited to protein folding), Mfold (which computes only the secondary structure) and Rosetta (which predicts a fold for a given aptamer but does not select the best one). However, none of them can correctly compute the 3D structure of aptamers.
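
Secondary-structure tools such as Mfold and ViennaRNA describe a fold as a set of base pairs. ViennaRNA writes it in dot-bracket notation, where matching parentheses mark paired nucleotides and dots mark unpaired ones. The minimal sketch below (function name hypothetical) converts such a string into base-pair indices. It shows what this level of description leaves out: there are no 3D coordinates at all. It handles only round brackets, so pseudoknot notations are out of scope.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Convert a dot-bracket string into a sorted list of (i, j) base pairs (0-based)."""
    stack: list[int] = []               # positions of '(' still waiting for a partner
    pairs: list[tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # close the most recent open bracket
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dot_bracket_pairs("((..))"))  # [(0, 5), (1, 4)]
```
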

We therefore propose an artificial-intelligence approach that combines ideas from these three programs: a tool, written in Python and scalable to other biomedical molecules, that can correctly fold aptamers. The concept is simple: a generative adversarial network (GAN) improves the structure of aptamers. The algorithm uses two neural networks that act as student and teacher (in GAN terms, the generator and the discriminator). The student learns to generate aptamers, and the teacher gives feedback that improves how the student folds them.
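
The student/teacher scheme is the standard GAN game. As a minimal sketch, and not the team's actual training code, the two objectives can be written in plain Python. The teacher (discriminator) is rewarded for rating real, well-folded structures close to 1 and generated ones close to 0. The student (generator) is rewarded when its candidates are rated as real. The inputs are assumed to be probabilities already output by the discriminator, and the function names are hypothetical.

```python
import math

# Standard GAN losses over plain lists of discriminator probabilities.
# d_real: teacher (discriminator) outputs on real, well-folded examples.
# d_fake: teacher outputs on candidates produced by the student (generator).
EPS = 1e-12  # keeps log() away from log(0)


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Binary cross-entropy: low when real examples score high and generated ones score low."""
    real_term = sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake: list[float]) -> float:
    """Non-saturating generator loss: low when the teacher rates generated candidates as real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A teacher that cannot tell real from generated outputs 0.5 everywhere.
    print(round(discriminator_loss([0.5], [0.5]), 4))  # 1.3863
    print(round(generator_loss([0.5]), 4))             # 0.6931
```

In a real training loop these losses would be computed on network outputs inside a deep-learning framework (for example with PyTorch's `torch.nn.BCELoss`), alternating teacher and student updates.
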

2 Did the model help the team understand a part, device, or system?

Yes, the model helped the team understand aptamer folding, the parts involved in these 3D structures, and the docking of the aptamer to its target protein.

Because the model reduces the number of steps in the SELEX process by improving the algorithms involved, it also helped the team better understand the SELEX process and its initial steps, how to improve them, and how to apply computational tools to biological problems.

The model also deepened our understanding of artificial-intelligence algorithms, such as the GAN used for aptamer folding, and of existing software such as Rosetta and ViennaRNA. The team learnt how Rosetta scoring works, how low-energy conformations of proteins are calculated, how DNA and RNA are represented computationally, and how to use and manipulate formats such as FASTA, PDB and secondary-structure files. The team also studied optimization techniques, such as running jobs from the terminal and using threads, to improve the code and reduce its computation time.
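
As an illustration of two of these skills, the minimal sketch below reads sequences from FASTA text and scores them with a thread pool. The scoring function is a hypothetical stand-in (G/C fraction), not the team's Rosetta-based score, and all function names are assumptions.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; a sequence may span several lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines before any header are ignored
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_fraction(sequence: str) -> float:
    """Hypothetical stand-in score: fraction of G/C nucleotides in the sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def score_all(records: dict[str, str], workers: int = 4) -> dict[str, float]:
    """Score every sequence using a pool of worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(gc_fraction, records.values()))
    return dict(zip(records.keys(), scores))


if __name__ == "__main__":
    fasta_text = ">apt1\nGGGCC\nAAU\n>apt2\nAUAU\n"
    records = read_fasta(io.StringIO(fasta_text))
    print(records)             # {'apt1': 'GGGCCAAU', 'apt2': 'AUAU'}
    print(score_all(records))  # {'apt1': 0.625, 'apt2': 0.0}
```

A note on the design choice: in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock. They pay off when each task waits on an external program, such as a Rosetta or ViennaRNA executable launched through `subprocess`. For CPU-bound Python code, `ProcessPoolExecutor` is the usual alternative.
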

Finally, the team came to understand the structure of aptamers and their key regions, how to refine them computationally so that simulations reflect how aptamers behave in nature, and how to model them and apply this knowledge toward their objectives.

3 Did the team use measurements of a part, device, or system to develop the model?

Yes. We used Rosetta scoring, one of the most widely used tools for evaluating protein folding, to evaluate our results. We evaluated results at three stages: during database creation, during GAN training, and when testing our networks. These checks provide measurements that track how the aptamers improve throughout the process. Both our measurements and the Rosetta score are based on the free energy of the three-dimensional fold, so lower values are better (they indicate greater stability).
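
Because lower scores mean more stable folds, selecting candidates amounts to sorting in ascending order, and improvement between two stages is a drop in the mean score. The minimal sketch below shows both ideas. The function names and scores are made-up placeholders, not results from the project.

```python
def rank_by_score(candidates: dict[str, float], keep: int) -> list[tuple[str, float]]:
    """Return the `keep` candidates with the lowest (most stable) scores, best first."""
    return sorted(candidates.items(), key=lambda item: item[1])[:keep]


def mean_improvement(before: list[float], after: list[float]) -> float:
    """Mean score before minus mean score after; a positive value means more stable."""
    return sum(before) / len(before) - sum(after) / len(after)


if __name__ == "__main__":
    # Placeholder scores for illustration only, not results from the project.
    scores = {"seqA": -12.4, "seqB": -30.1, "seqC": -5.0}
    print(rank_by_score(scores, keep=2))                     # [('seqB', -30.1), ('seqA', -12.4)]
    print(mean_improvement([-10.0, -20.0], [-25.0, -35.0]))  # 15.0
```
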
We also measured running times to check that our algorithm takes less time than previous algorithms, using Python timing tools to measure and compare execution times across implementations.
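
A minimal sketch of this kind of comparison uses `time.perf_counter` and keeps the fastest of several runs, which is the run least affected by background activity. The two G/C-counting functions are hypothetical examples of "two implementations of the same task". The standard `timeit` module offers the same pattern ready-made.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Call `func` `repeats` times and return the fastest wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)  # the minimum is least affected by background noise


def gc_loop(sequence: str) -> int:
    """Count G/C nucleotides with an explicit Python loop."""
    count = 0
    for base in sequence:
        if base in "GC":
            count += 1
    return count


def gc_builtin(sequence: str) -> int:
    """Count G/C nucleotides with the built-in str.count method."""
    return sequence.count("G") + sequence.count("C")


if __name__ == "__main__":
    seq = "GCAU" * 50_000
    assert gc_loop(seq) == gc_builtin(seq)  # both versions must agree before comparing speed
    print(f"loop:    {best_time(lambda: gc_loop(seq)):.6f} s")
    print(f"builtin: {best_time(lambda: gc_builtin(seq)):.6f} s")
```
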

4 Does the modeling approach provide a good example for others?

Yes. Because our software is written in Python, it is easily accessible to the whole scientific and programming community. Database creation works on several operating systems, such as macOS and Linux, and the GAN components run on any computer with the required Python dependencies. We also uploaded the database to the team's GitHub to help users with limited computational power and to spare them from generating the database themselves.
The GAN runs on any computer with the required Python dependencies, provided that Rosetta (or the open-source PyRosetta) and ViennaRNA (open-source) are installed. The computational requirements are modest. The GAN can run on computers with little memory because its convolutional neural networks (CNNs) operate on compact numerical data rather than on images. Each database entry consists only of a sequence, a matrix of nucleotide angles in degrees (5 per nucleotide), and a score (a single number).
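
To show why each entry is so small, here is a minimal sketch of one record as described above: a sequence, an angle matrix with five values per nucleotide, and a single score. The class and field names are hypothetical and do not reflect the actual file format in the team's repository. For a 40-nucleotide aptamer, the matrix holds 40 × 5 = 200 numbers.

```python
from dataclasses import dataclass

ANGLES_PER_NUCLEOTIDE = 5  # the page states five angle values per nucleotide


@dataclass(frozen=True)
class AptamerEntry:
    """Hypothetical layout of one database record."""
    sequence: str                          # nucleotide sequence, e.g. "ACGU"
    angles: tuple[tuple[float, ...], ...]  # one row of angles (degrees) per nucleotide
    score: float                           # single free-energy-based score; lower is better

    def __post_init__(self) -> None:
        # Reject records whose angle matrix does not match the sequence.
        if len(self.angles) != len(self.sequence):
            raise ValueError("expected one row of angles per nucleotide")
        for row in self.angles:
            if len(row) != ANGLES_PER_NUCLEOTIDE:
                raise ValueError(f"each row must contain {ANGLES_PER_NUCLEOTIDE} angles")

    def matrix_size(self) -> int:
        """Number of angle values in this entry (sequence length x 5)."""
        return len(self.sequence) * ANGLES_PER_NUCLEOTIDE


if __name__ == "__main__":
    # Placeholder values only.
    entry = AptamerEntry("ACG", ((0.0,) * 5,) * 3, -7.5)
    print(entry.matrix_size())  # 15
```
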
The uploaded code is fully commented, and each part is written to be understandable so that future programmers can modify the parts they need and improve the code.

In essence, the code is straightforward to run, and all of its components are understandable and easy to reprogram. The software addresses an existing problem in the development and improvement of aptamers, allowing us to shorten the SELEX process and so greatly reduce the time required compared with experimental techniques.