Team:CSMU Taiwan/Model


I. Target protein modeling

1. Sequence Searching

Before beginning the experiment, we searched the NCBI database for nucleoprotein information. After comparing the influenza virus types circulating each year and the severity of their outbreaks, we chose the following two proteins as target candidates:

Nucleocapsid protein of influenza A virus (A/Michigan/297/2017(H1N1))
>Accession: AUH28512.1

Nucleoprotein of influenza B virus (B/Colorado/16/2017)
>Accession: ASK82205.1

(Hereinafter referred to as NPA and NPB, respectively.)

An effective biomarker should be distinguishable from other, similar proteins. A structural model of each target protein is useful for analyzing the differences between the two targets and for providing visual evidence of them.

2. Sequence Alignment

BLAST (Basic Local Alignment Search Tool) is one of the most widely used sequence alignment algorithms. It can be used to search the NCBI database for similar sequences, and it can also align two sequences to measure their similarity.

We ran the blastp program, which uses the BLAST algorithm to align protein sequences. The result showed that the identity between the two sequences is 38% (see Fig. 1). This indicates substantial variation between NPA and NPB at the primary-structure level, but we still needed to check their tertiary structures, which are what the aptamers actually interact with. Therefore, the next step was to obtain 3D structural coordinates for alignment.

Figure 1. The report of the sequence alignment between two chosen proteins.
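To make the identity figure concrete, percent identity can be computed from an exported pairwise alignment as the share of alignment columns that hold the same residue. The following is only a minimal sketch: the function name `percent_identity` and the toy sequences are illustrative assumptions, not part of our workflow, and the denominator is assumed to be the full alignment length (gap columns included).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both strings must have the same length and use '-' for gaps.
    The denominator is the full alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical sequences, not NPA or NPB).
print(percent_identity("MASQ-GTK", "MATQEGSK"))
```

A value computed this way depends on the alignment itself, so it is only comparable with the BLAST report when the same aligned region is used.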




3. Homology Structure modeling

After searching the RCSB Protein Data Bank for NPA and NPB, we found that no crystal structure was available for either protein. The best option was therefore to build models from existing PDB data. The method we used is called homology modeling: the target structure is predicted from the crystal structure of a similar protein in the database.

The SWISS-MODEL online tool was used to complete the task with the following procedure:


  1. Upload the peptide sequence file of the target protein.
  2. Search for proteins that match the target sequence.
  3. Choose the one with the highest identity score as the template.
  4. Build the structural model and download the PDB file.

Document 1. The report of the homology modeling of NPA.
Document 2. The report of the homology modeling of NPB.


4. Structure Alignment

After the PDB files of the target proteins were obtained, we used SWISS PDB Viewer (SPDBV) to analyze the structural differences between them and to visualize the result. SPDBV can superimpose two PDB structures with its “magic fit” function and compute the root-mean-square deviation (RMSD), a measure of the average distance between equivalent atoms of the two tertiary structures. The equation is presented below:

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}$$

where n is the number of pairs of equivalent atoms and d_i is the distance between the two atoms in the i-th pair (see Ref. 1).
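The RMSD definition above (n pairs of equivalent atoms, d_i the distance within the i-th pair) can be sketched in a few lines. This is an illustration under stated assumptions: the function name `rmsd` and the coordinates are hypothetical, and the code only evaluates the formula on coordinates that are already superimposed. It does not perform the fitting step that SPDBV's “magic fit” carries out.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples,
    where item i of each list is one pair of equivalent atoms.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # d_i squared: squared Euclidean distance within the i-th pair
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two atom pairs separated by 3 and 4 (hypothetical coordinates).
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(structure_a, structure_b))
```

Because the deviations are squared before averaging, a few badly matched atom pairs raise the RMSD more than they would raise a plain average distance.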



The NPA and NPB structures were aligned by their backbone atoms, and the computed RMSD was 36.41 Å. This model suggests that the two proteins differ greatly in structure, which supports their use as distinct targets in our experiment. As the next step, the sequences of NPA and NPB were inserted into an E. coli expression system for recombinant protein production.

Figure 2. The model of structure alignment by Swiss PDB viewer.

II. Optimization of Protein Sequence

After a series of preprocessing steps, we started producing NPA and NPB. However, the protein yield was consistently low and could not supply enough protein for SELEX. To solve this problem, the DNA sequences were optimized with online tools, after which the protein yield increased significantly. The method we used for optimization is described below.

Because E. coli grew normally, we suspected that the problem lay in the sequence inserted into the plasmid. Differences in codon usage between eukaryotes and prokaryotes, or the effect of a signal peptide, might cause the low protein yield.

ATGme is an online tool that replaces rare codons with commonly used ones according to the codon usage of the expression system. We used it to replace the rare codons in the target sequences. As shown in Fig. 3 and Fig. 4, most rare codons were replaced while the peptide sequence stayed the same. ATGme was also used to check for unexpected restriction sites. PstI sites were found in both the NPA and NPB sequences, and the codons containing them were replaced so that the restriction enzyme reaction would not affect protein expression. After this procedure, codon usage should no longer be a barrier to protein expression.
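The two checks described above, synonymous replacement of rare codons and a scan for PstI sites, can be illustrated with a short sketch. It is not how ATGme works internally. The replacement table is a small hand-picked assumption (a few codons commonly regarded as rare in E. coli, each mapped to a synonymous frequent codon), and the function names are hypothetical. The PstI recognition sequence CTGCAG is the only enzyme-specific fact used.

```python
# Toy table (assumption): rare E. coli codon -> synonymous frequent codon.
RARE_TO_COMMON = {
    "AGA": "CGT", "AGG": "CGT", "CGA": "CGT",  # Arg
    "ATA": "ATT",                               # Ile
    "CTA": "CTG",                               # Leu
    "CCC": "CCG",                               # Pro
}
PSTI_SITE = "CTGCAG"


def replace_rare_codons(cds: str) -> str:
    """Swap rare codons for synonymous ones; the encoded peptide is unchanged."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(RARE_TO_COMMON.get(codon, codon) for codon in codons)


def find_sites(seq: str, site: str = PSTI_SITE) -> list:
    """Return 0-based start positions of every occurrence of a restriction site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


optimized = replace_rare_codons("ATGAGAATACTATAA")
print(optimized, find_sites(optimized))
```

In this sketch the site scan runs on the sequence after replacement, because a codon swap can itself create a site: with this toy table, CTA followed by CAG becomes CTGCAG.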

Figure 3. The codon optimization report of NPA and NPB.
(Codon usage: orange < 10%; red < 5%)

Figure 4. The codon optimization report of NPA and NPB.
(Codon usage: orange < 10%; red < 5%)



Next, the SignalP-5.0 server was used to check for a signal peptide in the target protein sequences. A signal peptide marks a secretory protein and helps the cell transport it to the correct location, so it might affect recombinant protein production and purification. SignalP-5.0 analyzes a peptide sequence and computes the probability that a signal peptide is present.

The optimized DNA sequences were translated into peptide sequences with the Translate tool from ExPASy and submitted to SignalP-5.0 to estimate the probability that a signal peptide is present. The results are presented in Fig. 5 and Fig. 6, where the “OTHER” group in the chart means no signal peptide. The probabilities of no signal peptide in NPA and NPB were 0.9988 and 0.9996 (99.88% and 99.96%), respectively.

After that, protein production was restarted with the optimized sequences, and the improvement in yield is presented in Results.

Figure 5. The signal peptide analysis report of the NPA peptide sequence.
(Sec/SPI: Sec signal peptide; CS: the cleavage site; OTHER: the probability that the sequence does not have any kind of signal peptide.)

Figure 6. The signal peptide analysis report of the NPB peptide sequence.
(Sec/SPI: Sec signal peptide; CS: the cleavage site; OTHER: the probability that the sequence does not have any kind of signal peptide.)

III. Aptamer 3D Structure Modeling

After 6 rounds of SELEX, the selected aptamers were sent to MISSION BIOTECH Inc. for sequencing. The sequencing results were analyzed with BLAST and used to build structural models.

1. BLAST Analysis

Four types of aptamer were sent for sequencing: the aptamer of NPA (AptNPA), the aptamer of NPB (AptNPB), the aptamer of HA1 (AptHA1), and the aptamer of HA3 (AptHA3). Sequencing returned about 8-10 sequences for each type.

These sequences were analyzed with BLAST as follows. First, the forward and reverse primers were marked to locate the aptamer sequence. Then, the length of every aptamer was checked, and aptamers of abnormal length were discarded. The remaining aptamers became the templates for structure modeling.
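The primer-marking and length-filtering steps can also be expressed as a small script. This sketch is not the BLAST-based procedure we used. It relies on plain exact string matching, and the function names and the short primer sequences in the example are hypothetical placeholders rather than our real primers.

```python
def extract_aptamer(read, fwd_primer, rev_site):
    """Return the segment from the forward primer through the reverse primer site.

    rev_site is the reverse primer region as it appears on the read strand.
    Returns None if either primer region cannot be found by exact matching.
    """
    read = read.upper()
    fwd_primer = fwd_primer.upper()
    rev_site = rev_site.upper()
    start = read.find(fwd_primer)
    if start == -1:
        return None
    end = read.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return read[start:end + len(rev_site)]


def filter_by_length(aptamers, expected_length):
    """Keep only aptamers whose length equals the expected full length."""
    return [a for a in aptamers if a is not None and len(a) == expected_length]


# Hypothetical 4-nt "primers" and reads, for illustration only.
reads = ["GGACGTCCCCCCTTAAGG", "GGACGTCCTTAAGG", "GGGGGGGG"]
candidates = [extract_aptamer(r, "ACGT", "TTAA") for r in reads]
print(filter_by_length(candidates, expected_length=14))
```

For the library described on this page, `expected_length` would be 87, the full aptamer length including both primer regions.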

2. DNA Secondary Structure Modeling

The aptamer sequences were submitted to the Mfold server, which predicts the secondary (2D) structure of DNA and RNA at a specified temperature and ion concentration.

Although Mfold supports both DNA and RNA, we still needed to convert the DNA sequences into RNA form, because the online tool used in the next step to generate the tertiary structure file only offers RNA prediction.

The DNA strand was converted into RNA form with Nucleic Acid Converter. After the sequence was entered and the folding conditions were set, several models were generated, and the one with the lowest Gibbs free energy was selected as the most stable structure. The dot-bracket notation of this best model was used in the next step.
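The three operations in this step (writing the DNA sequence in RNA form, checking that a dot-bracket string is well formed, and picking the model with the lowest free energy) are simple enough to sketch directly. The function names and the example structures and energies below are hypothetical and are not Mfold output. The conversion shown is the same-strand T-to-U substitution, which is an assumption about what the converter does.

```python
def dna_to_rna(dna: str) -> str:
    """Write a DNA sequence in RNA form by replacing T with U (same strand)."""
    dna = dna.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return dna.replace("T", "U")


def is_balanced_dot_bracket(structure: str) -> bool:
    """Check that every '(' has a matching ')' and only '.', '(' and ')' occur."""
    depth = 0
    for char in structure:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a closing bracket before its opening partner
                return False
        elif char != ".":
            return False
    return depth == 0


def pick_most_stable(models):
    """Select the (dot_bracket, delta_g) pair with the lowest free energy."""
    return min(models, key=lambda model: model[1])


# Hypothetical candidate folds with free energies in kcal/mol.
models = [("((..))", -3.2), ("(....)", -1.1)]
best = pick_most_stable(models)
print(dna_to_rna("ATGCTT"), best, is_balanced_dot_bracket(best[0]))
```

A balance check like this catches copy-and-paste errors in the dot-bracket string before it is submitted to a 3D prediction server.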





3. DNA Tertiary Structure Modeling

After predicting the 2D structure of the aptamer, we could use the sequence and its dot-bracket notation to build a 3D structure. The output of tertiary structure prediction is a set of 3D coordinates in PDB format, and the PDB file can be visualized with SPDBV.

Initially, the FARFAR program on the ROSIE web server was tried for 3D modeling, but the server limits uploaded sequences to 32 nucleotides, and our aptamers were 87 nucleotides long, well over the limit.

We therefore asked other iGEM teams for suggestions and learned about other structural modeling tools. The RNAComposer server was chosen to complete this task. It operates on the RNA FRABASE database and predicts RNA 3D structure in a fully automated way. However, it can only predict RNA structures, which is why we needed to convert the target sequences into RNA form.

Finally, the 3D structures of AptNPA and AptHA3 were obtained and used for aptamer-protein docking to show how the aptamers bind to their targets.

Figure 9. The 3D structure homology model of AptNPA.
Figure 10. The 3D structure homology model of AptHA3.

IV. PCR Condition Modeling

During SELEX, the PCR products were analyzed after every round. We found that many by-products were generated in PCR. The by-products were always longer than 87 nt and grew longer with every PCR run. Worse still, within 1-2 rounds of SELEX the by-products seemed to replace the intended products and cause the normal aptamers to disappear.

To understand what happened in PCR, a series of experiments was designed and conducted to build a model that could serve as a guideline for SELEX. Many conditions were tested, and the results are summarized below.

Figure 11. The electrophoresis result of the random sequence diversity, template concentration, and dNTP concentration test.

Fig. 11 shows the DNA electrophoresis result after PCR. Three conditions were tested: template concentration, dNTP concentration, and sequence diversity.


  1. The material collected in the “wash” step of SELEX was defined as the low-diversity random sequence, and the “probe” (the raw material for aptamers) of SELEX was defined as the high-diversity random sequence.
  2. The DNA samples were diluted 5-fold, 25-fold, and 125-fold to test the effect of template concentration.
  3. The normal amount of dNTP (see the experiment protocol) and twice the normal amount (test group) were added to test the effect of dNTP concentration.



First, the by-product bands were more pronounced in the “probe” group than in the “wash” group. Second, the by-product signal decreased as the template concentration decreased, whereas the amount of dNTP seemed to have no effect on it.
We suspected that the random sequence was the cause of by-product generation. Because the random sequence cannot be controlled, we looked for other conditions that cause by-product generation. In this result, template concentration appeared to be a good target.

Figure 12. The electrophoresis result of template concentration test.

As shown in Fig. 12, the effect of template concentration was tested further. When the template amount was under 57 ng, no band other than the 87 bp band appeared; at 114 ng, a second band appeared; at 228 ng, a dark ladder appeared and the original band disappeared.

This result shows that template concentration is an important factor in by-product generation. In addition, the original band at 87 bp is replaced when the by-product is produced at a high level.

We suspected that heavy by-product formation might be the key to product replacement. To preserve the normal aptamers, avoiding by-product generation therefore seemed important. Controlling the template concentration is useful, but we also wanted to know whether other conditions have a similar effect.

Many other variables were tested, and the PCR cycle number was found to be another effective condition for controlling by-product generation.

Figure 13. The electrophoresis result of PCR cycle number test.

As shown in Fig. 13, the same sample was split into three groups, which were amplified for 10, 15, and 20 PCR cycles, respectively. No second band was found in the 10-cycle group, a faint second band appeared in the 15-cycle group, and a ladder appeared in the 20-cycle group. The result indicates that by-product generation becomes more pronounced as the number of PCR cycles increases.

From the condition tests above, the observed phenomena can be summarized as follows:

  1. The template concentration and the PCR cycle number significantly affect the production of by-product.
  2. The diversity of the random sequence may affect the production of by-product.
  3. The by-product can replace the original aptamer ssDNA and becomes longer during PCR.


To understand the reason for these phenomena, we searched the related literature and found some clues. Fabian Tolle’s research team (see Ref. 2) showed that the random region of a SELEX probe may contain primer-like sequences, which can interact with the primers during PCR and lead to incorrect DNA copies. The problem may become worse once incorrect copies have appeared.

Afterward, to get professional advice in the field of SELEX, we visited Dr. Yuh-Ling Chen at National Cheng Kung University, who shared her experience in running SELEX. Notably, she suggested keeping the number of PCR cycles under 15 (our original method used 30 cycles).

In conclusion, our findings in the PCR condition tests agreed with the research paper and with the advice from Dr. Yuh-Ling Chen. Based on the model built from this series of experiments, the PCR protocol was adjusted as listed below:


  1. The template concentration should be kept under a standard level. However, the DNA concentration in the elution product is too low to quantify. As an alternative, we add 1 µl of elution product to a 20 µl PCR reaction mixture (the original formula was 5 µl of elution product in a 20 µl mixture).
  2. The PCR cycle number should be kept under 15 cycles.
  3. If there is too little PCR product to run the next round of SELEX, take 3 µl of the product and run PCR again. If by-product appears, discard this PCR product and restart PCR from the elution product with a lower template concentration or cycle number.
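The adjusted protocol amounts to a small piece of arithmetic and a decision rule, sketched below. The function names are hypothetical, and checking for by-product before checking yield is an assumption about the order of the two conditions in item 3.

```python
def template_fraction(template_ul: float, reaction_ul: float) -> float:
    """Volume fraction of elution product in the PCR reaction mixture."""
    if reaction_ul <= 0 or not 0 <= template_ul <= reaction_ul:
        raise ValueError("volumes must satisfy 0 <= template <= reaction, reaction > 0")
    return template_ul / reaction_ul


def next_action(byproduct_visible: bool, enough_product: bool) -> str:
    """Decision rule from the adjusted protocol (by-product is checked first)."""
    if byproduct_visible:
        return "discard product; restart PCR from elution with less template or fewer cycles"
    if not enough_product:
        return "re-amplify 3 ul of the PCR product"
    return "proceed to the next SELEX round"


original = template_fraction(5, 20)   # original formula: 5 ul in 20 ul
adjusted = template_fraction(1, 20)   # new formula: 1 ul in 20 ul
print(original, adjusted, original / adjusted)
print(next_action(byproduct_visible=False, enough_product=False))
```

The new formula therefore lowers the template input to one fifth of the original while leaving the reaction volume unchanged.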

V. Aptamer-Protein Docking

After a series of ELISA tests, the most effective aptamers were selected and named AptNPA-4 and AptHA3-4. They were used to build docking models of aptamer-protein binding.

NPdock and GRAMM

The NPdock server was used to build the docking models. Its workflow combines GRAMM for global macromolecular docking, scoring with a statistical potential, clustering of the best-scored structures, and local refinement.

We chose NPdock because of GRAMM. GRAMM (Global RAnge Molecular Matching) is a program that predicts the structure of a complex from only the atomic coordinates of the two molecules. Its principle is to smooth the intermolecular energy function by changing the range of the atom-atom potentials, and thereby obtain the features of the complex (see Ref. 3).

Because of limited equipment and time, we could not determine the active residues in aptamer-protein binding experimentally. We therefore ruled out tools that require information about active and passive residues. Its ability to run a simulation from structure coordinates alone made NPdock the most suitable tool for us.

Structural Information Preparation

To start modeling, the structural information of the aptamer and the protein must be obtained. The DNA tertiary structure had been simulated and exported in PDB format as described above. As for the protein structures, the NPA structure had been modeled earlier in the project, and the crystal structure of HA3 could be found in the Protein Data Bank.

Although everything seemed ready, we still faced a problem: the NPdock server limits uploaded PDB files to 10000 atoms. The NPA structure contains 10974 atoms, so its PDB file could not be uploaded.

To meet the NPdock limit, we reduced the NPA structure to fewer than 10000 atoms. First, the template of the NPA homology model, whose structure is under 10000 atoms, was docked with AptNPA-4 as a substitute. Then, we referred to that docking result to edit the NPA PDB file: amino acids located at the ends of the sequence that did not interact with AptNPA-4 were deleted until the whole structure was under 10000 atoms.
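The trimming idea can be sketched as a script that counts atom records and removes whole residues from the ends of the structure until the count fits the limit. This is an illustration only and not the exact procedure we followed. The function names are hypothetical, the `protected` set stands in for residues judged to interact with the aptamer, residues are removed alternately from the last and the first position in file order, and the standard fixed-column PDB layout is assumed (chain ID in column 22, residue number in columns 23-26).

```python
from collections import Counter


def count_atoms(pdb_lines):
    """Count ATOM and HETATM records."""
    return sum(1 for line in pdb_lines if line.startswith(("ATOM", "HETATM")))


def residue_key(line):
    """(chain ID, residue number) read from the fixed PDB columns."""
    return (line[21], int(line[22:26]))


def trim_termini(pdb_lines, max_atoms, protected=frozenset()):
    """Drop terminal residues until at most max_atoms atom records remain.

    Residues are taken alternately from the end and the start of the file
    order; residues in `protected` are never removed. If both ends are
    protected, trimming stops even if the limit is not yet met.
    """
    atom_lines = [l for l in pdb_lines if l.startswith("ATOM")]
    order = list(dict.fromkeys(residue_key(l) for l in atom_lines))
    atoms_per_residue = Counter(residue_key(l) for l in atom_lines)
    total = count_atoms(pdb_lines)
    removed = set()
    low, high = 0, len(order) - 1
    take_from_end = True
    while total > max_atoms and low <= high:
        can_high = order[high] not in protected
        can_low = order[low] not in protected
        if not can_high and not can_low:
            break
        if (take_from_end and can_high) or not can_low:
            key = order[high]
            high -= 1
        else:
            key = order[low]
            low += 1
        removed.add(key)
        total -= atoms_per_residue[key]
        take_from_end = not take_from_end
    return [
        l for l in pdb_lines
        if not (l.startswith("ATOM") and residue_key(l) in removed)
    ]
```

With the figures from this page, the call would use `max_atoms=10000` on the 10974-atom NPA model. A trimmed file should still be inspected in a viewer before docking, since this sketch does not update TER records or atom serial numbers.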

Docking Structure Modeling

Finally, the PDB files were completed and our request was submitted to NPdock. After about 20 hours, the best-scored structure was refined from 20000 decoys generated with GRAMM. With this result, we can analyze the interactions between atoms in the docking model, and in the future optimize aptamer function by sequence mutagenesis or computational improvement.

Figure 14. Docking structural model of AptNPA with NPA. (visualization by iCn3D)
Figure 15. Docking structural model of AptHA3 with HA3. (visualization by iCn3D)






Reference

  1. Irina Kufareva and Ruben Abagyan. Methods of protein structure comparison. PMC, 2015, Feb 9. doi:10.1007/978-1-61779-588-6_10
  2. Fabian Tolle, Julian Wilke, Jesper Wengel, Günter Mayer. By-Product Formation in Repetitive PCR Amplification of DNA Libraries during SELEX. PLoS ONE, December 9, 2014. doi:10.1371/journal.pone.0114693
  3. GRAMM-Vakser Lab: GRAMM v1.03



Model Tools

  1. NCBI
    The database that provides access to biomedical and genomic information
  2. BLAST
    An algorithm for sequence alignment
  3. RCSB Protein Data Bank
    The database for the 3D structural data of biological molecules
  4. SWISS-MODEL
    A fully automated protein structure homology-modelling server
  5. SWISS PDB Viewer
    A software for multi-platform protein structure visualization
  6. ATGme
    An application for rare codon identification
  7. SignalP
    A server for signal peptide prediction
  8. Mfold
    The server for optimal and suboptimal secondary structure prediction for RNA or DNA
  9. Nucleic Acid Converter
    A server that can convert DNA and mRNA sequences and find restriction sites
  10. ROSIE FARFAR Protocol
    The main RNA structure modeling algorithm in Rosetta
  11. RNAComposer
    A fully automated RNA structure modeling server
  12. NPdock
    A web server for modeling of RNA-protein and DNA-protein complex structures
  13. iCn3D
    A WebGL-based 3D structure viewing program for macromolecular structures