Team:Freiburg/Labbook/In Silico

Lab Book: in silico

Goal of this subgroup was to find a real binding D-peptide ligand in silico. Therefore molecular dynamic simulations would be crucial and a docking program would provide the important scoring function. First a library out of existing helices should be docked as first try. In the next step a Genetic Algorithm should be implemented to ameliorate automated a given or random set of peptides.

Click on an item for a detailed description

Literature research: modeling methods

Molecular dynmaics simulaitons

To get an idea of what is possible with molecular dynamics (MD)) and what is doable an intense literature research was done. In this way two programs for MD were chosen to test them: VMD/QwikMD/NAMD and GROMACS.

Docking simulations

In the same way programs for docking have been figured out to test. They were Rosetta and AutoDock VINA.

Testing software

VMD/QwikMD/NAMD1

Visual Molecular Dynamics (VMD) is a graphical program for MD simulations. Due to the graphical user interface it was quite easy to run the first MD after a short orientation. VMD provides an explicite and an implicite water model where te explicite water model took much more time to be calculated. With this program all simulations were performed on our own machines. In this way we simulated our target PSMα3 starting from the crystal structure 6NIV. Different sets of parameters have been tried out: MD simulations over 0.1 ns have been performed with implicite and explicite water and at several temperatures (27°C, 134°C, 200°C). The computation times varied from several minutes up to a few hours.

VMD

Fig. 1: Screenshot of a VMD session

GROMACS2

GROMACS is a command line MD program which comes with plentiful options for the simulation. One not only have the choice between 15 force fields but also between 7 water models. Due to many options it is hard for beginners especially when you're not familiar with the command line but there are pretty nice tutorials which make the program accessible. For us this lysozymtutorial helped a lot with getting confident with GROMACS. Running a complete GROMACS MD requires many single steps. To simplify this we used bash scripts which executed all the commands consecutively.

#!/bin/bash

# This script runs the commands from the Lysozyme tutorial:
# http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html

name='5kgy_1'
ff='amber99sb-ildn-fme'
water='spce'
bt='dodecahedron' #triclinic, cubic, dodecahedron, octahedron

grep -v HOH ${name}.pdb > ${name}_clean.pdb
gmx pdb2gmx -f ${name}_clean.pdb -o ${name}_processed.gro -water $water -ff $ff -ignh
gmx editconf -f ${name}_processed.gro -o ${name}_newbox.gro -c -d 1.0 -bt $bt
gmx solvate -cp ${name}_newbox.gro -cs spc216.gro -o ${name}_solv.gro -p topol.top
gmx grompp -f ions.mdp -c ${name}_solv.gro -p topol.top -o ions.tpr
gmx genion -s ions.tpr -o ${name}_solv_ions.gro -p topol.top -pname NA -nname CL \
  -neutral <<eof
13
eof
gmx grompp -f minim.mdp -c ${name}_solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -gcom 4 -deffnm em
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -gcom 4 -deffnm nvt
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
gmx mdrun -gcom 4 -deffnm npt
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
gmx mdrun -gcom 4 -deffnm md_0_1
gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center <<eof
1
0
eof
gmx trjconv -s md_0_1.tpr -f md_0_1_noPBC.xtc -o ${name}_MD.pdb <<eof
1
eof

Fig. 2: Bash script to perform a MD simulation of the pdb file 5kgy_1.

Our target PSMα3 has got an N-formylation what is implemented in the used amber99sb-ildn force field by default. So we had to do this before we were able to simulate our target.

How to implement a modified residue in the amber99sb-ildn force field?

  1. Copy the force field folder (in our case: amber99sb-ildn.ff) and „resresiduetypes.dat“ from the share/top directory in $GMXLIB (e. g. gromacs-2019.3/share/top) to your working directory.

  2. The following files have to be changed:

    • residuetypes.dat Add the new residues to the file within the existing pattern

    • amber99sb-ildn.ff/aminoacids.arn (C- or N-terminal residue) Add the new residues to the file within the existing pattern

    • amber99sb-ildn.ff/aminoacids.hdb Add the new residue as normal, N- and C-terminal AA and define its bond dihedral angels and bond types (2nd col)

      NFME    6       
      1       1       H       N       CN      CA
      1       1       HF      CN      N       OF
      1       5       HA      CA      N       C       CB
      2       6       HB      CB      CG      CA
      2       6       HG      CG      SD      CB
      3       4       HE      CE      SD      CG
      
    • amber99sb-ildn.ff/aminoacids.r2b Add the following line to the file

      FME    -     NFME  -     -
      
    • amber99sb-ildn.ff/aminoacids.rtp Now it becomes intersting: In this file some kind of partial charges are defined. Here is how it works: the first column contains the atom name how they appear in the pdb file. Thats why a consistent naming of all atoms is crucial. Some pdb files contain different namings and the MD simulation will break. The second column keeps the assigned atom types from the file „atomtypes.atp“. If you introduce a new atom type you have to add it to this file. The third column now contains the partial charges. The sum of all partial charges has to be a round number (e. g. -1, 0 or 1). The last column contains the atom numbers within a residue.

      [ NFME ]
       [ atoms ]
           N    N           -0.572522    1
           H    H            0.322879    2
          CA    CT          -0.017921    3
          CN    C            0.662279   18
          HF    H5           0.034579   19
          OF    O           -0.607721   20
          HA    H1           0.105079    4
          CB    CT           0.028048    5
         HB1    HC           0.02410     6
         HB2    HC           0.02410     7
          CG    CT           0.00180     8
         HG1    H1           0.04400     9
         HG2    H1           0.04400    10
          SD    S           -0.27370    11
          CE    CT          -0.05360    12
         HE1    H1           0.06840    13
         HE2    H1           0.06840    14
         HE3    H1           0.06840    15
           C    C            0.59730    16
           O    O           -0.56790    17
       [ bonds ]
           N    CN
           N     H
           N    CA
          CN    OF
          CN    HF
          CA    HA
          CA    CB
          CA     C
          CB   HB1
          CB   HB2
          CB    CG
          CG   HG1
          CG   HG2
          CG    SD
          SD    CE
          CE   HE1
          CE   HE2
          CE   HE3
           C     O
       [ impropers ]
          HF    CN    OF     N
          CA     H     N    CN
          CA    +N     C     O
          N     CA     C    +N    105.4       0.75       1 
      
    • amber99sb-ildn.ff/ffbonded.itpIn this file bond types are defined. For FMET we need 3 new ones:

        CN N          1    0.13350   410032.0 ; FME
        CN O          1    0.12290   476976.0 ; FME
        CN H          1    0.10800   307105.6 ; FME
      

Having the extended force field many different test MDs have been performed to examine the best set of parameters for our system.

Rosetta3

Rosetta is a very powerful docking program. It contains many useful tools including some secondary structure prediction tools. Before es docking can be run the molecules have to be relaxed. The docking itself is as two stage docking implemented where the first stage is a global docking. After this a refined docking is done to increase the binding energy. The placing of the ligand around the target is a random Monta Carlo movement. To yield a nice binding energy in this way many, many runs need to be done which requires a lot of computational power. Further Rosetta doesn't do it's calculations in any thermodynamic unit but in Roseatta Energy Units (REUs). This can't be translated to any common thermodynamic unit and it varies depending on the set of parameters. There are many of those so one can justify a lot but nonetheless electrostatic interactions are almost not treated.

AutoDock Vina4

AutoDock Vina is a quite simple docking program. It's based on empiric parameters and leads to very good results but without being computational expansive. To run Vina MGLTools was used to generate pdbqt files which contain some additional information required by Vina. Vina gets along with only 3 parameters which are exhaustiveness, flexibility and the gridbox which must be defined. Exhaustiveness defines the number seeds that will be used for independent docking runs thus the computational time scales proportional with the exhaustiveness. The default value for it is 8 which leads already to good results. While generating the pdbqt files from the pdb files flexibility of all bonds get set. Standard options are "rigid" without any rotatable bond and side chains flexible where only the backbone is kept rigid. With the gridbox a docking space is determined to increase the number of tested poses in a known binding pocket and ignores the rest of the target. It is set via the center point and the dimensions in X-, Y- and Z-direction. MGLTools provides a graphical tool for that purpose but this is kept very basic. That's why we coded two bash scripts which draw a gridbox automatically around the whole target and another one which draws respective pdb files to enable visualization in PyMOL. When drawing gridboxes keep in mind that the whole ligand has to fit into and the cartoon representation lacks of side chains.

flow diagram vina

Fig. 2: Steps of a regular simulation with AutoDock Vina

Decision

After trying the different simulation programs we had to chose one of each.

Molecular dynamics

Among the MD programs we decided us for GROMACS because of it's many parameters and a very good documentation. Further this program is predestined for automation which was important for us because we planned to perform many simulations.

Docking

The decision for a docking program was not easy because Rosetta promised to have many parameters and easy accessible from command line which is advantageous for our purpose. But it's still extreme computational expensive because one need many repetitions where as Vina is extreme fast. Intuitively Rosetta seemed to be a good choice but a little bit more research on literature convinced us to use Vina due to the meager implementation of electrostatics and much better performance.

Production simulations

Manual runs

For our manual simulations we implemented a powerful pipeline which enabled us to run many simulations on the cluster within a very short time.

main.sh - the pipeline
#!/bin/bash

Genetic Algorithm

In the same way programs for docking have been figured out to test. They were Rosetta and AutoDock VINA.

Verification of unsliced and sliced helices

MD for verification

50 ns MDs of random chosen safe and unsafe folders have been performed.

Results

To analysis the folding over time we plotted all peptides with ramachadran plots. With the help of this plots one can determine if the peptide leaves it's helical conformation or not.

Fig.1: Ramachadranplots of 13 safe and unsafe folders. The color indicates the timepoint of the MD: Black is the beginning and blue the End at 50 ns.

In the group of the 6 safe foders the avarage number of peptide bonds that break out the helical area is at 5.17 bonds/peptide. Among the 7 unsafe folder it is 5.86 bonds/peptide. This numbers might indicate a difference between both groups but is of no degree significant, also due to a low number of samples.