Goal of this subgroup was to find a real binding D-peptide ligand in silico. Therefore molecular dynamic simulations would be crucial and a docking program would provide the important scoring function. First a library out of existing helices should be docked as first try. In the next step a Genetic Algorithm should be implemented to ameliorate automated a given or random set of peptides.
Click on an item for a detailed description
Molecular dynmaics simulaitons
To get an idea of what is possible with molecular dynamics (MD)) and what is doable an intense literature research was done. In this way two programs for MD were chosen to test them: VMD/QwikMD/NAMD and GROMACS.
Docking simulations
In the same way programs for docking have been figured out to test. They were Rosetta and AutoDock VINA.
VMD/QwikMD/NAMD1
Visual Molecular Dynamics (VMD) is a graphical program for MD simulations. Due to the graphical user interface it was quite easy to run the first MD after a short orientation. VMD provides an explicite and an implicite water model where te explicite water model took much more time to be calculated. With this program all simulations were performed on our own machines. In this way we simulated our target PSMα3 starting from the crystal structure 6NIV. Different sets of parameters have been tried out: MD simulations over 0.1 ns have been performed with implicite and explicite water and at several temperatures (27°C, 134°C, 200°C). The computation times varied from several minutes up to a few hours.
Fig. 1: Screenshot of a VMD session
GROMACS2
GROMACS is a command line MD program which comes with plentiful options for the simulation. One not only have the choice between 15 force fields but also between 7 water models. Due to many options it is hard for beginners especially when you're not familiar with the command line but there are pretty nice tutorials which make the program accessible. For us this lysozymtutorial helped a lot with getting confident with GROMACS. Running a complete GROMACS MD requires many single steps. To simplify this we used bash scripts which executed all the commands consecutively.
#!/bin/bash # This script runs the commands from the Lysozyme tutorial: # http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html name='5kgy_1' ff='amber99sb-ildn-fme' water='spce' bt='dodecahedron' #triclinic, cubic, dodecahedron, octahedron grep -v HOH ${name}.pdb > ${name}_clean.pdb gmx pdb2gmx -f ${name}_clean.pdb -o ${name}_processed.gro -water $water -ff $ff -ignh gmx editconf -f ${name}_processed.gro -o ${name}_newbox.gro -c -d 1.0 -bt $bt gmx solvate -cp ${name}_newbox.gro -cs spc216.gro -o ${name}_solv.gro -p topol.top gmx grompp -f ions.mdp -c ${name}_solv.gro -p topol.top -o ions.tpr gmx genion -s ions.tpr -o ${name}_solv_ions.gro -p topol.top -pname NA -nname CL \ -neutral <<eof 13 eof gmx grompp -f minim.mdp -c ${name}_solv_ions.gro -p topol.top -o em.tpr gmx mdrun -gcom 4 -deffnm em gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr gmx mdrun -gcom 4 -deffnm nvt gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr gmx mdrun -gcom 4 -deffnm npt gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr gmx mdrun -gcom 4 -deffnm md_0_1 gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center <<eof 1 0 eof gmx trjconv -s md_0_1.tpr -f md_0_1_noPBC.xtc -o ${name}_MD.pdb <<eof 1 eof
Fig. 2: Bash script to perform a MD simulation of the pdb file 5kgy_1.
Our target PSMα3 has got an N-formylation what is implemented in the used amber99sb-ildn force field by default. So we had to do this before we were able to simulate our target.
How to implement a modified residue in the amber99sb-ildn force field?
Copy the force field folder (in our case: amber99sb-ildn.ff) and „resresiduetypes.dat“ from the share/top directory in $GMXLIB (e. g. gromacs-2019.3/share/top) to your working directory.
The following files have to be changed:
residuetypes.dat Add the new residues to the file within the existing pattern
amber99sb-ildn.ff/aminoacids.arn (C- or N-terminal residue) Add the new residues to the file within the existing pattern
amber99sb-ildn.ff/aminoacids.hdb Add the new residue as normal, N- and C-terminal AA and define its bond dihedral angels and bond types (2nd col)
NFME 6 1 1 H N CN CA 1 1 HF CN N OF 1 5 HA CA N C CB 2 6 HB CB CG CA 2 6 HG CG SD CB 3 4 HE CE SD CG
amber99sb-ildn.ff/aminoacids.r2b Add the following line to the file
FME - NFME - -
amber99sb-ildn.ff/aminoacids.rtp Now it becomes intersting: In this file some kind of partial charges are defined. Here is how it works: the first column contains the atom name how they appear in the pdb file. Thats why a consistent naming of all atoms is crucial. Some pdb files contain different namings and the MD simulation will break. The second column keeps the assigned atom types from the file „atomtypes.atp“. If you introduce a new atom type you have to add it to this file. The third column now contains the partial charges. The sum of all partial charges has to be a round number (e. g. -1, 0 or 1). The last column contains the atom numbers within a residue.
[ NFME ] [ atoms ] N N -0.572522 1 H H 0.322879 2 CA CT -0.017921 3 CN C 0.662279 18 HF H5 0.034579 19 OF O -0.607721 20 HA H1 0.105079 4 CB CT 0.028048 5 HB1 HC 0.02410 6 HB2 HC 0.02410 7 CG CT 0.00180 8 HG1 H1 0.04400 9 HG2 H1 0.04400 10 SD S -0.27370 11 CE CT -0.05360 12 HE1 H1 0.06840 13 HE2 H1 0.06840 14 HE3 H1 0.06840 15 C C 0.59730 16 O O -0.56790 17 [ bonds ] N CN N H N CA CN OF CN HF CA HA CA CB CA C CB HB1 CB HB2 CB CG CG HG1 CG HG2 CG SD SD CE CE HE1 CE HE2 CE HE3 C O [ impropers ] HF CN OF N CA H N CN CA +N C O N CA C +N 105.4 0.75 1
amber99sb-ildn.ff/ffbonded.itpIn this file bond types are defined. For FMET we need 3 new ones:
CN N 1 0.13350 410032.0 ; FME CN O 1 0.12290 476976.0 ; FME CN H 1 0.10800 307105.6 ; FME
Having the extended force field many different test MDs have been performed to examine the best set of parameters for our system.
Rosetta3
Rosetta is a very powerful docking program. It contains many useful tools including some secondary structure prediction tools. Before es docking can be run the molecules have to be relaxed. The docking itself is as two stage docking implemented where the first stage is a global docking. After this a refined docking is done to increase the binding energy. The placing of the ligand around the target is a random Monta Carlo movement. To yield a nice binding energy in this way many, many runs need to be done which requires a lot of computational power. Further Rosetta doesn't do it's calculations in any thermodynamic unit but in Roseatta Energy Units (REUs). This can't be translated to any common thermodynamic unit and it varies depending on the set of parameters. There are many of those so one can justify a lot but nonetheless electrostatic interactions are almost not treated.
AutoDock Vina4
AutoDock Vina is a quite simple docking program. It's based on empiric parameters and leads to very good results but without being computational expansive. To run Vina MGLTools was used to generate pdbqt files which contain some additional information required by Vina. Vina gets along with only 3 parameters which are exhaustiveness, flexibility and the gridbox which must be defined. Exhaustiveness defines the number seeds that will be used for independent docking runs thus the computational time scales proportional with the exhaustiveness. The default value for it is 8 which leads already to good results. While generating the pdbqt files from the pdb files flexibility of all bonds get set. Standard options are "rigid" without any rotatable bond and side chains flexible where only the backbone is kept rigid. With the gridbox a docking space is determined to increase the number of tested poses in a known binding pocket and ignores the rest of the target. It is set via the center point and the dimensions in X-, Y- and Z-direction. MGLTools provides a graphical tool for that purpose but this is kept very basic. That's why we coded two bash scripts which draw a gridbox automatically around the whole target and another one which draws respective pdb files to enable visualization in PyMOL. When drawing gridboxes keep in mind that the whole ligand has to fit into and the cartoon representation lacks of side chains.
Fig. 2: Steps of a regular simulation with AutoDock Vina
Decision
After trying the different simulation programs we had to chose one of each.
Molecular dynamics
Among the MD programs we decided us for GROMACS because of it's many parameters and a very good documentation. Further this program is predestined for automation which was important for us because we planned to perform many simulations.
Docking
The decision for a docking program was not easy because Rosetta promised to have many parameters and easy accessible from command line which is advantageous for our purpose. But it's still extreme computational expensive because one need many repetitions where as Vina is extreme fast. Intuitively Rosetta seemed to be a good choice but a little bit more research on literature convinced us to use Vina due to the meager implementation of electrostatics and much better performance.
Manual runs
For our manual simulations we implemented a powerful pipeline which enabled us to run many simulations on the cluster within a very short time.
#!/bin/bash
Genetic Algorithm
In the same way programs for docking have been figured out to test. They were Rosetta and AutoDock VINA.
MD for verification
50 ns MDs of random chosen safe and unsafe folders have been performed.
Results
To analysis the folding over time we plotted all peptides with ramachadran plots. With the help of this plots one can determine if the peptide leaves it's helical conformation or not.
Fig.1: Ramachadranplots of 13 safe and unsafe folders. The color indicates the timepoint of the MD: Black is the beginning and blue the End at 50 ns.
In the group of the 6 safe foders the avarage number of peptide bonds that break out the helical area is at 5.17 bonds/peptide. Among the 7 unsafe folder it is 5.86 bonds/peptide. This numbers might indicate a difference between both groups but is of no degree significant, also due to a low number of samples.