Rational Design

Overview

Rational design is a common approach to improve protein activity (1). This technique has previously been applied to isPETase (2), but never resulted in an increase in the catalytic rate sufficient to render the isPETase system relevant on a large scale. Additionally, the rational design process has never been done with the assistance of computational tools, nor did previous approaches inform their rational design process according to the results of computational studies. In this approach, we sought to apply a more rigorous rational design process to isPETase than had been used previously.

Docking and Binding Pose Optimization

Computational modeling of protein-ligand complexes typically starts with docking the ligand to the protein system (3). In silico, the ligand’s interaction with the protein surface is used to guide gradual changes in the ligand’s position until an energy minimum is reached; then, the amino acid side chains of the protein partner are rotated, again until an energy minimum is reached. This cycle is repeated a few times, for many starting positions. Each pose is then scored according to the difference in energy between the final protein conformation with and without the ligand, over the total energy of the protein/ligand complex. This process aims to identify a single optimal binding model.

An initial search of the literature found precedent for 2PET as a model ligand (4). Additionally, it seemed likely that the protein would interact with PET plastics while minimally changing the conformation of individual PET polymers. This would require long subunits of the polymer chain to undergo significant conformational change upon interaction with the protein, and it was deemed more likely that short . Thus, 2PET was chosen as a model substrate. Two features which were previously identified with increased PETase activity by isPETase, as compared to other related PETases, were a smaller distance between the carbonyl carbon of PET and the catalytic serine of isPETase, and a more bent binding position for 2PET in isPETase as compared to related enzymes (4).

It was assumed that, upon a favorable binding of the protein to a PET dimer, the dimer would be drawn into the catalytic site due to locally favorable interactions, and that a local conformational strain would be induced by the restricted geography of the binding site. Then, with the carbonyl carbon having moved closer to the catalytic triad, and the ester bond being strained by a “ridge” found at the end of the site (Fig. 1), electron transfer to the catalytic triad and reaction catalysis would be more favorable. This ridge was not found in related hydrolases which showed inferior PETase activity to isPETase.

Figure 1: A comparison of isPETase (A) and TfCut2 (C), a cutinase with known PETase activity, showing the contorted binding pose of 2PET to isPETase. Adapted from Fecker et al. (4).

The docking was done on the ROSIE server, using the Ligand Docking protocol (3, 5-7). Docking was validated by taking an isPETase crystal structure co-crystallized with the HEMT model ligand, separating the ligand from the protein, and then re-docking the ligand without repositioning either component. This is the recommended method of validation for the algorithm. The validation method found an RMSD (root-mean-square deviation) – a measure of the average difference between the atom positions in one structure and the positions of their counterparts in another – well below 2 angstroms, which is the recommended threshold implying the system can be handled by the docking algorithm.

Figure 2: Validation of ligand docking for isPETase. A) Superimposed structures for the determined isPETase:HEMT structure and the top 4 results from the computational re-docking. B) Measurement of the RMSD between the known structure and each re-docked structure.

The 5XJH crystal structure of isPETase, as found on the PDB database, was used for docking. A minimized 2PET structure was created in the Avogadro software suite. For the binding pose determination, input parameters were left at the default settings, with the exception that generation of 2000 ligand conformers was ordered; 400 total strutures were ordered; the number of low-resolution sampling steps was set to 2000; and repacking was done every 2 cycles of high-res docking, instead of every 3 cycles. The resulting docking data, along with configuration files and input structures, can be downloaded here.

Binding poses were scored by change in Rosetta energy units upon binding over the total energy of the bound model, during execution of the Rosetta algorithm. Afterwards, the data was additionally processed: the distance between the carbonyl carbon and the catalytic serine oxygen was calculated in angstroms using Biopython. The points were plotted in three dimensions. It was noted that proximity to the catalytic triad correlates with a greater energy decrease, suggesting that binding poses with the bond closer to the triad are more favorable. This is what would be expected, given data by Austin et al. showing a correlation between binding catalytic distance and catalytic rate (2).

Figure 3: Results of molecular docking of 2PET to the 5XJH isPETase structure. Energy measurements are in Rosetta energy units, an arbitrary unit used internally by the Rosetta protein design suite. Lower energies are more favorable, analogous to typical Gibbs free energy espressions. A) An isometric view of the data. B) A side-on view, showing that distance inversely correlates with energy of binding. The indicated points with distance 20Å are dummy points, and represent duplicate points from the docking algorithm, which could not be removed in data processing and had their distance set to an arbitrary high value instead.

The ten points with the smallest distance and the greatest decrease in energy upon binding were evaluated in the Avogadro software, and the highest-energy conformer was selected.

Figure 4: The final 2PET conformation selected from ligand docking.

Moving to a 4PET model

After the binding pose had been determined in this manner, a review of the literature showed that it was more likely that isPETase binds four subunits of PET, and that a tetramer ligand should be evaluated instead of the dimer. Joo et al. proposed an extended binding cleft, and showed that a mutation at the opposite end from the catalytic triad improved degradation of a long substrate, while degradation of a short substrate (which did not interact with the distal end of the cleft) was not affected (8).

To accommodate this discovery, docking of a 4PET structure was attempted, using ROSIE and the MTiAutoDock environment (9). However, the length of the molecule made it impossible to correctly set parameters on either server. Neither approach was capable of correctly positioning the tetramer in the cleft. Additionally, Joo et al. did not respond to a request for their 4PET docking structure.

Therefore, the dimer docking pose was retained for two of the subunits, and the other two subunits were manually positioned in the cleft using a Biopython script, in a pose replicating that found by Joo et al. The manually-positioned ligand was docked without allowing for changes in position, with only amino acid side chain repositioning allowed.

Figure 5: a comparison of the docking pose put forth by Joo et al. (A) and the 4PET docking pose that was assembled (B), using the previously determined optimal 2PET pose and two PET subunits that were positioned using structure A as a template.

Protein Design

Amino acids peripheral to the PET ligand were considered for mutation, aiming to improve their interactions with the matching moieties on the ligand. It was noted that the extended binding model meant that a long stretch of the bound PETase was adjacent to amino acids which did not match the chemical properties of the corresponding areas of the PET polymer. For example, two phthalate rings, in the far end of the binding cleft from the catalytic triad, did not have any adjacent aromatic or nonpolar interactions. Initially, T88, W185, I208, S238, N241, S245, and R280 were considered for mutation.

Figure 6: Candidate sites for mutation in isPETase.

Two tools were integrated to solve this problem in two different ways.

FlexddG is a protein design protocol for the Rosetta protein modeling software (10). It is an ensemble-based approach, meaning it scores protein-mutant interactions based on the averaged result of multiple simulated outcomes from a common starting point. Specifically, FlexddG’s key feature is the dependence on a “backrub” step. In addition to side-chain rotations, the protein backbone is flexed for 3-12 residues around each candidate amino acid, generating up to 50 structures. Repacking and minimization is done – as in docking – for the WT and the specified mutant variants, and the energy difference between the wild type and the mutant is quantified and averaged. The consideration of backbone flexibility cuts down on error and increases the predictivity of the model. In our applications of FlexddG, suggested parameters – from the sample procedure found on the Kortemme lab GitHub page – were used, with two differences. In all cases, 12 cores were used simultaneously, as we had access to a high power computer cluster.

FuncLib is an online tool which is capable of generation and high-throughput screening of mutants, to stabilize a protein or to improve its interaction with a ligand (11). As an input, the program takes a PDB file and a list of positions to be considered for mutation. If the amino acids to be considered at each position aren’t manually provided, the program calculates the most frequent substitutions in a similar sequence context. Candidate amino acids are then further filtered by the amount of individual destabilization that amino acid substitution would contribute. Finally, a Rosetta model calculates the energy difference between the wild type and each mutant, and the mutants are returned in order of most negative energy change.

Figure 7: Outlines of the two computational biology tools used in the analysis. A) An outline of the Rosetta FlexddG algorithm. B) An outline of the FuncLib algorithm. Panel A was adapted from Barlow et al. (10), and Panel B was taken from Khersonsky et al. (11)

Using these tools, two approaches were applied to obtain sequences. These approaches utilized the complementary strengths and weaknesses of FlexddG and FuncLib. One approach depended on a rational approach focusing on maximizing interactions which were expected to increase the binding of the protein to PET, while externalities such as destabilization were not considered. The other approach was more conservative, relying on FuncLib’s inherent avoidance of destabilizing mutations.

First, a rational design pipeline was applied. Because of FlexddG’s inability to perform large-scale screening, and because of the many combinations that would be possible given several candidate mutants at multiple positions, it was necessary to narrow down the search space if the FlexddG protocol was to be used. To this end, a combination of limited FlexddG runs, alongside the FuncLib server, was used to abstract the results of a large-scale screen of mutant candidates.

FlexddG analysis was performed at each site, to narrow down the candidate amino acids that would later be used as input for FuncLib. For example, at site 245, isoleucine, phenylalanine, and tyrosine were considered, to see how the resulting nonpolar / aromatic amino acid would interact with the PET phthalate ring in that region, and which mutations would be promising enough to evaluate in combination with mutations at other sites. Any candidates which had a significantly lower average ddG than other candidates at that site were removed from consideration. Due to time constraints, for these FlexddG runs, the number of structures was set to 20.

The refined list of substitutions was then plugged into FuncLib. A frequency-based scoring system was used to evaluate the top 700 substitutions, all of which showed a free energy change of more than 20 Rosetta energy units. The top-scoring substitutions were used to assemble a diverse set of isPETase multiple mutants. Each one of these sequences was then assayed by FlexddG again. The top 4 sequences were selected and assayed for activity. The details of this scoring method, the output of the FlexddG algorithm, and the modified components used to run the FlexddG protocol may be found in this zip file. (The rest of the FlexddG files were not modified, and can be sourced from the Kortemme lab GitHub site (12), along with instructions for use.)

After the rational approach was complete, a FuncLib job was performed, without changing any parameters. This time, the FuncLib algorithm was allowed to select its own candidate mutants at each site, meaning the impact of the structural context was preserved, likely increasing the stability of the resulting protein sequences. However, given that we are looking to find a novel function in isPETase, it was not clear that the evolutionarily-informed approach would improve the catalytic rate of PETase activity.

The sequences generated by the conservative approach were saved in case the rational approach failed. The potential existed that the mutations which had been induced in the protein would result in misfolding or aggregation; and that, as a result, the new mutant isPETase would not be able to be purified or assayed for PETase activity. As we limited the scope of our efforts to the rational design sequences, none of the sequence generated by the conservative approach were expressed or tested. Regardless, the results of this search are available here.

Validating Outputs

The extract_structures Python module – the part of the FlexddG algorithm responsible for extracting structures – malfunctioned, and was not able to correctly access the SQL database which internally held the PDB files during the algorithm’s execution. The obsolete DumpPdb RosettaScripts command was used to obtain a snapshot of the PDB at each time point, and this workaround seemed to have been successful. By the output PDB files, it was apparent that the FlexddG algorithm had correctly mutated the target residues, and had actually performed the backbone mutations at each site. FuncLib outputs were validated by comparing output PDBs to the WT protein. These files also showed mutations, indicating that the processing had been successful in that regard.

Figure 8: Final structures outputted by each algorithm. A) A comparison of FlexddG structures from five different runs. Random backbone flexing near the ligand – different for each run – can easily be seen. B) A comparison between the wild-type protein and one FlexddG output for that protein, from the final step in the rational approach. Green = WT, red/orange = mutant. C) A comparison between a wild-type protein and a FuncLib output, from the conservative approach. Green = WT, red/orange = mutant.

Only the sequences originating from the rational design approach were evaluated by the Wet Lab. For the outcome of those efforts, refer to the Wet Lab section of the iGEM Toronto 2019 wiki page.

References

Bornscheuer, U. T., & Pohl, M. (2001). Improved biocatalysts by directed evolution and rational protein design. Current Opinion in Chemical Biology. [Link]

Austin, H. P., Allen, M. D., Donohoe, B. S., Rorrer, N. A., Kearns, F. L., Silveira, R. L., … Beckham, G. T. (2018). Characterization and engineering of a plastic-degrading aromatic polyesterase. Proceedings of the National Academy of Sciences of the United States of America. [Link]

Combs, S. A., Deluca, S. L., DeLuca, S. H., Lemmon, G. H., Nannemann, D. P., Nguyen, E. D., et al. (2013). Small-molecule ligand docking into comparative models with Rosetta. Nature Protocols, 8(7), 1277–1298. doi:10.1038/nprot.2013.074

4. Fecker, T., Galaz-Davison, P., Engelberger, F., Narui, Y., Sotomayor, M., Parra, L. P., & Ramírez-Sarmiento, C. A. (2018). Active Site Flexibility as a Hallmark for Efficient PET Degradation by I. sakaiensis PETase. Biophysical Journal. https://doi.org/10.1016/j.bpj.2018.02.005

5. Deluca, S., Khar, K., Meiler, J. (2015). Fully Flexible Docking of Medium Sized Ligand Libraries with RosettaLigand. PLoS ONE 10(7): e0132508. doi:10.1371/journal.pone.0132508

6. BCL conformer generation: Kothiwale, S., Mendenhall, J.L., Meiler, J. (2015) BCL::Conf: small molecule conformational sampling using a knowledge based rotamer library. J. Cheminform., 7, 47. doi:10.1186/s13321-015-0095-1

7. Lyskov S, Chou FC, Conchúir SÓ, Der BS, Drew K, Kuroda D, Xu J, Weitzner BD, Renfrew PD, Sripakdeevong P, Borgo B, Havranek JJ, Kuhlman B, Kortemme T, Bonneau R, Gray JJ, Das R., "Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE)". PLoS One. 2013 May 22;8(5):e63906. doi: 10.1371/journal.pone.0063906. Print 2013. Link

8. Joo, S., Cho, I. J., Seo, H., Son, H. F., Sagong, H. Y., Shin, T. J., … Kim, K. J. (2018). Structural insight into molecular mechanism of poly(ethylene terephthalate) degradation. Nature Communications. https://doi.org/10.1038/s41467-018-02881-1

9. Labbé CM, Rey J, Lagorce D, Vavrusa M, Becot J, Sperandio O, Villoutreix BO, Tufféry P, Miteva MA, MTiOpenScreen: a web server for structure-based virtual screening. Nucleic Acids Res. 2015 Jul 1;43(W1):W448-54. doi: 10.1093/nar/gkv306. Epub 2015 Apr 8.

10. Barlow, K. A., Ó Conchúir, S., Thompson, S., Suresh, P., Lucas, J. E., Heinonen, M., & Kortemme, T. (2018). Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. Journal of Physical Chemistry B. https://doi.org/10.1021/acs.jpcb.7b11367

11. Khersonsky, O., Lipsh, R., Avizemer, Z., Ashani, Y., Goldsmith, M., Leader, H., … Fleishman, S. J. (2018). Automated Design of Efficient and Functionally Diverse Enzyme Repertoires. Molecular Cell. https://doi.org/10.1016/j.molcel.2018.08.033