Receptor screening

Abstract:

In this section, we collect a candidate receptor set from literature and analyze the differences among these receptors. We come up with an in silico protocol to screen out the most ideal receptor design using a structural bioinformatic approach based on protein-protein docking and affinity-based scoring to guide the project. The results are further validated by wet lab experiments. after generating protein-protein complex via docking, we analyze and count the interacting residue number and calculate the binding affinity. We assume that a receptor with higher binding affinity with downstream protein could recruit downstream protein and form a complex in a relatively higher rate, which could accelerate signal transmission, because binding affinity is the strength of the interaction between two (or more than two) interacting molecules, specifically between candidate receptors and downstream TIRAP and MYD88 in our modeling. At last, we employ the product of relative weight coefficients to score receptor.

Compared with the molecular dynamic approach, our protocol requires neither costly professional software, nor high performance computing source, thus it is friendly and easy to use for most iGEMers.

The binding of two proteins can be viewed as a reversible and rapid process in an equilibrium that is governed by the law of mass action. The binding affinity(BA) is the strength of the interaction between two (or more than two) molecules that bind reversibly (interact)[2], and the force pulling two protein together when they are physically close to each other. Binding affinity is described through the equilibrium dissociation constant K, or equivalently the Gibbs free energy (ΔG = RTlnK), moreover, binding affinity defines whether complex formation occurs.

Measuring how likely two proteins will physically interact with each other and how strong their connections will be through experiments are time consuming and costly. Here we developed an in silico protocol and present an affinity-based scoring model via protein-protein docking to obtain the protein-protein complexes and calculate binding affinity.
*You can refer to our assumption and scoring method in Ranking(Jumping to Ranking)

Our protocol contains the following five procedures, which can be further epitomized to three main steps:
(i) Candidate receptors are identified by searching literature and referring to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database[3] to generate an alternative receptor set.
(ii) Crystal structures of each protein are obtained from the Protein Data Bank (PDB)[4].
(iii) Protein-protein complexes are generated by docking using ClusterPro[5].
(iv) Binding affinity is measured by contacts and contact-types-based model using Prodigy, [6].
(v) Candidate receptors are ranked using our scoring method based on multi-level binding affinity data.

Based on the ranking results, we guided the project design to choose and express TLR1/2.

Candidate Receptor Collection

We firstly searched literature and queried the KEGG database[3] for Tuberculosis - Reference Pathway(PATHWAY: map05152)[7]. Considering plenty of literature has reported that toll-like receptors play an essential role in host immune response against bacteria, we decided to choose Toll-like receptors family to recognize mycobacterium tuberculosis(Mtb). We included TLR1/2, TLR2/6 and TLR4 into our alternative receptor set since these receptors can recognize Mtb.

We further referred to the KEGG database[8] for more information on Toll-like receptor Signaling Pathway - Reference Pathway (PATHWAY: map04620)[OQQ1].We found that the only difference of these alternative receptors in cell signaling is the very first step binding. The signal transduction process are the same regarding the downstream of signal molecule MYD88. Thus we obtain the crystal structures of related proteins in the first step binding: TLR1/2,TLR2/6,TLR4 and MYD88 to conduct further structural bioinformatic analysis.

Crystal structure Collection

Signal transduction through Toll-like receptors (TLRs) originates from their intracellular Toll/interleukin-1 receptor (TIR) domain, which then binds to MyD88, a common adaptor protein also containing a TIR domain to activate host defense responses[9].

While KEGG database shows that TLRs have a direct interaction with myeloid differentiation primary response protein 88 (MYD88), [11-13], we found another adaptor MyD88-adapter-like (Mal, also known as TIRAP)[10] through our literature searching, which plays a crucial role in the MyD88-dependent signaling pathway shared by TLR2 and TLR4. TIRAP acts as a bridge to recruit the MyD88 molecule to activate TLR4 receptors in response to invading pathogens[11]. It has been proved[12-13] that both MyD88 and TIRAP are essential to the common signaling pathway shared by TLR2 and TLR4, which leads to activation of IRAK-1 and NF-κB, and finally to cytokine production. In this case we obtain the crystal structure of TIRAP and integrate it into our modeling.

The Protein Data Bank (PDB) is the fisrt open access digital data resource in all of biology and medicine established in 1971. It provides access to 3D structure data for large biological molecules and has ballooned to >145 000 structures of proteins, DNA, and RNA, and their complexes.[4]. All the cystal structures used in our model are obtained from PDB database,see in Figure.2

These five crystal structures are used for docking analysis.

Protein-Protein Docking

Most cellular processes are carried out by protein–protein interactions. Predicting the 3D structures of protein–protein complexes (docking) can shed light on their functional mechanisms and roles in cells. Docking can help us understanding the signaling pathways and evaluating the affinity of complexes.[14]

Current docking method consider the interacting proteins as either rigid (rigid body docking) or flexible (flexible docking) bodies. Here we use ClusterPro to perform rigid body docking. In brief the docking process can be described as the following three computational steps :
(i) Rigid-body docking by sampling billions of conformations;
(ii)Root-mean-square deviation (RMSD)-based clustering of the 1,000 lowest-energy structures generated, to find the largest clusters that will represent the most likely models of the complex;
(iii)Refinement of selected structures using energy minimization.
*See more in details

For two interacting proteins, one protein is placed on a fixed grid; the other protein is placed on a movable grid. The conformation of protein-protein complex is generated considering the possible rotational and translational positions of the movable protein towards the fixed protein in six-dimensional space. Approximately 1010 putative conformations were sampled using shape complementarity described in literature[5]. By performing this exhaustive sampling of the conformational space on a dense grid, some near-native structures would certainly be sampled. Among all the conformations, 1,000 lowest-energy docked structures were selected to simulate the complex which are considered at least some structures are close to the native structure of the complex. These structures are further clustered and structures at the centers of the 10 most populated clusters are given as output, see details(Jumping to Details)

Details:
The docked 1,000 lowest-energy complexes are ranked by hierarchical clustering using pairwise interface root-mean-square deviation(IRMSD) as the distance measure. For each pair among the 1,000 structures, the structure that has the highest number of neighbors within a 9-Å IRMSD radius would be selected and defined as the center of the first cluster, and the structures within the 9-Å IRMSD neighborhood of the center will constitute the first cluster. The members of this cluster are then removed. And the structure with the highest number of neighbors within the 9-Å IRMSD radius among the remaining structures would be the next cluster center. These neighbors will form the next cluster. Up to 30 clusters are generated in this manner.
After clustering, the energy of the retained structures are minimized for 300 steps with fixed backbone removing the steric overlaps but generally yielding only very small conformational changes. The structures at the centers of the 10 most populated clusters are given as output.

It has been shown that the 30 largest clusters include near-native structures for 92% of complexes in a protein–protein benchmark set. It has been shown that the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure in 31 cases.

ClusPro yields four sets of docked structures as balanced, electrostatic-favored, hydrophobic-favored, and vander Waals + electrostatics. We selected docked complex structures from the top 10 models in hydrophobic-favored sets for further processing. The structures of protein complexes are shown in Figure 3.

Binding Affinity

Binding between two proteins is mainly defined by their contact region and their interface. It is indeed the network of contacts between surface residues that holds complexes together and defines their specificity and contributes to their interaction strength. The non-interacting surface (NIS) has been proved also to have an effect [15]

It’s been shown that the network of inter-residue contacts (ICs), which used to be neglected in the rationalization and prediction of BA, can be considered the best structural property to describe binding strength[16].Residues were classified based on their physico-chemical properties as polar(C, H, N, Q, S, T, W),apolar(A, F, G, I, V, M, P),charged(E, D, K, R).

Alexandre MJJ Bonvin et al [16]raised following linear model to calculate the binding affinity considering the contribution of both ICs and NIS :

They have further built a website based on this model named PRODGY .The website use following linear equation to calculate binding affinity.

This simple ICs/NIS-based model has been proved outperforming other 14 methods such as ‘global surface model’ , RosettaDock, PyDock , FireDock, DFIRE, CCHarPPI webserver and so on. This model has also been indicated that, in contrast to previous predictors, the number of interface residue contacts is a rather robust predictor that is less sensitive to conformational changes.

Here we use PRODGY to analyze ICs/NIS number and calculate binding affinity.
The residue number of interacting proteins are shown in Figure 5 below. In Figure 5, each axis represents one of eight different types of interatomic contacts including both inter-residue contacts (ICs) and non-interacting surface (NIS) ranging from 0 to 48. The green polygon represents the residue number of TLR binding to TIRAP, the purple polygon represents the residue number of TIRAP in TLR-TIRAP complex binding to MYD88, and the orange polygon shows the residue number of both TLR and TIRAP in TLR-TIRAP complex toward MYD88.

Here we find out ,in the binding of TLR to TIRAP(green polygon area)
TLR2/6 evidently owns the maximum number of charged-apolar and charged-charged contacts, which all positively contributes to binding affinity. At the same time,TLR2/6 owns the maximum number of apolar-apolar contacts.
TLR1/2 owns most polar-apolar inter-residue contacts, while TLR4 owns all types of ICs at a moderate level.
In this step, the difference of non-interacting surface between different TLRs is not obvious. The weight coefficient of polar-apolar inter-residue contacts is greater than single charged-apolar or charged-charged inter-residue contacts according to Formula2, and all this cause the results that TLR4 has the lowest binding affinity with TIRAP, and TLR1/2 owns a little lower binding affinity comparing to TLR26 to bind with TIRAP.

In the binding of TLR-TIRAP complex to MYD88 (purple polygon)
TLR1/2 owns the maximum number of all three types of inter-residue contacts positively contributing to the binding affinity.
TLR4 have the second maximum number of charged-apolar and polar-apolar inter-residue contacts, of which charged-apolar type residue contact number is more obvious.
The difference of non-interacting surface between different TLRs is also not very obvious, but generally higher than the last step binding shown in green polygon.
In this step,TLR1/2 and TLR4 show relatively obvious higher affinity than TLR2/6.

Plus, we find that the complex, formed after each TLR bind to TIRAP(shown in orange polygon),owns a higher affinity with MYD88 comparing to single TIRAP binding to MYD88 by reducing the non-interacting area. This tells us that forming a complex is more beneficial for the complex to bind with downstream protein and form further complex, and this could be the reason why this step exist in biological body.

In Figure.6, this heatmap shows the total numbers of surface interatomic contacts of all TLRs at these three steps binding in a comprehensive view. [TLRxx-TIRAP]-MYD88 row represents the residue number of TIRAP in TLR-TIRAP complex binding to MYD88. TLRxx-[TIRAP-MYD88] row represents the residue number of both TLR and TIRAP in TLR-TIRAP complex toward MYD88. The color from blue to red represents the residue number data from little to much.

Ranking

As described before, signal transduction of TLRs is initiated by the approximation of the receptor TIR domains. TIR domains act as a scaffold for downstream signal transducers and lead to the recruitment of intracellular TIR-containing adaptors such as TIRAP and MyD88. The engagement of these proteins then promotes the assembly of higher-order complexes, namely, the helical ‘Myddosome’ complex for myeloid differentiation primary response protein 88 (MYD88)-dependent activation of nuclear factor-κB (NF-κB).[17]

The binary Myddosome complex is quickly formed after the death domain of IRAK4 being recruited to the oligomerized MyD88. Here we assume that the difference caused by different receptors could be ignored after two steps binding, to be specific, TLRs bind to TIRAP and recruit MYD88. The interaction of TIR domains is not allosteric, we consider that while TIRAP still have an impact on the binding of MyD88 to IRAK4, TLRs show no obvious effect on the binding of MyD88 to IRAK4.[18]

We also assume a receptor with a higher binding affinity with the downstream protein could recruit the downstream protein and form a complex in a relatively higher rate, which could accelerate signal transmission because binding affinity represents the strength of the interaction between two or more molecules. In theory, a higher affinity could not only rise the local binding rate, according to the formula (ΔG = RTlnK),at constant temperature, the dissociation constant, a specific type of equilibrium constant that measures the propensity of a larger object to separate (dissociate) reversibly into smaller components, could be measured. As K is negatively related with G, a higher affinity leads to a lower dissociation constant. A lower dissociation constant describes a complex to fall apart into its component molecules less likely and stay together relatively longer, which is essential for the signal transduction.

Here we calculate relative weight coefficients of each step binding directly base on the binding affinity data we obtained. For step j binding, we count the maximum binding energy as Gjmax ,

Then the relative weight coefficients of number i TLR at step j binding is calculated by

Finally the relative weight coefficients are integrated into the final score of number i TLR with following formula:

Where i=1,2,3 representing TLR4,TLR1/2 and TLR2/6 respectively;
j=1,2 representing the following two steps binding, namely the binding of TLR to TIRAP and the binding of TLR-TIRAP complex to MYD88.

This calculation process is summarized and shown in Fig.8.1 and the score of each step is shown in Fig.8.2

From the final results we can see that TLR12 have the highest score, which means that TLR1/2 could perform better in signal transduction comparing to TLR2/6 and TLR4.

Conclusion

In short we develop a structural bioinformatic protocol and screen out an ideal receptor. Though the relative differences between receptors could be absolutely small, in reality, however there are plenty of cells where each single cell have abundant receptors on the surface. So, tiny differences between these receptors could result in a big difference. This protocol could be used in general receptor screening and be extended to adapt more conditions.

References

[1] Gromiha, M. M. , Yugandhar, K. , & Jemimah, S. . (2017). Protein–protein interactions: scoring schemes and binding affinity. Current Opinion in Structural Biology, 44, 31-38.

[2] Kastritis, P. L. , & Bonvin, A. . (2013). On the binding affinity of macromolecular interactions: daring to ask why proteins interact. Journal of the Royal Society Interface, 10(79), 20120835.

[3] https://www.kegg.jp/kegg/

[4] (2018). Rcsb protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Research.

[5] Kozakov, D. , Hall, D. R. , Xia, B. , Porter, K. A. , & Vajda, S. . (2017). The cluspro web server for protein–protein docking. Nature Protocols, 12(2), 255-278.

[6] Xue, L. , Kastritis, P. L. , Bonvin, A. M. , & Vangone, A. . (2016). Prodigy : a web server for predicting the binding affinity of protein-protein complexes. Bioinformatics, 32(23), 3676-3678.

[7] https://www.kegg.jp/dbget-bin/www_bget?map05152

[8] https://www.kegg.jp/dbget-bin/www_bget?map04620

[9] Aderem, A. , & Ulevitch, R. J. . (2000). Toll-like receptors in the induction of the innate immune response. Nature, 406(6797), 782-787.

[10] O"Neill, L. A. J. , Dunne, A. , Edjeback, M. , Gray, P. , Jefferies, C. , & Wietek, C. . (2003). Mal and myd88: adapter proteins involved in signal transduction by toll-like receptors. Journal of Endotoxin Research, 9(1), 55-59.

[11] Yamamoto, M. , Sato, S. , Hemmi, H. , Sanjo, H. , Uematsu, S. , & Kaisho, T. , et al. (2002). Essential role for tirap in activation of the signalling cascade shared by tlr2 and tlr4. Nature (London), 420(6913), 324-329.

[12] Horng, T. , Barton, G. M. , Flavell, R. A. , & Medzhitov, R. . (2002). The adaptor molecule tirap provides signalling specificity for toll-like receptors. Nature (London), 420(6913), 329-333.

[13] Fitzgerald, K. A. , Palsson-Mcdermott, E. M. , Bowie, A. G. , Jefferies, C. A. , Mansell, A. S. , & Brady, G. , et al. (2001). Mal (myd88-adapter-like) is required for toll-like receptor-4 signal transduction. Nature, 413(6851), 78-83.

[14] Andrusier, N. , Mashiach, E. , Nussinov, R. , & Wolfson, H. J. . (2008). Principles of flexible protein-protein docking. Proteins Structure Function and Bioinformatics, 73(2), 271-289.

[15] Kastritis, P. L. , Rodrigues, J. P. G. L. M. , Folkers, G. E. , Boelens, R. , & Bonvin, A. M. J. J. . (2014). Proteins feel more than they see: fine-tuning of binding affinity by properties of the non-interacting surface. Journal of Molecular Biology, 426(14), 2632-2652.

[16] Vangone, A. , & Bonvin, A. M. . (2015). Contacts-based prediction of binding affinity in protein–protein complexes. eLife,4,(2015-07-08), 4.

[17] Gay, N. J. , Symmons, M. F. , Gangloff, M. , & Bryant, C. E. . (2014). Assembly and localization of toll-like receptor signalling complexes. Nature Reviews Immunology, 14(8), 546-558.

[18] Lin, S. C., Lo, Y. C., & Wu, H. (2010). Helical assembly in the MyD88-IRAK4-IRAK2 complex in TLR/IL-1R signalling. Nature, 465(7300), 885–890. doi:10.1038/nature09121

TOP

Team:CPU CHINA/ModelI