Team:NJTech China/Model

MODEL

Hydrogels Model

Background

NJT0401 is used as a model drug to study the release behavior of poly(γ-glutamic acid) (γ-PGA) hydrogels. We investigate how the physicochemical properties of the hydrogels, such as the degree of substitution, concentration and degradation rate, affect the release efficiency of the drug, the drug concentration and its therapeutic effect on cancer, and we then determine the formulation whose physical and chemical indicators give the best release. (The gelation time, swelling ratio, rheological properties and protein release behavior are modulated by the level of protein loading of the hydrogels.) We try to describe the relationship between the load and the efficiency of drug release, as well as the effect on cell growth. The γ-PGA hydrogels show good biocompatibility. Optimal estimates of the model parameters are obtained by minimizing the difference between the model simulation and the experimentally measured drug release kinetics.

Summary

In our work, we establish a Stepwise Regression Model and a Drug Release Kinetics Model to optimize the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels.

Stepwise Regression Model

The Stepwise Regression Model establishes a total regression equation with the physical and chemical properties of the hydrogels as the independent variables X and the optimal effect of the interleukins as the dependent variable Y. We then calculate the regression sum of squares of the total equation and the partial regression sum of squares (i.e. the contribution) of each variable that has been introduced into the equation. If the total regression equation is not significant, the linear relationship of the multiple regression equation does not hold. If an independent variable X has no significant effect on Y, it is removed and the equation is re-established without it. The factors with significant influence are selected as independent variables, and the other variables in the equation are tested in ascending order of their partial regression sum of squares. All variables that are not significant are removed, and all factors that have significant effects are retained. We thus establish an optimal regression equation and finally obtain the optimal interleukin effect under the optimal physical and chemical indicators of the hydrogels.

Drug Release Kinetics Model

The Drug Release Kinetics Model is designed to describe the time needed for the release of NJT0401 to stabilize within the optimal concentration range. We fit curves to the experimental data to obtain a series of kinetic constants K, then select the hydrogel samples that meet the sustained release mechanism. To account for the lag period, we introduce the modified Peppas-Sahlin Model and similarly screen for the qualified hydrogel samples.

Restatement of the questions:

  • Optimize the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels.
  • A fusion protein model of TAT is constructed to analyze the feasibility of PD-L1 binding on the surface of the fusion protein and of IL-2 receptor binding on the surface of T cells.

Assumptions:

  • The hydrogel is temperature sensitive.
  • Hydrogel degradation has no significant effect on drug degradation.
  • The experimental data are close to error free.

Model Establishment and Solution

Stepwise Regression Model

Hydrogel release experiments need to consider many factors, such as gelation time, swelling ratio, degradation rate, pore size, strength and water absorption. To evaluate these factors, we take the physical and chemical properties of the hydrogels as the independent variables x_k and establish a total regression equation with the release concentration y. Our equation is as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

  • x_k (k=1,2,3,4…): the physical and chemical parameters of the hydrogels.
  • y: the release concentration.
  • β_0: a constant term.
  • β_k (k=1,2,3,4…): a regression coefficient.
  • ε: a random error term.

We then calculate the regression sum of squares of the total equation and the partial regression sum of squares of each variable that has been introduced into the regression equation. If the total regression equation is not significant, the linear relationship of the multiple regression equation does not hold. If an independent variable x has no significant effect on y, it is eliminated and a multiple regression equation is re-established without that factor.
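The two significance tests described here are usually written as F statistics. The following is a sketch in standard textbook notation; the symbols n (number of observations), p (number of variables in the equation), SSR (regression sum of squares) and SSE (residual sum of squares) are assumptions introduced for illustration and are not defined on this page.

```latex
\begin{aligned}
F   &= \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}
      && \text{(test of the total regression equation)} \\
F_k &= \frac{\mathrm{SSR}-\mathrm{SSR}_{(-k)}}{\mathrm{SSE}/(n-p-1)}
      && \text{(partial test of the variable } x_k\text{)}
\end{aligned}
```

Here SSR_(-k) is the regression sum of squares of the equation refitted without x_k, so the numerator of F_k is the partial regression sum of squares (the contribution) of x_k.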

We establish a linear regression equation for x1, x2, x3 and x4, and use all-subsets regression together with stepwise regression to select the variables.

The factors with significant influence are selected as independent variables, and the other variables in the equation are tested in ascending order of their partial regression sum of squares. Variables that are not significantly influential are removed, and all factors that have significant effects are retained. We thus establish an optimal regression equation as follows:
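The elimination procedure described above can be sketched in Python. This is a minimal illustration and not the team's R program (which is not reproduced on this page): the data are made-up coded levels (-1/+1) of four hypothetical hydrogel properties, the response is synthetic, and the removal threshold `f_remove=4.0` is an assumed F-to-remove value.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(rows, y, cols):
    """Residual sum of squares of the least-squares fit of y on an intercept and cols."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))


def backward_eliminate(rows, y, f_remove=4.0):
    """Repeatedly drop the variable with the smallest partial F below f_remove."""
    kept = list(range(len(rows[0])))
    while kept:
        full = sse(rows, y, kept)
        dof = len(y) - len(kept) - 1
        # Partial F of x_k: increase in SSE when x_k is left out, over the error variance.
        partial_f = {k: (sse(rows, y, [c for c in kept if c != k]) - full) / (full / dof)
                     for k in kept}
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_remove:
            break
        kept.remove(weakest)
    return kept


# Made-up data: only columns 0 and 3 actually influence the synthetic response.
rows, y = [], []
for a, b, c in product((-1.0, 1.0), repeat=3):
    d = a * b * c
    disturbance = 0.1 * a * b + 0.1 * a * c  # stands in for the random error term
    rows.append([a, b, c, d])
    y.append(2.0 + 3.0 * a + 1.5 * d + disturbance)

selected = backward_eliminate(rows, y)
print("retained variable indices:", selected)
```

Removing the weakest variable first mirrors the "ascending order of partial regression sum of squares" rule in the text; a real analysis would compare each partial F with a critical value from the F distribution instead of a fixed threshold.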

The result of our programming with R is:

The normal residual of the regression model lm.salary and the normalized residual of the regression model lm.step are calculated separately.

Draw a residual scatter plot with the normal residual as the ordinate and the predicted value as the abscissa.

Fig. 2. Residual scatter plot with the normal residual as the ordinate and the predicted value as the abscissa.
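The quantities behind such a residual plot can be computed as in the sketch below. It assumes that "normalized residual" means the internally standardized residual (what R's `rstandard` returns) and, to stay short, uses a single made-up predictor; the numbers are illustrative and are not our measurements.

```python
import math


def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares; return fitted values, residuals and
    internally standardized residuals e_i / (s * sqrt(1 - h_ii))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom (two fitted coefficients).
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each observation in simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]
    return fitted, resid, std


# Made-up observations; the fifth one is deliberately off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 14.0, 12.1]
fitted, resid, std = standardized_residuals(x, y)
for f, e, z in zip(fitted, resid, std):
    print(f"fitted={f:7.3f}  residual={e:7.3f}  standardized={z:7.3f}")
```

Plotting `std` (ordinate) against `fitted` (abscissa) gives the kind of scatter plot described in the text; points with a large absolute standardized residual are candidates for closer inspection.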

Logarithmic transformation of the model

The normalized residual of the new regression model lm.step and the predicted value of the model lm.step are calculated separately.

Draw a residual scatter plot with the normalized residual as the ordinate and the predicted value as the abscissa.

Fig. 3. Residual scatter plot with the normal residual as the ordinate and the predicted value as the abscissa, with the new regression equation.

We can see that observations 4 and 8 may be interference points (outliers), so we remove them and refit the model. We then draw a model diagnosis diagram.

Fig. 4. Model diagnosis diagram.

Calculate diagnostic statistics for individual observations.

Given the values of the explanatory variables x1 and x4, we perform point prediction and interval prediction. We then draw a scatter plot matrix of the independent variables and find that the linear relationship between them is weak.

Fig. 5. Scatter plot matrix between independent variables.

Drug Release Kinetics Model

The model is designed to describe the time needed for the release of NJT0401 to stabilize within the optimal concentration range.

We fit curves to the experimental data to obtain a series of kinetic constants K, determine the K values, and select the hydrogel samples that meet the sustained release profile. To account for the lag period, we introduce the modified Peppas-Sahlin Model and similarly screen for the qualified hydrogel samples.
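As a sketch of this fitting step, the Peppas-Sahlin model is commonly written as M_t/M_∞ = K1·t^m + K2·t^(2m), where K1 weights the Fickian diffusion term and K2 the Case-II relaxation term; a lag-time variant replaces t with (t − l). The code below assumes this form, fixes the exponent at a hypothetical `m=0.45`, and fits K1 and K2 by linear least squares on synthetic data. The function name, the value of m and the data are illustrative assumptions, not our experimental results.

```python
def fit_peppas_sahlin(times, released, m=0.45, lag=0.0):
    """Least-squares estimates of K1 and K2 in
    released = K1*(t - lag)**m + K2*(t - lag)**(2*m), with m and lag held fixed."""
    if any(t < lag for t in times):
        raise ValueError("all time points must be at or after the lag time")
    u = [(t - lag) ** m for t in times]        # diffusion basis term
    v = [(t - lag) ** (2 * m) for t in times]  # relaxation basis term
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    suy = sum(a * r for a, r in zip(u, released))
    svy = sum(b * r for b, r in zip(v, released))
    det = suu * svv - suv * suv
    # Solve the 2x2 normal equations by Cramer's rule.
    k1 = (suy * svv - svy * suv) / det
    k2 = (svy * suu - suy * suv) / det
    return k1, k2


# Synthetic release curve generated from known constants (hours, released fraction).
m = 0.45
times = [0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
released = [0.10 * t ** m + 0.02 * t ** (2 * m) for t in times]

k1, k2 = fit_peppas_sahlin(times, released, m=m)
print(f"K1 = {k1:.4f}, K2 = {k2:.4f}")
```

Holding m and the lag fixed keeps the problem linear in K1 and K2; fitting m or the lag as well would need a nonlinear optimizer, and the signs and relative sizes of the fitted K1 and K2 are then what the screening of hydrogel samples looks at.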

Fig. 6. Drug release kinetics and correlation coefficient values from different kinetic models.

We then found that when K1 is negative and K2 is positive, the release corresponds to Case-II relaxation, and that when K2' is positive and l is negative, it corresponds to Fickian diffusion with a lag period.

In short, our hydrogel model optimizes the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels. With its results for the best release concentration and the time to reach a stable concentration, the model provided useful guidance for the downstream validation experiments.

Structural model

To verify whether the NJT0401 sequence we have identified can form a complete protein structure, we modeled the protein structure of the sequence computationally. The sequence is divided into a Nanobody region, a linker region and an IL-2 region, where the Nanobody region is composed of a VHH region and an Fc region. For the linker, we consider both a rigid and a flexible design and analyze the rationality of the two connection methods through simulation.

Sequence Overview

NJT0401-SOFT (Soft Docking)

QVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVAKLLTTS

GSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSFEDPTCTLVTSS

GAFQYWGQGTQVTVSEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVT

CVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGK

EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIA

VEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHY

TQKSLSLSPGK SSSGSGSSSGSG APTSSSTKKTQLQLEHLLLDLQMILNGINNYK

NPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDL

ISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

NJT0401-RIGID (Rigid Docking)

QVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVAKLLTTS

GSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSFEDPTCTLVTSS

GAFQYWGQGTQVTVSEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVT

CVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGK

EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIA

VEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHY

TQKSLSLSPGKEAAAKEAAAKAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKN

PKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI

SNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

Notes

  • Red part is the sequence of the nanobody.
  • Red bold is the VHH area.
  • Red thin part is the Fc area.
  • Yellow part of the red letter is the linker.
  • Blue part is the IL-2 sequence.

Composition Details:

Nanobody-VHH area: 5JDS chain B

TGQVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVA

KLLTTSGSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSF

EDPTCTLVTSSGAFQYWGQGTQVTVSSGSMDPGGSHHHHHHHH

(Note: The purple part is the His tag. Because our vector already carries one, it is removed. In the sequence we used, the Fc region is added after the VHH region; its source is a patent.)

Nanobody-Fc area: 4CDH

EPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHE

DPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEY

KCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFY

PSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCS

VMHEALHNHYTQKSLSLSPGK

Linker area:

  • Soft: SSSGSGSSSGSG
  • Rigid: EAAAKEAAAK

IL-2 area: 2ERJ (Crystal structure of the heterotrimeric interleukin-2 receptor in complex with interleukin-2)

The 5m5e and 2erj sequences differ at the N-terminus in the red part at the front of 5m5e. This part coincides with a partial sequence of the V. vulgaris immunoglobulin H chain V region precursor, the function of which is unknown.

The green portion is a segment that is highly similar to a portion of the 2erj N-terminus (only the third amino acid is different).

2erjd

MYRMQLLSCIALSLALVTNS APTSSSTKKTQLQLEHLLLDLQMILNGINNYKN

PKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI

SNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

5m5eD

MGWSCIILFLVATATGVHS APASSSTKKTQLQLEHLLLDLQMILNGINNYKNP

KLTRMLTAKFAMPKKATELKHLQCLEEELKPLEEVLNGAQSKNFHLRPRDLIS

NINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

Hence, we choose 2erjd as our IL-2 sequence.
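The comparison between the two candidate IL-2 sequences can be reproduced with a short position-by-position check. This sketch uses the mature regions (leading signal peptides removed) as they are listed on this page, with the 2erj region taken as it appears inside the NJT0401 constructs above; the constant and function names are illustrative.

```python
# Mature IL-2 regions as listed on this page, split at the same line breaks.
IL2_2ERJ = (
    "APTSSSTKKTQLQLEHLLLDLQMILNGINNYKN"
    "PKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI"
    "SNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT"
)
IL2_5M5E = (
    "APASSSTKKTQLQLEHLLLDLQMILNGINNYKNP"
    "KLTRMLTAKFAMPKKATELKHLQCLEEELKPLEEVLNGAQSKNFHLRPRDLIS"
    "NINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT"
)


def substitutions(reference, other):
    """List (1-based position, reference residue, other residue) for every mismatch
    between two sequences of equal length (no gaps are considered)."""
    if len(reference) != len(other):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(reference, other))
            if a != b]


diffs = substitutions(IL2_2ERJ, IL2_5M5E)
print(f"length: {len(IL2_2ERJ)} residues, {len(diffs)} substitutions")
for pos, ref, alt in diffs:
    print(f"  position {pos}: {ref} -> {alt}")
```

A gap-free comparison is enough here because the two mature regions have the same length; sequences of different lengths would need a proper alignment tool instead.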

Homology Modeling

I-TASSER was originally designed for protein structure modeling by iterative threading assembly simulations. It was recently extended to structure-based function annotation by matching structure predictions with known functional templates. The I-TASSER Suite is a stand-alone package implementing the I-TASSER-based protein structure and function modeling pipelines. The pipeline consists of four general steps: threading template identification, iterative structure assembly simulation, model selection and refinement, and structure-based function annotation.

Predicted Secondary Structure

Predicted Solvent Accessibility

Predicted normalized B-factor

The B-factor is a value that indicates the extent of the inherent thermal mobility of residues/atoms in proteins. This value is deduced from threading template proteins from the PDB in combination with the sequence profiles derived from sequence databases. The reported B-factor profile in the figure below corresponds to the normalized B-factor of the target protein, defined by B=(B'-u)/s, where B' is the raw B-factor value, and u and s are respectively the mean and standard deviation of the raw B-factors along the sequence.
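The normalization B=(B'-u)/s is a per-sequence z-score, which can be sketched as follows. The raw values are made up for illustration, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def normalize_b_factors(raw):
    """Return (B' - u) / s for each raw B-factor, where u and s are the mean and
    (population) standard deviation of the raw values along the sequence."""
    u = mean(raw)
    s = pstdev(raw)
    if s == 0:
        raise ValueError("all raw B-factors are identical; normalization is undefined")
    return [(b - u) / s for b in raw]


# Made-up raw B-factors for five residues.
raw_b = [12.0, 15.5, 30.2, 22.1, 18.4]
normalized = normalize_b_factors(raw_b)
print([round(b, 3) for b in normalized])
```

Positive normalized values mark residues that are more mobile than the sequence average, and negative values mark residues that are more rigid.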

Top 10 threading templates used by I-TASSER

We use the templates of the highest significance in the threading alignments, where the significance is measured by the Z-score.

Table 1: Top 10 threading templates

Rank  PDB Hit  Ident1  Ident2  Cov   Norm. Z-score
1     1hzhH    0.26    0.55    0.87  2.47
2     1igtA    0.29    0.44    0.84  4.85
3     1hzhH    0.24    0.55    0.87  3.11
4     1hzh     0.24    0.55    0.87  1.05
5     1hzhH    0.25    0.55    0.87  3.82
6     1hzh     0.73    0.55    0.71  1.58
7     4rrpA    0.23    0.28    0.88  4.33
8     5dk3B    0.73    0.52    0.68  6.77
9     1irlA    0.98    0.26    0.26  4.60
10    1igtB    0.59    0.44    0.70  1.93

Notes

  • Rank of templates represents the top ten threading templates used by I-TASSER.
  • Ident1 is the percentage sequence identity of the templates in the threading aligned region with the query sequence.
  • Ident2 is the percentage sequence identity of the whole template chains with query sequence.
  • Cov represents the coverage of the threading alignment and is equal to the number of aligned residues divided by the length of query protein.
  • Norm. Z-score is the normalized Z-score of the threading alignments. An alignment with a normalized Z-score > 1 indicates a good alignment, and vice versa.
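The rule of thumb in the last note can be applied to Table 1 directly. The sketch below copies the PDB hit, coverage and normalized Z-score columns from the table; the variable names are illustrative.

```python
# (rank, PDB hit, coverage, normalized Z-score), copied from Table 1.
templates = [
    (1, "1hzhH", 0.87, 2.47),
    (2, "1igtA", 0.84, 4.85),
    (3, "1hzhH", 0.87, 3.11),
    (4, "1hzh", 0.87, 1.05),
    (5, "1hzhH", 0.87, 3.82),
    (6, "1hzh", 0.71, 1.58),
    (7, "4rrpA", 0.88, 4.33),
    (8, "5dk3B", 0.68, 6.77),
    (9, "1irlA", 0.26, 4.60),
    (10, "1igtB", 0.70, 1.93),
]

# Keep the alignments that pass the "normalized Z-score > 1" criterion.
good = [t for t in templates if t[3] > 1.0]
best = max(templates, key=lambda t: t[3])

print(f"{len(good)} of {len(templates)} alignments have a normalized Z-score > 1")
print(f"most significant alignment: rank {best[0]} ({best[1]}), Z = {best[3]}")
```

The Z-score only measures alignment significance, so it is worth reading it together with the coverage column: a significant alignment that covers a small part of the query constrains only that part of the model.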

Our final model predicted by I-TASSER

To select the final models, we used the SPICKER program to cluster all the decoys based on pair-wise structure similarity, and reported up to five models, which correspond to the five largest structure clusters. The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score is typically in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa. TM-score and RMSD are estimated from the C-score and the protein length, following the correlation observed between these quantities. Although the first model has better quality in most cases, lower-ranked models can occasionally have better quality than higher-ranked ones, as seen in the benchmark tests.

C-score=-0.20

Estimated TM-score = 0.69±0.12

Estimated RMSD = 7.8±4.4Å

Proteins structurally close to the target in the PDB

We used the TM-align structural alignment program to match the first I-TASSER model to all structures in the PDB library. This section reports the top 10 proteins from the PDB that have the closest structural similarity to the model.

Table 2: Proteins in the PDB with the closest structural similarity

Rank  PDB Hit  TM-score  RMSD  IDEN   Cov
1     1hzhH    0.836     1.94  0.230  0.865
2     1igyB    0.451     4.43  0.204  0.536
3     4ye4H    0.431     1.83  0.370  0.448
4     1p7kH    0.426     1.27  0.327  0.436
5     4rqqH    0.423     2.03  0.304  0.444
6     1tjhH    0.423     2.44  0.317  0.456
7     1rihH    0.420     1.79  0.303  0.439
8     5nstB    0.419     1.85  0.339  0.439
9     4jb9H    0.419     1.52  0.312  0.432
10    1igtB    0.419     4.47  0.072  0.500

Predicted function using COFACTOR and COACH

Biological annotations of the target protein by COFACTOR and COACH based on the I-TASSER structure prediction.

Table 3: Biological annotations of the target protein

Rank  C-score  Cluster size  PDB Hit  Lig Name  Ligand Binding Site Residues
1     0.13     31            4j8rB    PEPTIDE   50,110,111,113,114
2     0.08     15            3ls4H    TCI       33,35,37,47,50,59,99,100,113,114,115,118
3     0.06     11            1frgH    PEPTIDE   50,52,53,54,55,56,57,59,99,112,114
4     0.04     8             4xccH    GOL       202,203,205,213,214,215
5     0.03     7             1cbvH    Nuc.Acid  31,32,33,52,53,99,100,101,102,112,113

Notes

  • C-score is the confidence score of the prediction. C-score ranges [0-1], where a higher score indicates a more reliable prediction.
  • Cluster size is the total number of templates in a cluster.
  • Lig Name is name of possible binding ligand.

Gene Ontology terms

Table 4: Top 10 homologous GO templates in PDB

Rank  CscoreGO  TM-score  RMSD  IDEN  Cov   PDB Hit
1     0.36      0.8364    1.94  0.23  0.87  1hzhH
2     0.21      0.4340    2.21  0.41  0.46  2ig2H
3     0.20      0.4184    1.77  0.45  0.44  1hinH
4     0.20      0.414     1.95  0.41  0.44  1deeB
5     0.20      0.4082    2.03  0.42  0.43  1qkzH
6     0.20      0.4493    1.1   0.31  0.46  2ny7H
7     0.20      0.4256    1.94  0.34  0.45  1g9mH
8     0.20      0.427     2.05  0.34  0.45  2qadD
9     0.20      0.4085    1.75  0.42  0.43  1f4wH
10    0.20      0.4256    1.37  0.37  0.44  3pp3H

Notes

  • TM-score is a measure of global structural similarity between query and template protein.
  • RMSD is the RMSD between residues that are structurally aligned by TM-align.
  • IDEN is the percentage sequence identity in the structurally aligned region.
  • Cov represents the coverage of global structural alignment and is equal to the number of structurally aligned residues divided by length of the query protein.
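To make the first note more concrete, the sketch below computes a TM-score from the distances between aligned residue pairs, assuming the commonly used definition TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, where L is the length of the query protein. The function name and the example distances are illustrative; the scores in the tables above come from TM-align, not from this code.

```python
def tm_score(distances, target_length):
    """TM-score for a fixed alignment.

    distances: distances (in Angstrom) between the aligned residue pairs after
    superposition; residues that are not aligned contribute nothing.
    target_length: number of residues in the query protein.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else 0.0
    if d0 <= 0:
        raise ValueError("target_length is too short for this d0 formula")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Perfectly superposed full-length alignment, and the same over half of the residues.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
# Made-up distances for a 100-residue query with 80 aligned residues at 2 Angstrom.
print(round(tm_score([2.0] * 80, 100), 3))
```

Because unaligned residues add nothing to the sum while the score is still divided by the full query length, a low coverage lowers the TM-score even when the aligned part superposes well, which is why the lower-ranked hits in the tables combine a small RMSD with a modest TM-score.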

References:

1. Yang Zhang. I-TASSER: Fully automated protein structure prediction in CASP8. Proteins, 77 (Suppl 9): 100-113, 2009.

2. Ambrish Roy, Jianyi Yang, Yang Zhang. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Research, 40: W471-W477, 2012.

3. Jianyi Yang, Yang Zhang. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43: W174-W181, 2015.

4. Ma X, Xu T, Chen W, et al. Injectable hydrogels based on the hyaluronic acid and poly (γ-glutamic acid) for controlled protein delivery. Carbohydrate Polymers, 179: 100-109, 2018.

© 2019 NJTech_China iGEM Team. All Rights Reserved.