MODEL

Hydrogels Model

Background

NJT0401 is designed as a model drug for studying the release behavior of poly(γ-glutamic acid) (γ-PGA) hydrogels. We investigate how the physicochemical properties of the hydrogels, such as the degree of substitution, concentration and degradation rate, affect the release efficiency of the drug, the drug concentration and its therapeutic effect on cancer, and we then determine the release conditions with the best physical and chemical indicators. The gelation time, swelling ratio, rheological properties and protein release behavior are modulated by the level of protein loading of the hydrogels. We try to describe the relationship between the load and the efficiency of drug release, as well as the effect on cell growth. The γ-PGA hydrogels show good biocompatibility. Optimal estimates of the model parameters are obtained by minimizing the difference between the model simulation and the experimentally measured drug release kinetics.

Summary

In our work, we establish a Stepwise Regression Model and a Drug Release Kinetics Model to optimize the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels.

Stepwise Regression Model

The Stepwise Regression Model establishes a total regression equation with the physical and chemical properties of the hydrogels as the independent variables X and the optimal effect of the interleukins as the dependent variable Y. We then calculate the partial regression sum of squares (i.e. the contribution) of the total regression equation and of each variable introduced into it. If the total regression equation is not significant, the linear relationship of the multiple regression equation does not hold; if an independent variable X has no significant effect on Y, it is removed and the equation is re-established. Factors with significant influence are selected as independent variables, and the remaining variables in the equation are tested in ascending order of their partial regression sum of squares. All variables that are not significant are removed, and all factors with significant effects are retained. We thus establish an optimal regression equation and finally obtain the optimal interleukin effect under the optimal physical and chemical indicators of the hydrogels.

Drug Release Kinetics Model

The Drug Release Kinetics Model describes the time that NJT0401 release takes to stabilize within the optimal concentration range. We fit the experimental data with release curves to obtain a series of kinetic constants K, then select the hydrogel samples that follow a sustained-release mechanism. To account for the lag period, we introduce the modified Peppas-Sahlin model and screen for qualified hydrogel samples in the same way.

Restatement of questions:

Optimize the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels.
A fusion protein model of TAT is constructed to analyze the feasibility of PD-L1 binding on the surface of the fusion protein and IL-2 receptor binding on the surface of T cells.

Assumptions:

The hydrogel is temperature sensitive.
Hydrogel degradation has no significant effect on drug degradation.
Experimental data are close to error-free.

Model Establishment and Solution

Stepwise Regression Model

Hydrogel release experiments must consider many factors, such as gelation time, swelling ratio, degradation rate, pore size, strength and water absorption. To evaluate these factors, we take the physical and chemical properties of the hydrogels as independent variables x_k and establish a total regression equation with the release concentration y as the dependent variable. Our equation is as follows:

y = β₀ + β₁x₁ + β₂x₂ + … + β_k x_k + ε

x_k (k = 1, 2, 3, 4, …): physical and chemical parameters of the hydrogels.
y: release concentration.
β₀: a constant term.
β_k (k = 1, 2, 3, 4, …): a regression coefficient.
ε: a random error term.

Then calculate the regression sum of squares of the total regression equation and of each variable introduced into it. If the total regression equation is not significant, the linear relationship of the multiple regression equation does not hold; if an independent variable x has no significant effect on y, it is eliminated and the multiple regression equation is re-established without that factor.

Establish a linear regression equation of y on x₁, x₂, x₃ and x₄, and use this full model, with all variables included, as the starting point for stepwise regression.
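The elimination procedure described above can be sketched in a few lines. This is a minimal, standard-library-only illustration rather than the R program used for the results: it performs backward elimination only (the R trace also considers re-adding variables), it scores models with the AIC form n·log(RSS/n) + 2·(number of coefficients), and the data are synthetic values generated purely for the example. The names `ols_rss`, `aic` and `backward_step` are hypothetical helpers.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, p = len(X), len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def aic(rss, n, n_coef):
    """AIC up to an additive constant, as used to rank nested linear models."""
    return n * math.log(rss / n) + 2 * n_coef

def backward_step(columns, y):
    """Drop one variable at a time while doing so lowers the AIC."""
    n = len(y)

    def score(names):
        X = [[1.0] + [columns[k][i] for k in names] for i in range(n)]
        return aic(ols_rss(X, y), n, len(names) + 1)

    kept = sorted(columns)
    current = score(kept)
    while kept:
        best_aic, drop = min((score([k for k in kept if k != d]), d) for d in kept)
        if best_aic >= current:
            break  # no single removal improves the model
        kept.remove(drop)
        current = best_aic
    return kept, current

# Synthetic data (assumption): only x3 and x4 actually influence y.
rng = random.Random(0)
n_obs = 16
cols = {f"x{j}": [rng.uniform(0, 1) for _ in range(n_obs)] for j in range(1, 5)}
y = [2.0 * cols["x3"][i] + 3.0 * cols["x4"][i] + rng.gauss(0, 0.05)
     for i in range(n_obs)]

kept, final_aic = backward_step(cols, y)
print(kept, round(final_aic, 2))
```

Because a variable is dropped only when its removal lowers the AIC, strongly influential variables stay in the model while weak ones may be removed.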

The factors with significant influence are selected as independent variables, and the other variables in the equation are tested in ascending order of their partial regression sum of squares. Variables without significant influence are removed, and all factors with significant effects are retained. The stepwise procedure retains x₃ and x₄, and the coefficients reported in the R output below give the optimal regression equation:

y = -0.013902 + 0.003832 x₃ + 0.015233 x₄

The output of our R program is:

    Start:  AIC=-164.15
    y ~ x1 + x2 + x3 + x4

           Df  Sum of Sq        RSS     AIC
    - x1    1 0.00000945 0.00030949 -165.65
    - x2    1 0.00001323 0.00031327 -165.46
    <none>               0.00030004 -164.15
    - x3    1 0.00033705 0.00063708 -154.10
    - x4    1 0.00110848 0.00140852 -141.41

    Step:  AIC=-165.65
    y ~ x2 + x3 + x4

           Df  Sum of Sq        RSS     AIC
    - x2    1 0.00001448 0.00032397 -166.92
    <none>               0.00030949 -165.65
    + x1    1 0.00000945 0.00030004 -164.15
    - x3    1 0.00034309 0.00065258 -155.72
    - x4    1 0.00110011 0.00140960 -143.39

    Step:  AIC=-166.92
    y ~ x3 + x4

           Df  Sum of Sq        RSS     AIC
    <none>               0.00032397 -166.92
    + x2    1 0.00001448 0.00030949 -165.65
    + x1    1 0.00001070 0.00031327 -165.46
    - x3    1 0.00033651 0.00066047 -157.52
    - x4    1 0.00112368 0.00144765 -144.97

    Call:
    lm(formula = y ~ x3 + x4, data = data2)

    Residuals:
           Min         1Q     Median         3Q        Max
    -0.0067235 -0.0039085  0.0005098  0.0045098  0.0071732

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.013902   0.005607  -2.479   0.0276 *
    x3           0.003832   0.001043   3.675   0.0028 **
    x4           0.015233   0.002269   6.715 1.44e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.004992 on 13 degrees of freedom
    Multiple R-squared:  0.826,     Adjusted R-squared:  0.7993
    F-statistic: 30.86 on 2 and 13 DF,   p-value: 1.156e-05
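The figures in this output can be cross-checked by hand. The sketch below assumes n = 16 observations (inferred from 13 residual degrees of freedom plus 3 estimated coefficients) and uses the AIC form n·log(RSS/n) + 2·(number of coefficients), which reproduces the values printed in the trace. The variable names are illustrative only.

```python
import math

N = 16  # assumption: 13 residual degrees of freedom + 3 coefficients

def step_aic(rss, n, n_coef):
    """AIC (up to an additive constant) computed from a residual sum of squares."""
    return n * math.log(rss / n) + 2 * n_coef

# (formula, RSS, number of coefficients incl. intercept, AIC printed in the trace)
trace = [
    ("y ~ x1 + x2 + x3 + x4", 0.00030004, 5, -164.15),
    ("y ~ x2 + x3 + x4",      0.00030949, 4, -165.65),
    ("y ~ x3 + x4",           0.00032397, 3, -166.92),
]
checks = [(formula, step_aic(rss, N, k), reported)
          for formula, rss, k, reported in trace]

# Summary statistics of the final model y ~ x3 + x4
r2, p = 0.826, 2
resid_df = N - p - 1
adj_r2 = 1 - (1 - r2) * (N - 1) / resid_df       # reported: 0.7993
f_stat = (r2 / p) / ((1 - r2) / resid_df)        # reported: 30.86
rse = math.sqrt(0.00032397 / resid_df)           # reported: 0.004992

for formula, computed, reported in checks:
    print(f"{formula:24s} computed {computed:9.2f}  reported {reported:9.2f}")
print(f"adj R2 {adj_r2:.4f}, F {f_stat:.2f}, RSE {rse:.6f}")
```

Each step lowers the AIC, and the procedure stops at y ~ x3 + x4 because every remaining addition or removal would raise it.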

The ordinary residuals of the regression model lm.salary and the standardized residuals of the regression model lm.step are calculated separately.

Draw a residual scatter plot with the ordinary residual as the ordinate and the predicted value as the abscissa.

Fig. 2. Residual scatter plot with the ordinary residual as the ordinate and the predicted value as the abscissa.

Apply a logarithmic transformation to the model.

The standardized residuals and the predicted values of the new regression model lm.step are calculated separately.

Draw a residual scatter plot with the standardized residual as the ordinate and the predicted value as the abscissa.

Fig. 3. Residual scatter plot with the standardized residual as the ordinate and the predicted value as the abscissa, for the new regression equation.

Observations 4 and 8 appear to be outliers (interference points), so we remove them and refit the model. Then draw the model diagnostic plots.

Fig. 4. Model diagnostic plots.

Calculate diagnostic statistics for individual observations.

Given values of the explanatory variables x₁ and x₄, perform point prediction and interval prediction. Draw a scatter plot matrix of the independent variables; it shows that the linear relationship between the independent variables is weak.
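As a small illustration of point prediction, the sketch below evaluates the untransformed fit y ~ x3 + x4 using the coefficients printed in the R output. It is an assumption that these coefficients are the relevant ones: the coefficients of the refit after the logarithmic transformation and outlier removal are not reported, and an interval prediction would additionally need the coefficient covariance matrix, which is also not given. The input values are hypothetical.

```python
# Coefficients copied from the R summary of lm(y ~ x3 + x4)
COEF = {"intercept": -0.013902, "x3": 0.003832, "x4": 0.015233}

def predict(x3, x4):
    """Point prediction of the release concentration from the fitted equation."""
    return COEF["intercept"] + COEF["x3"] * x3 + COEF["x4"] * x4

# Hypothetical input values, chosen only to show the arithmetic
example = predict(x3=1.0, x4=1.0)
print(f"{example:.6f}")
```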

Fig. 5. Scatter plot matrix between independent variables.

Drug Release Kinetics Model

The model describes the time that NJT0401 release takes to stabilize within the optimal concentration range.

We fit the experimental data with release curves to obtain a series of kinetic constants K, determine the K values, and select the hydrogel samples that show a sustained-release profile. To account for the lag period, we introduce the modified Peppas-Sahlin model and screen for qualified hydrogel samples in the same way.
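The Peppas-Sahlin model is commonly written as F(t) = K₁·t^m + K₂·t^(2m), where the first term is the Fickian diffusion contribution and the second the Case-II relaxation contribution; a lag-time variant replaces t with (t - l). The exact form, exponent and fitting routine used for the results are not stated here, so the sketch below is only an illustration under assumptions: m is fixed at 0.5, the lag l is treated as known, and the release data are synthetic values generated from chosen constants. With m and l fixed, the model is linear in K₁ and K₂ and can be fitted by ordinary least squares.

```python
def peppas_sahlin(t, k1, k2, m=0.5, lag=0.0):
    """Fraction released at time t: K1*(t-lag)^m + K2*(t-lag)^(2m)."""
    u = t - lag
    return k1 * u ** m + k2 * u ** (2 * m)

def fit_peppas_sahlin(times, released, m=0.5, lag=0.0):
    """Least-squares estimates of K1 and K2 with m and lag held fixed."""
    pts = [(t - lag, f) for t, f in zip(times, released) if t > lag]
    a = [u ** m for u, _ in pts]          # Fickian basis term
    b = [u ** (2 * m) for u, _ in pts]    # relaxation basis term
    f = [v for _, v in pts]
    saa = sum(x * x for x in a)
    sab = sum(x * z for x, z in zip(a, b))
    sbb = sum(z * z for z in b)
    saf = sum(x * v for x, v in zip(a, f))
    sbf = sum(z * v for z, v in zip(b, f))
    det = saa * sbb - sab * sab
    # Solve the 2x2 normal equations by Cramer's rule
    k1 = (saf * sbb - sab * sbf) / det
    k2 = (saa * sbf - sab * saf) / det
    return k1, k2

# Synthetic release curve (assumption): K1 = 0.10, K2 = 0.02, lag = 0.5 h
times = [1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
released = [peppas_sahlin(t, 0.10, 0.02, lag=0.5) for t in times]

k1_hat, k2_hat = fit_peppas_sahlin(times, released, lag=0.5)
print(round(k1_hat, 4), round(k2_hat, 4))
```

Comparing the fitted K₁ and K₂ across hydrogel samples is one way to judge which release mechanism dominates in each sample.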

Fig. 6. Drug release kinetics and correlation coefficient values from different kinetic models.

We then found that when K₁ is negative and K₂ is positive, and the Case-II relaxation constant K₂' is positive while the lag parameter l is negative, the release follows Fickian diffusion with a lag period.

In short, our hydrogel model optimizes the release of NJT0401 by regulating the physical and chemical parameters of the hydrogels. By yielding the best release concentration and the stable concentration time, the model provides guidance for downstream validation experiments.

Structural model

To verify whether the NJT0401 sequence we designed can form a complete protein structure, we modeled the protein structure of the sequence computationally. The sequence is composed of a Nanobody region, a linker region and an IL-2 region, where the Nanobody region consists of a VHH region and an Fc region. Both a rigid and a flexible linker are considered, and the rationality of the two connection methods is analyzed through simulation.

Sequence Overview

NJT0401-SOFT (Soft Docking)

QVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVAKLLTTS

GSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSFEDPTCTLVTSS

GAFQYWGQGTQVTVSEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVT

CVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGK

EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIA

VEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHY

TQKSLSLSPGK SSSGSGSSSGSG APTSSSTKKTQLQLEHLLLDLQMILNGINNYK

NPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDL

ISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

NJT0401-RIGID (Rigid Docking)

QVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVAKLLTTS

GSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSFEDPTCTLVTSS

GAFQYWGQGTQVTVSEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVT

CVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGK

EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIA

VEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHY

TQKSLSLSPGKEAAAKEAAAKAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKN

PKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI

SNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

Notes

The red part is the sequence of the nanobody.
The bold red part is the VHH region.
The regular red part is the Fc region.
The yellow-highlighted part of the red letters is the linker.
The blue part is the IL-2 sequence.
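Since the color coding does not survive in plain text, the regions can also be recovered from the sequence itself. The sketch below splits the NJT0401-SOFT construct at the flexible linker and at the start of the Fc region, using only the sequences listed on this page; the function name `split_construct` and the choice of the first Fc residues as a marker are illustrative assumptions.

```python
# NJT0401-SOFT, copied from the Sequence Overview with spaces removed
SOFT = "".join([
    "QVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVAKLLTTS",
    "GSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSFEDPTCTLVTSS",
    "GAFQYWGQGTQVTVSEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVT",
    "CVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGK",
    "EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIA",
    "VEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHY",
    "TQKSLSLSPGKSSSGSGSSSGSGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYK",
    "NPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDL",
    "ISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT",
])

SOFT_LINKER = "SSSGSGSSSGSG"
FC_START = "EPKSCDKTHTCPPCP"  # first residues of the Fc region as listed for 4CDH

def split_construct(seq, linker, fc_start=FC_START):
    """Split a construct into VHH, Fc, linker and IL-2 regions."""
    nanobody, found, il2 = seq.partition(linker)
    if not found:
        raise ValueError("linker not found in sequence")
    fc_pos = nanobody.index(fc_start)
    return {
        "VHH": nanobody[:fc_pos],
        "Fc": nanobody[fc_pos:],
        "linker": linker,
        "IL-2": il2,
    }

regions = split_construct(SOFT, SOFT_LINKER)
for name, part in regions.items():
    print(f"{name:6s} {len(part):4d} aa  {part[:12]}...")
```

The same function applies to NJT0401-RIGID when the linker argument is EAAAKEAAAK.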

Composition Details:

Nanobody VHH region: 5JDS chain B

TGQVQLQESGGGLVQPGGSLRLSCAASGKMSSRRCMAWFRQAPGKERERVA

KLLTTSGSTYLADSVKGRFTISQNNAKSTVYLQMNSLKPEDTAMYYCAADSF

EDPTCTLVTSSGAFQYWGQGTQVTVSSGSMDPGGSHHHHHHHH

(Note: The purple part is the His tag. Because our vector already carries one, it is removed. In the sequence we used, the Fc region is added after the VHH region; its source is a patent.)

Nanobody Fc region: 4CDH

EPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHE

DPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEY

KCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFY

PSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCS

VMHEALHNHYTQKSLSLSPGK

Linker region:

Soft: SSSGSGSSSGSG
Rigid: EAAAKEAAAK

IL-2 region: 2ERJ (Crystal structure of the heterotrimeric interleukin-2 receptor in complex with interleukin-2)

The 5m5e and 2erj N-termini differ in the red part at the front of 5m5e. This part coincides with a partial sequence of the V. vulgaris immunoglobulin H chain V region precursor, the function of which is unknown.

The green portion is a segment that is highly similar to part of the 2erj N-terminus (only the third amino acid is different).

2erjd

MYRMQLLSCIALSLALVTNS APTSSSTKKTQLQLEHLLLDLQMILNGINNYKNP

KLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLI

SNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

5m5eD

MGWSCIILFLVATATGVHS APASSSTKKTQLQLEHLLLDLQMILNGINNYKNP

KLTRMLTAKFAMPKKATELKHLQCLEEELKPLEEVLNGAQSKNFHLRPRDLIS

NINVIVLELKGSETTFMCEYADETATIVEFLNRWITFAQSIISTLT

Hence, we choose 2erjd as our IL-2 sequence.

Homology Modeling

I-TASSER was originally designed for protein structure modeling by iterative threading assembly simulations. It was recently extended for structure-based function annotation by matching structure predictions with known functional templates. The I-TASSER Suite is a stand-alone package implementing the I-TASSER-based protein structure and function modeling pipelines. The pipeline consists of four general steps: threading template identification, iterative structure assembly simulation, model selection and refinement, and structure-based function annotation.

Predicted Secondary Structure

Predicted Solvent Accessibility

Predicted normalized B-factor

The B-factor indicates the extent of the inherent thermal mobility of residues/atoms in proteins. This value is deduced from threading template proteins from the PDB in combination with the sequence profiles derived from sequence databases. The reported B-factor profile in the figure below corresponds to the normalized B-factor of the target protein, defined by B = (B' - u)/s, where B' is the raw B-factor value, and u and s are respectively the mean and standard deviation of the raw B-factors along the sequence.
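The normalization B = (B' - u)/s is an ordinary z-score over the residues of the sequence. A minimal sketch follows; the raw values are made up for illustration, and using the population standard deviation (rather than the sample one) is an assumption, since the text does not say which is used.

```python
import statistics

def normalize_b_factors(raw):
    """Return (B' - u) / s for each raw B-factor B'."""
    u = statistics.mean(raw)
    s = statistics.pstdev(raw)  # assumption: population standard deviation
    return [(b - u) / s for b in raw]

# Hypothetical raw B-factors for five residues
raw_b = [12.0, 18.0, 25.0, 40.0, 30.0]
normalized = normalize_b_factors(raw_b)
print([round(z, 3) for z in normalized])
```

After normalization the profile has mean 0 and unit standard deviation, so positive values mark residues predicted to be more mobile than average and negative values mark more rigid ones.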

Top 10 threading templates used by I-TASSER

We use the templates of the highest significance in the threading alignments, where the significance is measured by the Z-score.

Table 1: Top 10 threading templates
Rank	PDB Hit	Iden1	Iden2	Cov	Norm. Z-score
1	1hzhH	0.26	0.55	0.87	2.47
2	1igtA	0.29	0.44	0.84	4.85
3	1hzhH	0.24	0.55	0.87	3.11
4	1hzh	0.24	0.55	0.87	1.05
5	1hzhH	0.25	0.55	0.87	3.82
6	1hzh	0.73	0.55	0.71	1.58
7	4rrpA	0.23	0.28	0.88	4.33
8	5dk3B	0.73	0.52	0.68	6.77
9	1irlA	0.98	0.26	0.26	4.60
10	1igtB	0.59	0.44	0.70	1.93

Notes

Rank of templates represents the top ten threading templates used by I-TASSER.
Iden1 is the percentage sequence identity of the templates in the threading aligned region with the query sequence.
Iden2 is the percentage sequence identity of the whole template chains with the query sequence.
Cov represents the coverage of the threading alignment and is equal to the number of aligned residues divided by the length of the query protein.
Norm. Z-score is the normalized Z-score of the threading alignments. An alignment with a normalized Z-score > 1 indicates a good alignment, and vice versa.

Our Final model predicted by I-TASSER

To select the final models, we used the SPICKER program to cluster all the decoys based on pair-wise structure similarity, and report up to five models, which correspond to the five largest structure clusters. The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score is typically in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa. TM-score and RMSD are estimated from the C-score and the protein length, following the correlation observed between these quantities. Although the first model has better quality in most cases, lower-ranked models can have better quality than higher-ranked ones, as seen in the I-TASSER benchmark tests.

C-score=-0.20

Estimated TM-score = 0.69±0.12

Estimated RMSD = 7.8±4.4Å

Proteins structurally close to the target in the PDB

We used the TM-align structural alignment program to match the first I-TASSER model to all structures in the PDB library. This section reports the top 10 proteins from the PDB that have the closest structural similarity to the target.

Table 2: Proteins in the PDB with the closest structural similarity
Rank	PDB Hit	TM-score	RMSD^a	IDEN^a	Cov
1	1hzhH	0.836	1.94	0.230	0.865
2	1igyB	0.451	4.43	0.204	0.536
3	4ye4H	0.431	1.83	0.370	0.448
4	1p7kH	0.426	1.27	0.327	0.436
5	4rqqH	0.423	2.03	0.304	0.444
6	1tjhH	0.423	2.44	0.317	0.456
7	1rihH	0.420	1.79	0.303	0.439
8	5nstB	0.419	1.85	0.339	0.439
9	4jb9H	0.419	1.52	0.312	0.432
10	1igtB	0.419	4.47	0.072	0.500

Predicted function using COFACTOR and COACH

Biological annotations of the target protein by COFACTOR and COACH, based on the I-TASSER structure prediction.

Table 3: Biological annotations of the target protein
Rank	C-score	Cluster size	PDB Hit	Lig Name	Ligand Binding Site Residues
1	0.13	31	4j8rB	PEPTIDE	50,110,111,113,114
2	0.08	15	3ls4H	TCI	33,35,37,47,50,59,99,100,113,114,115,118
3	0.06	11	1frgH	PEPTIDE	50,52,53,54,55,56,57,59,99,112,114
4	0.04	8	4xccH	GOL	202,203,205,213,214,215
5	0.03	7	1cbvH	Nuc.Acid	31,32,33,52,53,99,100,101,102,112,113

Notes

C-score is the confidence score of the prediction. It ranges over [0, 1], where a higher score indicates a more reliable prediction.
Cluster size is the total number of templates in a cluster.
Lig Name is the name of the possible binding ligand.

Gene Ontology terms

Table 4: Top 10 homologous GO templates in the PDB
Rank	Cscore^GO	TM-score	RMSD^a	IDEN^a	Cov	PDB Hit
1	0.36	0.8364	1.94	0.23	0.87	1hzhH
2	0.21	0.4340	2.21	0.41	0.46	2ig2H
3	0.20	0.4184	1.77	0.45	0.44	1hinH
4	0.20	0.414	1.95	0.41	0.44	1deeB
5	0.20	0.4082	2.03	0.42	0.43	1qkzH
6	0.20	0.4493	1.1	0.31	0.46	2ny7H
7	0.20	0.4256	1.94	0.34	0.45	1g9mH
8	0.20	0.427	2.05	0.34	0.45	2qadD
9	0.20	0.4085	1.75	0.42	0.43	1f4wH
10	0.20	0.4256	1.37	0.37	0.44	3pp3H

Notes

TM-score is a measure of global structural similarity between query and template protein.
RMSD is the RMSD between residues that are structurally aligned by TM-align.
IDEN is the percentage sequence identity in the structurally aligned region.
Cov represents the coverage of global structural alignment and is equal to the number of structurally aligned residues divided by length of the query protein.

References:

1. Yang Zhang. I-TASSER: Fully automated protein structure prediction in CASP8. Proteins, 77 (Suppl 9): 100-113, 2009.

2. Ambrish Roy, Jianyi Yang, Yang Zhang. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Research, 40: W471-W477, 2012.

3. Jianyi Yang, Yang Zhang. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43: W174-W181, 2015.

4. Ma X, Xu T, Chen W, et al. Injectable hydrogels based on the hyaluronic acid and poly (γ-glutamic acid) for controlled protein delivery. Carbohydrate Polymers, 179: 100-109, 2018.

Team:NJTech China/Model
