Team:Shanghai-United/Contribution

Contribution:
We set up a systematic procedure to study how overexpressing a protein changes the proteome of human cells.

Here is the procedure:

  1. Construct a plasmid for overexpression of the target gene.
  2. Confirm overexpression of the target protein in the cells by running a Western blot: the experimental group should show a markedly higher level of the target protein (NFX1 in our example) than the control group.
  3. Find the differential proteins using IP-MS (immunoprecipitation followed by mass spectrometry) and FASP (filter-aided sample preparation).


For IP-MS, we add a tag (flag) to the target protein so that it binds to the agarose beads added later. The target protein, together with the proteins that interact with it, stays attached to the beads, while unrelated proteins are washed away. SDS-PAGE is then run to compare the proteins retained on the beads with the total protein. Finally, in-gel enzymatic digestion breaks the proteins in the gel into peptides, which are analyzed by mass spectrometry.

For FASP, lyse the cells of both groups with SDT buffer to collect their total protein, then use trypsin to digest the proteins into peptides. The peptides are then sent to mass spectrometry to quantify the individual proteins. By comparing the results statistically, you can pick out the proteins whose abundance differs significantly between the control and experimental groups.
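
The trypsin step can be illustrated in silico. The sketch below assumes the commonly used cleavage rule (trypsin cuts after lysine K or arginine R, except when the next residue is proline P) and ignores missed cleavages. The function name `tryptic_digest` and the example sequence are made up for illustration and are not part of our pipeline.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleavage happens after K or R unless the next residue is P,
    and every possible site is cleaved (no missed cleavages).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical example sequence, not a real protein.
example = "MKTAYIAKQRPSK"
print(tryptic_digest(example))
```

In this toy sequence the R is followed by P, so the last peptide is not cut at that position.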


In the last phase, compare your results with published studies to further support the relationship between particular proteins and the target protein.

 

Here is our example for studying NFX1.
1.1 Sample Information
We prepared a total of 18 dishes of HeLa cells. Each experiment used 6 dishes: 3 of normal cells and 3 of cells overexpressing the NFX1 protein.
Chart 1-1 Sample Information

1.2 Protein Identification Counts
Chart 1-2 Protein Identification Counts


Total spectra   Spectra   Peptides   Protein groups
417824          191276    42101      4673

Total spectra: the number of secondary (MS/MS) mass spectra.
Spectra: the number of spectra matched with identified peptides.
Peptides: the total number of distinct peptide sequences identified in the protein groups.

Protein groups: the identified protein groups. A protein group consists of one master protein, identified by a set of peptides that are not all included in any other protein group, together with all proteins identified by the same set or a subset of those peptides.
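
As a quick worked example, two summary ratios can be derived from the counts in Chart 1-2. These ratios (the share of spectra matched to a peptide and the average number of peptides per protein group) are our own illustrative additions, not values reported by the software.

```python
# Counts copied from Chart 1-2.
total_spectra = 417824
matched_spectra = 191276
peptides = 42101
protein_groups = 4673

# Share of MS/MS spectra that were matched to an identified peptide.
identification_rate = matched_spectra / total_spectra

# Average number of distinct peptides per protein group.
peptides_per_group = peptides / protein_groups

print(f"Identification rate: {identification_rate:.1%}")
print(f"Average peptides per protein group: {peptides_per_group:.2f}")
```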

2. Proteomics & MS Results
2.1 Protein Quantification Assay & SDS-PAGE
Protein Quantification Assay

SDS-PAGE



2.2 Distribution of Peptide Score

The x-axis represents the peptide score assigned by MaxQuant; the y-axis represents the number of peptides whose scores fall within the corresponding range. The MaxQuant scores of the MS2 spectra are generally good.
The distribution in the histogram above is approximately normal, so we speculate that the variation in scores is random.
2.3 Distribution of Proteins' Molecular Weights

The x-axis represents the proteins' molecular weights; the y-axis indicates the number of proteins whose molecular weight falls within the corresponding range. The distribution in the histogram above is unimodal and strongly positively skewed, which means that the majority of the detected proteins have a relatively small molecular weight.


2.4 Distribution of Peptides' Lengths

The x-axis represents the length of the peptides in amino acids; the y-axis represents the frequency of identified peptides. The distribution in the histogram above is unimodal and positively skewed, which shows that more short peptides than long ones were detected. The majority of the detected peptides are three to thirteen amino acids long.

2.5 Sequence Coverage of Protein Groups

The percent coverage is calculated by dividing the number of amino acids in all found peptides by the total number of amino acids in the entire protein sequence. This histogram illustrates the distribution of proteins with respect to their sequence coverage. The x-axis represents the percentage of each protein's sequence that is covered; the y-axis represents the number of identified proteins. The graph shows that a fair number of proteins have a coverage higher than 70%, and some are even near the 90% range. According to the School of Medicine at the University of Virginia, 70% coverage indicates a very successful protein analysis, so we consider the experiment successful for these proteins.
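
The coverage calculation described above can be sketched as follows. This is a minimal illustration, assuming that each protein position is counted only once even when several peptides overlap it; the function name `sequence_coverage` and the toy sequences are hypothetical and not taken from our data.

```python
def sequence_coverage(protein_sequence, peptides):
    """Return the percentage of protein residues covered by the peptides.

    Assumption: a residue counts once, no matter how many peptides cover it.
    """
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide in the protein sequence.
        start = protein_sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein_sequence)


# Toy example: 7 of the 10 residues are covered by the three peptides.
print(sequence_coverage("MKTAYIAKQR", ["MKT", "KTAY", "QR"]))
```

Counting positions instead of summing peptide lengths keeps overlapping peptides such as "MKT" and "KTAY" from inflating the coverage above its true value.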

There are several reasons why an analysis does not find all amino acids:

  • The protein does not digest well.
  • Peptides are too hydrophilic or too small: they pass through the reverse-phase column with the salt and are not analyzed.
  • Peptides are too large or too hydrophobic: they stick in the gel, adsorb to tubes, do not elute from the column, or are too large for the mass spectrometer to analyze because of poor fragmentation.
  • Peptides fragment in ways that cannot be analyzed. Many spectra in an analysis cannot be interpreted, and some give only limited data; proline, histidine, and internal lysine and arginine are some of the reasons peptides do not give complete fragmentation data.

 

2.6 Distribution of Number of Identified Peptides of Proteins

This graph displays the distribution of identified proteins with respect to the number of matching peptides. The x-axis represents the number of peptides matched with the identified protein, and the y-axis represents the number of proteins.


2.7 Identified Protein Counts

  • 4673 unique proteins were identified from the 6 samples.
  • After filtering out proteins flagged in the reverse and potential contaminant columns, as well as proteins with fewer than four data points out of six, 2420 proteins were kept.
  • Of these, 575 proteins have a p-value less than or equal to 0.05, meaning that the difference between the two groups is statistically significant rather than due to chance.
  • Of the 575 proteins, 133 have an Overexpression/Normal fold change greater than 1.2 (upregulation), and 147 have a Normal/Overexpression fold change greater than 1.2 (downregulation).
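
The filtering steps listed above can be sketched as a small classifier. This is an illustration under stated assumptions, not our actual analysis script: the record layout, the function name `classify`, and the use of the mean of valid intensities for the fold change are all hypothetical, and the p-value is assumed to have been computed beforehand by the statistics software.

```python
from statistics import mean

FC_CUTOFF = 1.2   # fold-change threshold from the text
P_CUTOFF = 0.05   # significance threshold from the text
MIN_VALID = 4     # minimum number of data points out of six


def classify(protein):
    """Return 'filtered', 'not significant', 'up', 'down' or 'unchanged'."""
    if protein["reverse"] or protein["contaminant"]:
        return "filtered"
    # Missing measurements are represented by None (an assumption).
    normal = [v for v in protein["normal"] if v is not None]
    over = [v for v in protein["overexpression"] if v is not None]
    if len(normal) + len(over) < MIN_VALID or not normal or not over:
        return "filtered"
    if protein["p_value"] > P_CUTOFF:
        return "not significant"
    # Assumed definition: ratio of the group means of the valid values.
    fold_change = mean(over) / mean(normal)
    if fold_change > FC_CUTOFF:
        return "up"
    if 1 / fold_change > FC_CUTOFF:
        return "down"
    return "unchanged"


def make_protein(normal, over, p_value, reverse=False, contaminant=False):
    """Build a hypothetical protein record for the example below."""
    return {"normal": normal, "overexpression": over, "p_value": p_value,
            "reverse": reverse, "contaminant": contaminant}


examples = {
    "A": make_protein([100, 110, 90], [150, 160, 140], 0.01),
    "B": make_protein([200, 210, 190], [100, 105, 95], 0.02),
    "C": make_protein([100, None, None], [120, None, None], 0.01),
    "D": make_protein([100, 100, 100], [110, 110, 110], 0.30),
}
for name, record in examples.items():
    print(name, classify(record))
```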

 

2.8 Boxplots of Protein Quantification Assay

The x-axis shows the name of each sample, and the y-axis represents the multiple of median (MoM) of the LFQ (label-free quantification) intensity on a log2 scale. The plot shows how far each sample deviates from the median value.
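
One way to compute such a value is sketched below. We assume here that the MoM of a protein is its LFQ intensity in one sample divided by the median of that protein's intensities across all samples, and that the log2 is taken afterwards; the exact definition used by the analysis software may differ, and the function name `log2_mom` is hypothetical.

```python
import math
from statistics import median


def log2_mom(intensities):
    """Return log2(intensity / median) for one protein across samples.

    Assumption: missing or zero intensities are given as None or 0 and
    are returned as None.
    """
    valid = [v for v in intensities if v]
    if not valid:
        return [None] * len(intensities)
    med = median(valid)
    return [math.log2(v / med) if v else None for v in intensities]


# Toy intensities for one protein in three samples.
print(log2_mom([100, 200, 400]))
```

A value of 0 means the sample sits exactly at the median, while +1 and -1 mean twice and half the median intensity.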


2.9 Correlation Matrix of Protein Quantification

The lower left of the graph contains scatterplots of protein intensities for each pair of samples. The upper right contains the corresponding Pearson correlation values. The diagonal shows the sample names.
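
The pairwise Pearson correlation behind such a matrix can be sketched in plain Python. The sample names and intensity values below are invented for illustration, and the helper names `pearson` and `correlation_matrix` are hypothetical; in practice the input would be the log2 LFQ intensities of the proteins shared by two samples.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(samples):
    """Return a nested dict with the correlation of every sample pair."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names}
            for a in names}


# Invented log2 intensities of four proteins in three samples.
samples = {
    "Normal_1": [20.1, 22.4, 25.0, 27.3],
    "Normal_2": [20.3, 22.1, 25.2, 27.0],
    "Overexpression_1": [21.0, 22.0, 26.1, 26.5],
}
matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```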

2.10 Volcano Plots of All Identified Proteins

Fold Change (FC) = (Overexpression) Protein Contents / (Normal) Protein Contents
Red: Overexpression/Normal fold change > 1.2, n = 133.
Green: Normal/Overexpression fold change > 1.2, n = 147.
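
The coordinates and colours of a volcano plot can be derived as in the sketch below. It assumes the usual convention of plotting log2(FC) on the x-axis and -log10(p) on the y-axis, applies the p ≤ 0.05 and 1.2-fold cutoffs from section 2.7, and uses grey for all remaining proteins; the grey colour and the function name `volcano_point` are our assumptions for illustration.

```python
import math

FC_CUTOFF = 1.2
P_CUTOFF = 0.05


def volcano_point(fold_change, p_value):
    """Return (x, y, colour) for one protein on a volcano plot.

    fold_change is Overexpression / Normal, as defined in the text.
    """
    x = math.log2(fold_change)
    y = -math.log10(p_value)
    if p_value <= P_CUTOFF and fold_change > FC_CUTOFF:
        colour = "red"      # upregulated in the overexpression group
    elif p_value <= P_CUTOFF and 1 / fold_change > FC_CUTOFF:
        colour = "green"    # downregulated in the overexpression group
    else:
        colour = "grey"     # assumed colour for all other proteins
    return x, y, colour


# On the log2 axis the 1.2-fold cutoff lies at about +/-0.263.
print(round(math.log2(FC_CUTOFF), 3))
print(volcano_point(2.0, 0.01))
print(volcano_point(0.5, 0.01))
print(volcano_point(1.1, 0.01))
```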


2.11 Heatmap of Differential Proteins (Clustering)

2.12 Principal Component Analysis of Differential Proteins (PCA)

Using the differential proteins, the two groups can be clearly separated.

©2019 wiki-hpv Corporation. All rights reserved.