Team:Shanghai-United/Demonstrate


Demonstrate

Overview
Here we present a list of proteins that are likely to be associated with overexpression of NFX1-123.

HeLa cells were transfected either with empty vectors or with plasmids carrying the NFX1 and GFP genes. After the inserted genes were expressed, we performed Western blot, immunoprecipitation mass spectrometry (IP-MS), and filter-aided sample preparation (FASP) proteomics to identify and confirm the differentially expressed proteins.

Combining the statistical results of the IP-MS and FASP mass spectrometry analyses, we found significant changes in the expression of 280 proteins upon NFX1 overexpression. These proteins could serve as convenient biomarkers to identify people who overexpress NFX1, signal a high risk of cervical cancer, and help prevent cervical cancer at early stages.

In controlled in vitro assays, we found that overexpression of NFX1-123 changes the concentrations of certain proteins compared with normal cells. We used three distinct methods to confirm and filter the results, which gives us confidence in the association between NFX1-123 expression and the proteins identified.

Our results point to proteins that may be involved in HPV infection. This could provide a new way to identify people at high risk of HPV infection even before they are infected. Cervical cancer is most curable at early stages, so measuring these protein concentrations may provide important warning signs that help us prevent and combat cervical cancer well before it becomes uncontrollable. We also anticipate that our assay will be a starting point for further research on the mechanisms of HPV infection. For example, a specific protein associated with NFX1 could be tested to examine its role in HPV infection.

1. Primary Results
1.1 Sample Information
We prepared a total of 18 dishes of HeLa cells. Each experiment used 6 dishes: 3 dishes of normal cells and 3 dishes of cells overexpressing NFX1.
Chart 1-1 Sample Information

1.2 Protein Identification Counts
Chart 1-2 Protein Identification Counts


Total spectra | Spectra | Peptides | Protein groups
417824 | 191276 | 42101 | 4673

Total spectra: number of secondary (MS/MS, or MS2) mass spectra acquired.
Spectra: number of spectra matched to identified peptides.
Peptides: total number of distinct peptide sequences identified across all protein groups.

Protein groups: number of identified protein groups. A protein group consists of one master protein together with all proteins identified by the same set of peptides or by a subset of it. The master protein is identified by a set of peptides that is not contained, as a whole, in any other protein group.
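The grouping rule above can be illustrated with a simplified sketch. This is not MaxQuant's implementation. It assumes a hypothetical input that maps each protein to the set of peptide sequences identifying it. Proteins are visited from the largest to the smallest peptide set. Each protein is attached to the first existing group whose master peptide set contains all of its peptides; otherwise it becomes a new master. Protein and peptide names in the example are hypothetical.

```python
from typing import Dict, List, Set

def group_proteins(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Simplified protein grouping: each group is [master, member, member, ...].

    A protein joins a group when its peptide set equals, or is a subset of,
    that group's master peptide set; otherwise it starts a new group.
    """
    # Largest peptide sets first; ties broken by name for a deterministic order.
    order = sorted(protein_peptides, key=lambda p: (-len(protein_peptides[p]), p))
    groups: List[List[str]] = []
    master_sets: List[Set[str]] = []
    for protein in order:
        peptides = protein_peptides[protein]
        for group, master_set in zip(groups, master_sets):
            if peptides <= master_set:  # same set or subset of the master's peptides
                group.append(protein)
                break
        else:  # not contained in any existing master set: new master protein
            groups.append([protein])
            master_sets.append(peptides)
    return groups

# Hypothetical example: protein -> identifying peptides
example = {"P1": {"a", "b", "c"}, "P2": {"a", "b"}, "P3": {"d", "e"}, "P4": {"c", "d"}}
print(group_proteins(example))  # [['P1', 'P2'], ['P3'], ['P4']]
```

P4 shares one peptide with P1 and one with P3. Its peptides are not all contained in either master's set, so it forms its own group.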

2. Proteomics & MS Results
2.1 Protein Quantification Assay & SDS-PAGE
Protein Quantification Assay

SDS-PAGE


2.2 Distribution of Peptide Score

The x-axis represents the MaxQuant peptide score; the y-axis represents the number of peptides with scores in the corresponding range. The MaxQuant scores of the MS2 spectra are generally good.
The distribution in the histogram is approximately normal. This suggests that peptide scores vary around a central value without obvious systematic bias.

2.3 Distribution of Proteins' Molecular Weights

The x-axis represents protein molecular weight; the y-axis indicates the number of proteins with a molecular weight in the corresponding range. The distribution is unimodal and strongly positively skewed, which means that most of the detected proteins have relatively low molecular weights.
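The skew can be checked numerically with the Fisher-Pearson skewness coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below assumes a plain list of molecular weights in kDa. The example values are hypothetical, not our data. A positive result indicates a long tail toward large proteins, and a symmetric list gives 0.

```python
from typing import Sequence

def sample_skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness g1 = m3 / m2**1.5.

    Uses population (biased) central moments; needs at least two distinct values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical molecular weights (kDa): many small proteins, a few very large ones
weights = [20, 25, 30, 35, 40, 45, 60, 150, 300]
print(sample_skewness(weights) > 0)  # True: right (positive) skew
```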


2.4 Distribution of Peptides' Lengths

The x-axis represents peptide length in amino acids; the y-axis represents the frequency of identified peptides. The distribution is unimodal and positively skewed, showing that shorter peptides were detected more often than longer ones. Most of the detected peptides are three to thirteen amino acids long.

2.5 Sequence Coverage of Protein Groups

Percent coverage is calculated by dividing the number of amino acids covered by the identified peptides by the total number of amino acids in the protein sequence. This histogram shows the distribution of proteins with respect to sequence coverage. The x-axis represents the percentage of each protein's sequence that is covered, and the y-axis represents the number of identified proteins. A considerable number of proteins have a coverage higher than 70%, and some approach 90%. According to the University of Virginia School of Medicine, 70% coverage indicates a very successful protein analysis, so these results suggest that the experiment was successful.
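The coverage calculation above can be sketched as follows. It assumes the protein sequence and the identified peptide sequences are available as plain strings. The example sequences are hypothetical. Each residue covered by one or more peptides is counted once. Simply adding up peptide lengths would therefore overestimate coverage when peptides overlap.

```python
from typing import Iterable

def sequence_coverage(protein_seq: str, peptides: Iterable[str]) -> float:
    """Percent of residues in protein_seq covered by at least one peptide.

    Every occurrence of each peptide is marked; overlapping peptides
    count each residue only once.
    """
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)  # look for further occurrences
    return 100.0 * sum(covered) / len(protein_seq)

# Hypothetical 10-residue protein and three identified peptides
print(sequence_coverage("MAGKLLVEQR", ["AGK", "GKLL", "EQR"]))  # 80.0
```

The three peptides total 10 residues, but AGK and GKLL overlap and residues M and V are not covered. The coverage is therefore 8 of 10 residues, not 100%.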

There are several reasons why an analysis may not detect every amino acid in a protein:

  • the protein does not digest well;
  • peptides are too hydrophilic or too small: they pass through the reversed-phase column with the salt and are not analyzed;
  • peptides are too large or too hydrophobic: they stick in the gel, adsorb to tubes, do not elute from the column, or are too large for the mass spectrometer to analyze because they fragment poorly;
  • peptides fragment in ways that cannot be analyzed. Many spectra in an analysis cannot be interpreted, and some give only limited data. Proline, histidine, and internal lysine and arginine residues are among the reasons peptides do not give complete fragmentation data.

2.6 Distribution of the Number of Identified Peptides per Protein

This graph displays the distribution of identified proteins with respect to the number of matching peptides. The x-axis represents the number of peptides matched to each identified protein, and the y-axis represents the number of proteins.


2.7 Identified Protein Counts

  1. 4673 proteins (protein groups) were identified across the 6 samples.
  2. We removed proteins flagged in the Reverse and Potential contaminant columns, as well as proteins with fewer than four valid values out of six. 2420 proteins remained.
  3. Of these, 575 proteins have a p-value less than or equal to 0.05. This means the difference between the two groups is statistically significant at the 5% level.
  4. Among the 575 proteins, 133 have an Overexpression/Normal fold change greater than 1.2 (upregulated), and 147 have a Normal/Overexpression fold change greater than 1.2 (downregulated).
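The filtering steps above can be sketched as below. This is an illustration under stated assumptions, not our actual analysis script:

- The source does not name the statistical test, so p-values are taken as precomputed inputs.
- Missing LFQ values are assumed to be reported as 0 or None.
- Fold change is assumed to be the ratio of the group means of the valid intensities.

The protein rows in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ProteinRow:
    name: str
    reverse: bool                          # flagged in the "Reverse" (decoy) column
    contaminant: bool                      # flagged in the "Potential contaminant" column
    normal: List[Optional[float]]          # 3 LFQ intensities (None or 0 = missing)
    overexpression: List[Optional[float]]  # 3 LFQ intensities (None or 0 = missing)
    p_value: float                         # assumption: precomputed two-group test

def valid(values: List[Optional[float]]) -> List[float]:
    """Keep only measured intensities (drop None and 0)."""
    return [v for v in values if v]

def differential(rows: List[ProteinRow], min_valid: int = 4, alpha: float = 0.05,
                 fc_cutoff: float = 1.2) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) protein names after all filters."""
    up, down = [], []
    for row in rows:
        if row.reverse or row.contaminant:  # drop decoys and contaminants
            continue
        norm, over = valid(row.normal), valid(row.overexpression)
        if len(norm) + len(over) < min_valid or not norm or not over:
            continue  # too few valid values out of six
        if row.p_value > alpha:  # not significant
            continue
        fc = (sum(over) / len(over)) / (sum(norm) / len(norm))  # Overexpression/Normal
        if fc > fc_cutoff:
            up.append(row.name)
        elif 1 / fc > fc_cutoff:  # Normal/Overexpression > cutoff
            down.append(row.name)
    return up, down

rows = [
    ProteinRow("A", False, False, [100.0, 110.0, 90.0], [150.0, 160.0, 140.0], 0.01),
    ProteinRow("B", False, False, [200.0, 210.0, 190.0], [100.0, None, 110.0], 0.02),
    ProteinRow("C", True, False, [100.0] * 3, [300.0] * 3, 0.001),     # decoy
    ProteinRow("D", False, False, [None, None, 50.0], [None, 80.0, 0.0], 0.01),
    ProteinRow("E", False, False, [100.0] * 3, [110.0] * 3, 0.03),     # FC = 1.1
    ProteinRow("F", False, False, [100.0, 105.0, 95.0], [200.0, 210.0, 190.0], 0.2),
]
print(differential(rows))  # (['A'], ['B'])
```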

 

2.8 Boxplots of Protein Quantification Assay

The x-axis shows the sample names, and the y-axis shows the log2 multiple of the median (MoM) of the LFQ (label-free quantification) intensity. The plot shows how far each sample deviates from the median value.
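The log2 MoM transformation can be sketched as follows. The source does not say which median is used. This sketch assumes that each protein's intensity is divided by that protein's median across all samples and then log2-transformed. A value of 0 therefore means "equal to the median", and ±1 means twofold above or below it. Sample names and intensities in the example are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List

def log2_mom(intensities: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """log2(intensity / per-protein median across samples), for every sample.

    intensities maps sample name -> LFQ intensities, one per protein,
    in the same protein order for every sample (assumption).
    """
    samples = list(intensities)
    n_proteins = len(intensities[samples[0]])
    medians = [median(intensities[s][i] for s in samples) for i in range(n_proteins)]
    return {
        s: [math.log2(intensities[s][i] / medians[i]) for i in range(n_proteins)]
        for s in samples
    }

data = {"N1": [100, 400], "N2": [200, 400], "O1": [400, 800]}  # hypothetical
print(log2_mom(data))  # {'N1': [-1.0, 0.0], 'N2': [0.0, 0.0], 'O1': [1.0, 1.0]}
```

Each sample's list of values is what one box in the boxplot would summarize.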


2.9 Correlation Matrix of Protein Quantification

The lower-left panels show pairwise scatterplots of protein quantification values between samples, and the upper-right panels show the corresponding Pearson correlation coefficients. The diagonal panels identify the samples.
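The following sketch shows how the correlation values in such a matrix could be computed. It calculates Pearson's r for every pair of samples and assumes log2-transformed intensities, which the source does not specify. The coefficient of determination would be r squared. Sample names and values are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(samples: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Pearson r on log2 intensities; keys are (sample_a, sample_b)."""
    logged = {name: [math.log2(v) for v in vals] for name, vals in samples.items()}
    return {(a, b): pearson_r(logged[a], logged[b]) for a, b in product(logged, repeat=2)}

m = correlation_matrix({"N1": [1, 2, 4], "O1": [2, 4, 8], "O2": [4, 2, 1]})
print(m[("N1", "O1")], m[("N1", "O2")])  # 1.0 -1.0
```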


2.10 Volcano Plot of All Identified Proteins

Fold change (FC) = protein content (Overexpression) / protein content (Normal)
Red: upregulated, Overexpression/Normal fold change > 1.2 and p ≤ 0.05, n = 133.
Green: downregulated, Normal/Overexpression fold change > 1.2 and p ≤ 0.05, n = 147.
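A volcano plot conventionally places each protein at x = log2(FC) and y = -log10(p). The sketch below takes its thresholds from the text and uses hypothetical inputs. On this scale the two fold-change cutoffs become symmetric vertical lines at ±log2(1.2) ≈ ±0.263. This is because Normal/Overexpression > 1.2 is equivalent to log2(FC) < -log2(1.2).

```python
import math
from typing import Tuple

FC_CUTOFF = 1.2  # fold change > 1.2 in either direction (from the text)
ALPHA = 0.05     # significance threshold for the differential proteins

X_CUT = math.log2(FC_CUTOFF)  # vertical threshold lines at +X_CUT and -X_CUT
Y_CUT = -math.log10(ALPHA)    # horizontal threshold line

def volcano_point(fold_change: float, p_value: float) -> Tuple[float, float, str]:
    """Return (x, y, colour) for one protein, with FC = Overexpression / Normal."""
    x = math.log2(fold_change)
    y = -math.log10(p_value)
    if p_value <= ALPHA and x > X_CUT:     # Overexpression/Normal > 1.2
        colour = "red"
    elif p_value <= ALPHA and x < -X_CUT:  # Normal/Overexpression > 1.2
        colour = "green"
    else:
        colour = "grey"
    return x, y, colour

print(volcano_point(2.0, 0.01)[2], volcano_point(0.5, 0.001)[2])  # red green
```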

2.11 Heatmap and Clustering of Differential Proteins

2.12 Principal Component Analysis (PCA) of Differential Proteins

Using the differential proteins, the two groups can be clearly and easily distinguished.
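A PCA of this kind can be sketched with a singular value decomposition of the column-centred data matrix. The sketch assumes rows are samples and columns are differential proteins, using log2 LFQ intensities. The matrix is hypothetical: three Normal and three Overexpression samples. It shows the two groups falling on opposite sides of PC1.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int = 2):
    """PCA via SVD of the column-centred matrix; returns (scores, explained_ratio).

    X has one row per sample and one column per protein (assumed log2 intensities).
    """
    Xc = X - X.mean(axis=0)                           # centre each protein
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)  # Xc = U @ diag(S) @ Vt
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)                   # fraction of variance per PC
    return scores, explained[:n_components]

# Hypothetical log2 intensities: 3 Normal samples, then 3 Overexpression samples
X = np.array([
    [10.0, 12.0, 8.0], [10.2, 11.8, 8.1], [9.9, 12.1, 7.9],
    [12.0, 10.0, 9.5], [12.1, 10.2, 9.4], [11.9, 9.9, 9.6],
])
scores, explained = pca(X)
print(np.sign(scores[:, 0]), explained[0] > 0.9)
```

The sign of each principal component from an SVD is arbitrary. Only the relative positions of the samples (which side of PC1 each group falls on) are meaningful.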

©2019 wiki-hpv Corporation. All rights reserved.