Proteomics
Laurent Gatto (lg390@cam.ac.uk)
http://cpu.sysbiol.cam.ac.uk
CSAMA, 27 June 2014
Outline
- Proteomics and MS data
- Ranges infrastructure
- Application: spatial proteomics
Mass-spectrometry (LC-MSMS)
Separation, Source, Analyser, Detector
MS1: precursor ions
MSMS (MS2): fragmented ions, produced by precursor ion dissociation
MS1 and MS2 spectra
[Figure: three panels. MS1 scan @ 21:3 min; MS2 scan, precursor m/z 460.79; MS2 scan, precursor m/z 488.8. Axes: m/z (x), intensity (y).]
MS1 and MS2 spectra
[Figure: two panels showing spectra along retention time and m/z axes.]
Proteomics data: raw data, quantitation, identification, protein database
Proteomics data: raw data, quantitation, identification, protein database

Format         Status / package
Raw (mz*ML)    mzR
mzTab          MSnbase
mgf            MSnbase
mzIdentML      mzID (mzR)
mzQuantML      (?mzR)
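Example
  library("MSnbase")
  ## f: raw data file, g: identification file
  rx <- readMSData(f, centroided = TRUE)
  rx <- addIdentificationData(rx, g)
  ## keep only spectra with an identified peptide sequence
  rx <- rx[!is.na(fData(rx)$pepseq)]
  plot(rx[[10]], reporters = TMT6, full = TRUE)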
Example
[Figure: MS2 spectrum, precursor M/Z 600.36, with an enlarged view of the region from M/Z 126.080 to 131.190. Axes: M/Z (x), Intensity (y).]
Example (continued)
  ## compare two MS2 spectra
  plot(rx[[4730]], rx[[4929]])
Example
[Figure: two MS2 spectra plotted against each other. Axes: m/z (x), intensity (y, scaled 0 to 1 in each direction).
 prec scan: 6975, prec mass: 545.017, prec z: 3, # common: 34
 prec scan: 7198, prec mass: 545.017, prec z: 3, # common: 35]
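The "# common" annotation in the comparison plot counts peaks shared by the two spectra. A minimal base-R sketch of that idea follows; the helper `count_common_peaks`, the absolute tolerance of 0.05 and the m/z values are all hypothetical, and this is not claimed to be how MSnbase computes the number.

```r
# Hypothetical helper (not the MSnbase implementation): count the peaks of
# spectrum a that have a partner peak in spectrum b within an absolute
# m/z tolerance.
count_common_peaks <- function(mz_a, mz_b, tolerance = 0.05) {
  # distance from each peak in a to its nearest peak in b
  nearest <- vapply(mz_a, function(m) min(abs(mz_b - m)), numeric(1))
  sum(nearest <= tolerance)
}

# Made-up centroided m/z values for two MS2 spectra
mz_a <- c(175.12, 262.15, 375.23, 488.32, 601.40)
mz_b <- c(175.13, 262.40, 375.22, 488.30, 700.00)

n_common <- count_common_peaks(mz_a, mz_b)
print(n_common)
```

The count is asymmetric (peaks of a matched in b), which may be why the two spectra in a comparison can report slightly different numbers.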
Example (continued)
  ## quantify the TMT 6-plex reporter ions
  qt <- quantify(rx, reporters = TMT6, method = "max")
  ## qt <- readMSnSet(f2)
  nqt <- normalise(qt, method = "vsn")
  boxplot(exprs(nqt))
Example
[Figure: boxplot of the six reporter channels, TMT6.126 to TMT6.131.]
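To make the `method = "max"` idea concrete, here is a base-R sketch under stated assumptions: the spectrum is made up, the reporter m/z values are rounded nominal TMT 6-plex masses, the window `width` is arbitrary, and `quantify_max` is a hypothetical helper rather than the MSnbase code. The final scaling is a plain sum normalisation standing in for vsn.

```r
# Toy centroided MS2 spectrum (made-up values)
mz        <- c(126.12, 126.13, 127.12, 128.13, 129.13, 130.14, 131.14, 300.20)
intensity <- c(   500,   9000,   8000,   7500,   8200,   7900,   8100,  50000)

# Approximate TMT 6-plex reporter m/z values (rounded, for illustration only)
reporters <- c(TMT6.126 = 126.13, TMT6.127 = 127.12, TMT6.128 = 128.13,
               TMT6.129 = 129.13, TMT6.130 = 130.14, TMT6.131 = 131.14)

# Hypothetical helper: for each reporter, take the highest intensity found
# within +/- width of its expected m/z, or NA if no peak falls in the window.
quantify_max <- function(mz, intensity, reporters, width = 0.05) {
  vapply(reporters, function(r) {
    in_window <- abs(mz - r) <= width
    if (any(in_window)) max(intensity[in_window]) else NA_real_
  }, numeric(1))
}

q <- quantify_max(mz, intensity, reporters)

# Simple sum scaling so the six channels add up to 1 (not vsn)
q_scaled <- q / sum(q)
print(round(q_scaled, 3))
```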
More
  library("BiocInstaller")
  biocLite("RforProteomics")
raw data, quantitation, identification, protein database
Ranges infrastructure
Pbase package
  library("Pbase")
  p <- Proteins(db)
  p <- addIdentificationData(p, id)
  aa(p)       ## AAStringSet
  pranges(p)  ## IRangesList
  i <- which(acols(p)[, "EntryName"] == "EF2_HUMAN")
  plot(p[i])
  plot(p[i], from = 155, to = 185)
[Figure: protein P13639 drawn from NH to COOH with a "peptides" track, shown at full length and as a zoomed view that displays the amino acid sequence.]
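The peptide ranges shown above are start and end positions of identified peptides along the protein sequence. A base-R sketch of how such ranges could be derived by exact string matching is given below; the sequence, the peptides and the helper `peptide_ranges` are invented for illustration and do not reflect how Pbase builds its ranges.

```r
# Hypothetical helper: locate each peptide in a protein sequence by exact
# matching and return 1-based start/end positions, one row per occurrence.
peptide_ranges <- function(protein, peptides) {
  hits <- lapply(peptides, function(pep) {
    start <- as.integer(gregexpr(pep, protein, fixed = TRUE)[[1]])
    start <- start[start > 0]  # gregexpr returns -1 when there is no match
    data.frame(peptide = rep(pep, length(start)),
               start   = start,
               end     = start + nchar(pep) - 1L,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

protein  <- "MAGTKLEEFRPSTKAAGLEEFR"     # toy sequence, not a real protein
peptides <- c("LEEFR", "PSTK", "WWW")    # "WWW" is absent on purpose

ranges <- peptide_ranges(protein, peptides)
print(ranges)

# Each range recovers its peptide from the protein sequence
recovered <- substring(protein, ranges$start, ranges$end)
```

A peptide that occurs more than once yields one range per occurrence, and a peptide that does not match contributes no rows.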
Spatial proteomics
The cellular sub-division allows cells to establish a range of distinct microenvironments, each favouring different biochemical reactions and interactions and, therefore, allowing each compartment to fulfil a particular functional role.
Localisation and sequestration of proteins within subcellular niches is a fundamental mechanism for the post-translational regulation of protein function.
Spatial proteomics is the systematic study of protein localisations.
Spatial proteomics
Disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.
- Abnormal protein localisation leads to the loss of functional effects in diseases (Laurila et al. 2009).
- Disruption of the nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al. 2004).
[Figure: workflow diagram. Labels: Cell lysate, centrifugation, fraction; Pure fraction catalogue; Subtractive proteomics (enrichment); Invariant rich fraction (clustering); label-free MS/MS, PCP (χ2); iTRAQ MS/MS, LOPIT (PCA, PLS-DA).]
Figure: Immunofluorescence: ZFPL1, Golgi (left) and FHL2, mainly localized to actin filaments and focal adhesion sites. Also detected in the nucleus (right). (from the Human Protein Atlas)
Figure: Mass spectrometry-based approaches based on density gradient subcellular fractionation.
Cell membrane lysis
Mechanical or buffer-induced lysis of the plasma membrane with minimal disruption to intracellular organelles, followed by subcellular fractionation.
Density gradient separation
Quantitation by LC-MSMS
Data

       Fraction 1   Fraction 2   ...   Fraction m   markers
p_1    q_1,1        q_1,2        ...   q_1,m        unknown
p_2    q_2,1        q_2,2        ...   q_2,m        loc 1
p_3    q_3,1        q_3,2        ...   q_3,m        unknown
p_4    q_4,1        q_4,2        ...   q_4,m        loc k
...    ...          ...          ...   ...          ...
p_n    q_n,1        q_n,2        ...   q_n,m        unknown
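The layout above (n proteins by m fractions, plus a markers column) can be mocked up in base R as follows. All values are random placeholders, the location names are invented, and scaling each row to sum to 1 is an assumption made here for illustration rather than something the table states.

```r
set.seed(1)
n <- 6  # proteins
m <- 4  # fractions

# Quantitation matrix q[i, j]: protein i in fraction j (random toy values)
q <- matrix(runif(n * m), nrow = n,
            dimnames = list(paste0("p", 1:n), paste0("Fraction", 1:m)))

# Assumption: express each protein as a profile that sums to 1 across fractions
q <- q / rowSums(q)

# Marker annotation: a known location for some proteins, "unknown" otherwise
markers <- c("unknown", "loc1", "unknown", "loc2", "loc1", "unknown")

dat <- data.frame(q, markers = markers, stringsAsFactors = FALSE)
print(dat)

# Labelled proteins (markers) vs proteins whose location is to be inferred
labelled   <- dat[dat$markers != "unknown", ]
unlabelled <- dat[dat$markers == "unknown", ]
```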
[Figure: correlation profiles across fractions (1, 2, 4, 5, 7, 8, 11, 12) for ER, Golgi, mit/plastid, PM and Vacuole, and a principal component analysis plot (PC1 vs PC2). Legend: ER, Golgi, mit/plastid, PM, vacuole, marker, PLS-DA, unknown.]
Figure: From Gatto et al. (2010), data from Dunkley et al. (2006).
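The principal component analysis view can be sketched with base R's `prcomp`. The two organelle templates, the noise level and the helper `make_profiles` are made up for illustration; real profiles come from the quantitation matrix.

```r
set.seed(42)

# Made-up template profiles over four fractions for two organelles
template <- rbind(ER    = c(0.40, 0.30, 0.20, 0.10),
                  Golgi = c(0.10, 0.20, 0.30, 0.40))

# Hypothetical helper: k noisy copies of a template, rescaled to sum to 1
make_profiles <- function(org, k) {
  x <- matrix(rep(template[org, ], each = k), nrow = k) +
    rnorm(k * ncol(template), sd = 0.01)
  x / rowSums(x)
}

profiles <- rbind(make_profiles("ER", 5), make_profiles("Golgi", 5))
rownames(profiles) <- paste0("p", 1:10)
organelle <- rep(c("ER", "Golgi"), each = 5)

# PCA on centred, unscaled profiles
pca <- prcomp(profiles, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
scores <- pca$x[, 1:2]

# Proteins sharing a profile shape end up close together on PC1/PC2
plot(scores, col = ifelse(organelle == "ER", "red", "blue"), pch = 19,
     xlab = sprintf("PC1 (%.2f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.2f%%)", 100 * var_explained[2]))
```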
2009 vs 2013
[Figure: two PCA plots, PC1 (58.53%) vs PC2 (29.96%). First panel classes: ER/Golgi, mitochondrion, PM, unknown. Second panel classes: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S.]
Figure: pRoloc package. Semi-supervised approach, Breckels et al. (2013). Data from Tan et al. (2009).
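As a point of comparison for how marker proteins can be used to label unknowns, here is a deliberately simple nearest-centroid sketch in base R. It is a plain supervised baseline with invented profiles, not the semi-supervised method of Breckels et al. (2013) and not the pRoloc API.

```r
# Made-up marker profiles (rows) over four fractions, with known classes
markers_mat <- rbind(c(0.40, 0.30, 0.20, 0.10),
                     c(0.38, 0.32, 0.19, 0.11),
                     c(0.10, 0.20, 0.30, 0.40),
                     c(0.12, 0.18, 0.31, 0.39))
marker_class <- c("ER", "ER", "Golgi", "Golgi")

# Made-up profiles of proteins with unknown location
unknown <- rbind(u1 = c(0.39, 0.31, 0.20, 0.10),
                 u2 = c(0.11, 0.19, 0.30, 0.40))

# One centroid (mean profile) per class; rows are named by class
centroids <- t(sapply(split(as.data.frame(markers_mat), marker_class),
                      colMeans))

# Assign a profile to the class whose centroid is closest (Euclidean distance)
classify <- function(x) {
  d <- apply(centroids, 1, function(ctr) sqrt(sum((x - ctr)^2)))
  names(which.min(d))
}

pred <- apply(unknown, 1, classify)
print(pred)
```

A baseline like this can only assign the classes present among the markers, which is the limitation that motivates discovering new classes from unlabelled data.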
[Figure: PCA plot, PC1 (50.05%) vs PC2 (24.61%).
 Classes: Actin cytoskeleton, Cytosol, Endosome, ER/GA, Extracellular matrix, Lysosome, Mitochondria, Nucleus - Chromatin, Nucleus - Nucleolus, Peroxisome, Plasma Membrane, Proteasome, Ribosome 40S, Ribosome 60S, unknown.
 Numbered protein complexes: 1 Dynein, 2 Vesicles - Clathrin, 3 13S condensin, 4 T complex, 5 Nucleus lamina, 6 Vesicles - COPI/II, 7 eIF3 complex, 8 ARP2/3 complex, 9 COP9 signalosome.]
Dynamic
Figure: pRolocGUI package.
Dual localisation
[Figure: intensity profiles across fractions (1, 2, 4, 5, 7, 8, 11) for AT5G61790 (ER membrane), AT4G32400 (Plastid) and AT2G45470 (PM), and a PCA plot, PC1 (64.36%) vs PC2 (22.34%). Classes: ER lumen, ER membrane, Golgi, Mitochondrion, Plastid, PM, Ribosome, TGN, unknown, vacuole.]
Figure: Proteins may be present simultaneously in several organelles (dual localisation, trafficking) vs. no man's land. (Gatto et al. 2014)
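One way to picture dual localisation is to treat an observed profile as a mixture of two organelle profiles. The sketch below assumes a simple linear mixing model and estimates the mixing proportion by least squares; the model, the profiles and the helper `mixing_proportion` are illustrative assumptions, not the analysis in Gatto et al. (2014).

```r
# Made-up organelle profiles over four fractions
er <- c(0.40, 0.30, 0.20, 0.10)
pm <- c(0.10, 0.20, 0.30, 0.40)

# Hypothetical helper: least-squares estimate of alpha in
#   x ~ alpha * a + (1 - alpha) * b
# clamped to [0, 1]
mixing_proportion <- function(x, a, b) {
  alpha <- sum((x - b) * (a - b)) / sum((a - b)^2)
  min(max(alpha, 0), 1)
}

# A protein spending 30% of its abundance in the ER and 70% at the PM
x <- 0.3 * er + 0.7 * pm

alpha <- mixing_proportion(x, er, pm)
print(alpha)
```

On a PCA plot such a protein falls between the two organelle clusters, which is also where poorly resolved proteins sit, so the estimate alone cannot separate the two cases.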
Acknowledgement
CPU: Lisa Breckels, Sebastien Gibb, Thomas Naake
Kathryn Lilley (CCP)
Computational Proteomics Unit
Cambridge Centre for Proteomics
Cambridge Systems Biology Centre
Department of Biochemistry, University of Cambridge
http://cpu.sysbiol.cam.ac.uk
@lgatt0
Software Sustainability Institute, http://software.ac.uk