Introduction to MS-based proteomics and Bioconductor infrastructure Laurent Gatto lg390@cam.ac.uk @lgatt0 http://cpu.sysbiol.cam.ac.uk CSAMA 17 June 2015
Outline Proteomics and MS data Bioconductor infrastructure Examples Ranges infrastructure Application: spatial proteomics
Mass-spectrometry LC-MS/MS Separation Source Analyser Precursor ions MS1 Detector Fragmented ions MSMS - MS2 Precursor ion dissociation
Chromatogram: total intensity over time Total ion current 0 20 40 60 80 100 TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01 20141210.mzML TIC: 7.22e+09 0 500 1000 1500 2000 2500 3000 3500 Time (sec)
MS1 (and MS2) spectra MS1 scan @ 21:3 min MS2 scan, precursor m/z 460.79 intensity 0.0e+00 5.0e+07 1.0e+08 1.5e+08 2.0e+08 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 200 400 600 800 MS2 scan, precursor m/z 488.8 400 500 600 700 800 m/z 200 400 600 800 1000 m/z
Mass-spectrometry LC-MS/MS Separation Source Analyser Precursor ions MS1 Detector Fragmented ions MSMS - MS2 Precursor ion dissociation
Fragmentation Credit abrg.org
cid <- calculatefragments("aegklrfk", type=c("b", "y"), z=2) ## Modifications used: C=160.030649 ht(cid, n = 3) ## mz ion type pos z seq ## 1 36.52583 b1 b 1 2 A ## 2 101.04713 b2 b 2 2 AE ## 3 129.55786 b3 b 3 2 AEG ##... ## mz ion type pos z seq ## 31 357.7185 y6* y* 6 2 GKLRFK ## 32 422.2398 y7* y* 7 2 EGKLRFK ## 33 457.7583 y8* y* 8 2 AEGKLRFK
MS1 and MS2 spectra MS1 scan @ 21:3 min MS2 scan, precursor m/z 460.79 intensity 0.0e+00 5.0e+07 1.0e+08 1.5e+08 2.0e+08 0e+00 2e+06 4e+06 6e+06 8e+06 1e+07 0e+00 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 200 400 600 800 MS2 scan, precursor m/z 488.8 400 500 600 700 800 m/z 200 400 600 800 1000 m/z
MS1 and MS2 spectra 2.0e+08 2.0e+08 1.5e+08 1.5e+08 1.0e+08 1.0e+08 5.0e+07 5.0e+07 0.0e+00 0.0e+00 21.35 21.30 21.25 21.20 800 1000 1200 21.11 21.10 21.09 21.08 600 800 1000 1200 21.15 retention time 21.10 400 600 m/z 21.07 retention time 21.06 200 400 m/z
Proteomics data raw data: MS1 and MS2 over retention time identication: MS2 quantitation: MS1 or MS2 protein database (to match MS2 spectra against) Status package Raw (mz*ml) mzr mztab MSnbase mgf MSnbase mzidentml mzid, mzr mzquantml (?mzr)
Bioconductor infrastructure biocviews: Proteomics, MassSpectrometry Number of Bioconductor packages 0 10 20 30 40 50 Proteomics MassSpectrometry MassSpectrometryData Package downloads 0 10000 20000 30000 40000 50000 Proteomics MassSpectrometry 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 Bioconductor Versions Bioconductor Versions
Learning from Bioconductor genomics proteomics ------------------+------------------------ eset (past?) *MSnSet (present) Ranges (present) *Pbase et al. (future) PPI *localisation (present)
MSnSet
Example library("msnbase") rx <- readmsdata("rawdata.mzml") ## raw data rx <- addidentificationdata(rx, "identification.mzid") rx <- rx[!is.na(fdata(rx)$pepseq)] plot(rx[[10]], reporters = TMT6, full=true)
Example Precursor M/Z 600.36 8e+05 8e+05 6e+05 4e+05 6e+05 2e+05 0e+00 126.080 126.591 127.102 127.613 128.124 128.635 129.146 129.657 130.168 130.679 131.190 Intensity 4e+05 2e+05 0e+00 300 600 900 1200 M/Z
Example library("msnbase") rx <- readmsdata(f, centroided = TRUE) rx <- addidentificationdata(rx, g) rx <- rx[!is.na(fdata(rx)$pepseq)] plot(rx[[10]], reporters = TMT6, full=true) plot(rx[[4730]], rx[[4929]])
Example intensity 1.0 0.5 0.0 0.5 1.0 prec scan: 6975, prec mass: 545.017, prec z: 3, # common: 34 prec scan: 7198, prec mass: 545.017, prec z: 3, # common: 35 0 500 1000 1500 m/z
Example library("msnbase") rx <- readmsdata(f, centroided = TRUE) rx <- addidentificationdata(rx, g) rx <- rx[!is.na(fdata(rx)$pepseq)] plot(rx[[10]], reporters = TMT6, full=true) plot(rx[[4730]], rx[[4929]]) qt <- quantify(rx, reporters = TMT6) ## qt <- readmsnset("quantdata.csv", ecols = 5:11) nqt <- normalise(qt, method = "vsn") boxplot(exprs(nqt)) MAplot(nqt[, 1:2])
Example 10 15 20 25 TMT6.126 TMT6.128 TMT6.130 M 0.15 0.10 0.05 0.00 0.05 0.10 0.15 Median: 0.000187 IQR: 0.013 3.0 3.5 4.0 4.5 A
More RforProteomics package library("rforproteomics") RforProteomics() RProtVis() citation(package = "RforProteomics") Proteomics workow on the Bioc site Lab on Friday
protein database raw data quantitation identication
Ranges infrastructure g s g e G e s i e e i e s i e e i e s i e e i e s i e e i T1 i = 1 i = 2 i = 3 i = 4 T2 i = 1 i = 3 i = 4 T3 i = 1 i = 4 p s j p e j p s j p e j p s j p e j P j = 1 j = 2 j = 3 P 1 L P j = 1 j = 2 j = 3 π s k π e k π s k π e k π s k π e k π s k π e k Π k = 1 k = 2 k = 3 k = 4
Pbase package library("pbase") p <- Proteins("uniprot.fasta") p <- addidentificationdata(p, "identification.mzid") aa(p) ## peptides sequences as a AAStringSet pranges(p) ## peptide ranges as IRangesList i <- which(acols(p)[, "EntryName"] == "EF2_HUMAN") plot(p[i]) plot(p[i], from = 155, to = 185)
Along protein coordinates NH 100 300 500 700 COOH NH 230 250 COOH 200 400 600 800 240 P13639 W A F T L K Q F A E M Y V A K F A A K G E G Q L G P A E R A K K V E peptides peptides
Along genome coordinates... using transcript models as GRangesList and Gviz for plotting. Chromosome 1 156.11 mb 156.12 mb 156.13 mb 156.115 mb 156.125 mb 156.135 mb RATRSGAQASSTPLSPTR DTSRRLLAEKEREMAEMR RATRSGAQASSTPLSPTR ELEKTYSAKLDNAR METPSQRRATR RQNGDDPLLTYRFPPK SYLLGNSSPRTQSPQNCSIM peptides LALDMEIHAYRK LALDMEIHAYR SVGGSGGGSFGDNLVTR SVGGSGGGSFGDNLVTR SVGGSGGGSFGDNLVTR From the Pbase mapping vignette.
Along genome coordinates (with raw data) Chromosome X 78.11 mb 78.12 mb 5' 3' 3' 5' 78.115 mb 78.125 mb ENST00000373316 MS2 spectra From the Pbase mapping vignette.
80 60 40 20 0 With RNA-Seq reads Chromosome 17 30.12 mb 30.14 mb 30.16 mb 30.18 mb 30.13 mb 30.15 mb 30.17 mb AlignmentsTrack ENST00000612959 From https://github.com/computationalproteomicsunit/intro-integ-omics-prot
Spatial proteomics The cellular sub-division allows cells to establish a range of distinct microenvironments, each favouring dierent biochemical reactions and interactions and, therefore, allowing each compartment to full a particular functional role. Localisation and sequestration of proteins within subcellular niches is a fundamental mechanism for the post-translational regulation of protein function. Spatial proteomics is the systematic study of protein localisations.
fraction Cell lysate centrifugation Pure fraction catalogue Subtractive proteomics (enrichment) Figure : Immuno uorescence: ZFPL1, Golgi (left) and FHL2, mainly localized to actin laments and focal adhesion sites. Also detected in the nucleus (right). (from the Human Protein Atlas) label-free MS/MS Invariant rich fraction (clustering) PCP (χ2) itraq MS/MS LOPIT (PCA, PLS-DA) Figure : Mass spectrometry-based approaches based on density gradient subcellular fractionation.
Cell membrane lysis Mechanical or buer-induced lysis of the plasma membrane with minimal disruption to intracellular organelles followed by subcellular fractionation.
Density gradient separation
Quantitation by LC-MSMS
Data Fraction 1 Fraction 2... Fraction m markers p 1 q 1,1 q 1,2... q 1, m unknown p 2 q 2,1 q 2,2... q 2, m loc1 p 3 q 3,1 q 3,2... q 3, m unknown p 4 q 4,1 q 4,2... q 4, m loc k...... p n q n,1 q n,2... q n, m unknown Data analysis MSnbase for data manipulation, proloc for clustering, classication and plotting, and prolocgui for interactive exploration.
Correlation profile ER Correlation profile Golgi Correlation profile mit/plastid 0.2 0.3 0.4 0.5 1 2 4 5 7 8 Fractions 11 12 0.1 0.2 0.3 0.4 1 2 4 5 7 8 Fractions 11 12 0.0 0.1 0.2 0.3 0.4 0.5 0.6 1 2 4 5 7 8 Fractions 11 12 Principal component analysis Correlation profile PM PC2 5 0 5 10 5 0 5 PC1 ER Golgi mit/plastid PM vacuole marker PLS DA unknown 0.15 0.20 0.25 0.30 0.35 0.1 0.2 0.3 0.4 0.5 0.6 11 1 2 4 5 7 8 12 Fractions Correlation profile Vacuole 11 1 2 4 5 7 8 12 Fractions Figure : From Gatto et al. (2010), data from Dunkley et al. (2006).
2009 vs 2013 3 2 1 0 1 2 3 3 2 1 0 1 2 3 PC1 (58.53%) PC2 (29.96%) ER/Golgi mitochondrion PM unknown 3 2 1 0 1 2 3 3 2 1 0 1 2 3 PC1 (58.53%) PC2 (29.96%) Cytoskeleton ER Golgi Lysosome mitochondrion Nucleus Peroxisome PM Proteasome Ribosome 40S Ribosome 60S Figure : Semi-supervised approach Breckels et al. (2013). Data from Tan et al (2009).
6 4 2 0 2 4 4 2 0 2 4 PC1 (50.05%) PC2 (24.61%) Actin cytoskeleton Cytosol Endosome ER/GA Extracellular matrix Lysosome Mitochondria Nucleus Chromatin Nucleus Nucleolus Peroxisome Plasma Membrane Proteasome Ribosome 40S Ribosome 60S unknown 1 11 1 1 1 2 2 222 3 3 3 3 4 4 444 4 44 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 99 9 9 9 9 9 1 Dynein 2 Vesicles Clathrin 3 13S condensin 4 T complex 5 Nucleus lamina 6 Vesicles COPI/II 7 eif3 complex 8 ARP2/3 complex 9 COP9 signalosome From Betschinger et al. (2013) 6 4 2 0 2 4 4 2 0 2 4 Mouse ESC (E14TG2a) in serum LIF PC1 (50.05%) PC2 (24.61%) Actin cytoskeleton Cytosol Endosome ER/GA Extracellular matrix Lysosome Mitochondria Nucleus Chromatin Nucleus Nucleolus Peroxisome Plasma Membrane Proteasome Ribosome 40S Ribosome 60S unknown Tfe3
Acknowledgement Lisa Breckels Sebastien Gibb Kathryn Lilley (CCP) ## R version 3.2.0 Patched (2015-04-22 r68234) ## Platform: x86_64-unknown-linux-gnu (64-bit) ## Running under: Ubuntu 14.04.2 LTS ## ## attached base packages: ## [1] stats graphics grdevices utils datasets methods base ## ## loaded via a namespace (and not attached): ## [1] Biobase_2.29.1 vsn_3.37.1 ## [3] splines_3.2.0 foreach_1.4.2 ## [5] Formula_1.2-1 affy_1.47.1 ## [7] Pbase_0.9.0 highr_0.5 ## [9] stats4_3.2.0 latticeextra_0.6-26 ## [11] BSgenome_1.37.1 Rsamtools_1.21.8 ## [13] impute_1.43.0 RSQLite_1.0.0 ## [15] lattice_0.20-31 biovizbase_1.17.1 ## [17] limma_3.25.9 chron_2.3-45 ## [19] digest_0.6.8 GenomicRanges_1.21.15 ## [21] RColorBrewer_1.1-2 XVector_0.9.1 ## [23] colorspace_1.2-6 preprocesscore_1.31.0 ## [25] plyr_1.8.2 MALDIquant_1.12 ## [27] XML_3.98-1.2 biomart_2.25.1 ## [29] zlibbioc_1.15.0 scales_0.2.4 ## [31] affyio_1.37.0 cleaver_1.7.0 ## [33] BiocParallel_1.3.25 IRanges_2.3.11 ## [35] ggplot2_1.0.1 SummarizedExperiment_0.1.5 ## [37] GenomicFeatures_1.21.13 nnet_7.3-9 ## [39] Gviz_1.13.2 BiocGenerics_0.15.2 ## [41] proto_0.3-10 survival_2.38-1 ## [43] magrittr_1.5 evaluate_0.7 ## [45] doparallel_1.0.8 MASS_7.3-40 ## [47] foreign_0.8-63 mzr_2.3.1