Bioinformatics

Following data acquisition in mass spectrometry-based proteomics experiments, rigorous computational analysis is usually required before the results can be interpreted biologically.

The UVic Proteomics Centre is dedicated to maintaining a diverse array of open source and commercial software solutions. We ensure that our locally installed systems are regularly maintained and kept up to date, and we continually assess the merits of adding new software products as they become available. As a Genome Canada-funded platform facility, we make our repository of software tools accessible to all Genome BC-funded large-scale projects. As a founding member of the BC Proteomics Network (BCPN), we also make the majority of our tools accessible to all BCPN members.

To anticipate and rapidly adapt to changes in the field of bioinformatics, we have formed strategic partnerships with expert labs located at the University of British Columbia, University of Alberta, and University of Calgary. We particularly value our close working relationship with Dr. Ron Beavis, a member of our Scientific Advisory Board and the creator of The Global Proteome Machine.

MS/MS analysis & interpretation
The UVic – GBC Proteomics Centre offers a range of bioinformatic analyses that cover all aspects of interpreting MS datasets. Bioinformatics services include expert interpretation and assignment of peptide and protein identifications to MS/MS data. MS/MS assignment services routinely make use of multiple commercial and open source MS/MS analysis algorithms (Mascot, ProteinPilot, X!Tandem). These algorithms query both publicly curated sequence databases and custom sequence databases generated for our clients. Additionally, we use PEAKS 2.0 for de novo sequence analysis of MS/MS spectra to identify novel proteins in biological systems with poorly annotated or incomplete genomes.

Quantitative MS analysis
Quantitative analysis of MS/MS datasets is another service we routinely provide. We are experts in the use and analysis of iTRAQ labeling chemistry for large-scale quantitative proteomics experiments, and we perform quantitative analysis of iTRAQ data using ProteinPilot and Mascot v2.2. Analysis of quantitative data from MRM-based experiments has increased dramatically, and we can now interrogate MRM data from both Applied Biosystems and Agilent triple quadrupole mass spectrometers. We have also recently added Mascot Distiller and Scaffold to our repertoire of software for quantitative analysis. Together they can analyze data from any quantitative proteomics workflow and statistically validate the quantitative data, so that only high-integrity results are reported back to our clients for interpretation.

Statistical validation & higher order analyses
Higher order analyses (pathway mapping, gene ontology) and statistical validation (false positive estimation) of large MS-based datasets of identified proteins are becoming increasingly necessary to aid the biological interpretation of results. We routinely retain the custom programming services of Dr. Ryan Danell to develop novel software tools that address specific tasks. Our proven working relationship with Dr. Danell has resulted in custom software tools to visualize MALDI imaging data, perform metabolite ID from FT-MS datasets, and perform results-based recalibration of FT-MS data (1). We intend to retain the services of Dr. Danell to develop customized installations of open source data analysis pipelines (e.g. Cytoscape). This customized suite of tools will permit pathway mapping and visualization of large MS datasets with gene ontology annotation (e.g. function, subcellular localization) in a graphical web-based environment that can be offered to clients and collaborators as a value-added bioinformatic analysis.


Tools for MS/MS analysis & quantitation

The Proteomics Centre is committed to meeting the needs of its collaborators and of proteomics researchers in BC, and we continually adjust and expand our capabilities to meet upcoming demands and developments. Our suite of bioinformatic tools ranges from commercial packages designed for identifying peptides from MS/MS datasets to specialized tools for integrating MRM data to quantify peptides and proteins.

The Centre hosts a wide variety of data analysis algorithms for the purpose of identifying and quantifying peptides & proteins from tandem MS data. These tools currently include:

Application             Uses                          Developer
Mascot 2.2              Protein ID, quantitation      Matrix Science
X!Tandem                Protein ID                    www.thegpm.org
PEAKS 4.5               Protein ID, de novo analysis  Bioinformatics Solutions, Inc.
ProteinPilot 3.0        Protein ID, quantitation      Applied Biosystems
Mascot Distiller 2.2.1  Protein ID, quantitation      Matrix Science
MultiQuant 2.0          MRM quantitation              Applied Biosystems
MassHunter              MRM quantitation              Agilent

Mascot, X!Tandem and PEAKS are available to all members of the BCPN free of charge, whether they are clients of the Centre or not. It is the goal of the Centre to provide a centralized repository of proteomic data analysis tools for the benefit of the entire BCPN community.

Tools for statistical validation of MS data

To enable statistical validation of peptide and protein identifications, the Centre hosts another suite of software tools that includes:

Application              Developer
Scaffold 2.0             Proteome Software, Inc.
Mascot Distiller 2.2.1   Matrix Science
Trans-Proteomic Pipeline Institute for Systems Biology

These applications are capable of performing statistical validation of peptide & protein assignments from Mascot and X!Tandem search results.
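As a rough illustration of what false positive estimation involves, the sketch below shows one widely used idea, target-decoy false discovery rate (FDR) estimation. It is not the method used by any of the applications listed here, each of which implements its own statistical model. It assumes that every peptide-spectrum match carries a search score (higher is better) and a flag saying whether it matched a decoy sequence; the function name `target_decoy_cutoff` and the input layout are hypothetical.

```python
from typing import List, Optional, Tuple


def target_decoy_cutoff(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[Tuple[float, int]]:
    """Find the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms is a list of (score, is_decoy) pairs (hypothetical layout).
    FDR at a cutoff is estimated as decoys / targets among matches scoring
    at or above it. Returns (cutoff_score, accepted_target_count), or None
    if no cutoff satisfies the limit. Tied scores get no special handling.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest point in the ranking that still meets the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy data: (score, is_decoy). Values are invented for illustration only.
example = [
    (90.0, False), (85.0, False), (80.0, False), (75.0, True),
    (70.0, False), (60.0, False), (55.0, True), (50.0, False),
]
print(target_decoy_cutoff(example, max_fdr=0.25))  # → (60.0, 5)
```

With the toy data, accepting every match scoring 60 or higher keeps five target matches at an estimated FDR of 1/5, while going any lower in the ranking would exceed the 25% limit.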

Bioinformatics hardware

Hardware requirements have grown with the implementation of these numerous software tools, many of which require independent servers for optimal performance. To ensure fast access to our network, the Centre’s internet connection is a 1 gigabit/sec fibre optic BC Net connection to Internet 2.

The Centre currently hosts 10 servers and a 16-node cluster for database searching. To ensure data integrity, data are stored on 3 RAID storage arrays with a combined capacity of 20 terabytes. Additionally, automated robotic tape backups are performed and stored off site. All hardware is housed in a climate-controlled room.

  1. Petrotchenko, E.V. and Borchers, C.H. ICC-CLASS: Isotopically-Coded Cleavable Cross-Linking Analysis Software Suite. BMC Bioinformatics 11:64. (2010)
  2. Danell, R., Ouvry-Patat, S.A., Scarlett, C.O., Speir, J.P. and Borchers, C.H. DAta SElf-Recalibration and Mixture Mass Fingerprint Searching (DASER-MMF) to Enhance Proteomics Identifications from Complex Mixtures. J Am Soc Mass Spectrom 19:1914-25. (2008)