Machine Learning predictions of peptide behaviour for improved identification of modified peptides
The overall objective of this research project is to enable a much more sensitive yet reliable identification of (modified) peptides through DDA and DIA approaches. This will be achieved through four sub-goals: first, a novel predictor will be made for ion mobility behaviour (collisional cross-section) of (modified) peptides; second, a novel predictor will be made for the retention time of (modified) peptides; third, based on these two predictors alongside our existing MS2PIP predictor for fragmentation spectra, advanced theoretical spectral libraries will be built for DIA-based identification; fourth, we will add these two predictors to complement the core modules in our existing cloud-based ionbot tool (https://ionbot.cloud) for open modification searching in DDA data.
Machine Learning approaches will be used to predict analyte behaviour: gradient boosting and deep learning algorithms will be trained on the large amounts of publicly available data.
– Python programming (NumPy, pandas)
– Experience with a Python Machine Learning library (e.g. scikit-learn)
Nice to have:
– Experience in Deep Learning
– Programming in C, C++
We expect i) a predictor for ion mobility behaviour (collisional cross-section) of (modified) peptides, ii) a predictor for the retention time of (modified) peptides, iii) the availability and demonstrated use, for DIA approaches, of advanced theoretical spectral libraries based on the available predictions (newly built retention time prediction, existing MS2PIP fragmentation spectrum prediction, newly built ion mobility prediction), and iv) an improved ionbot system for open modification searches of DDA data that includes the additional predictions (retention time prediction, ion mobility prediction).
Host: FHOOE (V. Dorfer); Duration: 1 month; When: Year 1; Goal: Analysis of chimeric MS2 spectra.
Host: CRG (E. Sabido); Duration: 3 months; When: Year 2; Goal: Experience in DIA proteomics acquisition methods.
Host: EMBO (B. Pulverer); Duration: 2 weeks; When: Year 3; Goal: Scientific writing and editing.
Enrolment in doctoral programs
Ph.D. in Bioinformatics from Ghent University.
1. Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS2PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research, 47(W1), W295–W299. https://doi.org/10.1093/nar/gkz299
2. Bouwmeester, R., Gabriels, R., Hulstaert, N., Martens, L., & Degroeve, S. (2020). DeepLC can predict retention times for peptides that carry as-yet unseen modifications. bioRxiv. https://doi.org/10.1101/2020.03.28.013003
3. Bouwmeester, R., Gabriels, R., Van Den Bossche, T., Martens, L., & Degroeve, S. (2020). The Age of Data‐Driven Proteomics: How Machine Learning Enables Novel Workflows. PROTEOMICS, 20(21–22), 1900351. https://doi.org/10.1002/pmic.201900351