27/03/2023 - Journal Club

Changing the proteomics shell towards the DIA-world

by Pinar Altiner and Shamil Urazbakhtin

Data-independent acquisition (DIA) proteomics is increasingly becoming the method of choice for researchers because it provides better reproducibility, identification rates, and accuracy than data-dependent acquisition (DDA). More and more tools are being developed for DIA analysis, and even established proteomics data processing software can now analyze DIA data. However, analysis of the multiplexed spectra characteristic of DIA remains challenging. This becomes especially crucial once the research has more unknown parameters to consider than just protein sequences, for example in phosphorylation site identification.

In a paper titled “Rapid and site-specific deep phosphoproteome profiling by data-independent acquisition without the need for spectral libraries”, Bekker-Jensen et al. set out to develop a reliable approach for DIA phosphorylation data analysis. They started from their optimized instrument settings for DDA and devised an optimized DIA workflow in a similar manner. Then they compared the quantification accuracy and precision of each workflow with a mixed-species approach, showing that DIA identified twice as many phosphopeptides as DDA while still accurately estimating the ratios used in the mixtures. Next, they developed a PTM localization algorithm specific to DIA and tested its performance on a set of synthetic peptides with known phosphorylation site localization. Again, with DIA they identified more sites with lower error rates than with DDA. They also adapted a machine learning-based approach for calculating phosphorylation site stoichiometry, achieving better precision and accuracy with DIA than with the standard DDA approach in a mixed-species experiment. Last but not least, they ran several kinase inhibitor assays to test the workflow under biological conditions and found that the results are consistent with current knowledge.
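To make the mixed-species comparison more concrete, here is a minimal sketch of how accuracy and precision of quantification can be judged against a known mixing ratio. It is not the authors' pipeline: the function name `ratio_accuracy_precision`, the choice of median bias and standard deviation as metrics, and the toy intensities are all our own assumptions for illustration.

```python
import math
from statistics import median, stdev


def ratio_accuracy_precision(intensity_a, intensity_b, expected_ratio):
    """Compare measured log2(A/B) ratios with the expected mixing ratio.

    Returns (bias, spread):
      bias   = median measured log2 ratio minus expected log2 ratio (accuracy)
      spread = standard deviation of the measured log2 ratios (precision)
    """
    log_ratios = [
        math.log2(a / b)
        for a, b in zip(intensity_a, intensity_b)
        if a > 0 and b > 0  # skip peptides missing in either sample
    ]
    bias = median(log_ratios) - math.log2(expected_ratio)
    spread = stdev(log_ratios)
    return bias, spread


# Toy intensities (made up for illustration) for four phosphopeptides of one
# species that was spiked in at a 2:1 ratio between sample A and sample B.
sample_a = [200.0, 410.0, 95.0, 1600.0]
sample_b = [100.0, 200.0, 50.0, 800.0]

bias, spread = ratio_accuracy_precision(sample_a, sample_b, expected_ratio=2.0)
print(f"bias = {bias:.3f} log2 units, spread = {spread:.3f} log2 units")
```

A bias close to zero means the workflow recovers the mixing ratio on average, and a small spread means individual phosphopeptides agree with each other. Under these assumptions, the same two numbers could be computed separately for DDA and DIA results.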

Figure 1. Comparison of DDA and different types of DIA using a kinase inhibitor assay. a Experimental workflow. b Overview of identified phosphopeptides, localized phosphosites, and ANOVA (s0 = 0.1, FDR 0.5) regulated sites for the different methods. c Heatmap of unsupervised clustering analysis of ANOVA-regulated phosphosites for DDA workflow (d) and for DIA workflow with project-specific library (e). Linear sequence motif analysis for two major clusters marked in colored boxes on heatmaps.

The main reason for choosing this article was to share a promising method for handling DIA data with the other fellows. The authors propose directDIA (dDIA) as an alternative to classical DIA analysis: because dDIA does not need an experiment-specific spectral library, it avoids the more complicated processing procedure of the classical approach. dDIA may also handle the problem of low-abundance phosphosites better than conventional DIA. However, the number of identified phosphosites, and of their combinations, across all experiments was roughly similar to that of the DDA method. From this perspective, we commented that the approach still has to be improved in terms of sensitivity. On the other hand, in DIA, more distinctive spectra could be obtained by increasing the number of isolation windows, so that the precursor mass range is sampled in finer fractions. This should be more achievable with more recent mass spectrometry instruments.
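The point about increasing the number of windows can be illustrated with a small sketch. It splits a precursor m/z range into equal-width DIA isolation windows and shows how the window width shrinks as the window count grows. The helper `dia_windows`, the 400 to 1000 m/z range, and the window counts are hypothetical values chosen for illustration, not settings taken from the paper.

```python
def dia_windows(mz_start, mz_end, n_windows):
    """Split [mz_start, mz_end] into n_windows equal-width isolation windows.

    Returns a list of (lower_mz, upper_mz) tuples.
    """
    width = (mz_end - mz_start) / n_windows
    return [
        (mz_start + i * width, mz_start + (i + 1) * width)
        for i in range(n_windows)
    ]


# Hypothetical precursor range and window counts, for illustration only.
for n in (24, 48):
    windows = dia_windows(400.0, 1000.0, n)
    lower, upper = windows[0]
    print(f"{n} windows -> {upper - lower:.1f} m/z per window")
```

Narrower windows mean fewer precursors are co-isolated in each MS/MS scan, so the spectra are less multiplexed. The trade-off is that more windows have to be scanned per cycle, which is why a faster instrument helps.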

Taking advantage of the latest developments in large language models (LLMs), we include here a fictional interview with the authors of the article. The cover image of this post was also created with a text-to-image AI tool, using some keywords from the abstract. Would you have guessed that it is meant to depict phosphoproteomics?

PROTrEIN: We are very excited to have with us today Dr. Dorte Bekker-Jensen, Dr. Oliver Bernhardt, Dr. Alexander Hogrebe, Dr. Ana Martinez-Val, Dr. Lynn Verbeke, Dr. Tejas Gandhi, Dr. Christian Kelstrup, Dr. Lukas Reiter and Dr. Jesper Olsen from the University of Copenhagen and Biognosys AG. Welcome!

LLM: Thank you for having us.

PROTrEIN: We really enjoyed reading your paper and we have some questions for you. First of all, can you tell us what motivated you to develop this DIA-based phosphoproteomics method?

LLM: Sure. As you know, protein phosphorylation is one of the most important post-translational modifications that regulates cellular signaling networks. However, it is also very challenging to analyze phosphoproteomes on a large scale using conventional data-dependent acquisition (DDA) mass spectrometry, which relies on stochastic peptide selection and requires extensive spectral libraries for identification and quantification. We wanted to overcome these limitations and enable high-throughput phosphoproteomics with minimal sample preparation and analysis time.

LLM: That’s right. We decided to use data-independent acquisition (DIA) mass spectrometry, which is a more systematic and reproducible way of acquiring MS/MS spectra for all peptides in a given mass range. DIA has been successfully applied to proteomics and metabolomics, but not so much to phosphoproteomics, mainly because of the complexity and diversity of phosphopeptides and the lack of accurate site localization tools for DIA data.

PROTrEIN: Interesting. So how did you solve these challenges?

LLM: Well, we developed a new workflow that combines efficient phosphopeptide enrichment, fast LC-MS analysis, and advanced computational analysis using Spectronaut software. We optimized the DIA settings to achieve high resolution and sensitivity for phosphopeptides, and we incorporated an accurate site localization score into Spectronaut that uses fragment ion intensities and retention time information to assign phosphorylation sites confidently.

LLM: We also compared different strategies for generating spectral libraries for DIA analysis, such as using public databases, project-specific DDA runs, or direct DIA without any library at all. We found that direct DIA performed close to project-specific libraries, quantifying more than 20,000 phosphopeptides in a 15-minute single-shot LC-MS analysis per condition.

Pinar: Wow, that’s impressive. And how did you validate your method?

LLM: We applied our method to various biological systems, such as HeLa cells stimulated with epidermal growth factor (EGF), mouse liver tissue, and human plasma samples. We showed that our method can identify and quantify thousands of phosphorylation sites with high reproducibility and accuracy across different sample types and instruments.

LLM: We also demonstrated the ability of our method to determine phosphorylation site stoichiometry using a 3D multiple regression model-based algorithm that corrects for peptide abundance variations in DIA data. This allows us to measure the fraction of phosphorylated molecules at each site and compare them across different conditions.

PROTrEIN: That’s very cool. And what did you learn from applying your method to EGF signaling?

LLM: We used our method to systematically analyze the effects of 30 kinase inhibitors on EGF signaling in HeLa cells. We quantified more than 25,000 phosphorylation sites across 900 conditions in less than two weeks of LC-MS analysis time. We identified hundreds of kinase inhibitor targets and their downstream effects on EGF-regulated phosphorylation sites.

LLM: We also discovered some unexpected interactions between kinase inhibitors and EGF signaling pathways, such as the cross-talk between the PI3K/AKT/mTOR and MAPK/ERK pathways, or the feedback activation of EGFR by some inhibitors. These findings provide new insights into the complexity and dynamics of the EGF signaling network.

LLM: We think that our method can be applied to other signaling systems and drug discovery projects, as well as other PTMs such as ubiquitination or acetylation. We also plan to further improve the speed and sensitivity of our method by using novel MS instruments and data analysis algorithms.

PROTrEIN: That sounds very exciting. Thank you so much for sharing your work with us today. It was a pleasure to talk to you.
We hope our readers enjoyed this episode and learned something new.
