Oral Presentation 26th Annual Lorne Proteomics Symposium 2021

SRL generation using neural network analysis in lung carcinoma SWATH-MS acquisition (#48)

Daniel Bucio Noble 1, Erin Humphries 1, Jennifer Koh 1, Steve Williams 1, Daniela Smith 1, Erin Sykes 1, Clare Loudon 1, Dylan Xavier 1, Natasha Lucas 1, Peter Hains 1, Phil Robinson 1
  1. Children's Medical Research Institute, Westmead, NSW, Australia

Data-independent acquisition (DIA) strategies have been widely adopted in proteomics studies involving clinical samples from large cohorts. Given its robustness and reproducibility, SWATH-MS data can be routinely analysed using spectral reference libraries (SRLs) derived from information-dependent acquisition (IDA). The emergence of novel tools for SRL generation that employ neural network analysis, such as DIA-NN, may offer alternatives to conventional methods. Despite the importance of SRLs in peptide identification and quantification, little information is available comparing SRLs derived from IDA with those generated by DIA-NN in clinical settings. Here we present such a comparison using formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) adenocarcinoma and squamous stage 1 lung cancer tissue. Our data show that a neural network-trained library built on high-pH fractions acquired in SWATH mode increases the number of identified proteins by 8% relative to a conventional high-pH-fractionated SRL acquired in IDA mode. In addition, our DIA-NN SWATH library increases the number of quantified proteins by around 22% compared to libraries built with other methods. DIA-NN also allows us to build subset-specific SRLs using SWATH files exclusively. For instance, FF adenocarcinoma and squamous libraries differentiate malignant tissue from its matched normal tissue, as shown by principal component analysis. We also determined that two libraries effectively segregate malignant from normal tissue when each is applied to the same adenocarcinoma FFPE cohort: an adenocarcinoma SRL made from FFPE and FF samples, containing 4779 proteins, and an SRL composed of a mix of adenocarcinoma, squamous, lung neuroendocrine carcinoma (LNEC) and large-cell lung carcinoma (LCC) FFPE samples, containing 5447 proteins.
The use of SWATH data for library generation is relevant in studies where protein yield is too limited for IDA acquisition, as is often the case for clinical samples. These results also suggest that DIA-NN is suitable for producing comprehensive and complementary SWATH-derived, subset-specific libraries regardless of the sample preservation method. Our results show that DIA-NN is a valuable tool for SRL production in clinical settings.