Poster Presentation 26th Annual Lorne Proteomics Symposium 2021

Proteomics at scale: Experimental design and automated data management for large clinical cohorts (#83)

Ahmed Mohamed 1,2, Julian Kelabora 1, Laura Dagley 1, Melissa Davis 2, Andrew Webb 1
  1. Colonial Foundation Healthy Ageing Centre, WEHI, Parkville, Vic, Australia
  2. Bioinformatics division, WEHI, Parkville, Vic, Australia

Scaling proteomic profiling to large clinical cohorts presents challenges in mitigating experimental variability and streamlining data management. Unanticipated confounding variables, such as batch effects or instrument fluctuations over time, can directly hamper the discovery of robust biomarkers in such studies. Additionally, traditional software tools for proteomics data analysis are not yet tailored to processing thousands of samples. Here, we present a scalable experimental design that allows minimization and estimation of confounding variables through a combination of batch design, global QC samples and spiked-in internal standards. We also established a cloud-based, fully automated data processing and archiving workflow that scales to petabytes. The workflow is linked to a user-friendly interactive interface that allows rapid identification of instrument- and sample-related issues. The workflow will be utilised to acquire and process proteomics data for the 22,000 samples of the ASPREE cohort.
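
As an illustration of the batch-design idea only, the sketch below shows one common way to balance clinical groups across batches and bracket each batch with a global QC injection. It is not the authors' design: the function name assign_batches, the GLOBAL_QC label, the two-QC-per-batch layout and the samples_per_batch parameter are all assumptions made for this example, and spiked-in internal standards are not modelled.

```python
import random
from collections import defaultdict

def assign_batches(samples, samples_per_batch, seed=0):
    """Assign (sample_id, group) pairs to balanced, randomised batches.

    Hypothetical sketch: each returned batch is a run list that starts
    and ends with a 'GLOBAL_QC' injection (an assumed layout).
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible

    # Collect sample IDs per clinical group and shuffle within each group.
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    pools = [by_group[g] for g in sorted(by_group)]
    for pool in pools:
        rng.shuffle(pool)

    # Interleave the groups so consecutive slices contain a similar mix.
    ordered = []
    while any(pools):
        for pool in pools:
            if pool:
                ordered.append(pool.pop())

    # Cut into batches, randomise run order, and bracket with global QC.
    batches = []
    for start in range(0, len(ordered), samples_per_batch):
        batch = ordered[start:start + samples_per_batch]
        rng.shuffle(batch)
        batches.append(["GLOBAL_QC"] + batch + ["GLOBAL_QC"])
    return batches
```

Because the same QC material appears in every batch under this layout, differences between QC runs can be used to estimate batch and instrument drift separately from biological variation.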