Background: Human leukocyte antigen (HLA) molecules are cell-surface glycoproteins that present peptide antigens for surveillance by T lymphocytes seeking signs of disease. Mass spectrometric analysis allows us to identify large numbers of these peptides (the immunopeptidome) following affinity purification of HLA-peptide complexes from cell lysates. However, in recent years there has been a growing awareness of the ‘dark side’ of the immunopeptidome: unconventional peptide epitopes, including neoepitopes in cancer, which elude detection by conventional search methods because their sequences are not present in reference protein databases.
Methodologies: Here we establish a bioinformatic workflow to aid identification of peptides generated by non-canonical translation of mRNA or by genome variants. The workflow incorporates both standard transcriptomics software and novel computer programs to produce cell line-specific protein databases based on 3-frame translation of the transcriptome. The final protein database also includes sequences resulting from variants identified by variant calling on the same RNA-seq data. We then search our experimental data against both transcriptome-based and standard databases using PEAKS Studio. Finally, further novel software helps to compare the various result sets arising for each sample, pinpoint putative genomic origins for the identified unconventional sequences, and highlight potential neoepitopes.
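To illustrate the 3-frame translation step, the following is a minimal sketch and not the workflow's own software. It translates one transcript in its three forward reading frames using the standard genetic code and splits each translation at stop codons. The function name `translate_three_frames` and the `min_length` cut-off are hypothetical choices made for this example; the workflow's actual parameters are not stated here.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_three_frames(transcript, min_length=8):
    """Translate a transcript in its three forward frames.

    Returns (frame, fragment) tuples, where each fragment is a stretch of
    translation between stop codons. `min_length` is an assumed cut-off
    for this sketch, used to discard fragments too short to be useful.
    """
    seq = transcript.upper().replace("U", "T")
    fragments = []
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        # Codons with ambiguous bases (e.g. N) are translated as X.
        protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
        for piece in protein.split("*"):
            if len(piece) >= min_length:
                fragments.append((frame, piece))
    return fragments


if __name__ == "__main__":
    # Toy sequence for demonstration only, not real data.
    for frame, fragment in translate_three_frames("ATGGCCTAAGGG", min_length=2):
        print(frame, fragment)
```

Splitting at stop codons rather than requiring a start codon keeps fragments that could arise from non-canonical initiation, which is one plausible reason to translate this way; it is an assumption of the sketch rather than a description of the published programs.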
Results: We have trialled the workflow to study the immunopeptidome of the acute myeloid leukaemia cell line THP-1, using RNA-seq and mass spectrometric immunopeptidome data. We confidently identified over 14,000 peptides from 3 replicates of purified THP-1 HLA peptides using UniProt. Using the transcriptome-based database, we recapitulated >75% of these, and also identified over 927 unconventional peptides, including 14 sequences caused by non-synonymous variants.
Conclusions: Our workflow, which we term ‘immunopeptidogenomics’, can provide databases which include pertinent unconventional sequences, allowing neoepitope discovery in cancer studies, without becoming unsearchably large. Immunopeptidogenomics is a step towards the unbiased search approaches needed to illuminate the dark side of the immunopeptidome.