Postdoc in the Field of Mass Spectrometry
CERIT-SC (CERIT Scientific Cloud) is a national centre operating computing and data storage infrastructure for large-scale “in-silico” experiments, often in collaboration with other scientific disciplines. The centre is looking for a postdoctoral researcher to join the team working on advanced machine learning models in mass spectrometry.
What is the project about?
Mass spectrometry is a widespread experimental technique used to identify chemicals in samples (biological, environmental, etc.). As with most state-of-the-art techniques, it is tightly coupled with complex computational processing of the acquired data. In routine usage, it confirms the presence of a specific compound in the sample. Advanced usage identifies which of the millions of known compounds recorded in databases are contained in the sample. The most challenging usage attempts to identify compounds that have not been seen before and are not recorded in the databases.
In 2021, four independent works appeared that address this "de novo" identification challenge with state-of-the-art machine learning techniques (references below). They all share the same approach: an underlying neural network model is trained in a supervised manner on a huge set of mass spectra generated in-silico, which gives a much larger training set than direct use of existing spectral databases would allow. Afterwards, the models are refined with the available experimental data.
The neural network models are strongly inspired by natural language processing; they follow LSTM or transformer architectures.
From the application point of view, all the published techniques address the LC-[ESI]-MS2 variant of the experimental technique (liquid chromatography, electrospray ionization with two-stage fragmentation). It yields slightly different data from GC-[EI]-MS (gas chromatography, electron ionization with single-stage fragmentation), which is our dominant experimental technique. Therefore, the published computational processing is not directly applicable to our data.
What would be the short-term goals of this position?
In the first year, the candidate is expected to become familiar with the cited works in detail and reproduce the results of at least some of them. They will then modify the selected method so that it is applicable in our setup. Besides working software and its proper evaluation on representative testing data, the expected outcome is a journal publication submitted in good shape.
What is the research team like?
The candidate will work in a small interdisciplinary team of senior researchers, PhD students, and undergraduate students with both chemistry and computing backgrounds. Some preliminary work on this specific topic has already been done, and the team has built up some knowledge of it.
What do we offer?
- Exciting research topic, following very recent results, with high application potential
- Opportunity to achieve outstanding results quickly and to develop a scientific career
- Well-established interdisciplinary team with friendly relationships
What do we require?
- PhD in computing or natural science (chemistry, biology, physics, ...) with a clear focus on computational aspects
- Experience with baseline machine learning frameworks (TensorFlow, PyTorch); further experience with advanced frameworks (Hugging Face, ...) is welcome
- Experience with large-scale computing (parallel computation, GPU acceleration, use of computing clusters/supercomputers)
- Willingness to work in an interdisciplinary team, including building basic knowledge of the application area (mass spectrometry); prior knowledge of it is not required
- English: oral communication, reading, writing (approx. C1 level)
References
- Eleni Litsa et al., Spec2Mol: An end-to-end deep learning framework for translating MS/MS Spectra to de-novo molecules, https://chemrxiv.org/engage/chemrxiv/article-details/613e83a7656369203b2a249b
- Svetlana Kutuzova et al., Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry, https://www.biorxiv.org/content/10.1101/2021.08.03.454944v1.full
- A. D. Shrivastava et al., MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8699281/
- M. A. Stravs et al., MSNovelist: De novo structure generation from mass spectra, https://www.biorxiv.org/content/10.1101/2021.07.06.450875v1.full