Research Engineer

California, United States
29 Jun 2019
End of advertisement period
29 Aug 2019
Contract Type
Fixed Term
Full Time

The Kundaje lab develops statistical and machine learning methods for large-scale integrative analysis of functional genomic data to decode regulatory elements and pathways across diverse cell types and tissues and understand their role in cellular function and disease.

We have led the analysis efforts of the Encyclopedia of DNA Elements (ENCODE) and The Roadmap Epigenomics Projects with the development of novel methods for

  • Denoising and normalization of large-scale functional genomic data.
  • Dissecting combinatorial transcription factor co-occupancy within and across cell-types.
  • Predicting cell-type specific enhancers from chromatin state profiles.
  • Modeling 3D genome architecture and predicting cell-type specific enhancer-promoter interactions.
  • Learning transcriptional regulatory networks that integrate proximal and distal cis and trans signals.
  • Improving the detection and interpretation of potentially causal disease-associated variants from genome-wide association studies.

More recently, we have also been developing

  • Interpretable deep learning frameworks for functional genomics and epigenomics.
  • Causal regulatory models by integrating functional genomic data from temporal (e.g. differentiation/reprogramming) and perturbation (e.g. drug response, knockdown, genome-editing) experiments.
  • Early cancer detection and tissue-of-origin deconvolution from liquid biopsy (e.g. cell-free DNA) assays.
  • Methods to understand the relationships between genetic variation, regulatory chromatin variation and expression variation in healthy and diseased individuals.

We are a highly interdisciplinary group, and lab members originate from diverse backgrounds including computer science, pure and applied mathematics, genetics, computational biology, biomedical informatics, and even biophysics. We collaborate extensively with other labs at Stanford and beyond to decipher genome function by integrating cutting-edge computational and experimental approaches.

We are seeking a Machine Learning Engineer to join our core team.  


  • Expand prototypes of machine learning workflows developed by lab members into robust, stable pipelines that scale efficiently to large datasets.
  • Deploy lab workflows/pipelines in a variety of Cloud compute environments, including Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure, DNA Nexus, and others.
  • The machine learning literature is growing rapidly, and it is a challenge to keep up with all the advances in the field. The ML Engineer would help benchmark and test new methods and software applied to functional genomics data projects.
  • Assist in building robust, scalable tools for data cleaning and transformation of raw datasets into inputs for machine learning algorithms.
  • Serve as a technical resource for applications that are developed and used by the team. This entails implementing feature requests from application users.
  • Put together documents to summarize results and/or to present at conferences. For some projects, you may be expected to lead and contribute to peer-reviewed publications and assist with grant applications (NIH/NSF).

* Other duties may also be assigned.


  • Several years of experience designing, training and deploying production scale deep learning models. Extensive expertise with common deep learning frameworks (at least one of the following: TensorFlow, Keras, PyTorch, Torch, Theano).
  • Thorough knowledge of classical supervised and unsupervised machine learning algorithms (gradient-boosted trees, Random Forest, k-means clustering, support vector machine-based methods, linear and logistic regression, etc.).
  • Comfortable working with Unix-based systems and SLURM-based compute cluster environments.
  • Familiarity with standard programming practices (version control with git, modular software design, unit test frameworks, workflow managers such as Snakemake, ReadTheDocs documentation).
  • Familiarity with Singularity and Docker container deployment and maintenance.
  • High competence with the Python programming language.
  • Familiarity with R or a similar statistical analysis/visualization framework.
  • Comfortable working with SQL and NoSQL databases (MySQL, TileDB, MongoDB, triplestore and RDF databases).
  • Strong software engineering skills.


  • Strong communication skills
  • Comfort with a steep learning curve and a rapidly changing codebase
  • Team player


  • HTML
  • Comfort with web programming/UI frameworks such as JavaScript, Flask, PHP, Django
  • Strong writing skills
  • Familiarity with epigenomics and functional genomics and common tools utilized in the field (e.g. alignment algorithms, variant calling, gene expression analysis, GWAS)

* Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.


PhD in Bioinformatics, Computational Biology, Computer Science, Statistics, Mathematics, or a related field, or a combination of BS/MS and a minimum of 5 years of experience.

Contact information:

Please direct all applicants to  

Affirmative Action statement:

Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law. Stanford welcomes applications from all who would bring additional dimensions to the University’s research, teaching, and clinical missions.

Additional Information

  • Schedule: Full-time
  • Job Code: 6438
  • Employee Status: Fixed-Term
  • Grade: R99
  • Department URL: