Data Curation Engineer

  • South San Francisco, CA
  • Full Time
  • Computing


Who we are:

Calico is a research and development company whose mission is to understand the biology of aging and to help people lead longer and healthier lives. We aim to combine the best biomedical science with cutting-edge technology and computing. We are a nimble, fast-paced startup with secure, long-term funding.

Position description:

Biological insight requires clean, robust biological data. Calico is seeking a data curation engineer to collect, denoise, and organize data generated by the broader research community related to aging and aging-related diseases. Large-scale data generation has tremendously benefitted research efforts in genome biology, and additional data, using novel technologies and new experimental protocols, are constantly being generated. Undoubtedly, greater insights await discovery through improved processing of these data. You will tackle the challenge of identifying key resources, navigating data in-house, determining proper pre-processing techniques, detecting problematic instances, and preparing data for integration and analysis by a world-class data science team. Relevant data will span multiple organisms (from yeast to human), scales (from molecules to entire organisms), modalities (from sequencing to imaging to physiology), and time scales (from single time points to long-term time series). You will need to develop scalable and reusable pipelines to enable Calico to take advantage of these data to understand aging biology and improve human health.

In this role, you will be a key member of a range of cross-functional efforts that involve engineers, biomedical scientists, and computational biologists. You will also join and help shape the culture of a new team in a company that is a nimble startup with a firm financial footing.

Position requirements:

  • 4+ years of experience in data processing and/or analysis, and familiarity with current data processing technologies
  • At least 2 years of hands-on work experience with real biological data (e.g., DNA sequencing, gene expression data, mass spec, imaging), either in an academic environment or in industry, including experience with biological data analysis tools and databases (e.g., dbGaP, GTEx, GEO, Ensembl, GO, TCGA)
  • Strong coding skills and substantial experience coding in at least one major scripting language (e.g., Python or R); experience with relational databases and software engineering a significant plus
  • Understanding of biological experiment design methodology, and ability to assess the strength of evidence supporting a scientific finding
  • Track record of effective collaboration in a cross-functional environment with people of diverse backgrounds
  • Degree in Biological Science, Computer Science, Statistics, Bioinformatics, or related technical field, or equivalent practical experience
  • A mindset of flexibility and the desire to learn new types of data and new tools
  • Strong analytical and quantitative skills, including some familiarity with state-of-the-art methods in ML and/or advanced statistics, are a plus but not required