Lead Data Engineer

Remote

IT

Full-time

Overview
Join a pioneering health-technology team focused on harnessing gut microbiome insights to drive disease prevention, early detection, and treatment strategies—from metabolic disorders and cancers to neurodegenerative conditions and allergies. You’ll help build and maintain the foundational data backbone that powers research discoveries and product innovations. 

Your Mission

As the senior data engineering lead, you will:
 • Own Data Architecture: Craft and evolve robust pipelines for ingesting, cleansing, transforming, and storing vast volumes of microbiome and clinical datasets.
 • Enable Bioinformatics Workflows: Partner with bioinformatics specialists, data scientists, and researchers to integrate genomic and health data into analytics environments.
 • Optimize Cloud Infrastructure: Design, deploy, and fine-tune AWS-based services (e.g., S3, Glue, Lambda, EC2, RDS) ensuring high availability, security, and cost-effectiveness.
 • Champion Data Quality & Governance: Establish standards for data lineage, validation, and access control—guaranteeing accuracy, traceability, and compliance.
 • Promote Engineering Best Practices: Lead code reviews, enforce modular design and automated testing, and mentor fellow engineers.
 • Guide Strategic Initiatives: Advise on scaling strategies, system resilience, and emerging technologies to support evolving health-data needs.

What You Bring
 • Proven Expertise: Minimum 5 years in data engineering with a strong software development foundation.
 • Technical Fluency: Advanced Python (including Pandas, PySpark) and JavaScript/TypeScript for developing tooling and APIs.
 • Cloud Mastery: Hands-on experience architecting and managing AWS environments via infrastructure-as-code.
 • Data Modeling Savvy: Deep knowledge of relational/non-relational databases, schema design, and ETL/ELT patterns.
 • Health Data Experience: Working familiarity with structured and semi-structured clinical or genomic datasets.
 • Workflow Orchestration: Competence with tools such as Airflow, AWS Step Functions, or Nextflow.
 • Communication & Documentation: Ability to clearly articulate designs, document systems, and collaborate across disciplines.

Desirable
 • Background in bioinformatics, genomics, or life-science data systems.
 • Integration experience with laboratory or R&D software platforms.
 • Understanding of healthcare data security and regulatory frameworks.
