🔬 AI/LLM Research Engineer
We are looking for an AI Research Engineer who seamlessly blends frontier research curiosity with engineering discipline. You will join our Models Team, dedicated to training a state-of-the-art model capable of executing complex, long-running deep research tasks for millions of users globally.
This role is ideal for an individual who thrives in high-performance computing environments, deeply understands the nuances of training large-scale models (LLMs, VLMs, etc.), and is obsessed with fast, reproducible experimentation and applied product usability.
About the Solution
We provide an Open Intelligence solution built for simplicity across web, desktop, browser, and mobile. A powerful local mode ensures that user data remains entirely private.
Key Responsibilities
As a core member of the Models Team, you will drive the model development lifecycle:
- Model Usability: Obsess over the direct usability and performance of the model for end users, ensuring research translates into product value.
- Deep Iteration: Design and execute high-throughput AI experiments and training runs. Scrutinize results, diagnose issues, and iterate rapidly on model architecture and training techniques.
- Production Readiness: Ensure continuous, smooth productionization of model checkpoints in close collaboration with the product engineering team.
- Algorithm Advancement: Implement and iterate on the latest reinforcement learning (RL) techniques (e.g., GRPO, DPO, RePO).
- Scalable Infrastructure: Build, maintain, and optimize modular, scalable training codebases and efficient data pipelines (synthetic and real).
- Distributed Training: Ensure training jobs scale efficiently across multiple GPUs and nodes using techniques and libraries such as FSDP, DDP, and NCCL.
- Code Rigor: Maintain long-term code health by writing clean, testable, and reproducible code with a commitment to engineering best practices.
- Open Source Contribution: Actively contribute to upstream open-source dependencies and the broader machine learning community.
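To give a concrete sense of the distributed-training work above, here is a minimal sketch of a DDP training step in PyTorch. It is illustrative only (single process, CPU, `gloo` backend, a toy linear model standing in for a large model); real runs on our clusters would use multiple ranks and GPUs.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Single-process CPU sketch; a real job launches one rank per GPU
    # (e.g., via torchrun) and uses the NCCL backend.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Linear(8, 2)   # toy stand-in for a large model
    ddp_model = DDP(model)          # gradients are all-reduced across ranks
    opt = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)

    x, y = torch.randn(4, 8), torch.randn(4, 2)
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()                 # DDP synchronizes gradients here
    opt.step()

    dist.destroy_process_group()
    return loss.item()
```

In a multi-GPU setting the same code is launched once per rank (e.g., `torchrun --nproc_per_node=8 script.py`), with rank and world size read from the environment.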
Requirements
- Deep Expertise in Python and Training Frameworks: Proven proficiency in PyTorch (or equivalent) for large-scale deep learning and reinforcement learning model training in real-world settings.
- Training Dynamics Mastery: Strong understanding of training dynamics: what causes models to fail, how to debug them, and how to stabilize complex training processes.
- Data & Pipeline Experience: Experience working with large datasets and designing complex, efficient data ingestion and processing pipelines.
- Tooling Familiarity: Experience with job launchers, logging tools (e.g., Weights & Biases, TensorBoard), and robust checkpointing systems.
- Engineering Mindset: A strong mindset of engineering rigor applied to research: readable code, thoughtful design, and a commitment to reproducibility.
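As an example of the checkpointing rigor we expect, a resumable run typically persists model, optimizer, and step state together, along the lines of the sketch below (function names and layout are illustrative, not a prescribed API):

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Persist everything needed to resume the run deterministically.
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )

def load_checkpoint(model, optimizer, path):
    # Restore in place and return the step to resume from.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

In practice this sits behind a periodic hook in the training loop, paired with metric logging to a tool such as Weights & Biases or TensorBoard.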
Bonus Points (Highly Desired)
- Active open-source contributions to PyTorch or ML tooling.
- Experience with TorchScript, ONNX, or developing custom inference runtimes.
- Experience working on transformer models, diffusion models, VLMs, or large-scale vision/NLP tasks.
- Familiarity with cluster environments, batch schedulers (e.g., SLURM), and GPU resource management.
- Ability to collaborate closely with systems engineers or MLOps teams for seamless model integration.