We're seeking a Senior Software Engineer (Python, MLOps) to build and scale our multimodal AI models into a real-time API platform. You'll own the inference lifecycle, from infrastructure management and deployment optimization to API design and performance tuning, and you'll collaborate with our research team to productize cutting-edge model architectures and automate key processes.
Responsibilities:
- Build and deploy multimodal AI models into a real-time API platform.
- Manage infrastructure using Terraform (or similar IaC).
- Optimize deployment and CI/CD pipelines for seamless ML model integration.
- Design and build RESTful/WebRTC APIs for model serving (see the serving sketch after this list).
- Identify and eliminate inference lifecycle bottlenecks.
- Collaborate with research to productize new model architectures.
- Automate documentation, processes, and systems.
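
To give a concrete sense of the serving work, here is a minimal sketch of a real-time inference endpoint. It assumes FastAPI and pydantic; the route name, request fields, and the stubbed `run_inference` call are illustrative assumptions, not our actual stack.

```python
# Minimal sketch of a real-time serving endpoint (assumes FastAPI + pydantic;
# the route, fields, and stubbed run_inference call are illustrative only).
import asyncio
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    image_url: str | None = None  # optional second modality

class GenerateResponse(BaseModel):
    text: str
    latency_ms: float

async def run_inference(prompt: str, image_url: str | None) -> str:
    # Stand-in for the actual model call; in production this would dispatch
    # to a batched inference worker rather than compute inline.
    await asyncio.sleep(0)
    return f"echo: {prompt}"

@app.post("/v1/generate", response_model=GenerateResponse)
async def generate(req: GenerateRequest) -> GenerateResponse:
    start = time.perf_counter()
    text = await run_inference(req.prompt, req.image_url)
    return GenerateResponse(text=text, latency_ms=(time.perf_counter() - start) * 1000)
```

In the real platform, an endpoint with this shape would sit behind request batching, load balancing, and autoscaling managed through IaC and CI/CD.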
Requirements:
- Production-level experience in MLOps or Python.
- Strong backend fundamentals (concurrency, event-driven architectures, caching).
- Experience scaling software with Docker, Kubernetes, or similar.
- API/SDK design and development experience.
- Experience with cloud (AWS, GCP, Azure) or on-premises infrastructure.
- Basic understanding of LLM inference concepts (KV caching, paged attention); a toy illustration follows this list.
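
For candidates less familiar with the KV-cache idea, the toy sketch below shows the core intuition: keys and values for past tokens are computed once and reused at each decoding step, so only the new token's query needs fresh attention. Dimensions and the attention math are deliberately simplified and are not drawn from our codebase.

```python
# Toy illustration of a per-sequence KV cache for autoregressive decoding
# (simplified single-head attention; numpy only, no real model involved).
import numpy as np

class KVCache:
    def __init__(self) -> None:
        self.keys: list[np.ndarray] = []    # one (head_dim,) vector per past token
        self.values: list[np.ndarray] = []

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Cache the new token's key/value so they are never recomputed.
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Scaled dot-product attention of the new query against all cached keys.
        K = np.stack(self.keys)             # (seq_len, head_dim)
        V = np.stack(self.values)           # (seq_len, head_dim)
        scores = K @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V                  # (head_dim,)

# Each decoding step adds one key/value pair and attends over the whole cache.
cache = KVCache()
for _ in range(4):
    q, k, v = (np.random.randn(64) for _ in range(3))
    cache.append(k, v)
    context = cache.attend(q)
```

Paged attention builds on the same idea by storing the cache in fixed-size blocks, so memory can be allocated non-contiguously and shared efficiently across concurrent requests.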