In this role, you'll:
- Design and build a world-class API platform with a focus on reliability, performance, and developer experience.
- Develop intuitive APIs and SDKs to integrate multimodal AI capabilities.
- Deploy and optimize AI/ML models in scalable production environments.
- Manage modern, cloud-native infrastructure (Kubernetes, Docker, IaC).
- Ensure platform reliability through robust monitoring and recovery mechanisms.
- Collaborate on developer and operations workflows (CI/CD, release management).
- Implement secure APIs with fine-grained access control and billing integration.
- Continuously improve platform performance and observability.

What you'll bring:
- 3+ years building and operating large-scale production systems.
- Strong Kubernetes, Docker, and Helm experience (service mesh a plus).
- Proficiency with AWS, GCP, or Azure.
- Hands-on experience with Python and JavaScript/TypeScript, including backend frameworks.
- Deep understanding of API architecture (REST, gRPC, WebSockets, etc.).
- Experience with PostgreSQL, Redis, and vector databases.
- Familiarity with CI/CD and monitoring tools.
- Bonus: MLOps or developer platform experience.