DevOps Engineer (Relocate to Singapore)

Singapore

IT

Full-time

We are looking for a hands-on, high-ownership DevOps Engineer to own, scale, and evolve the core platform that powers our entire R&D and engineering ecosystem. In this role, you will architect infrastructure running everything from distributed AI training rigs and high-throughput inference serving to our day-to-day internal engineering tools. You will work deep in the stack across multi-cloud environments, enterprise Kubernetes networks, and high-performance bare metal.

Key Responsibilities

  • Enterprise Kubernetes Orchestration: Scale and operate multi-cluster Kubernetes environments across public clouds (GCP, AWS) and on-premises infrastructure, handling complex control plane operations, node lifecycles, and advanced autoscaling via KEDA and HPA.

  • Hybrid Cloud Architecture: Design, implement, and maintain secure hub-and-spoke and multi-AZ network topologies, balancing public cloud resources with bare-metal on-premises fabrics.

  • AI/ML Infrastructure Management: Optimize and manage our high-density inference platform, leveraging vLLM, AIBrix, and specialized autoscaling across a distributed fleet of NVIDIA GPUs.

  • GitOps & Continuous Delivery: Own the end-to-end CI/CD and GitOps lifecycle, driving secure multi-stage container builds, image optimization, and progressive delivery patterns using ArgoCD or FluxCD.

  • Unified Observability: Maintain a single-pane-of-glass observability ecosystem built on the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) plus Pyroscope, while actively pushing toward agent-assisted SRE workflows.

  • Security & Identity Lifecycle: Harden platform security by integrating central IdPs (Keycloak, Google Workspace) via OIDC/SAML, enforcing robust RBAC, and managing enterprise secrets.

  • Data & Compute Automation: Support distributed database and messaging platforms (PostgreSQL HA, Kafka, Redis, OpenSearch) alongside self-service training infrastructure and RunPod burst capacity.

What We Are Looking For

  • Production Kubernetes Expertise: Deep, hands-on understanding of workloads, CNI networking, CSI storage plugins, and advanced event-driven autoscaling. Self-managed or bare-metal K8s experience is a strong plus.

  • Design-Level Networking: Proven ability to engineer real-world network topologies, securely managing private clusters, firewalls, load balancers, and complex routing tables.

  • GitOps & Container-Native Mindset: Expert-level execution of Docker multi-stage builds, layer-cache optimization, and declarative GitOps delivery pipelines.

  • Full-Stack Observability: Practical experience standing up and tuning metrics, logs, traces, and alert routing from scratch in high-scale environments.

  • Identity & Access Controls: A default instinct to wire platform applications into a centralized identity directory rather than managing local service accounts.

  • Linux & IaC Foundations: Excellent Linux systems administration proficiency paired with modern Infrastructure as Code tools (Terraform, Terragrunt, or Pulumi).

Bonus Technical Points

  • Production experience running OpenStack services (Nova, Neutron, Cinder) or KVM virtualization.

  • Familiarity with distributed open-source storage architectures like Ceph or Rook-Ceph.

  • Deep understanding of LLM inference internals (PagedAttention, continuous batching, tensor parallelism).
