Location: Hybrid (Kuala Lumpur, Malaysia)
Core Responsibilities
- Architect and oversee sophisticated performance testing frameworks that span distributed systems, APIs, SDKs, and various service integrations across the organization.
- Plan, run, and evaluate performance, load, soak, and stress testing scenarios that mirror real-world traffic volumes, concurrency levels, and usage patterns.
- Develop scalable, reusable performance testing infrastructures using tools like JMeter, k6, Gatling, Locust, or LoadRunner Enterprise, and ensure seamless integration into CI/CD pipelines.
- Create and continually refine the Performance Testing Runbook, a comprehensive guide detailing environment configuration, load profiles, KPIs, execution standards, and performance sign-off requirements.
- Implement robust observability by instrumenting systems with APM and monitoring platforms (Datadog, New Relic, Grafana, Prometheus, ELK, OpenTelemetry) to align test outcomes with backend telemetry data.
- Execute in-depth root-cause and bottleneck analyses across the full technology stack (network, API layers, caching mechanisms, databases, and compute resources) to identify performance constraints and improvement opportunities.
- Work closely with backend engineers, DevOps, and SRE teams to evaluate system capacity, investigate regressions, validate tuning experiments, and forecast performance trends.
- Automate the collection and enforcement of performance metrics within CI/CD pipelines, supporting objective release decisions through data-driven gating and clear, visualized performance trends.
- Provide guidance, mentoring, and technical leadership to QA, automation, and infrastructure teams to elevate the overall maturity of performance engineering across the company.
Required Technical Skills & Expertise
- Minimum 8 years of practical experience conducting performance, load, and scalability testing for complex, high-volume or enterprise-grade systems.
- Deep hands-on knowledge of a variety of performance testing tools such as JMeter, k6, Gatling, Locust, LoadRunner Enterprise, NeoLoad, or BlazeMeter.
- Strong scripting and automation capabilities in languages like Python, JavaScript/TypeScript, or Groovy.
- Solid understanding of distributed system design, including microservices architectures, API-driven systems, message queues, asynchronous workflows, and event-based processing.
- Proficiency with profiling and diagnostic utilities such as YourKit, JProfiler, VisualVM, Chrome DevTools, perf, and flame graphs.
- Ability to analyze and interpret data from APM platforms (Datadog, New Relic, Dynatrace, AppDynamics) and system monitoring tools (Grafana, Prometheus, CloudWatch).
- Strong foundation in networking and infrastructure concepts: TCP/IP, HTTP/2, caching models, CDN behavior, load balancing techniques, and service mesh patterns.
- Experience defining SLAs, SLOs, and performance benchmarks, and integrating them into CI/CD workflows as enforceable quality gates.
- Excellent analytical skills with the ability to distill high-volume telemetry into actionable insights, meaningful stories, and accurate capacity predictions.
- Confident working with Git-based workflows and CI/CD platforms such as Jenkins, GitHub Actions, GitLab CI, or CircleCI.
Preferred / Bonus Qualifications
- Experience executing performance testing within cloud-native or containerized environments (Kubernetes, Docker, AWS Fargate, GCP GKE).
- Knowledge of chaos engineering or resilience testing tools like Gremlin, Litmus, or Chaos Mesh.
- Familiarity with database query optimization and performance tuning for PostgreSQL, MySQL, Redis, or MongoDB.
- Understanding of API gateway performance patterns (Kong, NGINX, AWS API Gateway).
- Background in designing, maintaining, or evolving enterprise-scale performance testing frameworks or internal test platforms.
- Contributions to internal or open-source performance engineering playbooks, guidelines, or tooling.
What Success Looks Like
- A Performance Testing Runbook that becomes the definitive reference for all future load, stress, and scalability assessments across the organization.
- Fully automated performance baselines integrated into build and release pipelines, providing continuous insights.
- Clear, organization-wide visibility into bottlenecks, regression trends, capacity thresholds, and system hotspots.
- A mature, data-driven performance culture where every release is validated, measured, and continuously optimized.
- Actionable, evidence-backed recommendations that directly enhance system architecture, product reliability, and engineering quality.
Benefits
Work Tools
- High-performance Mac workstations plus any additional equipment needed to maximize productivity.
- Collaboration tools including Google Workspace (Chat, Gmail, Drive), Confluence, Jira, and GitLab.
Professional Development
- Access to advanced training programs and industry conferences.
- A strong internal culture of knowledge sharing and cross-team learning.
Additional Perks
- Comprehensive medical, dental, and optical insurance coverage for employees and their dependents.
- Flexible working hours tailored to individual schedules and team needs.
- No formal dress code.
- A modern, comfortable office environment designed for focused and collaborative work.