Key Responsibilities
- Architect & Scale Systems: Lead the design, development, and optimization of fault-tolerant, high-performance backend services and asynchronous APIs using Python (FastAPI) or Java (Vert.x, Quarkus), ensuring they handle large-scale traffic with minimal latency.
- Full Product Lifecycle Ownership: Oversee features from ideation and prototyping to testing, deployment, and continuous improvement, working closely with product managers, designers, and DevOps engineers.
- Event-Driven Systems: Build and maintain asynchronous, real-time pipelines (e.g., pub/sub, message queues) to support fast data ingestion, processing, and third-party integrations.
- Performance & Reliability: Define and enforce best practices for monitoring, observability, and automated testing. Implement caching, backpressure, rate-limiting, and connection pooling strategies to achieve 99.9%+ uptime.
- Service Monitoring: Design and manage end-to-end observability stacks leveraging Prometheus (metrics) and Grafana (dashboards/alerting).
- AI & RAG Solutions: Architect Retrieval-Augmented Generation pipelines and AI-driven agents by integrating vector databases, LLM APIs (e.g., OpenAI), and custom prompt orchestration frameworks.
- Leadership & Mentorship: Guide mid-level engineers through code reviews, pair programming, and technical discussions, while promoting high engineering standards and cross-team collaboration.
Required Qualifications
- 5+ years of professional software engineering experience, with a strong track record of shipping and maintaining production-grade distributed systems.
- Advanced expertise in Python (FastAPI) or Java (Vert.x, Quarkus), with deep knowledge of async programming, concurrency models, profiling, and performance tuning.
- Frontend Development Experience: Solid experience with React.js and Next.js for building and maintaining modern web applications.
- Proven ability to design and implement REST, GraphQL, or gRPC APIs that handle high traffic loads, with working knowledge of HTTP/2, WebSockets, and low-latency protocols.
- Strong cloud background (AWS, GCP, or Azure), including containerization with Docker, orchestration via Kubernetes/Helm, and IaC tools such as Terraform or CloudFormation.
- Solid understanding of data storage: relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Redis, DynamoDB), including schema design for scalability.
- Hands-on experience with asynchronous ecosystems, including message brokers (Kafka, RabbitMQ, AWS SQS), task queues or serverless workers (Celery, AWS Lambda, or Java equivalents), and stream-processing frameworks.
Preferred Qualifications
- Practical experience with RAG pipelines, vector search technologies (e.g., Pinecone, Weaviate), and building autonomous AI agents using LangChain or similar.
- Exposure to embedding-based retrieval, prompt engineering, and orchestration of multi-LLM workflows.
- Familiarity with observability (OpenTelemetry, Prometheus, Grafana) and security best practices (OAuth2, JWT, encryption standards).
- Strong leadership and communication skills with the ability to influence architecture, mentor peers, and collaborate effectively in cross-functional teams.