Intelligent Document Processing Engineer (Backend)

Ho Chi Minh

IT

Full-time


As a Backend Engineer, you will design and build a multi-agent Python pipeline that transforms unstructured exam papers (PDFs) into structured question-and-answer data used by our student-facing application. You will work with FastAPI, LLMs, OCR and vision models, LangChain-style agent frameworks, and scalable cloud infrastructure.

This role sits at the intersection of GenAI engineering, backend systems design, and EdTech product development.


Key Responsibilities

1. Backend Engineering (FastAPI & Pipeline Architecture)

  • Architect and implement a modular, multi-stage Python pipeline covering OCR, content extraction, parsing, and question structuring.

  • Own, deploy, and maintain FastAPI microservices that integrate with LLMs, OCR engines, and internal data stores.

  • Build reliable logging, monitoring, and cost-tracking systems across all pipeline components and agents.
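For illustration only, a modular multi-stage pipeline of this kind can be sketched as a list of independent stage functions that each take and return a shared document state, with the runner recording per-stage timings for monitoring. The `DocState` class, the stage names, and the placeholder stage logic are hypothetical; real stages would call OCR engines, PDF parsers, and LLMs.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DocState:
    """Hypothetical state object passed from one pipeline stage to the next."""
    source: str
    text: str = ""
    questions: List[str] = field(default_factory=list)
    timings: Dict[str, float] = field(default_factory=dict)


Stage = Callable[[DocState], DocState]


def ocr_stage(state: DocState) -> DocState:
    # Placeholder: a real stage would run an OCR engine or vision model on the PDF.
    state.text = "Q1. What is 2 + 2?\nQ2. Name a prime number."
    return state


def parse_stage(state: DocState) -> DocState:
    # Placeholder: keep lines that look like question headers.
    state.questions = [line for line in state.text.splitlines() if line.startswith("Q")]
    return state


def run_pipeline(state: DocState, stages: List[Stage]) -> DocState:
    """Run stages in order, recording each stage's wall-clock duration in seconds."""
    for stage in stages:
        start = time.perf_counter()
        state = stage(state)
        state.timings[stage.__name__] = time.perf_counter() - start
    return state


if __name__ == "__main__":
    result = run_pipeline(DocState(source="exam.pdf"), [ocr_stage, parse_stage])
    print(result.questions)
    print(sorted(result.timings))
```

Keeping each stage a plain function with one input and one output makes it straightforward to test stages in isolation and to swap, for example, one OCR backend for another without touching the rest of the pipeline.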

2. GenAI, LLMs & Multi-Agent Systems

  • Create multi-agent orchestration flows using frameworks such as LangChain or LlamaIndex, or custom-built agent systems.

  • Implement LLM-driven extraction, structuring, and reasoning workflows for exam content.

  • Manage agent input/output schemas, reasoning traces, and message protocols.

  • Ensure consistently structured outputs across a wide range of exam formats.

3. OCR & Computer Vision

  • Integrate OCR and vision models for extracting text, diagrams, and images from PDFs.

  • Handle multimodal input types such as equations, tables, graphs, and visual elements.

  • Ensure reliable capture of mathematical notation and diagram-based questions.

4. Question Schema & Data Modelling

  • Design scalable schemas for:

    • Questions and sub-questions

    • Answer keys

    • Diagrams, images, and mathematical expressions

    • Metadata supporting automated/LLM-based grading

  • Ensure schema output supports both UI rendering and automated evaluation workflows.
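As a purely illustrative sketch (not the actual production schema), the entities listed above could be modelled as nested Python dataclasses that serialize to JSON for both UI rendering and grading. All class and field names here are hypothetical; a FastAPI service would more likely define equivalent Pydantic models.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    kind: str                      # e.g. "diagram", "image", "equation"
    uri: Optional[str] = None      # hypothetical storage location for image assets
    latex: Optional[str] = None    # LaTeX source for mathematical expressions


@dataclass
class AnswerKey:
    answer: str
    marks: int
    grading_notes: str = ""        # rubric hints for automated/LLM-based grading


@dataclass
class Question:
    id: str
    prompt: str
    assets: List[Asset] = field(default_factory=list)
    answer_key: Optional[AnswerKey] = None
    sub_questions: List["Question"] = field(default_factory=list)

    def total_marks(self) -> int:
        """Sum marks for this question and all nested sub-questions."""
        own = self.answer_key.marks if self.answer_key else 0
        return own + sum(q.total_marks() for q in self.sub_questions)


if __name__ == "__main__":
    q = Question(
        id="1",
        prompt="Consider f(x) = x^2.",
        assets=[Asset(kind="equation", latex=r"f(x) = x^2")],
        sub_questions=[
            Question(id="1a", prompt="Find f(3).", answer_key=AnswerKey("9", 1)),
            Question(id="1b", prompt="Find f'(x).",
                     answer_key=AnswerKey("2x", 2, grading_notes="Accept 2*x.")),
        ],
    )
    print(q.total_marks())
    print(json.dumps(asdict(q), indent=2))
```

Nesting `sub_questions` inside `Question` lets one recursive type cover multi-part questions of arbitrary depth, and `asdict` converts the whole tree into plain dictionaries and lists that can be serialized directly to JSON.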

5. Quality, Observability & Reliability

  • Implement structured, timestamped logging at each pipeline stage.

  • Build monitoring layers to track pipeline failures, false positives, and system health.

  • Develop automated tests for multi-stage and multi-agent workflows.
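One common way to produce structured, timestamped logs is to emit one JSON object per log line. A minimal sketch using only Python's standard `logging` module follows; the `stage` and `doc_id` fields are hypothetical conventions passed through the `extra` argument, not a prescribed format.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with a UTC ISO-8601 timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            # Hypothetical fields supplied via `extra`; None if the caller omitted them.
            "stage": getattr(record, "stage", None),
            "doc_id": getattr(record, "doc_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def get_pipeline_logger(name: str = "pipeline") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = get_pipeline_logger()
    # `extra` copies these keys onto the LogRecord, where the formatter reads them.
    log.info("OCR complete", extra={"stage": "ocr", "doc_id": "exam-001"})
```

Because every line is valid JSON with a consistent set of keys, the logs can be filtered by stage or document and aggregated into failure and latency dashboards without custom parsing.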


Technical Requirements

Core Skills

  • Strong proficiency in Python (3.9+)

  • Experience building production systems using FastAPI

  • Deep hands-on experience with LLMs, prompt engineering, and agentic workflows

  • Familiarity with OCR engines (Tesseract, PaddleOCR, etc.) and vision-language models (GPT-5, Gemini 3 Pro, Qwen3-VL, etc.)

  • Experience with PDF parsing libraries (PyMuPDF, pdfplumber, unstructured, etc.)

  • Understanding of LangChain BaseMessage schemas or equivalent custom agent designs

  • Ability to build and manage multi-stage pipelines with clean architecture

Software Engineering Best Practices

  • Strong focus on modular design, documentation, and maintainability

  • Experience implementing logging, monitoring, and analytics for backend systems

  • Comfort working with cloud environments for scaling inference workloads

  • Experience with CI/CD, unit testing, and integration testing

  • Experience modelling structured and unstructured data

Bonus Skills

  • Experience with:

    • Vector databases (Pinecone, Weaviate, Chroma)

    • Multi-agent orchestration frameworks

    • Educational content processing / assessment workflows

  • Knowledge of math or science content (optional but helpful)


What You Will Build (Example Projects)

You will directly contribute to:

  • Automated OCR → Question Schema → Answer Key pipelines

  • LLM-based grading systems for handwritten or digital submissions

  • Multi-agent reasoning systems with detailed trace logging

  • Scalable ingestion frameworks capable of processing 10,000+ exam papers across regions and syllabi

  • Multimodal processing pipelines for diagrams, charts, graphs, and mathematical expressions

Application

To apply, submit your full name, email address, phone number, and resume (all required).