AI Engineer

Job Category: AI Engineer, Software Development
Job Type: Full Time
Job Location: Dubai

Location: Dubai, United Arab Emirates (Micropolis Robotics) 

About the Role 

We’re building an AI Assistant that serves as a digital concierge, helpdesk, and knowledge hub inside our
mobile application. Users will:

  • Ask questions about our services and their own data (documents and tables)
  • Perform actions by invoking our application services through the assistant (tool use/MCP/API integrations)
  • Converse via both text and real‐time voice

The AI system runs on open‐source models and tooling, with on‐premise deployment as a first‐class option.

Key Responsibilities

Design, build, and operate a production server‐side AI assistant that:
– Answers questions grounded in user‐scoped data (docs, tables) with citations where applicable
– Performs actions by calling internal/external APIs securely on the user’s behalf
– Supports low‐latency, real‐time voice chat (streaming STT/TTS + incremental LLM responses)

Implement the tool/agent layer:
– Structured tool calling (JSON‐schema based) to integrate business services
– Model Context Protocol (MCP) servers/clients where appropriate for tool discovery and execution

Architect retrieval‐augmented generation (RAG):
– Ingestion for documents and tables, parsing, chunking, embeddings, metadata, and indexing
– Hybrid retrieval (sparse+dense), query rewriting, and answer attribution

Deliver performant, cost‐efficient inference on open‐source models:
– Model selection/routing; context management; caching/batching; streaming token delivery
– GPU utilization and serving via vLLM/TGI/llama.cpp/Ollama or similar

Build resilient APIs and real‐time integrations:
– WebSockets/WebRTC/gRPC for streaming voice; REST/GraphQL for control and orchestration

Productionize and operate on server/on‐prem:
– Containerize with Docker; automate CI/CD; implement logs/metrics/traces (OpenTelemetry)
– Evals, A/B tests, safety/guardrails, and human‐in‐the‐loop feedback

Required Skills and Experience

  • Proficiency in Python
  • Open‐source LLMs and serving:
    • Experience with models such as Llama, Mistral/Mixtral, and Qwen
    • Serving stacks: vLLM, Text Generation Inference (TGI), llama.cpp, Ollama
    • Prompt engineering, routing, context/window management
  • RAG and data systems:
    • Vector stores (e.g., FAISS, Qdrant, Weaviate, Milvus, pgvector) and hybrid search
    • Document/table ingestion and normalization; schema/metadata design
  • Real‐time voice:
    • STT/TTS (e.g., Whisper/faster‐whisper, Vosk, Coqui TTS, Piper) with streaming pipelines
    • Low‐latency streaming via WebSockets/WebRTC/gRPC
  • Tooling for actions:
    • Structured tool/function calling, API design/integration, and service orchestration
    • Familiarity with Model Context Protocol (MCP) concepts and usage
  • Deployment and operations (on‐prem first):
    • Docker, Linux, networking, and secure service deployment
    • GPU stacks (CUDA/drivers/containers) and performance tuning
  • Excellent communication, documentation, and cross‐functional collaboration

Preferred Skills

  • Agents/frameworks: LangChain, LlamaIndex, Semantic Kernel, or custom tool routers
  • Advanced retrieval: multi‐vector stores, RRF/hybrid search, query planning, re‐ranking
  • SQL generation and safe execution over tabular data; row‐level security; schema mapping
  • Document processing: OCR, table extraction, CSV/Parquet pipelines
  • Serving/perf: Triton, quantization (GGUF/GGML), LoRA/QLoRA with PEFT, KV‐cache optimizations
  • Evals and observability: Ragas/DeepEval, Langfuse/PromptLayer, OpenTelemetry

Nice to Have (Not Required)

  • Public cloud experience (AWS/Azure/GCP) for hybrid or future deployments

Apply for this position
