Backend Developer
Location: Dubai, United Arab Emirates (Micropolis Robotics)
About the Role
We’re building an AI Assistant that serves as a digital concierge, helpdesk, and knowledge hub inside our
mobile application. Users will:
- Ask questions about our services and their own data (documents and tables)
- Perform actions by invoking our application services through the assistant (tool use/MCP/API integrations)
- Converse via both text and real‐time voice
The AI system runs using open-source models and tooling, with on-premise deployment as a first-class option.
Key Responsibilities
Design, build, and operate a production server‐side AI assistant that:
– Answers questions grounded in user‐scoped data (docs, tables) with citations where applicable
– Performs actions by calling internal/external APIs securely on the user’s behalf
– Supports low‐latency, real‐time voice chat (streaming STT/TTS + incremental LLM responses)
Implement the tool/agent layer:
– Structured tool calling (JSON‐schema based) to integrate business services
– Model Context Protocol (MCP) servers/clients where appropriate for tool discovery and execution
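For illustration, a minimal sketch of the tool layer this implies; the `get_invoice_status` tool, its schema, and the dispatcher are hypothetical stand-ins for real business services:

```python
import json

# JSON-schema description of a business-service tool, as exposed to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the status of an invoice for the current user.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

def get_invoice_status(invoice_id: str, user_id: str) -> dict:
    # Placeholder: would call the internal billing API, scoped to user_id.
    return {"invoice_id": invoice_id, "status": "paid"}

HANDLERS = {"get_invoice_status": get_invoice_status}

def dispatch(tool_call: dict, user_id: str) -> str:
    """Validate and execute a model-emitted tool call on the user's behalf."""
    fn = HANDLERS[tool_call["name"]]           # unknown tools raise KeyError
    args = json.loads(tool_call["arguments"])  # models emit arguments as JSON text
    return json.dumps(fn(**args, user_id=user_id))

# e.g. dispatch({"name": "get_invoice_status",
#                "arguments": '{"invoice_id": "INV-42"}'}, user_id="u1")
```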
Architect retrieval‐augmented generation (RAG):
– Ingestion for documents and tables, parsing, chunking, embeddings, metadata, and indexing
– Hybrid retrieval (sparse+dense), query rewriting, and answer attribution
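As a rough sketch of hybrid retrieval with reciprocal rank fusion (RRF), assuming `rank_bm25` for sparse scoring and `sentence-transformers` for dense embeddings; the corpus and model choice are illustrative:

```python
import numpy as np
from rank_bm25 import BM25Okapi                         # sparse lexical scoring
from sentence_transformers import SentenceTransformer   # dense embeddings

docs = ["Refunds are processed within 5 days.", "Invoices are emailed monthly."]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any local embedding model works
doc_vecs = model.encode(docs, normalize_embeddings=True)
bm25 = BM25Okapi([d.lower().split() for d in docs])

def hybrid_search(query: str, k: int = 5) -> list[int]:
    """Fuse sparse and dense rankings with reciprocal rank fusion (RRF)."""
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    dense = (doc_vecs @ q_vec).argsort()[::-1]                   # best-first
    sparse = np.argsort(bm25.get_scores(query.lower().split()))[::-1]
    scores: dict[int, float] = {}
    for ranking in (dense, sparse):
        for rank, idx in enumerate(ranking):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (60 + rank)  # RRF, k=60
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Answer attribution then falls out of returning chunk metadata (source document, offsets) alongside the fused hits.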
Deliver performant, cost-efficient inference on open-source models:
– Model selection/routing; context management; caching/batching; streaming token delivery
– GPU utilization and serving via vLLM/TGI/llama.cpp/Ollama or similar
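As one concrete example, vLLM exposes an OpenAI-compatible endpoint, so streaming token delivery can be consumed with the standard `openai` client; the model name below is only an example:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API, e.g.:
#   vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Summarize my open tickets."}],
    stream=True,  # incremental token delivery
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Because the endpoint shape is standard, model routing reduces to swapping `base_url` and `model` per request.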
Build resilient APIs and real‐time integrations:
– WebSockets/WebRTC/gRPC for streaming voice; REST/GraphQL for control and orchestration
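A minimal FastAPI WebSocket sketch of the streaming pattern; `generate_stream` and the `<eot>` end-of-turn marker are hypothetical placeholders for a real LLM streamer:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def generate_stream(prompt: str):
    # Placeholder: swap in a real streaming LLM call (e.g. the vLLM client above).
    for token in ("You", " said: ", prompt):
        yield token

@app.websocket("/chat")
async def chat(ws: WebSocket):
    """Relay each user turn to the model and stream tokens back as they arrive."""
    await ws.accept()
    while True:
        user_turn = await ws.receive_text()
        async for token in generate_stream(user_turn):
            await ws.send_text(token)
        await ws.send_text("<eot>")  # hypothetical end-of-turn marker
```

Run with `uvicorn app:app`; the same pattern extends to binary audio frames for voice.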
Productionize and operate on server/on‐prem:
– Containerize with Docker; automate CI/CD; implement logs/metrics/traces (OpenTelemetry)
– Evals, A/B tests, safety/guardrails, and human‐in‐the‐loop feedback
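As one illustration of the observability side, OpenTelemetry spans around pipeline stages; the exporter and span names are placeholders, and production would swap in an OTLP exporter:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Console exporter for local use; replace with an OTLP exporter in production.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("assistant")

def answer(question: str) -> str:
    with tracer.start_as_current_span("rag.retrieve") as span:
        span.set_attribute("question.length", len(question))
        ...  # retrieval
    with tracer.start_as_current_span("llm.generate"):
        ...  # generation
    return "..."
```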
Required Skills and Experience
- Proficiency in Python
- Open-source LLMs and serving:
  - Experience with models such as Llama, Mistral/Mixtral, and Qwen
  - Serving stacks: vLLM, Text Generation Inference (TGI), llama.cpp, Ollama
  - Prompt engineering, routing, and context/window management
 
- RAG and data systems:
  - Vector DBs (e.g., FAISS, Qdrant, Weaviate, Milvus, pgvector) and hybrid search
  - Document/table ingestion and normalization; schema/metadata design
 
- Real-time voice:
  - STT/TTS (e.g., Whisper/faster-whisper, Vosk, Coqui TTS, Piper) with streaming pipelines
  - Low-latency streaming via WebSockets/WebRTC/gRPC
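For instance, a short faster-whisper sketch; the model size, device, and file name are illustrative. Segments are yielded lazily, so partial transcripts can be forwarded to the client early:

```python
from faster_whisper import WhisperModel

# Small CPU-friendly model; larger GPU models cut word error rate at higher latency.
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("user_turn.wav", vad_filter=True)
for seg in segments:  # lazy generator: text is available as each segment decodes
    print(f"[{seg.start:.1f}s -> {seg.end:.1f}s] {seg.text}")
```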
 
- Tooling for actions:
  - Structured tool/function calling, API design/integration, and service orchestration
  - Familiarity with Model Context Protocol (MCP) concepts and usage
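A minimal MCP server sketch, assuming the official MCP Python SDK's `FastMCP` helper; the `lookup_order` tool is hypothetical:

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("micropolis-demo")

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Hypothetical business tool, discoverable by any MCP client."""
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```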
 
- Deployment and operations (on-prem first):
  - Docker, Linux, networking, and secure service deployment
  - GPU stacks (CUDA/drivers/containers) and performance tuning
 
- Excellent communication, documentation, and cross‐functional collaboration
Preferred Skills
- Agents/frameworks: LangChain, LlamaIndex, Semantic Kernel, or custom tool routers
- Advanced retrieval: multi‐vector stores, RRF/hybrid search, query planning, re‐ranking
- SQL generation and safe execution over tabular data; row-level security; schema mapping (see the sketch after this list)
- Document processing: OCR, table extraction, CSV/Parquet pipelines
- Serving/perf: Triton, quantization (GGUF/GGML), LoRA/QLoRA with PEFT, KV‐cache optimizations
- Evals and observability: Ragas/DeepEval, Langfuse/PromptLayer, OpenTelemetry
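On the SQL point above, a small sketch of safe execution, assuming SQLite: a read-only connection plus parameterized, user-scoped queries. The `invoices` table and column names are hypothetical:

```python
import sqlite3

def run_user_query(db_path: str, user_id: str, invoice_id: str) -> list[tuple]:
    """Execute a vetted, parameterized query against a read-only connection."""
    # mode=ro opens the database read-only, so generated SQL cannot mutate data.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # Row-level security: every query is scoped to the requesting user.
        return conn.execute(
            "SELECT id, status, total FROM invoices WHERE user_id = ? AND id = ?",
            (user_id, invoice_id),
        ).fetchall()
    finally:
        conn.close()
```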
Nice to Have (Not Required)
- Public cloud experience (AWS/Azure/GCP) for hybrid or future deployments