Contengine

A distributed orchestration gateway for local LLM inference servers with JWT auth, rate limiting, and isolated MCP tool containers.

2026

Ongoing

Active, Open Source


Contengine is a production-grade MCP (Model Context Protocol) server that acts as an orchestration gateway between users and a local NVIDIA inference server running Qwen3.6-35B-MoE via vLLM. It handles JWT authentication, API key management, rate limiting, tool routing, and multi-user isolation across two hosts: a VPS and a DGX Spark on the local LAN, connected via Tailscale.

The architecture is built around a custom FastAPI gateway that intercepts `tool_use` messages from the vLLM inference engine, resolves each tool to an isolated MCP tool container, and loops the result back to the model. Each user gets a dynamically provisioned Docker Compose stack for complete isolation. The system is designed for asymmetric resilience: auth and routing run on the VPS while inference runs on the LAN-side DGX Spark, so external users retain access to their data layer during WAN drops, and the inference engine remains accessible locally.

The project includes 20+ MCP tools written with the MCP SDK, a full observability stack with Prometheus/Grafana dashboards, a Qdrant vector store with sentence-transformers embeddings, and a GraphRAG corpus management system for multi-user context injection.
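The intercept, resolve, and loop-back cycle described above can be illustrated with a minimal sketch. This is not Contengine's actual gateway code: the message shape (`type`, `name`, `input`, `content`), the `run_tool_loop` function, and the `max_steps` cap are all assumptions made for illustration, and the model and tools are plain Python callables standing in for vLLM and the MCP containers.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_tool_loop(
    call_model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[[dict[str, Any]], Any]],
    messages: list[Message],
    max_steps: int = 8,
) -> list[Message]:
    """Hypothetical tool -> observe -> tool loop.

    call_model stands in for a request to the inference engine; tools maps
    a tool name to a callable standing in for an MCP tool container.
    """
    history = list(messages)  # copy so the caller's list is not mutated
    for _ in range(max_steps):
        reply = call_model(history)
        history.append(reply)
        if reply.get("type") != "tool_use":
            # The model produced a final answer, so the loop ends here.
            return history
        name = reply["name"]
        handler = tools.get(name)
        if handler is None:
            # Report the failure to the model instead of crashing the request.
            result: Any = {"error": f"unknown tool: {name}"}
        else:
            result = handler(reply.get("input", {}))
        # Feed the observation back so the model can decide the next step.
        history.append({"type": "tool_result", "name": name, "content": result})
    raise RuntimeError("tool loop exceeded max_steps")
```

Capping the number of iterations is one simple way to keep a misbehaving model from looping forever; a real gateway would likely also enforce per-tool timeouts and per-user limits.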

Technologies Used

language

Python

framework

FastAPI, MCP SDK, vLLM

tool

Docker, Prometheus, Tailscale

database

Qdrant

Challenges

Building a custom FastAPI gateway that intercepts `tool_use` messages from vLLM and resolves tools to MCP containers, using an iterative tool→observe→tool loop that standard proxies (like LiteLLM) don't support. Designing for asymmetric resilience: external users must retain access to their data layer when the WAN drops, which requires careful separation of auth and routing (VPS) from inference (LAN DGX Spark). Isolating users through dynamic Docker Compose provisioning, with no shared containers. Debugging a distributed system across two hosts connected via Tailscale. Applying JWT auth, rate limiting, and observability (Prometheus/Grafana) across the full stack.
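One way to keep per-user stacks from sharing containers is to give each user a distinct Compose project name, since Docker Compose scopes containers and networks by project. The sketch below is an assumption about how that could look, not the project's actual provisioning code: the `ctg` prefix, the `docker-compose.tools.yml` file name, and both function names are hypothetical. It only builds the command; it does not run Docker.

```python
import hashlib
import re


def compose_project_name(user_id: str, prefix: str = "ctg") -> str:
    """Derive a stable, Compose-safe project name from a user ID.

    Compose project names are limited to lowercase letters, digits,
    dashes, and underscores, so other characters are collapsed to dashes.
    A short hash of the raw ID keeps names unique when slugs collide.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", user_id.lower())[:24].strip("-")
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{slug}-{digest}" if slug else f"{prefix}-{digest}"


def compose_up_command(
    user_id: str, compose_file: str = "docker-compose.tools.yml"
) -> list[str]:
    """Build (but do not execute) the command that starts a user's stack."""
    return [
        "docker", "compose",
        "-p", compose_project_name(user_id),  # per-user project scope
        "-f", compose_file,
        "up", "-d",
    ]
```

Returning the argument list instead of a shell string keeps the user ID out of any shell interpolation when the command is later passed to something like `subprocess.run`.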

Key Learnings

Custom FastAPI gateway middleware for LLM tool interception and MCP container routing. Dynamic per-user Docker Compose provisioning with isolation. Asymmetric resilience patterns for WAN drop scenarios. A deeper understanding of the MCP protocol, beyond surface-level tool registration. Prometheus metrics collection and Grafana dashboards for real-time system observability. Tailscale network topology for secure two-host communication. Sentence-transformers embeddings in Qdrant for vector-based context resolution.

Project Details

Difficulty

advanced

Duration

Ongoing

Role

Author & Sole Developer

Related Projects

YouTube Transcript Search

Full-stack application with browser extensions for capturing and searching YouTube video transcripts across channels.

FastAPI, SvelteKit

Wecelium

A Postgres-native architecture system for AI-assisted engineering using vector search, graph traversal, and full-text search fusion.

PostgreSQL, pgvector

Home Assistant Discourse Backup Sync

Home Assistant Add-on for automated Discourse backup sync