A distributed orchestration gateway for local LLM inference servers with JWT auth, rate limiting, and isolated MCP tool containers.

Contengine Logo
Custom FastAPI gateway to intercept `tool_use` messages from vLLM and resolve tools to MCP containers — iterative tool→observe→tool loop that standard proxies (like LiteLLM) don't support. Asymmetric resilience design: external users must retain access to their data layer when WAN drops, requiring careful separation of auth/routing (VPS) from inference (LAN DGX Spark). Multi-user isolation via dynamic Docker Compose provisioning without shared containers. Distributed debugging across two hosts connected via Tailscale. JWT auth, rate limiting, and observability (Prometheus/Grafana) across the full stack.
Custom FastAPI gateway middleware for LLM tool interception and MCP container routing. Docker dynamic compose provisioning per-user with isolation. Asymmetric resilience patterns for WAN drop scenarios. MCP protocol deep understanding beyond surface tool registration. Prometheus metrics collection and Grafana dashboards for real-time system observability. Tailscale network topology for secure two-host communication. Sentence-transformers embeddings in Qdrant for vector-based context resolution.

Full-stack application with browser extensions for capturing and searching YouTube video transcripts across channels.

A Postgres-native architecture system for AI-assisted engineering using vector search, graph traversal, and full-text search fusion.

Home Assistant Add-on for automated Discourse backup sync