Overview
The Code Genius is an AI compute orchestration platform. Instead of calling OpenAI, Anthropic, or Groq directly, your application sends requests through The Code Genius routing layer — which automatically selects the cheapest model that meets the quality requirements for that specific task.
The platform exposes a unified /api/public/chat endpoint compatible with the OpenAI chat completions format, so migration from direct API calls requires minimal code changes.
Average cost reduction: 42%
Routing latency overhead: < 50 ms
Supported AI providers: 7
Routing Engine
The routing engine is the core of The Code Genius. Every request passes through a multi-stage pipeline before any AI model is invoked.
How it works
Task Classification
The request is analyzed for task type (code, research, data, general, wab) and estimated complexity (low / medium / high).
Policy Check
PII detection runs. If sensitive content is found, it is redacted before leaving your org boundary. Regional routing rules are applied.
KB Cache Lookup
Your Knowledge Base is searched for semantically similar past answers. A cache hit means zero API cost and a sub-5 ms response.
Model Selection
Based on complexity + policy, the cheapest qualifying model is selected: Ollama → Groq → GPT-4o Mini → Claude Haiku → GPT-4o.
Response + Logging
The response is streamed back. Cost, latency, model used, and routing decision are logged to UsageLog for the FinOps dashboard.
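The Model Selection step can be pictured as a walk along the cost-ordered chain. The sketch below is only an illustration, not the platform's actual implementation: the chain order comes from the description above, while the `maxComplexity` ceiling assigned to each model and the `localOnly` and `unavailable` options are assumptions made for the example.

```javascript
// Hypothetical sketch of "cheapest qualifying model" selection.
// Chain order follows the docs; the complexity ceilings are assumed.
const COMPLEXITY_RANK = { low: 0, medium: 1, high: 2 };

const MODEL_CHAIN = [
  { id: "ollama/llama3.2", local: true, maxComplexity: "low" },
  { id: "groq/llama-3.1-8b-instant", local: false, maxComplexity: "medium" },
  { id: "openai/gpt-4o-mini", local: false, maxComplexity: "medium" },
  { id: "anthropic/claude-3-haiku", local: false, maxComplexity: "medium" },
  { id: "openai/gpt-4o", local: false, maxComplexity: "high" },
];

// Returns the id of the first (cheapest) model that satisfies the request,
// or null when no model in the chain qualifies.
function selectModel(complexity, { localOnly = false, unavailable = [] } = {}) {
  const needed = COMPLEXITY_RANK[complexity];
  if (needed === undefined) {
    throw new Error(`Unknown complexity: ${complexity}`);
  }
  for (const model of MODEL_CHAIN) {
    if (localOnly && !model.local) continue; // policy forbids cloud models
    if (unavailable.includes(model.id)) continue; // provider disabled or down
    if (COMPLEXITY_RANK[model.maxComplexity] >= needed) return model.id;
  }
  return null;
}
```

Under these assumed ceilings, a low-complexity request stays on Ollama, a medium one goes to Groq, and only a high-complexity request reaches GPT-4o.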
Routing strategies
You can override the default routing strategy per request or at the organization level:
COST_FIRST: Always use the cheapest sufficient model (default).
QUALITY_FIRST: Always use the best available model regardless of cost.
LOCAL_FIRST: Prefer Ollama. Only fall back to cloud if local fails.
BALANCED: Weighted scoring, 60% cost + 40% quality.
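To make the BALANCED weighting concrete, here is a minimal sketch of a 60/40 score. The per-1K-token prices are taken from the Supported Models table; the `quality` values and the way cost is normalized against the most expensive candidate are assumptions for illustration, since the source does not say how either is computed.

```javascript
// Hypothetical candidates: prices from the docs, quality scores invented.
const CANDIDATES = [
  { id: "llama3.2", costPer1k: 0.0, quality: 0.5 },
  { id: "gpt-4o-mini", costPer1k: 0.0006, quality: 0.8 },
  { id: "gpt-4o", costPer1k: 0.005, quality: 1.0 },
];

// 60% weight on cheapness (1 = free, 0 = most expensive), 40% on quality.
function balancedScore(model, maxCost) {
  const costScore = maxCost === 0 ? 1 : 1 - model.costPer1k / maxCost;
  return 0.6 * costScore + 0.4 * model.quality;
}

// Returns the candidate with the highest balanced score.
function pickBalanced(candidates) {
  const maxCost = Math.max(...candidates.map((m) => m.costPer1k));
  return candidates.reduce((best, m) =>
    balancedScore(m, maxCost) > balancedScore(best, maxCost) ? m : best
  );
}
```

With these made-up quality values, llama3.2 scores 0.80, gpt-4o-mini about 0.85, and gpt-4o 0.40, so the sketch picks gpt-4o-mini.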
// Override routing strategy per request
const res = await fetch('/api/public/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Routing-Strategy': 'QUALITY_FIRST', // optional override
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Review this code for bugs...' }],
    agent: 'code',
  }),
})
AI Agents
The Code Genius ships 5 specialized agents. Each agent has a tailored system prompt optimized for its domain and is automatically routed to the most appropriate model tier.
Code Agent (agent="code")
Optimized for code review, security analysis, refactoring, and debugging. Automatically escalates to GPT-4o for complex security or architectural reviews.
Use cases: PRs, code review, bug fixing, refactoring, SQL injection detection
General AI (agent="general")
Context-aware assistant for documentation writing, planning, Q&A, and general knowledge. Uses Ollama/Groq for most requests.
Use cases: Documentation, planning, Q&A, brainstorming, writing
Research Agent (agent="research")
Deep analysis, technical comparisons, synthesis of complex topics. Uses higher-tier models more frequently due to reasoning requirements.
Use cases: Architecture decisions, technology comparisons, deep technical analysis
Data Agent (agent="data")
SQL generation, chart recommendations, dataset analysis, and schema advice. Highly optimized for structured data tasks.
Use cases: SQL queries, data analysis, schema design, ETL logic
WAB Agent (agent="wab")
Configures, validates, and monitors Web Agent Bridge DNS records and manifest files. Unique to The Code Genius.
Use cases: WAB setup, DNS record validation, manifest generation, agent discovery
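A small helper can guard against typos in the agent name before a request is sent. This is a sketch only: `buildChatRequest` is a hypothetical name, and the request shape (the `messages` array, the `agent` field, and the optional `X-Routing-Strategy` header) simply mirrors the examples in this document.

```javascript
// The five agent identifiers listed above.
const AGENTS = ["code", "general", "research", "data", "wab"];

// Builds the options object for fetch('/api/public/chat', options).
// Throws early when the agent name is not one of the documented values.
function buildChatRequest(agent, content, strategy) {
  if (!AGENTS.includes(agent)) {
    throw new Error(`Unknown agent "${agent}". Expected one of: ${AGENTS.join(", ")}`);
  }
  const headers = { "Content-Type": "application/json" };
  if (strategy) headers["X-Routing-Strategy"] = strategy; // optional override
  return {
    method: "POST",
    headers,
    body: JSON.stringify({ messages: [{ role: "user", content }], agent }),
  };
}
```

A call such as `fetch('/api/public/chat', buildChatRequest('data', 'Suggest an index for this query'))` would then send the request to the Data Agent.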
Supported Models
All models are accessible through the single /api/public/chat endpoint. You do not need to manage API keys per provider — only configure what you want enabled in the Admin panel.
| Model | Provider | Cost / 1K tokens | Tier | Default use |
|---|---|---|---|---|
| llama3.2 (3B) | Ollama (local) | $0.000 | Local | All simple tasks first |
| llama3.2:1b | Ollama (local) | $0.000 | Local | Ultra-fast local fallback |
| llama-3.1-8b-instant | Groq | $0.000 | Free Cloud | Medium tasks, fast inference |
| mistral-7b-instruct | Mistral | $0.00025 | Open Source | Multilingual, code assist |
| gpt-4o-mini | OpenAI | $0.0006 | Cheap Frontier | Best quality/cost ratio |
| claude-3-haiku | Anthropic | $0.0008 | Cheap Frontier | Fast, accurate reasoning |
| gpt-4o | OpenAI | $0.005 | Premium | Complex reasoning only |
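The table's prices translate into per-request costs with simple arithmetic: tokens divided by 1,000, times the listed rate. The sketch below assumes a single flat rate per model, because the table does not distinguish input from output tokens; `estimateCostUsd` is a hypothetical helper, not part of the platform API.

```javascript
// Cost per 1K tokens in USD, copied from the table above.
const COST_PER_1K = new Map([
  ["llama3.2", 0],
  ["llama3.2:1b", 0],
  ["llama-3.1-8b-instant", 0],
  ["mistral-7b-instruct", 0.00025],
  ["gpt-4o-mini", 0.0006],
  ["claude-3-haiku", 0.0008],
  ["gpt-4o", 0.005],
]);

// Estimated cost of processing `tokens` tokens on `model`.
function estimateCostUsd(model, tokens) {
  if (!COST_PER_1K.has(model)) throw new Error(`Unknown model: ${model}`);
  return (tokens / 1000) * COST_PER_1K.get(model);
}
```

For example, 10,000 tokens cost about $0.05 on gpt-4o and about $0.006 on gpt-4o-mini, which is the gap the router exploits when the cheaper model is sufficient.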
Knowledge Base
Every interaction is semantically indexed in your organization's private Knowledge Base (PII is stripped before indexing). Future requests that are semantically similar (cosine similarity > 0.92) are served directly from cache — with zero API cost.
Teams on the Free plan benefit from a shared public KB. Paid plans get a private org-scoped KB that only learns from your own interactions.
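The cache rule (serve a stored answer when cosine similarity exceeds 0.92) can be sketched as follows. Only the 0.92 threshold comes from the text; the `embedding` field on each entry and the `findCacheHit` function are assumptions, and how the platform actually produces embeddings is not described in the source.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const CACHE_THRESHOLD = 0.92; // threshold stated in the docs

// Returns the best entry above the threshold, or null on a cache miss.
function findCacheHit(queryEmbedding, entries) {
  let best = null;
  for (const entry of entries) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score > CACHE_THRESHOLD && (best === null || score > best.score)) {
      best = { entry, score };
    }
  }
  return best;
}
```

A linear scan like this is fine for illustration; a production Knowledge Base would more plausibly use a vector index.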
How KB indexing works
// KB is automatic — every response is indexed.
// You can also manually add entries via API:
POST /api/kb/entries
Authorization: Bearer <your-api-key>
{
  "question": "How do I handle JWT expiry in Next.js?",
  "answer": "Use middleware to check token expiry...",
  "tags": ["auth", "nextjs", "jwt"],
  "visibility": "org" // "public" | "org" | "private"
}
AI FinOps
Every routed request is logged with: model used, input/output token counts, cost in USD, latency in ms, task type, agent, user ID, project ID, and routing decision reason.
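Log records with these fields can be rolled up into a per-model cost summary. The sketch below is illustrative: the field names `model` and `costUsd` are assumptions about how a log record might be shaped, not the platform's actual UsageLog schema.

```javascript
// Aggregates log records into request counts and cost per model.
// Each record is assumed to look like { model: string, costUsd: number }.
function summarizeByModel(logs) {
  const byModel = {};
  let total = 0;
  for (const log of logs) {
    if (!byModel[log.model]) byModel[log.model] = { requests: 0, cost: 0 };
    byModel[log.model].requests += 1;
    byModel[log.model].cost += log.costUsd;
    total += log.costUsd;
  }
  return { total_cost_usd: total, by_model: byModel };
}
```

The same loop could group by agent, user ID, or project ID by swapping the key.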
Cost breakdown API
# Get cost breakdown for your org (last 30 days)
GET /api/finops/summary?period=30d
Authorization: Bearer <your-api-key>
# Response
{
  "total_cost_usd": 47.32,
  "saved_vs_gpt4_usd": 312.18,
  "saving_pct": 86.8,
  "by_model": {
    "ollama/llama3.2": { "requests": 4821, "cost": 0.00 },
    "groq/llama-3.1-8b": { "requests": 2103, "cost": 0.00 },
    "openai/gpt-4o-mini": { "requests": 891, "cost": 12.44 },
    "anthropic/claude-haiku": { "requests": 203, "cost": 8.11 },
    "openai/gpt-4o": { "requests": 41, "cost": 26.77 }
  }
}
Policy & Compliance
The Policy Engine lets you define routing rules that are enforced on every request — before any content leaves your organization boundary. This is especially critical for regulated industries (healthcare, finance, legal) and companies with data residency requirements (PDPL in Saudi Arabia, GDPR in Europe).
Example policies
{
  "policies": [
    {
      "name": "no-pii-to-cloud",
      "trigger": "contains_pii",
      "action": "REDACT_AND_LOCAL_ONLY",
      "description": "Never send PII to external APIs"
    },
    {
      "name": "saudi-residency",
      "trigger": "org_region == 'SA'",
      "action": "ROUTE_TO_REGIONAL",
      "region": "me-central-1",
      "description": "Saudi data stays in Saudi region"
    },
    {
      "name": "budget-cap",
      "trigger": "monthly_cost_usd > 500",
      "action": "DOWNGRADE_TO_FREE_TIER",
      "description": "Hard budget cap: fallback to Ollama/Groq only"
    }
  ]
}
Policy rules are evaluated in order. The first matching rule wins. Place more specific rules before general ones.
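First-match evaluation can be sketched in a few lines. The source does not specify how trigger strings such as `org_region == 'SA'` are parsed, so this example sidesteps that by modeling each trigger as a plain predicate function; the `ctx` fields (`containsPii`, `orgRegion`, `monthlyCostUsd`) are hypothetical names chosen for illustration.

```javascript
// The three example policies, with triggers expressed as predicates.
const POLICIES = [
  { name: "no-pii-to-cloud", matches: (ctx) => ctx.containsPii, action: "REDACT_AND_LOCAL_ONLY" },
  { name: "saudi-residency", matches: (ctx) => ctx.orgRegion === "SA", action: "ROUTE_TO_REGIONAL" },
  { name: "budget-cap", matches: (ctx) => ctx.monthlyCostUsd > 500, action: "DOWNGRADE_TO_FREE_TIER" },
];

// Returns the first rule whose trigger matches, or null if none do.
// Later rules are never consulted once a match is found.
function evaluatePolicies(rules, ctx) {
  const hit = rules.find((rule) => rule.matches(ctx));
  return hit ? { name: hit.name, action: hit.action } : null;
}
```

Because only the first match applies, a Saudi-region request that also contains PII resolves to `REDACT_AND_LOCAL_ONLY` here, which is why rule order matters.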
WAB Protocol
Every project created in The Code Genius automatically receives a Web Agent Bridge (WAB) configuration. WAB is an open protocol that makes your site discoverable and executable by AI agents via DNS TXT records.
Automatic WAB setup
When you create a project and link your domain, The Code Genius generates:
DNS TXT record
Add to your DNS: _wab.yourdomain.com → wab-endpoint=https://yourdomain.com/api/wab
wab.json manifest
Auto-generated manifest declaring your AI-callable intents (chat, search, execute)
WAB endpoint
/api/wab — proxied through The Code Genius routing layer, respecting all your policies
; Add these two records to your domain's DNS:
_wab.yourdomain.com.  TXT  "wab-endpoint=https://yourdomain.com/api/wab v=1.0 caps=chat,search"
yourdomain.com.       TXT  "wab-project=proj_xxxxxxxxxxxxxxxx"
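An agent that discovers the `_wab` TXT record needs to split its value into fields. The parser below is a sketch inferred from the example record alone: it assumes space-separated `key=value` pairs and a comma-separated `caps` list, which may not match the full WAB specification. `parseWabTxt` is a hypothetical helper name.

```javascript
// Parses a WAB TXT record value into an object of fields.
// Tokens without a key (no "=" or a leading "=") are ignored.
function parseWabTxt(value) {
  const fields = {};
  for (const token of value.trim().split(/\s+/)) {
    const eq = token.indexOf("="); // split on the first "=" only
    if (eq <= 0) continue;
    fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  if (fields.caps) fields.caps = fields.caps.split(",");
  return fields;
}
```

Applied to the example record, this yields the endpoint URL under `wab-endpoint`, the string `"1.0"` under `v`, and the array `["chat", "search"]` under `caps`.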
WAB manifest structure
{
  "version": "1.0",
  "provider": "thecodegenius",
  "project": "proj_xxxxxxxxxxxxxxxx",
  "intents": {
    "chat": {
      "method": "POST",
      "endpoint": "/api/wab/chat",
      "auth": "bearer",
      "rate_limit": "60/min"
    },
    "search": {
      "method": "GET",
      "endpoint": "/api/wab/search",
      "auth": "none"
    }
  },
  "trust_level": "verified",
  "region": "global"
}
REST API Reference
All endpoints are under https://thecodegenius.com/api. Authenticate with Authorization: Bearer YOUR_API_KEY.
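As a minimal sketch of how a client might assemble authenticated calls against this base URL, the helper below builds the full URL and the `Authorization` header. `buildRequest` is a hypothetical name; the base URL and the Bearer scheme come from the line above, and the `period` parameter from the FinOps example.

```javascript
const BASE_URL = "https://thecodegenius.com/api";

// Builds the URL and headers for an authenticated request.
// `path` starts with "/", e.g. "/finops/summary"; `query` is optional.
function buildRequest(path, apiKey, query = {}) {
  const url = new URL(BASE_URL + path);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, String(value));
  }
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

For instance, `buildRequest('/finops/summary', key, { period: '30d' })` produces a URL ending in `/api/finops/summary?period=30d`, which can be passed to `fetch` together with the returned headers.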
/api/public/chat
Send a chat message. Supports streaming via SSE. No auth required for free tier.
Body: { "messages": [...], "agent": "code" | "general" | "research" | "data" | "wab" }
/api/finops/summary
Get cost breakdown and savings summary for your organization.
Query: ?period=7d|30d|90d
/api/kb/entries
List Knowledge Base entries for your org.
Query: ?q=search+term&limit=20
/api/kb/entries
Manually add an entry to the Knowledge Base.
Body: { "question": "...", "answer": "...", "tags": [], "visibility": "org" }
/api/models
List all active AI models available in your plan.
/api/usage
Get paginated usage logs with model, cost, latency per request.
Query: ?page=1&limit=50&agent=code
API Key management
# Generate a new API key from your dashboard, or via API:
POST /api/keys
Authorization: Bearer <existing-key>
{ "name": "production", "scopes": ["chat", "kb:read", "finops:read"] }
# Response
{ "key": "tcg_live_xxxxxxxxxxxxxxxxxxxxxxxx", "name": "production" }
# Use it:
curl https://thecodegenius.com/api/public/chat \
  -H "Authorization: Bearer tcg_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Review this function"}],"agent":"code"}'
VS Code Extension
The Code Genius VS Code extension brings the routing engine directly into your editor. It replaces GitHub Copilot suggestions with cost-optimized completions routed through your org's policy and budget rules.
Inline completions
Tab-complete suggestions routed to the cheapest model that can handle the current context.
Code review panel
Right-click → "Review with AI" — sends the selection to the Code Agent.
Chat sidebar
Full agent chat panel inside VS Code, with your KB and cost tracking.
Cost indicator
Status bar shows estimated cost of the last completion in real time.
Install from the VS Code Marketplace: search "The Code Genius" or download the .vsix from your dashboard under Settings → Extensions.
Self-hosting
The Code Genius can be self-hosted for complete data sovereignty. The platform runs on Node.js + Next.js with SQLite (or PostgreSQL for production scale).
Quick start with Docker
# 1. Clone the repo
git clone https://github.com/thecodegenius/platform
cd platform
# 2. Set environment variables
cp .env.example .env.local
# Edit .env.local: set DATABASE_URL, OLLAMA_URL, etc.
# 3. Start with Docker Compose
docker compose up -d
# 4. Initialize database
docker compose exec app npx prisma migrate deploy
docker compose exec app node scripts/seed.js
# Platform runs at http://localhost:3004
Required environment variables
# .env.local
DATABASE_URL="file:./prisma/dev.db" # or postgresql://...
NEXTAUTH_SECRET="your-secret-here"
NEXTAUTH_URL="https://yourdomain.com"
# AI Providers (add only what you want enabled)
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
GROQ_API_KEY="gsk_..."
MISTRAL_API_KEY="..."
OLLAMA_URL="http://localhost:11434" # local Ollama instance
OLLAMA_MODEL="llama3.2"
# Stripe (optional, for billing)
STRIPE_SECRET_KEY="sk_live_..."
STRIPE_WEBHOOK_SECRET="whsec_..."
# WAB integration (optional)
WAB_INTERNAL_URL="http://localhost:3003"
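Before starting a self-hosted instance, it can help to check which variables are set and which providers that enables. The sketch below is an assumption-laden illustration: it treats `DATABASE_URL`, `NEXTAUTH_SECRET`, and `NEXTAUTH_URL` as mandatory (the listing marks only Stripe and WAB as optional and providers as opt-in), and `checkEnv` is a hypothetical helper rather than a script shipped with the platform.

```javascript
// Variables assumed mandatory, based on the listing above.
const REQUIRED = ["DATABASE_URL", "NEXTAUTH_SECRET", "NEXTAUTH_URL"];

// Provider name -> the variable whose presence enables it.
const PROVIDER_ENV = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  groq: "GROQ_API_KEY",
  mistral: "MISTRAL_API_KEY",
  ollama: "OLLAMA_URL",
};

// Reports missing required variables and which providers are configured.
function checkEnv(env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  const enabledProviders = Object.entries(PROVIDER_ENV)
    .filter(([, variable]) => Boolean(env[variable]))
    .map(([provider]) => provider);
  return { missing, enabledProviders };
}
```

In a Node.js process this would typically be called as `checkEnv(process.env)`, failing fast when `missing` is not empty.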