Documentation — v1.0

The Code Genius Docs

Complete technical reference for the routing engine, AI agents, REST API, WAB protocol integration, and self-hosting.

Overview

The Code Genius is an AI compute orchestration platform. Instead of calling OpenAI, Anthropic, or Groq directly, your application sends requests through The Code Genius routing layer, which automatically selects the cheapest model that meets the quality requirements of each task.

The platform exposes a unified /api/public/chat endpoint compatible with the OpenAI chat completions format, so migration from direct API calls requires minimal code changes.

Average cost reduction: 42%

Routing latency overhead: < 50ms

Supported AI providers: 7

Routing Engine

The routing engine is the core of The Code Genius. Every request passes through a multi-stage pipeline before any AI model is invoked.

How it works

01. Task Classification

The request is analyzed for task type (code, research, data, general, wab) and estimated complexity (low / medium / high).

02. Policy Check

PII detection runs. If sensitive content is found, it is redacted before leaving your org boundary. Regional routing rules are applied.

03. KB Cache Lookup

Semantically similar past answers are searched in your Knowledge Base. A cache hit means zero API cost and sub-5ms response.

04. Model Selection

Based on complexity + policy, the cheapest qualifying model is selected: Ollama → Groq → GPT-4o Mini → Claude Haiku → GPT-4o.

05. Response + Logging

The response is streamed back. Cost, latency, model used, and routing decision are logged to UsageLog for the FinOps dashboard.
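
The model selection step above can be sketched as a filter followed by a sort on price. This is a minimal illustration, not the platform's actual implementation: the `maxComplexity` ceilings assigned to each tier, the `localOnly` flag standing in for the policy result, and the function name `selectModel` are all assumptions made for the example. Prices are taken from the Supported Models table.

```typescript
// Hypothetical model of step 04: keep the tiers that qualify, take the cheapest.
type Complexity = "low" | "medium" | "high";

interface ModelTier {
  name: string;
  costPer1kTokens: number;   // USD, from the Supported Models table
  maxComplexity: Complexity; // assumed ceiling, not documented by the platform
  local: boolean;
}

const RANK: Record<Complexity, number> = { low: 0, medium: 1, high: 2 };

const TIERS: ModelTier[] = [
  { name: "ollama/llama3.2", costPer1kTokens: 0, maxComplexity: "low", local: true },
  { name: "groq/llama-3.1-8b-instant", costPer1kTokens: 0, maxComplexity: "medium", local: false },
  { name: "openai/gpt-4o-mini", costPer1kTokens: 0.0006, maxComplexity: "medium", local: false },
  { name: "anthropic/claude-3-haiku", costPer1kTokens: 0.0008, maxComplexity: "medium", local: false },
  { name: "openai/gpt-4o", costPer1kTokens: 0.005, maxComplexity: "high", local: false },
];

// Returns the cheapest tier that can handle the task, or undefined if none qualifies.
function selectModel(complexity: Complexity, localOnly: boolean): ModelTier | undefined {
  return TIERS
    .filter((t) => RANK[t.maxComplexity] >= RANK[complexity]) // quality requirement
    .filter((t) => !localOnly || t.local)                      // policy requirement
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens)[0]; // stable sort keeps list order on ties
}
```

With these assumed ceilings, a low-complexity task stays on the local model, while a high-complexity task under a local-only policy has no qualifying model and would need a fallback decision.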

Routing strategies

You can override the default routing strategy per request or at the organization level:

COST_FIRST

Always use the cheapest sufficient model (default).

QUALITY_FIRST

Always use the best available model regardless of cost.

LOCAL_FIRST

Prefer Ollama. Only fall back to cloud if local fails.

BALANCED

Weighted scoring: 60% cost + 40% quality.

typescript
// Override routing strategy per request
const res = await fetch('/api/public/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Routing-Strategy': 'QUALITY_FIRST',  // optional override
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Review this code for bugs...' }],
    agent: 'code',
  }),
})
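
To make the BALANCED weighting (60% cost + 40% quality) concrete, the sketch below scores candidates on a 0 to 1 scale. The documentation does not say how cost is normalized or where quality scores come from, so both are assumptions here: cost is normalized against the most expensive candidate, and `quality` is a hypothetical per-model estimate.

```typescript
// Hypothetical scoring for the BALANCED strategy: 0.6 * costScore + 0.4 * quality.
interface Candidate {
  name: string;
  costPer1kTokens: number; // USD
  quality: number;         // assumed quality estimate in the range 0..1
}

function balancedScore(c: Candidate, maxCost: number): number {
  // Cheaper models score closer to 1; the most expensive candidate scores 0.
  const costScore = maxCost === 0 ? 1 : 1 - c.costPer1kTokens / maxCost;
  return 0.6 * costScore + 0.4 * c.quality;
}

function pickBalanced(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates");
  const maxCost = Math.max(...candidates.map((c) => c.costPer1kTokens));
  // Keep the earlier candidate on ties.
  return candidates.reduce((best, c) =>
    balancedScore(c, maxCost) > balancedScore(best, maxCost) ? c : best
  );
}
```

Under this normalization a mid-priced model with good quality can outscore both a free model with mediocre quality and a premium model whose cost score is zero.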

AI Agents

The Code Genius ships 5 specialized agents. Each agent has a tailored system prompt optimized for its domain and is automatically routed to the most appropriate model tier.

Code Agent (agent="code")

Optimized for code review, security analysis, refactoring, and debugging. Automatically escalates to GPT-4o for complex security or architectural reviews.

Use cases: PRs, code review, bug fixing, refactoring, SQL injection detection

General AI (agent="general")

Context-aware assistant for documentation writing, planning, Q&A, and general knowledge. Uses Ollama/Groq for most requests.

Use cases: Documentation, planning, Q&A, brainstorming, writing

Research Agent (agent="research")

Deep analysis, technical comparisons, synthesis of complex topics. Uses higher-tier models more frequently due to reasoning requirements.

Use cases: Architecture decisions, technology comparisons, deep technical analysis

Data Agent (agent="data")

SQL generation, chart recommendations, dataset analysis, and schema advice. Highly optimized for structured data tasks.

Use cases: SQL queries, data analysis, schema design, ETL logic

WAB Agent (agent="wab")

Configures, validates, and monitors Web Agent Bridge DNS records and manifest files. Unique to The Code Genius.

Use cases: WAB setup, DNS record validation, manifest generation, agent discovery

Supported Models

All models are accessible through the single /api/public/chat endpoint. You do not need to manage API keys per provider — only configure what you want enabled in the Admin panel.

Model                | Provider       | Cost / 1K tokens | Tier           | Default use
llama3.2 (3B)        | Ollama (local) | $0.000           | Local          | All simple tasks first
llama3.2:1b          | Ollama (local) | $0.000           | Local          | Ultra-fast local fallback
llama-3.1-8b-instant | Groq           | $0.000           | Free Cloud     | Medium tasks, fast inference
mistral-7b-instruct  | Mistral        | $0.00025         | Open Source    | Multilingual, code assist
gpt-4o-mini          | OpenAI         | $0.0006          | Cheap Frontier | Best quality/cost ratio
claude-3-haiku       | Anthropic      | $0.0008          | Cheap Frontier | Fast, accurate reasoning
gpt-4o               | OpenAI         | $0.005           | Premium        | Complex reasoning only

Knowledge Base

Every interaction is semantically indexed in your organization's private Knowledge Base (PII is stripped before indexing). Future requests that are semantically similar (cosine similarity > 0.92) are served directly from cache — with zero API cost.

Teams on the Free plan benefit from a shared public KB. Paid plans get a private org-scoped KB that only learns from your own interactions.
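
The cache rule described above (serve from the KB when cosine similarity exceeds 0.92) can be illustrated with a small lookup routine. This is a sketch under stated assumptions: embeddings are represented as plain number arrays, the `KbEntry` shape and the function names are hypothetical, and a linear scan stands in for whatever index the platform actually uses.

```typescript
// Hypothetical KB entry: an embedding vector plus the cached answer.
interface KbEntry {
  embedding: number[];
  answer: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the most similar entry whose score is strictly above the threshold.
function lookupCache(query: number[], entries: KbEntry[], threshold = 0.92): KbEntry | undefined {
  let best: KbEntry | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

A return value of `undefined` corresponds to a cache miss, in which case the request would continue to model selection.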

How KB indexing works

http
# The KB is populated automatically: every response is indexed.
# You can also add entries manually via the API.
# "visibility" accepts "public", "org", or "private".
POST /api/kb/entries
Authorization: Bearer <your-api-key>
Content-Type: application/json

{
  "question": "How do I handle JWT expiry in Next.js?",
  "answer": "Use middleware to check token expiry...",
  "tags": ["auth", "nextjs", "jwt"],
  "visibility": "org"
}
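
From application code, the same request could be prepared as follows. This is a sketch rather than an official client: the helper name `buildKbEntryRequest` and the `PreparedRequest` shape are hypothetical, and the base URL is the one given in the REST API Reference. The function only builds the request so that it can be inspected or tested without network access.

```typescript
type Visibility = "public" | "org" | "private";

interface KbEntryInput {
  question: string;
  answer: string;
  tags: string[];
  visibility: Visibility;
}

// Hypothetical container for everything fetch() needs.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildKbEntryRequest(
  apiKey: string,
  entry: KbEntryInput,
  baseUrl = "https://thecodegenius.com/api"
): PreparedRequest {
  if (entry.question.trim() === "" || entry.answer.trim() === "") {
    throw new Error("question and answer must be non-empty");
  }
  return {
    url: `${baseUrl}/kb/entries`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(entry),
  };
}
```

The result can then be sent with `fetch(req.url, req)`, since `method`, `headers`, and `body` match the fields fetch expects.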

AI FinOps

Every routed request is logged with: model used, input/output token counts, cost in USD, latency in ms, task type, agent, user ID, project ID, and routing decision reason.

Cost breakdown API

bash
# Get cost breakdown for your org (last 30 days)
GET /api/finops/summary?period=30d
Authorization: Bearer <your-api-key>

# Response
{
  "total_cost_usd": 47.32,
  "saved_vs_gpt4_usd": 312.18,
  "saving_pct": 86.8,
  "by_model": {
    "ollama/llama3.2":       { "requests": 4821, "cost": 0.00 },
    "groq/llama-3.1-8b":     { "requests": 2103, "cost": 0.00 },
    "openai/gpt-4o-mini":    { "requests": 891,  "cost": 12.44 },
    "anthropic/claude-haiku":{ "requests": 203,  "cost": 8.11 },
    "openai/gpt-4o":         { "requests": 41,   "cost": 26.77 }
  }
}
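
The sample figures are internally consistent under one plausible reading: the per-model costs sum to `total_cost_usd`, and `saving_pct` equals the savings divided by what an all-GPT-4 baseline would have cost (actual cost plus savings). The sketch below checks that arithmetic. The formula is inferred from the sample numbers, not taken from a documented definition, and the function names are hypothetical.

```typescript
interface ModelUsage {
  requests: number;
  cost: number; // USD
}

interface FinOpsSummary {
  total_cost_usd: number;
  saved_vs_gpt4_usd: number;
  by_model: Record<string, ModelUsage>;
}

// Sum of per-model costs, rounded to cents.
function totalFromModels(summary: FinOpsSummary): number {
  const sum = Object.values(summary.by_model).reduce((acc, m) => acc + m.cost, 0);
  return Math.round(sum * 100) / 100;
}

// Assumed formula: saved / (actual + saved), as a percentage with one decimal.
function savingPct(summary: FinOpsSummary): number {
  const baseline = summary.total_cost_usd + summary.saved_vs_gpt4_usd;
  if (baseline === 0) return 0;
  return Math.round((summary.saved_vs_gpt4_usd / baseline) * 1000) / 10;
}
```

For the response shown above, the baseline is 47.32 + 312.18 = 359.50 USD, and 312.18 / 359.50 rounds to 86.8%.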

Policy & Compliance

The Policy Engine lets you define routing rules that are enforced on every request — before any content leaves your organization boundary. This is especially critical for regulated industries (healthcare, finance, legal) and companies with data residency requirements (PDPL in Saudi Arabia, GDPR in Europe).

Example policies

json
{
  "policies": [
    {
      "name": "no-pii-to-cloud",
      "trigger": "contains_pii",
      "action": "REDACT_AND_LOCAL_ONLY",
      "description": "Never send PII to external APIs"
    },
    {
      "name": "saudi-residency",
      "trigger": "org_region == 'SA'",
      "action": "ROUTE_TO_REGIONAL",
      "region": "me-central-1",
      "description": "Saudi data stays in Saudi region"
    },
    {
      "name": "budget-cap",
      "trigger": "monthly_cost_usd > 500",
      "action": "DOWNGRADE_TO_FREE_TIER",
      "description": "Hard budget cap — fallback to Ollama/Groq only"
    }
  ]
}

Policy rules are evaluated in order. The first matching rule wins. Place more specific rules before general ones.
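
First-match evaluation can be sketched as a single pass over an ordered list. The trigger expression syntax used in the JSON above is not specified here, so this example replaces each trigger string with an equivalent predicate function; the `RequestContext` fields and the `evaluatePolicies` name are assumptions made for illustration.

```typescript
// Hypothetical view of the facts a policy trigger can inspect.
interface RequestContext {
  containsPii: boolean;
  orgRegion: string;
  monthlyCostUsd: number;
}

interface Policy {
  name: string;
  action: string;
  matches: (ctx: RequestContext) => boolean; // stands in for the "trigger" string
}

// Same order as the example policies: specific rules first.
const POLICIES: Policy[] = [
  { name: "no-pii-to-cloud", action: "REDACT_AND_LOCAL_ONLY", matches: (c) => c.containsPii },
  { name: "saudi-residency", action: "ROUTE_TO_REGIONAL", matches: (c) => c.orgRegion === "SA" },
  { name: "budget-cap", action: "DOWNGRADE_TO_FREE_TIER", matches: (c) => c.monthlyCostUsd > 500 },
];

// Returns the first policy whose trigger matches, or undefined if none does.
function evaluatePolicies(ctx: RequestContext, policies: Policy[] = POLICIES): Policy | undefined {
  return policies.find((p) => p.matches(ctx));
}
```

Because evaluation stops at the first match, a request that contains PII from a Saudi org over budget resolves to `no-pii-to-cloud` only, which is why ordering matters.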

WAB Protocol

Every project created in The Code Genius automatically receives a Web Agent Bridge (WAB) configuration. WAB is an open protocol that makes your site discoverable and executable by AI agents via DNS TXT records.

Automatic WAB setup

When you create a project and link your domain, The Code Genius generates:

DNS TXT record

Add to your DNS: _wab.yourdomain.com → wab-endpoint=https://yourdomain.com/api/wab

wab.json manifest

Auto-generated manifest declaring your AI-callable intents (chat, search, execute)

WAB endpoint

/api/wab — proxied through The Code Genius routing layer, respecting all your policies

dns
; Add these two records to your domain's DNS:
_wab.yourdomain.com.  TXT  "wab-endpoint=https://yourdomain.com/api/wab v=1.0 caps=chat,search"
yourdomain.com.       TXT  "wab-project=proj_xxxxxxxxxxxxxxxx"
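
An agent that has fetched the `_wab` TXT value needs to split it into fields. The sketch below assumes the value is a whitespace-separated list of `key=value` tokens, as in the record above; the WAB specification may define stricter rules, and the `WabRecord` shape and `parseWabTxt` name are hypothetical.

```typescript
interface WabRecord {
  endpoint: string;
  version: string | undefined;
  caps: string[];
}

// Parses a TXT value such as:
//   wab-endpoint=https://yourdomain.com/api/wab v=1.0 caps=chat,search
function parseWabTxt(txt: string): WabRecord {
  const fields = new Map<string, string>();
  for (const token of txt.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq <= 0) continue; // ignore tokens without a key
    // Split on the first "=" only, so URLs containing "=" stay intact.
    fields.set(token.slice(0, eq), token.slice(eq + 1));
  }
  const endpoint = fields.get("wab-endpoint");
  if (endpoint === undefined || endpoint === "") {
    throw new Error("missing wab-endpoint");
  }
  const caps = fields.get("caps");
  return {
    endpoint,
    version: fields.get("v"),
    caps: caps ? caps.split(",") : [],
  };
}
```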

WAB manifest structure

json
{
  "version": "1.0",
  "provider": "thecodegenius",
  "project": "proj_xxxxxxxxxxxxxxxx",
  "intents": {
    "chat": {
      "method": "POST",
      "endpoint": "/api/wab/chat",
      "auth": "bearer",
      "rate_limit": "60/min"
    },
    "search": {
      "method": "GET",
      "endpoint": "/api/wab/search",
      "auth": "none"
    }
  },
  "trust_level": "verified",
  "region": "global"
}
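
A consumer of the manifest will typically want typed access to the intents and a numeric form of `rate_limit`. The types below mirror the fields shown in the example manifest. The `parseRateLimit` helper is hypothetical, and of the units it accepts only `min` appears in this documentation; `sec` and `hour` are assumptions.

```typescript
interface WabIntent {
  method: "GET" | "POST";
  endpoint: string;
  auth: "bearer" | "none";
  rate_limit?: string; // e.g. "60/min"
}

interface WabManifest {
  version: string;
  provider: string;
  project: string;
  intents: Record<string, WabIntent>;
  trust_level: string;
  region: string;
}

interface RateLimit {
  requests: number;
  windowSeconds: number;
}

// Converts a "<count>/<unit>" string into a request count and window length.
function parseRateLimit(value: string): RateLimit {
  const match = /^(\d+)\/(sec|min|hour)$/.exec(value);
  if (match === null) throw new Error(`unsupported rate limit: ${value}`);
  const unitSeconds: Record<string, number> = { sec: 1, min: 60, hour: 3600 };
  return { requests: Number(match[1]), windowSeconds: unitSeconds[match[2]] };
}

// Lists the intents that can be called without credentials.
function publicIntents(manifest: WabManifest): string[] {
  return Object.keys(manifest.intents).filter((name) => manifest.intents[name].auth === "none");
}
```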

REST API Reference

All endpoints are under https://thecodegenius.com/api. Authenticate with Authorization: Bearer YOUR_API_KEY.

POST /api/public/chat

Send a chat message. Supports streaming via SSE. No auth required for the free tier.

Body: { "messages": [...], "agent": "code" | "general" | "research" | "data" | "wab" }

GET /api/finops/summary

Get cost breakdown and savings summary for your organization.

Query: ?period=7d|30d|90d

GET /api/kb/entries

List Knowledge Base entries for your org.

Query: ?q=search+term&limit=20

POST /api/kb/entries

Manually add an entry to the Knowledge Base.

Body: { "question": "...", "answer": "...", "tags": [], "visibility": "org" }

GET /api/models

List all active AI models available in your plan.

GET /api/usage

Get paginated usage logs with model, cost, latency per request.

Query: ?page=1&limit=50&agent=code
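
Query strings for the paginated usage endpoint are easiest to build with `URLSearchParams`, which handles encoding. In this sketch the helper name `usageUrl` is hypothetical, and the defaults of page 1 and limit 50 simply reuse the example values above; the server's real defaults are not documented here.

```typescript
type AgentName = "code" | "general" | "research" | "data" | "wab";

interface UsageQuery {
  page?: number;
  limit?: number;
  agent?: AgentName;
}

// Builds the full URL for GET /api/usage with the given filters.
function usageUrl(query: UsageQuery = {}, baseUrl = "https://thecodegenius.com/api"): string {
  const params = new URLSearchParams();
  params.set("page", String(query.page ?? 1));
  params.set("limit", String(query.limit ?? 50));
  if (query.agent !== undefined) params.set("agent", query.agent);
  return `${baseUrl}/usage?${params.toString()}`;
}
```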

API Key management

bash
# Generate a new API key from your dashboard, or via API:
POST /api/keys
Authorization: Bearer <existing-key>

{ "name": "production", "scopes": ["chat", "kb:read", "finops:read"] }

# Response
{ "key": "tcg_live_xxxxxxxxxxxxxxxxxxxxxxxx", "name": "production" }

# Use it:
curl https://thecodegenius.com/api/public/chat \
  -H "Authorization: Bearer tcg_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Review this function"}],"agent":"code"}'

VS Code Extension

The Code Genius VS Code extension brings the routing engine directly into your editor. It replaces GitHub Copilot suggestions with cost-optimized completions routed through your org's policy and budget rules.

Inline completions

Tab-complete suggestions routed to the cheapest model that can handle the current context.

Code review panel

Right-click → "Review with AI" — sends the selection to the Code Agent.

Chat sidebar

Full agent chat panel inside VS Code, with your KB and cost tracking.

Cost indicator

Status bar shows estimated cost of the last completion in real time.

Install from the VS Code Marketplace: search "The Code Genius" or download the .vsix from your dashboard under Settings → Extensions.

Self-hosting

The Code Genius can be self-hosted for complete data sovereignty. The platform runs on Node.js + Next.js with SQLite (or PostgreSQL for production scale).

Quick start with Docker

bash
# 1. Clone the repo
git clone https://github.com/thecodegenius/platform
cd platform

# 2. Set environment variables
cp .env.example .env.local
# Edit .env.local — set DATABASE_URL, OLLAMA_URL, etc.

# 3. Start with Docker Compose
docker compose up -d

# 4. Initialize database
docker compose exec app npx prisma migrate deploy
docker compose exec app node scripts/seed.js

# Platform runs at http://localhost:3004

Required environment variables

bash
# .env.local
DATABASE_URL="file:./prisma/dev.db"       # or postgresql://...
NEXTAUTH_SECRET="your-secret-here"
NEXTAUTH_URL="https://yourdomain.com"

# AI Providers (add only what you want enabled)
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
GROQ_API_KEY="gsk_..."
MISTRAL_API_KEY="..."
OLLAMA_URL="http://localhost:11434"       # local Ollama instance
OLLAMA_MODEL="llama3.2"

# Stripe (optional, for billing)
STRIPE_SECRET_KEY="sk_live_..."
STRIPE_WEBHOOK_SECRET="whsec_..."

# WAB integration (optional)
WAB_INTERNAL_URL="http://localhost:3003"
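
A self-hosted deployment can fail fast on misconfiguration by checking the environment at startup. The sketch below treats `DATABASE_URL`, `NEXTAUTH_SECRET`, and `NEXTAUTH_URL` as required because they are the only variables above not marked optional; that reading, the provider labels, and the function names are assumptions, not part of the platform's code.

```typescript
type Env = Record<string, string | undefined>;

// Assumed required set: the variables above that carry no "optional" note.
const REQUIRED_VARS = ["DATABASE_URL", "NEXTAUTH_SECRET", "NEXTAUTH_URL"];

// Maps each provider variable to a hypothetical provider label.
const PROVIDER_VARS: Record<string, string> = {
  OPENAI_API_KEY: "openai",
  ANTHROPIC_API_KEY: "anthropic",
  GROQ_API_KEY: "groq",
  MISTRAL_API_KEY: "mistral",
  OLLAMA_URL: "ollama",
};

function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return value !== undefined && value.trim() !== "";
}

// Names of required variables that are absent or blank.
function missingRequired(env: Env): string[] {
  return REQUIRED_VARS.filter((key) => !isSet(env, key));
}

// Providers that would be enabled given the variables present.
function enabledProviders(env: Env): string[] {
  return Object.keys(PROVIDER_VARS)
    .filter((key) => isSet(env, key))
    .map((key) => PROVIDER_VARS[key]);
}
```

In a Node.js process both functions can be called with `process.env`, refusing to start when `missingRequired` returns a non-empty list.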
