Self-hosted AI coding agent: what it means in 2026

Search “self hosted AI agent” or “self-hosted coding agent” and you will see two meanings: (1) the orchestration runs on your servers, or (2) everything including the model is on-prem. Liminal supports both: separately. Canonical guide: Self-hosted AI coding agent.

Use case: Self-hosted AI coding agent. Compare: vs GitHub Copilot.

Layer 1: The harness (always local)

When you run npm run web or npm run tui, AgentHarness on your machine:

Schedules the ReAct loop and tool dispatcher
Writes session JSONL to .agent_sessions/
Executes run_shell, git_*, edit_file, browser_* under your OS user
Stores memory in .agent_notes.json and optional Obsidian files
Enforces approvals and resource locks

No Vireon cloud is required for this layer in Community Edition.

Layer 2: Inference (your choice)

Mode	Config	Tradeoff
Cloud API	`AGENT_API_BASE_URL` → OpenRouter/OpenAI/Anthropic	Best model quality; data leaves machine as API payloads
On-prem OpenAI-compatible	Ollama, vLLM, local gateway	Air-gap friendly; you operate GPUs
Hybrid	Fast model local, main model cloud	Cost/latency tuning

Embeddings for memory (AGENT_EMBED_MODEL) follow the same rule: point at a cloud embed model or a local endpoint.

Layer 3: Optional Vireon account

Pro/Team licenses are orthogonal to self-hosting the harness. Billing unlocks features on the account; the agent still runs from your clone.

Contrast with pure SaaS copilots

GitHub Copilot and hosted chat products keep more session state in vendor infrastructure. Liminal inverts control: you own traces and memory files; the vendor (if any) only sees what you send to the LLM API.

Practical self-hosted stack

AGENT_API_BASE_URL=http://127.0.0.1:11434/v1
AGENT_MODEL=your-ollama-model
AGENT_FAST_MODEL=your-ollama-model
AGENT_VAULT_PATH=/data/vault
AGENT_BROWSER=1

Add OpenRouter later for hard tasks without reinstalling.

Multi-agent self-hosting

spawn_agent forks child harnesses on the same machine: still your metal. run_workflow scales fan-out with AGENT_WORKFLOW_MAX_CONCURRENT caps. See multi-agent orchestration.

Ollama guide · Install

Layer 1: The harness (always local)#

Layer 2: Inference (your choice)#

Layer 3: Optional Vireon account#

Contrast with pure SaaS copilots#

Practical self-hosted stack#

Multi-agent self-hosting#

Related reading