Search “self hosted AI coding agent” and you will see two meanings: (1) the orchestration runs on your servers, or (2) everything including the model is on-prem. Liminal supports both — separately.

Use case: Local-first AI coding. Compare: vs GitHub Copilot.

Layer 1: The harness (always local)

When you run npm run web or npm run tui, AgentHarness on your machine:

  • Schedules the ReAct loop and tool dispatcher
  • Writes session JSONL to .agent_sessions/
  • Executes run_shell, git_*, edit_file, browser_* under your OS user
  • Stores memory in .agent_notes.json and optional Obsidian files
  • Enforces approvals and resource locks

No Vireon cloud is required for this layer in Community Edition.

Layer 2: Inference (your choice)

ModeConfigTradeoff
Cloud APIAGENT_API_BASE_URL → OpenRouter/OpenAI/AnthropicBest model quality; data leaves machine as API payloads
On-prem OpenAI-compatibleOllama, vLLM, local gatewayAir-gap friendly; you operate GPUs
HybridFast model local, main model cloudCost/latency tuning

Comparison table

Embeddings for memory (AGENT_EMBED_MODEL) follow the same rule — point at a cloud embed model or a local endpoint.

Layer 3: Optional Vireon account

Pro/Team licenses are orthogonal to self-hosting the harness. Billing unlocks features on the account; the agent still runs from your clone.

Contrast with pure SaaS copilots

GitHub Copilot and hosted chat products keep more session state in vendor infrastructure. Liminal inverts control: you own traces and memory files; the vendor (if any) only sees what you send to the LLM API.

Practical self-hosted stack

AGENT_API_BASE_URL=http://127.0.0.1:11434/v1
AGENT_MODEL=your-ollama-model
AGENT_FAST_MODEL=your-ollama-model
AGENT_VAULT_PATH=/data/vault
AGENT_BROWSER=1

Add OpenRouter later for hard tasks without reinstalling.

Multi-agent self-hosting

spawn_agent forks child harnesses on the same machine — still your metal. run_workflow scales fan-out with AGENT_WORKFLOW_MAX_CONCURRENT caps. See multi-agent orchestration.

Ollama guide · Install