Search “self hosted AI coding agent” and you will see two meanings: (1) the orchestration runs on your servers, or (2) everything including the model is on-prem. Liminal supports both — separately.
Use case: Local-first AI coding. Compare: vs GitHub Copilot.
Layer 1: The harness (always local)
When you run npm run web or npm run tui, AgentHarness on your machine:
- Schedules the ReAct loop and tool dispatcher
- Writes session JSONL to
.agent_sessions/ - Executes
run_shell,git_*,edit_file,browser_*under your OS user - Stores memory in
.agent_notes.jsonand optional Obsidian files - Enforces approvals and resource locks
No Vireon cloud is required for this layer in Community Edition.
Layer 2: Inference (your choice)
| Mode | Config | Tradeoff |
|---|---|---|
| Cloud API | AGENT_API_BASE_URL → OpenRouter/OpenAI/Anthropic | Best model quality; data leaves machine as API payloads |
| On-prem OpenAI-compatible | Ollama, vLLM, local gateway | Air-gap friendly; you operate GPUs |
| Hybrid | Fast model local, main model cloud | Cost/latency tuning |
Comparison table
Embeddings for memory (AGENT_EMBED_MODEL) follow the same rule — point at a cloud embed model or a local endpoint.
Layer 3: Optional Vireon account
Pro/Team licenses are orthogonal to self-hosting the harness. Billing unlocks features on the account; the agent still runs from your clone.
Contrast with pure SaaS copilots
GitHub Copilot and hosted chat products keep more session state in vendor infrastructure. Liminal inverts control: you own traces and memory files; the vendor (if any) only sees what you send to the LLM API.
Practical self-hosted stack
AGENT_API_BASE_URL=http://127.0.0.1:11434/v1
AGENT_MODEL=your-ollama-model
AGENT_FAST_MODEL=your-ollama-model
AGENT_VAULT_PATH=/data/vault
AGENT_BROWSER=1
Add OpenRouter later for hard tasks without reinstalling.
Multi-agent self-hosting
spawn_agent forks child harnesses on the same machine — still your metal. run_workflow scales fan-out with AGENT_WORKFLOW_MAX_CONCURRENT caps. See multi-agent orchestration.