Ollama + Liminal

Name: Liminal AI
Brand: Vireon Dynamics

Point AGENT_API_BASE_URL at Ollama's OpenAI-compatible server for fully local inference while keeping Liminal's full tool harness on your machine.

When to use Ollama with Liminal

Choose Ollama when mail, code, or client data must not leave your network, or when you already run models on a GPU workstation. The agent loop, file tools, git, memory, and session JSONL still run locally; only the model weights execute on your hardware.

1. Start Ollama

Install from ollama.com and pull a code-capable model. Examples: qwen2.5-coder, deepseek-coder, or llama3.1. Tool calling quality varies by model; prefer models marketed for agents or coding.

2. Configure Liminal

Add to `.env` or Settings:

Use the same slug as ollama list shows.
Set AGENT_EMBED_MODEL empty if you want BM25-only memory without cloud embeddings.

AGENT_API_BASE_URL=http://127.0.0.1:11434/v1
AGENT_MODEL=qwen2.5-coder:latest
AGENT_FAST_MODEL=qwen2.5-coder:latest
AGENT_API_KEY=ollama

3. Desktop app path

Open Settings in the Liminal desktop app, paste the base URL and model slug, and send a smoke-test prompt. Integrations (Slack, Xero) still work; only LLM traffic stays on localhost.

Performance expectations

Local models are slower than cloud APIs on most laptops but improve with a discrete GPU. Long agent turns with many tool rounds need a model that follows tool schemas reliably; if loops stall, try a larger quant or switch the fast model to a cloud endpoint only.

Docs: Matching section

Install All guides

FAQ

Common questions

Is local inference slower?

Depends on GPU/CPU. You trade cloud latency for data staying on your network.

Can I mix Ollama and OpenRouter?

Yes. Some users run the main model on OpenRouter and point AGENT_FAST_MODEL at a small local model, or the reverse for privacy-sensitive repos only.

Does Ollama work on Apple Silicon?

Yes. Ollama uses Metal on Mac; pair with the macOS install guide for the desktop app.