← Back to system overview
Subsystem deep-dive

Local AI

Models that run on your own hardware, with no cloud and no token bill, and the only thing allowed to read raw private data. Claude orchestrates; the local models do the work.

0
Local models
0
MCP servers shared
0
Callable tools
0
Tokens billed
How it's wired
One tool layer, two front-ends

The same MCP tool servers are shared by Claude Code (the orchestrator) and by Open WebUI (your local chat surface), bridged by mcpo and all talking to Ollama on your machine.

🧰 MCP tool serversshared by both clients
filesystem · applescript · git crawl4ai · ollama-bridge · n8n
↓   the same tools, two ways in   ↓
🤖 Claude Codeorchestrator
talks to MCP directly (stdio / SSE)
💬 ConnorGPTOpen WebUI · Docker :3000
via mcpo bridge (MCP → OpenAPI)
🧠 Ollamahost :11434 · on-device inference
8 local models reachable privately over Tailscale
The roster
Right tool per job

gemma4:12b

The default local reasoner and the Command Center's draft engine. Schema-constrained generation for the everyday structured jobs.

7.6 GB · primary

glm-4.7-flash

The resident driver. Won the 2026-05-24 tool-use bake-off at 96% tool-call correctness and runs the multi-step agent loop behind the Mac-control toolkit.

19 GB · driver

qwen3-coder:30b

Fast structured-output worker and bake-off runner-up (86%, needs an XML-parse shim). Strong on code and single-shot tool calls.

18 GB · worker

llama3.1:8b

A general-purpose mid-size worker for everyday local jobs.

4.9 GB · general

gemma3:4b

Small and fast. Powers the local Open WebUI chat assistant and quick on-device tasks.

3.3 GB · fast

llava

Vision (image → text), fully on-device. Image bytes never reach the cloud.

4.7 GB · vision
The framing
Workers, not agents

The honest scope: local models started as reliable workers Claude delegates to. The autonomous multi-tool agent was deliberately pared back because multi-step tool loops weren't reliable, but a 2026-05-24 bake-off shows that's changed, which has reopened the question.

Pattern A: Claude delegates

Claude orchestrates and hands a local model a single-shot job through the ollama-bridge MCP. Bulk work runs locally; Claude keeps the reasoning.

Pattern B: code orchestrates private work

A deterministic script drives a private-file job; the local model is the only LLM that ever sees raw personal content, and only a sanitized derivative crosses to Brain.

The multi-step bar just got cleared, and what that reopens

Single-shot tool calls always worked. Verified end-to-end (local model → mcpo → filesystem MCP → a real directory listing), with 56 tools wired across both clients.

Reliable multi-step tool loops were the weak spot, which is why the autonomous-agent ambition got scoped down. But the 2026-05-24 tool-use bake-off changed the verdict: glm-4.7-flash hit 96% tool-call correctness driving a 5-tool multi-step agent loop end-to-end (the Mac-control toolkit: app, messages, calendar, notes, screen-read) behind dry-run/confirm gates.

The hard privacy constraint is unchanged, and it's now the interesting part: an orchestrator must see what it orchestrates, so Claude still can't loop over Claude-denied private files, but a capable local agent can. So the open question is no longer "can local models do this" but "should a local agent take over the private-file orchestration that deterministic scripts do today." That's a scope call now actively reopened.

Why run models locally at all
Three things only on-device buys you

Privacy

Raw personal data (the Journal, health, finances) can be processed by an LLM without a single byte leaving the Mac.

Free + offline

Local inference costs zero tokens and works with no network. Bulk jobs don't run up a bill.

Private remote access

ConnorGPT is reachable from your phone over Tailscale, never exposed on the public internet.