Architecting a Local Agent Hub
The goal: run my workflow from the laptop—capture a 10–15 minute voice unload, transform it with GPT-5, and have agents file themes to knowledge, push tasks to boards, and draft client updates—without me doing the shuttling. I want a local-first hub that orchestrates this flow and enforces confidence thresholds (auto/approve/escalate) with a clear audit trail.
This is the architecture I’d build.
Design Goals (What “good” looks like)
- Local-first. Works offline; sync is optional. Single-file DB, deterministic jobs.
- Protocol-friendly. Speaks MCP for portable tools/connectors.
- Composable agents. Each agent does one thing well; orchestration does the glue.
- Observable. Human-readable audit log; replay any job; dry-run mode.
- Secure by default. Secrets isolated; least-privilege adapters.
- Confidence thresholds. Automatic when safe, review when ambiguous, escalate when risky.
- Zero yak-shaving. Start simple: TUI/CLI first, thin web UI later.
Mental Model
Think of the hub as 5 layers:
- Ingest — raw stuff enters (transcripts, notes, diffs).
- Normalize — convert to a canonical event (JSON) + attachments.
- Orchestrate — rules decide which agents run and how (with thresholds).
- Act — adapters perform side-effects (write markdown, open issues, draft emails).
- Observe — audit, notifications, replay.
High-Level Architecture
- Event Bus (local): append-only `events.sqlite` + `events/*.jsonl` for durability.
- State Store: SQLite (WAL mode) for jobs, rules, runs, artifacts.
- Rule Engine: YAML/JSON rules → compiled to predicates/functions.
- MCP Tooling: agent providers exposed as MCP tools (filesystem, GitHub, Notion, etc.).
- Queue/Workers: simple priority queues (`pending`, `active`, `deadletter`).
- Threshold Gate: auto / needs-review / escalate routing.
- UI: TUI (terminal) for day-1; minimal web UI for review/approve.
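As a sketch of how the Event Bus's dual write might work — JSONL for durable append-only history, SQLite for queries — assuming hypothetical paths and a minimal schema (the real hub would read these from config):

```python
import json
import sqlite3
import time
import uuid
from pathlib import Path

# Hypothetical locations; illustrative only.
DB_PATH = Path("hub.sqlite")
EVENTS_DIR = Path("events")

def append_event(event_type: str, source: str, payload: dict) -> str:
    """Append an event to the JSONL log, then mirror it into SQLite."""
    event = {
        "id": f"evt_{uuid.uuid4().hex[:12]}",
        "type": event_type,
        "source": source,
        "payload": payload,
        "received_at": time.time(),
    }
    # Durable append-only JSONL log, one file per day.
    EVENTS_DIR.mkdir(exist_ok=True)
    day = time.strftime("%Y-%m-%d")
    with open(EVENTS_DIR / f"{day}.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")
    # Mirror into SQLite (WAL mode) for querying and joins with jobs/artifacts.
    con = sqlite3.connect(DB_PATH)
    con.execute("PRAGMA journal_mode=WAL")
    con.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(id TEXT PRIMARY KEY, type TEXT, source TEXT, payload TEXT, received_at REAL)"
    )
    con.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (event["id"], event["type"], event["source"],
         json.dumps(event["payload"]), event["received_at"]),
    )
    con.commit()
    con.close()
    return event["id"]
```

The JSONL file is the source of truth; the SQLite row is a queryable index that can always be rebuilt from it.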
Core Entities (Schema Sketch)
- Event: `{ id, type, source, payload, attachments[], received_at }`
- Job: `{ id, event_id, rule_id, agent, status, confidence, created_at }`
- Artifact: `{ id, job_id, kind, path/hash, preview, created_at }`
- Decision: `{ id, job_id, action: "auto"|"approve"|"reject"|"escalate", by, reason }`
- Rule: `{ id, name, match, plan, thresholds, enabled }`
- Secret: stored via OS keychain; DB holds only references.
SQLite is enough. Keep everything local and commit selected artifacts to a git repo if you want history.
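The entity sketch above translates naturally into SQLite DDL. One possible version — table and column names are illustrative, not the hub's actual schema:

```python
import sqlite3

# Illustrative translation of the schema sketch into SQLite DDL.
SCHEMA = """
CREATE TABLE events (
  id TEXT PRIMARY KEY, type TEXT, source TEXT,
  payload TEXT, attachments TEXT, received_at TEXT
);
CREATE TABLE jobs (
  id TEXT PRIMARY KEY, event_id TEXT REFERENCES events(id),
  rule_id TEXT, agent TEXT, status TEXT, confidence REAL, created_at TEXT
);
CREATE TABLE artifacts (
  id TEXT PRIMARY KEY, job_id TEXT REFERENCES jobs(id),
  kind TEXT, path TEXT, hash TEXT, preview TEXT, created_at TEXT
);
CREATE TABLE decisions (
  id TEXT PRIMARY KEY, job_id TEXT REFERENCES jobs(id),
  action TEXT CHECK (action IN ('auto','approve','reject','escalate')),
  by_whom TEXT, reason TEXT
);
CREATE TABLE rules (
  id TEXT PRIMARY KEY, name TEXT, match TEXT, plan TEXT,
  thresholds TEXT, enabled INTEGER DEFAULT 1
);
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode=WAL")  # concurrent readers, single writer
    con.executescript(SCHEMA)
    return con
```

The `CHECK` constraint on `decisions.action` means the DB itself rejects any decision outside the four allowed actions.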
Rules: Human-Readable, Diffable
Rules live in `hub/rules/*.yaml`. Example for the voice unload flow:
```yaml
id: rule.voice-unload.v1
name: Voice Unload → Themes/Tasks
match:
  event.type: "transcript.created"
  payload.duration_minutes: ">=8"   # simple numeric predicate
plan:
  - step: "Summarize to canonical structure"
    agent: "gpt5.summarizer"
    input: "{{ event.attachments[0].text }}"
    output: "artifact://unload/{{ event.id }}.summary.json"
  - step: "Persist themes as markdown"
    agent: "fs.markdown"
    input: "{{ artifact('summary').themes }}"
    output: "obsidian://daily/{{ event.date }}-unload.md"
  - step: "Create tasks by domain"
    agent: "tasks.router"
    input: "{{ artifact('summary').tasks }}"
    params:
      mapping:
        dev: "github://org/repo"
        client: "notion://client-board"
        ops: "todo://inbox"
thresholds:
  auto:
    - agent: "fs.markdown"
    - agent: "gpt5.summarizer"
  review:
    - agent: "tasks.router"
  escalate:
    when:
      - "artifact('summary').contains_sensitive == true"
notifications:
  on_review: "notify://me?channel=desktop"
  on_error: "notify://me?channel=desktop"
enabled: true
```
Interpretation
- Always summarize + persist themes automatically.
- Creating tasks goes to review; I approve with one tap.
- If PII/sensitive content is detected, escalate before any action.
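The match clause's string predicates (like `">=8"`) can be evaluated with a few lines of Python. A minimal sketch of such an evaluator — not the hub's real rule engine, and the dotted-path lookup convention is an assumption:

```python
import operator

# Comparison symbols a match predicate may start with; order matters
# (two-character operators must be checked before ">" and "<").
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def lookup(envelope: dict, dotted: str):
    """Resolve a dotted path like 'payload.duration_minutes' in a nested dict."""
    node = envelope
    for part in dotted.split("."):
        node = node[part]
    return node

def matches(envelope: dict, match: dict) -> bool:
    """True if every key in the match clause is satisfied by the envelope."""
    for key, expected in match.items():
        actual = lookup(envelope, key)
        for sym in OPS:
            if isinstance(expected, str) and expected.startswith(sym):
                # Numeric predicate such as ">=8".
                if not OPS[sym](float(actual), float(expected[len(sym):])):
                    return False
                break
        else:
            # No operator prefix → exact equality (e.g. event.type).
            if actual != expected:
                return False
    return True
```

A 12-minute transcript matches `">=8"`, a 5-minute one does not; exact-match keys like `event.type` fall through to plain equality.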
Agents & Adapters (Keep Them Small)
- gpt5.summarizer (MCP tool) → input: raw text; output: normalized JSON
  `{ themes[], domains[], tasks{type,domain,title,notes,confidence}, flags{} }`
- fs.markdown → writes Obsidian-friendly MD with frontmatter + backlinks.
- tasks.router → splits tasks by `domain` and `type` into downstream adapters:
  - `github.issues` (scoped repo/project)
  - `notion.tasks` (database + status)
  - `todo.inbox` (local system / Things / Apple Reminders via bridge)
- notify → desktop notifications for review/approve; can also post to a local inbox.
Each adapter runs in a sandbox (separate process) with scoped secrets.
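A sketch of what tasks.router might look like internally, reusing the mapping from the rule above (the adapter-target URIs are the same illustrative placeholders, not a real API):

```python
# Mirror of the rule's params.mapping; illustrative targets only.
MAPPING = {
    "dev": "github://org/repo",
    "client": "notion://client-board",
    "ops": "todo://inbox",
}

def route_tasks(tasks: list[dict], mapping: dict = MAPPING) -> dict:
    """Group normalized tasks by adapter target, ready for dispatch."""
    routed: dict[str, list] = {}
    for task in tasks:
        # Unknown domains fall back to the local inbox rather than failing.
        target = mapping.get(task["domain"], "todo://inbox")
        routed.setdefault(target, []).append(task)
    return routed
```

Each target's batch then becomes one job for the matching adapter, which is what the review queue shows as "I will create N issues in org/repo".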
Confidence Thresholds (How Decisions Happen)
Every agent returns a `confidence` score and optional `risk_flags`. The Threshold Gate applies the rule's policy:
- If the `auto` list contains the agent → run without human input.
- If in `review` → create a Decision record and pause the job until approval.
- If an `escalate` predicate matches → require explicit confirmation with context.
One-tap approvals: The UI shows the diff (“I will create 5 issues in org/repo”) and a single approve button. No modal marathons.
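A minimal Threshold Gate could look like the following. The 0.75 confidence floor is an assumed default for illustration, not part of the rule format above:

```python
CONFIDENCE_FLOOR = 0.75  # assumed default; a real hub would make this per-rule

def gate(agent: str, confidence: float, risk_flags: dict, thresholds: dict) -> str:
    """Route a job to 'auto', 'review', or 'escalate' per the rule's policy."""
    # Escalation predicates win over everything else.
    if risk_flags.get("contains_sensitive"):
        return "escalate"
    auto_agents = {a["agent"] for a in thresholds.get("auto", [])}
    review_agents = {a["agent"] for a in thresholds.get("review", [])}
    if agent in auto_agents and confidence >= CONFIDENCE_FLOOR:
        return "auto"
    if agent in review_agents or confidence < CONFIDENCE_FLOOR:
        return "review"
    # Agents named in neither list default to human review, not auto.
    return "review"
```

The asymmetry is deliberate: an unlisted agent or a low-confidence result always lands in review; only an explicitly whitelisted agent with high confidence runs unattended.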
File/Repo Layout (Local-first)
```
~/agent-hub/
  hub.sqlite          # state store
  events/             # newline-delimited JSON events
  artifacts/          # generated JSON/MD for inspection
  rules/
    voice-unload.yaml
    repo-sync.yaml
  agents/
    gpt5/
    fs/
    tasks/
  secrets/            # references only; actual secrets in OS keychain
  ui/                 # TUI & optional web UI
  logs/               # structured logs
```
Commit `rules/` and selected `artifacts/` to git for change history.
Observability & Replay
- Audit log: append-only entries: `time, event_id, job_id, agent, action, outcome`.
- Replay: pick an `event_id` → re-run it through current rules (great for testing rule changes).
- Dry-run: run agents in “explain mode” → show what would happen; no side effects.
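The audit writer itself is almost trivial: one JSON line per entry, never rewritten. A sketch, with a hypothetical file location:

```python
import json
import time

AUDIT_PATH = "audit.jsonl"  # hypothetical; would live under ~/agent-hub/logs/

def audit(event_id: str, job_id: str, agent: str,
          action: str, outcome: str, path: str = AUDIT_PATH) -> dict:
    """Append one audit entry as a JSON line; the file is append-only."""
    entry = {"time": time.time(), "event_id": event_id, "job_id": job_id,
             "agent": agent, "action": action, "outcome": outcome}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Replay then reduces to reading the events log, re-matching against current rules, and diffing the new audit lines against the old ones.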
Security Model (Local-First, Least Privilege)
- Secrets stored in OS keychain (Keychain/Pass/Windows Credential Manager). DB holds aliases, not values.
- Each adapter gets scoped tokens only (e.g., a repo-scoped GitHub token).
- Content scanners (PII/keys) run before any “Act” step; if hits → escalate.
- Network egress can be deny-listed by adapter to avoid surprise calls.
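A naive pre-Act scanner might look like this. The patterns are illustrative placeholders; real detection needs a proper secrets/PII scanning library:

```python
import re

# Illustrative patterns only; a real scanner covers far more cases.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan(text: str) -> list[str]:
    """Return names of patterns that matched; any hit means escalate."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]
```

The gate treats a non-empty result as a tripped `contains_sensitive` flag, so the job halts before any adapter runs.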
The Human Touchpoints
- Inbox (Review): a queue of pending items (e.g., “Create 7 GitHub issues”). Approve/Reject with a note.
- Timeline: event → jobs → artifacts → outcomes (click through details).
- Health: quick indicators: “MCP connected”, “GitHub ok”, “Notion token expired”.
- Search: by theme/domain across `artifacts/` + Obsidian.
TUI first (fast, reliable); small web UI later for approvals from phone.
Example Artifact (Normalized Summary)
```json
{
  "event_id": "evt_2025-09-19_0932",
  "themes": ["Payments", "StrongStart Courses", "Local Ops"],
  "domains": ["client", "product", "ops"],
  "tasks": [
    {"type": "followup", "domain": "client", "title": "Email Nick about automation runway", "confidence": 0.86},
    {"type": "research", "domain": "product", "title": "Evaluate rituals module outline", "confidence": 0.78},
    {"type": "ops", "domain": "ops", "title": "Price out pressure-washing gear", "confidence": 0.73}
  ],
  "flags": {"contains_sensitive": false}
}
```
This is the single truth everything else reads.
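Because everything reads this artifact, it is worth shape-checking before anything downstream runs. A lightweight sketch — a real setup might use JSON Schema instead:

```python
# Keys every task entry must carry (illustrative minimum).
REQUIRED_TASK_KEYS = {"type", "domain", "title", "confidence"}

def validate_summary(artifact: dict) -> list[str]:
    """Return a list of problems; an empty list means the artifact is usable."""
    problems = []
    for key in ("event_id", "themes", "domains", "tasks", "flags"):
        if key not in artifact:
            problems.append(f"missing field: {key}")
    for i, task in enumerate(artifact.get("tasks", [])):
        missing = REQUIRED_TASK_KEYS - task.keys()
        if missing:
            problems.append(f"task[{i}] missing {sorted(missing)}")
        elif not 0.0 <= task["confidence"] <= 1.0:
            problems.append(f"task[{i}] confidence out of range")
    return problems
```

A failed check is itself an escalation: the summarizer produced something the rest of the pipeline cannot trust.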
Event Flow (Voice Unload Use Case)
- Ingest: `transcript.created` event with text attachment.
- Rule match: `rule.voice-unload.v1` fires.
- Summarize (auto): GPT-5 → normalized artifact.
- Persist (auto): fs.markdown → `2025-09-19-unload.md` in Obsidian vault.
- Route tasks (review): tasks.router proposes 1 GitHub issue, 1 Notion card, 1 local task.
- Approve: one tap; jobs dispatched.
- Notify: desktop ping with a compact summary + links.
If flags trip (sensitive), the plan halts at escalate with context.
Local DX (Developer Experience)
- Bootstrap in minutes: `hub init`, `hub run`, `hub ui`.
- Hot-reload rules: the hub watches `rules/*.yaml`.
- Fixtures: `hub ingest transcript ./samples/day-2025-09-19.txt --as john`.
- Explain: `hub plan evt_... --dry-run` (show agents, targets, thresholds).
- Replay: `hub replay evt_... --rule rule.voice-unload.v2`.
Testing & Reliability
- Golden transcripts: known inputs + expected artifacts; run in CI.
- Contract tests for adapters: mock GitHub/Notion; assert payload shapes.
- Idempotency: every job has an `idempotency_key`; reruns don’t duplicate issues.
- Deadletter queue: broken jobs go here with retry metadata and error snapshots.
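One way to derive an `idempotency_key` is to hash exactly the fields that define a job, so a rerun of the same event through the same rule produces the same key. A possible sketch:

```python
import hashlib
import json

def idempotency_key(rule_id: str, event_id: str, agent: str, payload: dict) -> str:
    """Stable hash of the job-defining fields; identical inputs → identical key,
    so an adapter can skip side effects it has already performed."""
    material = json.dumps(
        {"rule": rule_id, "event": event_id, "agent": agent, "payload": payload},
        sort_keys=True,  # canonical ordering so dict key order never matters
    )
    return hashlib.sha256(material.encode()).hexdigest()[:16]
```

Adapters record the key alongside each created issue/card; a replayed job whose key already exists is a no-op rather than a duplicate.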
Closing Notes
Designing a local-first agent hub isn’t just an exercise in architecture—it’s a way of reclaiming control over how AI fits into my daily work. Instead of scattering prompts and outputs across apps, the hub becomes a single home where rules, agents, and context live together.
The experience has shown me that the value of AI isn’t in the novelty of generation—it’s in the systems we design to receive it. By owning the pipeline end-to-end, from voice unload to task routing, I can finally step back from the manual “doing” and focus on the strategy, review, and momentum that only I can provide.