The Problem
Agent systems waste tokens. Every hop in a multi-step workflow re-sends the full context — the same files, the same decisions, the same instructions. A 10-step agent loop with 4KB of context sends ~40KB total. Most of it is duplication.
What Relay Does
Relay replaces agent-to-agent prose with durable state refs and artifact refs. Instead of re-sending 1000 tokens of context, agents reference state_ref: v7. Long content gets stored locally, and only a 2KB preview goes to the LLM.
BEFORE relay                        WITH relay

agent A ──▶ "Here's all the         agent A ──▶ state_ref: v7
             context again:                      artifact_ref: abc123
             [1000 tokens]                            │
             [2000 tokens]                       local store
             [3000 tokens]"                           │
                                                      ▼
agent B ◀── re-reads it all         agent B ◀── preview: 2KB
            6000 tokens                         ref on disk
                                                ~200 tokens
Typical savings: 95% token reduction. A naive 48K-token session shrinks to about 2.4K tokens actually sent.
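The arithmetic behind these figures can be checked with a short sketch. The function names `naiveTotal` and `reductionPercent` are illustrative and not part of Relay; "48K" is taken to mean 48,000 tokens.

```go
package main

import "fmt"

// naiveTotal returns how much is sent when every step re-sends the
// full context (any unit: bytes or tokens).
func naiveTotal(steps, contextSize int) int {
	return steps * contextSize
}

// reductionPercent returns the share of the naive total that was avoided.
func reductionPercent(naive, actual int) float64 {
	if naive <= 0 {
		return 0
	}
	return 100 * float64(naive-actual) / float64(naive)
}

func main() {
	// 10 steps, each re-sending 4 KB of context.
	fmt.Println(naiveTotal(10, 4), "KB sent in total")

	// The session figures quoted above: 48K naive vs 2.4K actual.
	fmt.Printf("%.1f%% reduction\n", reductionPercent(48000, 2400))
}
```

With the numbers from the diagram (6000 tokens versus roughly 200), the same formula gives a slightly higher reduction, so the 95% figure reads as a rounded typical value.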
Core Concepts
Thread — execution context with a unique ID. Everything lives inside it:
thread_id: 550e8400-e29b-41d4-a716-446655440000
state_ref: v7
artifacts: [abc123, def456, ghi789]
events: [...append-only log...]
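As a rough illustration, the thread fields above could be modeled in Go along these lines. The type and field names (`Thread`, `Event`, `Append`) are assumptions made for this sketch, not Relay's actual types; the point is only that the event log is exposed for appending and reading, never for in-place edits.

```go
package main

import "fmt"

// Event is one entry in the append-only log (hypothetical shape).
type Event struct {
	Seq  int
	Kind string
}

// Thread mirrors the fields listed above. Names are illustrative.
type Thread struct {
	ID        string
	StateRef  string
	Artifacts []string
	events    []Event // unexported: callers can only append via Append
}

// Append adds an event to the end of the log and returns its sequence number.
func (t *Thread) Append(kind string) int {
	seq := len(t.events) + 1
	t.events = append(t.events, Event{Seq: seq, Kind: kind})
	return seq
}

// Events returns a copy of the log so callers cannot rewrite history.
func (t *Thread) Events() []Event {
	return append([]Event(nil), t.events...)
}

func main() {
	th := &Thread{
		ID:        "550e8400-e29b-41d4-a716-446655440000",
		StateRef:  "v7",
		Artifacts: []string{"abc123", "def456", "ghi789"},
	}
	th.Append("state_patched")
	th.Append("artifact_stored")
	fmt.Println(len(th.Events()), "events on thread", th.ID)
}
```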
state_ref — bounded memory view sent to the LLM (~500 tokens):
| Field | Bound | Description |
|---|---|---|
| top_facts | 10 | Key decisions and context |
| top_constraints | 5 | Rules and limitations |
| open_questions | 5 | Unresolved items only |
| next_steps | 5 | Pending tasks only |
| artifact_refs | 10 | Pointers to stored content |
| metrics | n/a | Cache hits, tokens avoided |
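The bounds in the table could be enforced with a simple validation step. The sketch below is one possible shape, assuming a Go struct with JSON tags matching the field names; the `State` type and its `Validate` method are hypothetical, and how Relay itself reacts to an over-limit field (reject, truncate, or evict) is not stated here.

```go
package main

import "fmt"

// State holds the bounded fields from the table. The metrics field is
// omitted because the table gives it no bound.
type State struct {
	TopFacts       []string `json:"top_facts"`
	TopConstraints []string `json:"top_constraints"`
	OpenQuestions  []string `json:"open_questions"`
	NextSteps      []string `json:"next_steps"`
	ArtifactRefs   []string `json:"artifact_refs"`
}

// Validate reports the first field that exceeds its bound, or nil.
func (s State) Validate() error {
	checks := []struct {
		name   string
		n, max int
	}{
		{"top_facts", len(s.TopFacts), 10},
		{"top_constraints", len(s.TopConstraints), 5},
		{"open_questions", len(s.OpenQuestions), 5},
		{"next_steps", len(s.NextSteps), 5},
		{"artifact_refs", len(s.ArtifactRefs), 10},
	}
	for _, c := range checks {
		if c.n > c.max {
			return fmt.Errorf("%s has %d entries, bound is %d", c.name, c.n, c.max)
		}
	}
	return nil
}

func main() {
	s := State{NextSteps: []string{"a", "b", "c", "d", "e", "f"}}
	fmt.Println(s.Validate())
}
```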
State updates happen via RFC 6902 JSON Patch — canonical state is patched, never overwritten.
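To make the patching model concrete, here is a deliberately small sketch of applying an RFC 6902 patch to a state document. It is not Relay's implementation: the `Op` type and `Apply` function are written for this example and cover only `add`, `replace`, and `remove` on top-level members, plus `add` with the `/field/-` path that appends to an array. JSON Pointer escaping and the all-or-nothing behavior the RFC requires are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Op is one RFC 6902 operation (subset: add, replace, remove).
type Op struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value,omitempty"`
}

// Apply mutates doc in place. Supported paths: "/field" and "/field/-".
// Unlike a full implementation, a failure midway leaves earlier ops applied.
func Apply(doc map[string]any, patch []Op) error {
	for _, op := range patch {
		if !strings.HasPrefix(op.Path, "/") {
			return fmt.Errorf("invalid path %q", op.Path)
		}
		parts := strings.Split(strings.TrimPrefix(op.Path, "/"), "/")
		key := parts[0]
		_, exists := doc[key]
		switch {
		case op.Op == "add" && len(parts) == 2 && parts[1] == "-":
			arr, ok := doc[key].([]any)
			if !ok {
				return fmt.Errorf("add %s: target is not an array", op.Path)
			}
			doc[key] = append(arr, op.Value)
		case op.Op == "add" && len(parts) == 1:
			doc[key] = op.Value
		case op.Op == "replace" && len(parts) == 1 && exists:
			doc[key] = op.Value
		case op.Op == "remove" && len(parts) == 1 && exists:
			delete(doc, key)
		default:
			return fmt.Errorf("unsupported or invalid operation %q on %q", op.Op, op.Path)
		}
	}
	return nil
}

func main() {
	var doc map[string]any
	var patch []Op
	// Example state and patch; the contents are made up for illustration.
	json.Unmarshal([]byte(`{"next_steps":["write tests"],"open_questions":["which db?"]}`), &doc)
	json.Unmarshal([]byte(`[
		{"op":"add","path":"/next_steps/-","value":"ship v8"},
		{"op":"remove","path":"/open_questions"}
	]`), &patch)

	if err := Apply(doc, patch); err != nil {
		fmt.Println("patch failed:", err)
		return
	}
	out, _ := json.Marshal(doc)
	fmt.Println(string(out)) // {"next_steps":["write tests","ship v8"]}
}
```

Because each change is expressed as a small list of operations, the patch itself can be recorded in the event log, which is one plausible reason to prefer patching over overwriting the whole state.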
artifact_ref — any stored content (markdown, JSON, HTML, binary, tool output). Each gets an opaque ID, SHA-256 hash, and provenance tracking. Only a preview (first 2KB) travels to the LLM.
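A minimal sketch of how an artifact record with a hash and a bounded preview might be built, using only the Go standard library. The `Artifact` type and `NewArtifact` function are hypothetical; "2KB" is assumed to mean 2048 bytes, the ID is derived from the hash purely for illustration, and provenance tracking is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// previewBytes is the "first 2KB" from the text, taken here as 2048 bytes.
const previewBytes = 2048

// Artifact is an illustrative record for one stored piece of content.
type Artifact struct {
	ID      string // opaque to callers; a hash prefix is used here only as an example
	SHA256  string // hex-encoded digest of the full content
	Size    int    // full content length in bytes
	Preview []byte // at most previewBytes bytes; the only part sent to the LLM
}

// NewArtifact hashes the full content and keeps a bounded preview.
// The cut is byte-based, so it may split a multi-byte UTF-8 character.
func NewArtifact(content []byte) Artifact {
	sum := sha256.Sum256(content)
	hash := hex.EncodeToString(sum[:])
	n := len(content)
	if n > previewBytes {
		n = previewBytes
	}
	return Artifact{
		ID:      hash[:12],
		SHA256:  hash,
		Size:    len(content),
		Preview: append([]byte(nil), content[:n]...),
	}
}

func main() {
	big := []byte(strings.Repeat("log line\n", 1000)) // 9000 bytes of tool output
	a := NewArtifact(big)
	fmt.Printf("id=%s size=%d preview=%d bytes\n", a.ID, a.Size, len(a.Preview))
}
```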
Provider Support
Wrap any LLM CLI with relay wrap run --cli <binary>:
- Anthropic (default)
- OpenAI
- Ollama (local)
- Any CLI tool — Claude Code, Gemini, sgpt
For complex workflows, define DAG pipelines in YAML with parallel execution across multiple agents. Relay is compatible with Google’s A2A protocol.
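The pipeline YAML schema is not shown here, so the sketch below illustrates only the scheduling idea behind a DAG with parallel execution: each step runs in its own goroutine and starts once everything it depends on has finished. The `Step` type and `RunDAG` function are assumptions for this example, not Relay's API, and cycles are not detected.

```go
package main

import (
	"fmt"
	"sync"
)

// Step is one node in the pipeline (hypothetical shape).
type Step struct {
	Name  string
	Needs []string // names of steps that must finish first
	Run   func()
}

// RunDAG runs independent steps in parallel and dependent steps in order.
// It rejects unknown dependencies; the graph is assumed to be acyclic.
func RunDAG(steps []Step) error {
	done := make(map[string]chan struct{}, len(steps))
	for _, s := range steps {
		done[s.Name] = make(chan struct{})
	}
	for _, s := range steps {
		for _, dep := range s.Needs {
			if _, ok := done[dep]; !ok {
				return fmt.Errorf("step %q needs unknown step %q", s.Name, dep)
			}
		}
	}
	var wg sync.WaitGroup
	for _, s := range steps {
		wg.Add(1)
		go func(s Step) {
			defer wg.Done()
			for _, dep := range s.Needs {
				<-done[dep] // blocks until the dependency closes its channel
			}
			s.Run()
			close(done[s.Name]) // signal completion to dependents
		}(s)
	}
	wg.Wait()
	return nil
}

func main() {
	say := func(msg string) func() { return func() { fmt.Println(msg) } }
	err := RunDAG([]Step{
		{Name: "plan", Run: say("plan")},
		{Name: "research", Needs: []string{"plan"}, Run: say("research")},
		{Name: "draft", Needs: []string{"plan"}, Run: say("draft")},
		{Name: "merge", Needs: []string{"research", "draft"}, Run: say("merge")},
	})
	if err != nil {
		fmt.Println(err)
	}
}
```

In this example `research` and `draft` may run in either order or at the same time, while `plan` always runs first and `merge` always runs last.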
Built in Go with SQLite. Ships as a static binary for macOS and Linux.