
Relay

Cuts LLM token usage with persistent state and artifact caching. Local-first.

Go · SQLite · REST API · A2A Protocol

The Problem

Agent systems waste tokens. Every hop in a multi-step workflow re-sends the full context — the same files, the same decisions, the same instructions. A 10-step agent loop with 4KB of context sends ~40KB total. Most of it is duplication.

What Relay Does

Relay replaces agent-to-agent prose with durable state refs and artifact refs. Instead of re-sending 1000 tokens of context, agents reference state_ref: v7. Long content gets stored locally, and only a 2KB preview goes to the LLM.

BEFORE relay                    WITH relay

agent A ──▶ "Here's all the    agent A ──▶ state_ref: v7
             context again:                 artifact_ref: abc123
             [1000 tokens]                        │
             [2000 tokens]                   local store
             [3000 tokens]"                       │

agent B ◀── re-reads it all    agent B ◀── preview: 2KB
             6000 tokens                    ref on disk
                                            ~200 tokens

Typical savings: ~95% token reduction. A session that would naively send 48K tokens sends about 2.4K.
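Concretely, the handoff agent B receives under Relay might look like this (a hypothetical shape for illustration; Relay's actual wire format may differ):

```json
{
  "thread_id": "550e8400-e29b-41d4-a716-446655440000",
  "state_ref": "v7",
  "artifact_ref": "abc123",
  "preview": "first 2KB of the stored artifact..."
}
```

Agent B resolves the refs against the local store only when it actually needs the full content.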

Core Concepts

Thread — execution context with a unique ID. Everything lives inside it:

thread_id:   550e8400-e29b-41d4-a716-446655440000
state_ref:   v7
artifacts:   [abc123, def456, ghi789]
events:      [...append-only log...]
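In Go, a thread could be modeled roughly like this (field names are illustrative assumptions, not Relay's actual schema):

```go
package main

import "fmt"

// Thread is a sketch of Relay's execution context; the field names
// here are illustrative, not Relay's actual schema.
type Thread struct {
	ThreadID  string   // unique execution context ID (UUID)
	StateRef  string   // current canonical state version, e.g. "v7"
	Artifacts []string // opaque IDs of locally stored content
	Events    []string // append-only event log
}

// NewThread starts an empty thread at state v1.
func NewThread(id string) Thread {
	return Thread{ThreadID: id, StateRef: "v1"}
}

func main() {
	t := NewThread("550e8400-e29b-41d4-a716-446655440000")
	t.StateRef = "v7"
	t.Artifacts = append(t.Artifacts, "abc123", "def456", "ghi789")
	fmt.Println(t.StateRef, len(t.Artifacts))
}
```

Everything an agent produces hangs off the thread, so any later step can reference prior work by ID instead of restating it.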

state_ref — the bounded memory view sent to the LLM (~500 tokens):

Field            Bound  Description
top_facts        10     Key decisions and context
top_constraints  5      Rules and limitations
open_questions   5      Unresolved items only
next_steps       5      Pending tasks only
artifact_refs    10     Pointers to stored content
metrics          —      Cache hits, tokens avoided
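The bounds are what keep the view small. One plausible way to enforce a bound (this helper is a sketch, not Relay's actual code):

```go
package main

import "fmt"

// capped keeps at most n of the most recent items — the kind of trim a
// bounded field like top_facts (bound 10) needs to stay within budget.
// Illustrative only; Relay may prioritize differently than recency.
func capped(items []string, n int) []string {
	if len(items) <= n {
		return items
	}
	return items[len(items)-n:]
}

func main() {
	facts := []string{"f1", "f2", "f3", "f4"}
	fmt.Println(capped(facts, 3))
}
```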

State updates happen via RFC 6902 JSON Patch — canonical state is patched, never overwritten.
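For example, a patch that revises a fact and queues a task might look like this (the paths assume the field names from the table above):

```json
[
  { "op": "replace", "path": "/top_facts/0",  "value": "storage is SQLite, not Postgres" },
  { "op": "add",     "path": "/next_steps/-", "value": "re-run the migration" }
]
```

Because each patch is small and applied to canonical state, agents ship diffs instead of full state dumps, and the append-only event log preserves every version.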

artifact_ref — any stored content (markdown, JSON, HTML, binary, tool output). Each gets an opaque ID, SHA-256 hash, and provenance tracking. Only a preview (first 2KB) travels to the LLM.
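The hash-and-preview step is simple to sketch in Go. The function names below are illustrative; Relay's actual IDs are opaque and not necessarily the raw hash:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const previewBytes = 2048 // only this much of an artifact travels to the LLM

// ArtifactID returns the hex SHA-256 of the full content.
// Illustrative: Relay's real ID scheme is opaque.
func ArtifactID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Preview returns the first 2KB of content (or all of it, if shorter).
func Preview(content []byte) []byte {
	if len(content) <= previewBytes {
		return content
	}
	return content[:previewBytes]
}

func main() {
	doc := make([]byte, 10000) // a 10KB artifact
	fmt.Println(len(Preview(doc)), len(ArtifactID(doc)))
}
```

The hash doubles as an integrity check and a natural cache key: identical content hashes to the same digest, so it is stored once.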

Provider Support

Wrap any LLM CLI with relay wrap run --cli <binary>:

  • Anthropic (default)
  • OpenAI
  • Ollama (local)
  • Any CLI tool — Claude Code, Gemini, sgpt

For complex workflows, define DAG pipelines in YAML with parallel execution across multiple agents. Compatible with Google’s A2A protocol.
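A pipeline definition might look roughly like the following. This schema is a guess for illustration only; consult Relay's documentation for the real format:

```yaml
pipeline: review-and-summarize
steps:
  - id: lint
    cli: claude
  - id: security-scan
    cli: claude
  - id: summarize
    cli: claude
    depends_on: [lint, security-scan]  # lint and security-scan can run in parallel
```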

Built in Go with SQLite. Ships as a static binary for macOS and Linux.