
Relay

Cuts LLM token usage with persistent state and artifact caching. Local-first.

Go · SQLite · REST API · A2A Protocol

The Problem

Agent systems waste tokens. Every hop in a multi-step workflow re-sends the full context — the same files, the same decisions, the same instructions. A 10-step agent loop with 4KB of context sends ~40KB total. Most of it is duplication.

What Relay Does

Relay replaces agent-to-agent prose with durable state refs and artifact refs. Instead of re-sending 1000 tokens of context, agents reference state_ref: v7. Long content gets stored locally, and only a 2KB preview goes to the LLM.

BEFORE relay                    WITH relay

agent A ──▶ "Here's all the    agent A ──▶ state_ref: v7
             context again:                 artifact_ref: abc123
             [1000 tokens]                        │
             [2000 tokens]                   local store
             [3000 tokens]"                       │

agent B ◀── re-reads it all    agent B ◀── preview: 2KB
             6000 tokens                    ref on disk
                                            ~200 tokens

Typical savings: ~95% token reduction. A session that would naively send 48K tokens sends about 2.4K.
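Concretely, the handoff agent B receives under Relay might look like this (a hypothetical shape for illustration; Relay's actual wire format may differ):

```json
{
  "thread_id": "550e8400-e29b-41d4-a716-446655440000",
  "state_ref": "v7",
  "artifact_ref": "abc123",
  "preview": "first 2KB of the stored artifact..."
}
```

Agent B resolves the refs against the local store only when it actually needs the full content.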

Core Concepts

Thread — execution context with a unique ID. Everything lives inside it:

thread_id:   550e8400-e29b-41d4-a716-446655440000
state_ref:   v7
artifacts:   [abc123, def456, ghi789]
events:      [...append-only log...]
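In Go, a thread could be modeled roughly like this (field names are illustrative assumptions, not Relay's actual schema):

```go
package main

import "fmt"

// Thread is a sketch of Relay's execution context; the field names
// here are illustrative, not Relay's actual schema.
type Thread struct {
	ThreadID  string   // unique execution context ID (UUID)
	StateRef  string   // current canonical state version, e.g. "v7"
	Artifacts []string // opaque IDs of locally stored content
	Events    []string // append-only event log
}

// NewThread starts an empty thread at state v1.
func NewThread(id string) Thread {
	return Thread{ThreadID: id, StateRef: "v1"}
}

func main() {
	t := NewThread("550e8400-e29b-41d4-a716-446655440000")
	t.StateRef = "v7"
	t.Artifacts = append(t.Artifacts, "abc123", "def456", "ghi789")
	fmt.Println(t.StateRef, len(t.Artifacts))
}
```

Everything an agent produces hangs off the thread, so any later step can reference prior work by ID instead of restating it.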

state_ref — the bounded memory view sent to the LLM (~500 tokens):

Field            Bound  Description
top_facts        10     Key decisions and context
top_constraints  5      Rules and limitations
open_questions   5      Unresolved items only
next_steps       5      Pending tasks only
artifact_refs    10     Pointers to stored content
metrics          —      Cache hits, tokens avoided
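The bounds are what keep the view small. One plausible way to enforce a bound (this helper is a sketch, not Relay's actual code):

```go
package main

import "fmt"

// capped keeps at most n of the most recent items — the kind of trim a
// bounded field like top_facts (bound 10) needs to stay within budget.
// Illustrative only; Relay may prioritize differently than recency.
func capped(items []string, n int) []string {
	if len(items) <= n {
		return items
	}
	return items[len(items)-n:]
}

func main() {
	facts := []string{"f1", "f2", "f3", "f4"}
	fmt.Println(capped(facts, 3))
}
```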

State updates happen via RFC 6902 JSON Patch — canonical state is patched, never overwritten.
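For example, a patch that revises a fact and queues a task might look like this (the paths assume the field names from the table above):

```json
[
  { "op": "replace", "path": "/top_facts/0",  "value": "storage is SQLite, not Postgres" },
  { "op": "add",     "path": "/next_steps/-", "value": "re-run the migration" }
]
```

Because each patch is small and applied to canonical state, agents ship diffs instead of full state dumps, and the append-only event log preserves every version.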

artifact_ref — any stored content (markdown, JSON, HTML, binary, tool output). Each gets an opaque ID, SHA-256 hash, and provenance tracking. Only a preview (first 2KB) travels to the LLM.
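The hash-and-preview step is simple to sketch in Go. The function names below are illustrative; Relay's actual IDs are opaque and not necessarily the raw hash:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const previewBytes = 2048 // only this much of an artifact travels to the LLM

// ArtifactID returns the hex SHA-256 of the full content.
// Illustrative: Relay's real ID scheme is opaque.
func ArtifactID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Preview returns the first 2KB of content (or all of it, if shorter).
func Preview(content []byte) []byte {
	if len(content) <= previewBytes {
		return content
	}
	return content[:previewBytes]
}

func main() {
	doc := make([]byte, 10000) // a 10KB artifact
	fmt.Println(len(Preview(doc)), len(ArtifactID(doc)))
}
```

The hash doubles as an integrity check and a natural cache key: identical content hashes to the same digest, so it is stored once.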

Provider Support

Wrap any LLM CLI with relay wrap run --cli <binary>:

  • Anthropic (default)
  • OpenAI
  • Ollama (local)
  • Any CLI tool — Claude Code, Gemini, sgpt

For complex workflows, define DAG pipelines in YAML with parallel execution across multiple agents. Compatible with Google’s A2A protocol.
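A pipeline definition might look roughly like the following. This schema is a guess for illustration only; consult Relay's documentation for the real format:

```yaml
pipeline: review-and-summarize
steps:
  - id: lint
    cli: claude
  - id: security-scan
    cli: claude
  - id: summarize
    cli: claude
    depends_on: [lint, security-scan]  # lint and security-scan can run in parallel
```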

Built in Go with SQLite. Ships as a static binary for macOS and Linux.