WORKS WITH CLAUDE CODE, CURSOR & ANY MCP HOST

Stop hitting the 5-hour Claude limit. Keep your momentum.

A local task offloader for coding agents.

TZRO intercepts token-heavy file-reading commands and runs them on your own machine, so you don't burn through your Claude Code, Cursor, or Codex token quota.


posix-sh — install

$ curl -fsSL https://get.tzro.ai | bash

# Expected output:

➔ Downloading runtime binaries from secure s3 allocation...

✔ System architecture validated (x86_64 / arm64)

✔ Symlinked tzro binary wrapper to local user bin space.

Zero-Overhead Local Agent Execution

Architecture

Claude Code delegates token-heavy, non-cognitive tasks to TZRO via standard MCP tool calls (like tzro_code or tzro_run).
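As a rough illustration of what such a delegation looks like on the wire, the sketch below builds an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The tool name `tzro_run` comes from the text above; the `task` argument and the helper name `build_tool_call` are assumptions made for illustration, since TZRO's actual input schema is not described here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real tzro_run schema may differ.
request = build_tool_call(1, "tzro_run", {"task": "summarize the src/ directory"})
print(json.dumps(request, indent=2))
```

The host sends a message of this shape to the MCP server and gets back a result containing the condensed output, so the raw file contents never pass through the cloud model.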

PHASE 1: THE OFFLOAD (Delegated)

Claude Code

Active CLI Workspace

› Claude needs to read large local files, index context, or traverse deeply nested directories.
› Instead of spending cloud tokens on the task, Claude hands it to TZRO immediately through an MCP tool call.

PHASE 2: LOCAL WORK (On-Device)

TZRO Daemon

Local Go-Native Core

› TZRO receives the instructions and executes them locally.
› It scans workspace folders, reads local file structures, and formats raw data directly on your computer for free.

PHASE 3: CLEAN INJECTION (Injected)

Claude Code

Resumes Instantly

› Claude receives a highly condensed, pre-digested summary of the heavy local work.
› The LLM continues generating from the summary, without spending cloud tokens or rate-limit budget on the raw files.

# stdout pipeline debug inspect

❯ Daemon compiling task graph locally. Local workspace search complete.

Performance Validated

We benchmark on real-world, multi-step workflows

We don't test on simple single-turn prompts. We benchmark TZRO against a suite of 15 complex, multi-step agent tasks (like compiling full codebase documentation, refactoring database queries, and generating local API wrappers).

What We Track	Standard Agent Loop	TZRO Cooperative Loop	The Developer Win
Cloud Token Usage	116,250 tokens	4,780 tokens	96% fewer cloud tokens billed 🟢
Average Task Time	92 seconds	617 seconds	Standard loop is ~7× faster: you trade speed for the 96% token savings 🟡
Inference Quality	5.00 / 5.00	4.90 / 5.00	LLM-judged documentation accuracy 🟢
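The headline figures follow directly from the numbers in the table. A small worked check, using only the values reported above (the function names are ours):

```python
def percent_reduction(baseline, reduced):
    """Percentage by which `reduced` is smaller than `baseline`."""
    return (1 - reduced / baseline) * 100


def slowdown_factor(baseline_seconds, new_seconds):
    """How many times longer the new run takes than the baseline."""
    return new_seconds / baseline_seconds


token_saving = percent_reduction(116_250, 4_780)
slowdown = slowdown_factor(92, 617)

print(f"Cloud token reduction: {token_saving:.1f}%")  # 95.9%
print(f"Task time ratio: {slowdown:.1f}x")            # 6.7x
```

So "96%" and "~7×" are both rounded values: roughly 95.9% fewer cloud tokens at about 6.7 times the wall-clock time.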

🛑

How We Stopped the 'Runaway Agent Loop'

In our early builds, open-ended tasks like "document this entire repository" would occasionally cause our local Probe nodes to enter a runaway exploration loop. We fixed it.

Task: Generate a comprehensive README for a 100MB Go repository

Without Loop Dampening

The agent generated 41 local tool calls and consumed 183,534 local tokens trying to map the directory.

With Loop Dampening

Our dynamic termination engine collapsed this to only 2 tool calls and 16,573 local tokens — while actually improving the documentation quality score from 3.75 to 4.75.

The Result: On this directory sweep, loop dampening cut local tool calls from 41 to 2 and local token use by about 91% (183,534 → 16,573), and TZRO passes a tight, finalized context summary straight to your active terminal.
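The page does not describe how the termination engine decides to stop, so the following is only a sketch of one plausible rule: stop exploring once several consecutive tool calls turn up nothing new. Everything here is an assumption for illustration, including the `patience` parameter, the in-memory `tree`, and the use of "new file extension seen" as the information signal.

```python
import os.path
from collections import deque


def explore(tree, root, max_calls=50, patience=2):
    """Breadth-first directory walk that stops after `patience` unproductive calls.

    `tree` maps a directory path to the names it contains; any path that is
    not a key in `tree` is treated as a file.
    """
    seen_kinds = set()      # file extensions discovered so far
    queue = deque([root])
    calls = 0               # simulated list_dir tool calls
    stale = 0               # consecutive calls that found nothing new
    while queue and calls < max_calls and stale < patience:
        path = queue.popleft()
        calls += 1
        gained = 0
        for name in tree[path]:
            child = f"{path}/{name}"
            if child in tree:
                queue.append(child)
            else:
                kind = os.path.splitext(name)[1]
                if kind not in seen_kinds:
                    seen_kinds.add(kind)
                    gained += 1
        stale = 0 if gained else stale + 1
    return calls, seen_kinds


# Hypothetical miniature repository.
tree = {
    "repo": ["cmd", "pkg", "README.md"],
    "repo/cmd": ["main.go"],
    "repo/pkg": ["a", "b", "c"],
    "repo/pkg/a": ["a.go"],
    "repo/pkg/b": ["b.go"],
    "repo/pkg/c": ["c.go"],
}

print(explore(tree, "repo")[0])               # 4 calls with dampening
print(explore(tree, "repo", patience=99)[0])  # 6 calls without it
```

On this toy tree the dampened walk stops after four calls with the same set of discovered file types as the full six-call walk, which is the general shape of the saving described above.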

Inference Arbitrage Engine

Core Execution Surfaces

Heavy token processing runs locally on your hardware. Only compressed, high-value payloads escalate to frontier cloud models. This is how TZRO cuts 96% of your cloud token usage with only a small drop in measured output quality (4.90 vs. 5.00).

Local Inference Surface

ON-DEVICE

4B parameter model · Zero cloud tokens · Full tool access

local_inference — active

→ Kahn Compiler: topological sort complete (7 nodes, 3 levels)

→ GBNF grammar constraint: XML→JSON coercion OK

→ Probe node #3: read_file → list_dir → search_files (3 hops)

✔ Context compacted: 183,534 → 4,780 tokens (97.4% reduction)

⚡ Kahn DAG Compiler LIVE

🛡️ GBNF + Semantic Validator LIVE

🧠 Probe Node Thought Chains LIVE

🔄 KV Cache Preemption LIVE

💾 Hybrid Vector + Knowledge Graph LIVE
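The "topological sort complete (7 nodes, 3 levels)" line refers to Kahn's algorithm, which orders a dependency graph by repeatedly releasing nodes whose dependencies are all satisfied. Below is a minimal sketch of the level-grouped variant. The task names and the graph are hypothetical, chosen only to reproduce the 7-node, 3-level shape from the log; this is not TZRO's compiler.

```python
def kahn_levels(deps):
    """Group a DAG into levels using Kahn's algorithm.

    `deps` maps each node to the set of nodes it depends on. Nodes in the
    same level have no dependencies on each other and could run in parallel.
    """
    indegree = {node: len(needs) for node, needs in deps.items()}
    dependents = {node: [] for node in deps}
    for node, needs in deps.items():
        for dep in needs:
            dependents[dep].append(node)

    level = sorted(node for node, count in indegree.items() if count == 0)
    levels = []
    done = 0
    while level:
        levels.append(level)
        done += len(level)
        ready = []
        for node in level:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        level = sorted(ready)

    if done != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return levels


# Hypothetical task graph with 7 nodes.
deps = {
    "list_dir": set(),
    "read_go_mod": set(),
    "read_readme": set(),
    "index_packages": {"list_dir", "read_go_mod"},
    "search_exports": {"list_dir"},
    "summarize_docs": {"read_readme"},
    "write_summary": {"index_packages", "search_exports", "summarize_docs"},
}

levels = kahn_levels(deps)
for depth, names in enumerate(levels):
    print(depth, names)
```

Grouping by level rather than emitting a flat order makes it explicit which tasks are independent, which is useful if a scheduler wants to run them concurrently.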

Cloud Escalation Surface

FRONTIER

Compressed payloads only · Frontier reasoning · Confidence-gated

cloud_escalation — standby

↑ Edge Thought confidence: 0.42 (threshold: 0.70)

↑ Escalating to frontier model: architectural judgment required

↑ Payload: 4,780 tokens (vs. 116,250 baseline)

✔ Terminal synthesis complete — micro-skill extracted

🎯 Confidence-Gated Escalation LIVE

🧬 Edge Thought Evaluation LIVE

📦 Terminal Synthesis LIVE

🔁 Dual Micro-Skill Extraction LIVE

☁️ Multi-Provider Fallback ROADMAP
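The escalation log above boils down to a threshold check: a confidence of 0.42 is below the 0.70 threshold, so the task goes to the frontier model. A minimal sketch of that gate follows; the function name and the choice to keep work local when confidence exactly equals the threshold are assumptions.

```python
def route(confidence, threshold=0.70):
    """Return where a task should run, given the local model's confidence."""
    # Assumption: a score exactly at the threshold stays local.
    return "local" if confidence >= threshold else "escalate"


print(route(0.42))  # escalate
print(route(0.85))  # local
```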

96% Token Reduction

4,780 Tokens to Cloud (vs 116K baseline)

4.90 Quality Score (out of 5.00)

System Architecture

Every OS primitive, implemented for AI agents

TZRO maps every classical operating system concept to an agentic equivalent.

OS Concept	TZRO Equivalent
Kernel	tzrod
Process Scheduler	Kahn Compiler
Processes	Tasks & Workflows
Virtual Memory	Context Compaction
Filesystem	SQLite
IPC	StreamBus
Device Drivers	Tool Registry
System Bus	SubagentChannel
System Daemons	Background Agents
Permissions	Proactivity Ladder
CPU	Inference Backend
DMA / Offload	Cloud Escalation
Shell	CLI & MCP
GUI / Window Manager	Generative Dashboard
Boot Loader	install.sh
Portable Disk	tzro.db

Get Started

One command. Zero config.

Install TZRO with a single command. It detects your system, downloads the right binary, and symlinks it into a directory on your PATH.

Terminal

$ curl -fsSL https://get.tzro.ai | bash

Under the Hood

What else ships inside the jumpdrive

Six more proprietary subsystems — all implemented, tested, and shipping.

🛡️

Zero Syntax Failures

GBNF Grammar + Semantic Validator

The Local Model generates XML quickly under shallow GBNF structural constraints, then a deterministic Semantic Validator coerces it into strict JSON tool parameters. Type coercion, default imputation, and fuzzy matching all happen at that one boundary. 0% syntax failure rate for tool arguments.
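To make the XML-to-JSON step concrete, here is a small sketch of a deterministic validator that performs the three repairs named above: fuzzy matching of tool and argument names, type coercion, and default imputation. The `SCHEMA`, the `read_file` argument names, and the XML shape are all hypothetical; TZRO's real tool schemas and validator are not shown on this page.

```python
import difflib
import json
import xml.etree.ElementTree as ET

# Hypothetical tool schema: argument name -> expected type and optional default.
SCHEMA = {
    "read_file": {
        "path": {"type": str},
        "max_lines": {"type": int, "default": 500},
        "follow_symlinks": {"type": bool, "default": False},
    },
}


def closest(name, options):
    """Fuzzy-match a possibly misspelled name against the known options."""
    match = difflib.get_close_matches(name, list(options), n=1, cutoff=0.6)
    if not match:
        raise ValueError(f"unknown name: {name!r}")
    return match[0]


def coerce(text, kind):
    """Convert element text into the type the schema expects."""
    if kind is bool:
        return text.strip().lower() in ("1", "true", "yes")
    return kind(text.strip())


def validate(xml_text):
    """Turn loosely formed tool XML into strict JSON-ready parameters."""
    root = ET.fromstring(xml_text)
    tool = closest(root.get("name", ""), SCHEMA)
    spec = SCHEMA[tool]
    args = {}
    for child in root:
        key = closest(child.tag, spec)
        args[key] = coerce(child.text or "", spec[key]["type"])
    for key, rule in spec.items():
        if key not in args:
            if "default" not in rule:
                raise ValueError(f"missing required argument: {key}")
            args[key] = rule["default"]  # default imputation
    return {"name": tool, "arguments": args}


# Misspelled tool name and argument tag, numeric value given as text.
raw = '<tool name="read_fil"><path>src/main.go</path><maxlines>200</maxlines></tool>'
print(json.dumps(validate(raw)))
```

Because every step is deterministic, the same malformed input always yields the same repaired output, and anything that cannot be repaired fails loudly with a `ValueError` instead of reaching the tool.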

⚡

Zero Chat Latency

Priority KV Cache Preemption

When a user sends a chat message, the OS dumps the running background task's KV attention state to disk, erases the slot, and processes the chat with sub-450ms time-to-first-token. When chat completes, background state restores from disk without re-evaluating historical tokens.
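The save, erase, serve, restore sequence can be illustrated with a toy model. In the sketch below a plain Python list stands in for the KV attention state and `pickle` stands in for the on-disk format; the `Slot` class and its methods are invented for illustration and say nothing about how TZRO's inference backend actually stores its cache.

```python
import os
import pickle
import tempfile


class Slot:
    """Toy inference slot; `cache` plays the role of the KV attention state."""

    def __init__(self):
        self.cache = []
        self.evaluated = 0  # tokens that had to be processed by the model

    def feed(self, tokens):
        self.cache.extend(tokens)
        self.evaluated += len(tokens)

    def save(self, path):
        """Dump the state to disk, then erase the slot."""
        with open(path, "wb") as fh:
            pickle.dump(self.cache, fh)
        self.cache = []

    def restore(self, path):
        """Reload the state from disk without re-evaluating any tokens."""
        with open(path, "rb") as fh:
            self.cache = pickle.load(fh)


slot = Slot()
slot.feed(["bg"] * 1000)  # background task has built up 1,000 tokens of state

with tempfile.TemporaryDirectory() as tmp:
    snapshot = os.path.join(tmp, "background.kv")
    slot.save(snapshot)            # preempt the background task
    slot.feed(["chat"] * 20)       # serve the chat message in the empty slot
    chat_cache_size = len(slot.cache)
    slot.restore(snapshot)         # resume the background task

print(chat_cache_size, len(slot.cache), slot.evaluated)  # 20 1000 1020
```

The `evaluated` counter ends at 1,020 rather than 2,020: the background task's 1,000 tokens are processed once, and restoring from disk brings them back without a second pass.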

🧠

Pure Local Memory

Hybrid Vector Search & Knowledge Graph

Zero cloud dependencies for long-term recall. SQLite FTS5 keyword pre-filtering → ONNX cosine similarity ranking → Neighborhood Multi-Hop traversal across a relational knowledge graph. Rich, contextual subgraphs injected directly into the model.
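The three-stage recall pipeline above can be sketched end to end with the standard library. This example assumes your Python's `sqlite3` module was built with FTS5 enabled (most distributions are). The notes, the hand-written three-dimensional vectors standing in for ONNX embeddings, and the `EDGES` graph are all made up for illustration.

```python
import math
import sqlite3

# Hypothetical memory: id -> (text, toy embedding vector).
NOTES = {
    1: ("auth module uses jwt tokens", [0.9, 0.1, 0.0]),
    2: ("jwt tokens expire after one hour", [0.8, 0.2, 0.1]),
    3: ("database migrations run at startup", [0.0, 0.1, 0.9]),
    4: ("token refresh endpoint lives in the api package", [0.7, 0.3, 0.2]),
}
# Hypothetical knowledge-graph edges between notes.
EDGES = {1: [4], 2: [1], 3: [], 4: [3]}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall(keyword, query_vec, top_k=1, hops=1):
    # Stage 1: FTS5 keyword pre-filter.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
    db.executemany(
        "INSERT INTO notes(rowid, body) VALUES (?, ?)",
        [(note_id, text) for note_id, (text, _) in NOTES.items()],
    )
    rows = db.execute("SELECT rowid FROM notes WHERE notes MATCH ?", (keyword,))
    hits = [row[0] for row in rows]
    db.close()

    # Stage 2: rank the survivors by cosine similarity to the query vector.
    ranked = sorted(hits, key=lambda i: cosine(NOTES[i][1], query_vec), reverse=True)
    ranked = ranked[:top_k]

    # Stage 3: expand each hit by following graph edges for `hops` steps.
    subgraph = set(ranked)
    frontier = set(ranked)
    for _ in range(hops):
        frontier = {n for i in frontier for n in EDGES[i]} - subgraph
        subgraph |= frontier
    return ranked, sorted(subgraph)


print(recall("jwt", [1.0, 0.0, 0.0]))          # ([1], [1, 4])
print(recall("jwt", [1.0, 0.0, 0.0], hops=2))  # ([1], [1, 3, 4])
```

The cheap keyword filter keeps the similarity ranking small, and the graph hop pulls in related notes (here, the refresh endpoint) that the keyword alone would have missed.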

🔄

Self-Improving Inference

Dual Micro-Skill Extraction

Successful trajectories synthesize Procedural Micro-Skills. Failed-then-succeeded pairs extract Corrective Micro-Skills — teaching the local model to self-correct on specific failure patterns without weight updates. The OS gets better the more you use it.

👁️

Background Intelligence

Observer + Sentinel + Attention Scheduler

Three background agents form an autonomous nervous system. The Observer reflects on completed tasks. The Sentinel proactively correlates workspace activity. The Attention Scheduler enforces preemption, safety gates (L0–L4), and resource budgets.

🖥️

Generative Dashboard

Agent-Composed Observability

The dashboard isn't a static monitoring page — it's a Generative UI surface where the Local Model analyzes system state and composes the layout from 15 primitives. The agent decides panel ordering, emphasis, and which tasks deserve spotlight attention.