Cut your LLM bill
without touching a line of agent code.
turbo-flow saves money three ways: it caps runaway bursts in the kernel, rewrites opus→sonnet in flight, and caches identical prompts across your whole fleet. Zero SDK changes.
Three levers. Stackable.
Real dollars, not just dashboards.
Stop runaway bursts
Agent stuck in a retry loop? CI job that forgot to exit? The kernel drops its TCP packets the moment the budget is blown. You pay $0 for the rest of the burst.
Rewrite expensive models
Most agent tasks don't need Opus. --downgrade-to sonnet rewrites the model field in-flight (TLS-terminated proxy). Same API, same response shape, 5× cheaper per token.
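To make the rewrite step concrete, here is a minimal Python sketch of swapping the `model` field in a decrypted request body. It is an illustration only: the function name `rewrite_model`, the substring match on "opus", and the default target are assumptions, and the real proxy may match and map models differently.

```python
import json

def rewrite_model(body: bytes, target: str = "claude-sonnet-4-5") -> bytes:
    """Return the request body with an Opus-tier `model` replaced by `target`.

    Hypothetical helper: matching on the substring "opus" is an assumption.
    """
    payload = json.loads(body)
    model = payload.get("model")
    if isinstance(model, str) and "opus" in model:
        payload["model"] = target
    # Every other field (messages, max_tokens, ...) passes through unchanged.
    return json.dumps(payload).encode("utf-8")
```

Only the one field changes, which is why the caller sees the same API and the same response shape.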
Pay once per unique prompt
100 CI workers fire the same prompt in the same second → 1 upstream call, 99 followers share the response. Identical repeat calls in the next 24h → zero upstream tokens.
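The coalescing behavior can be sketched as a single-flight cache: the first caller for a prompt becomes the leader, concurrent callers wait for its result, and later callers hit the cache until the entry expires. This is a Python illustration under assumptions (the class name `CoalescingCache`, hashing the raw prompt with SHA-256, and in-process threads standing in for fleet workers), not the tool's actual implementation.

```python
import hashlib
import threading
import time

class CoalescingCache:
    """Single-flight cache: one upstream call per unique prompt per TTL window."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._done = {}      # key -> (expires_at, response)
        self._inflight = {}  # key -> Event set when the leader finishes

    def fetch(self, prompt, upstream):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        while True:
            with self._lock:
                hit = self._done.get(key)
                if hit and hit[0] > self._clock():
                    return hit[1]                 # cache hit: no upstream call
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if not leader:
                event.wait()                      # follower: wait for the leader
                continue                          # then re-check the cache
            try:
                response = upstream(prompt)       # leader: the only upstream call
                with self._lock:
                    self._done[key] = (self._clock() + self._ttl, response)
                return response
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()                       # wake followers even on failure
```

If the leader's upstream call raises, the followers wake, find no cached entry, and one of them takes over as the new leader.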
turbo_flow_proxy_rewrote_saved_usd_total
turbo_flow_proxy_cache_saved_usd_total
turbo_flow_proxy_coalesced_saved_usd_total
Observability tells you the bill arrived.
We refuse to let it leave.
Three probes. One bit flip.
eBPF uprobe on SSL_write
Every agent hitting libssl, BoringSSL, or crypto/tls triggers a uprobe. 384-byte preview + direction into a ring buffer.
User-space policy engine
Rust daemon drains the ring, classifies model tier, debits a shared rolling 60s budget. Response usage reconciles estimates.
TC egress drops packets
Budget blown? One bit flips in an eBPF map. The classifier returns TC_ACT_SHOT for matching TCP packets. Well-behaved PIDs keep flowing.
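The daemon's accounting step can be pictured as a rolling-window counter whose result drives the enforce bit. The Python sketch below is an assumption-laden illustration: the class `RollingBudget`, its method names, and the unit of `limit` are hypothetical, and the real daemon is described as Rust writing to an eBPF map rather than returning a boolean.

```python
from collections import deque

class RollingBudget:
    """Rolling-window budget: sum of costs seen in the last `window_seconds`."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._events = deque()  # (timestamp, cost), oldest first
        self._total = 0

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window:
            _, cost = self._events.popleft()
            self._total -= cost

    def debit(self, now, cost):
        """Record `cost` at time `now`; return True if the budget is blown."""
        self._expire(now)
        self._events.append((now, cost))
        self._total += cost
        return self._total > self.limit  # True would flip the enforce bit

    def reconcile(self, now, estimated, actual):
        """Correct an earlier estimate once the response reports real usage."""
        return self.debit(now, actual - estimated)
```

Reconciliation is modeled here as a signed correction entry, so an overestimate hands budget back as soon as the response's usage is known.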
[agent] → SSL_write → [uprobe] → ring-buf → [daemon] → { JSONL, Prom, ENFORCE_MAP }
│
▼
[tc-egress] budget blown? SHOT.
Drop it into a running host.
# shadow — observe & cost-track
sudo turbo-flow start \
--budget 100000 --iface eth0 \
--metrics-port 9191
# enforce — hard cap
sudo turbo-flow start \
--budget 100000 --iface eth0 --enforce \
--target-port 443 \
--alert-webhook https://hooks.slack.com/…
Prometheus + Grafana.
Out of the box.
| model | rewrite savings | cache savings |
|---|---|---|
| claude-opus-4 | $188.02 | $12.40 |
| claude-sonnet-4-5 | — | $28.10 |
| claude-haiku-4-5 | — | $11.20 |
| gpt-4o | — | $7.90 |
Everyone else asks you to change the agent.
| DIMENSION | TURBO-FLOW ★ | SDK WRAPPER | APP PROXY | DASHBOARD |
|---|---|---|---|---|
| Integration | ✓ Zero | Wrap every call | Change every URL | Add middleware |
| Enforcement | ✓ Kernel drops packets | App-level (bypassable) | Proxy (bypassable) | Alert only |
| Prompts leave host | ✓ Never | Sometimes | Yes | Yes |
| Runtimes | ✓ Py · Node · Go · Ruby+ | One per SDK | Any (URL change) | Any (SDK change) |
| Retry dedup | ✓ SHA-hash at probe time | Manual | Counts as new | Post-hoc |
| Deploy | ✓ One binary + sudo | Rebuild agents | Network redesign | Instrument all |
This isn't a weekend project.
21 scenarios under sim-validate
8 agent personalities × fake Anthropic. Each asserts charges, retry dedup, direction flags.
Lifecycle leak guard
Start/SIGTERM × 3 in Lima. Asserts no clsact dup, no stale links, no bpf pin growth.
Struct layout tests
#[repr(C)] shared between eBPF & userspace. Size drift fails the build before verifier.
Response reconciliation
Uretprobe stashes the buffer on entry and reads it on return. The response's usage block replaces the estimate with the actual count.
Port-scoped enforcement
Classifier drops only dport == --target-port. SSH, metrics scrape — untouched.
Canonical cache keys
JSON key order / whitespace / per-caller metadata normalized. SDK diversity stops fragmenting cache.
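Canonical cache keys can be illustrated with a short Python sketch: parse the body, drop per-caller fields, re-serialize with sorted keys and fixed separators, then hash. The function name `cache_key` and the `VOLATILE_FIELDS` list are assumptions for illustration; the source does not say which fields the tool actually normalizes.

```python
import hashlib
import json

# Hypothetical per-caller fields to ignore; the real list is not given in the source.
VOLATILE_FIELDS = ("metadata", "user")

def cache_key(body: bytes) -> str:
    """Hash a JSON request body so that equivalent requests share one key."""
    payload = json.loads(body)
    for field in VOLATILE_FIELDS:
        payload.pop(field, None)          # strip per-caller metadata
    canonical = json.dumps(
        payload,
        sort_keys=True,                   # key order no longer matters
        separators=(",", ":"),            # whitespace no longer matters
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two SDKs that emit the same request with different key order, spacing, or caller IDs then map to the same cache entry, while a different prompt still produces a different key.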
Install in 30 seconds.
Delete in one.
Apache-2.0 · Rust · self-hosted · any Linux 5.15+