cost-guard
Category: safety · Cloud + Local · Status: v1 — production
Estimates request cost from input tokens × pricing. If any limit would be breached, returns a 429-shaped error response without calling the provider. Tracks actual spend after each call.
What it does
You wake up to find a runaway script burned $5,000 over the weekend. cost-guard is the module that prevents that.
When to use it
✅ Production apps where you can’t ride blind on the provider’s rate limits
✅ Multi-tenant apps where each user has a daily budget
✅ Free tiers (cap users at $0.50/day so abusers can’t drain you)
✅ During chaotic dev sprints (one bad loop, no $500 surprise)
❌ Apps where users bring their own provider key (cost-guard caps by your tracking, not their bill)
Configuration
cost-guard:
  perRequest: 0.10    # USD; reject single requests above this
  perDay: 5.00        # USD; per-user daily cap
  perMonth: 100.00    # USD; per-user monthly cap
  keyPrefix: 'cost'   # storage key namespace (rarely change)
All three limits are optional. Set only what you need. Setting none makes the module a no-op.
Metrics emitted
- cost.estimated (USD; pre-hook)
- cost.actual (USD; post-hook)
- cost.day_spent (USD; running total)
- cost.month_spent (USD; running total)
Examples
Tight per-request cap — block expensive prompts:
cost-guard:
  perRequest: 0.05
Free tier — give users $0.50/day to play with:
cost-guard:
  perDay: 0.50
Production app — multiple guards in concert:
cost-guard:
  perRequest: 0.50   # nothing absurd
  perDay: 25.00      # daily user budget
  perMonth: 500.00   # monthly user budget
What “exceeded” looks like
When a limit is breached, the client receives a 429 response with this shape:
{
  "type": "error",
  "error": {
    "type": "cost_limit_per_day",
    "message": "Daily cost cap exceeded",
    "limit": 5.00,
    "spent": 4.87,
    "estimated": 0.21,
    "resets_at": "2026-04-28T00:00:00.000Z"
  }
}
The resets_at field tells your client when to retry without burning a request.
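A client can turn resets_at into a wait time before its next attempt. The sketch below assumes the response body has exactly the shape shown above; the names CostLimitError and msUntilReset are illustrative and not part of cost-guard.

```typescript
// Shape of the error body shown above. Field names come from the example;
// the interface name is illustrative.
interface CostLimitError {
  type: "error";
  error: {
    type: string;
    message: string;
    limit: number;
    spent: number;
    estimated: number;
    resets_at: string; // ISO 8601 timestamp
  };
}

// Returns how long to wait (in milliseconds) before retrying; never negative.
function msUntilReset(body: CostLimitError, now: Date = new Date()): number {
  const resetsAt = Date.parse(body.error.resets_at);
  if (Number.isNaN(resetsAt)) return 0; // unparseable timestamp: let the caller decide
  return Math.max(0, resetsAt - now.getTime());
}

const exampleBody: CostLimitError = {
  type: "error",
  error: {
    type: "cost_limit_per_day",
    message: "Daily cost cap exceeded",
    limit: 5.0,
    spent: 4.87,
    estimated: 0.21,
    resets_at: "2026-04-28T00:00:00.000Z",
  },
};

// One hour before the reset, the client should wait one hour.
console.log(msUntilReset(exampleBody, new Date("2026-04-27T23:00:00.000Z"))); // 3600000
```

Scheduling the retry from this value, rather than polling, avoids sending requests that would be rejected anyway.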
How it works
Pre hook
- Estimate request cost: inputTokens × pricing[model].
- Attach metadata['cost.estimated'].
- If perRequest set and estimate > limit → short-circuit with 429.
- If perDay set: read today’s spend from KV, check today + estimated > limit → short-circuit if exceeded.
- If perMonth set: same with the month bucket.
- Otherwise continue.
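The pre-hook steps above can be sketched as a pure function. This is a minimal illustration, not the module's actual code: the Limits, Spend, and Verdict types and the preCheck signature are hypothetical, spend values are passed in rather than read from KV, and the error type names other than cost_limit_per_day are guessed by analogy.

```typescript
// Hypothetical shapes: this page does not show the module's real hook signature.
interface Limits {
  perRequest?: number; // USD
  perDay?: number; // USD
  perMonth?: number; // USD
}

interface Spend {
  day: number; // USD spent today, as read from KV
  month: number; // USD spent this month, as read from KV
}

type Verdict =
  | { ok: true; estimated: number }
  | { ok: false; status: 429; errorType: string; limit: number; spent: number; estimated: number };

function reject(errorType: string, limit: number, spent: number, estimated: number): Verdict {
  return { ok: false, status: 429, errorType, limit, spent, estimated };
}

function preCheck(inputTokens: number, pricePerToken: number, limits: Limits, spend: Spend): Verdict {
  const estimated = inputTokens * pricePerToken;

  // Checks run in the order listed above; the first breach short-circuits.
  if (limits.perRequest !== undefined && estimated > limits.perRequest) {
    return reject("cost_limit_per_request", limits.perRequest, 0, estimated); // name assumed
  }
  if (limits.perDay !== undefined && spend.day + estimated > limits.perDay) {
    return reject("cost_limit_per_day", limits.perDay, spend.day, estimated);
  }
  if (limits.perMonth !== undefined && spend.month + estimated > limits.perMonth) {
    return reject("cost_limit_per_month", limits.perMonth, spend.month, estimated); // name assumed
  }
  return { ok: true, estimated }; // no limit set, or none breached
}

// Mirrors the error example: $4.87 spent, roughly $0.21 estimated, $5.00 daily cap.
const verdict = preCheck(70000, 0.000003, { perDay: 5.0 }, { day: 4.87, month: 40 });
console.log(verdict.ok); // false
```

With an empty Limits object every check is skipped, which matches the documented no-op behavior.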
Post hook
- Skip on error responses.
- Calculate actual cost from response usage (input_tokens + cached_input_tokens × cache_rate + output_tokens).
- Increment the day + month buckets in KV.
- Day buckets TTL after 48h. Month buckets TTL after 35 days.
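The post-hook bookkeeping might look roughly like the sketch below. Several details are assumptions: each token count is priced at a per-token USD rate, cache_rate is treated as a multiplier on the input price, the bucket key layout is invented, buckets are keyed by UTC date, and a Map with expiry timestamps stands in for the real KV store. Only the 48-hour and 35-day TTLs come from the list above.

```typescript
interface Usage { input_tokens: number; cached_input_tokens: number; output_tokens: number }
interface Price { input: number; output: number; cacheRate: number } // USD per token; cacheRate scales the input price (assumed)
interface Entry { value: number; expiresAt: number } // stand-in for a KV record with a TTL

const DAY_TTL_MS = 48 * 60 * 60 * 1000; // day buckets expire after 48h
const MONTH_TTL_MS = 35 * 24 * 60 * 60 * 1000; // month buckets expire after 35 days

function actualCost(u: Usage, p: Price): number {
  return (
    u.input_tokens * p.input +
    u.cached_input_tokens * p.input * p.cacheRate +
    u.output_tokens * p.output
  );
}

// Hypothetical key layout: <prefix>:<user>:day:YYYY-MM-DD and <prefix>:<user>:month:YYYY-MM (UTC).
function bucketKeys(prefix: string, userId: string, now: Date): { day: string; month: string } {
  const iso = now.toISOString();
  return {
    day: `${prefix}:${userId}:day:${iso.slice(0, 10)}`,
    month: `${prefix}:${userId}:month:${iso.slice(0, 7)}`,
  };
}

function recordSpend(kv: Map<string, Entry>, prefix: string, userId: string, cost: number, now: Date): void {
  const keys = bucketKeys(prefix, userId, now);
  const t = now.getTime();
  const buckets: Array<[string, number]> = [[keys.day, DAY_TTL_MS], [keys.month, MONTH_TTL_MS]];
  for (const [key, ttl] of buckets) {
    const prev = kv.get(key);
    const base = prev !== undefined && prev.expiresAt > t ? prev.value : 0; // expired entries count as empty
    kv.set(key, { value: base + cost, expiresAt: t + ttl }); // refreshing the TTL on write is an assumption
  }
}

const kv = new Map<string, Entry>();
const cost = actualCost(
  { input_tokens: 1000, cached_input_tokens: 500, output_tokens: 200 },
  { input: 0.001, output: 0.002, cacheRate: 0.5 },
);
recordSpend(kv, "cost", "user-1", cost, new Date("2026-04-27T12:00:00.000Z"));
console.log(cost.toFixed(2)); // "1.65"
```

Note that this read-then-write increment is not atomic, so it has the same race window as the pre check.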
cost-guard checks limits against the estimate in the pre hook and records actual spend in the post hook. Because the check and the increment are separate steps, there is a race window: a burst of concurrent requests can all pass the check before any of them increments the counter. In practice the overage is bounded by your concurrency level, that is, by the number of requests in flight at once times their cost. An atomic check-and-increment path is planned.
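To make the race concrete, the following sketch models the worst case, where every request in a burst reads the same stale counter before any of them writes back. It is a toy simulation with hypothetical names (simulateBurst, BurstResult), not a description of the module's internals, and it assumes every request costs the same amount.

```typescript
interface BurstResult { admitted: number; finalSpend: number; overage: number }

// Worst case: all `concurrent` requests run the pre check against the same
// `spent` value, then all of them record their cost afterwards.
function simulateBurst(limit: number, spent: number, costPerRequest: number, concurrent: number): BurstResult {
  let admitted = 0;
  for (let i = 0; i < concurrent; i++) {
    // Each check sees the stale `spent`, not the running total.
    if (!(spent + costPerRequest > limit)) admitted++;
  }
  const finalSpend = spent + admitted * costPerRequest;
  return { admitted, finalSpend, overage: Math.max(0, finalSpend - limit) };
}

// $5.00 daily cap, $4.80 already spent, ten concurrent requests at $0.15 each.
const burst = simulateBurst(5.0, 4.8, 0.15, 10);
console.log(burst.admitted); // 10
console.log(burst.overage.toFixed(2)); // "1.30"
```

Here all ten requests pass because each one alone fits under the cap, and the overage stays below ten times the per-request cost. Setting perRequest tightens that bound, since it caps the cost any single in-flight request can add.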
Cloud vs Local
| Mode | Pricing source |
|---|---|
| Cloud | Static pricing map keyed by model prefix (updated periodically). |
| Local | Same — included in the binary. |
The public local pricing map is in src/lib/cost.ts. Submit a PR against the local edition if you spot stale pricing.
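A pricing map keyed by model prefix could be consulted along these lines. This is a sketch under assumptions: the model names and prices are made up, and resolving overlapping prefixes by longest match is a guess; check src/lib/cost.ts for the real behavior.

```typescript
// Hypothetical entries: USD per input token, keyed by model-name prefix.
const EXAMPLE_PRICING: Record<string, number> = {
  "example-small": 0.000001,
  "example-large": 0.00001,
  "example-large-pro": 0.00003,
};

// Returns the price for the longest prefix that matches `model`,
// or undefined when no prefix matches.
function priceFor(model: string, pricing: Record<string, number>): number | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(pricing)) {
    if (model.startsWith(prefix) && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best === undefined ? undefined : pricing[best];
}

console.log(priceFor("example-large-pro-2026", EXAMPLE_PRICING)); // 0.00003
console.log(priceFor("unknown-model", EXAMPLE_PRICING)); // undefined
```

Prefix keys let one entry cover dated or versioned variants of a model, which is presumably why the map needs only periodic updates.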