cost-guard

Category: safety · Cloud + Local · Status: v1 — production

Estimates request cost from input tokens × pricing. If any limit would be breached, returns a 429-shaped error response without calling the provider. Tracks actual spend after each call.

What it does

You wake up to find a runaway script burned $5,000 over the weekend. cost-guard is the module that prevents that.

When to use it

✅ Production apps where you can’t rely on the provider’s rate limits alone
✅ Multi-tenant apps where each user has a daily budget
✅ Free tiers (cap users at $0.50/day so abusers can’t drain you)
✅ During chaotic dev sprints (one bad loop, no $500 surprise)

❌ Apps where users bring their own provider key (cost-guard caps the spend it tracks on your side, not what the provider bills them)

Configuration

cost-guard:
  perRequest: 0.10          # USD; reject single requests above this
  perDay: 5.00              # USD; per-user daily cap
  perMonth: 100.00          # USD; per-user monthly cap
  keyPrefix: 'cost'         # storage key namespace (rarely change)

All three limits are optional. Set only what you need. Setting none makes the module a no-op.

Metrics emitted

cost.estimated (USD; pre-hook)
cost.actual (USD; post-hook)
cost.day_spent (USD; running total)
cost.month_spent (USD; running total)

Examples

Tight per-request cap — block expensive prompts:

cost-guard:
  perRequest: 0.05

Free tier — give users $0.50/day to play with:

cost-guard:
  perDay: 0.50

Production app — multiple guards in concert:

cost-guard:
  perRequest: 0.50          # nothing absurd
  perDay: 25.00             # daily user budget
  perMonth: 500.00          # monthly user budget

What “exceeded” looks like

When a limit is breached, the client receives a 429 response with this shape:

{
  "type": "error",
  "error": {
    "type": "cost_limit_per_day",
    "message": "Daily cost cap exceeded",
    "limit": 5.00,
    "spent": 4.87,
    "estimated": 0.21,
    "resets_at": "2026-04-28T00:00:00.000Z"
  }
}

The resets_at field tells your client when to retry without burning a request.
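
A client can turn resets_at into a wait time before retrying. The following is a minimal sketch, assuming the response body has exactly the shape shown above; the names CostLimitBody and msUntilReset are illustrative and not part of cost-guard.

```typescript
// Shape of the 429 body shown above; the interface name is illustrative.
interface CostLimitBody {
  type: "error";
  error: {
    type: string;
    message: string;
    limit: number;
    spent: number;
    estimated: number;
    resets_at: string; // ISO 8601 timestamp
  };
}

// Milliseconds to wait before retrying. Returns 0 if the reset time has
// already passed or the timestamp cannot be parsed.
function msUntilReset(body: CostLimitBody, now: Date = new Date()): number {
  const resetAt = Date.parse(body.error.resets_at);
  if (Number.isNaN(resetAt)) return 0;
  return Math.max(0, resetAt - now.getTime());
}

const exampleBody: CostLimitBody = {
  type: "error",
  error: {
    type: "cost_limit_per_day",
    message: "Daily cost cap exceeded",
    limit: 5.0,
    spent: 4.87,
    estimated: 0.21,
    resets_at: "2026-04-28T00:00:00.000Z",
  },
};

// One hour before the reset, the client should wait 3,600,000 ms.
console.log(msUntilReset(exampleBody, new Date("2026-04-27T23:00:00.000Z")));
```

Passing the current time as a parameter keeps the function easy to test; a real client would call it with the default.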

How it works

Pre hook

Estimate request cost: inputTokens × pricing[model].
Attach metadata['cost.estimated'].
If perRequest is set and estimate > limit → short-circuit with 429.
If perDay is set: read today’s spend from KV; if spent + estimate > limit → short-circuit with 429.
If perMonth is set: same check against the month bucket.
Otherwise continue.
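
The pre-hook checks above can be sketched as a pure function. This is an illustration under assumptions, not the module's actual code: the type and function names are hypothetical, the spend totals are passed in rather than read from KV, and only cost_limit_per_day appears in this document (the per-request and per-month error types are guessed by analogy).

```typescript
// Hypothetical config type mirroring the YAML keys.
interface CostGuardConfig {
  perRequest?: number; // USD
  perDay?: number; // USD
  perMonth?: number; // USD
}

type Decision =
  | { allowed: true }
  | { allowed: false; errorType: string; limit: number; spent: number; estimated: number };

// Applies the three checks in the order listed above.
function checkLimits(
  cfg: CostGuardConfig,
  estimated: number,
  daySpent: number,
  monthSpent: number,
): Decision {
  if (cfg.perRequest !== undefined && estimated > cfg.perRequest) {
    // Error type name is an assumption.
    return { allowed: false, errorType: "cost_limit_per_request", limit: cfg.perRequest, spent: 0, estimated };
  }
  if (cfg.perDay !== undefined && daySpent + estimated > cfg.perDay) {
    return { allowed: false, errorType: "cost_limit_per_day", limit: cfg.perDay, spent: daySpent, estimated };
  }
  if (cfg.perMonth !== undefined && monthSpent + estimated > cfg.perMonth) {
    // Error type name is an assumption.
    return { allowed: false, errorType: "cost_limit_per_month", limit: cfg.perMonth, spent: monthSpent, estimated };
  }
  // No limit configured, or none breached: continue to the provider.
  return { allowed: true };
}

// The numbers from the sample 429 body: 4.87 spent + 0.21 estimated > 5.00.
const sampleDecision = checkLimits({ perDay: 5.0 }, 0.21, 4.87, 4.87);
console.log(sampleDecision.allowed); // false
```

With no limits set, every branch is skipped and the function always allows the request, which matches the no-op behavior described under Configuration.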

Post hook

Skip on error responses.
Calculate actual cost from response usage, pricing each token count at the model’s rate (input_tokens + cached_input_tokens × cache_rate + output_tokens).
Increment the day + month buckets in KV.
Day buckets expire (TTL) after 48h. Month buckets expire after 35 days.
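
A minimal sketch of the post-hook arithmetic follows. Several details are assumptions made for illustration: rates are expressed in USD per million tokens, cached input tokens are billed at the input rate times cache_rate, buckets are keyed by UTC date (suggested by the midnight-UTC resets_at in the sample response), and the key layout and all names are hypothetical.

```typescript
// Usage fields named as in the formula above.
interface Usage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Hypothetical pricing shape: USD per million tokens.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheRate: number; // multiplier applied to the input rate for cached tokens
}

// Actual cost in USD for one response.
function actualCostUsd(usage: Usage, rates: Rates): number {
  const input = usage.input_tokens * rates.inputPerMTok;
  const cached = usage.cached_input_tokens * rates.inputPerMTok * rates.cacheRate;
  const output = usage.output_tokens * rates.outputPerMTok;
  return (input + cached + output) / 1_000_000;
}

// TTLs from the text: 48 hours and 35 days, in seconds.
const DAY_BUCKET_TTL_S = 48 * 60 * 60;
const MONTH_BUCKET_TTL_S = 35 * 24 * 60 * 60;

// Hypothetical key layout: <prefix>:<user>:day:YYYY-MM-DD and :month:YYYY-MM (UTC).
function bucketKeys(keyPrefix: string, userId: string, now: Date): { day: string; month: string } {
  const iso = now.toISOString(); // always UTC, e.g. 2026-04-27T15:30:00.000Z
  return {
    day: `${keyPrefix}:${userId}:day:${iso.slice(0, 10)}`,
    month: `${keyPrefix}:${userId}:month:${iso.slice(0, 7)}`,
  };
}

const sampleCost = actualCostUsd(
  { input_tokens: 1000, cached_input_tokens: 2000, output_tokens: 500 },
  { inputPerMTok: 3, outputPerMTok: 15, cacheRate: 0.1 },
);
console.log(sampleCost.toFixed(4)); // 0.0111
console.log(bucketKeys("cost", "u1", new Date("2026-04-27T15:30:00.000Z")));
```

The rates in the usage example are placeholder numbers, not real provider pricing.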

cost-guard checks estimated spend in the pre hook but records actual spend only in the post hook. This leaves a race window: a burst of concurrent requests can each pass the check before any of them increments the counter. In practice the overage is bounded by your concurrency level, since it cannot exceed the combined cost of the requests in flight at once. An atomic check-and-increment path is planned.
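
To make the race window concrete, the sketch below models the worst-case interleaving as two phases: every request in a burst runs its check against the same stale total, and only afterwards do the increments land. The function name and the numbers are illustrative, and for simplicity each request is assumed to cost exactly its estimate.

```typescript
// Worst-case burst: all checks happen before any increment.
function simulateBurst(
  limit: number,
  spent: number,
  estimates: number[],
): { admitted: number; finalSpent: number } {
  // Phase 1: each pre-hook check sees the same stale `spent` value.
  const admitted = estimates.filter((e) => spent + e <= limit);
  // Phase 2: post hooks record the spend of every admitted request.
  const finalSpent = admitted.reduce((sum, e) => sum + e, spent);
  return { admitted: admitted.length, finalSpent };
}

// Ten concurrent $0.05 requests against a $5.00 cap with $4.90 already spent.
const burst = simulateBurst(5.0, 4.9, Array(10).fill(0.05));
console.log(burst.admitted); // 10: every request passes the stale check
console.log(burst.finalSpent.toFixed(2)); // 5.40: $0.40 over the cap
```

The overage here is less than the total cost of the ten in-flight requests ($0.50), which is the bound described above.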

Cloud vs Local

Mode	Pricing source
Cloud	Static pricing map keyed by model prefix (updated periodically).
Local	Same — included in the binary.

The public local pricing map is in src/lib/cost.ts. Submit a PR against the local edition if you spot stale pricing.
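
For orientation, a pricing map keyed by model prefix could be looked up as in the sketch below. This is not the contents of src/lib/cost.ts: the model names and rates are placeholders, the per-million-token unit is an assumption, and choosing the longest matching prefix is one plausible resolution rule rather than the confirmed one.

```typescript
// Hypothetical entry shape: USD per million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder entries; not real models or real prices.
const EXAMPLE_PRICING: Record<string, ModelPricing> = {
  "example": { inputPerMTok: 1, outputPerMTok: 2 },
  "example-large": { inputPerMTok: 10, outputPerMTok: 30 },
};

// Returns the entry whose key is the longest prefix of `model`, if any.
function lookupPricing(
  model: string,
  table: Record<string, ModelPricing> = EXAMPLE_PRICING,
): ModelPricing | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(table)) {
    if (model.startsWith(prefix) && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best === undefined ? undefined : table[best];
}

// Pre-hook style estimate: input tokens × input rate. Returns undefined for
// unknown models; how the real module treats that case is not stated here.
function estimateCostUsd(model: string, inputTokens: number): number | undefined {
  const pricing = lookupPricing(model);
  return pricing === undefined ? undefined : (inputTokens * pricing.inputPerMTok) / 1_000_000;
}

console.log(estimateCostUsd("example-large-2026", 20000)); // 0.2
console.log(estimateCostUsd("unknown-model", 20000)); // undefined
```

Longest-prefix matching lets a specific entry such as "example-large" override a broader family entry without requiring an exact model-version match.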

Source

src/modules/cost-guard.ts