
Customer support bot

For chat agents that field hundreds of similar questions per day. Built around aggressive caching plus tight cost caps to keep per-conversation cost bounded.

What this pipeline is good at

The pipeline

PRXY_PIPE='exact-cache,semantic-cache,cost-guard,patterns'

To add PII redaction and model routing, put guardrails first and router last:

PRXY_PIPE='guardrails,exact-cache,semantic-cache,cost-guard,patterns,router'

Why this order

  1. Caches first: exact, then semantic. The most common questions are answered from the cache and never reach the provider.
  2. cost-guard after the caches: there is no point spending cap budget on a request the cache would have answered.
  3. patterns last in the base pipeline: it only matters for cache misses, where it injects context before the provider call.
  4. guardrails before everything (extended pipeline): PII is stripped before any other module sees the request.
  5. router after everything else (extended pipeline): it picks the cheapest model that can handle what is left after all the other optimizations.
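The ordering argument above can be illustrated with a toy model. This is a minimal sketch, not prxy.monster's implementation: the `Pipeline` class, its field names, the $0.02 call price, and the $0.50 per-user cap are all assumptions for illustration, and the semantic cache, patterns, guardrails, and router stages are left out. It shows only that a cache hit returns before the cost guard runs, so repeated questions do not consume cap budget.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    prompt: str


@dataclass
class Pipeline:
    """Toy model of the stage order (hypothetical, for illustration only)."""
    price_per_call: float = 0.02   # assumed provider cost per call, USD
    daily_cap: float = 0.50        # assumed per-user daily cap, USD
    cache: dict = field(default_factory=dict)
    spent: dict = field(default_factory=dict)
    provider_calls: int = 0

    def handle(self, req: Request) -> str:
        # 1. exact-cache: a hit returns immediately and costs nothing.
        if req.prompt in self.cache:
            return self.cache[req.prompt]
        # 2. cost-guard: only cache misses count against the cap.
        spent = self.spent.get(req.user, 0.0)
        if spent + self.price_per_call > self.daily_cap:
            return "blocked: daily cap reached"
        self.spent[req.user] = spent + self.price_per_call
        # 3. provider call (stubbed here), then store the answer for reuse.
        self.provider_calls += 1
        answer = f"answer to: {req.prompt}"
        self.cache[req.prompt] = answer
        return answer


pipe = Pipeline()
for _ in range(100):
    pipe.handle(Request(user="u1", prompt="How do I reset my password?"))

print(pipe.provider_calls)  # 1: the other 99 requests were cache hits
print(pipe.spent["u1"])     # 0.02
```

If the cost guard ran before the cache in this model, all 100 requests would be charged against the user's cap and most would be blocked, even though 99 of them cost nothing to serve.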

Cost math

For a support bot doing 10,000 conversations/month, ~3 turns each:

Without prxy.monster: 30,000 calls × $0.02 each = $600/mo
With this pipeline: 18,000 cache misses × $0.02 + 12,000 hits × $0 = $360/mo
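The comparison can be reproduced with a few lines of arithmetic. The sketch below uses the figures from the text; the 40% cache hit rate is not stated directly but is implied by 12,000 hits out of 30,000 calls. Variable names are illustrative.

```python
conversations_per_month = 10_000
turns_per_conversation = 3
price_per_call = 0.02   # USD per provider call, from the comparison
cache_hit_rate = 0.40   # implied: 12,000 hits out of 30,000 calls

total_calls = conversations_per_month * turns_per_conversation
hits = round(total_calls * cache_hit_rate)
misses = total_calls - hits

baseline = total_calls * price_per_call   # every call reaches the provider
with_pipeline = misses * price_per_call   # cache hits cost $0
savings = baseline - with_pipeline

print(f"calls: {total_calls}, hits: {hits}, misses: {misses}")
print(f"baseline: ${baseline:.2f}/mo")
print(f"with pipeline: ${with_pipeline:.2f}/mo")
print(f"savings: ${savings:.2f}/mo ({savings / baseline:.0%})")
```

Because hits are free, the saving scales directly with the hit rate: at 40% hits the bill drops by 40%. Plug in your own hit rate and per-call price to estimate a different workload.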

Plus cap protection: a single user stuck in a buggy loop at 10 calls/sec costs you at most $0.50 instead of $50.
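To get a feel for how quickly a runaway loop adds up, here is a rough calculation. It assumes the same $0.02 per call as the cost comparison and a $0.50 cap (the value used for `perDay` in the strict-redaction variant); the source does not say which price or time window the $50 figure is based on, so treat the timings as illustrative.

```python
CALLS_PER_SEC = 10
PRICE_CENTS = 2         # assumed: $0.02 per call, as in the cost comparison
CAP_CENTS = 50          # assumed: a $0.50 cap, like perDay: 0.50
UNCAPPED_CENTS = 5_000  # the $50 figure from the text


def calls_until(limit_cents: int) -> int:
    """Number of paid calls that fit under a spend limit."""
    return limit_cents // PRICE_CENTS


capped_calls = calls_until(CAP_CENTS)
uncapped_calls = calls_until(UNCAPPED_CENTS)

print(capped_calls, capped_calls / CALLS_PER_SEC)      # 25 2.5
print(uncapped_calls, uncapped_calls / CALLS_PER_SEC)  # 2500 250.0
```

Under these assumptions the cap trips after 25 paid calls, about 2.5 seconds into the loop, while an uncapped loop would reach $50 in a little over four minutes.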

Variants

Knowledge-base Q&A only (no per-user data):

pipeline:
  - semantic-cache:
      similarity: 0.85          # looser — KB answers tolerate more variance
      scope: 'global'
      ttlSeconds: 604800        # 7 days
  - cost-guard: { perRequest: 0.10 }

High-traffic with strict redaction (uses production modules):

pipeline:
  - guardrails:
      pii_redact: true
      custom_patterns: ['/sk-[a-zA-Z0-9]{32,}/']
  - exact-cache: { ttlSeconds: 3600 }
  - semantic-cache: { similarity: 0.90, scope: 'global' }
  - cost-guard: { perRequest: 0.05, perDay: 0.50 }
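To check what the `custom_patterns` entry above would catch, the same regular expression can be tried out locally. This is a sketch only: how the guardrails module applies patterns and what it substitutes are not described here, so the `redact` helper and the `[REDACTED]` placeholder are hypothetical.

```python
import re

# Same pattern as the custom_patterns entry, without the /.../ delimiters.
KEY_PATTERN = re.compile(r"sk-[a-zA-Z0-9]{32,}")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything that looks like an sk- style secret key."""
    return KEY_PATTERN.sub(placeholder, text)


# A fake key: "sk-" followed by 32 alphanumeric characters.
message = "My key is sk-" + "a1B2c3D4" * 4 + ", why is it rejected?"

print(redact(message))      # My key is [REDACTED], why is it rejected?
print(redact("sk-abc123"))  # sk-abc123 (too short to match {32,})
```

The `{32,}` quantifier means shorter `sk-` strings pass through untouched, so adjust the minimum length if the keys you need to catch are shorter.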

See also