FAQ
What is prxy.monster?
A composable AI gateway. You set one env var (ANTHROPIC_BASE_URL or OPENAI_BASE_URL), and your LLM calls flow through a configurable pipeline of middleware before hitting the provider. Modules handle caching, cost limits, prompt optimization, persistent learning.
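For example, with the OpenAI Node SDK the only change is the base URL; the rest of your code stays as-is. This is a minimal sketch: the gateway URL comes from your account setup, and the model name is illustrative.

```ts
// Minimal sketch: point the OpenAI SDK at the gateway via OPENAI_BASE_URL.
// The gateway URL comes from your prxy.monster account; the model is illustrative.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,   // BYOK: your own provider key
  baseURL: process.env.OPENAI_BASE_URL, // set to your prxy.monster gateway URL
});

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello through the gateway" }],
});

console.log(res.choices[0].message.content);
```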
How is it different from OpenRouter / Portkey / Helicone / LiteLLM?
See Migration for detailed comparisons. Short version: prxy.monster’s emphasis is composable modules + local mode + persistent memory. Other gateways are stronger on specific axes (OpenRouter on routing, Helicone on observability) — we trade some breadth in those for depth elsewhere.
Do you store my prompts?
Cloud mode: prompts pass through and aren’t logged in plaintext. The patterns module saves successful “fix X with Y” snippets to your private pattern store. The semantic-cache module saves embeddings + responses for cache lookups, scoped to your user by default. Account-scoped data can be deleted through support while the hosted console is rolling out.
Local mode: everything stays on your machine. Nothing is sent to us at all.
See Local mode → Privacy for full details.
Do I need to use your provider keys?
No. Bring your own key (BYOK) is the default. You pay your provider for tokens. We charge a flat tier for the gateway. No markup.
In v1, pass provider keys per-request or configure them at the account layer through support. v1.1 adds optional hosted-key tiers for users who’d rather not manage keys at all.
Does the gateway add latency?
A few milliseconds for the proxy itself. Each pipeline module adds its own overhead:
- `exact-cache` hit: net negative latency (ms instead of a ~1s provider call).
- `semantic-cache` lookup: ~10–30ms when a key is set; instant on hit.
- `mcp-optimizer`: ~20–50ms for embedding (cached after the first request).
- `cost-guard`: under 1ms (KV lookup).
- `patterns` injection: ~10–20ms (vector search).
For most workloads the modules save more time than they cost.
Can I run my own modules?
Yes. See SDK → Module interface. Modules are TypeScript objects implementing a six-field interface.
In v1, custom modules load from a local file path. v1.1 adds npm-package loading.
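As a rough, hypothetical sketch (the real field names are defined in SDK → Module interface, not here), a custom module could look something like this:

```ts
// Hypothetical sketch only -- the actual six-field interface is documented in
// SDK → Module interface. Type and field names below are illustrative.
type CanonicalRequest = { model: string; messages: unknown[]; metadata?: Record<string, string> };
type CanonicalResponse = { content: unknown; usage?: { inputTokens: number; outputTokens: number } };

interface PipelineModule {
  name: string;                                                 // unique module id
  version: string;                                              // module version
  config?: Record<string, unknown>;                             // per-module settings
  init?: () => Promise<void>;                                   // one-time setup
  pre?: (req: CanonicalRequest) => Promise<CanonicalRequest>;   // runs before the provider call
  post?: (req: CanonicalRequest, res: CanonicalResponse) => Promise<CanonicalResponse>; // runs after it
}

// Example: a tiny custom module that tags every request's metadata.
const teamTagger: PipelineModule = {
  name: "team-tagger",
  version: "0.1.0",
  pre: async (req) => ({ ...req, metadata: { ...req.metadata, team: "platform" } }),
};

export default teamTagger;
```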
What providers are supported?
In v1: Anthropic, OpenAI, Google (Gemini), Groq.
The router module (v1.1) makes adding more straightforward — implement a ProviderClient interface and the router can route to it.
Does it work with streaming?
Yes. SSE streams pass through. Cache hits on streaming requests are replayed as synthetic streams (your client cannot tell the difference).
In v1, post-hooks are skipped on streaming responses (caches still write, via stream accumulation). Full post-hook streaming support lands in v1.1.
Does it work with tool use / function calling?
Yes. Tools are part of the canonical request shape. mcp-optimizer operates on them directly. Tool calls + tool results round-trip cleanly through both Anthropic and OpenAI shapes.
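A hedged sketch with the Anthropic SDK, assuming ANTHROPIC_BASE_URL points at the gateway; the model name and tool definition are illustrative:

```ts
// Sketch: tool use through the gateway, Anthropic shape.
// Model name and tool definition are illustrative examples.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ baseURL: process.env.ANTHROPIC_BASE_URL });

const msg = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What's the weather in Berlin?" }],
  tools: [
    {
      name: "get_weather",
      description: "Look up current weather for a city",
      input_schema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

// Tool calls come back as normal tool_use content blocks.
console.log(msg.content.filter((block) => block.type === "tool_use"));
```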
Can I A/B test pipelines?
Yes. Two ways:
- Per-request override: set the `x-prxy-pipe` header to override the pipeline for one call only.
- Multiple keys: give the variant key a different `pipelineConfig`.
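A sketch of the header approach: the `x-prxy-pipe` header name comes from above, but the pipeline identifier `experiment-no-cache` is a made-up example (check the docs for the exact value format):

```ts
// Sketch: split traffic between the default pipeline and a variant pipeline.
// "experiment-no-cache" is a hypothetical pipeline identifier.
import OpenAI from "openai";

const control = new OpenAI({ baseURL: process.env.OPENAI_BASE_URL });
const variant = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL,
  defaultHeaders: { "x-prxy-pipe": "experiment-no-cache" },
});

// Route ~50% of requests through the variant pipeline.
const client = Math.random() < 0.5 ? variant : control;
const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});
```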
Is there a dashboard?
Cloud mode is API-first today: checkout, key provisioning, billing, and usage enforcement are live. The hosted web console ships next; until then, your first key arrives by email after checkout and account changes go through the key/billing APIs or support.
Local mode: no built-in dashboard in v1. Query local storage directly or wire your own. v1.1 ships an open-source dashboard package you can run alongside the local container.
What’s on the roadmap?
See Changelog for shipped + planned. Highlights:
- v1.1: `router`, `rehydrator`, `compaction-bridge`, `prompt-optimizer`, `tool-cache`, and `guardrails` modules.
- v1.2: hybrid local-cloud sync, npm module registry, `evals` module.
- v2.0: Q-learning router, `collective` patterns (opt-in cross-user memory).
Where do I report bugs / request features?
Email [email protected]. For open-source local edition issues, use `prxy-monster-local`.