mcp-optimizer
Category: optimization · Cloud + Local · Status: v1 — production
Embeds each tool’s name and description, scores each one against the current user message, and drops the tools that score below the threshold. The kept set is stable per session, so provider prompt caches don’t shatter.
What it does
MCP-using agents often ship broad tool catalogs on every request, even when the user is asking about one narrow task. mcp-optimizer keeps the relevant subset.
Latest local benchmark fixture:
Before: 120 tools, 13,265 tool-definition tokens
After: 52-103 tools, depending on task scenario
Range: 13.8% to 53.0% token reduction
Mean: 33.4% token reduction
Benchmark numbers are fixture results, not production averages. Run pnpm --filter @prxy/benchmarks bench locally or prxy bench --remote against your own authenticated endpoint.
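To put the percentages in absolute terms, the arithmetic below converts the reported reductions into token counts against the 13,265-token baseline. It is only a convenience calculation on the fixture numbers quoted above; the function and variable names are made up for this illustration.

```typescript
// Fixture numbers quoted above: baseline size and reported fractional reductions.
const baselineTokens = 13265;
const scenarios = [
  { label: "min", reduction: 0.138 },
  { label: "mean", reduction: 0.334 },
  { label: "max", reduction: 0.53 },
];

// Tokens saved for a given fractional reduction, rounded to whole tokens.
function tokensSaved(baseline: number, reduction: number): number {
  return Math.round(baseline * reduction);
}

for (const { label, reduction } of scenarios) {
  const saved = tokensSaved(baselineTokens, reduction);
  console.log(`${label}: ~${saved} tokens saved, ~${baselineTokens - saved} remaining`);
}
```

On this fixture that works out to roughly 1,831 tokens saved at the low end, 4,431 at the mean, and 7,030 at the high end.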
When to use it
- ✅ Any agent that uses MCP tools
- ✅ Claude Code, Cline, Continue.dev, custom MCP clients
- ✅ Multi-server MCP setups (filesystem + GitHub + Slack + …)
- ❌ Apps that don’t use tools
- ❌ Apps where every tool is always relevant (rare)
Configuration
mcp-optimizer:
  relevanceThreshold: 0.6          # 0.0 - 1.0; lower = keep more tools
  preserveTools: []                # always keep these (by name)
  embeddingModel: 'voyage-3-lite'  # or 'text-embedding-3-small'
  minTools: 1                      # never drop below this many
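For readers wiring this into typed code, here is a minimal sketch of how the four keys might be modeled and validated. The interface, the `resolveConfig` helper, and the choice to treat the values shown above as defaults are all assumptions made for illustration; they are not the module's actual types or API.

```typescript
// Hypothetical shape mirroring the YAML keys above (not the module's real types).
interface McpOptimizerConfig {
  relevanceThreshold: number; // 0.0 - 1.0; lower keeps more tools
  preserveTools: string[];    // tool names that are never dropped
  embeddingModel: string;     // e.g. 'voyage-3-lite' or 'text-embedding-3-small'
  minTools: number;           // floor on the size of the kept set
}

// Assumption: the values shown in the configuration block act as defaults.
const DEFAULT_CONFIG: McpOptimizerConfig = {
  relevanceThreshold: 0.6,
  preserveTools: [],
  embeddingModel: "voyage-3-lite",
  minTools: 1,
};

// Merge user overrides onto the defaults and reject out-of-range values.
function resolveConfig(overrides: Partial<McpOptimizerConfig> = {}): McpOptimizerConfig {
  const config = { ...DEFAULT_CONFIG, ...overrides };
  if (!(config.relevanceThreshold >= 0 && config.relevanceThreshold <= 1)) {
    throw new RangeError("relevanceThreshold must be between 0.0 and 1.0");
  }
  if (!Number.isInteger(config.minTools) || config.minTools < 0) {
    throw new RangeError("minTools must be a non-negative integer");
  }
  return config;
}

console.log(resolveConfig({ relevanceThreshold: 0.4 }));
```

Validating the threshold range up front keeps a typo such as `6` instead of `0.6` from silently dropping every tool.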
Metrics emitted
- mcp-optimizer.tools.before (number)
- mcp-optimizer.tools.after (number)
- mcp-optimizer.tokens.saved (number)
- mcp-optimizer.duration_ms (number)
Examples
Conservative — keep most tools, drop only obvious mismatches:
mcp-optimizer:
  relevanceThreshold: 0.4
Aggressive — strip hard, save tokens:
mcp-optimizer:
  relevanceThreshold: 0.75
  preserveTools: ['read_file', 'write_file', 'bash']
Coding-assistant tuned — keep file ops always, drop the rest by relevance:
mcp-optimizer:
  relevanceThreshold: 0.6
  preserveTools:
    - read_file
    - write_file
    - bash
    - grep
    - glob
How it works
- Pre hook: extract the user’s last message text.
- Embed the user message.
- For each tool in request.tools: embed ${name}: ${description}. (Cached by tool hash: the first request pays the cost, subsequent ones hit the cache.)
- Compute cosine similarity. Keep tools above threshold + any in preserveTools. Always keep at least minTools.
- Replace request.tools with the kept subset.
- Attach metadata['mcp-optimizer.tokens.saved'] for downstream visibility.
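The scoring and selection steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's source: embeddings are passed in as precomputed vectors, `selectTools` and the `Tool` type are hypothetical names, a score equal to the threshold counts as kept, and the `minTools` floor is met by topping up with the highest-scoring dropped tools.

```typescript
interface Tool { name: string; description: string }

// Standard cosine similarity; returns 0 when either vector has zero length or norm.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y; normA += x * x; normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical helper: keep preserved tools and tools at or above the threshold,
// then top up by score until at least minTools are kept.
function selectTools(
  tools: Tool[],
  toolVectors: Map<string, number[]>, // assumed precomputed, keyed by tool name
  queryVector: number[],
  opts: { relevanceThreshold: number; preserveTools: string[]; minTools: number },
): Tool[] {
  const preserved = new Set(opts.preserveTools);
  const scored = tools.map((tool) => ({
    tool,
    score: cosineSimilarity(toolVectors.get(tool.name) ?? [], queryVector),
  }));
  const kept = new Set<string>();
  for (const { tool, score } of scored) {
    if (preserved.has(tool.name) || score >= opts.relevanceThreshold) kept.add(tool.name);
  }
  const byScore = [...scored].sort((a, b) => b.score - a.score);
  for (const { tool } of byScore) {
    if (kept.size >= opts.minTools) break;
    kept.add(tool.name);
  }
  // Filter the original array so the kept tools retain their request order.
  return tools.filter((t) => kept.has(t.name));
}

// Toy 2-dimensional vectors, chosen only to make the scores easy to read.
const exampleTools: Tool[] = [
  { name: "read_file", description: "Read a file from disk" },
  { name: "send_slack", description: "Post a Slack message" },
  { name: "search_github", description: "Search GitHub code" },
];
const exampleVectors = new Map<string, number[]>([
  ["read_file", [1, 0]],
  ["send_slack", [0, 1]],
  ["search_github", [0.8, 0.6]],
]);
const keptExample = selectTools(exampleTools, exampleVectors, [1, 0], {
  relevanceThreshold: 0.6, preserveTools: [], minTools: 1,
});
console.log(keptExample.map((t) => t.name));
```

Returning the kept tools in their original request order, rather than sorted by score, is a deliberate choice in this sketch: a reordered tool list would change the prompt prefix even when the kept set is identical.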
The kept subset is stable per session. We hash the input set + threshold + user message into a session key, so the same cache prefix lands at Anthropic on the next turn. This is critical for prompt cache hit rates.
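One way to picture the session key is as a digest over a canonical form of those three inputs, with the kept subset memoized under it. The sketch below assumes SHA-256 via Node's `node:crypto`, identifies the input set by tool name, and sorts the names so ordering does not affect the key. The text above does not specify any of these details, and `sessionKey` and `keptSubset` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: derive a key from the tool set, the threshold, and the user message.
// Assumption: tool names are sorted so the key is insensitive to input order.
function sessionKey(toolNames: string[], threshold: number, userMessage: string): string {
  const canonical = JSON.stringify({
    tools: [...toolNames].sort(),
    threshold,
    userMessage,
  });
  return createHash("sha256").update(canonical).digest("hex");
}

// In-memory memo of kept tool names per key (a real deployment may use another store).
const keptBySession = new Map<string, string[]>();

// Return the cached subset for this key, computing it only on the first call.
function keptSubset(key: string, compute: () => string[]): string[] {
  const cached = keptBySession.get(key);
  if (cached !== undefined) return cached;
  const fresh = compute();
  keptBySession.set(key, fresh);
  return fresh;
}

const demoKey = sessionKey(["write_file", "read_file"], 0.6, "open the README");
console.log(keptSubset(demoKey, () => ["read_file"]));
```

Because the second lookup returns the stored list rather than recomputing it, the tool definitions sent upstream stay byte-identical for the same key.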
Cloud vs Local
| Mode | Embedding backend |
|---|---|
| Cloud | Voyage AI (configurable) — falls back to deterministic stub if no key |
| Local | Same — uses your VOYAGE_API_KEY if set, else stub |
The stub hashes trigrams with SHA-256 and projects the result to a 256-dimensional vector. Quality is poor but stable, so caches behave deterministically in tests.
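A deterministic stub along these lines could look like the sketch below. Only "SHA-256 of trigrams" and "256 dimensions" come from the description above. Everything else is an assumption for illustration: character (not word) trigrams, lowercasing, a feature-hashing projection that uses the first digest byte as the bucket and one bit of the second byte as the sign, and L2 normalization. `stubEmbed` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

const DIMENSIONS = 256;

// Hypothetical deterministic embedding: hash each character trigram with SHA-256
// and accumulate a signed count into one of 256 buckets (the hashing trick).
function stubEmbed(text: string): number[] {
  const vector = new Array<number>(DIMENSIONS).fill(0);
  const normalized = text.toLowerCase();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const trigram = normalized.slice(i, i + 3);
    const digest = createHash("sha256").update(trigram).digest();
    const bucket = digest[0] ?? 0;                       // first byte: 0..255
    const sign = ((digest[1] ?? 0) & 1) === 0 ? 1 : -1;  // low bit of second byte
    vector[bucket] = (vector[bucket] ?? 0) + sign;
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

const stubVector = stubEmbed("read_file: Read a file from disk");
console.log(stubVector.length);
```

The same input always yields the same vector, which is the property that matters for tests; nothing about this projection captures meaning beyond shared character sequences.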