mcp-optimizer

Category: optimization · Cloud + Local · Status: v1 — production

Embeds each tool’s name and description, scores each one against the current user message, and drops the tools that score below the threshold. The kept set is stable per session, so provider prompt caches stay intact.

What it does

MCP-using agents often ship broad tool catalogs on every request, even when the user is asking about one narrow task. mcp-optimizer keeps the relevant subset.

Latest local benchmark fixture:

Before: 120 tools, 13,265 tool-definition tokens
After:  52-103 tools, depending on task scenario
Range:  13.8% to 53.0% token reduction
Mean:   33.4% token reduction

Benchmark numbers are fixture results, not production averages. Run pnpm --filter @prxy/benchmarks bench locally or prxy bench --remote against your own authenticated endpoint.
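
To make the percentages concrete, the sketch below converts them into remaining token counts. It assumes the reductions apply to the 13,265 tool-definition tokens of the fixture; the function and variable names are illustrative and not part of the benchmark suite.

```typescript
// Fixture figures copied from the benchmark summary above.
const tokensBefore = 13_265;
const reductions = { min: 0.138, mean: 0.334, max: 0.53 };

// Tokens remaining after a fractional reduction, rounded to whole tokens.
function tokensAfter(before: number, reduction: number): number {
  return Math.round(before * (1 - reduction));
}

const remaining = {
  min: tokensAfter(tokensBefore, reductions.min),   // smallest saving
  mean: tokensAfter(tokensBefore, reductions.mean), // average saving
  max: tokensAfter(tokensBefore, reductions.max),   // largest saving
};

console.log(remaining); // { min: 11434, mean: 8834, max: 6235 }
```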

When to use it

✅ Any agent that uses MCP tools
✅ Claude Code, Cline, Continue.dev, custom MCP clients
✅ Multi-server MCP setups (filesystem + GitHub + Slack + …)

❌ Apps that don’t use tools
❌ Apps where every tool is always relevant (rare)

Configuration

mcp-optimizer:
  relevanceThreshold: 0.6        # 0.0 - 1.0; lower = keep more tools
  preserveTools: []              # always keep these (by name)
  embeddingModel: 'voyage-3-lite' # or 'text-embedding-3-small'
  minTools: 1                    # never drop below this many
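
For readers wiring these options into TypeScript, the following is a minimal sketch of how they might be typed and validated. The field names come from the YAML above. The interface name, the `resolveConfig` helper, and the choice to treat the values shown as defaults are assumptions, not the module's actual code.

```typescript
// Hypothetical shape for the options above; only the field names are from the docs.
interface McpOptimizerConfig {
  relevanceThreshold: number; // 0.0 - 1.0; lower keeps more tools
  preserveTools: string[];    // tool names that are always kept
  embeddingModel: string;     // e.g. 'voyage-3-lite' or 'text-embedding-3-small'
  minTools: number;           // never drop below this many tools
}

// Assumed defaults, mirroring the values shown in the Configuration block.
const DEFAULT_CONFIG: McpOptimizerConfig = {
  relevanceThreshold: 0.6,
  preserveTools: [],
  embeddingModel: "voyage-3-lite",
  minTools: 1,
};

// Merge user overrides onto the defaults and reject out-of-range values.
function resolveConfig(overrides: Partial<McpOptimizerConfig> = {}): McpOptimizerConfig {
  const config = { ...DEFAULT_CONFIG, ...overrides };
  if (!(config.relevanceThreshold >= 0 && config.relevanceThreshold <= 1)) {
    throw new RangeError("relevanceThreshold must be between 0.0 and 1.0");
  }
  if (!Number.isInteger(config.minTools) || config.minTools < 0) {
    throw new RangeError("minTools must be a non-negative integer");
  }
  return config;
}
```

Validating the threshold up front means a typo such as `6` instead of `0.6` fails loudly rather than silently dropping every tool.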

Metrics emitted

mcp-optimizer.tools.before (number)
mcp-optimizer.tools.after (number)
mcp-optimizer.tokens.saved (number)
mcp-optimizer.duration_ms (number)
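
As an illustration of how the four values relate, here is a small sketch that derives them from before/after counts. The metric names are the ones listed above. The `buildMetrics` helper and its parameters are hypothetical, and it assumes `tokens.saved` is the difference in tool-definition tokens.

```typescript
// The four metric names listed above.
type MetricName =
  | "mcp-optimizer.tools.before"
  | "mcp-optimizer.tools.after"
  | "mcp-optimizer.tokens.saved"
  | "mcp-optimizer.duration_ms";

type OptimizerMetrics = Record<MetricName, number>;

// Hypothetical helper: derive the metrics from counts measured around the filter step.
function buildMetrics(
  toolsBefore: number,
  toolsAfter: number,
  tokensBefore: number,
  tokensAfter: number,
  startedAtMs: number,
  finishedAtMs: number,
): OptimizerMetrics {
  return {
    "mcp-optimizer.tools.before": toolsBefore,
    "mcp-optimizer.tools.after": toolsAfter,
    // Assumed definition: tool-definition tokens removed from the request.
    "mcp-optimizer.tokens.saved": tokensBefore - tokensAfter,
    "mcp-optimizer.duration_ms": finishedAtMs - startedAtMs,
  };
}
```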

Examples

Conservative — keep most tools, drop only obvious mismatches:

mcp-optimizer:
  relevanceThreshold: 0.4

Aggressive — strip hard, save tokens:

mcp-optimizer:
  relevanceThreshold: 0.75
  preserveTools: ['read_file', 'write_file', 'bash']

Coding-assistant tuned — keep file ops always, drop the rest by relevance:

mcp-optimizer:
  relevanceThreshold: 0.6
  preserveTools:
    - read_file
    - write_file
    - bash
    - grep
    - glob

How it works

Pre hook: extract the user’s last message text.
Embed the user message.
For each tool in request.tools: embed ${name}: ${description}. (Cached by tool hash — first request pays the cost, subsequent ones hit the cache.)
Compute the cosine similarity between the user-message embedding and each tool embedding. Keep tools that score above relevanceThreshold, plus any listed in preserveTools. Always keep at least minTools.
Replace request.tools with the kept subset.
Attach metadata['mcp-optimizer.tokens.saved'] for downstream visibility.
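
The steps above can be sketched roughly as follows. This is not the code in src/modules/mcp-optimizer.ts: the `Tool` type, `selectTools`, and the injected `embed` function are hypothetical. A real embedding call would be asynchronous. The cache is keyed by the embedded text as a stand-in for the tool hash. Topping up to `minTools` with the best-scoring dropped tools, and keeping a score equal to the threshold, are both assumptions.

```typescript
interface Tool { name: string; description: string; }
type Embed = (text: string) => number[];

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function selectTools(
  tools: Tool[],
  userMessage: string,
  embed: Embed,
  opts: { relevanceThreshold: number; preserveTools: string[]; minTools: number },
  cache: Map<string, number[]> = new Map(),
): Tool[] {
  const queryVec = embed(userMessage);
  const preserved = new Set(opts.preserveTools);

  // Score every tool, embedding "name: description" once and reusing it afterwards.
  const scored = tools.map((tool, index) => {
    const text = `${tool.name}: ${tool.description}`;
    let vec = cache.get(text);
    if (!vec) {
      vec = embed(text);
      cache.set(text, vec);
    }
    return { tool, index, score: cosineSimilarity(queryVec, vec) };
  });

  // Keep tools at or above the threshold (assumed tie rule), plus preserved ones.
  const keep = new Set<number>();
  for (const s of scored) {
    if (s.score >= opts.relevanceThreshold || preserved.has(s.tool.name)) keep.add(s.index);
  }

  // Assumed tie-break: top up to minTools with the highest-scoring dropped tools.
  const target = Math.min(opts.minTools, tools.length);
  const byScore = [...scored].sort((x, y) => y.score - x.score);
  for (const s of byScore) {
    if (keep.size >= target) break;
    keep.add(s.index);
  }

  // Preserve the original tool order in the output.
  return tools.filter((_, i) => keep.has(i));
}
```

Returning the kept tools in their original order, rather than sorted by score, keeps the serialized tool list as similar as possible between requests.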

The kept subset is stable per session. We hash the input tool set, the threshold, and the user message into a session key, so the same cache prefix lands at Anthropic on the next turn. This is critical for prompt cache hit rates.
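
A session key of this kind could be derived as in the sketch below. The three inputs are the ones named above. The use of SHA-256, the JSON canonical form, and sorting tool names so that ordering does not change the key are assumptions, and `sessionKey` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash the tool set, threshold, and user message into one key.
function sessionKey(toolNames: string[], relevanceThreshold: number, userMessage: string): string {
  const canonical = JSON.stringify({
    tools: [...toolNames].sort(), // sort a copy so input order does not affect the key
    threshold: relevanceThreshold,
    message: userMessage,
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```

The key can then index a map from session key to kept tool names, so a repeated lookup returns the same subset instead of re-scoring.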

Cloud vs Local

Mode	Embedding backend
Cloud	Voyage AI (configurable) — falls back to deterministic stub if no key
Local	Same — uses your `VOYAGE_API_KEY` if set, else stub

The stub hashes trigrams with SHA-256 and projects the result into a 256-dimensional vector. Quality is poor but stable, so caches behave deterministically in tests.
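
To illustrate the idea, here is one way such a stub could be built. The source does not specify the details, so several choices here are assumptions: character trigrams over lowercased text, the first digest byte as the dimension index, the second byte's low bit as a sign, and L2 normalization. `stubEmbed` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

const STUB_DIMENSIONS = 256;

// Deterministic stand-in embedding: the same text always yields the same vector.
function stubEmbed(text: string): number[] {
  const vec = new Array<number>(STUB_DIMENSIONS).fill(0);
  const normalized = text.toLowerCase();

  // Slide a 3-character window over the text (assumed: character trigrams).
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const trigram = normalized.slice(i, i + 3);
    const digest = createHash("sha256").update(trigram).digest();
    const index = digest[0];                     // one byte, 0..255, picks the dimension
    const sign = (digest[1] & 1) === 0 ? 1 : -1; // low bit of the next byte picks the sign
    vec[index] += sign;
  }

  // L2-normalize so cosine similarity behaves like a dot product; leave all-zero vectors as-is.
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map((v) => v / norm);
}
```

Texts shorter than three characters produce an all-zero vector, which a cosine-similarity scorer should treat as zero relevance.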

Source

src/modules/mcp-optimizer.ts