Using prxy.monster with the OpenAI SDK

prxy.monster exposes an OpenAI Chat Completions-compatible API at https://api.prxy.monster/v1. Code that already calls chat.completions.create can switch to prxy.monster by changing the base URL and API key.

Install

npm install openai
# or
pip install openai

Configure

The official OpenAI SDKs for Node and Python read OPENAI_BASE_URL and OPENAI_API_KEY from the environment. Point the base URL at prxy.monster and set the key to your prxy.monster key:

export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx
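Both SDKs fall back to https://api.openai.com/v1 when OPENAI_BASE_URL is unset, so a missing variable sends requests to OpenAI directly, where a prxy.monster key will likely be rejected as invalid. The Python sketch below checks the environment up front. check_proxy_env is a hypothetical helper, and the prxy_ key prefix is an assumption based on the example key above.

```python
import os
from urllib.parse import urlparse

# Hypothetical helper: validate the two variables the OpenAI SDKs read
# before any client is created.
EXPECTED_HOST = "api.prxy.monster"
KEY_PREFIX = "prxy_"  # assumption: inferred from the example key prxy_live_...


def check_proxy_env(env=None):
    """Return a list of problems found in the environment (empty if it looks right)."""
    env = os.environ if env is None else env
    problems = []

    base_url = env.get("OPENAI_BASE_URL", "")
    if not base_url:
        # Without this variable the SDKs default to https://api.openai.com/v1.
        problems.append("OPENAI_BASE_URL is unset; the SDK will call api.openai.com directly")
    else:
        parsed = urlparse(base_url)
        if parsed.hostname != EXPECTED_HOST:
            problems.append(f"OPENAI_BASE_URL points at {parsed.hostname!r}, not {EXPECTED_HOST!r}")
        if parsed.path.rstrip("/") != "/v1":
            problems.append("OPENAI_BASE_URL should end in /v1")

    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key.startswith(KEY_PREFIX):
        problems.append(f"OPENAI_API_KEY does not look like a prxy.monster key ({KEY_PREFIX}...)")

    return problems


if __name__ == "__main__":
    for problem in check_proxy_env():
        print("warning:", problem)
```

Running it as a script prints nothing when both variables look correct.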

Code change

None. The Node and Python openai packages both pick up these environment variables automatically.

Node

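// Before AND after: no diff
// (top-level await needs an ES module: a .mjs file or "type": "module")
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_BASE_URL and OPENAI_API_KEY

const r = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'hi' }],
});

console.log(r.choices[0].message.content);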

If you prefer explicit configuration:

const client = new OpenAI({
  baseURL: 'https://api.prxy.monster/v1',
  apiKey: process.env.OPENAI_API_KEY,
});

Python

# Before AND after: no diff
from openai import OpenAI

client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY

r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
)

print(r.choices[0].message.content)

If you prefer explicit configuration:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.prxy.monster/v1",
    # Read the prxy.monster key from the environment rather than hard-coding it.
    api_key=os.environ["OPENAI_API_KEY"],
)

Verify

curl https://api.prxy.monster/health

Or, with the CLI:

prxy doctor
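For scripts or CI jobs, the same health check can be done from Python with only the standard library. This is a sketch: proxy_is_healthy is a hypothetical helper, and it assumes any 2xx status from /health means the proxy is up, since the response body format is not documented here.

```python
import urllib.request

HEALTH_URL = "https://api.prxy.monster/health"


def proxy_is_healthy(url=HEALTH_URL, timeout=5.0):
    """Return True if GET <url> answers with a 2xx status within `timeout` seconds.

    Assumption: a 2xx status is enough; the body is not inspected.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError (DNS failure, refused connection), HTTPError (non-2xx status)
        # and socket timeouts are all OSError subclasses.
        return False
```

A deploy script could, for example, call proxy_is_healthy() and exit early when it returns False.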

What you get

Infinite context: chat.completions.create calls compress old turns instead of dropping them.
Semantic cache: similar prompts are served from cache, returning in 15-30 ms.
Pattern memory: successful answers are learned and re-injected into later requests.
Cost guards: hard per-request budget caps that stop overspending before it reaches your OpenAI bill.
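To make the difference between exact and semantic caching concrete, here is a toy semantic cache in Python. It is not prxy.monster's implementation: a word-count vector stands in for a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption. The point is that a reworded prompt can still hit the cache, where an exact-match cache would miss.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # arbitrary cut-off for "similar enough"
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the response cached for the most similar prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


cache = SemanticCache()
cache.put("What is the capital of France?", "Paris.")
print(cache.get("Tell me what the capital of France is"))  # → Paris.
print(cache.get("How do I sort a list in Python?"))  # → None
```

The reworded prompt scores about 0.87 against the stored one, above the threshold, while the unrelated prompt shares no words and scores 0.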

Recommended pipeline

PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc

For batch or cost-sensitive workloads, put exact-cache first and add cost-guard:

PRXY_PIPE=exact-cache,semantic-cache,cost-guard,patterns
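PRXY_PIPE is a comma-separated, ordered list of module names, so a typo in one name is easy to miss. The sketch below parses and checks a value before you deploy it. parse_pipe and unknown_modules are hypothetical helpers, KNOWN_MODULES contains only the modules named on this page (the real set may be larger), and treating duplicates as an error is this sketch's own choice.

```python
import os

# Only the module names that appear on this page; the real set may be larger,
# so unknown names are reported, not rejected.
KNOWN_MODULES = {
    "cost-guard",
    "exact-cache",
    "ipc",
    "mcp-optimizer",
    "patterns",
    "semantic-cache",
}


def parse_pipe(value):
    """Split a PRXY_PIPE value into an ordered list of module names."""
    modules = [name.strip() for name in value.split(",") if name.strip()]
    duplicates = sorted({m for m in modules if modules.count(m) > 1})
    if duplicates:
        # This sketch's choice: a module listed twice is almost certainly a mistake.
        raise ValueError(f"duplicate modules in PRXY_PIPE: {', '.join(duplicates)}")
    return modules


def unknown_modules(modules):
    """Return names not documented on this page (often a typo such as 'semantic_cache')."""
    return [m for m in modules if m not in KNOWN_MODULES]


pipe = parse_pipe(os.environ.get("PRXY_PIPE", "exact-cache,semantic-cache,cost-guard,patterns"))
print(pipe)
print(unknown_modules(pipe))
```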

Streaming

const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'tell a story' }],
  stream: true,
});
 
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

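Streaming works the same way through the proxy. Cache hits are replayed as synthetic server-sent events (SSE), so streaming clients need no special handling for cached responses.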
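If you consume the stream without an SDK, the wire format is plain server-sent events: each data: line carries a chat.completion.chunk JSON object, and the stream ends with data: [DONE]. The Python sketch below extracts the text deltas, reading only choices[0] as the Node example does. iter_content is a hypothetical helper, and the sample chunks are trimmed to the fields it uses.

```python
import json


def iter_content(lines):
    """Yield text deltas from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ":" comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        choices = json.loads(payload).get("choices") or []
        if choices:
            # Same as chunk.choices[0]?.delta?.content in the Node example.
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                yield text


# Trimmed chunks: real ones also carry id, object, model, etc.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Once upon"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" a time"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # → Once upon a time
```

With a real HTTP response, the input would be the decoded response lines, e.g. (b.decode("utf-8") for b in resp).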

Common issues

Function calling / tools: passed through unchanged. If you send many tool definitions, mcp-optimizer automatically prunes the irrelevant ones.
JSON mode (response_format: { type: 'json_object' }): passed through unchanged.
Responses API (/v1/responses): planned, not proxied today.
Assistants API (/v1/assistants, threads, runs): not proxied. Use Chat Completions instead.
Realtime API: not proxied.
Embeddings (/v1/embeddings): not exposed as a public proxy route today; cache modules use embeddings internally.
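Because tools and response_format are passed through, they travel in the ordinary Chat Completions request body. The standard-library sketch below builds (but does not send) a raw request to the proxy with one tool attached. build_chat_request and the get_weather tool are hypothetical, and the Bearer header mirrors what the OpenAI SDKs send with OPENAI_API_KEY. A response_format entry would be added to body the same way.

```python
import json
import os
import urllib.request

BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.prxy.monster/v1")


def build_chat_request(messages, tools=None, model="gpt-4o-mini"):
    """Build a POST /chat/completions request; nothing is sent until urlopen()."""
    body = {"model": model, "messages": messages}
    if tools:
        body["tools"] = tools  # forwarded as-is by the proxy
    return urllib.request.Request(
        f"{BASE_URL.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same auth scheme the OpenAI SDKs use, with the prxy.monster key.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )


# Hypothetical tool definition in the standard Chat Completions shape.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

req = build_chat_request(
    [{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=[weather_tool],
)
```

Passing req to urllib.request.urlopen sends it; json.load on the response returns the standard Chat Completions JSON.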

Full example

Plain Node script: github.com/Ekkos-Technologies-Inc/prxy-monster-examples/tree/main/examples/openai-quickstart

prxy.monster speaks the OpenAI Chat Completions wire format. Newer OpenAI APIs (Responses, Realtime) are not yet proxied; watch /changelog for support.