ipc — Infinite Persistent Context
Category: context · Cloud + Local · Status: v1 — production
Long sessions normally hit the model’s context limit and start dropping turns. ipc compresses old turns into shorter representations as the conversation grows, so the session can keep running for hours, days, or weeks without losing structure.
What it does
Watches the running token count. When it crosses targetUtilization of the model’s context window, the oldest uncompressed turns get compressed. They’re never deleted from the archive — just replaced in the active prompt with shorter forms.
Active context (sent to model):
[L0] recent turns — verbatim
[L1] older turns — tool results truncated
[L2] older turns — full summary
[L3] oldest turns — single sentence per block
Archive (kept in storage):
every original message, byte-for-byte, available for rehydration
When to use it
✅ Long coding sessions (hours of back-and-forth)
✅ Research agents (read 200 sources, compose 1 report)
✅ Customer-support agents handling complex multi-turn tickets
✅ Any agent that hits “context window full” today

❌ One-shot Q&A
❌ Apps that already keep their own message history outside the LLM context
Configuration
ipc:
  targetUtilization: 0.75   # compress when prompt > 75% of context window
  llmCompression: false     # planned: use a small LLM for L2 summaries
  archiveToBlob: true       # back up evicted messages for rehydration
  preserveLastTurns: 5      # never compress the N most recent turns
  preserveSystem: true      # never compress the system message
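For readers wiring this config into their own tooling, here is a minimal sketch of how the `ipc:` block could be mirrored and validated in code. The class name `IpcConfig`, the helper `from_mapping`, and the range checks are assumptions made for illustration; only the keys and defaults come from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpcConfig:
    """Hypothetical in-code mirror of the `ipc:` block (defaults copied from the docs)."""
    target_utilization: float = 0.75
    llm_compression: bool = False
    archive_to_blob: bool = True
    preserve_last_turns: int = 5
    preserve_system: bool = True

    def __post_init__(self) -> None:
        # Assumed sanity checks; the documentation does not specify valid ranges.
        if not 0.0 < self.target_utilization <= 1.0:
            raise ValueError("targetUtilization must be in (0, 1]")
        if self.preserve_last_turns < 0:
            raise ValueError("preserveLastTurns must be >= 0")


# Maps the documented camelCase keys to the dataclass field names.
_KEY_MAP = {
    "targetUtilization": "target_utilization",
    "llmCompression": "llm_compression",
    "archiveToBlob": "archive_to_blob",
    "preserveLastTurns": "preserve_last_turns",
    "preserveSystem": "preserve_system",
}


def from_mapping(raw: dict) -> IpcConfig:
    """Build a config from the parsed `ipc:` mapping, rejecting unknown keys."""
    unknown = set(raw) - set(_KEY_MAP)
    if unknown:
        raise KeyError(f"unknown ipc keys: {sorted(unknown)}")
    return IpcConfig(**{_KEY_MAP[key]: value for key, value in raw.items()})
```

Omitted keys fall back to the documented defaults, so `from_mapping({"targetUtilization": 0.5})` still yields `preserve_last_turns == 5`.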
Metrics emitted
ipc.tokens.before (number)
ipc.tokens.after (number)
ipc.tokens.saved (number)
ipc.compressed_turns (number)
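As a rough illustration of how these four values relate, the sketch below builds one metrics payload per compression event. It assumes that `ipc.tokens.saved` is simply the difference between the before and after counts, which the documentation does not state explicitly; the function name `ipc_metrics` is hypothetical.

```python
def ipc_metrics(tokens_before: int, tokens_after: int, compressed_turns: int) -> dict:
    """Assemble the four documented metrics for a single compression event."""
    return {
        "ipc.tokens.before": tokens_before,
        "ipc.tokens.after": tokens_after,
        # Assumption: saved = before - after.
        "ipc.tokens.saved": tokens_before - tokens_after,
        "ipc.compressed_turns": compressed_turns,
    }


payload = ipc_metrics(tokens_before=98_000, tokens_after=61_500, compressed_turns=14)
# payload["ipc.tokens.saved"] is 36500
```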
Examples
Default — kicks in when you’re getting close to the wall:
ipc:
  targetUtilization: 0.75
Aggressive — compress earlier, keep more headroom:
ipc:
  targetUtilization: 0.5
  preserveLastTurns: 3
Quality-first — wait until you’re really pushing it:
ipc:
  targetUtilization: 0.9
  preserveLastTurns: 10
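To make the difference between the three presets concrete, here is a small worked example. The 128,000-token context window and the 80,000-token prompt are assumed figures chosen for illustration, and `should_compress` is a hypothetical helper that follows the rule that nothing happens while tokens / contextSize stays below targetUtilization.

```python
PRESETS = {"default": 0.75, "aggressive": 0.5, "quality-first": 0.9}


def should_compress(prompt_tokens: int, context_window: int, target_utilization: float) -> bool:
    """True once the prompt fills at least target_utilization of the window."""
    return prompt_tokens / context_window >= target_utilization


def presets_triggered(prompt_tokens: int, context_window: int) -> dict:
    """Report, per preset, whether compression would start at this prompt size."""
    return {
        name: should_compress(prompt_tokens, context_window, target)
        for name, target in PRESETS.items()
    }


# 80,000 / 128,000 = 0.625 utilization: only the aggressive preset has started compressing.
print(presets_triggered(prompt_tokens=80_000, context_window=128_000))
# {'default': False, 'aggressive': True, 'quality-first': False}
```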
How it works
- Pre hook:
  - Estimate total tokens in the request.
  - Get model’s context window from the pricing table.
  - If tokens / contextSize < targetUtilization: do nothing.
  - Otherwise, walk turns oldest-to-newest. Compress each one a level until under target.
- Compression levels:
  - L0 → no change (recent turns).
  - L1 → tool results truncated to 200 chars + length suffix.
  - L2 → first sentence of each turn (extractive summary).
  - L3 → one-line block summary across multiple adjacent turns.
- Archive (if archiveToBlob: true):
  - Original messages are written to blob storage keyed by session + index.
  - The rehydrator module can pull them back when the user references something we compressed.
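The pre hook and the first two compression levels can be sketched roughly as follows. This is an illustration under several assumptions, not the module's actual code: the 4-characters-per-token estimate, the exact format of the length suffix, the `Turn` type, and the pass-by-pass reading of "compress each one a level" (raise every eligible turn to L1 before any goes to L2) are all guesses. L3 block summaries are left out to keep the sketch short.

```python
import re
from dataclasses import dataclass

MAX_LEVEL = 2  # this sketch stops at L2; L3 block summaries are omitted


@dataclass
class Turn:
    text: str
    is_tool_result: bool = False
    level: int = 0  # 0 = verbatim


def estimate_tokens(text: str) -> int:
    # Assumed heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)


def render(turn: Turn) -> str:
    """Return the form of a turn that goes into the active prompt."""
    if turn.level >= 2:
        # L2: keep only the first sentence (extractive summary).
        return re.split(r"(?<=[.!?])\s+", turn.text.strip(), maxsplit=1)[0]
    if turn.level == 1 and turn.is_tool_result and len(turn.text) > 200:
        # L1: truncate tool results to 200 chars; the suffix format is an assumption.
        return f"{turn.text[:200]}... [{len(turn.text)} chars]"
    return turn.text


def pre_hook(turns: list, context_window: int,
             target_utilization: float = 0.75, preserve_last_turns: int = 5) -> int:
    """Raise compression levels in place; return how many turns had their level raised."""
    def utilization() -> float:
        return sum(estimate_tokens(render(t)) for t in turns) / context_window

    # The N most recent turns are never compressed.
    eligible = turns[:-preserve_last_turns] if preserve_last_turns else turns
    touched = set()
    for level in range(1, MAX_LEVEL + 1):
        for index, turn in enumerate(eligible):  # oldest first
            if utilization() < target_utilization:
                return len(touched)
            turn.level = level
            touched.add(index)
    return len(touched)
```

Because the utilization check runs before each turn is touched, a request that is already under target is left completely unchanged, and compression stops as soon as the prompt drops below the threshold.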
The current module uses extractive compression (first-sentence-per-turn). It’s deterministic, fast, and free. Abstractive compression with a small LLM call for L2+ levels is planned; that should improve summaries at the cost of one extra round trip per compression event.
Compatibility
ipc plays nicely with:
- mcp-optimizer: run mcp-optimizer first (drops tools), then ipc measures the actual prompt size.
- prompt-optimizer: run after ipc so cache markers land on the compressed-stable prefix.
- rehydrator: depends on ipc for the archive.
Cloud vs Local
Cloud archives are scoped to your account or workspace. Local archives stay in your configured local data volume. The archive is keyed by session ID, so even if a user starts a new session the old archive stays available for rehydrator lookups.
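To illustrate the keying scheme, here is a minimal in-memory stand-in for the archive. The real storage backend and the rehydrator's interface are not described in this page, so `InMemoryArchive`, `put`, and `rehydrate` are hypothetical names; the only details taken from the docs are that originals are kept byte-for-byte and keyed by session plus message index.

```python
from typing import Optional


class InMemoryArchive:
    """Stand-in for blob storage; keys are (session_id, message_index)."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, session_id: str, index: int, original: str) -> None:
        # setdefault keeps the first write, so a later compressed form
        # can never overwrite the archived original.
        self._blobs.setdefault((session_id, index), original.encode("utf-8"))

    def rehydrate(self, session_id: str, index: int) -> Optional[str]:
        """Return the original message, or None if nothing was archived under that key."""
        blob = self._blobs.get((session_id, index))
        return None if blob is None else blob.decode("utf-8")


archive = InMemoryArchive()
archive.put("session-a", 0, "Full original text of the first turn.")

# A lookup made later, from a different session, still works
# as long as it uses the old session ID.
restored = archive.rehydrate("session-a", 0)
```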