ipc — Infinite Persistent Context

Category: context · Cloud + Local · Status: v1 — production

Long sessions normally hit the model’s context limit and start dropping turns. ipc compresses old turns into shorter representations as the conversation grows, so the session can keep running for hours, days, or weeks without losing structure.

What it does

ipc watches the running token count of each request. When the count reaches the targetUtilization fraction of the model’s context window, the oldest uncompressed turns are compressed. The originals are never deleted from the archive; the active prompt just carries shorter forms in their place.

Active context (sent to model):

[L0]  recent turns           — verbatim
[L1]  older turns            — tool results truncated
[L2]  older turns            — first-sentence summary of each turn
[L3]  oldest turns           — single sentence per block

Archive (kept in storage):
   every original message, byte-for-byte, available for rehydration

When to use it

✅ Long coding sessions (hours of back-and-forth)
✅ Research agents (read 200 sources, compose 1 report)
✅ Customer-support agents handling complex multi-turn tickets
✅ Any agent that hits “context window full” today

❌ One-shot Q&A
❌ Apps that already keep their own message history outside the LLM context

Configuration

ipc:
  targetUtilization: 0.75   # compress once the prompt reaches 75% of the context window
  llmCompression: false     # planned: use a small LLM for L2+ summaries
  archiveToBlob: true       # archive original messages for rehydration
  preserveLastTurns: 5      # never compress the N most recent turns
  preserveSystem: true      # never compress the system message
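
For readers wiring this up in TypeScript, the config block could be modelled roughly like this. It is a minimal sketch: `IpcConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names, the defaults are simply the values shown above, and the validation ranges are assumptions rather than documented behaviour.

```typescript
// Hypothetical typing of the ipc config block; field names come from the YAML above.
interface IpcConfig {
  targetUtilization: number; // fraction of the context window that triggers compression
  llmCompression: boolean;   // planned feature, off for now
  archiveToBlob: boolean;    // keep originals for rehydration
  preserveLastTurns: number; // the N most recent turns are never compressed
  preserveSystem: boolean;   // the system message is never compressed
}

// Assumed defaults: the values shown in the Configuration block.
const DEFAULTS: IpcConfig = {
  targetUtilization: 0.75,
  llmCompression: false,
  archiveToBlob: true,
  preserveLastTurns: 5,
  preserveSystem: true,
};

// Merge user overrides onto the defaults and reject values that cannot work.
// The accepted ranges are an assumption of this sketch.
function resolveConfig(overrides: Partial<IpcConfig> = {}): IpcConfig {
  const cfg: IpcConfig = { ...DEFAULTS, ...overrides };
  if (!(cfg.targetUtilization > 0 && cfg.targetUtilization <= 1)) {
    throw new RangeError("targetUtilization must be in (0, 1]");
  }
  if (!Number.isInteger(cfg.preserveLastTurns) || cfg.preserveLastTurns < 0) {
    throw new RangeError("preserveLastTurns must be a non-negative integer");
  }
  return cfg;
}
```

Calling `resolveConfig({ targetUtilization: 0.5, preserveLastTurns: 3 })` would give the "aggressive" profile with every other field left at the values above.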

Metrics emitted

ipc.tokens.before (number)
ipc.tokens.after (number)
ipc.tokens.saved (number)
ipc.compressed_turns (number)
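
The four metrics fit together in an obvious way. A minimal sketch, assuming `ipc.tokens.saved` is the plain difference between the before and after estimates (the source does not spell this out) and using a hypothetical `buildIpcMetrics` helper:

```typescript
// Metric names are taken from the list above; the helper itself is hypothetical.
function buildIpcMetrics(
  tokensBefore: number,
  tokensAfter: number,
  compressedTurns: number,
): Record<string, number> {
  return {
    "ipc.tokens.before": tokensBefore,
    "ipc.tokens.after": tokensAfter,
    // Assumption: "saved" is before minus after.
    "ipc.tokens.saved": tokensBefore - tokensAfter,
    "ipc.compressed_turns": compressedTurns,
  };
}
```

A dashboard could derive a compression ratio from these as saved divided by before.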

Examples

Default — kicks in when you’re getting close to the wall:

ipc:
  targetUtilization: 0.75

Aggressive — compress earlier, keep more headroom:

ipc:
  targetUtilization: 0.5
  preserveLastTurns: 3

Quality-first — wait until you’re really pushing it:

ipc:
  targetUtilization: 0.9
  preserveLastTurns: 10
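
To make the three profiles concrete, here is the token count at which compression would start for each one. The 200,000-token context window is a hypothetical figure chosen for the arithmetic, and `triggerThreshold` is an illustrative helper, not part of the module.

```typescript
// Token count at which compression starts for a given window and setting.
function triggerThreshold(contextSize: number, targetUtilization: number): number {
  return Math.round(contextSize * targetUtilization);
}

// Hypothetical context window size, used only for this worked example.
const contextSize = 200000;

const thresholds = {
  default: triggerThreshold(contextSize, 0.75),     // 150000 tokens
  aggressive: triggerThreshold(contextSize, 0.5),   // 100000 tokens
  qualityFirst: triggerThreshold(contextSize, 0.9), // 180000 tokens
};
```

With that window, the aggressive profile starts compressing with 100,000 tokens of headroom left, while quality-first waits until only 20,000 remain.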

How it works

Pre hook:
- Estimate total tokens in the request.
- Get model’s context window from the pricing table.
- If tokens / contextSize < targetUtilization: do nothing.
- Otherwise, walk turns oldest-to-newest. Compress each eligible turn by one level, stopping as soon as the prompt is under target.
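
The pre hook steps above can be sketched as follows. This is not the code in src/modules/ipc.ts, just an illustration under stated assumptions: the `Turn` shape, the 4-characters-per-token estimate, and the text-halving `compressOneLevel` placeholder are all hypothetical, and `preserveLastTurns` is treated as "the last N entries of the message array".

```typescript
type Level = 0 | 1 | 2 | 3;

interface Turn {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  level: Level;
}

interface HookConfig {
  targetUtilization: number;
  preserveLastTurns: number;
  preserveSystem: boolean;
}

// Assumption: a rough 4-characters-per-token estimate stands in for a real tokenizer.
function estimateTokens(turns: Turn[]): number {
  return turns.reduce((sum, t) => sum + Math.ceil(t.content.length / 4), 0);
}

// Placeholder for the real per-level compression: halve the text and bump the level.
function compressOneLevel(turn: Turn): Turn {
  const kept = turn.content.slice(0, Math.ceil(turn.content.length / 2));
  return { ...turn, content: kept, level: Math.min(turn.level + 1, 3) as Level };
}

function preHook(turns: Turn[], contextSize: number, cfg: HookConfig): Turn[] {
  const underTarget = (ts: Turn[]): boolean =>
    estimateTokens(ts) / contextSize < cfg.targetUtilization;
  if (underTarget(turns)) return turns; // below target: do nothing

  const out = turns.map((t) => ({ ...t })); // never mutate the caller's turns
  const compressibleEnd = Math.max(out.length - cfg.preserveLastTurns, 0);

  // Repeat oldest-to-newest passes until under target or nothing is left to compress.
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (let i = 0; i < compressibleEnd; i++) {
      const turn = out[i];
      if (turn === undefined || turn.level === 3) continue;
      if (cfg.preserveSystem && turn.role === "system") continue;
      out[i] = compressOneLevel(turn);
      progressed = true;
      if (underTarget(out)) return out;
    }
  }
  return out; // every eligible turn is already at L3 and the prompt is still over target
}
```

Returning the input array untouched on the fast path keeps the hook cheap for short sessions, which is the common case.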
Compression levels:
- L0 → no change (recent turns).
- L1 → tool results truncated to 200 chars + length suffix.
- L2 → first sentence of each turn (extractive summary).
- L3 → one-line block summary across multiple adjacent turns.
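
Each level is simple enough to sketch in a few lines. The function names, the exact length-suffix format, the sentence-boundary regex, and the layout of the L3 line are assumptions made for illustration; only the behaviours (200-character truncation, first sentence, one line per block) come from the list above.

```typescript
// L1: truncate a tool result to `limit` characters and append its original length.
// The suffix format here is an assumption.
function truncateToolResult(text: string, limit = 200): string {
  if (text.length <= limit) return text;
  return `${text.slice(0, limit)}... [${text.length} chars]`;
}

// L2: extractive summary that keeps only the first sentence of a turn.
// A sentence ends at the first ., ! or ? followed by whitespace or end of text,
// so dots inside tokens like "parser.ts" do not end it.
function firstSentence(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(/^[\s\S]*?[.!?](?=\s|$)/);
  const sentence = match?.[0] ?? trimmed;
  return sentence.replace(/\s+/g, " "); // collapse newlines so the result is one line
}

// L3: a single line standing in for a block of adjacent turns.
// Here it is the first sentence of the first turn plus a count of the rest.
function blockSummary(turns: string[], startIndex: number): string {
  const rest = Math.max(turns.length - 1, 0);
  const lead = firstSentence(turns[0] ?? "");
  return `[turns ${startIndex}-${startIndex + rest}] ${lead} (+${rest} more)`;
}
```

All three are pure string functions, which is what keeps the extractive approach deterministic and free of extra model calls.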
Archive (if archiveToBlob: true):
- Original messages are written to blob storage keyed by session + index.
- The rehydrator module can pull them back when the user references something we compressed.
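
The archive contract (originals keyed by session and index, retrievable later) can be illustrated with an in-memory stand-in for blob storage. `InMemoryArchive`, its method names, and the `"<sessionId>/<index>"` key layout are hypothetical; the write-once behaviour is a design choice of this sketch, made so that a compressed form can never overwrite an original.

```typescript
// Hypothetical in-memory stand-in for blob storage.
class InMemoryArchive {
  private readonly blobs = new Map<string, string>();

  // Assumed key layout: "<sessionId>/<index>".
  private static keyFor(sessionId: string, index: number): string {
    return `${sessionId}/${index}`;
  }

  // Store the original once; later writes to the same key are ignored.
  archive(sessionId: string, index: number, original: string): void {
    const key = InMemoryArchive.keyFor(sessionId, index);
    if (!this.blobs.has(key)) this.blobs.set(key, original);
  }

  // Return the original message, or undefined if it was never archived.
  rehydrate(sessionId: string, index: number): string | undefined {
    return this.blobs.get(InMemoryArchive.keyFor(sessionId, index));
  }
}

const store = new InMemoryArchive();
store.archive("session-a", 3, "full original tool output");
store.archive("session-a", 3, "compressed form"); // ignored: original already stored
const restored = store.rehydrate("session-a", 3);
```

Because the key includes the session ID, a lookup for the same index in a different session simply returns undefined.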

The current module uses extractive compression (first-sentence-per-turn). It’s deterministic, fast, and free. Abstractive compression with a small LLM call for L2+ levels is planned; that should improve summaries at the cost of one extra round trip per compression event.

Compatibility

ipc plays nicely with:

mcp-optimizer: run it before ipc. It drops tools from the request, so ipc then measures the prompt that is actually sent.
prompt-optimizer: run it after ipc, so cache markers land on a prefix that compression has already stabilized.
rehydrator: depends on ipc for the archive.

Cloud vs Local

Cloud archives are scoped to your account or workspace. Local archives stay in your configured local data volume. The archive is keyed by session ID, so when a user starts a new session, the old session’s archive remains available for rehydrator lookups.

Source

src/modules/ipc.ts