Claude Code Leaked, Learn from the Best 05

How does Claude Code do compression/compaction? Every turn, you send the entire conversation history to the API. Long sessions accumulate hundreds of messages. Eventually you hit the context window limit and the API refuses
your request. Compaction is how Claude Code survives long sessions without hitting that wall.

There are four distinct compaction strategies, each triggered by different conditions. Think of them as a waterfall — try the lightest one first, escalate to heavier ones only if needed.

Strategy 1: Microcompact — Silent Tool Result Trimming

What it does: Removes the content of old tool results without removing the messages themselves.

Before:
[tool_result for grep call #1] → “matches: file.py:42 function foo…” ← 800 tokens
[tool_result for grep call #2] → “matches: config.ts:15 api_key…” ← 600 tokens
[tool_result for grep call #3] → “matches: utils.py:99 helper…” ← 500 tokens

After microcompact:
[tool_result for grep call #1] → “[Old tool result content cleared]” ← 6 tokens
[tool_result for grep call #2] → “[Old tool result content cleared]” ← 6 tokens
[tool_result for grep call #3] → “matches: utils.py:99 helper…” ← kept (most recent)

Only these tool types are eligible: FileRead, Bash, Grep, Glob, WebSearch, WebFetch, FileEdit, FileWrite. These are all re-runnable — if the model needs the file again, it can
just read it again.

The key insight: you clear content but keep the structure. The API still sees a valid conversation with tool_use/tool_result pairs. The model just sees that stale results were cleared.
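
A minimal sketch of the in-place clearing, assuming a simplified Message shape and a keepRecent knob (the real types and heuristics are richer):

type Message =
    | { type: 'tool_result'; toolUseId: string; toolName: string; content: string }
    | { type: 'user' | 'assistant'; content: string }

const ELIGIBLE_TOOLS = new Set([
    'FileRead', 'Bash', 'Grep', 'Glob',
    'WebSearch', 'WebFetch', 'FileEdit', 'FileWrite',
])

// Clear the content of old, re-runnable tool results in place.
// Structure is preserved: every tool_use still has its tool_result,
// so the API still sees a valid conversation.
function microcompact(messages: Message[], keepRecent = 1): void {
    const clearable = messages.filter(
        (m): m is Extract<Message, { type: 'tool_result' }> =>
            m.type === 'tool_result' && ELIGIBLE_TOOLS.has(m.toolName),
    )
    for (const msg of clearable.slice(0, -keepRecent)) {
        msg.content = '[Old tool result content cleared]'
    }
}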

There are two sub-variants:

Regular microcompact — replaces content in the local message array directly.

Cached microcompact — does NOT touch the local messages at all. Instead it queues a cache_edits API instruction that tells the server “delete these tool results from your cached
prefix.” The server’s cached copy gets edited without invalidating the whole cache. This is the clever one — you save tokens AND keep the prompt cache warm.
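
A sketch of the cached variant; the cache_edits payload shape here is an assumption, not the real wire format:

// Assumption: this edit shape is illustrative. The point is that we
// describe deletions for the server's cached prefix instead of
// mutating the local message array.
type CacheEdit = { type: 'clear_tool_result'; toolUseId: string }

const pendingCacheEdits: CacheEdit[] = []

function queueCachedMicrocompact(oldToolUseIds: string[]): void {
    for (const id of oldToolUseIds) {
        pendingCacheEdits.push({ type: 'clear_tool_result', toolUseId: id })
    }
    // The next API request carries these edits, so the server trims its
    // cached prefix in place and the rest of the cache stays warm.
}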


Strategy 2: Time-based Microcompact

What triggers it: The user walks away and comes back. If the gap since the last assistant message exceeds a threshold (e.g., 30 minutes), the server-side cache has already
expired anyway. So there is no point trying to preserve it — just clear old tool results directly and move on.

const gapMinutes = (Date.now() - new Date(lastAssistant.timestamp).getTime()) / 60_000
if (gapMinutes >= config.gapThresholdMinutes) {
// server cache is cold anyway — clear old tool results in place
}

This runs before cached microcompact because if the cache is cold, cache editing is pointless.


Strategy 3: Autocompact — Full Conversation Summarization

What triggers it: Token count crosses a threshold:

Context window: 200,000 tokens
Reserved for output: -20,000 tokens (p99.99 of compact summaries)
Effective window: 180,000 tokens
Buffer before threshold: -13,000 tokens
──────────────────────────────────────────────
Autocompact fires at: 167,000 tokens

What it does: Runs a separate forked LLM call that reads your entire conversation and generates a summary. Then replaces the conversation history with:

  1. A compact boundary marker message
  2. The summary
  3. The most recent few messages (kept verbatim so context isn’t lost)

Before autocompact:

[200 messages, 170,000 tokens]

After autocompact:

[compact_boundary_marker]

[summary: “The user asked to fix auth bug. We found issue in middleware.ts line 42…”]

[last 5 messages verbatim]

→ ~15,000 tokens total

The summary itself uses COMPACT_MAX_OUTPUT_TOKENS (a 50,000-token output budget), and images are stripped before the summarizer sees them: images aren’t needed for a text summary and would push the summarizer itself over the limit.

Circuit breaker: After 3 consecutive failures it stops trying. This prevents 250,000 wasted API calls per day (a real production incident they logged as a comment in the code).
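
Putting the trigger arithmetic, the history replacement, and the circuit breaker together, a sketch (runForkedSummarizer and stripImages are illustrative stand-ins, not the real names):

type Msg = { role: 'user' | 'assistant' | 'system'; content: string }

// Illustrative stand-ins for the forked summarizer call and image stripping.
declare function runForkedSummarizer(messages: Msg[]): Promise<string>
declare function stripImages(messages: Msg[]): Msg[]

const AUTOCOMPACT_AT = 200_000 - 20_000 - 13_000 // = 167,000 tokens

let consecutiveFailures = 0

async function maybeAutocompact(messages: Msg[], tokenCount: number): Promise<Msg[]> {
    if (tokenCount < AUTOCOMPACT_AT) return messages
    if (consecutiveFailures >= 3) return messages // circuit breaker: stop retrying

    try {
        const summary = await runForkedSummarizer(stripImages(messages))
        consecutiveFailures = 0
        return [
            { role: 'system', content: '[compact_boundary_marker]' },
            { role: 'assistant', content: summary },
            ...messages.slice(-5), // last few messages kept verbatim
        ]
    } catch {
        consecutiveFailures++
        return messages
    }
}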


Strategy 4: Snip — Selective Message Removal

What it does: Removes individual messages from the middle of the conversation (not the whole thing), targeting the largest/oldest ones. Runs before autocompact so that if snip
brings tokens under the threshold, autocompact doesn’t fire at all — preserving more granular context.
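
A rough sketch of the selection step, assuming per-message token estimates are available (the real heuristics are more involved):

type SizedMessage = { id: string; tokens: number }

// Pick the largest mid-conversation messages to drop until enough tokens
// are freed. The newest messages are protected; the caller filters the
// returned ids out of the live history and tracks the tokens freed.
function selectSnips(messages: SizedMessage[], tokensToFree: number): Set<string> {
    const PROTECTED_TAIL = 5 // assumption: never snip the most recent messages
    const candidates = messages
        .slice(0, -PROTECTED_TAIL)
        .sort((a, b) => b.tokens - a.tokens) // biggest first
    const removed = new Set<string>()
    let freed = 0
    for (const m of candidates) {
        if (freed >= tokensToFree) break
        removed.add(m.id)
        freed += m.tokens
    }
    return removed
}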


The Compaction Waterfall in query.ts

Each iteration of while(true):

  1. Time-based microcompact? ← gap since last message > threshold
    yes → clear old tool results, skip cached MC
  2. Cached microcompact? ← count of tool results > threshold
    yes → queue cache_edits for API layer, messages unchanged
  3. Snip? ← individual large messages to remove
    yes → remove them, track tokens freed
  4. Autocompact? ← total tokens > 167k threshold
    yes → fork summarizer agent, replace history with summary
  5. Blocking limit? ← absolute hard limit
    yes → reject request, tell user to /compact manually
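
Stitched into one function, the waterfall reads roughly like this (a sketch; every helper name is an illustrative stand-in, and the blocking-limit value is an assumption):

// All names below are illustrative stand-ins for the real checks/actions.
declare function gapMinutesSinceLastAssistant(): number
declare function clearableToolResultCount(): number
declare function clearOldToolResults(): void
declare function queueCacheEdits(): void
declare function snipLargeMessages(): void
declare function autocompact(): Promise<void>
declare function totalTokens(): number
declare const config: { gapThresholdMinutes: number; toolResultThreshold: number }

const AUTOCOMPACT_AT = 167_000
const BLOCKING_LIMIT = 180_000 // assumption: hard cap at the effective window

async function runCompactionWaterfall(): Promise<void> {
    if (gapMinutesSinceLastAssistant() >= config.gapThresholdMinutes) {
        clearOldToolResults()            // 1. cache is cold anyway; skip cached MC
    } else if (clearableToolResultCount() > config.toolResultThreshold) {
        queueCacheEdits()                // 2. messages untouched, cache stays warm
    }
    snipLargeMessages()                  // 3. may drop us under the threshold
    if (totalTokens() > AUTOCOMPACT_AT) {
        await autocompact()              // 4. fork summarizer, replace history
    }
    if (totalTokens() > BLOCKING_LIMIT) {
        throw new Error('Context limit reached: run /compact manually') // 5.
    }
}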

Next, what is pre-run (prefetch)? Claude Code would start slowly if it waited for everything sequentially. On macOS, reading two keychain entries (OAuth token + API key) takes ~65ms each; done sequentially, that is ~130ms of blocked startup before CLI arguments are even parsed. Pre-run (prefetch) is the pattern of starting slow work as early as possible so it finishes in the background while fast work runs in the foreground.

 ---
  The Keychain Prefetch (the clearest example)

  // keychainPrefetch.ts

  // module-level state: settled once both spawns finish
  let prefetchPromise: Promise<void> | null = null
  let legacyApiKeyPrefetch: { stdout: string } | null = null

  export function startKeychainPrefetch(): void {
      // fired at main.tsx TOP LEVEL — before any imports finish

      const oauthSpawn = spawnSecurity('Claude Code-credentials')  // async subprocess
      const legacySpawn = spawnSecurity('Claude Code')             // async subprocess

      // both run IN PARALLEL with each other AND with all the imports loading
      prefetchPromise = Promise.all([oauthSpawn, legacySpawn]).then(([oauth, legacy]) => {
          primeKeychainCacheFromPrefetch(oauth.stdout)   // prime cache
          legacyApiKeyPrefetch = { stdout: legacy.stdout }
      })
  }

  // later, after imports are done:
  export async function ensureKeychainPrefetchCompleted(): Promise<void> {
      if (prefetchPromise) await prefetchPromise   // nearly free — already done
  }

  The comment in the code says it perfectly: "nearly free since the subprocesses finish during import evaluation". You are hiding ~130ms of I/O behind work that had to happen
  anyway (loading JS modules).

  ---
  The Memory Prefetch (per-turn)

  In query.ts line 301:

  // fired once per USER TURN — before the LLM call starts
  using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
      state.messages,
      state.toolUseContext,
  )

  This starts reading CLAUDE.md memory files from disk before the API call is made. The API call takes 2-30 seconds. By the time the model responds and tools run, the memory files
   are already loaded in the background. They get consumed only when they've settled:

  // after tool execution — consume only if already settled
  if (pendingMemoryPrefetch.settledAt !== null && pendingMemoryPrefetch.consumedOnIteration === -1) {
      const memoryAttachments = await pendingMemoryPrefetch.promise
      // inject into context
  }

  If it hasn't settled yet, it waits for the next iteration — giving it another chance without blocking the current one.

  ---
  The Skill Discovery Prefetch

  Same pattern applied to skill files:

  // start BEFORE the model responds
  const pendingSkillPrefetch = skillPrefetch?.startSkillDiscoveryPrefetch(messages, toolUseContext)

  // [LLM call happens for 2-30 seconds]
  // [tool execution happens]

  // collect AFTER tools — hidden behind everything else
  const skillAttachments = await skillPrefetch?.collectSkillDiscoveryPrefetch(pendingSkillPrefetch)

Lastly, what does the ink file do? ink.ts is the terminal UI engine for Claude Code. It is what makes the terminal feel like an interactive app — streaming text, colored output, keyboard input, clickable links —
instead of a plain script that prints and exits.

The src/ink/ folder is a full custom implementation of the rendering pipeline. This is not just a thin wrapper — it’s a serious piece of engineering. Look at what’s inside:

ink/
├── dom.ts ← virtual DOM for the terminal (like browser’s DOM)
├── renderer.ts ← converts virtual DOM to screen characters
├── reconciler.ts ← React reconciler (connects React to Ink’s DOM)
├── layout/
│ ├── engine.ts ← calculates box sizes and positions
│ └── yoga.ts ← Facebook’s Yoga layout engine (flexbox for terminals)
├── output.ts ← writes ANSI codes to stdout
├── screen.ts ← double-buffering (front/back frame)
├── frame.ts ← one rendered frame
├── render-to-screen.ts ← diffs frames, emits only changed cells
├── render-node-to-output.ts ← walks DOM tree, calculates what to draw
├── wrap-text.ts ← word-wrap for terminal width
├── parse-keypress.ts ← converts raw stdin bytes to key events
├── termio/ ← raw terminal protocol (ANSI, CSI, OSC codes)
├── events/ ← event system (input, click, focus, keyboard)
├── hooks/ ← React hooks for terminal (useInput, useStdin, etc.)
└── components/ ← primitive components (Box, Text, Button, Link)
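
To make the double-buffering concrete, here is a minimal sketch of the frame diff, done at row granularity for brevity (the real render-to-screen.ts diffs individual cells and tracks styles):

type Frame = string[] // one string per terminal row

// Diff the freshly rendered back frame against the last-painted front
// frame and emit ANSI only for rows that changed, instead of repainting
// the whole screen every frame.
function paintDiff(front: Frame, back: Frame, write: (s: string) => void): Frame {
    const rows = Math.max(front.length, back.length)
    for (let row = 0; row < rows; row++) {
        if ((front[row] ?? '') !== (back[row] ?? '')) {
            // CSI row;1H moves the cursor; CSI 2K clears the line
            write(`\x1b[${row + 1};1H\x1b[2K${back[row] ?? ''}`)
        }
    }
    return back // the back buffer becomes the new front buffer
}

Each render pass builds a fresh back frame from the virtual DOM, paints only what changed, and keeps the returned frame as the next front buffer.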

Claude Code Built Its Own Ink Instead of Using the npm Package

In the Python port (claw-code), ink.py reduced all of this to one function: def render_markdown_panel(text: str) -> str

Leave a Reply