SumoCode — Complete Architecture Guide

§ 01

The 30-second pitch

Pi is the engine. SumoCode is the cathedral built on top of it.

SumoCode is a Pi extension that turns a generic terminal AI agent into a personal one — with persistent memory, a custom-built retained renderer, three themes, and five preattentive status colors.

The thesis

Terminals deserve design systems.

The default Pi UX is generic — same footer, same status indicators, same look as every other Pi user gets. SumoCode owns the experience layer entirely while delegating agent loop, LLM, sessions, tools, and MCP to Pi. It's the shadcn/ui of terminal AI agents.

Built on

@mariozechner/pi-mono

Pi is the engine: provider abstraction, agent loop, tool framework, sessions, MCP, skills, extension API. SumoCode adds a Node-native retained renderer on top.

Synced across

Two repos, one identity

This public repo (UI) pairs with a private sumocode-config repo (persona, memory, settings). Same persona on every machine. git pull = move identity.

The value proposition holds three pieces. First: identity persists. A persona file is appended to every system prompt; the agent introduces itself the same way on every session, every machine. Second: state is visible. A single colored dot in the footer tells you in <250ms whether the agent is idle, thinking, running a tool, awaiting approval, or writing to memory. Third: the renderer is real. SumoCode owns the alternate screen, mouse routing, in-app scroll, modal layers, and a Yoga-based flexbox layout — none of which Pi's default renderer offers.

§ 02

A frontend engineer's mental model

Translate what you already know. Almost every concept has a web equivalent.

Before any code, here is the cheat sheet. Once these mappings click, the rest of the architecture follows naturally:

Frontend concept	SumoCode / TUI equivalent	What it actually does
The browser	The terminal emulator	iTerm2, Ghostty, Alacritty — they parse ANSI escape codes the same way browsers parse HTML.
The DOM	CellBuffer (rows × cols of Cells)	A 2D grid where every cell holds char + fg + bg + bold/italic/dim. The "DOM" is fixed-grid, not a tree.
CSS	ANSI escape sequences	`\x1b[38;2;217;119;6m` sets foreground to `#D97706`. Looks gnarly, behaves like inline-style.
React Fiber	SumoNode tree (Yoga-backed)	Retained tree of layout nodes. Reconciles into a CellBuffer the way React reconciles into the DOM.
Virtual DOM diff	`diffFrames(prev, next)`	Cell-by-cell diff produces patches; only changed regions get written to the terminal. Saves 50-90% of bytes.
Flexbox	Yoga (the literal Facebook engine)	Same Yoga that powers React Native. Compiled to WebAssembly, ~87KB, computes layout in microseconds.
requestAnimationFrame	FrameScheduler	Adaptive 60fps coalescing while streaming, event-driven (idle 0fps) otherwise.
A modal / portal	Overlay layer + altscreen	Altscreen is the terminal's "fullscreen modal" — when you exit, your shell history is back, untouched.
Hover / click handlers	SGR mouse reporting	The terminal emits `\x1b[<0;42;7M` on click; we parse it into `{type:'down', row:6, col:41}`.
localStorage / IndexedDB	`~/.sumocode/` JSONL files	Diagnostics, session caches, crash logs. Plus `sumocode-config/memory/` for cross-machine state.
Web Workers	Pi sub-processes (task tool)	Spawned via `node-pty` for parallel Pi instances. ACP protocol over stdio.
shadcn/ui	The closest analogy for SumoCode	You don't replace the framework, you decorate it. Pi = framework. SumoCode = the design system + components.

Once you accept that "the terminal" is just a 2D grid of styled cells with an event stream, the rest is just React. — the only real conceptual leap

§ 03

TUI primer · the things nobody tells you

Six concepts that unlock the entire codebase.

Concept 1

Altscreen

Terminals have a second screen buffer. Sending \x1b[?1049h switches into it; the user's shell history is preserved underneath. \x1b[?1049l on exit restores it. This is how full-screen TUIs (vim, htop, SumoCode) take over without nuking your scrollback.

Concept 2

ANSI escape codes

The "CSS of terminals." A magic byte (\x1b, ESC) followed by control sequences. Examples: [2J clears screen, [H moves cursor home, [38;2;R;G;Bm sets 24-bit foreground color. Verbose, but deterministic.
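As a small illustration of the 24-bit color sequences above, here is a minimal sketch that turns a `#RRGGBB` token into its SGR foreground or background sequence. The helper names `parseHex`, `fgFromHex`, and `bgFromHex` are hypothetical and are not taken from the SumoCode source.

```typescript
// Sketch only: helper names are illustrative assumptions, not SumoCode APIs.
const ESC = "\x1b";

// Split "#RRGGBB" into its three 8-bit channels.
function parseHex(hex: string): [number, number, number] {
  const match = /^#([0-9a-fA-F]{6})$/.exec(hex);
  if (!match || match[1] === undefined) {
    throw new Error(`expected #RRGGBB, got ${hex}`);
  }
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// SGR 38;2;R;G;B selects a 24-bit foreground color.
function fgFromHex(hex: string): string {
  const [r, g, b] = parseHex(hex);
  return `${ESC}[38;2;${r};${g};${b}m`;
}

// SGR 48;2;R;G;B selects a 24-bit background color.
function bgFromHex(hex: string): string {
  const [r, g, b] = parseHex(hex);
  return `${ESC}[48;2;${r};${g};${b}m`;
}
```

With these helpers, `fgFromHex("#D97706")` yields the same bytes as the burnt-orange example in the cheat sheet.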

Concept 3

SGR mouse reporting

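Without mouse reporting enabled, many terminals translate the wheel into arrow keys (up/down) while the alternate screen is active. Enabling button tracking (\x1b[?1000h) together with SGR encoding (\x1b[?1006h) yields structured events like \x1b[<0;42;7M. SumoCode parses these into MouseEvent objects and routes them through hit-testing.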
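To make the event format concrete, the following is a minimal parsing sketch for a single SGR mouse report. The type name `SgrMouseEvent` and the function `parseSgrMouse` are assumptions for illustration; SumoCode's real parser likely handles more cases (motion, modifiers, partial reads).

```typescript
// Sketch only: names and the event shape are illustrative assumptions.
interface SgrMouseEvent {
  type: "down" | "up" | "wheel";
  code: number; // raw button code from the report
  row: number;  // zero-based
  col: number;  // zero-based
}

// An SGR report looks like ESC [ < code ; x ; y M (press) or m (release),
// where x is the 1-based column and y is the 1-based row.
function parseSgrMouse(seq: string): SgrMouseEvent | null {
  const m = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/.exec(seq);
  if (!m) return null;
  const code = Number(m[1]);
  const col = Number(m[2]) - 1;
  const row = Number(m[3]) - 1;
  // Bit 64 marks wheel events; otherwise the final byte says press or release.
  const type = (code & 64) !== 0 ? "wheel" : m[4] === "M" ? "down" : "up";
  return { type, code, row, col };
}
```

Converting to zero-based coordinates at the parsing boundary keeps hit-testing aligned with a zero-indexed cell grid.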

Concept 4

Cell width ≠ character count

Emoji and CJK characters occupy 2 cells, while combining marks occupy none and attach to the preceding character. JavaScript's "日".length === 1, but on screen it's 2 columns wide. Pi's visibleWidth() handles this; SumoCode uses Intl.Segmenter for grapheme clustering.
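The gap between string length and column width can be sketched with a deliberately simplified width function. This is not Pi's visibleWidth() and it skips grapheme clustering entirely; the code-point ranges below are a rough approximation chosen for illustration, and the names `codePointWidth` and `cellWidth` are hypothetical.

```typescript
// Sketch only: a coarse approximation of terminal cell width per code point.
function codePointWidth(cp: number): number {
  // Combining diacritical marks draw on top of the previous cell.
  if (cp >= 0x0300 && cp <= 0x036f) return 0;
  const wide =
    (cp >= 0x1100 && cp <= 0x115f) ||   // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) ||   // CJK radicals through Yi (approximate)
    (cp >= 0xac00 && cp <= 0xd7a3) ||   // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) ||   // CJK compatibility ideographs
    (cp >= 0xff00 && cp <= 0xff60) ||   // fullwidth forms
    (cp >= 0x1f300 && cp <= 0x1faff);   // common emoji blocks (approximate)
  return wide ? 2 : 1;
}

// Sum the widths of every code point in the string.
function cellWidth(text: string): number {
  let width = 0;
  for (const ch of text) {
    width += codePointWidth(ch.codePointAt(0) ?? 0);
  }
  return width;
}
```

A production implementation would segment into grapheme clusters first, so that sequences such as emoji joined with zero-width joiners are measured as one unit.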

Concept 5

Kitty keyboard protocol

Modern terminals support distinguishing Ctrl+I from Tab (they're the same byte historically). SumoCode pushes the kitty flags (\x1b[>7u) on entry, pops them on exit. Without this, half your keybindings collide with Tab/Enter/Esc.

Concept 6

The 4 cleanup escapes

The hardest TUI bug: when your process crashes and the terminal is left in mouse-on / altscreen-on / kitty-on state. SumoCode registers signal handlers for SIGINT/SIGTERM/SIGHUP/SIGTSTP/SIGCONT and an uncaughtException hook. All six paths converge on one cleanup sequence.
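The "many triggers, one cleanup" idea can be sketched as follows. This is a hedged illustration, not SumoCode's handler code: `ProcessLike`, `createCleanup`, and `installCleanup` are hypothetical names, the process object is injected so the sketch stays testable, and suspend/resume handling (SIGTSTP/SIGCONT) and the modifyOtherKeys reset are left out.

```typescript
// Sketch only: names are illustrative; SIGTSTP/SIGCONT handling is omitted.
const CLEANUP_SEQUENCE = [
  "\x1b[<u",     // pop kitty keyboard flags
  "\x1b[?2004l", // bracketed paste off
  "\x1b[?1006l", // SGR mouse encoding off
  "\x1b[?1000l", // mouse button tracking off
  "\x1b[?25h",   // force the cursor visible
  "\x1b[?1049l", // leave the alternate screen
].join("");

interface ProcessLike {
  on(event: string, handler: (...args: unknown[]) => void): unknown;
  exit(code: number): void;
}

// Returns an idempotent cleanup: the sequence is written at most once,
// no matter how many exit paths fire.
function createCleanup(write: (bytes: string) => void): () => void {
  let done = false;
  return () => {
    if (done) return;
    done = true;
    write(CLEANUP_SEQUENCE);
  };
}

// Route every termination path through the same cleanup function.
function installCleanup(proc: ProcessLike, write: (bytes: string) => void): void {
  const cleanup = createCleanup(write);
  for (const signal of ["SIGINT", "SIGTERM", "SIGHUP"]) {
    proc.on(signal, () => {
      cleanup();
      proc.exit(1);
    });
  }
  proc.on("uncaughtException", () => {
    cleanup();
    proc.exit(1);
  });
  proc.on("exit", () => cleanup());
}
```

The idempotence guard matters because a signal handler that calls exit will also trigger the exit hook; without the guard the terminal would receive the sequence twice.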

⚠ Why this matters

Of the ~38 working days budgeted for sumo-tui, roughly half is spent on robustness — signal handlers, escape sequence cleanup, mouse SGR parsing, kitty keyboard handshakes, paste filtering, cursor visibility forcing. The "fun" rendering work is the easy part. Making it never break your shell when it crashes is the hard part.

Here's a tiny sample of what an ANSI-encoded footer row actually looks like, expanded so you can see the structure:

          raw bytes for one footer row
          ~80 columns
        

// What you see on screen:                                       
// ~/sumocode (main) · ↑12k ↓8k · $0.42 · ● READY · sonnet-4.5

\x1b[H                                  // move cursor home
\x1b[38;2;245;230;200m                // fg = #F5E6C8 (parchment)
\x1b[48;2;26;21;17m                  // bg = #1A1511 (cathedral bg)
~/sumocode \x1b[2m(main)\x1b[22m · ↑12k ↓8k · $0.42 ·
\x1b[38;2;127;176;105m●\x1b[38;2;245;230;200m READY · sonnet-4.5   // sage dot, then back to parchment (a full \x1b[0m would also drop the bg)
\x1b[K                                 // clear to end of line

§ 04

The five layers

From your fingertips down to the silicon. Read top-to-bottom.

5

Cathedral UX layer

Splash, footer, sidebar, top chrome, working indicator, themes. The pieces a user sees and forms an opinion about.

src/cathedral/ src/themes/

4

SumoCode extension modules

The Pi extension entry point and its install* functions: question tool, answer wizard, approval modal, native task, command palette, slash commands, memory editor.

src/extension.ts src/commands/

3

SumoTUI · the retained renderer

Yoga layout tree, CellBuffer compositor, frame diff, ANSI writer, mouse SGR parser, key router, frame scheduler, ChatPager scroll widget, modal layer, owned-shell renderer.

src/sumo-tui/ ~11k lines

↕ patch

A 12-line patch on Pi's dist/main.js swaps Pi's InteractiveMode for SumoInteractiveMode when SUMO_TUI=1. Loaded via jiti at the boundary.

2

Pi · the agent engine

LLM provider abstraction, agent loop, tool execution (bash/read/write/edit/mcp), session management, MCP server gateway, skills system, extension API.

@mariozechner/pi-coding-agent v0.70.x

1

Terminal & OS

The terminal emulator (Ghostty, iTerm2, Alacritty), Node.js runtime (≥20), and the OS (macOS for v1). Altscreen, ANSI parser, raw stdin/stdout, signal delivery, fork+exec for git/bash.

Node 20+ macOS

Each layer only knows about the layer immediately below. The Cathedral layer doesn't know how Yoga works; it just registers components with ctx.ui.setFooter(...). SumoTUI doesn't know how Pi's tools work; it just renders ChatBlock objects from a transcript view-model. This separation is what makes the codebase scalable despite spanning 96 modules.

flowchart TB
  subgraph Browser["Frontend equivalent"]
    direction TB
    React["React<br/>fiber tree"] --> ReactDOM["react-dom<br/>commit phase"]
    ReactDOM --> DOM["DOM<br/>rendered tree"]
    DOM --> Pixels["Pixels<br/>browser paint"]
  end
  subgraph SumoCode["SumoCode equivalent"]
    direction TB
    Modules["Extension modules<br/>setFooter, setWidget"] --> SumoTUI["SumoTUI<br/>Yoga + composite"]
    SumoTUI --> CellBuf["CellBuffer<br/>2D cell grid"]
    CellBuf --> ANSI["ANSI bytes<br/>terminal paint"]
  end
  Browser ~~~ SumoCode
  classDef left fill:#241D17,stroke:#D97706,color:#F5E6C8
  classDef right fill:#241D17,stroke:#D97706,color:#F5E6C8
  class React,ReactDOM,DOM,Pixels left
  class Modules,SumoTUI,CellBuf,ANSI right

React's pipeline (left) and SumoTUI's pipeline (right) are structurally identical.

§ 05

Inside the retained renderer

Six stages from a state mutation to bytes on the wire.

SumoTUI is a retained renderer. That word matters: the alternative ("immediate mode") is what Pi originally used — every render rebuilds the entire frame from scratch, every time. Retained means we keep a tree of nodes around between frames and only re-layout / re-composite the parts that change.

Here's what happens when, say, the working indicator ticks one frame forward:

requestRender()

A node calls requestRender(). The frame scheduler enqueues a dirty token.

FrameScheduler

Coalesce

The scheduler waits 0–16ms for more events to coalesce. Bursty input collapses to one paint.

adaptive 60fps

Yoga layout

root.calculateLayout(width, height, LTR) resolves flex sizes for the entire tree.

WASM, ~0.5ms

Composite

composite() walks the tree depth-first, painting each node's cells into a fresh CellBuffer.

CellBuffer

Diff frames

diffFrames(prev, next) finds changed cell ranges row-by-row. Stable cells produce zero output.

FrameDiffPatch[]

Write ANSI

Each patch becomes an ANSI sequence. Cursor moves + styled bytes streamed to stdout.

terminal-controller
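The first two stages (request, then coalesce) can be sketched in a few lines. This is an assumption-laden illustration rather than the actual FrameScheduler: the class name `CoalescingScheduler`, the injectable `Timer`, and the fixed 16ms window are all hypothetical, and the adaptive behavior described above is not modeled.

```typescript
// Sketch only: a fixed-window coalescing scheduler with an injectable timer.
type Timer = (callback: () => void, delayMs: number) => void;

class CoalescingScheduler {
  private pending = false;
  private readonly render: () => void;
  private readonly schedule: Timer;
  private readonly frameMs: number;

  constructor(render: () => void, schedule?: Timer, frameMs = 16) {
    this.render = render;
    // Default to a real timer; tests can inject a fake one.
    this.schedule = schedule ?? ((cb, ms) => { setTimeout(cb, ms); });
    this.frameMs = frameMs;
  }

  // Any number of calls inside one window collapse into a single paint.
  requestRender(): void {
    if (this.pending) return;
    this.pending = true;
    this.schedule(() => {
      this.pending = false;
      this.render();
    }, this.frameMs);
  }
}
```

Because nothing is scheduled until a node asks for a render, an idle session performs no work at all, which matches the "idle 0fps" behavior in the cheat sheet.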

The CellBuffer · your «DOM»

Every visible character on screen is one Cell. A cell holds:

          src/sumo-tui/render/cell.ts
          typescript
        

interface Cell {
    char: string;           // "A", "日" (2-wide), or "" (continuation)
    fg:   string | null;    // "#F5E6C8"
    bg:   string | null;    // "#1A1511"
    attrs: {
      bold: boolean;
      italic: boolean;
      underline: boolean;
      dim: boolean;
      inverse: boolean;
    }
}

// CellBuffer is rows × cols of these
class CellBuffer {
    private chars: Uint16Array;  // hot path uses typed arrays
    private fg: Map<number, string>;     // sparse: most cells share fg/bg
    private bg: Map<number, string>;
    private attrs: Map<number, number>;   // packed bitfield
}

The optimization that matters: most cells in a frame are blank or share the same style as their neighbors. Storing fg/bg/attrs in sparse Maps instead of dense arrays cuts memory by 90% in typical frames. The diff algorithm then walks each row in linear time, finds the leftmost and rightmost differing column, and emits a single ANSI patch covering just that column range.

↗ Performance win

The per-row column-range diff is borrowed from OpenTUI's renderer.zig (lines 1331-1349). On a streaming chat update where only the bottom row changes, this saves 50-90% of bytes per frame compared to full-row repaints. Cursor blinks no longer cost a screen-wide repaint.
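The per-row column-range idea is easy to sketch on a simplified frame. In the version below each cell is reduced to a single string key, and the `RowPatch` shape is an assumption; the real diffFrames works on CellBuffer data and produces FrameDiffPatch values whose fields are not shown in this guide.

```typescript
// Sketch only: cells are simplified to strings; RowPatch is a hypothetical shape.
type Frame = string[][]; // rows × cols

interface RowPatch {
  row: number;
  startCol: number; // leftmost changed column
  endCol: number;   // rightmost changed column (inclusive)
  cells: string[];  // replacement cells for startCol..endCol
}

function diffFrames(prev: Frame | undefined, next: Frame): RowPatch[] {
  const patches: RowPatch[] = [];
  for (let row = 0; row < next.length; row++) {
    const nextRow = next[row] ?? [];
    const prevRow = prev === undefined ? undefined : prev[row];
    let start = -1;
    let end = -1;
    for (let col = 0; col < nextRow.length; col++) {
      // With no previous frame, every cell counts as changed.
      if (prevRow === undefined || prevRow[col] !== nextRow[col]) {
        if (start === -1) start = col;
        end = col;
      }
    }
    // Unchanged rows produce no patch at all.
    if (start !== -1) {
      patches.push({ row, startCol: start, endCol: end, cells: nextRow.slice(start, end + 1) });
    }
  }
  return patches;
}
```

Note the trade-off: unchanged cells that sit between two changed cells on the same row are rewritten, which keeps the output to one cursor move per row.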

Why Yoga, specifically

Yoga is Facebook's flexbox engine — the same one that powers React Native, Litho, and ComponentKit. SumoCode uses yoga-wasm-web: 87KB of WASM, no native build step, identical layout semantics to web flex. The retained tree isn't custom; it's CSS flex with terminal-cell units.

This means the splash screen's "vertical center" isn't padding math. It's:

          src/sumo-tui/cathedral/splash-tree.ts
          typescript
        

Root(flexDirection: column, flexGrow: 1)
  ├─ TopSpacer(flexGrow: 1)     // fills available space
  ├─ Splash(flexShrink: 0)      // fixed: cat + wordmark + quote
  └─ BottomSpacer(flexGrow: 1)  // fills available space

// Yoga splits free rows 50/50 between the spacers.
// Resize the terminal? Layout recomputes for free.
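To show what "splits free rows 50/50" means numerically, here is a simplified model of flex-grow distribution along one axis. It is not Yoga and does not call yoga-wasm-web; the names `FlexChild` and `distributeRows` are hypothetical, and the edge-rounding rule is an assumption that may differ from Yoga's own rounding.

```typescript
// Sketch only: a one-axis flex-grow model, not the Yoga algorithm itself.
interface FlexChild {
  name: string;
  basis: number;    // fixed rows the child needs
  flexGrow: number; // share of the leftover rows
}

function distributeRows(totalRows: number, children: FlexChild[]): Map<string, number> {
  const fixed = children.reduce((sum, c) => sum + c.basis, 0);
  const free = Math.max(0, totalRows - fixed);
  const totalGrow = children.reduce((sum, c) => sum + c.flexGrow, 0);
  const sizes = new Map<string, number>();
  let cursor = 0; // exact (possibly fractional) position of the next child
  for (const child of children) {
    const share = totalGrow > 0 ? (free * child.flexGrow) / totalGrow : 0;
    const startEdge = Math.round(cursor);
    cursor += child.basis + share;
    // Round edges rather than sizes so the children always sum to totalRows.
    sizes.set(child.name, Math.round(cursor) - startEdge);
  }
  return sizes;
}
```

For a 24-row terminal and a 10-row splash, each spacer receives 7 rows; with 25 rows this model gives the top spacer 8 and the bottom spacer 7.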

§ 06

Boot sequence · zero to first paint

Following one process from $ sumocode to a rendered cathedral.

Six phases, ~700-1100ms cold start. Each phase has its own actor:

sequenceDiagram
  autonumber
  participant U as User
  participant Sh as Shell
  participant Pi as Pi (engine)
  participant Sumo as SumoInteractiveMode
  participant Y as Yoga (WASM)
  participant Ext as SumoCode extension
  participant T as Terminal
  U->>Sh: $ sumocode
  Sh->>Sh: parse args, find Pi binary
  Sh->>Sh: check Pi has loadSumoInteractiveMode patch
  Sh->>Pi: exec pi -e src/extension.ts
  Pi->>Pi: bootstrap (config, providers)
  Pi->>Sumo: import('SUMO_TUI_MODULE')
  Sumo->>Sumo: jiti transpile sumo-tui (~300ms)
  Sumo->>Y: loadYoga(): read WASM, init
  Y-->>Sumo: yoga ready
  Sumo->>T: enter altscreen + mouse SGR + kitty kbd
  Sumo->>Pi: upstream.init(): load extensions
  Pi->>Ext: load src/extension.ts
  Ext->>Pi: register install* handlers
  Pi->>Pi: emit session_start
  Pi-->>Ext: session_start fires for all handlers
  Ext->>Sumo: setWidget, setFooter, setHeader
  Sumo->>Y: calculateLayout(cols, rows)
  Sumo->>Sumo: composite() → CellBuffer
  Sumo->>Sumo: diffFrames(undefined, frame)
  Sumo->>T: write ANSI patches
  T-->>U: cathedral splash visible

The complete boot sequence — every actor and handoff from invocation to first paint.

The jiti transpile step (~300ms in the cold path) is the single largest cost. It exists because sumo-interactive-mode.js bridges Pi's CommonJS-loaded patch into our TypeScript source on the fly. Pre-compiling that entry point into a real JS bundle would remove most of that cost, roughly a third of the ~700-1100ms cold start; this is already filed as a P0 in the perf audit.

Phases in human terms:

Shell handshake — bash launcher inspects Pi's dist/main.js for the patch marker, sets SUMO_TUI_MODULE to a file:// URL pointing at our bridge, then execs Pi.
Pi bootstrap — provider config, model registry, MCP server connections, session manager. Pi-owned.
Bridge load — Pi's patched main.js reaches the constructor site, sees SUMO_TUI=1, dynamically imports our bridge, and instantiates SumoInteractiveMode instead of InteractiveMode.
Retained runtime start — Yoga WASM initialization, splash tree creation, ChatPager allocation, frame scheduler bootstrap, terminal session entry (altscreen + mouse + kitty + paste mode).
Extension install — Pi loads src/extension.ts, which calls 14 install* functions. Each registers handlers but defers all DOM-equivalent work to session_start.
First paint — Pi emits session_start; the cascade of handlers populates widgets; Yoga lays out; the compositor produces the first CellBuffer; diff produces patches; ANSI hits stdout; user sees the cat.

§ 07

Tools, extensions & the override system

The interesting part: how Pi's built-in tools coexist with SumoCode's overrides.

Pi exposes its tool system through four different surfaces, and SumoCode has to integrate with each one differently. This is the table that took me longest to internalize:

Tool layer	Examples	How SumoCode interacts
Pi built-ins	bash · read · write · edit · mcp	Never re-register. Intercept via `pi.on("tool_call")` for approval gating; render results via the transcript view-model pipeline.
Pi example exts	question	Override. Register a tool with the same name; SumoCode's wins. Our `question` tool maps to the Divine Query overlay.
SumoCode-only	task · /answer	Register fresh. Native task tool spawns Pi sub-processes for parallel work. /answer is a wizard for multi-question flows.
Pi internal UI	showExtensionSelector · showExtensionConfirm	Cannot intercept without upstream Pi changes. SumoCode-owned code calls our themed overlays directly instead.

The transcript view-model

Every chat message — user, assistant, tool, skill, delegation — flows through one shared view-model before any rendering happens:

          src/sumo-tui/transcript/view-model.ts
          discriminated union
        

type ChatBlock =
  | { type: "markdown"; text: string }
  | { type: "code"; lang: string; source: string }
  | { type: "tool"; tool: ToolCallViewModel }
  | { type: "skill"; name: string; expanded: boolean }
  | { type: "question"; question: QuestionViewModel }
  | { type: "delegation"; delegation: DelegationViewModel };

type ChatMessageViewModel = {
    id: string;
    role: "user" | "sumo" | "system";
    blocks: ChatBlock[];
};

This abstraction is the lever that makes everything downstream possible. The visual harness can build deterministic transcripts without running an LLM. The chat renderer can switch on block type without parsing strings. New block types (like delegation pills for sub-process scrolls) ship as additions to the union — no renderer changes needed elsewhere.
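The "switch on block type without parsing strings" point can be sketched with an exhaustive switch over the union. The three nested view-model interfaces below are placeholder shapes invented for this example (their real fields are not shown in this guide), and `renderBlock` is a hypothetical plain-text renderer rather than the actual chat renderer.

```typescript
// Sketch only: the nested view-model fields are placeholder assumptions.
interface ToolCallViewModel { name: string; status: "running" | "done" | "error" }
interface QuestionViewModel { prompt: string }
interface DelegationViewModel { label: string }

type ChatBlock =
  | { type: "markdown"; text: string }
  | { type: "code"; lang: string; source: string }
  | { type: "tool"; tool: ToolCallViewModel }
  | { type: "skill"; name: string; expanded: boolean }
  | { type: "question"; question: QuestionViewModel }
  | { type: "delegation"; delegation: DelegationViewModel };

// Each case sees a narrowed type, so field access is checked by the compiler.
function renderBlock(block: ChatBlock): string {
  switch (block.type) {
    case "markdown":
      return block.text;
    case "code":
      return `[${block.lang}]\n${block.source}`;
    case "tool":
      return `tool:${block.tool.name} (${block.tool.status})`;
    case "skill":
      return `${block.expanded ? "▾" : "▸"} skill:${block.name}`;
    case "question":
      return `? ${block.question.prompt}`;
    case "delegation":
      return `⇢ ${block.delegation.label}`;
    default: {
      // Adding a new union member without a case makes this line fail to compile.
      const unreachable: never = block;
      throw new Error(`unhandled block: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is what turns a forgotten block type into a compile-time error instead of a blank row at runtime.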

The 14 install hooks

The extension entry point is intentionally boring — it's just an ordered list of installations. Order matters: render diagnostics installs first so it can wrap every later setFooter/setWidget call; session cache installs second so its invalidation runs alongside producer updates.

          src/extension.ts · the load order
          14 install hooks
        

export default function sumocode(pi: ExtensionAPI): void {
    installRenderDiagnostics(pi);    // 01: wrap UI calls
    installSessionCache(pi);          // 02: cache token tally + git branch
    installAltscreen(pi);             // 03: lifecycle + signal cleanup
    installTopChrome(pi);             // 04: top header bar
    installSplash(pi);                // 05: cathedral splash widget
    installFooter(pi);                // 06: status footer
    installCathedralEditor(pi);       // 07: input frame chrome
    installInputHints(pi);            // 08: keybind hint row
    installApprovalGate(pi);          // 09: dangerous bash guard
    taskTool({...})(pi);              // 10: native parallel task
    installQuestionTool(pi);          // 11: divine query override
    installAnswerTool(pi);            // 12: /answer wizard
    installWorkingIndicator(pi);      // 13: theme-aware spinner
    installSumoInteractions(pi);      // 14: slash commands + shortcuts
}

§ 08

The Pi patch · why we forked

The smallest possible patch on the smallest possible surface, treated as a maintenance contract.

Pi is a public, maintained npm package. SumoCode is a Pi extension — but extensions only get to register components, not replace Pi's interactive constructor. To own the alternate screen lifecycle, mouse routing, scroll, and modal layers, we need to be the InteractiveMode that Pi instantiates.

The fix is a 12-line patch on Pi's dist/main.js that swaps the constructor when an env var is set:

          patches/@mariozechner__pi-coding-agent@0.70.2.patch
          +12 / −2 lines
        

-const interactiveMode = new InteractiveMode(runtime, options);
+const useSumoTui = isTruthy(process.env.SUMO_TUI) || parsed.unknownFlags.has("sumo-tui");
+const interactiveMode = useSumoTui
+    ? await loadSumoInteractiveMode(runtime, options)
+    : new InteractiveMode(runtime, options);

+async function loadSumoInteractiveMode(...args) {
+    const spec = process.env.SUMO_TUI_MODULE ?? "@dhruvkelawala/sumocode/sumo-interactive-mode";
+    const { SumoInteractiveMode } = await import(spec);
+    return new SumoInteractiveMode(...args);
+}

The contract is explicit:

Default-off. Without SUMO_TUI=1, Pi behaves exactly like vanilla Pi. Other Pi users are unaffected.
Runtime-gated. Activation is one env var. Disable per-launch with --no-sumo-tui.
Wrapper-checked. The sumocode shell launcher inspects the patch marker before activating. Patch missing? Falls back to legacy Pi UI with a warning.
Pinned to Pi 0.70.x. Pi version bumps follow a documented smoke matrix (docs/research/pi-fork-upgrade.md).
Revisitable. When Pi exposes a public interactive-mode injection API, we delete the patch.

This isn't ideology — it's pragmatism. opentui-island's sidecar architecture would have cost ~500ms cold start and ~400MB RSS for our four chrome regions. Forking Pi entirely would have meant re-implementing the LLM/agent/tool/MCP surface. The 12-line patch is the smallest mutation that gets us where we need to be.

§ 09

Themes & the five preattentive states

Color as information density. Color as identity.

Preattentive processing is the visual-perception term for "things you notice before you decide to look." Cone-density-aware research shows you can disambiguate ~5 hues simultaneously in your peripheral vision. SumoCode picks five and assigns one agent state to each:

READY

Idle. Awaiting input. Sage green — calm, low-chroma, doesn't pull focus.

MEDITATING

LLM is generating. Warm gold — active, inviting, doesn't read as alarm.

ILLUMINATING

A tool is running. Mid-saturation blue — distinct from gold/green at a glance.

DEFERRING

Approval needed. Crimson, the only saturated red on the surface; hijacks attention.

INSCRIBING

Writing to memory. Soft purple — rare, signals "long-term effect on the agent."

The dot lives in the footer's right zone. The state name (uppercase, Cathedral verb) appears next to it. Both are theme-driven — switching to Obsidian Temple swaps colors but keeps the semantics identical.

The Cathedral palette

The default theme is named for its visual reference: a 19th-century scriptorium. Warm walnut surfaces, parchment foreground, burnt-orange accent. Every color is a typed token in src/themes/cathedral.ts:

          src/themes/cathedral.ts
          typed theme tokens
        

export const CATHEDRAL_THEME: Theme = {
    name: "cathedral",
    tokens: { colors: {
        background: "#1A1511",      // walnut deep
        surface: "#241D17",         // walnut mid (sidebar bg)
        foreground: "#F5E6C8",      // parchment
        foregroundDim: "#8B7A63",   // muted brown for dim text
        accent: "#D97706",          // burnt orange — single accent
        states: {
            idle: "#7FB069", thinking: "#E8B339",
            tool: "#5B9BD5", approval: "#C1443E",
            learning: "#8E7AB5"
        },
    } },
    workingIndicator: { frames: ["◌", "✦", "❖", "✺", "❋", "❉"], intervalMs: 150 },
    chrome: { ...DEFAULT_CHROME },                           // box-drawing glyphs
};

The chrome object holds the structural vocabulary — frame corners, dividers, bullets, section glyphs. Themes can override these to feel completely different even with similar colors. Obsidian Temple uses the same five state hues but at higher saturation against a near-black surface and adds neon glow effects via terminal-supported underlines.

Switching themes is Ctrl+Shift+T. The runtime emits a theme_changed event; every retained component clears its frame cache; the next render produces fresh ANSI for the new palette. Zero flicker, zero re-layout.
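The claim that a theme swap changes colors while keeping semantics can be illustrated with a small lookup. The state keys and hex values come from the Cathedral tokens above; the `ThemeStates` interface, the `STATE_LABELS` table, and the `statusDot` function are hypothetical names for this sketch.

```typescript
// Sketch only: a reduced theme shape holding just the five state colors.
type AgentState = "idle" | "thinking" | "tool" | "approval" | "learning";

interface ThemeStates {
  name: string;
  states: Record<AgentState, string>; // "#RRGGBB" per state
}

// The label for each state is theme-independent.
const STATE_LABELS: Record<AgentState, string> = {
  idle: "READY",
  thinking: "MEDITATING",
  tool: "ILLUMINATING",
  approval: "DEFERRING",
  learning: "INSCRIBING",
};

const cathedralStates: ThemeStates = {
  name: "cathedral",
  states: {
    idle: "#7FB069",
    thinking: "#E8B339",
    tool: "#5B9BD5",
    approval: "#C1443E",
    learning: "#8E7AB5",
  },
};

// Render the colored dot plus label; SGR 39 restores the default foreground
// without touching the background.
function statusDot(theme: ThemeStates, state: AgentState): string {
  const hex = theme.states[state];
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return `\x1b[38;2;${r};${g};${b}m●\x1b[39m ${STATE_LABELS[state]}`;
}
```

Since `Record<AgentState, string>` requires all five keys, a theme that forgets a state color fails type-checking rather than rendering an uncolored dot.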

§ 10

The visual parity harness

How we test "the cathedral renders pixel-perfectly" without humans staring at terminals.

This is my favorite piece. Visual regressions in TUIs are notoriously hard to catch — a single off-by-one column or stale ANSI reset can make a perfectly correct algorithm produce a broken-looking screen. SumoCode runs three convergent verification lanes:

Lane 1

Component lane

Deterministic fixtures → ANSI. Each component (footer, sidebar row, tool pill, code block) renders in isolation against a known input. Tests assert exact ANSI output.

Lane 2

Fixture lane

A whole TranscriptViewModel fixture renders the full scene (top chrome + chat + footer + sidebar). No live Pi needed. Used for completed-response and tool/overlay states.

Lane 3

Runtime lane

./bin/sumocode.sh launches under node-pty with a fixed terminal size. Real end-to-end. Captures actual ANSI output to compare against.

All three converge into a shared verification pipeline:

flowchart LR
  C[Component<br/>fixture] --> A[ANSI bytes]
  F[TranscriptViewModel<br/>fixture] --> A
  R[Runtime PTY<br/>capture] --> A
  A --> X["@xterm/headless<br/>replay"]
  X --> S[Cell snapshot<br/>JSON]
  S --> CD[Styled cell diff<br/>vs Bible HTML]
  S --> GA[Geometry audit<br/>vs scenarios.json]
  S --> PNG[DOM render →<br/>Playwright PNG]
  CD --> RT[review pack<br/>+ CI gate]
  GA --> RT
  PNG --> RT
  classDef src fill:#241D17,stroke:#D97706,color:#F5E6C8
  classDef proc fill:#3D3024,stroke:#8B7A63,color:#F5E6C8
  classDef out fill:#241D17,stroke:#7FB069,color:#F5E6C8
  class C,F,R src
  class A,X,S,CD,GA,PNG proc
  class RT out

Three input lanes converge through the same xterm replay → cell-grid → diff pipeline.

The non-obvious decision: the styled-cell diff is the primary CI gate, not the PNG diff. Pixel-level PNG comparison is flaky (font rendering, sub-pixel anti-aliasing, OS color profiles). Comparing per-cell {char, fg, bg, bold, dim} against a parsed Bible HTML reference is deterministic across machines.
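A per-cell comparison of this kind might look like the sketch below. The `StyledCell` fields mirror the {char, fg, bg, bold, dim} tuple named above, but the `CellMismatch` report shape and the `compareCellGrids` function are assumptions; the actual harness and its Bible HTML parser are not reproduced here.

```typescript
// Sketch only: the mismatch report format is a hypothetical shape.
interface StyledCell {
  char: string;
  fg: string | null;
  bg: string | null;
  bold: boolean;
  dim: boolean;
}

interface CellMismatch {
  row: number;
  col: number;
  field: keyof StyledCell;
  expected: unknown;
  actual: unknown;
}

const CELL_FIELDS: Array<keyof StyledCell> = ["char", "fg", "bg", "bold", "dim"];

// Compare two cell grids field by field; a missing cell on either side
// shows up as mismatches against undefined.
function compareCellGrids(expected: StyledCell[][], actual: StyledCell[][]): CellMismatch[] {
  const mismatches: CellMismatch[] = [];
  const rows = Math.max(expected.length, actual.length);
  for (let row = 0; row < rows; row++) {
    const expRow = expected[row] ?? [];
    const actRow = actual[row] ?? [];
    const cols = Math.max(expRow.length, actRow.length);
    for (let col = 0; col < cols; col++) {
      const exp = expRow[col];
      const act = actRow[col];
      for (const field of CELL_FIELDS) {
        const e = exp?.[field];
        const a = act?.[field];
        if (e !== a) mismatches.push({ row, col, field, expected: e, actual: a });
      }
    }
  }
  return mismatches;
}
```

Because the comparison only involves strings and booleans, the result is identical on every machine, which is the property that makes it usable as a CI gate.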

📐 Geometry audit

The geometry audit lane is unique to this codebase. Each row in the captured frame gets classified — top-bar, chat-frame-top, hint-row, footer, blank — and the column bounds checked against a declared geometrySpec. This catches structural drift (sidebar starting one column too late, hint row missing) that no per-cell diff would flag.

The Bible HTML files at docs/ui/bible/*.html are the canonical visual reference — hand-built mockups exported from Stitch, then promoted to source-of-truth. PNG renders of those Bible files exist as review evidence, not gates.

§ 11

Tech stack · everything that's running

96 modules, 30k lines of TypeScript, three external dependencies.

Runtime

Node.js (≥20) · TypeScript (strict) · jiti (runtime TS) · @mariozechner/pi-coding-agent (0.70.2) · @mariozechner/pi-tui (primitives) · yoga-wasm-web (87KB) · typebox (schemas)

Test & harness

vitest (unit) · node-pty (PTY integration) · @xterm/headless (cell replay) · pixelmatch (PNG diff) · playwright (DOM render)

Build & release

pnpm (8.x) · no bundler (jiti runs TS) · git tags (release channel) · pi update (consumer pull)

Module shape

src/ — extension entry, cathedral chrome modules, themes, commands, slash command registry. ~7k lines.
src/sumo-tui/ — retained renderer foundation. Layout, render, input, runtime, widgets, transcript, pi-compat. ~11k lines.
src/sumo-tui/cathedral/ — Cathedral-specific retained nodes (splash tree, tool pills, message frames). Sits on top of sumo-tui primitives.
src/sumo-tui/pi-compat/ — the only place sumo-tui touches Pi internals. SumoInteractiveMode, owned-shell renderer, Pi noise filter, foreign extension warning.
test/integration/ — node-pty PTY tests for altscreen cleanup, mouse routing, signal handling, splash centering.
docs/ — ADRs, research artifacts, visual contracts, Bible HTML/PNG references.
patches/ — the 12-line Pi constructor patch.
bin/sumocode.sh — the launcher with doctor/diag/dry-run/--no-sumo-tui modes.

§ 12

Talking points for the launch

What I'd lead with, what I'd not lead with, and a tweet thread structure.

What's defensible to claim

"Built a retained terminal renderer in TypeScript" — true, ~11k lines under src/sumo-tui/. Yoga + CellBuffer + frame diff + altscreen + mouse SGR.
"~50-90% byte savings per streaming frame" — true, from the per-row column-range diff (cite OpenTUI as inspiration).
"Five preattentive state colors, <250ms recognition" — true, this is the explicit UX target.
"Three themes, one keypress to cycle" — true, Ctrl+Shift+T.
"Twelve-line patch on Pi to swap the interactive constructor" — true, default-off, runtime-gated, wrapper-checked.
"Verified with three convergent visual lanes" — true, component + fixture + runtime PTY all converge through xterm replay.

What I would not lead with

The cold-start time — currently ~700-1100ms, dominated by jiti transpile. Fix this before launch (see perf audit) or avoid the topic.
"Better than <tool X>" — SumoCode is a personal extension built on Pi. Frame it as "what's possible when you own the experience layer", not as a competitor.
Cross-platform claims — macOS only for v1. Linux/Windows is on the roadmap, not shipped.
Memory features — Remnic memory daemon is in PRD, not shipped in v0.1.0. Don't promise what's not built.

A draft tweet thread

built a Pi extension called SumoCode — a Cathedral-themed terminal AI agent UI.

retained renderer, Yoga flexbox layout, CellBuffer compositor, frame diff, mouse SGR routing. all in TypeScript on top of @mariozechner/pi-mono.

here's how it works ↓

terminals are just a 2D grid of styled cells with an event stream.

if that clicks, the rest is just React for terminals.

• Yoga (the same one in React Native) for layout
• CellBuffer = your DOM
• diffFrames = your virtual DOM diff
• ANSI escape codes = your CSS

why «retained» matters: the alternative is rebuilding every frame from scratch.

retained = keep the layout tree alive, only re-composite changed nodes.

the per-row column-range diff (borrowed from OpenTUI) saves 50-90% of bytes per streaming frame.

five preattentive state colors in the footer:

🟢 READY · 🟡 MEDITATING · 🔵 ILLUMINATING · 🔴 DEFERRING · 🟣 INSCRIBING

you can tell what the agent is doing in <250ms without looking directly at the dot. that's the whole goal.

owning the alt-screen properly is the hardest part of TUI work.

SIGINT, SIGTERM, SIGHUP, SIGTSTP, SIGCONT, uncaughtException: all six paths converge on one cleanup sequence (kitty pop, modifyOtherKeys off, paste off, mouse off, altscreen off).

your shell stays clean even when SumoCode crashes.

three convergent verification lanes for visual regressions:

• component (fixture → ANSI)
• fixture (TranscriptViewModel → full scene)
• runtime (real PTY capture)

all converge through @xterm/headless → cell-grid → diff vs Bible HTML reference. the cell diff is the CI gate, not pixels.

the public/private split:

• sumocode (this repo) = UI, MIT, public
• sumocode-config (private) = persona, memory, settings, MCP

one git pull moves my identity between machines. no tooling, no secrets in the public repo.

install:

pi install git:github.com/dhruvkelawala/sumocode

README + arch guide in the repo. v0.1.0 today; theme system, memory daemon, full visual harness landing in v0.2-v1.0 over the next eight weeks.

built for me. shared because the patterns might be useful to you.

One-line elevator

SumoCode is shadcn/ui for terminal AI agents — built on @mariozechner/pi-mono, with a retained renderer that owns the alternate screen and treats every state as a typed token.