A complete architectural walk-through, written for someone who has shipped React for years but has never written a single byte of ANSI.
Pi is the engine. SumoCode is the cathedral built on top of it.
SumoCode is a Pi extension that turns a generic terminal AI agent into a personal one — with persistent memory, a custom-built retained renderer, three themes, and five preattentive status colors.
The default Pi UX is generic — same footer, same status indicators, same look as every other Pi user gets. SumoCode owns the experience layer entirely while delegating agent loop, LLM, sessions, tools, and MCP to Pi. It's the shadcn/ui of terminal AI agents.
Pi is the engine: provider abstraction, agent loop, tool framework, sessions, MCP, skills, extension API. SumoCode adds a Node-native retained renderer on top.
This public repo (UI) pairs with a private sumocode-config repo (persona, memory, settings). Same persona on every machine. git pull = move identity.
The value proposition holds three pieces. First: identity persists. A persona file is appended to every system prompt; the agent introduces itself the same way on every session, every machine. Second: state is visible. A single colored dot in the footer tells you in <250ms whether the agent is idle, thinking, running a tool, awaiting approval, or writing to memory. Third: the renderer is real. SumoCode owns the alternate screen, mouse routing, in-app scroll, modal layers, and a Yoga-based flexbox layout — none of which Pi's default renderer offers.
Translate what you already know. Almost every concept has a web equivalent.
Before any code, here is the cheat sheet. Once these mappings click, the rest of the architecture follows naturally:
| Frontend concept | SumoCode / TUI equivalent | What it actually does |
|---|---|---|
| The browser | The terminal emulator | iTerm2, Ghostty, Alacritty — they parse ANSI escape codes the same way browsers parse HTML. |
| The DOM | CellBuffer (rows × cols of Cells) | A 2D grid where every cell holds char + fg + bg + bold/italic/dim. The "DOM" is fixed-grid, not a tree. |
| CSS | ANSI escape sequences | \x1b[38;2;217;119;6m sets foreground to #D97706. Looks gnarly, behaves like inline-style. |
| React Fiber | SumoNode tree (Yoga-backed) | Retained tree of layout nodes. Reconciles into a CellBuffer the way React reconciles into the DOM. |
| Virtual DOM diff | diffFrames(prev, next) | Cell-by-cell diff produces patches; only changed regions get written to the terminal. Saves 50-90% of bytes. |
| Flexbox | Yoga (the literal Facebook engine) | Same Yoga that powers React Native. Compiled to WebAssembly, ~87KB, computes layout in microseconds. |
| requestAnimationFrame | FrameScheduler | Adaptive 60fps coalescing while streaming, event-driven (idle 0fps) otherwise. |
| A modal / portal | Overlay layer + altscreen | Altscreen is the terminal's "fullscreen modal" — when you exit, your shell history is back, untouched. |
| Hover / click handlers | SGR mouse reporting | The terminal emits \x1b[<0;42;7M on click; we parse it into {type:'down', row:6, col:41}. |
| localStorage / IndexedDB | ~/.sumocode/ JSONL files | Diagnostics, session caches, crash logs. Plus sumocode-config/memory/ for cross-machine state. |
| Web Workers | Pi sub-processes (task tool) | Spawned via node-pty for parallel Pi instances. ACP protocol over stdio. |
| shadcn/ui | The closest analogy for SumoCode | You don't replace the framework, you decorate it. Pi = framework. SumoCode = the design system + components. |
Five concepts that unlock the entire codebase.
Terminals have a second screen buffer. Sending \x1b[?1049h switches into it; the user's shell history is preserved underneath. \x1b[?1049l on exit restores it. This is how full-screen TUIs (vim, htop, SumoCode) take over without nuking your scrollback.
The "CSS of terminals." A magic byte (\x1b, ESC) followed by control sequences. Examples: [2J clears screen, [H moves cursor home, [38;2;R;G;Bm sets 24-bit foreground color. Verbose, but deterministic.
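To make the 24-bit color sequence concrete, here is a minimal sketch that turns a `#RRGGBB` token into the foreground escape shown above. The helper name `fgFromHex` is hypothetical and not part of SumoCode or Pi.

```typescript
// Convert "#RRGGBB" into the 24-bit foreground sequence ESC[38;2;R;G;Bm.
// `fgFromHex` is an illustrative name, not an API from the codebase.
function fgFromHex(hex: string): string {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`expected #RRGGBB, got ${hex}`);
  const [r, g, b] = match.slice(1).map((part) => parseInt(part, 16));
  return `\x1b[38;2;${r};${g};${b}m`;
}

const SGR_RESET = "\x1b[0m"; // clears all attributes

// Wrap text in the accent color, then reset so later output is unaffected.
const accentHello = `${fgFromHex("#D97706")}hello${SGR_RESET}`;
```

Resetting after every styled run is what keeps the output deterministic: no style leaks into whatever is written next.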
Without mouse reporting enabled, a mouse wheel typically arrives as arrow keys (up/down). With button tracking on (\x1b[?1000h) and SGR encoding selected (\x1b[?1006h), you get structured events like \x1b[<0;42;7M. SumoCode parses these into MouseEvent objects and routes them through hit-testing.
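The parsing step can be sketched as follows. This is a simplified illustration of the SGR mouse format (`ESC [ < button ; column ; row`, ending in `M` for press or `m` for release), not SumoCode's actual parser. The type name `ParsedMouseEvent` is an assumption.

```typescript
// Hypothetical event shape, modeled on the {type, row, col} example in the table.
type ParsedMouseEvent = {
  type: "down" | "up" | "move" | "wheel";
  button: number; // 0 = left, 1 = middle, 2 = right; for wheel: 0 = up, 1 = down
  row: number;    // zero-based
  col: number;    // zero-based
};

const SGR_MOUSE = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/;

function parseSgrMouse(seq: string): ParsedMouseEvent | null {
  const m = SGR_MOUSE.exec(seq);
  if (!m) return null;
  const code = Number(m[1]);
  const col = Number(m[2]) - 1; // the terminal reports 1-based coordinates
  const row = Number(m[3]) - 1;
  let type: ParsedMouseEvent["type"];
  if (code & 64) type = "wheel";      // bit 6 marks wheel events
  else if (code & 32) type = "move";  // bit 5 marks motion
  else type = m[4] === "M" ? "down" : "up";
  return { type, button: code & 3, row, col };
}
```

Feeding it the click from the table, `\x1b[<0;42;7M`, yields a left-button `down` event at row 6, column 41.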
Emoji and CJK characters occupy 2 cells; combining marks occupy none, because they attach to the preceding character. JavaScript's "日".length === 1, but on screen it's 2 columns wide. Pi's visibleWidth() handles this; SumoCode uses Intl.Segmenter for grapheme clustering.
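A rough sketch of width measurement with `Intl.Segmenter` is shown below. It is not Pi's `visibleWidth()`. The wide-character ranges are a coarse approximation chosen for illustration, where a production implementation would use the full Unicode East Asian Width data.

```typescript
// Approximate wide ranges (an assumption of this sketch, not the real table).
const WIDE_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x1100, 0x115f],   // Hangul Jamo
  [0x2e80, 0xa4cf],   // CJK radicals through Yi
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xff00, 0xff60],   // fullwidth forms
  [0x1f300, 0x1faff], // most emoji
  [0x20000, 0x3fffd], // CJK extensions
];

const graphemes = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function approxDisplayWidth(text: string): number {
  let width = 0;
  for (const { segment } of graphemes.segment(text)) {
    // Combining marks are folded into their cluster, so they add no width.
    const codePoint = segment.codePointAt(0) ?? 0;
    const wide = WIDE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
    width += wide ? 2 : 1;
  }
  return width;
}
```

Measuring per grapheme cluster rather than per UTF-16 code unit is the key design choice: `"e\u0301"` has length 2 in JavaScript but occupies one cell.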
Modern terminals support distinguishing Ctrl+I from Tab (they're the same byte historically). SumoCode pushes the kitty flags (\x1b[>7u) on entry, pops them on exit. Without this, half your keybindings collide with Tab/Enter/Esc.
The hardest TUI bug: when your process crashes and the terminal is left in mouse-on / altscreen-on / kitty-on state. SumoCode registers signal handlers for SIGINT/SIGTERM/SIGHUP/SIGTSTP/SIGCONT and an uncaughtException hook. All four cleanup paths converge on one cleanup sequence.
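The "one cleanup sequence" idea can be sketched like this. The `ProcessLike` interface and `installCleanup` name are hypothetical, and the sketch deliberately omits the SIGTSTP/SIGCONT suspend-and-resume handling mentioned above. The escape sequences are the standard ones for popping kitty flags, disabling mouse reporting, showing the cursor, and leaving the alternate screen.

```typescript
// Undo terminal modes in roughly the reverse order they were enabled.
const RESTORE_TERMINAL = [
  "\x1b[<u",     // pop kitty keyboard flags
  "\x1b[?1006l", // SGR mouse encoding off
  "\x1b[?1000l", // mouse button reporting off
  "\x1b[?25h",   // make the cursor visible again
  "\x1b[?1049l", // leave the alternate screen
].join("");

// Minimal stand-in for Node's `process`, so the sketch is testable.
interface ProcessLike {
  on(event: string, handler: () => void): unknown;
}

function installCleanup(proc: ProcessLike, write: (data: string) => void): () => void {
  let done = false;
  const cleanup = (): void => {
    if (done) return; // idempotent: several paths may fire during one shutdown
    done = true;
    write(RESTORE_TERMINAL);
  };
  for (const event of ["SIGINT", "SIGTERM", "SIGHUP", "uncaughtException", "exit"]) {
    proc.on(event, cleanup);
  }
  return cleanup;
}
```

In a real program the signal handlers would also need to terminate the process after restoring the terminal. The point here is only that every path funnels into the same idempotent function.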
Of the ~38 working days budgeted for sumo-tui, roughly half is spent on robustness — signal handlers, escape sequence cleanup, mouse SGR parsing, kitty keyboard handshakes, paste filtering, cursor visibility forcing. The "fun" rendering work is the easy part. Making it never break your shell when it crashes is the hard part.
Here's a tiny sample of what an ANSI-encoded footer row actually looks like, expanded so you can see the structure:
```
// What you see on screen:
//   ~/sumocode (main) · ↑12k ↓8k · $0.42 · ● READY · sonnet-4.5

\x1b[H                    // move cursor home
\x1b[38;2;245;230;200m    // fg = #F5E6C8 (parchment)
\x1b[48;2;26;21;17m       // bg = #1A1511 (cathedral bg)
~/sumocode \x1b[2m(main)\x1b[22m · ↑12k ↓8k · $0.42 · \x1b[38;2;127;176;105m●\x1b[0m READY · sonnet-4.5
\x1b[K                    // clear to end of line
```
From your fingertips down to the silicon. Read top-to-bottom.
Splash, footer, sidebar, top chrome, working indicator, themes. The pieces a user sees and forms an opinion about.
The Pi extension entry point and its install* functions: question tool, answer wizard, approval modal, native task, command palette, slash commands, memory editor.
Yoga layout tree, CellBuffer compositor, frame diff, ANSI writer, mouse SGR parser, key router, frame scheduler, ChatPager scroll widget, modal layer, owned-shell renderer.
dist/main.js swaps Pi's InteractiveMode for SumoInteractiveMode when SUMO_TUI=1. Loaded via jiti at the boundary.
LLM provider abstraction, agent loop, tool execution (bash/read/write/edit/mcp), session management, MCP server gateway, skills system, extension API.
The terminal emulator (Ghostty, iTerm2, Alacritty), Node.js runtime (≥20), and the OS (macOS for v1). Altscreen, ANSI parser, raw stdin/stdout, signal delivery, fork+exec for git/bash.
Each layer only knows about the layer immediately below. The Cathedral layer doesn't know how Yoga works; it just registers components with ctx.ui.setFooter(...). SumoTUI doesn't know how Pi's tools work; it just renders ChatBlock objects from a transcript view-model. This separation is what makes the codebase scalable despite spanning 96 modules.
Six stages from a state mutation to bytes on the wire.
SumoTUI is a retained renderer. That word matters: the alternative ("immediate mode") is what Pi originally used — every render rebuilds the entire frame from scratch, every time. Retained means we keep a tree of nodes around between frames and only re-layout / re-composite the parts that change.
Here's what happens when, say, the working indicator ticks one frame forward:
requestRender(). The frame scheduler enqueues a dirty token.
root.calculateLayout(width, height, LTR) resolves flex sizes for the entire tree.
composite() walks the tree depth-first, painting each node's cells into a fresh CellBuffer.
diffFrames(prev, next) finds changed cell ranges row-by-row. Stable cells produce zero output.
Every visible character on screen is one Cell. A cell holds:
```typescript
interface Cell {
  char: string;        // "A", "日" (2-wide), or "" (continuation)
  fg: string | null;   // "#F5E6C8"
  bg: string | null;   // "#1A1511"
  attrs: {
    bold: boolean;
    italic: boolean;
    underline: boolean;
    dim: boolean;
    inverse: boolean;
  }
}

// CellBuffer is rows × cols of these
class CellBuffer {
  private chars: Uint16Array;          // hot path uses typed arrays
  private fg: Map<number, string>;     // sparse: most cells share fg/bg
  private bg: Map<number, string>;
  private attrs: Map<number, number>;  // packed bitfield
}
```
The optimization that matters: most cells in a frame are blank or share the same style as their neighbors. Storing fg/bg/attrs in sparse Maps instead of dense arrays cuts memory by 90% in typical frames. The diff algorithm then walks rows in linear time, finding the leftmost and rightmost differing column per row, and emits the smallest possible ANSI patch.
The per-row column-range diff is borrowed from OpenTUI's renderer.zig (lines 1331-1349). On a streaming chat update where only the bottom row changes, this saves 50-90% of bytes per frame compared to full-row repaints. Cursor blinks no longer cost a screen-wide repaint.
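The per-row column-range diff can be sketched as below. This is a simplified model over plain cell arrays rather than the sparse `CellBuffer`, it assumes both frames have identical dimensions, and the names `StyledCell`, `RowPatch`, and `diffRows` are illustrative rather than taken from the codebase.

```typescript
type StyledCell = { char: string; fg: string | null; bg: string | null };
type RowPatch = { row: number; startCol: number; endCol: number }; // endCol is inclusive

function sameCell(a: StyledCell, b: StyledCell): boolean {
  return a.char === b.char && a.fg === b.fg && a.bg === b.bg;
}

// For each row, find the leftmost and rightmost differing column.
function diffRows(prev: StyledCell[][], next: StyledCell[][]): RowPatch[] {
  const patches: RowPatch[] = [];
  for (let row = 0; row < next.length; row++) {
    const before = prev[row];
    const after = next[row];
    let left = 0;
    while (left < after.length && sameCell(before[left], after[left])) left++;
    if (left === after.length) continue; // stable row: zero output
    let right = after.length - 1;
    while (right > left && sameCell(before[right], after[right])) right--;
    patches.push({ row, startCol: left, endCol: right });
  }
  return patches;
}

// Helper for the example: one unstyled cell per character.
const cellsOf = (text: string): StyledCell[] =>
  [...text].map((char) => ({ char, fg: null, bg: null }));

const rowPatches = diffRows(
  [cellsOf("idle"), cellsOf("abcd")],
  [cellsOf("idle"), cellsOf("aXYd")],
);
```

Here the first row is untouched and produces nothing, while the second yields a single patch covering columns 1 through 2, which is all the writer would need to repaint.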
Yoga is Facebook's flexbox engine — the same one that powers React Native, Litho, and ComponentKit. SumoCode uses yoga-wasm-web: 87KB of WASM, no native build step, identical layout semantics to web flex. The retained tree isn't custom; it's CSS flex with terminal-cell units.
This means the splash screen's "vertical center" isn't padding math. It's:
```
Root(flexDirection: column, flexGrow: 1)
├─ TopSpacer(flexGrow: 1)      // fills available space
├─ Splash(flexShrink: 0)       // fixed: cat + wordmark + quote
└─ BottomSpacer(flexGrow: 1)   // fills available space

// Yoga splits free rows 50/50 between the spacers.
// Resize the terminal? Layout recomputes for free.
```
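The arithmetic behind that split is the ordinary flex-grow rule: free space is shared in proportion to each child's grow factor. The sketch below models only that rule. It is not Yoga's API, it ignores shrinking, min/max constraints, and rounding to whole cells, and the splash height of 10 rows is a made-up figure.

```typescript
type FlexChild = { fixedRows: number; flexGrow: number };

// Distribute leftover rows among children in proportion to flexGrow.
function distributeRows(totalRows: number, children: FlexChild[]): number[] {
  const fixed = children.reduce((sum, child) => sum + child.fixedRows, 0);
  const free = Math.max(0, totalRows - fixed);
  const totalGrow = children.reduce((sum, child) => sum + child.flexGrow, 0);
  return children.map((child) =>
    child.fixedRows + (totalGrow > 0 ? (free * child.flexGrow) / totalGrow : 0),
  );
}

// A 24-row terminal with a hypothetical 10-row splash between two spacers.
const splashLayout = distributeRows(24, [
  { fixedRows: 0, flexGrow: 1 },  // TopSpacer
  { fixedRows: 10, flexGrow: 0 }, // Splash
  { fixedRows: 0, flexGrow: 1 },  // BottomSpacer
]);
```

With 14 free rows and equal grow factors, each spacer receives 7 rows, so the splash sits centered without any padding math.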
Following one process from $ sumocode to a rendered cathedral.
Six phases, ~700-1100ms cold start. Each phase has its own actor:
The jiti transpile step (~300ms in the cold path) is the single largest cost. It exists because sumo-interactive-mode.js bridges Pi's CommonJS-loaded patch into our TypeScript source on the fly. Pre-compiling that entry point into a real JS bundle would cut cold start by half — already filed as a P0 in the perf audit.
Phases in human terms:
dist/main.js for the patch marker, sets SUMO_TUI_MODULE to a file:// URL pointing at our bridge, then execs Pi.
main.js reaches the constructor site, sees SUMO_TUI=1, dynamically imports our bridge, and instantiates SumoInteractiveMode instead of InteractiveMode.
src/extension.ts, which calls 14 install* functions. Each registers handlers but defers all DOM-equivalent work to session_start.
session_start; the cascade of handlers populates widgets; Yoga lays out; the compositor produces the first CellBuffer; diff produces patches; ANSI hits stdout; user sees the cat.
The interesting part: how Pi's built-in tools coexist with SumoCode's overrides.
Pi exposes its tool system through three different surfaces, and SumoCode has to integrate with each one differently. This is the table that took me longest to internalize:
| Tool layer | Examples | How SumoCode interacts |
|---|---|---|
| Pi built-ins | bash · read · write · edit · mcp | Never re-register. Intercept via pi.on("tool_call") for approval gating; render results via the transcript view-model pipeline. |
| Pi example exts | question | Override. Register a tool with the same name; SumoCode's wins. Our question tool maps to the Divine Query overlay. |
| SumoCode-only | task · /answer | Register fresh. Native task tool spawns Pi sub-processes for parallel work. /answer is a wizard for multi-question flows. |
| Pi internal UI | showExtensionSelector · showExtensionConfirm | Cannot intercept without upstream Pi changes. SumoCode-owned code calls our themed overlays directly instead. |
Every chat message — user, assistant, tool, skill, delegation — flows through one shared view-model before any rendering happens:
```typescript
type ChatBlock =
  | { type: "markdown"; text: string }
  | { type: "code"; lang: string; source: string }
  | { type: "tool"; tool: ToolCallViewModel }
  | { type: "skill"; name: string; expanded: boolean }
  | { type: "question"; question: QuestionViewModel }
  | { type: "delegation"; delegation: DelegationViewModel };

type ChatMessageViewModel = {
  id: string;
  role: "user" | "sumo" | "system";
  blocks: ChatBlock[];
};
```
This abstraction is the lever that makes everything downstream possible. The visual harness can build deterministic transcripts without running an LLM. The chat renderer can switch on block type without parsing strings. New block types (like delegation pills for sub-process scrolls) ship as additions to the union — no renderer changes needed elsewhere.
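Switching on block type could look like the sketch below. It uses a trimmed-down union (`SketchBlock`) containing only the three variants whose fields are fully spelled out above. The output format and the `renderBlock` name are invented for illustration.

```typescript
// Subset of the ChatBlock union; the other variants depend on view-model
// types that are not shown in this walk-through.
type SketchBlock =
  | { type: "markdown"; text: string }
  | { type: "code"; lang: string; source: string }
  | { type: "skill"; name: string; expanded: boolean };

function renderBlock(block: SketchBlock): string[] {
  switch (block.type) {
    case "markdown":
      return block.text.split("\n");
    case "code":
      return [`[${block.lang}]`, ...block.source.split("\n").map((line) => `  ${line}`)];
    case "skill":
      return [`${block.expanded ? "▾" : "▸"} ${block.name}`];
    default: {
      // Adding a variant to the union without a case fails to compile here.
      const unreachable: never = block;
      throw new Error(`unhandled block: ${JSON.stringify(unreachable)}`);
    }
  }
}

const codeLines = renderBlock({ type: "code", lang: "ts", source: "a\nb" });
```

The `never` assignment in the default branch is what turns "we forgot to render the new block type" from a runtime surprise into a compile error.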
The extension entry point is intentionally boring — it's just an ordered list of installations. Order matters: render diagnostics installs first so it can wrap every later setFooter/setWidget call; session cache installs second so its invalidation runs alongside producer updates.
```typescript
export default function sumocode(pi: ExtensionAPI): void {
  installRenderDiagnostics(pi);   // 01: wrap UI calls
  installSessionCache(pi);        // 02: cache token tally + git branch
  installAltscreen(pi);           // 03: lifecycle + signal cleanup
  installTopChrome(pi);           // 04: top header bar
  installSplash(pi);              // 05: cathedral splash widget
  installFooter(pi);              // 06: status footer
  installCathedralEditor(pi);     // 07: input frame chrome
  installInputHints(pi);          // 08: keybind hint row
  installApprovalGate(pi);        // 09: dangerous bash guard
  taskTool({...})(pi);            // 10: native parallel task
  installQuestionTool(pi);        // 11: divine query override
  installAnswerTool(pi);          // 12: /answer wizard
  installWorkingIndicator(pi);    // 13: theme-aware spinner
  installSumoInteractions(pi);    // 14: slash commands + shortcuts
}
```
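Why render diagnostics has to go first can be illustrated with a small wrapper. The `UiSurface` shape below is a guess at what `setFooter`/`setWidget` might look like, not Pi's real extension API. The point is only that a wrapper sees calls made through it, so it must be in place before any producer grabs a reference.

```typescript
// Hypothetical UI surface; the real signatures in Pi may differ.
type UiSurface = {
  setFooter(content: string): void;
  setWidget(id: string, content: string): void;
};

// Returns a surface that counts every call before forwarding it.
function wrapWithDiagnostics(ui: UiSurface, counts: Map<string, number>): UiSurface {
  const bump = (name: string): void => {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  };
  return {
    setFooter(content) {
      bump("setFooter");
      ui.setFooter(content);
    },
    setWidget(id, content) {
      bump("setWidget");
      ui.setWidget(id, content);
    },
  };
}
```

Any installer that captured the unwrapped surface earlier would bypass the counters, which is the ordering hazard the numbered list avoids.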
The smallest possible patch on the smallest possible surface, treated as a maintenance contract.
Pi is a public, maintained npm package. SumoCode is a Pi extension — but extensions only get to register components, not replace Pi's interactive constructor. To own the alternate screen lifecycle, mouse routing, scroll, and modal layers, we need to be the InteractiveMode that Pi instantiates.
The fix is a 12-line patch on Pi's dist/main.js that swaps the constructor when an env var is set:
```diff
-const interactiveMode = new InteractiveMode(runtime, options);
+const useSumoTui = isTruthy(process.env.SUMO_TUI) || parsed.unknownFlags.has("sumo-tui");
+const interactiveMode = useSumoTui
+  ? await loadSumoInteractiveMode(runtime, options)
+  : new InteractiveMode(runtime, options);

 async function loadSumoInteractiveMode(...args) {
   const spec = process.env.SUMO_TUI_MODULE ?? "@dhruvkelawala/sumocode/sumo-interactive-mode";
   const { SumoInteractiveMode } = await import(spec);
   return new SumoInteractiveMode(...args);
 }
```
The contract is explicit:
Without SUMO_TUI=1, Pi behaves exactly like vanilla Pi. Other Pi users are unaffected.
--no-sumo-tui.
sumocode shell launcher inspects the patch marker before activating. Patch missing? Falls back to legacy Pi UI with a warning.
docs/research/pi-fork-upgrade.md).
This isn't ideology; it's pragmatism. opentui-island's sidecar architecture would have cost ~500ms cold start and ~400MB RSS for our four chrome regions. Forking Pi entirely would have meant re-implementing the LLM/agent/tool/MCP surface. The 12-line patch is the smallest mutation that gets us where we need to be.
Color as information density. Color as identity.
Preattentive processing is the visual-perception term for "things you notice before you decide to look." Cone-density-aware research shows you can disambiguate ~5 hues simultaneously in your peripheral vision. SumoCode picks five and assigns one agent state to each:
Idle. Awaiting input. Sage green — calm, low-chroma, doesn't pull focus.
LLM is generating. Warm gold — active, inviting, doesn't read as alarm.
A tool is running. Mid-saturation blue — distinct from gold/green at a glance.
Approval needed. Crimson — the only desaturated red on the surface; hijacks attention.
Writing to memory. Soft purple — rare, signals "long-term effect on the agent."
The dot lives in the footer's right zone. The state name (uppercase, Cathedral verb) appears next to it. Both are theme-driven — switching to Obsidian Temple swaps colors but keeps the semantics identical.
The default theme is named for its visual reference: a 19th-century scriptorium. Warm walnut surfaces, parchment foreground, burnt-orange accent. Every color is a typed token in src/themes/cathedral.ts:
```typescript
export const CATHEDRAL_THEME: Theme = {
  name: "cathedral",
  tokens: {
    colors: {
      background: "#1A1511",     // walnut deep
      surface: "#241D17",        // walnut mid (sidebar bg)
      foreground: "#F5E6C8",     // parchment
      foregroundDim: "#8B7A63",  // muted brown for dim text
      accent: "#D97706",         // burnt orange, the single accent
      states: {
        idle: "#7FB069",
        thinking: "#E8B339",
        tool: "#5B9BD5",
        approval: "#C1443E",
        learning: "#8E7AB5"
      },
    }
  },
  workingIndicator: {
    frames: ["◌", "✦", "❖", "✺", "❋", "❉"],
    intervalMs: 150
  },
  chrome: { ...DEFAULT_CHROME }, // box-drawing glyphs
};
```
The chrome object holds the structural vocabulary — frame corners, dividers, bullets, section glyphs. Themes can override these to feel completely different even with similar colors. Obsidian Temple uses the same five state hues but at higher saturation against a near-black surface and adds neon glow effects via terminal-supported underlines.
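As a minimal sketch of how typed state tokens could become the footer dot, the snippet below maps each agent state to its Cathedral color and emits the dot followed by a label. `AgentState` and `renderStatusDot` are assumed names. Only the hex values come from the theme above.

```typescript
type AgentState = "idle" | "thinking" | "tool" | "approval" | "learning";

// Record<AgentState, string> makes a missing state a compile-time error.
const CATHEDRAL_STATES: Record<AgentState, string> = {
  idle: "#7FB069",
  thinking: "#E8B339",
  tool: "#5B9BD5",
  approval: "#C1443E",
  learning: "#8E7AB5",
};

function renderStatusDot(state: AgentState, label: string): string {
  const hex = CATHEDRAL_STATES[state];
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  // Color only the dot, then reset so the label uses the surrounding style.
  return `\x1b[38;2;${r};${g};${b}m●\x1b[0m ${label}`;
}

const readyDot = renderStatusDot("idle", "READY");
```

A second theme would supply a different `Record<AgentState, string>` with the same keys, which is how colors can change while the semantics stay fixed.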
Switching themes is Ctrl+Shift+T. The runtime emits a theme_changed event; every retained component clears its frame cache; the next render produces fresh ANSI for the new palette. Zero flicker, zero re-layout.
How we test "the cathedral renders pixel-perfectly" without humans staring at terminals.
This is my favorite piece. Visual regressions in TUIs are notoriously hard to catch — a single off-by-one column or stale ANSI reset can make a perfectly correct algorithm produce a broken-looking screen. SumoCode runs three convergent verification lanes:
Deterministic fixtures → ANSI. Each component (footer, sidebar row, tool pill, code block) renders in isolation against a known input. Tests assert exact ANSI output.
A whole TranscriptViewModel fixture renders the full scene (top chrome + chat + footer + sidebar). No live Pi needed. Used for completed-response and tool/overlay states.
./bin/sumocode.sh launches under node-pty with a fixed terminal size. Real end-to-end. Captures actual ANSI output to compare against.
All three converge into a shared verification pipeline:
The non-obvious decision: the styled-cell diff is the primary CI gate, not the PNG diff. Pixel-level PNG comparison is flaky (font rendering, sub-pixel anti-aliasing, OS color profiles). Comparing per-cell {char, fg, bg, bold, dim} against a parsed Bible HTML reference is deterministic across machines.
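A styled-cell comparison of this kind might be sketched as follows. It assumes both the reference and the captured frame have already been parsed into grids of `{char, fg, bg, bold, dim}`, and it leaves out the parsing itself (Bible HTML on one side, ANSI on the other). `RefCell` and `diffStyledCells` are hypothetical names.

```typescript
type RefCell = { char: string; fg: string; bg: string; bold: boolean; dim: boolean };
const CELL_FIELDS = ["char", "fg", "bg", "bold", "dim"] as const;
type CellMismatch = { row: number; col: number; field: (typeof CELL_FIELDS)[number] };

// Report every field of every cell where the capture deviates from the reference.
function diffStyledCells(expected: RefCell[][], actual: RefCell[][]): CellMismatch[] {
  const mismatches: CellMismatch[] = [];
  expected.forEach((expectedRow, row) => {
    expectedRow.forEach((want, col) => {
      const got = actual[row]?.[col];
      for (const field of CELL_FIELDS) {
        // A missing cell counts as a mismatch on every field.
        if (got === undefined || got[field] !== want[field]) {
          mismatches.push({ row, col, field });
        }
      }
    });
  });
  return mismatches;
}

const referenceDot: RefCell = { char: "●", fg: "#7FB069", bg: "#1A1511", bold: false, dim: false };
const cellMismatches = diffStyledCells([[referenceDot]], [[{ ...referenceDot, fg: "#E8B339" }]]);
```

Because the comparison is on exact strings and booleans, the same inputs give the same verdict on any machine, which is the property PNG diffs lack.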
The geometry audit lane is unique to this codebase. Each row in the captured frame gets classified — top-bar, chat-frame-top, hint-row, footer, blank — and the column bounds checked against a declared geometrySpec. This catches structural drift (sidebar starting one column too late, hint row missing) that no per-cell diff would flag.
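The column-bounds check can be sketched in a reduced form. Here the row classification is taken as given (the spec lists the expected kind per row index), content bounds are measured as first and last non-space column, and every character is assumed to be one cell wide. `RowSpec`, `contentBounds`, and `auditGeometry` are illustrative names, and the real `geometrySpec` format is not shown in this walk-through.

```typescript
type Bounds = { startCol: number; endCol: number };
type RowSpec = { kind: string; bounds: Bounds | null }; // null means the row must be blank

function contentBounds(line: string): Bounds | null {
  const startCol = line.search(/\S/);
  if (startCol === -1) return null; // nothing but whitespace
  return { startCol, endCol: line.trimEnd().length - 1 };
}

function auditGeometry(lines: string[], spec: RowSpec[]): string[] {
  const problems: string[] = [];
  spec.forEach((expected, row) => {
    const actual = contentBounds(lines[row] ?? "");
    const matches =
      (actual === null && expected.bounds === null) ||
      (actual !== null &&
        expected.bounds !== null &&
        actual.startCol === expected.bounds.startCol &&
        actual.endCol === expected.bounds.endCol);
    if (!matches) problems.push(`row ${row} (${expected.kind}): bounds drifted`);
  });
  return problems;
}

const sketchSpec: RowSpec[] = [
  { kind: "top-bar", bounds: { startCol: 0, endCol: 6 } },
  { kind: "blank", bounds: null },
  { kind: "footer", bounds: { startCol: 2, endCol: 7 } },
];
const geometryProblems = auditGeometry(["top bar ", "        ", "   footer"], sketchSpec);
```

In this example the footer starts one column too late, so the audit reports row 2 even though every individual character is correct.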
The Bible HTML files at docs/ui/bible/*.html are the canonical visual reference — hand-built mockups exported from Stitch, then promoted to source-of-truth. PNG renders of those Bible files exist as review evidence, not gates.
96 modules, 30k lines of TypeScript, three external dependencies.
What I'd lead with, what I'd not lead with, and a tweet thread structure.
src/sumo-tui/. Yoga + CellBuffer + frame diff + altscreen + mouse SGR.
Ctrl+Shift+T.
@mariozechner/pi-mono.
sumocode (this repo) = UI, MIT, public
sumocode-config (private) = persona, memory, settings, MCP
git pull moves my identity between machines. no tooling, no secrets in the public repo.
pi install git:github.com/dhruvkelawala/sumocode
SumoCode is shadcn/ui for terminal AI agents — built on @mariozechner/pi-mono, with a retained renderer that owns the alternate screen and treats every state as a typed token.
Install in any Pi-enabled terminal. Defaults to the Cathedral theme. Ctrl+Shift+T cycles. Ctrl+/ opens the command palette.