Chapter 5: File Search

Searching for files and code is one of the most frequent operations a coding agent performs. How agents implement search reveals fundamental design choices about tool philosophy, dependency management, and trust models.

The Core Question

Should the agent provide a built-in search tool with controlled input/output, or let the model run shell commands directly?

| Approach | Agent | Pros | Cons |
| --- | --- | --- | --- |
| Shell-delegated | Codex | Flexible; the model controls everything | Depends on system tools; raw output |
| Built-in tool | Claude Code | Portable, token-optimized, permissioned | Fixed interface; less flexible |

Codex: Shell-Delegated Search

Codex has no built-in grep or glob tool exposed to the model. Search is done through shell commands.

How It Works

The system prompt (default.md, line 264) instructs:

"When searching for text or files, prefer using rg or rg --files respectively because rg is much faster than alternatives like grep. (If the rg command is not found, then use alternatives.)"

The model then generates shell commands like:

```shell
rg "pattern" --type py           # content search
rg --files | grep "pattern"      # file search
find . -name "*.ts"              # fallback if rg not found
grep -r "pattern" src/           # another fallback
```

These run through the shell tool → kernel sandbox → system-installed rg.
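To make the data flow concrete, here is a minimal Python sketch of shell-delegated search seen from the harness side. `run_shell_tool()` is a hypothetical stand-in for Codex's shell tool (the real tool is written in Rust and applies the kernel sandbox, which this sketch omits). Whatever the command prints, errors included, goes back to the model unmodified.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ShellResult:
    exit_code: int
    stdout: str
    stderr: str


def run_shell_tool(command: str, cwd: str = ".", timeout: float = 30.0) -> ShellResult:
    """Hypothetical stand-in for an agent shell tool: run the model's command verbatim.

    There is no sandbox here; a real agent would confine this process at the OS level.
    """
    proc = subprocess.run(
        ["/bin/sh", "-c", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Raw streams go back to the model unchanged: no path rewriting, no line cap.
    return ShellResult(proc.returncode, proc.stdout, proc.stderr)
```

Calling `run_shell_tool('rg "pattern" --type py')` on a machine without ripgrep returns exit status 127 (the POSIX shell's "command not found" code) with the error text in stderr. That is all the model has to go on when it decides to fall back to grep or find.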

What Codex Does Have

  1. list_dir tool: directory listing with pagination (offset, limit, depth) that returns a formatted file tree. It is not pattern-based, just browsing.

  2. codex-file-search crate, a fuzzy file finder for the TUI file picker:

    • Uses nucleo (fzf-like fuzzy matcher from the Helix editor)
    • Multi-threaded directory walking via the ignore crate
    • Streaming: walk → inject into matcher → report results
    • Not exposed as a model tool — only used by the TUI
  3. Abandoned grep_files tool: a test file (grep_files_tests.rs) exists, with a run_rg_search() function and rg_available() checks, but there is no implementation file. This looks like a planned built-in grep tool that either never shipped or was later removed.
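The fuzzy finder's walk → inject → report pipeline can be sketched in a few lines of Python. This is a simplified stand-in, not the crate's code: a naive in-order subsequence scorer replaces nucleo's matching algorithm, and skipping dot-directories replaces the ignore crate's .gitignore handling. Paths stream from a generator into the matcher, and a bounded heap keeps only the best candidates, so memory for results stays bounded however large the tree is. All function names are hypothetical.

```python
import heapq
import os
from typing import Iterator, List, Optional, Tuple


def walk_files(root: str) -> Iterator[str]:
    """Yield relative file paths lazily, skipping hidden directories such as .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def fuzzy_score(query: str, candidate: str) -> Optional[int]:
    """Score candidate if every query char appears in it in order, else None.

    Consecutive matches earn a bonus. This is a toy heuristic, not nucleo's algorithm.
    """
    q, c = query.lower(), candidate.lower()
    score, pos, prev = 0, 0, -2
    for ch in q:
        idx = c.find(ch, pos)
        if idx < 0:
            return None
        score += 10 if idx == prev + 1 else 1
        prev, pos = idx, idx + 1
    # Prefer shorter paths when match scores tie.
    return score * 1000 - len(candidate)


def fuzzy_find(query: str, root: str, limit: int = 10) -> List[str]:
    """Stream paths from the walker into the matcher, keeping only the top `limit`."""
    best: List[Tuple[int, str]] = []  # min-heap of (score, path)
    for path in walk_files(root):
        s = fuzzy_score(query, path)
        if s is None:
            continue
        if len(best) < limit:
            heapq.heappush(best, (s, path))
        elif s > best[0][0]:
            heapq.heapreplace(best, (s, path))
    # Report best-first, breaking score ties alphabetically.
    return [p for _, p in sorted(best, key=lambda t: (-t[0], t[1]))]
```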

Ripgrep Dependency

Codex does not bundle ripgrep. It depends entirely on the system having rg installed. The ignore crate (v0.4.23) — the same Rust library that ripgrep is built on — is used for the TUI file picker's directory walking, but that's a library dependency, not the rg binary.

If rg is not installed:

  1. The model runs rg → gets "command not found" error
  2. The prompt says "use alternatives" → model falls back to grep or find
  3. No pre-flight check — Codex doesn't verify rg exists before the model tries it
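A pre-flight check of the kind Codex skips takes only a few lines. The sketch below is hypothetical, not Codex code: it resolves rg once with Python's `shutil.which` and, if rg is absent, picks a grep fallback up front so the model never spends a turn on "command not found". The `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, List, Optional


def content_search_argv(
    pattern: str,
    path: str = ".",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build a content-search command, preferring rg and falling back to grep."""
    if which("rg"):
        # rg recurses and honors .gitignore by default; -n forces line numbers.
        return ["rg", "-n", "--", pattern, path]
    if which("grep"):
        # -r: recurse, -n: line numbers, -E: extended regex (closer to rg's syntax).
        return ["grep", "-rnE", "--", pattern, path]
    raise FileNotFoundError("neither rg nor grep is on PATH")
```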

Code Location

Claude Code: Built-In Search Tools

Claude Code wraps ripgrep into two purpose-built tools with rich parameters and token-optimized output.

GlobTool — File Name Search

Finds files by pattern using rg --files --glob.

Parameters:

Output:

```json
{
  "filenames": ["src/App.tsx", "src/index.tsx", ...],
  "numFiles": 42,
  "truncated": false
}
```
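As a rough sketch of how a GlobTool-style wrapper could produce that JSON, the Python below runs `rg --files --glob`, makes paths relative to the working directory, and caps the list. The function names, the `limit` default of 100, and the reading of `numFiles` as "number of names returned" are assumptions for illustration; this section does not show Claude Code's actual parameters or cap.

```python
import os
import subprocess
from typing import Dict, List, Union

GlobResult = Dict[str, Union[List[str], int, bool]]


def shape_glob_output(paths: List[str], cwd: str, limit: int = 100) -> GlobResult:
    """Convert raw `rg --files` lines into the compact JSON shape shown above."""
    # os.path.join leaves absolute paths untouched, so both forms end up relative to cwd.
    rel = [os.path.relpath(os.path.join(cwd, p), cwd) for p in paths if p]
    kept = rel[:limit]
    return {"filenames": kept, "numFiles": len(kept), "truncated": len(rel) > limit}


def glob_tool(pattern: str, cwd: str, limit: int = 100) -> GlobResult:
    """Hypothetical GlobTool: list files under cwd matching a glob, via ripgrep."""
    proc = subprocess.run(
        ["rg", "--files", "--glob", pattern],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    # rg uses exit status 1 for "no results"; treat that as an empty list, not an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return shape_glob_output(proc.stdout.splitlines(), cwd, limit)
```

The rg call itself prints one path per line; the wrapper is where relative paths and an explicit `truncated` flag come in, so the model knows when to narrow its pattern.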

GrepTool — Content Search

Full ripgrep wrapper with three output modes.

Parameters:

Output (content mode):

```text
src/App.tsx:14: const [x, setX] = useState(0)
src/App.tsx:28: const [y, setY] = useState("")
```

Output (files_with_matches mode):

```json
{
  "filenames": ["src/App.tsx", "src/utils.ts"],
  "numFiles": 2
}
```
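A GrepTool-style wrapper mostly comes down to mapping an output mode onto rg flags and post-processing stdout. The sketch below covers only the two modes shown above; the flag mapping, the function names, and the `max_lines` default of 200 are assumptions for illustration rather than Claude Code's actual values.

```python
import subprocess
from typing import Dict, List, Optional, Union

# Mode names come from the text; the mapping onto rg flags is an assumption.
MODE_FLAGS: Dict[str, List[str]] = {
    "content": ["--line-number", "--no-heading"],
    "files_with_matches": ["--files-with-matches"],
}


def build_grep_argv(pattern: str, mode: str, path: Optional[str] = None) -> List[str]:
    """Build an rg command line for one output mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unsupported mode: {mode}")
    # "--" stops a pattern such as "-x" from being parsed as an rg flag.
    argv = ["rg", *MODE_FLAGS[mode], "--", pattern]
    # With no path, rg searches the cwd and prints paths without a "./" prefix.
    return argv + [path] if path else argv


def shape_grep_output(
    mode: str, stdout: str, max_lines: int = 200
) -> Union[str, Dict[str, Union[List[str], int]]]:
    """Post-process rg stdout into the shapes shown above."""
    lines = [ln for ln in stdout.splitlines() if ln]
    if mode == "files_with_matches":
        return {"filenames": lines, "numFiles": len(lines)}
    # content mode: rg already prints "path:line:text"; cap the line count so a
    # broad pattern cannot flood the model's context.
    return "\n".join(lines[:max_lines])


def grep_tool(pattern: str, mode: str = "files_with_matches", path: Optional[str] = None):
    """Hypothetical GrepTool entry point: run rg and shape its output."""
    proc = subprocess.run(build_grep_argv(pattern, mode, path), capture_output=True, text=True)
    # Exit status 1 means "no matches", which is a valid, empty result.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return shape_grep_output(mode, proc.stdout)
```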

Ripgrep Binary Management

Claude Code ships its own ripgrep in three tiers (tried in order):

  1. Embedded — In official Bun builds, ripgrep is compiled into the Bun binary. Claude Code spawns itself with argv0='rg' to dispatch to the embedded binary.

  2. Vendored — Pre-compiled binaries at vendor/ripgrep/{arch}-{platform}/rg (e.g., aarch64-darwin/rg). Includes macOS codesigning handling.

  3. System — User's installed rg. Only used if USE_BUILTIN_RIPGREP=false.
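The three-tier lookup can be written as a small resolution function. In the sketch below, the tiers, the vendor/ripgrep/{arch}-{platform}/rg layout, the argv0='rg' trick, and the USE_BUILTIN_RIPGREP variable come from the text. The function names, the `embedded_available` flag, and the choice to fail when no bundled binary exists are hypothetical. Because the system tier is opt-in, the sketch checks the environment variable first. Python's `subprocess` `executable=` argument runs one binary while passing a different argv[0].

```python
import os
import subprocess
from typing import Callable, List, Mapping, Optional, Tuple

# (tier, executable to run or None for a PATH lookup, argv[0] to pass)
Resolution = Tuple[str, Optional[str], str]


def vendored_rg_path(package_root: str, arch: str, plat: str) -> str:
    """Layout from the text: vendor/ripgrep/{arch}-{platform}/rg."""
    return os.path.join(package_root, "vendor", "ripgrep", f"{arch}-{plat}", "rg")


def resolve_ripgrep(
    env: Mapping[str, str],
    embedded_available: bool,
    self_executable: str,
    vendored_path: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Resolution:
    if env.get("USE_BUILTIN_RIPGREP", "").lower() == "false":
        # System tier: opt-in only; "rg" is resolved on PATH at spawn time.
        return ("system", None, "rg")
    if embedded_available:
        # Embedded tier: re-run our own binary with argv[0] = "rg" so it
        # dispatches to the ripgrep compiled into it.
        return ("embedded", self_executable, "rg")
    if exists(vendored_path):
        return ("vendored", vendored_path, "rg")
    # Hypothetical: this sketch gives up rather than guessing a further fallback.
    raise FileNotFoundError("no bundled ripgrep available")


def spawn_rg(resolution: Resolution, args: List[str]) -> subprocess.CompletedProcess:
    _tier, executable, argv0 = resolution
    # executable= picks the binary to run; the list's first element becomes argv[0].
    return subprocess.run([argv0, *args], executable=executable, capture_output=True, text=True)
```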

Error Handling

System Prompt Enforcement

```text
ALWAYS use Grep for search tasks. NEVER invoke `grep` or `rg` as a Bash command.
The Grep tool has been optimized for correct permissions and access.
```
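Part of what "correct permissions" buys is a per-call check before rg ever runs; the comparison table at the end of this chapter calls it checkReadPermission(). The Python below is a hypothetical stand-in, not Claude Code's logic: it resolves symlinks and `..` segments, then verifies that the target sits inside one of the allowed roots. A raw `rg` in Bash bypasses any such check.

```python
import os
from typing import Iterable


def check_read_permission(path: str, allowed_roots: Iterable[str]) -> bool:
    """Hypothetical per-call check: is the resolved path inside an allowed root?"""
    # realpath collapses ".." segments and follows symlinks before comparing.
    target = os.path.realpath(path)
    for root in allowed_roots:
        root_real = os.path.realpath(root)
        try:
            # commonpath avoids the prefix bug where "/proj" would match "/project".
            if os.path.commonpath([target, root_real]) == root_real:
                return True
        except ValueError:
            # Raised e.g. for paths on different Windows drives; treat as outside.
            continue
    return False
```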

Code Location

Indexing

Neither agent builds a project index. Both search on-the-fly every time via fresh filesystem traversal.

| | Codex | Claude Code |
| --- | --- | --- |
| File index at startup? | No | No |
| AST / symbol index? | No | No (delegates to LSP servers lazily) |
| Search database? | No | No |
| In-memory index? | TUI file picker only (nucleo, ephemeral) | File-read cache (max 1000 entries, mtime-invalidated) |
| Project scanning at startup? | Nothing | File count only (for telemetry, rounded to a power of 10) |
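The Claude Code column mentions a file-read cache capped at 1000 entries and invalidated by mtime. Below is a minimal sketch of that idea. It assumes least-recently-used eviction (the text does not state the eviction policy) and keys entries by resolved path; the class name is hypothetical.

```python
import os
from collections import OrderedDict
from typing import Tuple


class FileReadCache:
    """LRU cache of file contents, invalidated when a file's mtime changes."""

    def __init__(self, max_entries: int = 1000) -> None:
        self.max_entries = max_entries
        # path -> (mtime in nanoseconds, file contents)
        self._entries: "OrderedDict[str, Tuple[int, str]]" = OrderedDict()

    def read(self, path: str) -> str:
        key = os.path.realpath(path)
        mtime = os.stat(key).st_mtime_ns
        hit = self._entries.get(key)
        if hit is not None and hit[0] == mtime:
            self._entries.move_to_end(key)  # mark as most recently used
            return hit[1]
        # Miss or stale entry: re-read from disk and record the current mtime.
        with open(key, encoding="utf-8") as f:
            content = f.read()
        self._entries[key] = (mtime, content)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return content
```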

Why No Indexing?

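Both agents make the same implicit bet: for an interactive coding agent, a fast linear search on every query is a better deal than maintaining an index. An index has to be rebuilt or invalidated whenever the user or the agent edits files, which happens constantly during a session, while a fresh traversal is always up to date.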
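In practice, "search on-the-fly" means every query re-walks the tree, so results reflect the files as they are right now, including edits made a moment ago. Here is a toy Python version of that approach; it is a stand-in for rg without its parallelism, .gitignore handling, or binary detection, and the function name is hypothetical.

```python
import os
import re
from typing import Iterator, Tuple


def linear_search(pattern: str, root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, line number, line) for every match, with no index."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                with open(full, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, start=1):
                        if regex.search(line):
                            yield os.path.relpath(full, root), lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
```

Because nothing is cached between calls, there is no invalidation logic to get wrong; the cost is paid as traversal time on every query.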

LSP as Lazy Indexing (Claude Code)

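Claude Code's LSPTool provides on-demand code intelligence via the Language Server Protocol (LSP). Language servers are started lazily, so semantic queries rely on the server's own analysis instead of an index Claude Code builds at startup.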
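This chapter does not list the LSPTool's operations, but the protocol underneath is standard. As an illustration, here is how a go-to-definition query is framed under the LSP base protocol: a Content-Length header, a blank line, then a JSON-RPC 2.0 body. The function names are hypothetical; the message shape follows the LSP specification.

```python
import json
from typing import Any, Dict


def frame_lsp_request(request_id: int, method: str, params: Dict[str, Any]) -> bytes:
    """Encode a JSON-RPC request with the LSP base-protocol Content-Length header."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    # Content-Length counts bytes of the UTF-8 body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, file_uri: str, line: int, character: int) -> bytes:
    # LSP positions are zero-based; "character" counts UTF-16 code units unless
    # another position encoding is negotiated.
    return frame_lsp_request(
        request_id,
        "textDocument/definition",
        {"textDocument": {"uri": file_uri}, "position": {"line": line, "character": character}},
    )
```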

Comparison Table

| Aspect | Codex | Claude Code |
| --- | --- | --- |
| Search approach | Shell commands (rg, grep, find) | Built-in GlobTool + GrepTool |
| Bundles ripgrep? | No | Yes (embedded or vendored) |
| Depends on system rg? | Yes | No |
| Output optimization | None (raw shell output) | Relative paths, line cap, pagination |
| Timeout handling | Shell timeout (if configured) | 20s with escalation, EAGAIN retry |
| Permission checks | Via kernel sandbox | Per-tool checkReadPermission() |
| Flexibility | Full shell: pipe, chain, combine | Fixed parameters; can't pipe |
| Indexing | None | None (LSP for semantic queries) |
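The "20s with escalation, EAGAIN retry" entry can be illustrated with a generic pattern. This chapter does not detail what "escalation" means in Claude Code; the sketch assumes the common reading (SIGTERM at the deadline, SIGKILL after a short grace period) and treats EAGAIN as a failure to spawn the process, retried with exponential backoff. Only the 20-second default comes from the text; the function name, grace period, and retry count are arbitrary.

```python
import errno
import subprocess
import time
from typing import List, Optional


def run_with_timeout(
    argv: List[str],
    timeout: float = 20.0,
    grace: float = 2.0,
    spawn_retries: int = 3,
) -> Optional[subprocess.CompletedProcess]:
    """Run argv; on timeout send SIGTERM, then SIGKILL. Return None if it timed out."""
    for attempt in range(spawn_retries + 1):
        try:
            proc = subprocess.Popen(
                argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
            )
            break
        except OSError as e:
            # EAGAIN at spawn means "resource temporarily unavailable"; back off and retry.
            if e.errno != errno.EAGAIN or attempt == spawn_retries:
                raise
            time.sleep(0.1 * 2 ** attempt)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # step 1: polite SIGTERM
        try:
            proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # step 2: SIGKILL
            proc.communicate()
        return None
    return subprocess.CompletedProcess(argv, proc.returncode, out, err)
```

Returning None on timeout lets the caller tell the model explicitly that the search was cut off, rather than passing along partial output as if it were complete.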