Chapter 5: File Search
Searching for files and code is one of the most frequent operations a coding agent performs. How agents implement search reveals fundamental design choices about tool philosophy, dependency management, and trust models.
The Core Question
Should the agent provide a built-in search tool with controlled input/output, or let the model run shell commands directly?
| Approach | Agent | Pros | Cons |
|---|---|---|---|
| Shell-delegated | Codex | Flexible, model controls everything | Depends on system tools, raw output |
| Built-in tool | Claude Code | Portable, token-optimized, permissioned | Fixed interface, less flexible |
Codex: Shell-Delegated Search
Codex has no built-in grep or glob tool exposed to the model. Search is done through shell commands.
How It Works
The system prompt (default.md, line 264) instructs:
"When searching for text or files, prefer using
`rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)"
The model then generates shell commands like:
rg "pattern" --type py # content search
rg --files | grep "pattern" # file search
find . -name "*.ts" # fallback if rg not found
grep -r "pattern" src/ # another fallback
These run through the shell tool → kernel sandbox → system-installed rg.
What Codex Does Have
- `list_dir` tool: Directory listing with pagination (offset, limit, depth). Returns a formatted file tree. Not pattern-based, just browsing.
- `codex-file-search` crate: A fuzzy file finder for the TUI file picker:
  - Uses `nucleo` (fzf-like fuzzy matcher from the Helix editor)
  - Multi-threaded directory walking via the `ignore` crate
  - Streaming: walk → inject into matcher → report results
  - Not exposed as a model tool; only used by the TUI
- Abandoned `grep_files` tool: A test file exists (`grep_files_tests.rs`) with a `run_rg_search()` function and `rg_available()` checks, but there is no implementation file. This was a planned built-in grep tool that was never shipped or was removed.
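To make the `list_dir` parameters concrete, here is a minimal Python sketch of a depth-limited listing with offset/limit pagination. It is not Codex's Rust implementation: the function name, the sort order, and the trailing-slash convention for directories are assumptions chosen for illustration.

```python
import os
from typing import List


def list_dir(root: str, offset: int = 0, limit: int = 50, depth: int = 2) -> List[str]:
    """Return one page of a depth-limited, sorted directory listing.

    Hypothetical helper: entry order and formatting are assumptions.
    """
    entries: List[str] = []

    def walk(directory: str, level: int) -> None:
        # Sort names so that pagination is stable across calls.
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            is_dir = os.path.isdir(full)
            entries.append(rel + "/" if is_dir else rel)
            # Descend only while we are above the depth limit.
            if is_dir and level < depth:
                walk(full, level + 1)

    walk(root, 1)
    # Pagination is a plain slice over the flattened listing.
    return entries[offset:offset + limit]
```

A caller would page through a large tree by repeating the call with `offset` advanced by `limit` each time. Nothing here filters by pattern, which is why this kind of tool is for browsing rather than searching.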
Ripgrep Dependency
Codex does not bundle ripgrep. It depends entirely on the system having rg installed. The ignore crate (v0.4.23) — the same Rust library that ripgrep is built on — is used for the TUI file picker's directory walking, but that's a library dependency, not the rg binary.
If rg is not installed:
- The model runs `rg` → gets "command not found" error
- The prompt says "use alternatives" → model falls back to `grep` or `find`
- No pre-flight check: Codex doesn't verify `rg` exists before the model tries it
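For contrast, the fallback chain above could be resolved up front with a pre-flight check. The Python sketch below is hypothetical (Codex performs no such check): `pick_search_command` is an invented helper, and the injectable `which` parameter exists only so the logic can be exercised without touching the real `PATH`.

```python
import shutil
from typing import Callable, List, Optional


def pick_search_command(
    pattern: str,
    path: str = ".",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build a content-search argv, preferring rg and falling back to grep."""
    if which("rg"):
        # -n prints line numbers; "--" stops option parsing before the pattern.
        return ["rg", "-n", "--", pattern, path]
    if which("grep"):
        # -r recurses into directories, -n prints line numbers.
        return ["grep", "-rn", "--", pattern, path]
    raise FileNotFoundError("neither rg nor grep is available on PATH")
```

The trade-off is one extra lookup per search versus a wasted model turn on a "command not found" error.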
Code Location
- File search crate: `codex-rs/file-search/src/lib.rs`
- List dir handler: `codex-rs/core/src/tools/handlers/list_dir.rs`
- Shell handler: `codex-rs/core/src/tools/handlers/shell.rs`
- Abandoned grep tests: `codex-rs/core/src/tools/handlers/grep_files_tests.rs`
Claude Code: Built-In Search Tools
Claude Code wraps ripgrep into two purpose-built tools with rich parameters and token-optimized output.
GlobTool — File Name Search
Finds files by pattern using rg --files --glob.
Parameters:
- `pattern` (required): Glob pattern (e.g., `**/*.tsx`, `src/**/*.ts`)
- `path` (optional): Directory to search in (defaults to CWD)
Output:
{
"filenames": ["src/App.tsx", "src/index.tsx", ...],
"numFiles": 42,
"truncated": false
}
- Results sorted by modification time (newest first)
- Default limit: 100 files
- Paths converted to relative (saves tokens)
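The post-processing described above (newest first, capped, relative paths) can be sketched in a few lines of Python. This is an illustration only: it uses `pathlib` globbing instead of `rg --files --glob`, the name `glob_files` is invented, and treating `numFiles` as the number of returned entries is an assumption.

```python
from pathlib import Path
from typing import Dict, List, Union


def glob_files(
    pattern: str, root: str = ".", limit: int = 100
) -> Dict[str, Union[List[str], int, bool]]:
    """Match files under root, newest first, as relative paths with a cap."""
    base = Path(root)
    matches = [p for p in base.glob(pattern) if p.is_file()]
    # Newest first; ties are broken by path so the order is deterministic.
    matches.sort(key=lambda p: (-p.stat().st_mtime, p.as_posix()))
    truncated = len(matches) > limit
    # Relative paths are shorter than absolute ones, which saves tokens.
    names = [p.relative_to(base).as_posix() for p in matches[:limit]]
    return {"filenames": names, "numFiles": len(names), "truncated": truncated}
```

Sorting by modification time puts recently edited files, which are often the ones the current task concerns, at the top of a truncated result.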
GrepTool — Content Search
Full ripgrep wrapper with three output modes.
Parameters:
- `pattern` (required): Regex pattern
- `path` (optional): File or directory to search
- `glob` (optional): File filter (e.g., `*.js`, `**/*.tsx`)
- `type` (optional): File type filter (e.g., `js`, `py`, `rust`)
- `output_mode`: `content` | `files_with_matches` (default) | `count`
- `-A`, `-B`, `-C`: Context lines after/before/around matches
- `-i`: Case-insensitive
- `-n`: Line numbers (default true for content mode)
- `multiline`: Enable cross-line matching (`rg -U --multiline-dotall`)
- `head_limit`: Max results (default 250, pass 0 for unlimited)
- `offset`: Skip first N results (pagination)
Output (content mode):
src/App.tsx:14: const [x, setX] = useState(0)
src/App.tsx:28: const [y, setY] = useState("")
Output (files_with_matches mode):
{
"filenames": ["src/App.tsx", "src/utils.ts"],
"numFiles": 2
}
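One way to picture a wrapper like this is as a function from structured parameters to a ripgrep argument list, plus a pagination step over the results. The Python sketch below is an assumption about the general shape, not Claude Code's TypeScript source: `build_rg_args` and `paginate` are invented names, and only a subset of the parameters is covered. The ripgrep flags themselves (`--files-with-matches`, `--count`, `--line-number`, `-C`, `-i`, `-U`, `--multiline-dotall`, `--glob`, `--type`) are real.

```python
from typing import List, Optional, Sequence


def build_rg_args(
    pattern: str,
    path: str = ".",
    glob: Optional[str] = None,
    file_type: Optional[str] = None,
    output_mode: str = "files_with_matches",
    context: Optional[int] = None,
    ignore_case: bool = False,
    multiline: bool = False,
) -> List[str]:
    """Translate tool parameters into an rg argv (hypothetical mapping)."""
    args = ["rg"]
    if output_mode == "files_with_matches":
        args.append("--files-with-matches")
    elif output_mode == "count":
        args.append("--count")
    elif output_mode == "content":
        args.append("--line-number")
        if context is not None:
            args += ["-C", str(context)]
    else:
        raise ValueError(f"unknown output_mode: {output_mode}")
    if ignore_case:
        args.append("-i")
    if multiline:
        args += ["-U", "--multiline-dotall"]
    if glob:
        args += ["--glob", glob]
    if file_type:
        args += ["--type", file_type]
    # "--" keeps a pattern that starts with "-" from being parsed as a flag.
    args += ["--", pattern, path]
    return args


def paginate(results: Sequence[str], head_limit: int = 250, offset: int = 0) -> List[str]:
    """Apply offset, then head_limit; a head_limit of 0 means unlimited."""
    window = list(results[offset:])
    return window if head_limit == 0 else window[:head_limit]
```

Because the model never writes the command line itself, the wrapper can validate inputs and bound output size before anything reaches the context window.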
Ripgrep Binary Management
Claude Code ships its own ripgrep and resolves the binary from three tiers (tried in order):
- Embedded: In official Bun builds, ripgrep is compiled into the Bun binary. Claude Code spawns itself with `argv0='rg'` to dispatch to the embedded binary.
- Vendored: Pre-compiled binaries at `vendor/ripgrep/{arch}-{platform}/rg` (e.g., `aarch64-darwin/rg`). Includes macOS codesigning handling.
- System: User's installed `rg`. Only used if `USE_BUILTIN_RIPGREP=false`.
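The tiering can be read as a small resolution function. The Python sketch below is one plausible interpretation, not the actual logic in `src/utils/ripgrep.ts`: the function name, its parameters, the case-insensitive comparison against `false`, and the decision to raise when no bundled binary is found are all assumptions.

```python
from pathlib import Path
from typing import Mapping, Tuple


def resolve_ripgrep(
    env: Mapping[str, str],
    embedded: bool,
    self_path: str,
    vendor_root: str,
    arch: str,
    platform: str,
) -> Tuple[str, str, str]:
    """Return (tier, executable, argv0) for the ripgrep binary to spawn."""
    # Explicit opt-out: use whatever rg is on the user's PATH.
    if env.get("USE_BUILTIN_RIPGREP", "").lower() == "false":
        return ("system", "rg", "rg")
    # Embedded: re-run our own executable, but with argv0 set to "rg".
    if embedded:
        return ("embedded", self_path, "rg")
    # Vendored: a pre-compiled binary laid out as {arch}-{platform}/rg.
    vendored = Path(vendor_root) / f"{arch}-{platform}" / "rg"
    if vendored.is_file():
        return ("vendored", str(vendored), "rg")
    raise FileNotFoundError(f"no bundled ripgrep at {vendored}")
```

The `argv0` element matters for the embedded tier: a multi-call binary decides which program to behave as by inspecting the name it was invoked under.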
Error Handling
- Timeout: 20 seconds default (60s on WSL). SIGTERM → wait 5s → SIGKILL escalation.
- EAGAIN retry: If resource-constrained (Docker, CI), ripgrep fails with "Resource temporarily unavailable." Auto-retries with `-j 1` (single-threaded mode).
- Max buffer: 20MB for stdout/stderr (handles large monorepos with 200k+ files).
- Line length cap: 500 characters per line (prevents base64/minified content bloat).
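The timeout escalation and the line cap are both easy to get subtly wrong, so a small sketch may help. The Python below shows the general pattern using `subprocess`; it is not Claude Code's implementation, the names `run_with_escalation` and `cap_lines` are invented, and the defaults simply mirror the numbers quoted above.

```python
import subprocess
from typing import List, Tuple


def run_with_escalation(
    argv: List[str], timeout: float = 20.0, grace: float = 5.0
) -> Tuple[str, bool]:
    """Run argv; on timeout send SIGTERM, wait `grace` seconds, then SIGKILL.

    Returns (stdout, timed_out).
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out, False
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite request: SIGTERM on POSIX
        try:
            out, _ = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # forceful stop: SIGKILL on POSIX
            out, _ = proc.communicate()
        return out, True


def cap_lines(text: str, max_chars: int = 500) -> str:
    """Truncate every line so minified or base64 content cannot bloat output."""
    return "\n".join(line[:max_chars] for line in text.splitlines())
```

Calling `communicate()` again after the timeout both collects any partial output and reaps the child, so no zombie process is left behind.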
System Prompt Enforcement
ALWAYS use Grep for search tasks. NEVER invoke `grep` or `rg` as a Bash command.
The Grep tool has been optimized for correct permissions and access.
Code Location
- GlobTool: `src/tools/GlobTool/`
- GrepTool: `src/tools/GrepTool/`
- Ripgrep configuration: `src/utils/ripgrep.ts`
- Glob utility: `src/utils/glob.ts`
Indexing
Neither agent builds a project index. Both search on-the-fly every time via fresh filesystem traversal.
| | Codex | Claude Code |
|---|---|---|
| File index at startup? | No | No |
| AST / symbol index? | No | No (delegates to LSP servers lazily) |
| Search database? | No | No |
| In-memory index? | TUI file picker only (nucleo, ephemeral) | File-read cache (max 1000 entries, mtime-invalidated) |
| Project scanning at startup? | Nothing | File count only (for telemetry, rounded to power of 10) |
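The "mtime-invalidated" file-read cache in the table is a simple data structure worth spelling out. The Python sketch below shows one way such a cache could work; the class name and the least-recently-used eviction policy are assumptions, since the table only gives the entry limit and the invalidation rule.

```python
import os
from collections import OrderedDict
from typing import Tuple


class FileReadCache:
    """Bounded cache of file contents, invalidated when mtime changes."""

    def __init__(self, max_entries: int = 1000) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, Tuple[int, str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __len__(self) -> int:
        return len(self._entries)

    def read(self, path: str) -> str:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            # Fresh entry: mark it as most recently used and return it.
            self._entries.move_to_end(path)
            self.hits += 1
            return cached[1]
        # Missing or stale entry: re-read from disk and replace it.
        self.misses += 1
        with open(path, "r", encoding="utf-8") as handle:
            content = handle.read()
        self._entries[path] = (mtime, content)
        self._entries.move_to_end(path)
        if len(self._entries) > self.max_entries:
            # Evict the least recently used entry (assumed policy).
            self._entries.popitem(last=False)
        return content
```

Unlike a project index, this structure costs nothing at startup and can never serve stale content as long as modification times are reliable, at the price of one `stat` call per read.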
Why No Indexing?
Both agents make the same implicit bet: fast-enough linear search beats maintaining an index for the interactive coding agent use case.
- ripgrep is fast — searches at ~2GB/s on modern SSDs. Most repos take <1 second.
- Indexing is expensive — building a symbol index delays startup by seconds to minutes.
- Indexes get stale — files change during the session; index must be maintained.
- Context window is the bottleneck — even with an index, the model can only process ~200K tokens per turn.
LSP as Lazy Indexing (Claude Code)
Claude Code's LSPTool provides on-demand code intelligence via Language Server Protocol:
- `goToDefinition`, `findReferences`, `hover`, `documentSymbol`, `workspaceSymbol`
- LSP servers are started lazily (only when first needed for a file type)
- Each server manages its own internal index — Claude Code doesn't build or manage it
- Not a replacement for grep — used for semantic queries, not text search
Comparison Table
| Aspect | Codex | Claude Code |
|---|---|---|
| Search approach | Shell commands (`rg`, `grep`, `find`) | Built-in GlobTool + GrepTool |
| Bundles ripgrep? | No | Yes (embedded or vendored) |
| Depends on system `rg`? | Yes | No |
| Output optimization | None — raw shell output | Relative paths, line cap, pagination |
| Timeout handling | Shell timeout (if configured) | 20s with escalation, EAGAIN retry |
| Permission checks | Via kernel sandbox | Per-tool checkReadPermission() |
| Flexibility | Full shell — pipe, chain, combine | Fixed parameters — can't pipe |
| Indexing | None | None (LSP for semantic queries) |