What is a Coding Agent?

A coding agent is an autonomous system that can understand a task, explore a codebase, make changes, and verify its work - with minimal human intervention.


The Core Idea

A coding agent is not autocomplete. It's not a chatbot that answers questions about code. It's an autonomous system that:

  1. Receives a task - "Add user authentication to the API"
  2. Explores - Reads files, understands structure, finds relevant code
  3. Plans - Decides what changes to make
  4. Acts - Edits files, creates new files, runs commands
  5. Verifies - Tests changes, checks for errors
  6. Iterates - Fixes issues, refines until done

The key word is autonomous. You give it a task, it does the work. You review the output, not every step.


What Makes It "Coding"

Not all AI agents are coding agents. A coding agent specifically needs these capabilities:

| Capability | Why It's Essential |
|---|---|
| Read code | Must understand existing codebase |
| Write code | Must produce syntactically correct changes |
| Navigate structure | Must find relevant files in large repos |
| Run commands | Must execute tests, builds, scripts |
| Understand errors | Must interpret compiler/runtime errors |
| Edit precisely | Must change specific lines without breaking context |

A general-purpose chatbot can discuss code. A coding agent can change code.


What Makes It "Agent"

The "agent" part means:

1. Tool Use

The agent has tools it can invoke:

read_file(path) → contents
edit_file(path, old, new) → success
run_command(cmd) → output
search_code(pattern) → matches
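
As a rough illustration, the four tools above could be sketched in Python as follows. The function bodies, the `TOOLS` registry, and the `call_tool` helper are assumptions made for this example, not any particular agent's API, and `search_code` only looks at `*.py` files to keep the sketch short.

```python
import re
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    """Return the file's contents as text."""
    return Path(path).read_text(encoding="utf-8")


def edit_file(path: str, old: str, new: str) -> bool:
    """Replace the first occurrence of `old` with `new`; False if `old` is absent."""
    text = read_file(path)
    if old not in text:
        return False
    Path(path).write_text(text.replace(old, new, 1), encoding="utf-8")
    return True


def run_command(cmd: list[str]) -> str:
    """Run a command and return its combined stdout and stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr


def search_code(pattern: str, root: str = ".") -> list[str]:
    """Return 'path:line_number:line' for each line under `root` matching the regex."""
    regex = re.compile(pattern)
    matches = []
    for file in sorted(Path(root).rglob("*.py")):
        lines = file.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if regex.search(line):
                matches.append(f"{file}:{number}:{line}")
    return matches


# Hypothetical registry: the model names a tool, the harness looks it up here.
TOOLS = {
    "read_file": read_file,
    "edit_file": edit_file,
    "run_command": run_command,
    "search_code": search_code,
}


def call_tool(name: str, **kwargs):
    """Dispatch a tool call by name with keyword arguments."""
    return TOOLS[name](**kwargs)
```

Keeping the tools in a plain dictionary means the set of actions the model can take is explicit and easy to audit.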

2. Autonomous Loop

The agent runs a loop until the task is complete:

while not done:
    observe current state
    decide next action
    execute action
    evaluate result
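
The loop above can be made concrete with a small Python sketch. Here `decide` is a scripted stand-in for the LLM, and `run_agent`, the `Action` tuple, and the `max_steps` cap are all assumptions for illustration; a real agent would call a model where `decide` is called.

```python
from typing import Callable

# An action is a (tool_name, argument) pair; ("done", "") ends the loop.
Action = tuple[str, str]


def run_agent(
    decide: Callable[[list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 20,
) -> list[str]:
    """Run observe -> decide -> execute -> evaluate until done or out of steps."""
    history: list[str] = []  # everything the agent has observed so far
    for _ in range(max_steps):  # hard cap so a confused agent cannot loop forever
        name, arg = decide(history)  # decide the next action from current state
        if name == "done":
            break
        try:
            result = tools[name](arg)  # execute the action
        except Exception as exc:  # failures become observations too
            result = f"error: {exc}"
        history.append(f"{name}({arg!r}) -> {result}")  # record the result
    return history


# Scripted stand-in for the LLM: read one file, then stop.
def scripted_decide(history: list[str]) -> Action:
    return ("read", "notes.txt") if not history else ("done", "")


fake_files = {"notes.txt": "remember to run the tests"}
log = run_agent(scripted_decide, {"read": lambda path: fake_files[path]})
print(log)
```

The step cap is a design choice worth keeping even in a toy: without it, a model that never emits "done" would run indefinitely.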

3. Goal-Directed

The agent works toward completing a task, not just responding to prompts. It maintains intent across multiple steps.

4. Self-Correcting

When something fails, the agent tries to fix it:

edit file → run tests → tests fail → read error → fix edit → run tests → pass
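
A minimal sketch of that edit, test, fix cycle, assuming "tests" means running a script and checking its exit code. `fix_until_green` and `toy_fixer` are hypothetical names; `toy_fixer` stands in for the LLM and only knows how to repair one specific bug.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tests(path: Path) -> tuple[bool, str]:
    """Run the file as a script; a non-zero exit code counts as a failure."""
    proc = subprocess.run([sys.executable, str(path)], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr


def fix_until_green(path: Path, propose_fix, max_attempts: int = 3) -> bool:
    """Test, and on failure hand the error text to `propose_fix` and try again."""
    for _ in range(max_attempts):
        passed, error = run_tests(path)
        if passed:
            return True
        # `propose_fix` stands in for the LLM: (source, error) -> new source
        path.write_text(propose_fix(path.read_text(), error))
    return run_tests(path)[0]


# Hypothetical fixer that reacts to one kind of error only.
def toy_fixer(source: str, error: str) -> str:
    if "AssertionError" in error:
        return source.replace("a - b", "a + b")
    return source


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "calc.py"
    target.write_text("def add(a, b):\n    return a - b\n\nassert add(2, 3) == 5\n")
    fixed = fix_until_green(target, toy_fixer)
    final_source = target.read_text()
print(fixed)
```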

The Simplest Coding Agent

At minimum, a coding agent needs:

┌─────────────────────────────────────────────┐
│                CODING AGENT                 │
│                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐  │
│  │  LLM    │───▶│  Tools  │───▶│  Files  │  │
│  │         │◀───│         │◀───│         │  │
│  └─────────┘    └─────────┘    └─────────┘  │
│       │                                     │
│       ▼                                     │
│  ┌─────────┐                                │
│  │ System  │                                │
│  │ Prompt  │                                │
│  └─────────┘                                │
└─────────────────────────────────────────────┘

Components:

  1. LLM - The brain (Claude, GPT, etc.)
  2. Tools - Actions the agent can take
  3. Files - The codebase being modified
  4. System Prompt - Instructions for how to behave

That's it. Everything else is optimization.
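
One way to picture how the four components fit together is a single Python dataclass. This is only a sketch: the `CodingAgent` class, its field names, and the prompt layout are assumptions for illustration, and the `llm` here is a stand-in function rather than a real model call.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class CodingAgent:
    llm: Callable[[str], str]             # the model: prompt in, reply out
    tools: dict[str, Callable[..., str]]  # named actions the model may request
    workspace: Path                       # root of the codebase being modified
    system_prompt: str                    # standing instructions for behavior

    def build_prompt(self, task: str) -> str:
        """Combine the instructions, tool names, and task into one prompt."""
        tool_names = ", ".join(sorted(self.tools))
        return f"{self.system_prompt}\nAvailable tools: {tool_names}\nTask: {task}"

    def ask(self, task: str) -> str:
        """Send the assembled prompt to the model and return its reply."""
        return self.llm(self.build_prompt(task))


# Stand-in model that only reports how many prompt lines it received.
agent = CodingAgent(
    llm=lambda prompt: f"received {len(prompt.splitlines())} lines",
    tools={"read_file": lambda path: Path(path).read_text()},
    workspace=Path("."),
    system_prompt="You are a careful coding agent.",
)
reply = agent.ask("Add user authentication to the API")
print(reply)
```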


The Agent Loop

Every coding agent runs some version of this loop:

┌──────────────────────────────────────────────┐
│                                              │
│    ┌─────────┐                               │
│    │  USER   │                               │
│    │  TASK   │                               │
│    └────┬────┘                               │
│         │                                    │
│         ▼                                    │
│    ┌─────────┐      ┌─────────┐              │
│ ┌─▶│  THINK  │─────▶│   ACT   │              │
│ │  └─────────┘      └────┬────┘              │
│ │                        │                   │
│ │                        ▼                   │
│ │                   ┌─────────┐              │
│ │                   │ OBSERVE │              │
│ │                   └────┬────┘              │
│ │                        │                   │
│ │       ┌────────────────┼────────────┐      │
│ │       │                │            │      │
│ │       ▼                ▼            ▼      │
│ │  ┌─────────┐      ┌─────────┐   ┌───────┐  │
│ └──│  MORE   │      │  ERROR  │   │ DONE  │  │
│    │  WORK   │      │  RETRY  │   │       │  │
│    └─────────┘      └─────────┘   └───────┘  │
│                                              │
└──────────────────────────────────────────────┘

States:

  • Think: LLM decides what to do next
  • Act: Execute a tool (read file, edit, run command)
  • Observe: See the result
  • More Work: Task not complete, continue
  • Error Retry: Something failed, try to fix
  • Done: Task complete, stop
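
These states can be written down as a small state machine. The sketch below is one possible reading of the list, not a definitive design: the `State` enum and `next_state` function are hypothetical names, and it assumes that Error Retry, like More Work, leads back to Think.

```python
from enum import Enum, auto


class State(Enum):
    THINK = auto()
    ACT = auto()
    OBSERVE = auto()
    MORE_WORK = auto()
    ERROR_RETRY = auto()
    DONE = auto()


def next_state(state: State, *, failed: bool = False, complete: bool = False) -> State:
    """Return the state that follows `state`, given what was observed."""
    if state is State.THINK:
        return State.ACT
    if state is State.ACT:
        return State.OBSERVE
    if state is State.OBSERVE:
        if failed:
            return State.ERROR_RETRY
        return State.DONE if complete else State.MORE_WORK
    if state in (State.MORE_WORK, State.ERROR_RETRY):
        return State.THINK  # assumption: both branches loop back to Think
    return State.DONE  # Done is terminal


# Walk one failing pass followed by one successful pass.
trace = [State.THINK]
for outcome in ({}, {}, {"failed": True}, {}, {}, {}, {"complete": True}):
    trace.append(next_state(trace[-1], **outcome))
print([s.name for s in trace])
```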

Core Tools

Every coding agent needs these tools:

File Reading

Read(path) → file contents

The agent must see code to understand it.

File Editing

Edit(path, old_text, new_text) → success/failure

The agent must change code precisely. Edits usually replace an exact snippet of old text rather than rewriting the whole file.
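
A minimal sketch of such an edit tool in Python, assuming the rule that `old_text` must match exactly once. The function name `edit` and the `(success, message)` return shape are illustrative choices, not a fixed interface.

```python
from pathlib import Path


def edit(path: str, old_text: str, new_text: str) -> tuple[bool, str]:
    """Replace `old_text` with `new_text` only if it matches exactly one place."""
    if not old_text:
        return False, "old_text must not be empty"
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(old_text)
    if count == 0:
        return False, "old_text not found"
    if count > 1:
        return False, f"old_text matches {count} places; include more context"
    file.write_text(text.replace(old_text, new_text), encoding="utf-8")
    return True, "ok"
```

Refusing ambiguous matches turns a silent wrong edit into an explicit failure that the agent can react to.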

File Creation

Write(path, content) → success/failure

Sometimes new files are needed.

Code Search

Glob(pattern) → matching files
Grep(pattern) → matching lines

The agent must find relevant code in large repos.
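
Both search tools can be approximated with the Python standard library. In this sketch the names `glob_files` and `grep`, their parameters, and the tuple-shaped results are assumptions for illustration; files that cannot be read as UTF-8 are simply skipped.

```python
import re
from pathlib import Path


def glob_files(pattern: str, root: str = ".") -> list[str]:
    """Return files under `root` matching a glob such as '**/*.py'."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.glob(pattern) if p.is_file()
    )


def grep(pattern: str, root: str = ".", file_glob: str = "**/*") -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for rel in glob_files(file_glob, root):
        try:
            text = (Path(root) / rel).read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), 1):
            if regex.search(line):
                hits.append((rel, number, line.strip()))
    return hits
```

Returning line numbers along with paths lets the agent read just the neighborhood of a match instead of the whole file.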

Command Execution

Bash(command) → output

The agent must run tests, builds, scripts.
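
A sketch of a command tool built on Python's `subprocess` module. It departs from a literal `Bash(command)` in one way: it takes an argument list rather than a shell string, which avoids shell quoting issues. The `bash` name, the timeout, and the output cap are assumptions for illustration.

```python
import subprocess
import sys


def bash(command: list[str], timeout: float = 30.0, max_chars: int = 4000) -> dict:
    """Run a command and return its exit code and (possibly truncated) output."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "output": f"timed out after {timeout}s"}
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        # Keep the tail: error summaries usually appear at the end.
        output = "...[truncated]...\n" + output[-max_chars:]
    return {"exit_code": proc.returncode, "output": output}


result = bash([sys.executable, "-c", "print('tests passed')"])
print(result["exit_code"], result["output"].strip())
```

Capping the output matters because a noisy build log can otherwise consume a large share of the context window.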

These five capabilities cover most coding tasks.


What Makes Agents Hard

Building a coding agent is easy. Building a good coding agent is hard.

Problem 1: Context Limits

LLMs have finite context windows. Large codebases can run to millions of lines.

Context window: ~200,000 tokens (varies by model)
Large codebase: 1,000,000+ tokens
Problem: Can't see everything at once

Solutions:

  • Smart file selection (only load relevant files)
  • Summarization (compress what you've seen)
  • Handoff (extract key context when switching tasks)
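
Smart file selection can be as simple as ranking files by relevance and stopping at a token budget. The Python sketch below uses two loud assumptions: tokens are estimated at roughly four characters each, and relevance is plain keyword counting. Real agents typically use better signals; `estimate_tokens` and `select_files` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def select_files(files: dict[str, str], task: str, budget: int) -> list[str]:
    """Pick the files that mention the task's keywords most, within the budget."""
    keywords = {word.lower() for word in task.split() if len(word) > 3}

    def score(path: str, text: str) -> int:
        haystack = (path + " " + text).lower()
        return sum(haystack.count(keyword) for keyword in keywords)

    ranked = sorted(
        ((score(path, text), path, text) for path, text in files.items()),
        key=lambda row: row[0],
        reverse=True,
    )
    chosen, used = [], 0
    for points, path, text in ranked:
        if points == 0:
            break  # remaining files do not mention the task at all
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen


files = {
    "auth/login.py": "def login(user, password):\n    return check(user, password)\n",
    "auth/session.py": "# created after login\n" + "x = 1\n" * 400,
    "billing/invoice.py": "def total(items):\n    return sum(items)\n",
}
print(select_files(files, "Fix the login bug", budget=100))
```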

Problem 2: Knowing When You're Wrong

LLMs can confidently report success on tasks that actually failed.

Agent: "Done! I added the login button."
Reality: Edit failed silently, file unchanged.

Solutions:

  • Verification steps (run tests, check file)
  • Course correction (separate model reviews work)
  • Human checkpoints (approve before critical actions)
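
A minimal sketch of one verification step in Python: compare the file's hash before and after an edit, so a silent no-op cannot be reported as success. The names `file_digest` and `apply_and_verify` are hypothetical, and a real agent would usually run tests as well.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 hash of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def apply_and_verify(path: Path, edit) -> bool:
    """Run `edit(path)` and report success only if the file actually changed."""
    before = file_digest(path)
    edit(path)
    return file_digest(path) != before


def silent_noop(path: Path) -> None:
    """Simulates an edit that 'succeeds' without touching the file."""


def add_button(path: Path) -> None:
    path.write_text(path.read_text() + "<button>Log in</button>\n")


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text("<h1>Home</h1>\n")
    noop_result = apply_and_verify(page, silent_noop)
    real_result = apply_and_verify(page, add_button)
print(noop_result, real_result)
```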

Problem 3: Precise Edits

LLMs often struggle to reproduce exact strings, so a targeted edit can land in the wrong place.

Task: Change "color" to "colour" on line 47
Risk: Agent changes wrong occurrence, breaks code

Solutions:

  • Diff-based editing (specify exact old text)
  • Larger context (include surrounding lines)
  • Verification (read file after edit to confirm)
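
The "larger context" idea can be sketched in Python: start from the target line and widen the snippet with surrounding lines until it appears only once in the file. The helper name `unique_anchor` and the CSS sample are made up for illustration.

```python
def unique_anchor(lines: list[str], index: int) -> str:
    """Grow a snippet around lines[index] until it occurs exactly once."""
    text = "\n".join(lines)
    start, end = index, index + 1
    while True:
        snippet = "\n".join(lines[start:end])
        # The whole file always matches itself once, so this loop terminates.
        if text.count(snippet) == 1:
            return snippet
        start = max(0, start - 1)
        end = min(len(lines), end + 1)


lines = [
    ".header {",
    "  color: red;",
    "}",
    ".footer {",
    "  color: red;",
    "}",
]
anchor = unique_anchor(lines, 4)  # target the second "color" line
new_text = "\n".join(lines).replace(anchor, anchor.replace("color", "colour"))
print(new_text)
```

Here the bare line `color: red;` matches twice, so the anchor grows to include the `.footer` block and only that occurrence changes.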

Problem 4: Long Tasks

Complex tasks can exceed the limits of a single conversation.

Task: "Migrate codebase from React to Vue"
Reality: Hundreds of files, days of work

Solutions:

  • Task decomposition (break into subtasks)
  • Persistent memory (remember progress across sessions)
  • Handoff (compress context, continue later)
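
Task decomposition and persistent memory can be combined in a very small way: keep a checklist of subtasks in a JSON file so a later session can pick up where the last one stopped. This Python sketch assumes a flat list of subtasks; the file name and the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_progress(path: Path, subtasks: list[str]) -> dict[str, bool]:
    """Load saved completion flags, defaulting every unknown subtask to False."""
    saved = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {name: bool(saved.get(name, False)) for name in subtasks}


def mark_done(path: Path, progress: dict[str, bool], name: str) -> None:
    """Record a finished subtask and write the checklist to disk immediately."""
    progress[name] = True
    path.write_text(json.dumps(progress, indent=2), encoding="utf-8")


def next_subtask(progress: dict[str, bool]) -> "str | None":
    """Return the first unfinished subtask, or None when everything is done."""
    return next((name for name, done in progress.items() if not done), None)


subtasks = ["inventory components", "port shared utilities", "port pages"]
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "progress.json"

    # Session 1: finish the first subtask, then stop.
    progress = load_progress(store, subtasks)
    mark_done(store, progress, next_subtask(progress))

    # Session 2: a fresh process reloads the file and resumes.
    resumed = load_progress(store, subtasks)
    resume_point = next_subtask(resumed)
print(resume_point)
```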

Production vs Toy Agents

The difference between a demo and production:

| Aspect | Toy Agent | Production Agent |
|---|---|---|
| Context | Fits in one prompt | Manages 1M+ token codebases |
| Errors | Crashes or hallucinates | Recovers and retries |
| Verification | Trusts itself | Verifies its work |
| Memory | Forgets between sessions | Remembers project rules |
| Safety | Runs anything | Sandboxed, permission-gated |
| Feedback | None | Learns from corrections |

This guide focuses on production-grade patterns.
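
As one small example of the Safety row, a permission gate might classify each command before it runs. The Python sketch below is illustrative only: the `AUTO_ALLOW` and `ALWAYS_DENY` lists are made up, `review_command` is a hypothetical name, and a gate like this complements a sandbox rather than replacing one.

```python
import shlex

AUTO_ALLOW = {"ls", "cat", "pytest"}  # hypothetical low-risk programs
ALWAYS_DENY = {"sudo", "rm"}          # hypothetical programs never run unattended
SHELL_METACHARACTERS = ";|&`$<>"


def review_command(command: str) -> str:
    """Classify a command as 'allow', 'ask' (needs human approval), or 'deny'."""
    try:
        parts = shlex.split(command)
    except ValueError:
        return "deny"  # unbalanced quotes: refuse rather than guess
    if not parts:
        return "deny"
    program = parts[0]
    if program in ALWAYS_DENY:
        return "deny"
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "ask"  # chaining or redirection: let a human look first
    if program in AUTO_ALLOW:
        return "allow"
    return "ask"


for cmd in ["pytest -q", "rm -rf build", "npm install", "pytest; rm -rf /"]:
    print(cmd, "->", review_command(cmd))
```

Defaulting unknown commands to "ask" keeps the human checkpoint as the fallback instead of silent execution.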


The Thesis

Why do companies like Anthropic and Reflection AI see coding agents as a path to AGI?

1. Coding is Verifiable

Unlike essays or conversations, code can be run and tested: it either works or it doesn't. That gives a clear feedback signal.

2. Coding is Iterative

Write → run → see error → fix → repeat. Tight feedback loops.

3. Coding is Self-Improving

An AI that can write code can improve the AI that writes code.

4. Coding is Universal

Software encodes solutions to arbitrary problems. Master coding, master problem-solving.

"We think that autonomous coding is AGI complete. So if you show that you have a super intelligent software developer, then that's all it takes, that's an AGI." — Ioannis Antonoglou, Reflection AI


What You'll Build

By the end of this guide, you'll understand how to build a coding agent that can:

  • Take a natural language task
  • Explore a codebase to find relevant files
  • Make precise edits without breaking things
  • Run commands and interpret results
  • Recover from errors
  • Remember project-specific rules
  • Know when it's done (and when it's not)

Not a toy demo. A real agent that does real work.


Next

02-architecture-overview.md - The components and how they connect