
Autonomous Agent Patterns Toolkit

Boost productivity with design patterns for building autonomous agents. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · ai research · v1.0.0 · MIT


Overview

The Autonomous Agent Patterns Toolkit provides a comprehensive collection of design patterns for building self-directed coding agents. These agents operate through iterative reasoning loops -- observing their environment, deciding on actions, executing tools, and learning from results -- all without constant human supervision. Inspired by production systems like Claude Code, Cline, and OpenAI Codex, this toolkit codifies the architecture, tool design, permission systems, browser automation, and context management patterns that make autonomous agents reliable and safe.

The difference between a demo agent and a production agent lies entirely in patterns: how you handle errors, enforce permissions, manage context windows, sandbox execution, and recover from failures. This toolkit covers all of those areas with implementation-ready code.

When to Use

  • Building autonomous coding agents that read, write, and test code without step-by-step human guidance
  • Designing tool/function calling interfaces that LLMs can use reliably
  • Implementing permission and approval systems for risky operations (file writes, shell commands, deployments)
  • Creating browser automation for agents that need to interact with web interfaces
  • Designing human-in-the-loop workflows where agents operate independently but escalate when uncertain
  • Building MCP (Model Context Protocol) servers to extend agent capabilities dynamically
  • Implementing checkpoint and resume functionality for long-running agent tasks

Quick Start

```bash
# Set up a minimal agent project
mkdir -p agent-toolkit/{core,tools,permissions,browser,context}
cd agent-toolkit

# Python setup
python -m venv .venv && source .venv/bin/activate
pip install anthropic openai pydantic playwright

# Or TypeScript
npm init -y
npm install @anthropic-ai/sdk openai zod playwright
```
```python
# core/agent.py - Minimal agent loop
from anthropic import Anthropic


class MinimalAgent:
    """Think-Decide-Act-Observe loop in ~40 lines."""

    def __init__(self, client: Anthropic, tools: list, max_steps: int = 50):
        self.client = client
        self.tools = {t["name"]: t for t in tools}
        self.tool_handlers = {}
        self.max_steps = max_steps
        self.messages = []

    def register_handler(self, name: str, fn):
        self.tool_handlers[name] = fn

    def run(self, task: str) -> str:
        self.messages = [{"role": "user", "content": task}]
        for step in range(self.max_steps):
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=list(self.tools.values()),
                messages=self.messages,
            )
            # Collect assistant content
            self.messages.append({"role": "assistant", "content": response.content})

            # If no tool use, agent is done
            tool_uses = [b for b in response.content if b.type == "tool_use"]
            if not tool_uses:
                text_blocks = [b.text for b in response.content if hasattr(b, "text")]
                return "\n".join(text_blocks)

            # Execute each tool call
            results = []
            for tool_use in tool_uses:
                handler = self.tool_handlers.get(tool_use.name)
                try:
                    output = handler(**tool_use.input) if handler else "Unknown tool"
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": str(output),
                    })
                except Exception as e:
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": f"Error: {e}",
                        "is_error": True,
                    })
            self.messages.append({"role": "user", "content": results})
        return "Max steps reached without completion"
```
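To wire the loop to a real tool, pair a JSON Schema entry with a handler registered under the same name. The sketch below is illustrative (a hypothetical `list_dir` tool, not part of the toolkit); the commented lines show how it would plug into `MinimalAgent`:

```python
import os

# Schema: this is all the model ever sees about the tool
list_dir_tool = {
    "name": "list_dir",
    "description": "List the entries of a directory at the given absolute path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute directory path"},
        },
        "required": ["path"],
    },
}


def list_dir(path: str) -> str:
    # Handler: the code that actually runs; sorted for deterministic output
    return "\n".join(sorted(os.listdir(path)))


# agent = MinimalAgent(client=Anthropic(), tools=[list_dir_tool])
# agent.register_handler("list_dir", list_dir)
# print(agent.run("What files are in the project directory?"))
```

Keeping the schema and the handler side by side makes it obvious when the two drift apart.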

Core Concepts

1. The Agent Loop (Think-Decide-Act-Observe)

Every autonomous agent follows the same fundamental cycle. The agent receives a task, reasons about it, decides which tool to call, executes the tool, observes the result, and loops until the task is complete or a limit is hit.

  +----------+     +----------+     +---------+
  |  THINK   |---->|  DECIDE  |---->|   ACT   |
  | (Reason) |     |  (Plan)  |     |(Execute)|
  +----------+     +----------+     +---------+
       ^                                 |
       |           +----------+          |
       +-----------| OBSERVE  |<---------+
                   | (Result) |
                   +----------+

The key design decisions within the loop:

```python
# core/loop.py - Production-grade agent loop with multi-model support
from dataclasses import dataclass, field
from enum import Enum


class TaskComplexity(Enum):
    SIMPLE = "simple"      # Classification, extraction, short Q&A
    MODERATE = "moderate"  # Code edits, multi-step reasoning
    COMPLEX = "complex"    # Architecture, debugging, multi-file refactors


@dataclass
class AgentConfig:
    max_iterations: int = 50
    models: dict = field(default_factory=lambda: {
        TaskComplexity.SIMPLE: "claude-haiku-4",
        TaskComplexity.MODERATE: "claude-sonnet-4-20250514",
        TaskComplexity.COMPLEX: "claude-sonnet-4-20250514",
    })
    token_budget: int = 100_000
    retry_on_error: bool = True
    max_retries_per_tool: int = 3


class ProductionAgentLoop:
    def __init__(self, config: AgentConfig):
        self.config = config
        self.total_tokens = 0
        self.tool_error_counts: dict[str, int] = {}

    def select_model(self, complexity: TaskComplexity) -> str:
        return self.config.models[complexity]

    def should_continue(self, iteration: int) -> bool:
        if iteration >= self.config.max_iterations:
            return False
        if self.total_tokens >= self.config.token_budget:
            return False
        return True

    def handle_tool_error(self, tool_name: str, error: Exception) -> str:
        count = self.tool_error_counts.get(tool_name, 0) + 1
        self.tool_error_counts[tool_name] = count
        if count >= self.config.max_retries_per_tool:
            return f"Tool '{tool_name}' failed {count} times. Skipping. Last error: {error}"
        return f"Error (attempt {count}/{self.config.max_retries_per_tool}): {error}"
```

2. Tool Design Patterns

The quality of your tool schemas directly determines agent reliability. The LLM never sees your implementation -- it only sees the JSON Schema descriptions. A perfectly coded tool with a vague description will fail; a simple tool with crystal-clear documentation will succeed.

```python
# tools/base.py - Tool design with validation
from abc import ABC, abstractmethod
from typing import Optional

from pydantic import BaseModel


class ToolResult(BaseModel):
    success: bool
    output: str
    error: Optional[str] = None
    metadata: dict = {}


class BaseTool(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    @abstractmethod
    def parameters(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> ToolResult: ...

    def to_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {
                "type": "object",
                "properties": self.parameters,
                "required": self._required_params(),
            },
        }

    def _required_params(self) -> list[str]:
        # Convention: parameters that declare a "default" are optional
        return [k for k, v in self.parameters.items() if "default" not in v]


class ReadFileTool(BaseTool):
    name = "read_file"
    description = (
        "Read the contents of a file at the given absolute path. "
        "Returns the file content as a string with line numbers. "
        "Use this before editing to understand current file state."
    )

    @property
    def parameters(self):
        return {
            "path": {
                "type": "string",
                "description": "Absolute path to the file to read",
            },
            "start_line": {
                "type": "integer",
                "description": "First line to read (1-indexed). Omit to start from beginning.",
                "default": None,
            },
            "end_line": {
                "type": "integer",
                "description": "Last line to read (inclusive). Omit to read to end of file.",
                "default": None,
            },
        }

    def execute(self, path: str, start_line: Optional[int] = None,
                end_line: Optional[int] = None) -> ToolResult:
        try:
            with open(path, "r") as f:
                lines = f.readlines()
            if start_line or end_line:
                s = (start_line or 1) - 1
                e = end_line or len(lines)
                lines = lines[s:e]
            # Number lines from start_line, not from 1, so slices stay accurate
            offset = (start_line or 1)
            numbered = [f"{offset + i}\t{line}" for i, line in enumerate(lines)]
            return ToolResult(success=True, output="".join(numbered))
        except FileNotFoundError:
            return ToolResult(success=False, output="", error=f"File not found: {path}")
        except PermissionError:
            return ToolResult(success=False, output="", error=f"Permission denied: {path}")


class EditFileTool(BaseTool):
    name = "edit_file"
    description = (
        "Edit a file by replacing an exact string match with new content. "
        "The old_string must match exactly (including whitespace and indentation). "
        "Use read_file first to see the current content."
    )

    @property
    def parameters(self):
        return {
            "path": {"type": "string", "description": "Absolute path to the file"},
            "old_string": {"type": "string", "description": "Exact text to find and replace"},
            "new_string": {"type": "string", "description": "Replacement text"},
        }

    def execute(self, path: str, old_string: str, new_string: str) -> ToolResult:
        try:
            with open(path, "r") as f:
                content = f.read()
            count = content.count(old_string)
            if count == 0:
                return ToolResult(success=False, output="", error="old_string not found in file")
            if count > 1:
                return ToolResult(
                    success=False,
                    output="",
                    error=f"old_string found {count} times. Provide more context to make it unique.",
                )
            new_content = content.replace(old_string, new_string, 1)
            with open(path, "w") as f:
                f.write(new_content)
            return ToolResult(success=True, output=f"Replaced 1 occurrence in {path}")
        except Exception as e:
            return ToolResult(success=False, output="", error=str(e))
```
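The heart of `edit_file`'s safety is the exact-match uniqueness rule: refuse ambiguous matches instead of guessing which occurrence to change. Isolated as a standalone helper (a sketch; `safe_replace` is not a toolkit API), the behavior is easy to unit-test:

```python
def safe_replace(content: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`, or fail loudly."""
    count = content.count(old)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1:
        # Forcing the caller to add surrounding context prevents the agent
        # from silently editing the wrong occurrence
        raise ValueError(f"old_string found {count} times; add surrounding context")
    return content.replace(old, new, 1)
```

The same rule is why agents are told to call read_file first: seeing real whitespace is the only way to produce an exact match.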

3. Permission and Safety System

Autonomous agents must have layered permissions. Low-risk operations (reading files, searching code) run automatically. Medium-risk operations (writing files) may require one-time approval per session. High-risk operations (running shell commands, deleting files) require explicit approval every time.

```python
# permissions/manager.py
from dataclasses import dataclass
from enum import Enum


class PermissionLevel(Enum):
    AUTO = "auto"          # No approval needed
    SESSION_ONCE = "once"  # Approve once per session
    EVERY_TIME = "every"   # Approve each invocation
    BLOCKED = "blocked"    # Never allowed


PERMISSION_DEFAULTS = {
    "read_file": PermissionLevel.AUTO,
    "list_directory": PermissionLevel.AUTO,
    "search_code": PermissionLevel.AUTO,
    "edit_file": PermissionLevel.SESSION_ONCE,
    "write_file": PermissionLevel.SESSION_ONCE,
    "run_command": PermissionLevel.EVERY_TIME,
    "delete_file": PermissionLevel.EVERY_TIME,
    "sudo_command": PermissionLevel.BLOCKED,
}


@dataclass
class ApprovalRequest:
    tool_name: str
    arguments: dict
    risk_assessment: str
    explanation: str


class PermissionManager:
    def __init__(self, config: dict = None, ui_callback=None):
        self.config = config or PERMISSION_DEFAULTS
        self.session_approvals: dict[str, bool] = {}
        self.ui_callback = ui_callback or self._default_prompt

    def check(self, tool_name: str, args: dict) -> bool:
        level = self.config.get(tool_name, PermissionLevel.EVERY_TIME)
        if level == PermissionLevel.AUTO:
            return True
        if level == PermissionLevel.BLOCKED:
            return False
        if level == PermissionLevel.SESSION_ONCE and tool_name in self.session_approvals:
            return self.session_approvals[tool_name]
        # Ask for approval
        risk = self._assess_risk(tool_name, args)
        request = ApprovalRequest(
            tool_name=tool_name,
            arguments=args,
            risk_assessment=risk,
            explanation=f"Agent wants to call {tool_name} with {args}",
        )
        approved = self.ui_callback(request)
        if level == PermissionLevel.SESSION_ONCE:
            self.session_approvals[tool_name] = approved
        return approved

    def _assess_risk(self, tool_name: str, args: dict) -> str:
        if tool_name == "run_command":
            cmd = args.get("command", "")
            dangerous = ["rm -rf", "sudo", "chmod 777", "mkfs", "> /dev/"]
            if any(d in cmd for d in dangerous):
                return "HIGH - destructive command detected"
            return "MEDIUM"
        return "LOW"  # Non-command tools default to low risk

    @staticmethod
    def _default_prompt(request: ApprovalRequest) -> bool:
        print("\n--- APPROVAL REQUIRED ---")
        print(f"Tool: {request.tool_name}")
        print(f"Args: {request.arguments}")
        print(f"Risk: {request.risk_assessment}")
        response = input("Approve? (y/n): ").strip().lower()
        return response == "y"
```

4. Sandboxed Execution

Agents that run shell commands must be sandboxed. A rogue command can delete files, exfiltrate data, or crash systems. Sandbox at the filesystem, network, and process level.

```python
# permissions/sandbox.py
import os
import shlex
import subprocess

from tools.base import ToolResult  # defined in tools/base.py above


class SandboxedExecutor:
    def __init__(self, workspace: str, timeout: int = 30):
        self.workspace = os.path.realpath(workspace)
        self.timeout = timeout
        self.allowed_commands = {
            "node", "npm", "npx", "python", "python3", "pip", "git",
            "ls", "cat", "grep", "find", "head", "tail", "wc", "sort",
            "uniq", "diff", "mkdir", "cp", "mv", "tsc", "eslint",
            "prettier", "jest", "pytest",
        }

    def validate(self, command: str) -> tuple[bool, str]:
        parts = shlex.split(command)
        if not parts:
            return False, "Empty command"
        base = os.path.basename(parts[0])
        if base not in self.allowed_commands:
            return False, f"Command '{base}' is not in the allowlist"
        # Check for shell injection patterns
        if any(c in command for c in [";", "&&", "||", "|", "`", "$("]):
            return False, "Shell operators are not allowed in sandboxed mode"
        return True, "OK"

    def execute(self, command: str) -> ToolResult:
        valid, reason = self.validate(command)
        if not valid:
            return ToolResult(success=False, output="", error=f"Blocked: {reason}")
        try:
            result = subprocess.run(
                shlex.split(command),
                cwd=self.workspace,
                capture_output=True,
                timeout=self.timeout,
                text=True,
                env={**os.environ, "HOME": self.workspace},
            )
            return ToolResult(
                success=result.returncode == 0,
                output=result.stdout[:10000],  # Truncate large outputs
                error=result.stderr[:5000] if result.returncode != 0 else None,
            )
        except subprocess.TimeoutExpired:
            return ToolResult(success=False, output="",
                              error=f"Command timed out after {self.timeout}s")
```
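The allowlist check can be exercised without running anything. Below is a trimmed-down sketch of the validation step (the `ALLOWED` set is illustrative), checking shell operators before the allowlist so a benign base command cannot chain into an arbitrary one, as in `ls && rm -rf /`:

```python
import os
import shlex

ALLOWED = {"ls", "cat", "git", "python3"}  # illustrative, not the toolkit defaults


def validate(command: str) -> tuple[bool, str]:
    # Operators first: they let an allowed command chain into a forbidden one
    if any(op in command for op in (";", "&&", "||", "|", "`", "$(")):
        return False, "shell operators not allowed"
    parts = shlex.split(command)
    if not parts:
        return False, "empty command"
    base = os.path.basename(parts[0])  # normalize "/usr/bin/ls" -> "ls"
    if base not in ALLOWED:
        return False, f"'{base}' not in allowlist"
    return True, "OK"
```

Because the command is later passed to `subprocess.run` as an argument list (never through a shell), the operator check is defense in depth rather than the only barrier.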

5. Context Management

Autonomous agents often hit context window limits during long tasks. A context manager tracks what information the agent has seen, compresses old context, and injects relevant files on demand.

```python
# context/manager.py
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str      # "file", "url", "command_output", "user_message"
    identifier: str  # file path, URL, command string
    content: str
    token_estimate: int
    priority: int = 5  # 1=highest, 10=lowest


class ContextManager:
    def __init__(self, max_tokens: int = 150_000):
        self.max_tokens = max_tokens
        self.items: list[ContextItem] = []

    def add(self, item: ContextItem) -> None:
        self.items.append(item)
        self._enforce_budget()

    def _enforce_budget(self):
        total = sum(i.token_estimate for i in self.items)
        if total <= self.max_tokens:
            return
        # Sort by priority (lower-priority items removed first)
        self.items.sort(key=lambda x: x.priority)
        while total > self.max_tokens and self.items:
            removed = self.items.pop()
            total -= removed.token_estimate

    def get_context_string(self) -> str:
        parts = []
        for item in sorted(self.items, key=lambda x: x.priority):
            if item.source == "file":
                parts.append(f"## File: {item.identifier}\n```\n{item.content}\n```")
            elif item.source == "command_output":
                parts.append(f"## Command: {item.identifier}\n```\n{item.content}\n```")
            else:
                parts.append(f"## {item.source}: {item.identifier}\n{item.content}")
        return "\n\n".join(parts)
```
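`ContextItem.token_estimate` needs a value before the exact token count is known. A common rough heuristic is about four characters per token for English text; real counts require the model's tokenizer, so treat this as a budgeting approximation, not an exact measure:

```python
def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a coarse heuristic for English text.
    # Floor at 1 so even an empty item consumes some budget.
    return max(1, len(text) // 4)
```

Overestimating slightly is safer than underestimating: the budget exists to keep the real prompt under the model's limit.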

6. Checkpoint and Resume

Long-running agent tasks need checkpoint and resume capability so progress is not lost on failures, timeouts, or context window exhaustion.

```python
# context/checkpoint.py
import json
import os
from datetime import datetime


class CheckpointManager:
    def __init__(self, storage_dir: str = ".agent_checkpoints"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)

    def save(self, session_id: str, state: dict) -> str:
        checkpoint = {
            "timestamp": datetime.now().isoformat(),
            "session_id": session_id,
            "messages": state["messages"],
            "completed_steps": state.get("completed_steps", []),
            "pending_steps": state.get("pending_steps", []),
            "files_modified": state.get("files_modified", []),
        }
        path = os.path.join(self.storage_dir, f"{session_id}.json")
        with open(path, "w") as f:
            json.dump(checkpoint, f, indent=2)
        return path

    def restore(self, session_id: str) -> dict | None:
        path = os.path.join(self.storage_dir, f"{session_id}.json")
        if not os.path.exists(path):
            return None
        with open(path, "r") as f:
            return json.load(f)

    def create_resume_prompt(self, checkpoint: dict) -> str:
        completed = "\n".join(f"  - {s}" for s in checkpoint["completed_steps"])
        pending = "\n".join(f"  - {s}" for s in checkpoint["pending_steps"])
        modified = "\n".join(f"  - {f}" for f in checkpoint["files_modified"])
        return f"""Resuming from checkpoint.

Completed steps:
{completed}

Remaining steps:
{pending}

Files already modified:
{modified}

Continue from where you left off. Do not redo completed steps."""
```
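A quick round-trip shows the state shape `save()` expects and `restore()` returns (the session values below are invented for illustration):

```python
import json
import os
import tempfile
from datetime import datetime

# Hypothetical session state in the shape CheckpointManager.save expects
state = {
    "messages": [{"role": "user", "content": "Refactor utils.py"}],
    "completed_steps": ["read utils.py", "add type hints"],
    "pending_steps": ["run tests"],
    "files_modified": ["utils.py"],
}

# Round-trip through disk, as save()/restore() do internally
path = os.path.join(tempfile.mkdtemp(), "session-42.json")
with open(path, "w") as f:
    json.dump({"timestamp": datetime.now().isoformat(), **state}, f, indent=2)
with open(path) as f:
    restored = json.load(f)
```

Keeping checkpoints as plain JSON means a crashed session can be inspected or hand-edited before resuming.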

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_iterations | int | 50 | Maximum think-act-observe cycles before forced stop |
| token_budget | int | 100000 | Total tokens allowed across all iterations |
| retry_on_error | bool | true | Whether to retry failed tool calls |
| max_retries_per_tool | int | 3 | Retries per tool before giving up |
| sandbox_timeout | int | 30 | Seconds before killing a sandboxed command |
| permission_mode | string | "interactive" | `interactive` prompts the user, `strict` blocks all non-AUTO operations, `permissive` auto-approves everything |
| checkpoint_interval | int | 10 | Save a checkpoint every N iterations |
| context_max_tokens | int | 150000 | Maximum tokens in the context window budget |
| allowed_commands | string[] | See sandbox defaults | Commands the sandbox permits |
| model_routing | object | See defaults | Maps TaskComplexity levels to model IDs |

Best Practices

  1. Start with a narrow tool set and expand only when needed. Agents with 5-8 well-designed tools outperform agents with 30 vague tools. Each additional tool increases the chance of the LLM selecting the wrong one.

  2. Make tool descriptions unambiguous and include examples. The LLM chooses tools based on descriptions, not implementations. Include 1-2 usage examples directly in the description string.

  3. Return structured errors from tools, not empty strings. When a tool fails, tell the agent exactly what went wrong so it can reason about recovery. "File not found: /src/utils.ts" is far better than returning an empty string.

  4. Implement permission escalation, not permission bypass. If an agent needs a blocked operation, it should ask the user, not silently skip the safety check. Build the escalation path into the approval UI.

  5. Checkpoint after every significant state change. File writes, command executions, and multi-step completions should all trigger a checkpoint save. This allows graceful recovery from crashes, timeouts, or context exhaustion.

  6. Cap context window usage at 70-80% of the model's maximum. Leave room for the agent's reasoning, tool schemas, and the current response. Running at 100% capacity causes truncation and erratic behavior.

  7. Log every tool call with inputs, outputs, duration, and approval status. This audit trail is essential for debugging agent behavior, reproducing issues, and understanding why the agent took a particular path.

  8. Use the right model for the right step. Planning and reasoning benefit from frontier models. Simple file reads and searches can use fast, cheap models. Multi-model routing reduces cost by 40-60% without sacrificing quality.

  9. Set hard limits on iterations, tokens, and wall-clock time. Without limits, a confused agent will loop indefinitely. Prefer failing loudly over running forever.

  10. Test agents on recorded scenarios (VCR-style). Record tool inputs/outputs from successful runs and replay them in tests. This makes agent behavior deterministic and testable without real file system or API access.
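Practice 10 can be sketched as a minimal cassette that records tool outputs keyed by tool name and canonicalized arguments, then replays them deterministically in tests (`Cassette` is illustrative, not a real library):

```python
import json


class Cassette:
    """Record tool outputs once, replay them deterministically in tests."""

    def __init__(self):
        self.recordings: dict[str, str] = {}

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # sort_keys makes the key stable regardless of argument ordering
        return tool + ":" + json.dumps(args, sort_keys=True)

    def record(self, tool: str, args: dict, output: str) -> None:
        self.recordings[self._key(tool, args)] = output

    def replay(self, tool: str, args: dict) -> str:
        return self.recordings[self._key(tool, args)]
```

In replay mode the cassette stands in for the real tool handlers, so agent tests need neither a file system nor API access.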

Troubleshooting

Problem: Agent loops on the same tool call repeatedly. Solution: Track consecutive identical tool calls and inject a nudge message after 2-3 repetitions ("You have called read_file on the same path 3 times. The content has not changed. Consider a different approach."). Also check if your tool is returning unclear errors that the agent cannot reason about.

Problem: Agent calls tools with malformed arguments. Solution: Improve tool descriptions with explicit type annotations and examples. Add input validation in the tool's execute method and return a clear error message explaining the expected format. Consider using a simpler model for the planning step and a stronger model for execution.

Problem: Agent exceeds context window and starts producing garbage. Solution: Implement context compression. Summarize older conversation turns, remove duplicate file reads, and limit tool output length. Set a context_max_tokens budget and enforce it in the context manager.
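For limiting tool output length, head-plus-tail truncation tends to work better than a plain prefix cut, since errors often appear at the end of command output. A minimal sketch:

```python
def truncate_output(text: str, max_chars: int = 4000) -> str:
    """Keep the start and end of long output, eliding the middle."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return text[:half] + f"\n... [{omitted} chars omitted] ...\n" + text[-half:]
```

The explicit "[N chars omitted]" marker tells the agent that information was dropped, so it can re-run the command with narrower scope if it needs the middle.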

Problem: Commands that work in a normal shell fail in the sandbox. Solution: The sandbox restricts PATH, the HOME directory, and available environment variables. Check that the command's dependencies are accessible within the sandbox. Also verify whether shell operators (pipes, redirects) are being blocked by the validator.

Problem: Agent asks for approval on every single file edit. Solution: Switch the file editing permission to SESSION_ONCE mode, which asks once and remembers for the rest of the session. Or configure a workspace-scoped auto-approve that permits all operations within a specific directory.
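A workspace-scoped auto-approve reduces to a path-containment check. A sketch (assuming POSIX-style paths; `realpath` defeats `..` and symlink escapes before comparing prefixes):

```python
import os


def in_workspace(path: str, workspace: str) -> bool:
    """True if `path` resolves to a location inside `workspace`."""
    real = os.path.realpath(path)
    root = os.path.realpath(workspace)
    # Compare with a trailing separator so "/tmp/ws" does not match "/tmp/wsx"
    return real == root or real.startswith(root + os.sep)
```

Edits passing this check can be auto-approved; anything outside the workspace falls back to the normal prompt.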
