
Autonomous Agent Patterns Toolkit

Boost productivity with design patterns for building autonomous agents. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · ai research · v1.0.0 · MIT


Overview

The Autonomous Agent Patterns Toolkit provides a comprehensive collection of design patterns for building self-directed coding agents. These agents operate through iterative reasoning loops -- observing their environment, deciding on actions, executing tools, and learning from results -- all without constant human supervision. Inspired by production systems like Claude Code, Cline, and OpenAI Codex, this toolkit codifies the architecture, tool design, permission systems, browser automation, and context management patterns that make autonomous agents reliable and safe.

The difference between a demo agent and a production agent lies entirely in patterns: how you handle errors, enforce permissions, manage context windows, sandbox execution, and recover from failures. This toolkit covers all of those areas with implementation-ready code.

When to Use

  • Building autonomous coding agents that read, write, and test code without step-by-step human guidance
  • Designing tool/function calling interfaces that LLMs can use reliably
  • Implementing permission and approval systems for risky operations (file writes, shell commands, deployments)
  • Creating browser automation for agents that need to interact with web interfaces
  • Designing human-in-the-loop workflows where agents operate independently but escalate when uncertain
  • Building MCP (Model Context Protocol) servers to extend agent capabilities dynamically
  • Implementing checkpoint and resume functionality for long-running agent tasks

Quick Start

```bash
# Set up a minimal agent project
mkdir -p agent-toolkit/{core,tools,permissions,browser,context}
cd agent-toolkit

# Python setup
python -m venv .venv && source .venv/bin/activate
pip install anthropic openai pydantic playwright

# Or TypeScript
npm init -y
npm install @anthropic-ai/sdk openai zod playwright
```
```python
# core/agent.py - Minimal agent loop
from anthropic import Anthropic


class MinimalAgent:
    """Think-Decide-Act-Observe loop in ~40 lines."""

    def __init__(self, client: Anthropic, tools: list, max_steps: int = 50):
        self.client = client
        self.tools = {t["name"]: t for t in tools}
        self.tool_handlers = {}
        self.max_steps = max_steps
        self.messages = []

    def register_handler(self, name: str, fn):
        self.tool_handlers[name] = fn

    def run(self, task: str) -> str:
        self.messages = [{"role": "user", "content": task}]
        for step in range(self.max_steps):
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=list(self.tools.values()),
                messages=self.messages,
            )
            # Collect assistant content
            self.messages.append({"role": "assistant", "content": response.content})

            # If no tool use, agent is done
            tool_uses = [b for b in response.content if b.type == "tool_use"]
            if not tool_uses:
                text_blocks = [b.text for b in response.content if hasattr(b, "text")]
                return "\n".join(text_blocks)

            # Execute each tool call
            results = []
            for tool_use in tool_uses:
                handler = self.tool_handlers.get(tool_use.name)
                try:
                    output = handler(**tool_use.input) if handler else "Unknown tool"
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": str(output),
                    })
                except Exception as e:
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": f"Error: {e}",
                        "is_error": True,
                    })
            self.messages.append({"role": "user", "content": results})
        return "Max steps reached without completion"
```
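To wire the loop to a real tool, pair a JSON Schema entry with a handler registered under the same name. The sketch below is illustrative (a hypothetical `list_dir` tool, not part of the toolkit); the commented lines show how it would plug into `MinimalAgent`:

```python
import os

# Schema: this is all the model ever sees about the tool
list_dir_tool = {
    "name": "list_dir",
    "description": "List the entries of a directory at the given absolute path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute directory path"},
        },
        "required": ["path"],
    },
}


def list_dir(path: str) -> str:
    # Handler: the code that actually runs; sorted for deterministic output
    return "\n".join(sorted(os.listdir(path)))


# agent = MinimalAgent(client=Anthropic(), tools=[list_dir_tool])
# agent.register_handler("list_dir", list_dir)
# print(agent.run("What files are in the project directory?"))
```

Keeping the schema and the handler side by side makes it obvious when the two drift apart.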

Core Concepts

1. The Agent Loop (Think-Decide-Act-Observe)

Every autonomous agent follows the same fundamental cycle. The agent receives a task, reasons about it, decides which tool to call, executes the tool, observes the result, and loops until the task is complete or a limit is hit.

  +----------+     +----------+     +---------+
  |  THINK   |---->|  DECIDE  |---->|   ACT   |
  | (Reason) |     |  (Plan)  |     |(Execute)|
  +----------+     +----------+     +---------+
       ^                                 |
       |           +----------+          |
       +-----------| OBSERVE  |<---------+
                   | (Result) |
                   +----------+

The key design decisions within the loop:

```python
# core/loop.py - Production-grade agent loop with multi-model support
from dataclasses import dataclass, field
from enum import Enum


class TaskComplexity(Enum):
    SIMPLE = "simple"      # Classification, extraction, short Q&A
    MODERATE = "moderate"  # Code edits, multi-step reasoning
    COMPLEX = "complex"    # Architecture, debugging, multi-file refactors


@dataclass
class AgentConfig:
    max_iterations: int = 50
    models: dict = field(default_factory=lambda: {
        TaskComplexity.SIMPLE: "claude-haiku-4",
        TaskComplexity.MODERATE: "claude-sonnet-4-20250514",
        TaskComplexity.COMPLEX: "claude-sonnet-4-20250514",
    })
    token_budget: int = 100_000
    retry_on_error: bool = True
    max_retries_per_tool: int = 3


class ProductionAgentLoop:
    def __init__(self, config: AgentConfig):
        self.config = config
        self.total_tokens = 0
        self.tool_error_counts: dict[str, int] = {}

    def select_model(self, complexity: TaskComplexity) -> str:
        return self.config.models[complexity]

    def should_continue(self, iteration: int) -> bool:
        if iteration >= self.config.max_iterations:
            return False
        if self.total_tokens >= self.config.token_budget:
            return False
        return True

    def handle_tool_error(self, tool_name: str, error: Exception) -> str:
        count = self.tool_error_counts.get(tool_name, 0) + 1
        self.tool_error_counts[tool_name] = count
        if count >= self.config.max_retries_per_tool:
            return f"Tool '{tool_name}' failed {count} times. Skipping. Last error: {error}"
        return f"Error (attempt {count}/{self.config.max_retries_per_tool}): {error}"
```

2. Tool Design Patterns

The quality of your tool schemas directly determines agent reliability. The LLM never sees your implementation -- it only sees the JSON Schema descriptions. A perfectly coded tool with a vague description will fail; a simple tool with crystal-clear documentation will succeed.

```python
# tools/base.py - Tool design with validation
from abc import ABC, abstractmethod
from typing import Optional

from pydantic import BaseModel


class ToolResult(BaseModel):
    success: bool
    output: str
    error: Optional[str] = None
    metadata: dict = {}


class BaseTool(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @property
    @abstractmethod
    def parameters(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> ToolResult: ...

    def to_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {
                "type": "object",
                "properties": self.parameters,
                "required": self._required_params(),
            },
        }

    def _required_params(self) -> list[str]:
        # Convention: parameters that declare a "default" are optional
        return [k for k, v in self.parameters.items() if "default" not in v]


class ReadFileTool(BaseTool):
    name = "read_file"
    description = (
        "Read the contents of a file at the given absolute path. "
        "Returns the file content as a string with line numbers. "
        "Use this before editing to understand current file state."
    )

    @property
    def parameters(self):
        return {
            "path": {
                "type": "string",
                "description": "Absolute path to the file to read",
            },
            "start_line": {
                "type": "integer",
                "description": "First line to read (1-indexed). Omit to start from beginning.",
                "default": None,
            },
            "end_line": {
                "type": "integer",
                "description": "Last line to read (inclusive). Omit to read to end of file.",
                "default": None,
            },
        }

    def execute(self, path: str, start_line: Optional[int] = None,
                end_line: Optional[int] = None) -> ToolResult:
        try:
            with open(path, "r") as f:
                lines = f.readlines()
            if start_line or end_line:
                s = (start_line or 1) - 1
                e = end_line or len(lines)
                lines = lines[s:e]
            # Number lines from start_line, not from 1, so slices stay accurate
            offset = (start_line or 1)
            numbered = [f"{offset + i}\t{line}" for i, line in enumerate(lines)]
            return ToolResult(success=True, output="".join(numbered))
        except FileNotFoundError:
            return ToolResult(success=False, output="", error=f"File not found: {path}")
        except PermissionError:
            return ToolResult(success=False, output="", error=f"Permission denied: {path}")


class EditFileTool(BaseTool):
    name = "edit_file"
    description = (
        "Edit a file by replacing an exact string match with new content. "
        "The old_string must match exactly (including whitespace and indentation). "
        "Use read_file first to see the current content."
    )

    @property
    def parameters(self):
        return {
            "path": {"type": "string", "description": "Absolute path to the file"},
            "old_string": {"type": "string", "description": "Exact text to find and replace"},
            "new_string": {"type": "string", "description": "Replacement text"},
        }

    def execute(self, path: str, old_string: str, new_string: str) -> ToolResult:
        try:
            with open(path, "r") as f:
                content = f.read()
            count = content.count(old_string)
            if count == 0:
                return ToolResult(success=False, output="", error="old_string not found in file")
            if count > 1:
                return ToolResult(
                    success=False,
                    output="",
                    error=f"old_string found {count} times. Provide more context to make it unique.",
                )
            new_content = content.replace(old_string, new_string, 1)
            with open(path, "w") as f:
                f.write(new_content)
            return ToolResult(success=True, output=f"Replaced 1 occurrence in {path}")
        except Exception as e:
            return ToolResult(success=False, output="", error=str(e))
```
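The heart of `edit_file`'s safety is the exact-match uniqueness rule: refuse ambiguous matches instead of guessing which occurrence to change. Isolated as a standalone helper (a sketch; `safe_replace` is not a toolkit API), the behavior is easy to unit-test:

```python
def safe_replace(content: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`, or fail loudly."""
    count = content.count(old)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1:
        # Forcing the caller to add surrounding context prevents the agent
        # from silently editing the wrong occurrence
        raise ValueError(f"old_string found {count} times; add surrounding context")
    return content.replace(old, new, 1)
```

The same rule is why agents are told to call read_file first: seeing real whitespace is the only way to produce an exact match.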

3. Permission and Safety System

Autonomous agents must have layered permissions. Low-risk operations (reading files, searching code) run automatically. Medium-risk operations (writing files) may require one-time approval per session. High-risk operations (running shell commands, deleting files) require explicit approval every time.

```python
# permissions/manager.py
from dataclasses import dataclass
from enum import Enum


class PermissionLevel(Enum):
    AUTO = "auto"          # No approval needed
    SESSION_ONCE = "once"  # Approve once per session
    EVERY_TIME = "every"   # Approve each invocation
    BLOCKED = "blocked"    # Never allowed


PERMISSION_DEFAULTS = {
    "read_file": PermissionLevel.AUTO,
    "list_directory": PermissionLevel.AUTO,
    "search_code": PermissionLevel.AUTO,
    "edit_file": PermissionLevel.SESSION_ONCE,
    "write_file": PermissionLevel.SESSION_ONCE,
    "run_command": PermissionLevel.EVERY_TIME,
    "delete_file": PermissionLevel.EVERY_TIME,
    "sudo_command": PermissionLevel.BLOCKED,
}


@dataclass
class ApprovalRequest:
    tool_name: str
    arguments: dict
    risk_assessment: str
    explanation: str


class PermissionManager:
    def __init__(self, config: dict = None, ui_callback=None):
        self.config = config or PERMISSION_DEFAULTS
        self.session_approvals: dict[str, bool] = {}
        self.ui_callback = ui_callback or self._default_prompt

    def check(self, tool_name: str, args: dict) -> bool:
        level = self.config.get(tool_name, PermissionLevel.EVERY_TIME)
        if level == PermissionLevel.AUTO:
            return True
        if level == PermissionLevel.BLOCKED:
            return False
        if level == PermissionLevel.SESSION_ONCE and tool_name in self.session_approvals:
            return self.session_approvals[tool_name]
        # Ask for approval
        risk = self._assess_risk(tool_name, args)
        request = ApprovalRequest(
            tool_name=tool_name,
            arguments=args,
            risk_assessment=risk,
            explanation=f"Agent wants to call {tool_name} with {args}",
        )
        approved = self.ui_callback(request)
        if level == PermissionLevel.SESSION_ONCE:
            self.session_approvals[tool_name] = approved
        return approved

    def _assess_risk(self, tool_name: str, args: dict) -> str:
        if tool_name == "run_command":
            cmd = args.get("command", "")
            dangerous = ["rm -rf", "sudo", "chmod 777", "mkfs", "> /dev/"]
            if any(d in cmd for d in dangerous):
                return "HIGH - destructive command detected"
            return "MEDIUM"
        return "LOW"  # Non-command tools default to low risk

    @staticmethod
    def _default_prompt(request: ApprovalRequest) -> bool:
        print("\n--- APPROVAL REQUIRED ---")
        print(f"Tool: {request.tool_name}")
        print(f"Args: {request.arguments}")
        print(f"Risk: {request.risk_assessment}")
        response = input("Approve? (y/n): ").strip().lower()
        return response == "y"
```

4. Sandboxed Execution

Agents that run shell commands must be sandboxed. A rogue command can delete files, exfiltrate data, or crash systems. Sandbox at the filesystem, network, and process level.

```python
# permissions/sandbox.py
import os
import shlex
import subprocess

from tools.base import ToolResult  # defined in tools/base.py above


class SandboxedExecutor:
    def __init__(self, workspace: str, timeout: int = 30):
        self.workspace = os.path.realpath(workspace)
        self.timeout = timeout
        self.allowed_commands = {
            "node", "npm", "npx", "python", "python3", "pip", "git",
            "ls", "cat", "grep", "find", "head", "tail", "wc", "sort",
            "uniq", "diff", "mkdir", "cp", "mv", "tsc", "eslint",
            "prettier", "jest", "pytest",
        }

    def validate(self, command: str) -> tuple[bool, str]:
        parts = shlex.split(command)
        if not parts:
            return False, "Empty command"
        base = os.path.basename(parts[0])
        if base not in self.allowed_commands:
            return False, f"Command '{base}' is not in the allowlist"
        # Check for shell injection patterns
        if any(c in command for c in [";", "&&", "||", "|", "`", "$("]):
            return False, "Shell operators are not allowed in sandboxed mode"
        return True, "OK"

    def execute(self, command: str) -> ToolResult:
        valid, reason = self.validate(command)
        if not valid:
            return ToolResult(success=False, output="", error=f"Blocked: {reason}")
        try:
            result = subprocess.run(
                shlex.split(command),
                cwd=self.workspace,
                capture_output=True,
                timeout=self.timeout,
                text=True,
                env={**os.environ, "HOME": self.workspace},
            )
            return ToolResult(
                success=result.returncode == 0,
                output=result.stdout[:10000],  # Truncate large outputs
                error=result.stderr[:5000] if result.returncode != 0 else None,
            )
        except subprocess.TimeoutExpired:
            return ToolResult(success=False, output="",
                              error=f"Command timed out after {self.timeout}s")
```
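The allowlist check can be exercised without running anything. Below is a trimmed-down sketch of the validation step (the `ALLOWED` set is illustrative), checking shell operators before the allowlist so a benign base command cannot chain into an arbitrary one, as in `ls && rm -rf /`:

```python
import os
import shlex

ALLOWED = {"ls", "cat", "git", "python3"}  # illustrative, not the toolkit defaults


def validate(command: str) -> tuple[bool, str]:
    # Operators first: they let an allowed command chain into a forbidden one
    if any(op in command for op in (";", "&&", "||", "|", "`", "$(")):
        return False, "shell operators not allowed"
    parts = shlex.split(command)
    if not parts:
        return False, "empty command"
    base = os.path.basename(parts[0])  # normalize "/usr/bin/ls" -> "ls"
    if base not in ALLOWED:
        return False, f"'{base}' not in allowlist"
    return True, "OK"
```

Because the command is later passed to `subprocess.run` as an argument list (never through a shell), the operator check is defense in depth rather than the only barrier.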

5. Context Management

Autonomous agents often hit context window limits during long tasks. A context manager tracks what information the agent has seen, compresses old context, and injects relevant files on demand.

```python
# context/manager.py
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str      # "file", "url", "command_output", "user_message"
    identifier: str  # file path, URL, command string
    content: str
    token_estimate: int
    priority: int = 5  # 1=highest, 10=lowest


class ContextManager:
    def __init__(self, max_tokens: int = 150_000):
        self.max_tokens = max_tokens
        self.items: list[ContextItem] = []

    def add(self, item: ContextItem) -> None:
        self.items.append(item)
        self._enforce_budget()

    def _enforce_budget(self):
        total = sum(i.token_estimate for i in self.items)
        if total <= self.max_tokens:
            return
        # Sort by priority (lower-priority items removed first)
        self.items.sort(key=lambda x: x.priority)
        while total > self.max_tokens and self.items:
            removed = self.items.pop()
            total -= removed.token_estimate

    def get_context_string(self) -> str:
        parts = []
        for item in sorted(self.items, key=lambda x: x.priority):
            if item.source == "file":
                parts.append(f"## File: {item.identifier}\n```\n{item.content}\n```")
            elif item.source == "command_output":
                parts.append(f"## Command: {item.identifier}\n```\n{item.content}\n```")
            else:
                parts.append(f"## {item.source}: {item.identifier}\n{item.content}")
        return "\n\n".join(parts)
```
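`ContextItem.token_estimate` needs a value before the exact token count is known. A common rough heuristic is about four characters per token for English text; real counts require the model's tokenizer, so treat this as a budgeting approximation, not an exact measure:

```python
def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a coarse heuristic for English text.
    # Floor at 1 so even an empty item consumes some budget.
    return max(1, len(text) // 4)
```

Overestimating slightly is safer than underestimating: the budget exists to keep the real prompt under the model's limit.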

6. Checkpoint and Resume

Long-running agent tasks need checkpoint and resume capability so progress is not lost on failures, timeouts, or context window exhaustion.

```python
# context/checkpoint.py
import json
import os
from datetime import datetime


class CheckpointManager:
    def __init__(self, storage_dir: str = ".agent_checkpoints"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)

    def save(self, session_id: str, state: dict) -> str:
        checkpoint = {
            "timestamp": datetime.now().isoformat(),
            "session_id": session_id,
            "messages": state["messages"],
            "completed_steps": state.get("completed_steps", []),
            "pending_steps": state.get("pending_steps", []),
            "files_modified": state.get("files_modified", []),
        }
        path = os.path.join(self.storage_dir, f"{session_id}.json")
        with open(path, "w") as f:
            json.dump(checkpoint, f, indent=2)
        return path

    def restore(self, session_id: str) -> dict | None:
        path = os.path.join(self.storage_dir, f"{session_id}.json")
        if not os.path.exists(path):
            return None
        with open(path, "r") as f:
            return json.load(f)

    def create_resume_prompt(self, checkpoint: dict) -> str:
        completed = "\n".join(f"  - {s}" for s in checkpoint["completed_steps"])
        pending = "\n".join(f"  - {s}" for s in checkpoint["pending_steps"])
        modified = "\n".join(f"  - {f}" for f in checkpoint["files_modified"])
        return f"""Resuming from checkpoint.

Completed steps:
{completed}

Remaining steps:
{pending}

Files already modified:
{modified}

Continue from where you left off. Do not redo completed steps."""
```
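A quick round-trip shows the state shape `save()` expects and `restore()` returns (the session values below are invented for illustration):

```python
import json
import os
import tempfile
from datetime import datetime

# Hypothetical session state in the shape CheckpointManager.save expects
state = {
    "messages": [{"role": "user", "content": "Refactor utils.py"}],
    "completed_steps": ["read utils.py", "add type hints"],
    "pending_steps": ["run tests"],
    "files_modified": ["utils.py"],
}

# Round-trip through disk, as save()/restore() do internally
path = os.path.join(tempfile.mkdtemp(), "session-42.json")
with open(path, "w") as f:
    json.dump({"timestamp": datetime.now().isoformat(), **state}, f, indent=2)
with open(path) as f:
    restored = json.load(f)
```

Keeping checkpoints as plain JSON means a crashed session can be inspected or hand-edited before resuming.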

Configuration Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_iterations | int | 50 | Maximum think-act-observe cycles before forced stop |
| token_budget | int | 100000 | Total tokens allowed across all iterations |
| retry_on_error | bool | true | Whether to retry failed tool calls |
| max_retries_per_tool | int | 3 | Retries per tool before giving up |
| sandbox_timeout | int | 30 | Seconds before killing a sandboxed command |
| permission_mode | string | "interactive" | `interactive` prompts the user, `strict` blocks all non-AUTO operations, `permissive` auto-approves everything |
| checkpoint_interval | int | 10 | Save a checkpoint every N iterations |
| context_max_tokens | int | 150000 | Maximum tokens in the context window budget |
| allowed_commands | string[] | See sandbox defaults | Commands the sandbox permits |
| model_routing | object | See defaults | Maps TaskComplexity levels to model IDs |

Best Practices

  1. Start with a narrow tool set and expand only when needed. Agents with 5-8 well-designed tools outperform agents with 30 vague tools. Each additional tool increases the chance of the LLM selecting the wrong one.

  2. Make tool descriptions unambiguous and include examples. The LLM chooses tools based on descriptions, not implementations. Include 1-2 usage examples directly in the description string.

  3. Return structured errors from tools, not empty strings. When a tool fails, tell the agent exactly what went wrong so it can reason about recovery. "File not found: /src/utils.ts" is far better than returning an empty string.

  4. Implement permission escalation, not permission bypass. If an agent needs a blocked operation, it should ask the user, not silently skip the safety check. Build the escalation path into the approval UI.

  5. Checkpoint after every significant state change. File writes, command executions, and multi-step completions should all trigger a checkpoint save. This allows graceful recovery from crashes, timeouts, or context exhaustion.

  6. Cap context window usage at 70-80% of the model's maximum. Leave room for the agent's reasoning, tool schemas, and the current response. Running at 100% capacity causes truncation and erratic behavior.

  7. Log every tool call with inputs, outputs, duration, and approval status. This audit trail is essential for debugging agent behavior, reproducing issues, and understanding why the agent took a particular path.

  8. Use the right model for the right step. Planning and reasoning benefit from frontier models. Simple file reads and searches can use fast, cheap models. Multi-model routing reduces cost by 40-60% without sacrificing quality.

  9. Set hard limits on iterations, tokens, and wall-clock time. Without limits, a confused agent will loop indefinitely. Prefer failing loudly over running forever.

  10. Test agents on recorded scenarios (VCR-style). Record tool inputs/outputs from successful runs and replay them in tests. This makes agent behavior deterministic and testable without real file system or API access.
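Practice 10 can be sketched as a minimal cassette that records tool outputs keyed by tool name and canonicalized arguments, then replays them deterministically in tests (`Cassette` is illustrative, not a real library):

```python
import json


class Cassette:
    """Record tool outputs once, replay them deterministically in tests."""

    def __init__(self):
        self.recordings: dict[str, str] = {}

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # sort_keys makes the key stable regardless of argument ordering
        return tool + ":" + json.dumps(args, sort_keys=True)

    def record(self, tool: str, args: dict, output: str) -> None:
        self.recordings[self._key(tool, args)] = output

    def replay(self, tool: str, args: dict) -> str:
        return self.recordings[self._key(tool, args)]
```

In replay mode the cassette stands in for the real tool handlers, so agent tests need neither a file system nor API access.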

Troubleshooting

Problem: Agent loops on the same tool call repeatedly. Solution: Track consecutive identical tool calls and inject a nudge message after 2-3 repetitions ("You have called read_file on the same path 3 times. The content has not changed. Consider a different approach."). Also check if your tool is returning unclear errors that the agent cannot reason about.

Problem: Agent calls tools with malformed arguments. Solution: Improve tool descriptions with explicit type annotations and examples. Add input validation in the tool's execute method and return a clear error message explaining the expected format. Consider using a simpler model for the planning step and a stronger model for execution.

Problem: Agent exceeds context window and starts producing garbage. Solution: Implement context compression. Summarize older conversation turns, remove duplicate file reads, and limit tool output length. Set a context_max_tokens budget and enforce it in the context manager.
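For limiting tool output length, head-plus-tail truncation tends to work better than a plain prefix cut, since errors often appear at the end of command output. A minimal sketch:

```python
def truncate_output(text: str, max_chars: int = 4000) -> str:
    """Keep the start and end of long output, eliding the middle."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return text[:half] + f"\n... [{omitted} chars omitted] ...\n" + text[-half:]
```

The explicit "[N chars omitted]" marker tells the agent that information was dropped, so it can re-run the command with narrower scope if it needs the middle.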

Problem: Commands that work in a normal shell fail in the sandbox. Solution: The sandbox restricts PATH, the HOME directory, and available environment variables. Check that the command's dependencies are accessible within the sandbox. Also verify whether shell operators (pipes, redirects) are being blocked by the validator.

Problem: Agent asks for approval on every single file edit. Solution: Switch the file editing permission to SESSION_ONCE mode, which asks once and remembers for the rest of the session. Or configure a workspace-scoped auto-approve that permits all operations within a specific directory.
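A workspace-scoped auto-approve reduces to a path-containment check. A sketch (assuming POSIX-style paths; `realpath` defeats `..` and symlink escapes before comparing prefixes):

```python
import os


def in_workspace(path: str, workspace: str) -> bool:
    """True if `path` resolves to a location inside `workspace`."""
    real = os.path.realpath(path)
    root = os.path.realpath(workspace)
    # Compare with a trailing separator so "/tmp/ws" does not match "/tmp/wsx"
    return real == root or real.startswith(root + os.sep)
```

Edits passing this check can be auto-approved; anything outside the workspace falls back to the normal prompt.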
