Gepetto Reverse Engineering Assistant

Overview

A skill for AI-assisted reverse engineering with Gepetto, a plugin that integrates LLMs into IDA Pro and Ghidra. Gepetto sends decompiled code to GPT-4, Claude, or a local model, which can explain functions, suggest variable names, flag likely vulnerabilities, and add contextual notes on disassembled code. This shortens the manual read-and-annotate loop of binary analysis, although the model's output still needs to be checked by the analyst.

When to Use

Analyzing decompiled C/C++ code from binaries
Explaining functions with AI assistance in IDA Pro or Ghidra
Identifying vulnerabilities in compiled binaries
Renaming obfuscated variables and functions
Understanding malware behavior
Solving CTF challenges that involve binary exploitation
Auditing the security of closed-source software

Quick Start


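# IDA Pro plugin: copy the script and its gepetto/ package directory
git clone https://github.com/JusticeRage/Gepetto
cp -r Gepetto/gepetto.py Gepetto/gepetto /path/to/ida/plugins/
# Install dependencies with the Python interpreter that IDA uses
pip install -r Gepetto/requirements.txt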

# Ghidra extension
# Install via Ghidra Extension Manager

# Configure API key
# In IDA: Edit → Plugin Options → Gepetto
# Set OPENAI_API_KEY or ANTHROPIC_API_KEY


# Or use as a standalone library
from gepetto import analyze_function

result = analyze_function("""
int __fastcall sub_140001000(__int64 a1, unsigned int a2) {
    char v3[256];
    memcpy(v3, (const void *)(a1 + 8), a2);
    if (a2 > 0x100)
        return -1;
    return process_buffer(v3, a2);
}
""")

print(result.explanation)
print(result.renamed_variables)
print(result.vulnerabilities)

Core Features

Function Analysis

Right-click a function in IDA → Gepetto → Explain Function

Output:
┌─────────────────────────────────────────────────────┐
│ Function: sub_140001000                              │
│                                                     │
│ Purpose: Copies user-provided data into a local     │
│ buffer and processes it.                             │
│                                                     │
│ Parameters:                                          │
│ - a1 (struct*): Pointer to data structure with      │
│   buffer at offset +8                                │
│ - a2 (size_t): Size of data to copy                 │
│                                                     │
│ Vulnerability: Buffer overflow — memcpy copies a2   │
│ bytes into v3[256] BEFORE checking if a2 > 0x100.   │
│ An attacker can overflow the stack buffer.           │
│                                                     │
│ Suggested names:                                     │
│ - sub_140001000 → copy_and_process_data             │
│ - a1 → input_struct                                  │
│ - a2 → data_size                                     │
│ - v3 → local_buffer                                  │
└─────────────────────────────────────────────────────┘
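To make the structure of the report above more concrete, here is a minimal sketch of how a request for such an explanation might be assembled. The function name `build_explain_prompt`, its parameters, and the prompt wording are assumptions made for illustration; they are not taken from Gepetto's source.

```python
# Hypothetical helper: not part of Gepetto's code base.
REPORT_SECTIONS = ["Purpose", "Parameters", "Vulnerability", "Suggested names"]


def build_explain_prompt(decompiled_code, target_hint=None):
    """Assemble a plain-text prompt asking for a sectioned explanation."""
    lines = ["You are assisting with reverse engineering."]
    if target_hint:
        # Optional note on architecture, OS, or compiler.
        lines.append(f"Target: {target_hint}")
    lines.append(
        "Explain the decompiled function below. Answer under these headings: "
        + ", ".join(REPORT_SECTIONS)
        + "."
    )
    lines.append("")
    lines.append(decompiled_code.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "int sub_140001000(__int64 a1, unsigned int a2) { return 0; }"
    print(build_explain_prompt(sample, "x86-64 Windows PE, MSVC"))
```

Asking for fixed headings makes the reply easier to parse into the purpose, parameter, vulnerability, and naming fields shown in the report.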

Variable Renaming


// Before Gepetto analysis
int __fastcall sub_4012B0(int a1, int a2, int a3) {
    int v4 = *(DWORD *)(a1 + 16);
    void *v5 = malloc(v4);
    if (a3 & 1)
        decrypt_xor(v5, *(BYTE **)(a1 + 8), v4, a2);
    return send_data(v5, v4);
}

// After Gepetto analysis
int __fastcall send_encrypted_payload(
    NetworkPacket *packet,
    int encryption_key,
    int flags
) {
    int payload_size = packet->data_length;
    void *payload_buffer = malloc(payload_size);
    if (flags & FLAG_ENCRYPTED)
        decrypt_xor(payload_buffer, packet->data, payload_size, encryption_key);
    return send_data(payload_buffer, payload_size);
}
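The before/after pair above amounts to applying a rename map to the decompiled text. A minimal sketch of that step, assuming the model has already returned a mapping from old to new identifiers; `apply_renames` is a hypothetical helper, and inside IDA the renames would be applied through the database rather than by editing text.

```python
import re


def apply_renames(code, renames):
    """Replace whole identifiers only, so renaming a1 leaves a10 untouched."""
    if not renames:
        return code
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A single pass avoids chained renames (a -> b followed by b -> c).
    return pattern.sub(lambda m: renames[m.group(1)], code)


if __name__ == "__main__":
    before = "int v4 = *(DWORD *)(a1 + 16); void *v5 = malloc(v4);"
    suggested = {"a1": "packet", "v4": "payload_size", "v5": "payload_buffer"}
    print(apply_renames(before, suggested))
```

Matching on word boundaries matters because decompiler names such as `a1` and `a10` share prefixes; a plain string replace would corrupt the longer name.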

Vulnerability Detection


# Common patterns Gepetto identifies:
vulnerability_patterns = {
    "buffer_overflow": "memcpy/strcpy without bounds checking",
    "format_string": "printf(user_input) without format specifier",
    "use_after_free": "Pointer used after free() called",
    "integer_overflow": "Arithmetic overflow in size calculation",
    "race_condition": "TOCTOU in file operations",
    "null_deref": "Pointer dereference without null check",
    "uninitialized": "Variable used before initialization",
    "double_free": "free() called twice on same pointer",
}
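Some of these patterns can be pre-screened cheaply before a function is sent to a model. The sketch below is a naive text scan for call sites worth a closer look; it is an illustrative triage aid with a hypothetical `RISKY_CALLS` table, not a description of how Gepetto detects vulnerabilities, and a hit only means the call deserves review.

```python
import re

# Hypothetical triage table: call name -> pattern category from the list above.
RISKY_CALLS = {
    "memcpy": "buffer_overflow",
    "strcpy": "buffer_overflow",
    "printf": "format_string",
    "free": "use_after_free",
}


def flag_risky_calls(code):
    """Return (line_number, call_name, category) for each suspicious call site."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for name, category in RISKY_CALLS.items():
            # \b keeps "printf" from matching inside "snprintf".
            if re.search(r"\b" + re.escape(name) + r"\s*\(", line):
                findings.append((lineno, name, category))
    return findings


if __name__ == "__main__":
    sample = "char v3[256];\nmemcpy(v3, (const void *)(a1 + 8), a2);"
    for finding in flag_risky_calls(sample):
        print(finding)
```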

Configuration

IDA Pro Setup


# gepetto_config.py
GEPETTO_CONFIG = {
    "model": "gpt-4",           # or "claude-3-opus", "local"
    "api_key_env": "OPENAI_API_KEY",
    "max_tokens": 2000,
    "temperature": 0.1,         # Low temperature for accuracy
    "context_functions": 3,     # Include N related functions for context
    "auto_rename": True,        # Automatically apply renamed variables
    "highlight_vulns": True,    # Highlight vulnerabilities in IDA
    "language": "en",           # Output language
}
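The `api_key_env` entry names an environment variable rather than holding the key itself, which keeps secrets out of the config file. A minimal sketch of how such a setting could be resolved; `resolve_api_key` is a hypothetical helper, and the default variable name is an assumption.

```python
import os


def resolve_api_key(config, environ=None):
    """Look up the API key in the environment variable named by the config."""
    environ = os.environ if environ is None else environ
    var_name = config.get("api_key_env", "OPENAI_API_KEY")
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running an analysis"
        )
    return key


if __name__ == "__main__":
    demo_env = {"OPENAI_API_KEY": "sk-example"}  # placeholder value
    print(resolve_api_key({"api_key_env": "OPENAI_API_KEY"}, demo_env))
```

Failing early with a clear message is friendlier than letting the first API call fail with an authentication error.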

Local Model Setup


# Use local model via Ollama
GEPETTO_CONFIG = {
    "model": "local",
    "local_endpoint": "http://localhost:11434/api/generate",
    "local_model": "codellama:34b",
}
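For reference, a request to Ollama's `/api/generate` endpoint is a JSON POST with `model` and `prompt` fields, and the non-streaming reply carries the generated text in a `response` field. The sketch below builds such a request from the config above using only the standard library; the helper names are hypothetical, and how Gepetto itself talks to a local model may differ.

```python
import json
import urllib.request


def build_ollama_request(config, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": config["local_model"],
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
        "options": {"temperature": config.get("temperature", 0.1)},
    }
    return urllib.request.Request(
        config["local_endpoint"],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_local_model(config, prompt, timeout=120):
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_ollama_request(config, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `query_local_model(GEPETTO_CONFIG, "Explain this function: ...")` only succeeds if an Ollama server is listening on the configured endpoint and the model has been pulled.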

Analysis Workflow

Step	Action	Tool
1	Load binary	IDA Pro / Ghidra
2	Run auto-analysis	Built-in
3	Identify key functions	Xrefs, strings, imports
4	Explain with Gepetto	Right-click → Explain
5	Rename variables	Right-click → Rename
6	Find vulnerabilities	Right-click → Find Vulns
7	Analyze call graph	Cross-reference analysis
8	Document findings	Export annotations

Best Practices

Provide context — Include related functions when analyzing complex code
Use low temperature — 0.1 for analysis accuracy; higher for creative renaming
Verify AI suggestions — Always validate vulnerability claims manually
Start from main/entry — Work outward from entry points for better context
Use struct reconstruction — Let AI suggest struct layouts from field access patterns
Batch analyze — Process related functions together for better naming consistency
Save annotations — Export AI-generated comments to IDB/project files
Combine with dynamic analysis — Use debugger alongside AI static analysis
Use local models for sensitive code — Don't send proprietary code to cloud APIs
Iterate analysis — Re-analyze after renaming for improved subsequent explanations
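The first practice, providing context, can be sketched as follows: pick a few callees and callers of the target function and prepend their decompiled source to the request. The call graph and source dictionaries are assumed inputs that would have to be exported from the disassembler, and both helper names are hypothetical.

```python
def related_functions(target, call_graph, limit=3):
    """Return up to `limit` callees, then callers, of `target`."""
    callees = list(call_graph.get(target, []))
    callers = [name for name, outs in call_graph.items() if target in outs]
    related = []
    for name in callees + callers:
        if name != target and name not in related:
            related.append(name)
    return related[:limit]


def build_context(target, sources, call_graph, limit=3):
    """Join the target's source with the sources of its related functions."""
    parts = [f"// Target function\n{sources[target]}"]
    for name in related_functions(target, call_graph, limit):
        if name in sources:  # skip imports with no decompiled body
            parts.append(f"// Related function: {name}\n{sources[name]}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    graph = {"main": ["parse"], "parse": ["copy_buf"], "copy_buf": []}
    code = {name: f"void {name}(void);" for name in graph}
    print(build_context("parse", code, graph))
```

The `limit` argument plays the same role as the `context_functions` setting: it caps how much extra code is sent, which keeps the request within the model's token budget.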

Troubleshooting

AI gives incorrect analysis


# Provide more context: include caller and callee functions
# Lower the temperature for more deterministic output
# State the binary's architecture and OS in the prompt, for example:
"This is an x86-64 Windows PE binary compiled with MSVC"

Rate limiting on API


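# Add a delay between function analyses
import time

from gepetto import analyze_function  # same entry point as in Quick Start

results = []
for func in functions_to_analyze:  # list of decompiled function strings
    results.append(analyze_function(func))
    time.sleep(1)  # fixed pause to stay under the API rate limit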
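A fixed one-second pause either wastes time or is too short when the API is under load. A common alternative is to retry with exponential backoff only when a rate-limit error actually occurs. The sketch below assumes the client raises some exception on HTTP 429; `RateLimitError` is a stand-in for it, and `call_with_backoff` is a hypothetical helper.

```python
import time


class RateLimitError(Exception):
    """Stand-in for whatever the API client raises on HTTP 429."""


def call_with_backoff(func, *args, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), doubling the wait after each rate-limit error."""
    for attempt in range(retries):
        try:
            return func(*args)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller handle it
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_analysis(code):
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimitError()
        return f"analysis of {code}"

    print(call_with_backoff(flaky_analysis, "sub_4012B0", base_delay=0.01))
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.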

Decompiler output too complex


# Simplify by analyzing smaller functions first
# Break complex functions at natural boundaries
# Use Gepetto on individual basic blocks
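One way to break a long function at natural boundaries is to split its decompiled text only where the brace depth has returned to the top level of the function body. The sketch below uses naive brace counting, so it assumes no braces appear inside string literals or comments; `split_at_block_boundaries` is a hypothetical helper.

```python
def split_at_block_boundaries(code, max_lines=40):
    """Split code into chunks of roughly max_lines, never inside a nested block."""
    chunks, current, depth = [], [], 0
    for line in code.splitlines():
        current.append(line)
        depth += line.count("{") - line.count("}")
        # depth <= 1 means we are between top-level statements of the body.
        if len(current) >= max_lines and depth <= 1:
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    body = "void f() {\n  a();\n  if (x) {\n    b();\n  }\n  c();\n}"
    for index, chunk in enumerate(split_at_block_boundaries(body, max_lines=3)):
        print(f"--- chunk {index} ---\n{chunk}")
```

A chunk can exceed `max_lines` when a nested block is long, which is intentional: cutting through the middle of an `if` or loop body would hide the condition that governs it.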
