OpenAI API Documentation System
Overview
A comprehensive skill for working with the OpenAI developer API ecosystem, covering the full range of OpenAI products and capabilities. This includes the Chat Completions API, the Responses API for stateful agentic workflows, the Agents SDK for building multi-agent systems, the Realtime API for speech-to-speech interactions, structured outputs with JSON schema enforcement, function calling, vision and multimodal inputs, embeddings, fine-tuning, and the latest model families including GPT-4o, o3, o4-mini, and gpt-image-1. This skill serves as an authoritative reference for building production applications on OpenAI's platform.
When to Use
- Building applications with OpenAI's Chat Completions or Responses API
- Implementing function calling and tool use with GPT models
- Using structured outputs with JSON schema enforcement
- Working with vision capabilities (image input analysis)
- Building real-time voice applications with the Realtime API
- Creating multi-agent systems with the OpenAI Agents SDK
- Fine-tuning GPT models on custom datasets
- Generating embeddings for search, clustering, or RAG
- Any task that needs authoritative, up-to-date OpenAI API guidance
Quick Start
```bash
pip install openai
export OPENAI_API_KEY="sk-..."
```
```python
from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion in simple terms."},
    ],
)
print(response.choices[0].message.content)
```
Core Concepts
Chat Completions API
The primary API for text generation, supporting conversation history, system instructions, and tool use:
```python
from openai import OpenAI

client = OpenAI()

# Multi-turn conversation
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert Python tutor."},
        {"role": "user", "content": "What are list comprehensions?"},
        {"role": "assistant", "content": "List comprehensions are a concise way to create lists..."},
        {"role": "user", "content": "Show me an example with filtering."},
    ],
    temperature=0.7,
    max_tokens=500,
)
```
Responses API (Stateful Agents)
The Responses API is designed for agentic workflows with built-in state management:
```python
from openai import OpenAI

client = OpenAI()

# Create a stateful response with tools
response = client.responses.create(
    model="gpt-4o",
    input="Search for the latest Python 3.13 features and summarize them.",
    tools=[{"type": "web_search_preview"}],
)
print(response.output_text)

# Continue the conversation with previous response context
follow_up = client.responses.create(
    model="gpt-4o",
    input="Now compare those features with Python 3.12.",
    previous_response_id=response.id,
)
```
Model Selection Guide
| Model | Context | Best For | Cost Tier |
|---|---|---|---|
| gpt-4o | 128K | General purpose, multimodal | Medium |
| gpt-4o-mini | 128K | Fast, cost-effective tasks | Low |
| o3 | 200K | Complex reasoning, math, code | High |
| o4-mini | 200K | Efficient reasoning | Medium |
| gpt-4.1 | 1M | Long context, instruction following | Medium |
| gpt-4.1-mini | 1M | Cost-effective long context | Low |
| gpt-4.1-nano | 1M | Fastest, cheapest | Very Low |
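A common pattern is to route each request to the cheapest model expected to handle it, falling back to a stronger default. The sketch below is illustrative only: the task categories and the mapping are our assumptions based on the table above, not an official recommendation.

```python
# Illustrative task-to-model routing table; categories and choices are
# assumptions drawn from the model selection table, not an official mapping.
TASK_MODEL = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "general_chat": "gpt-4o",
    "hard_reasoning": "o3",
    "long_document": "gpt-4.1",
}

def pick_model(task: str, default: str = "gpt-4o") -> str:
    """Return the cheapest model expected to handle the task."""
    return TASK_MODEL.get(task, default)

print(pick_model("classification"))  # gpt-4o-mini
print(pick_model("unmapped_task"))   # gpt-4o (the default)
```

Centralizing the mapping in one table makes it easy to swap models as pricing and capabilities change, without touching call sites.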
Structured Outputs
Force the model to produce valid JSON matching a specific schema:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]
    location: str | None = None

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract event details from the text."},
        {"role": "user", "content": "Team standup tomorrow at 10am with Alice, Bob, and Carol in Room 3B."},
    ],
    response_format=CalendarEvent,
)
event = response.choices[0].message.parsed
print(f"{event.name} on {event.date} with {', '.join(event.participants)}")
```
Function Calling
```python
import json

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and state, e.g. 'San Francisco, CA'"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

# Handle tool calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # Execute the function (get_weather is your application's own implementation)
        result = get_weather(**args)
        # Send the result back to the model for a final answer
        messages = [
            {"role": "user", "content": "What's the weather in Tokyo?"},
            message,
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            },
        ]
        final_response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
```
Vision (Image Input)
```python
import base64

from openai import OpenAI

client = OpenAI()

# Image input by URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

# Base64 image input
with open("diagram.png", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Explain this diagram."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64_image}"},
                },
            ],
        }
    ],
)
```
Streaming
```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story about AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Embeddings
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Generate embeddings for text
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Machine learning is a branch of AI.", "Deep learning uses neural networks."],
)
embedding_1 = response.data[0].embedding  # 1536-dimensional vector
embedding_2 = response.data[1].embedding

# Cosine similarity
similarity = np.dot(embedding_1, embedding_2) / (
    np.linalg.norm(embedding_1) * np.linalg.norm(embedding_2)
)
print(f"Similarity: {similarity:.3f}")
```
Agents SDK
```python
from agents import Agent, Runner, function_tool

@function_tool
def search_database(query: str) -> str:
    """Search the knowledge base for relevant information."""
    return f"Found results for: {query}"

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a specified recipient."""
    return f"Email sent to {to}"

# Create specialized agents
researcher = Agent(
    name="Researcher",
    instructions="You find information using the search tool.",
    tools=[search_database],
)
writer = Agent(
    name="Writer",
    instructions="You draft emails based on research findings.",
    tools=[send_email],
)

# Orchestrator agent with handoff capability
orchestrator = Agent(
    name="Orchestrator",
    instructions="Route tasks to the appropriate specialist agent.",
    handoffs=[researcher, writer],
)

# Run the multi-agent system
result = Runner.run_sync(orchestrator, "Research our Q4 results and email a summary to [email protected]")
print(result.final_output)
```
Configuration Reference
Chat Completions Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Required | Model identifier |
| messages | list | Required | Conversation messages |
| temperature | float | 1.0 | Randomness (0.0 - 2.0) |
| max_tokens | int | Model max | Maximum output tokens |
| top_p | float | 1.0 | Nucleus sampling threshold |
| frequency_penalty | float | 0.0 | Penalize frequent tokens (-2.0 to 2.0) |
| presence_penalty | float | 0.0 | Penalize repeated topics (-2.0 to 2.0) |
| stop | list | None | Up to 4 stop sequences |
| stream | bool | False | Enable streaming response |
| tools | list | None | Available tools/functions |
| tool_choice | str/obj | "auto" | Tool selection strategy |
| response_format | obj | None | JSON mode or structured output schema |
| seed | int | None | Deterministic sampling seed |
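As a sketch of how these parameters combine in practice, here is an illustrative parameter set for a deterministic, bounded extraction call. The specific values are assumptions for demonstration, not recommended defaults; the commented line shows where they would be passed.

```python
# Illustrative parameter set for a deterministic, bounded extraction call.
# Values are assumptions for demonstration, not recommended defaults.
params = {
    "model": "gpt-4o",
    "temperature": 0.0,   # minimize sampling randomness
    "max_tokens": 300,    # cap output length, cost, and latency
    "seed": 42,           # best-effort reproducibility across calls
    "stop": ["###"],      # halt at a sentinel, if your prompt defines one
}

# response = client.chat.completions.create(messages=messages, **params)
```

Keeping such presets in one dict makes it easy to audit them against the ranges in the table above before each release.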
Embedding Models
| Model | Dimensions | Max Input | Best For |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 tokens | Cost-effective, general |
| text-embedding-3-large | 3072 | 8191 tokens | Higher accuracy |
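Both models accept a dimensions argument that returns shorter vectors. Per OpenAI's embeddings documentation, shortening is roughly equivalent to truncating the full vector and re-normalizing it to unit length; the helper below sketches that operation locally (the function name is ours, for illustration only).

```python
import math

def shorten_embedding(vec: list[float], dims: int) -> list[float]:
    """Truncate an embedding to its first `dims` values and re-normalize
    to unit length -- a local sketch of what the API's `dimensions`
    parameter does server-side for text-embedding-3 models."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

v = shorten_embedding([3.0, 4.0, 12.0], 2)  # keep 2 of 3 dimensions
print(v)  # a unit-length 2-d vector: [0.6, 0.8]
```

In practice, prefer passing dimensions in the embeddings.create call so vectors arrive already shortened; the helper is mainly useful for shrinking vectors you have already stored.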
Best Practices
- Use system messages for persona and constraints -- Define the assistant's role, tone, and boundaries in the system message for consistent behavior across conversations.
- Prefer structured outputs for data extraction -- Use `response_format` with Pydantic models or JSON schemas to eliminate parsing errors and guarantee valid output structure.
- Implement exponential backoff for rate limits -- Use the `openai` library's built-in retry mechanism or implement backoff for 429 errors to handle API rate limiting gracefully.
- Stream responses for user-facing applications -- Streaming provides perceived responsiveness by showing tokens as they are generated, significantly improving UX for long responses.
- Use `gpt-4o-mini` for high-volume, simple tasks -- Reserve `gpt-4o` and reasoning models for complex tasks. Using mini models for classification, extraction, and simple Q&A reduces cost by 10-20x.
- Set `max_tokens` explicitly -- Prevent runaway costs and latency by capping output length. Leave headroom for the response but do not use the model's full context window.
- Pin model versions in production -- Use dated model snapshots (e.g., `gpt-4o-2024-11-20`) rather than aliases to prevent unexpected behavior changes on model updates.
- Validate function call arguments -- Always validate and sanitize the JSON arguments from tool calls before executing functions. The model can produce unexpected argument values.
- Use embeddings with dimensionality reduction -- `text-embedding-3` models support the `dimensions` parameter to reduce vector size without retraining, saving storage and compute.
- Cache identical requests -- For deterministic tasks (same prompt + temperature=0), cache responses to avoid redundant API calls and reduce cost.
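The backoff advice above can be sketched as a small generic wrapper. This is not the `openai` library's built-in retry mechanism; it is a standalone helper, and in real code you would catch openai.RateLimitError rather than a bare Exception.

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=0.5, max_delay=8.0):
    """Retry a zero-argument callable with exponential backoff and full
    jitter. In real code, catch openai.RateLimitError instead of Exception."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the capped backoff.
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))

# Illustrative usage with a stand-in for an API call that fails twice:
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # prints "ok" after two retries
```

Note that the openai client also accepts a max_retries option at construction time, which covers transient errors without any wrapper.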
Troubleshooting
Rate limit errors (429):
Implement exponential backoff with jitter. The `openai` library retries automatically for transient errors. For sustained throughput, request a rate limit increase through the OpenAI dashboard.
Context length exceeded error:
Count tokens before sending using `tiktoken`. Truncate or summarize conversation history to fit within the model's context window. Use `gpt-4.1` for up to 1M tokens.
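One way to keep history within budget is sketched below with a rough chars-per-token heuristic; use tiktoken for exact counts in production. The helper names and the 4-chars-per-token estimate are our assumptions for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    Use tiktoken for exact counts in production."""
    return max(1, len(text) // 4)

def trim_history(messages, budget):
    """Keep the system message plus the most recent turns that fit
    within the token budget, preserving order."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(estimate_tokens(m["content"]) for m in system)
    for m in reversed(rest):  # walk newest to oldest
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Trimming from the oldest turn first keeps the system message and the freshest context, which is usually what the next reply depends on.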
Function calling returns malformed arguments:
Simplify the function schema -- use fewer parameters, clearer descriptions, and explicit enum values. Add `"additionalProperties": false` to prevent extra fields.
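A tightened version of the weather tool schema from the Function Calling section, applying those suggestions. Note that with strict mode enabled, OpenAI's structured-outputs rules require every property to appear in required and additionalProperties to be false; a formerly optional field like unit is kept effectively optional by allowing null.

```python
# Tightened get_weather tool schema: strict mode, closed object, and all
# properties listed as required (strict mode demands both).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "strict": True,  # ask the API to enforce the schema exactly
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. 'San Francisco, CA'"},
                # Optional-in-spirit field: nullable type instead of omitting
                # it from `required`, as strict mode requires.
                "unit": {"type": ["string", "null"], "enum": ["celsius", "fahrenheit", None]},
            },
            "required": ["location", "unit"],
            "additionalProperties": False,
        },
    },
}
```
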
Structured output does not match schema:
Ensure the schema uses only supported JSON Schema features. Avoid `$ref`, `oneOf`, and recursive types. Use Pydantic models with `client.beta.chat.completions.parse()` for automatic validation.
Streaming drops chunks or hangs:
Check network stability and proxy timeout settings. Use `client.with_streaming_response` for better error handling. Implement a client-side timeout to detect stalled streams.
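A simple client-side guard can be sketched as a generic iterator wrapper (the names here are ours): it raises if any single chunk takes too long to arrive. This detects slow chunks only once next() returns, so pair it with a transport-level timeout (e.g. the client's timeout option) to catch a fully hung connection.

```python
import time

def iter_with_chunk_timeout(chunks, per_chunk_timeout=30.0):
    """Yield chunks from a stream, raising TimeoutError if any single
    chunk took longer than `per_chunk_timeout` seconds to arrive.
    Detects slow chunks only after they arrive; combine with a
    transport-level timeout to catch a fully hung connection."""
    it = iter(chunks)
    while True:
        start = time.monotonic()
        try:
            chunk = next(it)
        except StopIteration:
            return
        if time.monotonic() - start > per_chunk_timeout:
            raise TimeoutError("stream stalled: chunk exceeded timeout")
        yield chunk
```

Usage is a drop-in wrap of the streaming loop from the Streaming section, e.g. `for chunk in iter_with_chunk_timeout(stream, 30.0): ...`.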