OpenAI API Documentation System
Overview
A comprehensive skill for working with the OpenAI developer API ecosystem, covering the full range of OpenAI products and capabilities. This includes the Chat Completions API, the Responses API for stateful agentic workflows, the Agents SDK for building multi-agent systems, the Realtime API for speech-to-speech interactions, structured outputs with JSON schema enforcement, function calling, vision and multimodal inputs, embeddings, fine-tuning, and the latest model families including GPT-4o, o3, o4-mini, and gpt-image-1. This skill serves as an authoritative reference for building production applications on OpenAI's platform.
When to Use
- Building applications with OpenAI's Chat Completions or Responses API
- Implementing function calling and tool use with GPT models
- Using structured outputs with JSON schema enforcement
- Working with vision capabilities (image input analysis)
- Building real-time voice applications with the Realtime API
- Creating multi-agent systems with the OpenAI Agents SDK
- Fine-tuning GPT models on custom datasets
- Generating embeddings for search, clustering, or RAG
- Any task that needs authoritative, up-to-date OpenAI API guidance
Quick Start
```bash
pip install openai
export OPENAI_API_KEY="sk-..."
```
```python
from openai import OpenAI

client = OpenAI()

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion in simple terms."},
    ],
)
print(response.choices[0].message.content)
```
Core Concepts
Chat Completions API
The primary API for text generation, supporting conversation history, system instructions, and tool use:
```python
from openai import OpenAI

client = OpenAI()

# Multi-turn conversation
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert Python tutor."},
        {"role": "user", "content": "What are list comprehensions?"},
        {"role": "assistant", "content": "List comprehensions are a concise way to create lists..."},
        {"role": "user", "content": "Show me an example with filtering."},
    ],
    temperature=0.7,
    max_tokens=500,
)
```
Responses API (Stateful Agents)
The Responses API is designed for agentic workflows with built-in state management:
```python
from openai import OpenAI

client = OpenAI()

# Create a stateful response with tools
response = client.responses.create(
    model="gpt-4o",
    input="Search for the latest Python 3.13 features and summarize them.",
    tools=[{"type": "web_search_preview"}],
)
print(response.output_text)

# Continue the conversation with previous response context
follow_up = client.responses.create(
    model="gpt-4o",
    input="Now compare those features with Python 3.12.",
    previous_response_id=response.id,
)
```
Model Selection Guide
| Model | Context | Best For | Cost Tier |
|---|---|---|---|
| gpt-4o | 128K | General purpose, multimodal | Medium |
| gpt-4o-mini | 128K | Fast, cost-effective tasks | Low |
| o3 | 200K | Complex reasoning, math, code | High |
| o4-mini | 200K | Efficient reasoning | Medium |
| gpt-4.1 | 1M | Long context, instruction following | Medium |
| gpt-4.1-mini | 1M | Cost-effective long context | Low |
| gpt-4.1-nano | 1M | Fastest, cheapest | Very Low |
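A common pattern is to route each request to the cheapest model expected to handle it, falling back to a stronger default. The sketch below is illustrative only: the task categories and the mapping are our assumptions based on the table above, not an official recommendation.

```python
# Illustrative task-to-model routing table; categories and choices are
# assumptions drawn from the model selection table, not an official mapping.
TASK_MODEL = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "general_chat": "gpt-4o",
    "hard_reasoning": "o3",
    "long_document": "gpt-4.1",
}

def pick_model(task: str, default: str = "gpt-4o") -> str:
    """Return the cheapest model expected to handle the task."""
    return TASK_MODEL.get(task, default)

print(pick_model("classification"))  # gpt-4o-mini
print(pick_model("unmapped_task"))   # gpt-4o (the default)
```

Centralizing the mapping in one table makes it easy to swap models as pricing and capabilities change, without touching call sites.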
Structured Outputs
Force the model to produce valid JSON matching a specific schema:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]
    location: str | None = None

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract event details from the text."},
        {"role": "user", "content": "Team standup tomorrow at 10am with Alice, Bob, and Carol in Room 3B."},
    ],
    response_format=CalendarEvent,
)
event = response.choices[0].message.parsed
print(f"{event.name} on {event.date} with {', '.join(event.participants)}")
```
Function Calling
```python
import json

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and state, e.g. 'San Francisco, CA'"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

# Handle tool calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # Execute the function (get_weather is your application's own implementation)
        result = get_weather(**args)
        # Send the result back to the model for a final answer
        messages = [
            {"role": "user", "content": "What's the weather in Tokyo?"},
            message,
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            },
        ]
        final_response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
```
Vision (Image Input)
```python
import base64

from openai import OpenAI

client = OpenAI()

# Image input by URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

# Base64 image input
with open("diagram.png", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Explain this diagram."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64_image}"},
                },
            ],
        }
    ],
)
```
Streaming
```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story about AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Embeddings
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Generate embeddings for text
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Machine learning is a branch of AI.", "Deep learning uses neural networks."],
)
embedding_1 = response.data[0].embedding  # 1536-dimensional vector
embedding_2 = response.data[1].embedding

# Cosine similarity
similarity = np.dot(embedding_1, embedding_2) / (
    np.linalg.norm(embedding_1) * np.linalg.norm(embedding_2)
)
print(f"Similarity: {similarity:.3f}")
```
Agents SDK
```python
from agents import Agent, Runner, function_tool

@function_tool
def search_database(query: str) -> str:
    """Search the knowledge base for relevant information."""
    return f"Found results for: {query}"

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a specified recipient."""
    return f"Email sent to {to}"

# Create specialized agents
researcher = Agent(
    name="Researcher",
    instructions="You find information using the search tool.",
    tools=[search_database],
)
writer = Agent(
    name="Writer",
    instructions="You draft emails based on research findings.",
    tools=[send_email],
)

# Orchestrator agent with handoff capability
orchestrator = Agent(
    name="Orchestrator",
    instructions="Route tasks to the appropriate specialist agent.",
    handoffs=[researcher, writer],
)

# Run the multi-agent system
result = Runner.run_sync(orchestrator, "Research our Q4 results and email a summary to [email protected]")
print(result.final_output)
```
Configuration Reference
Chat Completions Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Required | Model identifier |
| messages | list | Required | Conversation messages |
| temperature | float | 1.0 | Randomness (0.0 - 2.0) |
| max_tokens | int | Model max | Maximum output tokens |
| top_p | float | 1.0 | Nucleus sampling threshold |
| frequency_penalty | float | 0.0 | Penalize frequent tokens (-2.0 to 2.0) |
| presence_penalty | float | 0.0 | Penalize repeated topics (-2.0 to 2.0) |
| stop | list | None | Up to 4 stop sequences |
| stream | bool | False | Enable streaming response |
| tools | list | None | Available tools/functions |
| tool_choice | str/obj | "auto" | Tool selection strategy |
| response_format | obj | None | JSON mode or structured output schema |
| seed | int | None | Deterministic sampling seed |
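As a sketch of how these parameters combine in practice, here is an illustrative parameter set for a deterministic, bounded extraction call. The specific values are assumptions for demonstration, not recommended defaults; the commented line shows where they would be passed.

```python
# Illustrative parameter set for a deterministic, bounded extraction call.
# Values are assumptions for demonstration, not recommended defaults.
params = {
    "model": "gpt-4o",
    "temperature": 0.0,   # minimize sampling randomness
    "max_tokens": 300,    # cap output length, cost, and latency
    "seed": 42,           # best-effort reproducibility across calls
    "stop": ["###"],      # halt at a sentinel, if your prompt defines one
}

# response = client.chat.completions.create(messages=messages, **params)
```

Keeping such presets in one dict makes it easy to audit them against the ranges in the table above before each release.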
Embedding Models
| Model | Dimensions | Max Input | Best For |
|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 tokens | Cost-effective, general |
| text-embedding-3-large | 3072 | 8191 tokens | Higher accuracy |
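Both models accept a dimensions argument that returns shorter vectors. Per OpenAI's embeddings documentation, shortening is roughly equivalent to truncating the full vector and re-normalizing it to unit length; the helper below sketches that operation locally (the function name is ours, for illustration only).

```python
import math

def shorten_embedding(vec: list[float], dims: int) -> list[float]:
    """Truncate an embedding to its first `dims` values and re-normalize
    to unit length -- a local sketch of what the API's `dimensions`
    parameter does server-side for text-embedding-3 models."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

v = shorten_embedding([3.0, 4.0, 12.0], 2)  # keep 2 of 3 dimensions
print(v)  # a unit-length 2-d vector: [0.6, 0.8]
```

In practice, prefer passing dimensions in the embeddings.create call so vectors arrive already shortened; the helper is mainly useful for shrinking vectors you have already stored.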
Best Practices
- Use system messages for persona and constraints -- Define the assistant's role, tone, and boundaries in the system message for consistent behavior across conversations.
- Prefer structured outputs for data extraction -- Use `response_format` with Pydantic models or JSON schemas to eliminate parsing errors and guarantee valid output structure.
- Implement exponential backoff for rate limits -- Use the `openai` library's built-in retry mechanism or implement backoff for 429 errors to handle API rate limiting gracefully.
- Stream responses for user-facing applications -- Streaming provides perceived responsiveness by showing tokens as they are generated, significantly improving UX for long responses.
- Use `gpt-4o-mini` for high-volume, simple tasks -- Reserve `gpt-4o` and reasoning models for complex tasks. Using mini models for classification, extraction, and simple Q&A reduces cost by 10-20x.
- Set `max_tokens` explicitly -- Prevent runaway costs and latency by capping output length. Leave headroom for the response but do not use the model's full context window.
- Pin model versions in production -- Use dated model snapshots (e.g., `gpt-4o-2024-11-20`) rather than aliases to prevent unexpected behavior changes on model updates.
- Validate function call arguments -- Always validate and sanitize the JSON arguments from tool calls before executing functions. The model can produce unexpected argument values.
- Use embeddings with dimensionality reduction -- `text-embedding-3` models support the `dimensions` parameter to reduce vector size without retraining, saving storage and compute.
- Cache identical requests -- For deterministic tasks (same prompt + temperature=0), cache responses to avoid redundant API calls and reduce cost.
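The backoff advice above can be sketched as a small generic wrapper. This is not the `openai` library's built-in retry mechanism; it is a standalone helper, and in real code you would catch openai.RateLimitError rather than a bare Exception.

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=0.5, max_delay=8.0):
    """Retry a zero-argument callable with exponential backoff and full
    jitter. In real code, catch openai.RateLimitError instead of Exception."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the capped backoff.
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))

# Illustrative usage with a stand-in for an API call that fails twice:
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # prints "ok" after two retries
```

Note that the openai client also accepts a max_retries option at construction time, which covers transient errors without any wrapper.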
Troubleshooting
Rate limit errors (429):
Implement exponential backoff with jitter. The `openai` library retries automatically for transient errors. For sustained throughput, request a rate limit increase through the OpenAI dashboard.
Context length exceeded error:
Count tokens before sending using `tiktoken`. Truncate or summarize conversation history to fit within the model's context window. Use `gpt-4.1` for up to 1M tokens.
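One way to keep history within budget is sketched below with a rough chars-per-token heuristic; use tiktoken for exact counts in production. The helper names and the 4-chars-per-token estimate are our assumptions for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    Use tiktoken for exact counts in production."""
    return max(1, len(text) // 4)

def trim_history(messages, budget):
    """Keep the system message plus the most recent turns that fit
    within the token budget, preserving order."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(estimate_tokens(m["content"]) for m in system)
    for m in reversed(rest):  # walk newest to oldest
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Trimming from the oldest turn first keeps the system message and the freshest context, which is usually what the next reply depends on.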
Function calling returns malformed arguments:
Simplify the function schema -- use fewer parameters, clearer descriptions, and explicit enum values. Add `"additionalProperties": false` to prevent extra fields.
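A tightened version of the weather tool schema from the Function Calling section, applying those suggestions. Note that with strict mode enabled, OpenAI's structured-outputs rules require every property to appear in required and additionalProperties to be false; a formerly optional field like unit is kept effectively optional by allowing null.

```python
# Tightened get_weather tool schema: strict mode, closed object, and all
# properties listed as required (strict mode demands both).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "strict": True,  # ask the API to enforce the schema exactly
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. 'San Francisco, CA'"},
                # Optional-in-spirit field: nullable type instead of omitting
                # it from `required`, as strict mode requires.
                "unit": {"type": ["string", "null"], "enum": ["celsius", "fahrenheit", None]},
            },
            "required": ["location", "unit"],
            "additionalProperties": False,
        },
    },
}
```
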
Structured output does not match schema:
Ensure the schema uses only supported JSON Schema features. Avoid `$ref`, `oneOf`, and recursive types. Use Pydantic models with `client.beta.chat.completions.parse()` for automatic validation.
Streaming drops chunks or hangs:
Check network stability and proxy timeout settings. Use `client.with_streaming_response` for better error handling. Implement a client-side timeout to detect stalled streams.
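A simple client-side guard can be sketched as a generic iterator wrapper (the names here are ours): it raises if any single chunk takes too long to arrive. This detects slow chunks only once next() returns, so pair it with a transport-level timeout (e.g. the client's timeout option) to catch a fully hung connection.

```python
import time

def iter_with_chunk_timeout(chunks, per_chunk_timeout=30.0):
    """Yield chunks from a stream, raising TimeoutError if any single
    chunk took longer than `per_chunk_timeout` seconds to arrive.
    Detects slow chunks only after they arrive; combine with a
    transport-level timeout to catch a fully hung connection."""
    it = iter(chunks)
    while True:
        start = time.monotonic()
        try:
            chunk = next(it)
        except StopIteration:
            return
        if time.monotonic() - start > per_chunk_timeout:
            raise TimeoutError("stream stalled: chunk exceeded timeout")
        yield chunk
```

Usage is a drop-in wrap of the streaming loop from the Streaming section, e.g. `for chunk in iter_with_chunk_timeout(stream, 30.0): ...`.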