# Google Gemini API Integration

## Overview
A comprehensive skill for building applications with Google's Gemini API — covering text generation, multimodal inputs (images, video, audio, documents), function calling, structured outputs, system instructions, and streaming. Gemini models offer competitive performance with unique capabilities like native multimodal understanding, large context windows (up to 2M tokens), and built-in Google Search grounding.
## When to Use
- Building applications with Google Gemini models
- Need native multimodal understanding (images, video, audio, PDFs)
- Processing long documents (up to 2M token context window)
- Using function calling / tool use with Gemini
- Need Google Search grounding for real-time information
- Building with Google AI Studio or Vertex AI
## Quick Start

```bash
# Install
pip install google-genai

# Set API key
export GOOGLE_API_KEY="your-key-here"
```

```python
from google import genai

client = genai.Client()

# Simple text generation
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in simple terms",
)
print(response.text)

# With system instruction
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a haiku about coding",
    config=genai.types.GenerateContentConfig(
        system_instruction="You are a creative poet who loves technology",
        temperature=0.9,
    ),
)
```
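Long responses can also be streamed as they are generated via `generate_content_stream`. A minimal sketch; the `stream_text` helper is our own addition, and the live call is guarded behind the API-key check so the snippet loads without credentials:

```python
import os

def stream_text(chunks) -> str:
    """Print and accumulate text from a stream of response chunks."""
    pieces = []
    for chunk in chunks:
        if chunk.text:  # some chunks may carry no text (e.g. trailing metadata)
            print(chunk.text, end="", flush=True)
            pieces.append(chunk.text)
    return "".join(pieces)

# Live call, guarded so the sketch runs without a key present
if os.environ.get("GOOGLE_API_KEY"):
    from google import genai

    client = genai.Client()
    story = stream_text(client.models.generate_content_stream(
        model="gemini-2.0-flash",
        contents="Write a short story about a robot learning to paint",
    ))
```

Streaming shows output to users immediately instead of after the full generation completes, which matters most for long-form text.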
## Multimodal Capabilities

### Image Understanding

```python
from google import genai
from google.genai import types

client = genai.Client()

# From URL
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_uri(file_uri="https://example.com/image.jpg", mime_type="image/jpeg"),
        "Describe what you see in this image",
    ],
)

# From local file
with open("photo.jpg", "rb") as f:
    image_data = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "What objects are in this image?",
    ],
)
```
### Video and Audio

```python
# Upload video file
video_file = client.files.upload(file="video.mp4")

# Analyze video
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[video_file, "Summarize what happens in this video"],
)

# Audio transcription and analysis
audio_file = client.files.upload(file="recording.mp3")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[audio_file, "Transcribe and summarize this audio"],
)
```
### PDF Document Analysis

```python
# Upload and analyze PDF
pdf_file = client.files.upload(file="report.pdf")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[pdf_file, "Extract the key findings from this report"],
)
```
## Function Calling

```python
from google.genai import types

# Define tools
tools = [
    types.Tool(function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a location",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "location": types.Schema(type="STRING", description="City name"),
                    "unit": types.Schema(type="STRING", enum=["celsius", "fahrenheit"]),
                },
                required=["location"],
            ),
        ),
    ]),
]

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(tools=tools),
)

# Process function call
for part in response.candidates[0].content.parts:
    if part.function_call:
        name = part.function_call.name
        args = dict(part.function_call.args)
        print(f"Call: {name}({args})")
```
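To complete the tool-use loop, the requested function is executed locally and its result is sent back to the model. A sketch under the declarations above; `get_weather`'s body and the `dispatch` helper are hypothetical stubs, and the follow-up call (commented) uses `types.Part.from_function_response` from the google-genai SDK:

```python
def get_weather(location: str, unit: str = "celsius") -> dict:
    # Hypothetical stub; a real app would query a weather service here
    return {"location": location, "temperature": 22, "unit": unit}

def dispatch(name: str, args: dict) -> dict:
    """Route a model-requested function call to its local implementation."""
    handlers = {"get_weather": get_weather}
    return handlers[name](**args)

# After extracting `part.function_call` as in the loop above:
# result = dispatch(part.function_call.name, dict(part.function_call.args))
# followup = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents=[
#         response.candidates[0].content,  # the model's tool-call turn
#         types.Part.from_function_response(
#             name=part.function_call.name,
#             response={"result": result},
#         ),
#     ],
#     config=types.GenerateContentConfig(tools=tools),
# )
# print(followup.text)
```

Keeping a name-to-handler mapping rather than branching on the name keeps the dispatch step trivial to extend as more tools are declared.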
## Structured Output

```python
from typing import List

from pydantic import BaseModel
from google.genai import types

class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    prep_time_minutes: int
    difficulty: str

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Give me a recipe for chocolate cake",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)
recipe = Recipe.model_validate_json(response.text)
```
## Model Comparison
| Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|
| Gemini 2.0 Flash | 1M | Fast | Low | General purpose, multimodal |
| Gemini 2.0 Pro | 2M | Medium | Medium | Complex reasoning |
| Gemini 1.5 Flash | 1M | Fastest | Lowest | High-throughput tasks |
| Gemini 1.5 Pro | 2M | Medium | Medium | Long document analysis |
## Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| temperature | 1.0 | Sampling randomness (0-2) |
| top_p | 0.95 | Nucleus sampling threshold |
| top_k | 40 | Top-k sampling |
| max_output_tokens | 8192 | Maximum response length |
| stop_sequences | [] | Stop generation strings |
| candidate_count | 1 | Number of response candidates |
| response_mime_type | text/plain | Output format (JSON, etc.) |
| safety_settings | Default | Content safety thresholds |
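As a sketch of how these parameters are wired in, they are passed to `generate_content` via `config` (the google-genai SDK also accepts a plain dict in place of a typed `GenerateContentConfig`); the values below are illustrative, not recommendations:

```python
# Illustrative values; tune per task
generation_config = {
    "temperature": 0.7,           # lower = more deterministic
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 1024,
    "stop_sequences": ["END"],
    "candidate_count": 1,
    "response_mime_type": "text/plain",
}

# response = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents="Summarize the attached report",
#     config=generation_config,
# )
```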
## Best Practices

- Use Gemini 2.0 Flash as default — Best speed/quality ratio for most tasks
- Leverage multimodal natively — No need for separate vision models
- Use structured output — `response_schema` with Pydantic for reliable JSON
- Set system instructions — Define persona and constraints upfront
- Use streaming for long responses — `generate_content_stream` for real-time output
- Upload large files first — Use `client.files.upload()` for files >20MB
- Enable Google Search grounding — For tasks needing current information
- Handle safety filters — Check `response.prompt_feedback` for filtered responses
- Use caching for repeated contexts — Context caching reduces cost for long system prompts
- Monitor usage — Track token usage for cost management
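Google Search grounding, recommended above, is enabled as a built-in tool rather than a user-declared function. A minimal sketch, assuming the `google_search` tool shape from the google-genai SDK (verify against the current SDK reference); the live call is guarded behind the API-key check:

```python
import os

# Guarded so the sketch loads without credentials
key_set = bool(os.environ.get("GOOGLE_API_KEY"))

if key_set:
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="What are this week's major AI announcements?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(response.text)
    # Source attributions for grounded claims, when returned:
    # response.candidates[0].grounding_metadata
```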
## Troubleshooting

### Response blocked by safety filters

```python
# Adjust safety settings
from google.genai import types

config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category="HARM_CATEGORY_HARASSMENT",
            threshold="BLOCK_ONLY_HIGH",
        ),
    ],
)
```
### File upload fails

```python
import time

# Check file size (<2GB) and supported formats,
# then wait for file processing to finish
file = client.files.upload(file="large_video.mp4")
while file.state == "PROCESSING":
    time.sleep(5)
    file = client.files.get(name=file.name)
```
### Context window exceeded

```python
# Count tokens before sending
count = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents=long_text,
)
print(f"Token count: {count.total_tokens}")
```