
Google Gemini API Integration

Overview

A comprehensive skill for building applications with Google's Gemini API — covering text generation, multimodal inputs (images, video, audio, documents), function calling, structured outputs, system instructions, and streaming. Gemini models offer competitive performance with unique capabilities like native multimodal understanding, large context windows (up to 2M tokens), and built-in Google Search grounding.

When to Use

  • Building applications with Google Gemini models
  • Need native multimodal understanding (images, video, audio, PDFs)
  • Processing long documents (up to 2M token context window)
  • Using function calling / tool use with Gemini
  • Need Google Search grounding for real-time information
  • Building with Google AI Studio or Vertex AI

Quick Start

```shell
# Install
pip install google-genai

# Set API key
export GOOGLE_API_KEY="your-key-here"
```
```python
from google import genai

client = genai.Client()

# Simple text generation
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in simple terms",
)
print(response.text)

# With system instruction
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a haiku about coding",
    config=genai.types.GenerateContentConfig(
        system_instruction="You are a creative poet who loves technology",
        temperature=0.9,
    ),
)
```
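
The best practices below recommend streaming for long responses. A minimal sketch of the streaming variant, assuming the same client setup as above and a valid GOOGLE_API_KEY in the environment:

```python
from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# generate_content_stream yields partial chunks as they are produced,
# so output can be shown to the user before the full response is done
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Explain quantum computing in simple terms",
):
    print(chunk.text, end="", flush=True)
```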

Multimodal Capabilities

Image Understanding

```python
from google import genai
from google.genai import types

client = genai.Client()

# From URL
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_uri(file_uri="https://example.com/image.jpg", mime_type="image/jpeg"),
        "Describe what you see in this image",
    ],
)

# From local file
with open("photo.jpg", "rb") as f:
    image_data = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_data, mime_type="image/jpeg"),
        "What objects are in this image?",
    ],
)
```

Video and Audio

```python
# Upload video file
video_file = client.files.upload(file="video.mp4")

# Analyze video
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[video_file, "Summarize what happens in this video"],
)

# Audio transcription and analysis
audio_file = client.files.upload(file="recording.mp3")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[audio_file, "Transcribe and summarize this audio"],
)
```

PDF Document Analysis

```python
# Upload and analyze a PDF
pdf_file = client.files.upload(file="report.pdf")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[pdf_file, "Extract the key findings from this report"],
)
```

Function Calling

```python
from google.genai import types

# Define tools
tools = [
    types.Tool(function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a location",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "location": types.Schema(type="STRING", description="City name"),
                    "unit": types.Schema(type="STRING", enum=["celsius", "fahrenheit"]),
                },
                required=["location"],
            ),
        ),
    ]),
]

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(tools=tools),
)

# Process function calls in the response
for part in response.candidates[0].content.parts:
    if part.function_call:
        name = part.function_call.name
        args = dict(part.function_call.args)
        print(f"Call: {name}({args})")
```
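
Extracting the call is only half of the loop: the named function still has to be executed locally and its result sent back to the model. A minimal dispatch sketch — the body of get_weather and the HANDLERS registry are hypothetical stand-ins:

```python
# Hypothetical local implementation of the declared get_weather tool
def get_weather(location: str, unit: str = "celsius") -> dict:
    # A real app would query a weather service here
    return {"location": location, "temperature": 22, "unit": unit}

# Registry mapping a function_call.name to a local callable
HANDLERS = {"get_weather": get_weather}

def dispatch(name: str, args: dict) -> dict:
    """Invoke the local function a Gemini function_call refers to."""
    return HANDLERS[name](**args)

result = dispatch("get_weather", {"location": "Tokyo"})
print(result)  # {'location': 'Tokyo', 'temperature': 22, 'unit': 'celsius'}
```

The result can then be returned to the model as a function-response part so it can compose a final natural-language answer.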

Structured Output

```python
from typing import List

from pydantic import BaseModel
from google.genai import types

class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    prep_time_minutes: int
    difficulty: str

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Give me a recipe for chocolate cake",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)
recipe = Recipe.model_validate_json(response.text)
```

Model Comparison

| Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|
| Gemini 2.0 Flash | 1M | Fast | Low | General purpose, multimodal |
| Gemini 2.0 Pro | 2M | Medium | Medium | Complex reasoning |
| Gemini 1.5 Flash | 1M | Fastest | Lowest | High-throughput tasks |
| Gemini 1.5 Pro | 2M | Medium | Medium | Long document analysis |
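
The table can be encoded as a simple lookup when model choice is made at runtime. The task keys and the fallback default below are illustrative, and the exact model IDs accepted by the API may differ by release:

```python
# Illustrative mapping from task type to a model ID, following the table above
MODEL_FOR_TASK = {
    "general": "gemini-2.0-flash",
    "complex-reasoning": "gemini-2.0-pro",
    "high-throughput": "gemini-1.5-flash",
    "long-documents": "gemini-1.5-pro",
}

def pick_model(task: str) -> str:
    """Return a model ID for the task, defaulting to Gemini 2.0 Flash."""
    return MODEL_FOR_TASK.get(task, "gemini-2.0-flash")

print(pick_model("complex-reasoning"))  # gemini-2.0-pro
```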

Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| temperature | 1.0 | Sampling randomness (0-2) |
| top_p | 0.95 | Nucleus sampling threshold |
| top_k | 40 | Top-k sampling |
| max_output_tokens | 8192 | Maximum response length |
| stop_sequences | [] | Stop generation strings |
| candidate_count | 1 | Number of response candidates |
| response_mime_type | text/plain | Output format (JSON, etc.) |
| safety_settings | Default | Content safety thresholds |
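
All of the parameters above are set on GenerateContentConfig. A sketch combining several of them — the specific values are illustrative, not recommendations:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set

config = types.GenerateContentConfig(
    temperature=0.7,          # lower than the 1.0 default for more focused output
    top_p=0.95,
    top_k=40,
    max_output_tokens=1024,
    stop_sequences=["END"],
    candidate_count=1,
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize the theory of relativity in two paragraphs",
    config=config,
)
```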

Best Practices

  1. Use Gemini 2.0 Flash as default — Best speed/quality ratio for most tasks
  2. Leverage multimodal natively — No need for separate vision models
  3. Use structured output — response_schema with Pydantic for reliable JSON
  4. Set system instructions — Define persona and constraints upfront
  5. Use streaming for long responses — generate_content_stream for real-time output
  6. Upload large files first — Use client.files.upload() for files >20MB
  7. Enable Google Search grounding — For tasks needing current information
  8. Handle safety filters — Check response.prompt_feedback for filtered responses
  9. Use caching for repeated contexts — Context caching reduces cost for long system prompts
  10. Monitor usage — Track token usage for cost management
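
Practice 7 mentions Google Search grounding. With the google-genai SDK this is enabled by passing the google_search tool in the request config — a sketch, assuming a Gemini 2.0 model and a valid API key:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set

# Ground the response in current Google Search results
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Who won the most recent Nobel Prize in Physics?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```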

Troubleshooting

Response blocked by safety filters

```python
# Adjust safety settings
from google.genai import types

config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category="HARM_CATEGORY_HARASSMENT",
            threshold="BLOCK_ONLY_HIGH",
        ),
    ],
)
```

File upload fails

```python
import time

# Check file size (<2GB) and supported formats,
# then wait for server-side processing to finish
file = client.files.upload(file="large_video.mp4")
while file.state.name == "PROCESSING":
    time.sleep(5)
    file = client.files.get(name=file.name)
```

Context window exceeded

```python
# Count tokens before sending
count = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents=long_text,
)
print(f"Token count: {count.total_tokens}")
```
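
When the count exceeds the model's window, the usual fix is to truncate or chunk the input and process the pieces separately. A naive character-based sketch — split_into_chunks is a hypothetical helper, and production code would re-check each chunk with count_tokens rather than rely on a character budget:

```python
def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = split_into_chunks("a" * 25, 10)
print([len(c) for c in chunks])  # [10, 10, 5]
```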