AI Product Toolkit
Production-ready guide for shipping LLM-powered features — covering prompt engineering for products, cost optimization, safety systems, evaluation frameworks, and the gap between demos and production.
When to Use
Use this toolkit when:
- Adding AI/LLM features to an existing product
- Building an AI-native application from scratch
- Optimizing LLM costs for production scale
- Building evaluation and safety systems for AI features
Not for:
- Pure ML research without product context
- Fine-tuning or training models (see post-training tools)
- Non-LLM AI (computer vision, recommendation systems)
Quick Start
Production LLM Integration
```python
import anthropic
from tenacity import retry, stop_after_attempt, wait_exponential

client = anthropic.Anthropic()

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def generate_response(user_input, context):
    """Production-grade LLM call with retry and error handling."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="You are a helpful product assistant for [product name].",
        messages=[
            {"role": "user", "content": f"Context: {context}\n\nUser: {user_input}"}
        ],
    )
    return response.content[0].text

# With input validation
def handle_user_query(user_input):
    if len(user_input) > 10000:
        return "Please keep your message under 10,000 characters."
    if not user_input.strip():
        return "Please enter a message."
    response = generate_response(user_input, get_context(user_input))
    return validate_output(response)
```
Cost Optimization
```python
class LLMCostOptimizer:
    MODEL_TIERS = {
        "simple": {"model": "claude-haiku-4-5-20251001", "cost_per_1k": 0.00025},
        "moderate": {"model": "claude-sonnet-4-20250514", "cost_per_1k": 0.003},
        "complex": {"model": "claude-opus-4-20250514", "cost_per_1k": 0.015},
    }

    def route_query(self, query, complexity_score):
        """Route to the cheapest model that can handle the task."""
        if complexity_score < 0.3:
            return self.MODEL_TIERS["simple"]
        elif complexity_score < 0.7:
            return self.MODEL_TIERS["moderate"]
        else:
            return self.MODEL_TIERS["complex"]
```
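The router assumes a `complexity_score` is already available, but doesn't say where it comes from. One hedged option — a hypothetical heuristic, not part of the toolkit — scores queries by length and reasoning-style keywords; a production system would more likely use a small classifier:

```python
def estimate_complexity(query: str) -> float:
    """Heuristic complexity score in [0, 1] (hypothetical sketch).

    Longer queries and reasoning-style cue phrases push the score up.
    """
    q = query.lower()
    score = min(len(query) / 1000, 0.4)  # length contributes up to 0.4
    hard_cues = ("analyze", "compare", "explain why", "step by step", "design")
    score += 0.2 * sum(cue in q for cue in hard_cues)  # each cue adds 0.2
    return min(score, 1.0)
```

With the thresholds above, a short factual question stays on the cheap tier, while a multi-cue analytical request routes to the complex tier.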
Core Concepts
Demo vs Production Gap
| Aspect | Demo | Production |
|---|---|---|
| Input quality | Clean, expected | Messy, adversarial, edge cases |
| Latency | Doesn't matter | < 2 seconds expected |
| Cost | Negligible | $$$$ at scale |
| Errors | Manual retry | Graceful degradation required |
| Safety | Trusted users | Adversarial users |
| Monitoring | None | Logs, alerts, dashboards |
Evaluation Framework
| Metric | Measurement | Target |
|---|---|---|
| Task completion | User successfully completes intent | > 85% |
| Factual accuracy | Correct information in response | > 95% |
| Latency P95 | 95th percentile response time | < 3s |
| Cost per query | Average API cost | < $0.01 |
| Safety rate | Responses passing safety checks | > 99.9% |
| User satisfaction | Thumbs up/down ratio | > 80% positive |
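The metrics in the table can be computed from logged interactions. A minimal sketch, assuming a hypothetical log schema where each record carries `completed`, `latency_ms`, and `cost_usd`:

```python
from statistics import quantiles

def latency_p95(latencies_ms):
    """95th-percentile latency from per-request timings (needs >= 2 samples)."""
    return quantiles(latencies_ms, n=100)[94]  # 95th percentile cut point

def score_eval_run(records):
    """Aggregate logged interactions into evaluation metrics.

    `records` is a hypothetical schema: dicts with `completed` (bool),
    `latency_ms` (float), and `cost_usd` (float) per query.
    """
    n = len(records)
    return {
        "task_completion": sum(r["completed"] for r in records) / n,
        "latency_p95_ms": latency_p95([r["latency_ms"] for r in records]),
        "cost_per_query": sum(r["cost_usd"] for r in records) / n,
    }
```

Factual accuracy and safety rate need labeled or rule-checked data and don't reduce to simple aggregates, which is why they're omitted here.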
Configuration
| Parameter | Description |
|---|---|
| primary_model | Default LLM model |
| fallback_model | Backup if primary fails |
| max_tokens | Response length limit |
| temperature | Creativity vs determinism |
| timeout_ms | Maximum response time |
| retry_count | Attempts before fallback |
| rate_limit | Requests per minute per user |
| cost_budget | Monthly spending limit |
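As a concrete instance, the parameters above might be collected into a single config object. The values here are illustrative defaults, not recommendations from the toolkit:

```python
# Hypothetical configuration instantiating the parameters above.
LLM_CONFIG = {
    "primary_model": "claude-sonnet-4-20250514",
    "fallback_model": "claude-haiku-4-5-20251001",
    "max_tokens": 1024,
    "temperature": 0.3,    # lower = more deterministic
    "timeout_ms": 10_000,
    "retry_count": 3,
    "rate_limit": 60,      # requests per minute per user
    "cost_budget": 500.0,  # USD per month
}
```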
Best Practices
- Ship with the simplest model that works — upgrade only when users need more capability
- Build evaluation before features — you can't improve what you can't measure
- Add safety from day one — retrofitting safety is harder and more expensive
- Use model tiering — route simple queries to cheaper models
- Design for graceful degradation — show a helpful fallback when the LLM fails
- Log everything — prompts, responses, latency, and user feedback for continuous improvement
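The graceful-degradation practice above can be sketched as a thin wrapper; this is illustrative, and a real system would also log the exception and emit a metric in the handler:

```python
def with_fallback(primary_call, user_input, fallback_message):
    """Graceful-degradation wrapper (illustrative sketch).

    Tries the LLM call; on any failure, returns a static fallback
    instead of surfacing an error to the user.
    """
    try:
        return primary_call(user_input)
    except Exception:
        # In production: log the exception and increment a failure metric here.
        return fallback_message
```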
Common Issues
LLM costs are too high: Implement model tiering (route 60-70% of queries to cheaper models). Add response caching for repeated queries. Reduce max_tokens where possible.
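The caching suggestion above could look like the following exact-match sketch, keyed on a hash of the normalized prompt. Production caches usually add a TTL and often semantic (embedding-based) matching, neither of which is shown here:

```python
import hashlib

class ResponseCache:
    """Exact-match response cache (illustrative sketch)."""

    def __init__(self):
        self._store = {}

    def _key(self, prompt: str) -> str:
        # Normalize so trivially different phrasings of the same prompt collide.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get_or_call(self, prompt, generate):
        """Return a cached response, calling `generate(prompt)` only on a miss."""
        key = self._key(prompt)
        if key not in self._store:
            self._store[key] = generate(prompt)
        return self._store[key]
```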
Latency is too high for UX: Use streaming responses. Switch to faster models (Haiku for simple tasks). Pre-compute responses for predictable queries.
Users find inaccurate responses: Add retrieval (RAG) to ground responses in your data. Implement output validation. Add confidence scoring and show caveats for low-confidence answers.
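The output-validation step mentioned above (and the `validate_output` hook in the Quick Start) might look like this minimal pass — a hypothetical sketch; real validators also check claims against retrieved sources:

```python
def validate_output(response: str, banned_phrases=("as an ai language model",)) -> str:
    """Minimal output-validation pass (illustrative sketch).

    Rejects empty responses, truncates oversized ones at a word
    boundary, and replaces responses containing boilerplate phrases.
    """
    if not response.strip():
        return "Sorry, I couldn't generate a response. Please try again."
    if len(response) > 8000:
        response = response[:8000].rsplit(" ", 1)[0] + "..."
    if any(p in response.lower() for p in banned_phrases):
        return "Sorry, I couldn't generate a useful response. Please rephrase."
    return response
```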