Ultimate Senior Computer Vision
A production-grade skill for senior computer vision engineers covering model training pipelines, inference optimization, image processing, video analysis, and deployment of CV systems at scale with modern frameworks.
When to Use This Skill
Choose this skill when:
- Building image classification, object detection, or segmentation pipelines
- Optimizing CV model inference for production latency requirements
- Implementing real-time video processing with OpenCV and deep learning
- Designing data augmentation and annotation workflows for training
- Deploying CV models with TensorRT, ONNX Runtime, or CoreML
Consider alternatives when:
- Working on NLP/text models → use an NLP engineering skill
- Need general ML infrastructure → use an ML platform skill
- Building a simple image resizer → use an image processing skill
- Working on generative AI images → use a generative AI skill
Quick Start
```shell
# Set up CV development environment
pip install torch torchvision opencv-python-headless
pip install ultralytics       # YOLO models
pip install albumentations    # data augmentation
pip install onnxruntime-gpu   # inference optimization
```
```python
# Production object detection pipeline
from ultralytics import YOLO
import cv2
import numpy as np


class ObjectDetector:
    def __init__(self, model_path: str, conf_threshold: float = 0.5):
        self.model = YOLO(model_path)
        self.conf_threshold = conf_threshold

    def detect(self, image: np.ndarray) -> list[dict]:
        results = self.model(image, conf=self.conf_threshold, verbose=False)
        detections = []
        for r in results:
            for box in r.boxes:
                detections.append({
                    'class': r.names[int(box.cls)],
                    'confidence': float(box.conf),
                    'bbox': box.xyxy[0].tolist(),  # [x1, y1, x2, y2]
                })
        return detections

    def detect_video(self, video_path: str):
        cap = cv2.VideoCapture(video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            yield frame, self.detect(frame)
        cap.release()
```
Core Concepts
CV Task Selection Guide
| Task | Model Family | Output | Use Case |
|---|---|---|---|
| Classification | ResNet, EfficientNet, ViT | Class label + confidence | Product categorization |
| Object Detection | YOLOv8, DETR, Faster R-CNN | Bounding boxes + labels | Inventory counting |
| Segmentation | SAM, Mask R-CNN, U-Net | Pixel-level masks | Medical imaging |
| Pose Estimation | MediaPipe, HRNet | Keypoint coordinates | Fitness tracking |
| OCR | PaddleOCR, TrOCR | Text strings + positions | Document processing |
| Tracking | ByteTrack, DeepSORT | Object IDs across frames | Video surveillance |
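The detection and tracking rows above share one postprocessing step: non-maximum suppression, which drops lower-scoring boxes that overlap an already-kept box beyond an IoU threshold. A minimal greedy NMS sketch in NumPy (the function name and the `[x1, y1, x2, y2]` box convention are assumptions for illustration):

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.45) -> list[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Survivors are candidates overlapping less than the threshold
        order = order[1:][iou < iou_threshold]
    return keep
```

Production runtimes (ONNX Runtime, TensorRT) usually fuse NMS into the graph; a reference implementation like this is mainly useful for testing and for debugging threshold choices.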
Data Augmentation Pipeline
```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.RandomResizedCrop(640, 640, scale=(0.5, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    A.GaussNoise(var_limit=(10, 50), p=0.3),
    A.GaussianBlur(blur_limit=(3, 7), p=0.2),
    A.RandomShadow(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

# Validation: only resize and normalize — no augmentation
val_transform = A.Compose([
    A.Resize(640, 640),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
```
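For reference, the `A.Normalize` step above scales pixels to [0, 1] and then standardizes each channel with the ImageNet statistics. A hand-rolled NumPy sketch of the same arithmetic (an illustration of what the transform computes, not the library's exact implementation):

```python
import numpy as np

# ImageNet channel statistics, as used in the Albumentations pipeline above
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an HWC uint8 RGB image the way A.Normalize does."""
    scaled = image.astype(np.float32) / 255.0        # pixels to [0, 1]
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD   # per-channel standardize
```

Keeping these constants in one place matters: the exact same mean and std must be applied at inference time, or accuracy silently degrades.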
Model Export and Optimization
```python
# Export PyTorch model to ONNX for production inference
import cv2
import numpy as np
import torch
import onnxruntime as ort


def export_to_onnx(model, input_shape, output_path):
    model.eval()
    dummy = torch.randn(1, *input_shape)
    torch.onnx.export(
        model, dummy, output_path,
        input_names=['image'],
        output_names=['detections'],
        dynamic_axes={'image': {0: 'batch'}, 'detections': {0: 'batch'}},
        opset_version=17,
    )


# Optimized ONNX Runtime inference
class ONNXDetector:
    def __init__(self, model_path: str):
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        self.session = ort.InferenceSession(model_path, providers=providers)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image: np.ndarray) -> np.ndarray:
        blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True)
        return self.session.run(None, {self.input_name: blob})[0]
```
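`cv2.dnn.blobFromImage` in `predict` hides several steps that matter when debugging ONNX input mismatches: resize, BGR-to-RGB swap, scaling, and the HWC-to-NCHW transpose. A rough NumPy sketch of the equivalent preprocessing (nearest-neighbor resize stands in for OpenCV's bilinear default, so pixel values will differ slightly):

```python
import numpy as np


def make_blob(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Approximate cv2.dnn.blobFromImage(image, 1/255, (size, size), swapRB=True)
    for an HWC uint8 BGR image. Nearest-neighbor resize keeps this self-contained;
    OpenCV defaults to bilinear interpolation.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols]      # (size, size, 3) nearest-neighbor
    rgb = resized[..., ::-1]                  # swapRB: BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0   # scalefactor = 1/255
    return scaled.transpose(2, 0, 1)[None]    # HWC -> NCHW, shape (1, 3, size, size)
```

When an exported model scores worse than its PyTorch original, comparing the blob against the training preprocessing step by step is usually the fastest diagnosis.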
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| modelFramework | string | 'ultralytics' | Framework: ultralytics, detectron2, or mmdet |
| inferenceBackend | string | 'onnxruntime' | Runtime: onnxruntime, tensorrt, or torch |
| inputResolution | number | 640 | Model input resolution (pixels) |
| confidenceThreshold | number | 0.5 | Minimum detection confidence |
| nmsThreshold | number | 0.45 | Non-maximum suppression IoU threshold |
| batchSize | number | 1 | Inference batch size |
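One way to carry these parameters through a codebase is a validated dataclass; a minimal sketch (field names mirror the table, while the validation ranges and the multiple-of-32 check are assumptions typical of YOLO-style backbones):

```python
from dataclasses import dataclass


@dataclass
class DetectorConfig:
    model_framework: str = 'ultralytics'    # ultralytics | detectron2 | mmdet
    inference_backend: str = 'onnxruntime'  # onnxruntime | tensorrt | torch
    input_resolution: int = 640             # pixels
    confidence_threshold: float = 0.5
    nms_threshold: float = 0.45
    batch_size: int = 1

    def __post_init__(self):
        # Fail fast on invalid settings instead of producing silent garbage
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError('confidence_threshold must be in [0, 1]')
        if not 0.0 <= self.nms_threshold <= 1.0:
            raise ValueError('nms_threshold must be in [0, 1]')
        if self.input_resolution % 32 != 0:
            raise ValueError('input_resolution should be a multiple of 32')
```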
Best Practices
- **Start with pretrained models and fine-tune** — Training from scratch requires massive datasets and compute. Start with COCO or ImageNet pretrained weights and fine-tune on your domain-specific data. Even 100-500 labeled images can yield strong results with transfer learning.
- **Profile inference end-to-end, not just the model forward pass** — Preprocessing (resize, normalize) and postprocessing (NMS, coordinate transform) often dominate total latency. Optimize the full pipeline, not just the model. Batch preprocessing on the GPU when possible.
- **Version datasets alongside models** — Every model checkpoint should reference the exact dataset version, augmentation config, and training hyperparameters used to produce it. Use DVC or a similar tool for dataset versioning.
- **Test with edge cases and adversarial conditions** — Models perform well on test sets that match the training distribution. Test explicitly with poor lighting, motion blur, occlusion, unusual angles, and out-of-distribution objects; these conditions dominate real-world failure modes.
- **Monitor model performance in production with ground-truth sampling** — Aggregate metrics (accuracy, mAP) hide distribution shifts. Periodically sample predictions, compare them against human labels, and track performance per class and per condition over time.
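The end-to-end profiling practice above can be sketched with `time.perf_counter` around each pipeline stage; the stage functions here are hypothetical stand-ins for real preprocess, forward, and postprocess steps:

```python
import time
from typing import Callable


def profile_pipeline(stages: dict[str, Callable], data,
                     warmup: int = 2, runs: int = 10) -> dict[str, float]:
    """Return mean latency per stage in milliseconds, chaining stage outputs."""
    timings = {name: 0.0 for name in stages}
    for i in range(warmup + runs):
        x = data
        for name, fn in stages.items():
            start = time.perf_counter()
            x = fn(x)  # each stage consumes the previous stage's output
            if i >= warmup:  # discard warmup runs (allocator, cache, JIT effects)
                timings[name] += (time.perf_counter() - start) * 1000
    return {name: total / runs for name, total in timings.items()}


# Hypothetical stages; substitute real resize/normalize, model call, and NMS
stats = profile_pipeline({
    'preprocess': lambda x: x,
    'forward': lambda x: x,
    'postprocess': lambda x: x,
}, data=None)
```

Per-stage numbers like these often reveal that a "slow model" is actually a slow resize or a CPU-bound NMS.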
Common Issues
**Model accuracy drops in production vs. the test set** — Training and production data distributions differ. Analyze failure cases by category: lighting, angle, resolution, object size. Add targeted augmentations or collect more training data for the underperforming conditions.

**Inference too slow for real-time video** — Export to ONNX or TensorRT for a 2-5x speedup over eager PyTorch. Reduce input resolution (640→320) if accuracy permits. Use frame skipping (process every 3rd frame) with tracking to interpolate between detections.

**GPU memory exhaustion during training** — Reduce batch size, use gradient accumulation, or enable mixed-precision training (`torch.cuda.amp`). For very large models, use gradient checkpointing to trade compute for memory. Monitor GPU memory with `torch.cuda.memory_summary()`.
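The frame-skipping fix above (run the detector on every Nth frame and reuse the last result in between) can be sketched as a generator; `detect` is a stand-in for any per-frame detector, and a production system would interpolate the reused boxes with a tracker such as ByteTrack rather than repeat them verbatim:

```python
from typing import Callable, Iterable, Iterator


def detect_with_skipping(frames: Iterable, detect: Callable,
                         every_n: int = 3) -> Iterator[tuple]:
    """Run `detect` on every Nth frame; reuse the last detections otherwise."""
    last = []
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            last = detect(frame)  # fresh, expensive detection
        yield frame, last         # stale but cheap on in-between frames
```

With `every_n=3` this cuts detector invocations to a third, which is often the difference between 10 FPS and real time on edge hardware.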