
Ultimate Senior Computer Vision

Enterprise-grade skill for world-class computer vision. Includes structured workflows, validation checks, and reusable patterns for development.

Skill · Cliptics · development · v1.0.0 · MIT


A production-grade skill for senior computer vision engineers covering model training pipelines, inference optimization, image processing, video analysis, and deployment of CV systems at scale with modern frameworks.

When to Use This Skill

Choose this skill when:

  • Building image classification, object detection, or segmentation pipelines
  • Optimizing CV model inference for production latency requirements
  • Implementing real-time video processing with OpenCV and deep learning
  • Designing data augmentation and annotation workflows for training
  • Deploying CV models with TensorRT, ONNX Runtime, or CoreML

Consider alternatives when:

  • Working on NLP/text models → use an NLP engineering skill
  • Need general ML infrastructure → use an ML platform skill
  • Building a simple image resizer → use an image processing skill
  • Working on generative AI images → use a generative AI skill

Quick Start

```bash
# Set up CV development environment
pip install torch torchvision opencv-python-headless
pip install ultralytics        # YOLO models
pip install albumentations     # data augmentation
pip install onnxruntime-gpu    # inference optimization
```
```python
# Production object detection pipeline
from ultralytics import YOLO
import cv2
import numpy as np

class ObjectDetector:
    def __init__(self, model_path: str, conf_threshold: float = 0.5):
        self.model = YOLO(model_path)
        self.conf_threshold = conf_threshold

    def detect(self, image: np.ndarray) -> list[dict]:
        results = self.model(image, conf=self.conf_threshold, verbose=False)
        detections = []
        for r in results:
            for box in r.boxes:
                detections.append({
                    'class': r.names[int(box.cls)],
                    'confidence': float(box.conf),
                    'bbox': box.xyxy[0].tolist(),  # [x1, y1, x2, y2]
                })
        return detections

    def detect_video(self, video_path: str):
        cap = cv2.VideoCapture(video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            yield frame, self.detect(frame)
        cap.release()
```

Core Concepts

CV Task Selection Guide

| Task | Model Family | Output | Use Case |
|---|---|---|---|
| Classification | ResNet, EfficientNet, ViT | Class label + confidence | Product categorization |
| Object Detection | YOLOv8, DETR, Faster R-CNN | Bounding boxes + labels | Inventory counting |
| Segmentation | SAM, Mask R-CNN, U-Net | Pixel-level masks | Medical imaging |
| Pose Estimation | MediaPipe, HRNet | Keypoint coordinates | Fitness tracking |
| OCR | PaddleOCR, TrOCR | Text strings + positions | Document processing |
| Tracking | ByteTrack, DeepSORT | Object IDs across frames | Video surveillance |
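The tracking row pairs detectors with ID-assignment logic. As a minimal, dependency-free sketch of the core idea (this is greedy IoU matching, far simpler than ByteTrack or DeepSORT; the class name and threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

class GreedyIoUTracker:
    """Carries object IDs across frames by matching each new detection
    to the highest-overlap box from the previous frame."""
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track id -> last seen bbox
        self.next_id = 0

    def update(self, boxes):
        assigned = {}
        unmatched = set(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid in unmatched:
                score = iou(box, self.tracks[tid])
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = self.next_id   # no overlap: start a new track
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            assigned[best_id] = box
        self.tracks = assigned
        return assigned
```

Production trackers add motion models and appearance features to survive occlusion, which is why ByteTrack or DeepSORT are listed above.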

Data Augmentation Pipeline

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    # Note: albumentations >= 1.4 takes size=(640, 640) instead of positional height/width
    A.RandomResizedCrop(640, 640, scale=(0.5, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    A.GaussNoise(var_limit=(10, 50), p=0.3),
    A.GaussianBlur(blur_limit=(3, 7), p=0.2),
    A.RandomShadow(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

# Validation: only resize and normalize, no augmentation
val_transform = A.Compose([
    A.Resize(640, 640),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
```

Model Export and Optimization

```python
# Export PyTorch model to ONNX for production inference
import cv2
import numpy as np
import torch
import onnxruntime as ort

def export_to_onnx(model, input_shape, output_path):
    model.eval()
    dummy = torch.randn(1, *input_shape)
    torch.onnx.export(
        model, dummy, output_path,
        input_names=['image'],
        output_names=['detections'],
        dynamic_axes={'image': {0: 'batch'}, 'detections': {0: 'batch'}},
        opset_version=17,
    )

# Optimized ONNX Runtime inference
class ONNXDetector:
    def __init__(self, model_path: str):
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        self.session = ort.InferenceSession(model_path, providers=providers)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image: np.ndarray) -> np.ndarray:
        blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True)
        return self.session.run(None, {self.input_name: blob})[0]
```

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| modelFramework | string | 'ultralytics' | Framework: ultralytics, detectron2, or mmdet |
| inferenceBackend | string | 'onnxruntime' | Runtime: onnxruntime, tensorrt, or torch |
| inputResolution | number | 640 | Model input resolution (pixels) |
| confidenceThreshold | number | 0.5 | Minimum detection confidence |
| nmsThreshold | number | 0.45 | Non-maximum suppression IoU threshold |
| batchSize | number | 1 | Inference batch size |
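To make the `nmsThreshold` parameter concrete, here is a pure-Python sketch of greedy non-maximum suppression (illustrative only; in production you would use an optimized implementation such as OpenCV's `cv2.dnn.NMSBoxes` or the one built into your detection framework):

```python
def _iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    any remaining box that overlaps it above iou_threshold.
    Returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if _iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Raising `nmsThreshold` keeps more overlapping boxes (useful for crowded scenes); lowering it suppresses duplicates more aggressively.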

Best Practices

  1. Start with pretrained models and fine-tune — Training from scratch requires massive datasets and compute. Start with COCO or ImageNet pretrained weights and fine-tune on your domain-specific data. Even 100-500 labeled images can yield strong results with transfer learning.

  2. Profile inference end-to-end, not just model forward pass — Preprocessing (resize, normalize) and postprocessing (NMS, coordinate transform) often dominate total latency. Optimize the full pipeline, not just the model. Batch preprocessing on GPU when possible.

  3. Version datasets alongside models — Every model checkpoint should reference the exact dataset version, augmentation config, and training hyperparameters used to produce it. Use DVC or a similar tool for dataset versioning.

  4. Test with edge cases and adversarial conditions — Models perform well on test sets that match training distribution. Test explicitly with: poor lighting, motion blur, occlusion, unusual angles, and out-of-distribution objects. These conditions dominate real-world failure modes.

  5. Monitor model performance in production with ground truth sampling — Aggregate metrics (accuracy, mAP) hide distribution shifts. Periodically sample predictions, compare against human labels, and track performance per class and per condition over time.
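Point 5 can be sketched without any ML dependencies: sample (prediction, human label) pairs and track accuracy per class, so a failing class is visible even when the aggregate number looks fine (class names below are placeholders):

```python
from collections import defaultdict

class PerClassMonitor:
    """Tracks accuracy per predicted class from sampled ground-truth labels."""
    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, predicted: str, human_label: str):
        self.total[predicted] += 1
        if predicted == human_label:
            self.correct[predicted] += 1

    def accuracy(self):
        return {cls: self.correct[cls] / self.total[cls]
                for cls in self.total}

# Aggregate accuracy here is 75%, but per-class tracking reveals
# that 'pallet' predictions are only right half the time
monitor = PerClassMonitor()
samples = [('box', 'box'), ('box', 'box'),
           ('pallet', 'box'), ('pallet', 'pallet')]
for pred, truth in samples:
    monitor.record(pred, truth)
```

The same pattern extends to per-condition buckets (lighting, camera, time of day) by keying on a condition tag instead of the class name.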

Common Issues

Model accuracy drops in production vs test set — Training and production data distributions differ. Analyze failure cases by category: lighting, angle, resolution, object size. Add targeted augmentations or collect more training data for underperforming conditions.

Inference too slow for real-time video — Export to ONNX or TensorRT for 2-5x speedup over PyTorch. Reduce input resolution (640→320) if accuracy permits. Use frame skipping (process every 3rd frame) with tracking to interpolate between detections.
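The frame-skipping idea can be sketched with a stub detector (`detect_fn` and the stride value are assumptions; in a real pipeline the skipped frames would be filled in by a tracker rather than by reusing the last boxes verbatim):

```python
def detect_every_nth(frames, detect_fn, stride=3):
    """Run the detector only on every `stride`-th frame and carry the
    last detections forward for the skipped frames."""
    last = []
    for i, frame in enumerate(frames):
        if i % stride == 0:
            last = detect_fn(frame)   # expensive call, 1 in `stride` frames
        yield frame, last
```

With `stride=3` the detector cost drops to roughly a third, at the price of detections lagging up to two frames behind.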

GPU memory exhaustion during training — Reduce batch size, use gradient accumulation, or enable mixed precision training (torch.cuda.amp). For very large models, use gradient checkpointing to trade compute for memory. Monitor GPU memory with torch.cuda.memory_summary().
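Gradient accumulation works because averaging micro-batch gradients reproduces the full-batch gradient, which is why each micro-batch loss is scaled by the number of accumulation steps before the optimizer step. A dependency-free sketch of that arithmetic (the toy squared-error loss and data are illustrative, not a training loop):

```python
def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * x * (w * x - y)

def full_batch_grad(w, data):
    """Gradient averaged over the whole batch in one pass."""
    return sum(grad(w, x, y) for x, y in data) / len(data)

def accumulated_grad(w, data, micro_batch_size):
    """Accumulate over micro-batches, scaling each by 1/num_steps,
    exactly as loss / accum_steps does before .backward()."""
    micro = [data[i:i + micro_batch_size]
             for i in range(0, len(data), micro_batch_size)]
    accum = 0.0
    for batch in micro:
        batch_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
        accum += batch_grad / len(micro)
    return accum
```

The two functions agree (for equal-sized micro-batches), so accumulation trades memory for extra forward/backward passes without changing the update direction.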
