Ultimate Senior Computer Vision
A production-grade skill for senior computer vision engineers covering model training pipelines, inference optimization, image processing, video analysis, and deployment of CV systems at scale with modern frameworks.
When to Use This Skill
Choose this skill when:
- Building image classification, object detection, or segmentation pipelines
- Optimizing CV model inference for production latency requirements
- Implementing real-time video processing with OpenCV and deep learning
- Designing data augmentation and annotation workflows for training
- Deploying CV models with TensorRT, ONNX Runtime, or CoreML
Consider alternatives when:
- Working on NLP/text models → use an NLP engineering skill
- Need general ML infrastructure → use an ML platform skill
- Building a simple image resizer → use an image processing skill
- Working on generative AI images → use a generative AI skill
Quick Start
```shell
# Set up CV development environment
pip install torch torchvision opencv-python-headless
pip install ultralytics       # YOLO models
pip install albumentations    # data augmentation
pip install onnxruntime-gpu   # inference optimization
```
```python
# Production object detection pipeline
from ultralytics import YOLO
import cv2
import numpy as np


class ObjectDetector:
    def __init__(self, model_path: str, conf_threshold: float = 0.5):
        self.model = YOLO(model_path)
        self.conf_threshold = conf_threshold

    def detect(self, image: np.ndarray) -> list[dict]:
        results = self.model(image, conf=self.conf_threshold, verbose=False)
        detections = []
        for r in results:
            for box in r.boxes:
                detections.append({
                    'class': r.names[int(box.cls)],
                    'confidence': float(box.conf),
                    'bbox': box.xyxy[0].tolist(),  # [x1, y1, x2, y2]
                })
        return detections

    def detect_video(self, video_path: str):
        cap = cv2.VideoCapture(video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            yield frame, self.detect(frame)
        cap.release()
```
Core Concepts
CV Task Selection Guide
| Task | Model Family | Output | Use Case |
|---|---|---|---|
| Classification | ResNet, EfficientNet, ViT | Class label + confidence | Product categorization |
| Object Detection | YOLOv8, DETR, Faster R-CNN | Bounding boxes + labels | Inventory counting |
| Segmentation | SAM, Mask R-CNN, U-Net | Pixel-level masks | Medical imaging |
| Pose Estimation | MediaPipe, HRNet | Keypoint coordinates | Fitness tracking |
| OCR | PaddleOCR, TrOCR | Text strings + positions | Document processing |
| Tracking | ByteTrack, DeepSORT | Object IDs across frames | Video surveillance |
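The detection and tracking rows above share one postprocessing step: non-maximum suppression, which drops lower-scoring boxes that overlap an already-kept box beyond an IoU threshold. A minimal greedy NMS sketch in NumPy (the function name and the `[x1, y1, x2, y2]` box convention are assumptions for illustration):

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.45) -> list[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Survivors are candidates overlapping less than the threshold
        order = order[1:][iou < iou_threshold]
    return keep
```

Production runtimes (ONNX Runtime, TensorRT) usually fuse NMS into the graph; a reference implementation like this is mainly useful for testing and for debugging threshold choices.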
Data Augmentation Pipeline
```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.RandomResizedCrop(640, 640, scale=(0.5, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    A.GaussNoise(var_limit=(10, 50), p=0.3),
    A.GaussianBlur(blur_limit=(3, 7), p=0.2),
    A.RandomShadow(p=0.2),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

# Validation: only resize and normalize — no augmentation
val_transform = A.Compose([
    A.Resize(640, 640),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))
```
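For reference, the `A.Normalize` step above scales pixels to [0, 1] and then standardizes each channel with the ImageNet statistics. A hand-rolled NumPy sketch of the same arithmetic (an illustration of what the transform computes, not the library's exact implementation):

```python
import numpy as np

# ImageNet channel statistics, as used in the Albumentations pipeline above
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an HWC uint8 RGB image the way A.Normalize does."""
    scaled = image.astype(np.float32) / 255.0        # pixels to [0, 1]
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD   # per-channel standardize
```

Keeping these constants in one place matters: the exact same mean and std must be applied at inference time, or accuracy silently degrades.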
Model Export and Optimization
```python
# Export PyTorch model to ONNX for production inference
import cv2
import numpy as np
import torch
import onnxruntime as ort


def export_to_onnx(model, input_shape, output_path):
    model.eval()
    dummy = torch.randn(1, *input_shape)
    torch.onnx.export(
        model, dummy, output_path,
        input_names=['image'],
        output_names=['detections'],
        dynamic_axes={'image': {0: 'batch'}, 'detections': {0: 'batch'}},
        opset_version=17,
    )


# Optimized ONNX Runtime inference
class ONNXDetector:
    def __init__(self, model_path: str):
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        self.session = ort.InferenceSession(model_path, providers=providers)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, image: np.ndarray) -> np.ndarray:
        blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True)
        return self.session.run(None, {self.input_name: blob})[0]
```
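`cv2.dnn.blobFromImage` in `predict` hides several steps that matter when debugging ONNX input mismatches: resize, BGR-to-RGB swap, scaling, and the HWC-to-NCHW transpose. A rough NumPy sketch of the equivalent preprocessing (nearest-neighbor resize stands in for OpenCV's bilinear default, so pixel values will differ slightly):

```python
import numpy as np


def make_blob(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Approximate cv2.dnn.blobFromImage(image, 1/255, (size, size), swapRB=True)
    for an HWC uint8 BGR image. Nearest-neighbor resize keeps this self-contained;
    OpenCV defaults to bilinear interpolation.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols]      # (size, size, 3) nearest-neighbor
    rgb = resized[..., ::-1]                  # swapRB: BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0   # scalefactor = 1/255
    return scaled.transpose(2, 0, 1)[None]    # HWC -> NCHW, shape (1, 3, size, size)
```

When an exported model scores worse than its PyTorch original, comparing the blob against the training preprocessing step by step is usually the fastest diagnosis.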
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| modelFramework | string | 'ultralytics' | Framework: ultralytics, detectron2, or mmdet |
| inferenceBackend | string | 'onnxruntime' | Runtime: onnxruntime, tensorrt, or torch |
| inputResolution | number | 640 | Model input resolution (pixels) |
| confidenceThreshold | number | 0.5 | Minimum detection confidence |
| nmsThreshold | number | 0.45 | Non-maximum suppression IoU threshold |
| batchSize | number | 1 | Inference batch size |
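One way to carry these parameters through a codebase is a validated dataclass; a minimal sketch (field names mirror the table, while the validation ranges and the multiple-of-32 check are assumptions typical of YOLO-style backbones):

```python
from dataclasses import dataclass


@dataclass
class DetectorConfig:
    model_framework: str = 'ultralytics'    # ultralytics | detectron2 | mmdet
    inference_backend: str = 'onnxruntime'  # onnxruntime | tensorrt | torch
    input_resolution: int = 640             # pixels
    confidence_threshold: float = 0.5
    nms_threshold: float = 0.45
    batch_size: int = 1

    def __post_init__(self):
        # Fail fast on invalid settings instead of producing silent garbage
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError('confidence_threshold must be in [0, 1]')
        if not 0.0 <= self.nms_threshold <= 1.0:
            raise ValueError('nms_threshold must be in [0, 1]')
        if self.input_resolution % 32 != 0:
            raise ValueError('input_resolution should be a multiple of 32')
```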
Best Practices
- **Start with pretrained models and fine-tune** — Training from scratch requires massive datasets and compute. Start with COCO or ImageNet pretrained weights and fine-tune on your domain-specific data. Even 100-500 labeled images can yield strong results with transfer learning.
- **Profile inference end-to-end, not just the model forward pass** — Preprocessing (resize, normalize) and postprocessing (NMS, coordinate transform) often dominate total latency. Optimize the full pipeline, not just the model. Batch preprocessing on the GPU when possible.
- **Version datasets alongside models** — Every model checkpoint should reference the exact dataset version, augmentation config, and training hyperparameters used to produce it. Use DVC or a similar tool for dataset versioning.
- **Test with edge cases and adversarial conditions** — Models perform well on test sets that match the training distribution. Test explicitly with poor lighting, motion blur, occlusion, unusual angles, and out-of-distribution objects; these conditions dominate real-world failure modes.
- **Monitor model performance in production with ground-truth sampling** — Aggregate metrics (accuracy, mAP) hide distribution shifts. Periodically sample predictions, compare them against human labels, and track performance per class and per condition over time.
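The end-to-end profiling practice above can be sketched with `time.perf_counter` around each pipeline stage; the stage functions here are hypothetical stand-ins for real preprocess, forward, and postprocess steps:

```python
import time
from typing import Callable


def profile_pipeline(stages: dict[str, Callable], data,
                     warmup: int = 2, runs: int = 10) -> dict[str, float]:
    """Return mean latency per stage in milliseconds, chaining stage outputs."""
    timings = {name: 0.0 for name in stages}
    for i in range(warmup + runs):
        x = data
        for name, fn in stages.items():
            start = time.perf_counter()
            x = fn(x)  # each stage consumes the previous stage's output
            if i >= warmup:  # discard warmup runs (allocator, cache, JIT effects)
                timings[name] += (time.perf_counter() - start) * 1000
    return {name: total / runs for name, total in timings.items()}


# Hypothetical stages; substitute real resize/normalize, model call, and NMS
stats = profile_pipeline({
    'preprocess': lambda x: x,
    'forward': lambda x: x,
    'postprocess': lambda x: x,
}, data=None)
```

Per-stage numbers like these often reveal that a "slow model" is actually a slow resize or a CPU-bound NMS.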
Common Issues
**Model accuracy drops in production vs. the test set** — Training and production data distributions differ. Analyze failure cases by category: lighting, angle, resolution, object size. Add targeted augmentations or collect more training data for the underperforming conditions.

**Inference too slow for real-time video** — Export to ONNX or TensorRT for a 2-5x speedup over eager PyTorch. Reduce input resolution (640→320) if accuracy permits. Use frame skipping (process every 3rd frame) with tracking to interpolate between detections.

**GPU memory exhaustion during training** — Reduce batch size, use gradient accumulation, or enable mixed-precision training (`torch.cuda.amp`). For very large models, use gradient checkpointing to trade compute for memory. Monitor GPU memory with `torch.cuda.memory_summary()`.
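The frame-skipping fix above (run the detector on every Nth frame and reuse the last result in between) can be sketched as a generator; `detect` is a stand-in for any per-frame detector, and a production system would interpolate the reused boxes with a tracker such as ByteTrack rather than repeat them verbatim:

```python
from typing import Callable, Iterable, Iterator


def detect_with_skipping(frames: Iterable, detect: Callable,
                         every_n: int = 3) -> Iterator[tuple]:
    """Run `detect` on every Nth frame; reuse the last detections otherwise."""
    last = []
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            last = detect(frame)  # fresh, expensive detection
        yield frame, last         # stale but cheap on in-between frames
```

With `every_n=3` this cuts detector invocations to a third, which is often the difference between 10 FPS and real time on edge hardware.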