
Comprehensive Research Engineer

Comprehensive skill designed for the uncompromising academic research engineer. Includes structured workflows, validation checks, and reusable patterns for AI research.

Skill · Cliptics · ai research · v1.0.0 · MIT


Senior research engineering methodology for bridging theoretical computer science and high-performance implementation — emphasizing scientific rigor, reproducible experiments, and production-quality research code.

When to Use

Apply this methodology when:

  • Implementing ML/AI research papers from scratch
  • Running reproducible experiments with statistical rigor
  • Building research prototypes that need to scale to production
  • Reviewing and critiquing research claims with empirical evidence

Use standard engineering practices when:

  • Building standard application features
  • Tasks without research or experimental components
  • Well-established patterns with known solutions

Quick Start

Experiment Template

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class ExperimentConfig:
    name: str
    model: str
    dataset: str
    learning_rate: float
    batch_size: int
    epochs: int
    seed: int = 42

    @property
    def experiment_id(self):
        # Content-hash the config so identical settings map to the same ID.
        config_str = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(config_str.encode()).hexdigest()[:12]


class Experiment:
    def __init__(self, config: ExperimentConfig):
        self.config = config
        self.output_dir = Path(f"experiments/{config.experiment_id}")
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self._save_config()

    def _save_config(self):
        with open(self.output_dir / "config.json", "w") as f:
            json.dump(asdict(self.config), f, indent=2)

    def run(self):
        self._set_seeds(self.config.seed)
        metrics = self._train()
        self._save_results(metrics)
        return metrics

    def _set_seeds(self, seed):
        import random

        import numpy as np
        import torch

        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)

    def _train(self):
        # Subclasses implement the actual training loop and return a metrics dict.
        raise NotImplementedError

    def _save_results(self, metrics):
        results = {
            "config": asdict(self.config),
            "metrics": metrics,
            "timestamp": datetime.now().isoformat(),
        }
        with open(self.output_dir / "results.json", "w") as f:
            json.dump(results, f, indent=2)
```
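One property of the template worth calling out: the experiment ID is a content hash of the configuration, so rerunning identical settings reuses the same directory while any change creates a fresh one. A minimal standalone sketch of that behavior (model and dataset names are illustrative):

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentConfig:
    name: str
    model: str
    dataset: str
    learning_rate: float
    batch_size: int
    epochs: int
    seed: int = 42


def experiment_id(config):
    # Sort keys so the hash is stable regardless of field insertion order.
    config_str = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(config_str.encode()).hexdigest()[:12]


a = ExperimentConfig("baseline", "resnet18", "cifar10", 1e-3, 128, 10)
b = ExperimentConfig("baseline", "resnet18", "cifar10", 1e-3, 128, 10)
c = ExperimentConfig("baseline", "resnet18", "cifar10", 3e-4, 128, 10)

assert experiment_id(a) == experiment_id(b)  # identical configs share a directory
assert experiment_id(a) != experiment_id(c)  # any change yields a new ID
```

This is why the template never overwrites results: a changed hyperparameter produces a different `experiment_id` and therefore a different output directory.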

Paper Implementation Checklist

```markdown
## Paper: {paper_title}

- [ ] Read paper 3 times (overview, details, critique)
- [ ] Identify key claims and expected results
- [ ] List all hyperparameters mentioned
- [ ] Find reference implementation (if exists)
- [ ] Implement core algorithm
- [ ] Reproduce Table 1 / Figure 1 results
- [ ] Run ablation studies
- [ ] Document deviations from paper
- [ ] Statistical significance testing (3+ seeds)
```

Core Concepts

Scientific Rigor Principles

| Principle | Implementation | Why It Matters |
| --- | --- | --- |
| Reproducibility | Fixed seeds, versioned data, logged configs | Others must replicate your results |
| Statistical validity | Multiple runs, confidence intervals | Single runs are noise |
| Fair comparison | Same compute budget, same data splits | Apples-to-apples only |
| Ablation | Change one variable at a time | Isolate what actually helps |
| Documentation | Log everything, explain decisions | Future you will forget |
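The "multiple runs, confidence intervals" principle can be made concrete with a small helper (a sketch using SciPy's t-distribution; the accuracy numbers are illustrative):

```python
import numpy as np
from scipy import stats


def mean_ci(scores, confidence=0.95):
    """Mean and two-sided t-based confidence interval over per-seed scores."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean
    half = sem * stats.t.ppf((1 + confidence) / 2, df=len(scores) - 1)
    return mean, mean - half, mean + half


# Accuracies from five seeds of the same configuration (illustrative numbers).
mean, lo, hi = mean_ci([0.912, 0.907, 0.915, 0.909, 0.911])
print(f"{mean:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```

Reporting the interval rather than a single number makes it immediately visible when two methods are statistically indistinguishable.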

Experiment Management

experiments/
  ├── {experiment_id}/
  │   ├── config.json          # Full configuration
  │   ├── results.json         # Metrics and outputs
  │   ├── logs/                # Training logs
  │   ├── checkpoints/         # Model checkpoints
  │   └── analysis/            # Post-hoc analysis
  ├── comparisons/
  │   └── {baseline_vs_method}.json
  └── paper_results/
      └── {table_or_figure}.json
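A small helper can aggregate this layout for cross-experiment comparison (a sketch; it assumes the `config.json`/`results.json` schema produced by the experiment template above):

```python
import json
from pathlib import Path


def collect_results(root="experiments"):
    """Gather every per-experiment results.json into one summary list."""
    rows = []
    for results_file in sorted(Path(root).glob("*/results.json")):
        data = json.loads(results_file.read_text())
        rows.append({
            "experiment_id": results_file.parent.name,  # directory name is the ID
            "config": data.get("config", {}),
            "metrics": data.get("metrics", {}),
        })
    return rows
```

From here the rows can be filtered by config fields and written into `comparisons/` or `paper_results/` without touching the original experiment directories.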

Statistical Testing

```python
import numpy as np
from scipy import stats


def compare_methods(baseline_scores, method_scores, alpha=0.05):
    """Statistical comparison of two methods."""
    t_stat, p_value = stats.ttest_ind(baseline_scores, method_scores)
    return {
        "baseline_mean": np.mean(baseline_scores),
        "baseline_std": np.std(baseline_scores),
        "method_mean": np.mean(method_scores),
        "method_std": np.std(method_scores),
        "t_statistic": t_stat,
        "p_value": p_value,
        "significant": p_value < alpha,
        # Glass's delta: mean difference scaled by the baseline's std.
        "effect_size": (np.mean(method_scores) - np.mean(baseline_scores))
        / np.std(baseline_scores),
    }
```
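One caveat worth noting: `scipy.stats.ttest_ind` assumes equal variances by default. When the baseline and the new method have very different run-to-run spread, Welch's variant (`equal_var=False`) is usually the safer choice. A sketch with synthetic scores (the means and spreads are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(0.90, 0.002, size=5)  # low-variance baseline
method = rng.normal(0.92, 0.020, size=5)    # high-variance method

student = stats.ttest_ind(baseline, method)                  # pooled variance
welch = stats.ttest_ind(baseline, method, equal_var=False)   # Welch's t-test
print(f"Student p={student.pvalue:.4f}  Welch p={welch.pvalue:.4f}")
```

With equal sample sizes the t statistics coincide, but Welch's corrected degrees of freedom change the p-value, which can flip a borderline significance call.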

Configuration

| Parameter | Description |
| --- | --- |
| seed | Random seed for reproducibility |
| num_runs | Runs per configuration (minimum 3) |
| confidence_level | Significance threshold for hypothesis tests (typically 0.05) |
| checkpoint_interval | Steps between model saves |
| log_interval | Steps between metric logging |
| wandb_project | Experiment tracking project |
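These parameters can live in a single JSON config stored alongside the experiment (a sketch; the values are illustrative, and `wandb_project` assumes a Weights & Biases setup):

```python
import json

config = {
    "seed": 42,
    "num_runs": 3,                # minimum for mean ± std reporting
    "confidence_level": 0.05,     # significance threshold for tests
    "checkpoint_interval": 1000,  # steps between model saves
    "log_interval": 100,          # steps between metric logging
    "wandb_project": "my-research-project",
}

print(json.dumps(config, indent=2, sort_keys=True))
```

Serializing with sorted keys keeps the file diff-friendly and, combined with the content-hash experiment ID, makes the configuration itself part of the experiment's identity.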

Best Practices

  1. Run every experiment at least 3 times with different seeds — report mean and standard deviation
  2. Log everything — configs, git commit, library versions, hardware specs
  3. Compare fairly — same compute budget, data splits, and preprocessing
  4. Ablate systematically — change one thing at a time to understand contributions
  5. Read the paper three times before implementing — overview, then details, then critique
  6. Version your datasets — model results are meaningless without knowing the exact data

Common Issues

Cannot reproduce paper results: Check for undocumented hyperparameters (warmup, gradient clipping, weight decay). Try the reference implementation if available. Contact authors — they may have errata or unreported details.

High variance across runs: Increase number of runs. Check for non-deterministic operations (dropout, data shuffling). Use larger evaluation sets.

Experiment tracking chaos: Use a structured directory layout. Never overwrite results — create new experiment IDs. Use tools like Weights & Biases or MLflow for tracking.
