
Pro Stable Workspace

Enterprise-grade skill for reproducible scientific computing workspaces. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT


Design and manage stable, reproducible computational environments for scientific computing and data analysis. This skill covers virtual environment management, dependency locking, containerization with Docker, CI/CD pipeline configuration, and workspace reproducibility across teams and compute platforms.

When to Use This Skill

Choose Pro Stable Workspace when you need to:

  • Create reproducible Python environments with locked dependencies for published research
  • Containerize analysis pipelines for consistent execution across local, cloud, and HPC systems
  • Set up CI/CD pipelines to automatically test and validate computational workflows
  • Manage multi-project workspaces with isolated, conflict-free dependency trees

Consider alternatives when:

  • You need package distribution and publishing (use setuptools or Poetry's publish features)
  • You need GPU cluster job scheduling (use Slurm or Kubernetes directly)
  • You need notebook-centric reproducibility (use Binder or Google Colab)

Quick Start

```bash
# Install environment management tools
pip install pipx
pipx install poetry
pipx install pipenv
```
```python
# workspace_setup.py — Automated workspace initialization
import subprocess
import json
from datetime import datetime
from pathlib import Path


def create_stable_workspace(project_name, python_version="3.11"):
    """Create a fully reproducible project workspace."""
    project_dir = Path(project_name)
    project_dir.mkdir(exist_ok=True)

    # Initialize with Poetry
    subprocess.run([
        "poetry", "init",
        "--name", project_name,
        "--python", f"^{python_version}",
        "--no-interaction",
    ], cwd=project_dir)

    # Create standard directories
    for subdir in ["src", "tests", "data", "notebooks", "configs", "outputs"]:
        (project_dir / subdir).mkdir(exist_ok=True)

    # Create .gitignore
    gitignore = """
__pycache__/
*.pyc
.venv/
dist/
*.egg-info/
data/raw/
outputs/
.env
*.lock
!poetry.lock
"""
    (project_dir / ".gitignore").write_text(gitignore.strip())

    # Create environment config with a creation timestamp
    env_config = {
        "project": project_name,
        "python_version": python_version,
        "created": datetime.now().isoformat(),
    }
    (project_dir / "configs" / "environment.json").write_text(
        json.dumps(env_config, indent=2)
    )

    print(f"Workspace '{project_name}' created successfully")
    print(f"  cd {project_name} && poetry install")


create_stable_workspace("my_analysis")
```

Core Concepts

Environment Tool Comparison

| Tool | Dependency Resolution | Lock File | Virtual Env | Best For |
|------|----------------------|-----------|-------------|----------|
| pip + venv | Basic | requirements.txt | Manual | Simple projects |
| Poetry | Advanced (SAT solver) | poetry.lock | Automatic | Python packages |
| Conda | Cross-language | conda-lock.yml | Automatic | Scientific computing |
| Pipenv | Good | Pipfile.lock | Automatic | Applications |
| uv | Very fast | uv.lock | Automatic | Speed-critical |
| Docker | N/A (containerized) | Dockerfile | Isolated OS | Full reproducibility |
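As a small illustration of the lock-file conventions in the table, a helper like the following (a hypothetical sketch using only the standard library, not part of the skill's API) could report which tool a project directory appears to use:

```python
from pathlib import Path

# Lock/config files in priority order, mapped to the tool from the table above
LOCK_FILE_TOOLS = [
    ("poetry.lock", "Poetry"),
    ("conda-lock.yml", "Conda"),
    ("Pipfile.lock", "Pipenv"),
    ("uv.lock", "uv"),
    ("Dockerfile", "Docker"),
    ("requirements.txt", "pip + venv"),
]


def detect_env_tool(project_dir="."):
    """Guess a project's environment tool from the lock files it contains."""
    root = Path(project_dir)
    for filename, tool in LOCK_FILE_TOOLS:
        if (root / filename).exists():
            return tool
    return None
```

The priority order matters: a Poetry project may also ship an exported requirements.txt, so the more specific lock files are checked first.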

Docker-Based Reproducible Pipeline

```dockerfile
# Dockerfile for reproducible analysis
FROM python:3.11-slim

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git curl && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Install Python dependencies (cached layer)
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && \
    poetry config virtualenvs.create false && \
    poetry install --only main --no-interaction

# Copy project code
COPY src/ ./src/
COPY configs/ ./configs/

# Run analysis
ENTRYPOINT ["python", "-m", "src.main"]
```
```python
# src/reproducibility.py — Capture full environment state
import subprocess
import platform
import sys
import json
from datetime import datetime
from pathlib import Path


def capture_environment_snapshot(output_path="environment_snapshot.json"):
    """Record complete environment state for reproducibility."""
    snapshot = {
        "timestamp": datetime.now().isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "architecture": platform.architecture()[0],
        "packages": {},
        "git_hash": None,
    }

    # Installed packages
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--format=json"],
        capture_output=True, text=True,
    )
    packages = json.loads(result.stdout)
    snapshot["packages"] = {p["name"]: p["version"] for p in packages}

    # Git state
    try:
        git_hash = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        snapshot["git_hash"] = git_hash
        git_dirty = subprocess.run(
            ["git", "status", "--porcelain"],
            capture_output=True, text=True,
        ).stdout.strip()
        snapshot["git_dirty"] = bool(git_dirty)
    except (subprocess.CalledProcessError, FileNotFoundError):
        pass

    Path(output_path).write_text(json.dumps(snapshot, indent=2))
    print(f"Environment snapshot saved: {len(snapshot['packages'])} packages")
    return snapshot


capture_environment_snapshot()
```
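To compare two snapshots produced by `capture_environment_snapshot()` — say, the one archived next to published results and one from a fresh machine — a small diff helper (a sketch, not part of the script above) can flag package drift:

```python
def diff_snapshots(old, new):
    """Return package-version differences between two environment snapshots.

    `old` and `new` are dicts with a "packages" mapping of name -> version,
    as written by capture_environment_snapshot().
    """
    old_pkgs, new_pkgs = old["packages"], new["packages"]
    return {
        # Packages present only in the new environment
        "added": sorted(set(new_pkgs) - set(old_pkgs)),
        # Packages that disappeared
        "removed": sorted(set(old_pkgs) - set(new_pkgs)),
        # Packages present in both but with different versions
        "changed": {
            name: (old_pkgs[name], new_pkgs[name])
            for name in set(old_pkgs) & set(new_pkgs)
            if old_pkgs[name] != new_pkgs[name]
        },
    }
```

An empty diff is strong evidence (though not proof — system libraries and hardware still vary) that two runs executed in equivalent environments.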

Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| python_version | Target Python version | "3.11" |
| env_tool | Environment manager (poetry, conda, pipenv) | "poetry" |
| lock_file | Generate lock file for exact versions | true |
| docker_base | Docker base image | "python:3.11-slim" |
| ci_platform | CI/CD service (github-actions, gitlab-ci) | "github-actions" |
| cache_deps | Cache dependencies in CI | true |
| git_hooks | Enable pre-commit hooks | true |
| snapshot_on_run | Auto-capture environment on execution | true |
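The parameters above could be consumed by a loader along these lines (a sketch; the `workspace.json` filename and the validation behavior are illustrative choices, and the defaults mirror the table):

```python
import json
from pathlib import Path

# Defaults mirroring the configuration table
DEFAULTS = {
    "python_version": "3.11",
    "env_tool": "poetry",
    "lock_file": True,
    "docker_base": "python:3.11-slim",
    "ci_platform": "github-actions",
    "cache_deps": True,
    "git_hooks": True,
    "snapshot_on_run": True,
}


def load_config(path="workspace.json"):
    """Merge a JSON config file over the defaults; reject unknown keys."""
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.exists():
        user = json.loads(config_path.read_text())
        unknown = set(user) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        config.update(user)
    return config
```

Rejecting unknown keys is a deliberate choice here: a typo like `env_tools` fails loudly instead of silently falling back to the default.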

Best Practices

  1. Always commit lock files to version control — Lock files (poetry.lock, Pipfile.lock, conda-lock.yml) pin exact dependency versions including transitive dependencies. Without them, pip install -r requirements.txt may install different versions on different machines or dates, breaking reproducibility.

  2. Pin Python version in project configuration — Specify the Python minor version in your project config (e.g., python = "~3.11", which allows patch releases only; note that Poetry's caret constraint ^3.11 also permits 3.12 and later). Python patch versions rarely break compatibility, but minor versions (3.10 vs 3.11) can behave differently. Document the Python version in your README and CI config.

  3. Separate development and production dependencies — Use [tool.poetry.group.dev.dependencies] for testing frameworks, linters, and notebooks that aren't needed in production. This keeps production images smaller and reduces dependency conflicts.

  4. Capture environment snapshots alongside results — Every time you generate results (figures, tables, model weights), save a JSON snapshot of the environment (Python version, all package versions, git hash, git dirty status). This lets anyone reconstruct the exact state that produced specific results.

  5. Use multi-stage Docker builds to minimize image size — Build dependencies in a first stage with build tools, then copy only the installed packages to a slim runtime stage. This reduces image size from 1-2 GB to 200-400 MB and eliminates build tools from the production image.
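Practice 5 can be sketched as a two-stage Dockerfile (stage names and paths are illustrative, assuming the Poetry layout used elsewhere in this skill):

```dockerfile
# Stage 1: build dependencies with compilers available
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && \
    poetry config virtualenvs.in-project true && \
    poetry install --only main --no-interaction

# Stage 2: slim runtime with only the installed packages
FROM python:3.11-slim
WORKDIR /workspace
COPY --from=builder /workspace/.venv ./.venv
ENV PATH="/workspace/.venv/bin:$PATH"
COPY src/ ./src/
ENTRYPOINT ["python", "-m", "src.main"]
```

Only the `.venv` directory crosses the stage boundary, so build-essential and Poetry itself never reach the runtime image.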

Common Issues

"Package version conflict" during installation — Two dependencies require incompatible versions of a shared transitive dependency. Use poetry show --tree or pipdeptree to identify the conflict, then constrain the problematic package version or find alternative packages. Conda's solver handles cross-language conflicts better than pip.

Docker build fails with "Could not find a version that satisfies the requirement" — The Docker build context can't access private registries or local packages. Use --build-arg for credentials, mount private keys with --secret, or copy wheels into the build context. Also check that the base image architecture matches your host.

CI pipeline passes locally but fails in the cloud — Environment differences between local and CI: different OS (macOS vs Linux), missing system libraries, or stale caches. Use the same Docker image locally that CI uses, or add a make ci-local target that runs CI checks in Docker on your machine before pushing.
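The make ci-local idea mentioned above might look like the following Makefile target (the image name and check commands are placeholders for whatever your CI actually runs):

```makefile
# Run the same checks CI runs, inside the same image CI uses
CI_IMAGE ?= python:3.11-slim

ci-local:
	docker run --rm -v $(PWD):/workspace -w /workspace $(CI_IMAGE) \
		sh -c "pip install poetry && poetry install --no-interaction && poetry run pytest"
```

Running this before pushing catches OS-level and system-library differences that a local macOS or Windows environment would otherwise hide.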
