# Pro Data Workspace
A comprehensive skill for setting up and managing data analysis workspaces — covering environment configuration, dataset loading, exploratory data analysis workflows, visualization pipelines, and reproducible analysis notebooks.
## When to Use This Skill
Choose Pro Data Workspace when you need to:
- Set up a complete data analysis environment from scratch
- Load, clean, and explore datasets systematically
- Build reproducible analysis pipelines with clear documentation
- Generate statistical summaries and visualizations
- Export analysis results in presentation-ready formats
Consider alternatives when:
- You need real-time data streaming (use a streaming pipeline skill)
- You're building ML models (use a machine learning skill)
- You need database administration (use a database management skill)
## Quick Start
```bash
# Set up a data analysis workspace
claude "Set up a Python data workspace for analyzing a CSV of e-commerce transactions. Include pandas, visualization, and export to Excel."
```
```python
# workspace_setup.py
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Configure workspace
WORKSPACE = Path("./analysis")
WORKSPACE.mkdir(exist_ok=True)
DATA_DIR = WORKSPACE / "data"
OUTPUT_DIR = WORKSPACE / "output"
DATA_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

# Load and inspect data
df = pd.read_csv("transactions.csv", parse_dates=["order_date"])
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"Date range: {df['order_date'].min()} to {df['order_date'].max()}")
print(f"\nMissing values:\n{df.isnull().sum()}")

# Quick EDA
summary = df.describe(include="all")
summary.to_excel(OUTPUT_DIR / "summary_statistics.xlsx")

# Revenue by category
revenue = df.groupby("category")["revenue"].sum().sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(10, 6))
revenue.plot(kind="bar", ax=ax, color="#2563EB")
ax.set_title("Revenue by Category")
ax.set_ylabel("Revenue ($)")
plt.tight_layout()
plt.savefig(OUTPUT_DIR / "revenue_by_category.png", dpi=150)
```
## Core Concepts

### Workspace Structure
| Directory | Purpose | Contents |
|---|---|---|
| `data/` | Raw and processed datasets | CSV, Parquet, JSON files |
| `notebooks/` | Jupyter analysis notebooks | `.ipynb` files |
| `scripts/` | Reusable analysis scripts | `.py` files |
| `output/` | Charts, reports, exports | PNG, XLSX, PDF files |
| `config/` | Environment and parameter files | `.yaml`, `.env` files |
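As a minimal sketch, the layout above can be created with a small stdlib-only helper (the `create_workspace` name is illustrative, not part of the skill):

```python
from pathlib import Path
import tempfile

def create_workspace(root: str) -> Path:
    """Create the standard workspace layout under root."""
    workspace = Path(root)
    for sub in ("data", "notebooks", "scripts", "output", "config"):
        (workspace / sub).mkdir(parents=True, exist_ok=True)
    return workspace

# Demonstrate against a temporary directory
ws = create_workspace(tempfile.mkdtemp() + "/analysis")
print(sorted(p.name for p in ws.iterdir()))
```

`mkdir(exist_ok=True)` makes the helper safe to rerun at the top of every script, so any entry point can bootstrap the workspace.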
### Data Loading Patterns
```python
# Multi-format data loading
import pandas as pd
from pathlib import Path

loaders = {
    ".csv": lambda f: pd.read_csv(f),
    ".xlsx": lambda f: pd.read_excel(f),
    ".json": lambda f: pd.read_json(f),
    ".parquet": lambda f: pd.read_parquet(f),
}

def load_data(filepath):
    ext = Path(filepath).suffix.lower()
    loader = loaders.get(ext)
    if not loader:
        raise ValueError(f"Unsupported format: {ext}")
    df = loader(filepath)
    print(f"Loaded {len(df)} rows, {len(df.columns)} columns from {filepath}")
    return df

# Data profiling helper
def profile(df):
    return pd.DataFrame({
        "dtype": df.dtypes,
        "non_null": df.count(),
        "null_pct": (df.isnull().sum() / len(df) * 100).round(1),
        "unique": df.nunique(),
        "sample": df.iloc[0],
    })
```
### Visualization Pipeline
```python
# Reusable chart configuration
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

def setup_style():
    sns.set_theme(style="whitegrid")
    plt.rcParams.update({
        "figure.figsize": (10, 6),
        "figure.dpi": 150,
        "font.size": 12,
        "axes.titlesize": 14,
        "axes.labelsize": 12,
    })

def save_chart(fig, name, output_dir="./output"):
    path = Path(output_dir) / f"{name}.png"
    fig.savefig(path, bbox_inches="tight", dpi=150)
    print(f"Saved: {path}")
    plt.close(fig)
```
## Configuration
| Parameter | Description | Example |
|---|---|---|
| `data_source` | Input file path or URL | `"./data/sales.csv"` |
| `date_column` | Column to parse as datetime | `"order_date"` |
| `output_format` | Export format for results | `"xlsx"` / `"csv"` |
| `chart_style` | Seaborn/Matplotlib theme | `"whitegrid"` |
| `dpi` | Chart resolution for exports | `150` |
| `workspace_dir` | Root directory for analysis files | `"./analysis"` |
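One way to wire these parameters into scripts is a defaults-plus-overrides loader. This sketch assumes a JSON config file rather than YAML to stay stdlib-only; `DEFAULTS` and `load_config` are illustrative names, not part of the skill:

```python
import json
from pathlib import Path

# Default values taken from the parameter table above
DEFAULTS = {
    "data_source": "./data/sales.csv",
    "date_column": "order_date",
    "output_format": "xlsx",
    "chart_style": "whitegrid",
    "dpi": 150,
    "workspace_dir": "./analysis",
}

def load_config(path=None):
    """Merge a JSON config file (if present) over the defaults."""
    config = dict(DEFAULTS)
    if path and Path(path).exists():
        config.update(json.loads(Path(path).read_text()))
    return config

config = load_config()
print(config["dpi"])
```

Keeping defaults in code and overrides in `config/` means a bare checkout still runs, while per-project settings stay out of version-controlled scripts.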
## Best Practices
- **Separate raw data from processed data** — Never modify source files. Load raw data, transform in memory, and save processed versions to a separate directory. This lets you rerun the analysis from scratch when requirements change.
- **Profile every dataset before analysis** — Run null counts, dtype checks, and value distributions before writing any analysis code. Five minutes of profiling saves hours of debugging malformed data downstream.
- **Use consistent naming for output files** — Name outputs with the analysis date and a descriptive tag: `2024-12-15_revenue_by_category.png`. When you generate dozens of charts, timestamps prevent confusion about which version is current.
- **Pin your dependencies with exact versions** — Create a `requirements.txt` with pinned versions (`pandas==2.1.4`, not `pandas>=2.0`). Analysis that can't be reproduced six months later has limited value for auditing or extending.
- **Document assumptions inline with code** — Add comments explaining why you filtered rows, chose specific date ranges, or excluded outliers. The code shows what you did; comments explain why, which is critical for peer review.
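To illustrate the last practice, here is a short sketch of inline assumption comments on a toy DataFrame (the column names and filtering rules are hypothetical, chosen only for the example):

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-20", "2023-11-30"]),
    "revenue": [120.0, -15.0, 80.0],
})

# Assumption: negative revenue rows are refunds and are excluded from
# sales totals; revisit this filter if refund analysis is ever needed.
sales = df[df["revenue"] > 0]

# Assumption: the reporting window is calendar year 2024 only; earlier
# orders belong to the prior period's report.
sales = sales[sales["order_date"] >= "2024-01-01"]

print(len(sales))
```

Each comment records the *why* behind a filter, so a reviewer six months later can audit the choices without reconstructing them from the data.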
## Common Issues
**Memory errors on large datasets** — Loading a 5GB CSV into pandas on a 16GB machine will fail. Use dtype specifications to reduce memory (e.g., `category` for string columns with few unique values), load in chunks with `chunksize`, or switch to Polars/DuckDB for larger-than-memory datasets.
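A sketch of the chunked approach, using a small in-memory CSV in place of a multi-gigabyte file (the column names are illustrative):

```python
import io
import pandas as pd

# A small in-memory CSV stands in for a file too large to load at once.
csv_buffer = io.StringIO(
    "category,revenue\n" + "\n".join(f"cat{i % 3},{i}" for i in range(10))
)

totals = None
# Process the file in fixed-size chunks; declaring "category" as a
# categorical dtype cuts memory for low-cardinality string columns.
for chunk in pd.read_csv(csv_buffer, chunksize=4, dtype={"category": "category"}):
    part = chunk.groupby("category", observed=True)["revenue"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals.sum())
```

Only one chunk is resident at a time, so peak memory is bounded by `chunksize` rather than file size; the per-chunk partial aggregates are combined at the end.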
Date parsing fails silently — Pandas may load dates as strings without error if the format doesn't match expectations. Always pass parse_dates=["col"] explicitly and verify with df["col"].dtype. Mixed date formats in a single column need pd.to_datetime(df["col"], format="mixed").
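A small sketch of the verify-then-parse pattern (`format="mixed"` requires pandas >= 2.0; the sample values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"order_date": ["2024-01-05", "05/02/2024", "2024-03-20"]})

# Mixed-format strings load silently as object dtype, not datetime;
# check before relying on date arithmetic.
assert df["order_date"].dtype == object

# format="mixed" infers the format per element instead of per column.
df["order_date"] = pd.to_datetime(df["order_date"], format="mixed")
print(df["order_date"].dtype)
```

The upfront dtype assertion turns a silent failure into a loud one at load time, which is where it is cheapest to fix.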
**Charts look different in exports vs. notebooks** — Matplotlib renders differently depending on the backend (inline notebook vs. file export). Always call `plt.tight_layout()` before saving, set explicit `figsize` and `dpi`, and preview the saved PNG file rather than relying on notebook rendering for final output.