Complete Spreadsheet Automation Toolkit
Automate spreadsheet work with formulas, charts, and data transforms. Built for Claude Code, with best practices and real-world patterns.
Spreadsheet Automation
A data processing skill for programmatically creating, reading, and manipulating Excel and CSV spreadsheets using libraries like xlsx, openpyxl, and pandas for report generation and data pipelines.
When to Use
Choose Spreadsheet Automation when:
- Generating Excel reports with formatting, charts, and multiple sheets
- Processing and transforming CSV/Excel data in automated pipelines
- Building data export features that produce formatted spreadsheets
- Automating repetitive spreadsheet operations like merging and reformatting
Consider alternatives when:
- Interactive data analysis — use Jupyter notebooks with pandas
- Database operations — use SQL queries directly
- Real-time dashboards — use a BI tool like Metabase or Grafana
Quick Start

    # Python
    pip install openpyxl pandas xlsxwriter

    # Node.js
    npm install xlsx exceljs

A formatted sales report built with openpyxl:

    from openpyxl import Workbook
    from openpyxl.chart import BarChart, Reference
    from openpyxl.styles import Alignment, Border, Font, PatternFill, Side


    def create_sales_report(data, output_path):
        """Write a styled sales report with a totals row and a bar chart.

        Each row in `data` should line up with the headers defined below.
        """
        wb = Workbook()

        # Summary sheet
        ws = wb.active
        ws.title = "Summary"

        # Header styling
        header_font = Font(name="Calibri", size=12, bold=True, color="FFFFFF")
        header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
        thin = Side(style="thin")
        thin_border = Border(left=thin, right=thin, top=thin, bottom=thin)

        # Write headers
        headers = ["Region", "Q1 Sales", "Q2 Sales", "Q3 Sales", "Q4 Sales", "Total"]
        for col, header in enumerate(headers, 1):
            cell = ws.cell(row=1, column=col, value=header)
            cell.font = header_font
            cell.fill = header_fill
            cell.alignment = Alignment(horizontal="center")
            cell.border = thin_border

        # Write data; every column after "Region" is formatted as currency
        for row_idx, row_data in enumerate(data, 2):
            for col_idx, value in enumerate(row_data, 1):
                cell = ws.cell(row=row_idx, column=col_idx, value=value)
                cell.border = thin_border
                if col_idx > 1:
                    cell.number_format = "$#,##0.00"

        # Totals row: one SUM formula per numeric column, including "Total"
        last_row = len(data) + 1
        total_row = last_row + 1
        ws.cell(row=total_row, column=1, value="Total").font = Font(bold=True)
        for col in range(2, len(headers) + 1):
            cell = ws.cell(row=total_row, column=col)
            cell.value = f"=SUM({cell.column_letter}2:{cell.column_letter}{last_row})"
            cell.font = Font(bold=True)
            cell.number_format = "$#,##0.00"

        # Bar chart of the four quarters; row 1 supplies the series titles
        chart = BarChart()
        chart.title = "Quarterly Sales by Region"
        chart.y_axis.title = "Revenue ($)"
        categories = Reference(ws, min_col=1, min_row=2, max_row=last_row)
        values = Reference(ws, min_col=2, max_col=5, min_row=1, max_row=last_row)
        chart.add_data(values, titles_from_data=True)
        chart.set_categories(categories)
        chart.shape = 4
        ws.add_chart(chart, f"A{last_row + 4}")

        # Approximate auto-fit: widest value in each column plus padding
        for col in ws.columns:
            max_length = max(len(str(cell.value)) if cell.value is not None else 0 for cell in col)
            ws.column_dimensions[col[0].column_letter].width = max_length + 4

        wb.save(output_path)

Core Concepts
Library Comparison
| Library | Language | Read | Write | Formatting | Charts |
|---|---|---|---|---|---|
| openpyxl | Python | Yes | Yes | Full | Yes |
| xlsxwriter | Python | No | Yes | Full | Yes |
| pandas | Python | Yes | Yes | Basic | No |
| ExcelJS | Node.js | Yes | Yes | Full | No |
| SheetJS | Node.js | Yes | Yes | Basic | No |
Data Pipeline with Pandas

    import pandas as pd


    class SpreadsheetPipeline:
        def read_and_clean(self, filepath):
            """Read the first sheet and clean column names, emails, and dates."""
            df = pd.read_excel(filepath, sheet_name=0)
            # Normalize headers, e.g. "Unit Price " -> "unit_price"
            df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
            # Drop rows that have no email address
            df = df.dropna(subset=["email"])
            # Unparseable dates become NaT instead of raising
            df["date"] = pd.to_datetime(df["date"], errors="coerce")
            return df

        def transform(self, df):
            """Apply business logic: revenue per row, then a monthly summary."""
            df["revenue"] = df["quantity"] * df["unit_price"]
            df["month"] = df["date"].dt.to_period("M")
            summary = df.groupby("month").agg(
                total_revenue=("revenue", "sum"),
                order_count=("order_id", "nunique"),
                avg_order_value=("revenue", "mean"),
            ).reset_index()
            # Excel has no period type, so store the month as text ("2024-01")
            summary["month"] = summary["month"].astype(str)
            return summary

        def export(self, dataframes, output_path):
            """Export a dict of {sheet_name: DataFrame} to one workbook."""
            with pd.ExcelWriter(output_path, engine="xlsxwriter") as writer:
                for name, df in dataframes.items():
                    df.to_excel(writer, sheet_name=name, index=False)
                    worksheet = writer.sheets[name]
                    # Size each column to its longest value or its header
                    for i, col in enumerate(df.columns):
                        max_len = max(df[col].astype(str).map(len).max(), len(str(col))) + 2
                        worksheet.set_column(i, i, max_len)

Configuration
| Option | Description | Default |
|---|---|---|
| engine | Excel engine: openpyxl, xlsxwriter | "openpyxl" |
| sheet_name | Default sheet name | "Sheet1" |
| header_style | Header row styling | Bold, colored fill |
| number_format | Default number format | "#,##0.00" |
| date_format | Default date format | "YYYY-MM-DD" |
| auto_filter | Add auto-filter to headers | true |
| freeze_panes | Freeze header row | true |
| column_width | Auto-fit column widths | true |
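As a rough illustration of what these options could translate to, the sketch below applies a header style, number format, auto-filter, frozen header row, and column widths with openpyxl. The helper name `apply_table_defaults` and the mapping from option names to openpyxl calls are assumptions made for this example, not documented behavior of the toolkit.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
from openpyxl.utils import get_column_letter


def apply_table_defaults(ws, number_format="#,##0.00"):
    """Hypothetical helper: apply the table options above to worksheet `ws`."""
    # header_style: bold white text on a solid fill
    header_font = Font(bold=True, color="FFFFFF")
    header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
    for cell in ws[1]:
        cell.font = header_font
        cell.fill = header_fill

    # freeze_panes: keep row 1 visible while scrolling
    ws.freeze_panes = "A2"
    # auto_filter: add filter dropdowns over the used range
    ws.auto_filter.ref = ws.dimensions

    # number_format: only touch real numbers, never strings or booleans
    for row in ws.iter_rows(min_row=2):
        for cell in row:
            if isinstance(cell.value, (int, float)) and not isinstance(cell.value, bool):
                cell.number_format = number_format

    # column_width: approximate auto-fit from the longest value per column
    for column_cells in ws.columns:
        longest = max(
            (len(str(c.value)) for c in column_cells if c.value is not None),
            default=0,
        )
        letter = get_column_letter(column_cells[0].column)
        ws.column_dimensions[letter].width = longest + 4


# Small in-memory demo; nothing is written to disk here
wb = Workbook()
ws = wb.active
ws.append(["Region", "Sales"])
ws.append(["North", 1200.5])
ws.append(["South", 980])
apply_table_defaults(ws)
```

Excel does not store an "auto-fit" flag, so the width here is an estimate based on character counts rather than rendered text width.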
Best Practices
- Use pandas for data processing and openpyxl/xlsxwriter for formatting — pandas excels at data manipulation while the dedicated Excel libraries handle formatting, charts, and complex layouts
- Freeze the header row and enable auto-filters on all data tables so recipients can sort and filter data without manual setup
- Apply consistent number formatting for currency, percentages, and dates using Excel format codes rather than formatting values as strings — this preserves the data type for recipients who need to do calculations
- Handle large datasets in chunks by reading and writing in batches to avoid memory issues; pandas' chunksize parameter (available on read_csv, not read_excel) and openpyxl's read_only and write_only modes support streaming large files
- Name sheets descriptively and add a summary/overview sheet as the first tab so recipients understand the workbook structure immediately
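To make the number-formatting advice concrete, here is a minimal sketch that stores real numbers and dates and lets Excel format codes control the display. The function name `build_metrics_sheet` and the sample values are invented for illustration.

```python
from datetime import date

from openpyxl import Workbook


def build_metrics_sheet():
    """Illustrative only: typed values plus Excel format codes."""
    wb = Workbook()
    ws = wb.active
    ws.append(["Metric", "Value"])
    ws.append(["Revenue", 1234.5])                   # stored as a number
    ws.append(["Growth", 0.125])                     # stored as a fraction
    ws.append(["Report date", date(2024, 1, 31)])    # stored as a date

    # The format code changes how the value is shown, not what is stored
    ws["B2"].number_format = "$#,##0.00"
    ws["B3"].number_format = "0.0%"
    ws["B4"].number_format = "yyyy-mm-dd"

    # Because B2 and B3 are still numeric, formulas can use them directly
    ws["A5"] = "Projected revenue"
    ws["B5"] = "=B2*(1+B3)"
    ws["B5"].number_format = "$#,##0.00"
    return wb


wb = build_metrics_sheet()
```

Had the revenue been written as the string "$1,234.50", the formula in B5 would have nothing numeric to work with, which is the failure the guideline is meant to avoid.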
Common Issues
Excel file corruption with mixed engines: Writing to the same file through pandas and openpyxl at the same time can corrupt it, because each library rewrites the whole workbook when it saves. Choose one engine for the entire write pipeline, or let pandas finish and close the file first, then reopen it with openpyxl for formatting modifications.
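The two-step approach can be sketched as follows: pandas writes and closes the file, and only then does openpyxl reopen it for formatting. The function name `write_then_format`, the sheet name, and the fill color are assumptions for this example.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill


def write_then_format(df, path, sheet_name="Data"):
    """Illustrative two-step write: data first, formatting second."""
    # Step 1: pandas writes the data; leaving the `with` block
    # saves and closes the file before anything else touches it.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)

    # Step 2: reopen the finished file with openpyxl for formatting only.
    wb = load_workbook(path)
    ws = wb[sheet_name]
    fill = PatternFill(start_color="DDEBF7", end_color="DDEBF7", fill_type="solid")
    for cell in ws[1]:
        cell.fill = fill
    ws.freeze_panes = "A2"
    wb.save(path)
```

Calling `write_then_format(df, "report.xlsx")` with any small DataFrame should produce a single file with a shaded, frozen header row. At no point are two writers open on the same path.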
Dates displaying as numbers: Excel stores dates as serial numbers, and a date cell that lacks a date number format is read back as a plain number. The parse_dates argument of pd.read_excel() helps with dates stored as text; raw serial numbers need an explicit conversion from Excel's epoch. When writing, format date columns explicitly with cell.number_format = 'YYYY-MM-DD'.
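For columns that arrive as raw serial numbers, a conversion along these lines may help. It assumes the workbook uses Excel's default 1900 date system; the helper name `serials_to_datetimes` and the sample values are made up for illustration.

```python
import pandas as pd

# Day 0 of Excel's 1900 date system as pandas sees it. This offset is
# valid for serials from 61 (1 March 1900) onward, because Excel treats
# 1900 as a leap year.
EXCEL_EPOCH = pd.Timestamp("1899-12-30")


def serials_to_datetimes(values):
    """Convert Excel serial day numbers to pandas datetimes."""
    return pd.to_datetime(values, unit="D", origin=EXCEL_EPOCH)


# A column that was read back as integers instead of dates
df = pd.DataFrame({"order_id": [1, 2], "date": [45292, 45293]})
df["date"] = serials_to_datetimes(df["date"])
```

Here serial 45292 corresponds to 2024-01-01. Workbooks saved with the 1904 date system (an option in Excel's settings) would need a different epoch.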
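Memory exhaustion with large files: Reading a very large Excel file loads every row into memory at once. pd.read_excel() has no chunksize parameter; instead, read in windows with skiprows and nrows, stream rows with openpyxl's read_only mode, or convert the data to CSV and use pd.read_csv(chunksize=10000). For large outputs, use xlsxwriter with its constant_memory option or openpyxl's write_only mode, both of which flush rows to disk as they are written rather than holding the whole workbook in memory.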
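One way to stream a large workbook without pandas is to pair openpyxl's read-only and write-only modes, as in this sketch. The function name `copy_filtered`, the output sheet name, and the `keep_row` callback are assumptions for the example.

```python
from openpyxl import Workbook, load_workbook


def copy_filtered(src_path, dst_path, keep_row):
    """Stream rows from src to dst, keeping those where keep_row(row) is true.

    Returns the number of data rows written (the header is not counted).
    """
    src = load_workbook(src_path, read_only=True)   # rows are read lazily
    dst = Workbook(write_only=True)                 # rows are flushed as appended
    out = dst.create_sheet("Filtered")

    rows = src.active.iter_rows(values_only=True)
    header = next(rows, None)
    kept = 0
    if header is not None:
        out.append(header)
        for row in rows:
            if keep_row(row):
                out.append(row)
                kept += 1

    dst.save(dst_path)
    src.close()   # read-only workbooks keep the file handle open until closed
    return kept
```

For example, `copy_filtered("orders.xlsx", "large_orders.xlsx", lambda row: row[1] >= 1000)` would keep rows whose second column is at least 1000 (the file names are placeholders). Only one row is held in Python at a time, so memory use stays roughly flat as the input grows.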