C

Complete Spreadsheet Automation Toolkit

Boost productivity with intelligent spreadsheet manipulation with formulas, charts, and data transforms. Built for Claude Code with best practices and real-world patterns.

SkillCommunitydevelopmentv1.0.0MIT
0 views0 copies

Spreadsheet Automation

A data processing skill for programmatically creating, reading, and manipulating Excel and CSV spreadsheets using libraries like xlsx, openpyxl, and pandas for report generation and data pipelines.

When to Use

Choose Spreadsheet Automation when:

  • Generating Excel reports with formatting, charts, and multiple sheets
  • Processing and transforming CSV/Excel data in automated pipelines
  • Building data export features that produce formatted spreadsheets
  • Automating repetitive spreadsheet operations like merging and reformatting

Consider alternatives when:

  • Interactive data analysis — use Jupyter notebooks with pandas
  • Database operations — use SQL queries directly
  • Real-time dashboards — use a BI tool like Metabase or Grafana

Quick Start

# Python pip install openpyxl pandas xlsxwriter # Node.js npm install xlsx exceljs
import pandas as pd from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment, Border, Side from openpyxl.chart import BarChart, Reference def create_sales_report(data, output_path): wb = Workbook() # Summary sheet ws = wb.active ws.title = "Summary" # Header styling header_font = Font(name='Calibri', size=12, bold=True, color='FFFFFF') header_fill = PatternFill(start_color='4472C4', fill_type='solid') thin_border = Border( left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin') ) # Write headers headers = ['Region', 'Q1 Sales', 'Q2 Sales', 'Q3 Sales', 'Q4 Sales', 'Total'] for col, header in enumerate(headers, 1): cell = ws.cell(row=1, column=col, value=header) cell.font = header_font cell.fill = header_fill cell.alignment = Alignment(horizontal='center') cell.border = thin_border # Write data for row_idx, row_data in enumerate(data, 2): for col_idx, value in enumerate(row_data, 1): cell = ws.cell(row=row_idx, column=col_idx, value=value) cell.border = thin_border if col_idx > 1: cell.number_format = '$#,##0.00' # Add totals formula last_row = len(data) + 1 for col in range(2, 6): cell = ws.cell(row=last_row + 1, column=col) cell.value = f'=SUM({cell.column_letter}2:{cell.column_letter}{last_row})' cell.font = Font(bold=True) cell.number_format = '$#,##0.00' # Add chart chart = BarChart() chart.title = "Quarterly Sales by Region" chart.y_axis.title = "Revenue ($)" categories = Reference(ws, min_col=1, min_row=2, max_row=last_row) values = Reference(ws, min_col=2, max_col=5, min_row=1, max_row=last_row) chart.add_data(values, titles_from_data=True) chart.set_categories(categories) chart.shape = 4 ws.add_chart(chart, "A" + str(last_row + 4)) # Auto-fit columns for col in ws.columns: max_length = max(len(str(cell.value or '')) for cell in col) ws.column_dimensions[col[0].column_letter].width = max_length + 4 wb.save(output_path)

Core Concepts

Library Comparison

LibraryLanguageReadWriteFormattingCharts
openpyxlPythonYesYesFullYes
xlsxwriterPythonNoYesFullYes
pandasPythonYesYesBasicNo
ExcelJSNode.jsYesYesFullNo
SheetJSNode.jsYesYesBasicNo

Data Pipeline with Pandas

import pandas as pd class SpreadsheetPipeline: def read_and_clean(self, filepath): """Read Excel and clean data""" df = pd.read_excel(filepath, sheet_name=0) df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_') df = df.dropna(subset=['email']) df['date'] = pd.to_datetime(df['date'], errors='coerce') return df def transform(self, df): """Apply business logic transformations""" df['revenue'] = df['quantity'] * df['unit_price'] df['month'] = df['date'].dt.to_period('M') summary = df.groupby('month').agg( total_revenue=('revenue', 'sum'), order_count=('order_id', 'nunique'), avg_order_value=('revenue', 'mean') ).reset_index() return summary def export(self, dataframes, output_path): """Export multiple DataFrames to sheets""" with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer: for name, df in dataframes.items(): df.to_excel(writer, sheet_name=name, index=False) worksheet = writer.sheets[name] for i, col in enumerate(df.columns): max_len = max(df[col].astype(str).map(len).max(), len(col)) + 2 worksheet.set_column(i, i, max_len)

Configuration

OptionDescriptionDefault
engineExcel engine: openpyxl, xlsxwriter"openpyxl"
sheet_nameDefault sheet name"Sheet1"
header_styleHeader row stylingBold, colored fill
number_formatDefault number format"#,##0.00"
date_formatDefault date format"YYYY-MM-DD"
auto_filterAdd auto-filter to headerstrue
freeze_panesFreeze header rowtrue
column_widthAuto-fit column widthstrue

Best Practices

  1. Use pandas for data processing and openpyxl/xlsxwriter for formatting — pandas excels at data manipulation while the dedicated Excel libraries handle formatting, charts, and complex layouts
  2. Freeze the header row and enable auto-filters on all data tables so recipients can sort and filter data without manual setup
  3. Apply consistent number formatting for currency, percentages, and dates using Excel format codes rather than formatting values as strings — this preserves the data type for recipients who need to do calculations
  4. Handle large datasets in chunks by reading and writing in batches to avoid memory issues; pandas' chunksize parameter and openpyxl's write_only mode support streaming large files
  5. Name sheets descriptively and add a summary/overview sheet as the first tab so recipients understand the workbook structure immediately

Common Issues

Excel file corruption with mixed engines: Using pandas with one engine to write and openpyxl to modify the same file can corrupt it. Choose one engine for the entire write pipeline, or save the pandas output first, then reopen it with openpyxl for formatting modifications.

Dates displaying as numbers: Excel stores dates as serial numbers, and reading them without date parsing returns integers. Specify parse_dates columns in pd.read_excel(), or format date columns explicitly with cell.number_format = 'YYYY-MM-DD' when writing.

Memory exhaustion with large files: Processing Excel files with millions of rows loads everything into memory. Use pd.read_excel(chunksize=10000) for reading large files, and xlsxwriter (which streams to disk) instead of openpyxl for writing large outputs.

Community

Reviews

Write a review

No reviews yet. Be the first to review this template!

Similar Templates