Spreadsheet Automation

A data processing skill for programmatically creating, reading, and manipulating Excel and CSV spreadsheets using libraries like xlsx, openpyxl, and pandas for report generation and data pipelines.

When to Use

Choose Spreadsheet Automation when:

Generating Excel reports with formatting, charts, and multiple sheets
Processing and transforming CSV/Excel data in automated pipelines
Building data export features that produce formatted spreadsheets
Automating repetitive spreadsheet operations like merging and reformatting

Consider alternatives when:

Interactive data analysis — use Jupyter notebooks with pandas
Database operations — use SQL queries directly
Real-time dashboards — use a BI tool like Metabase or Grafana

Quick Start


# Python
pip install openpyxl pandas xlsxwriter

# Node.js
npm install xlsx exceljs


import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.chart import BarChart, Reference

def create_sales_report(data, output_path):
    wb = Workbook()

    # Summary sheet
    ws = wb.active
    ws.title = "Summary"

    # Header styling
    header_font = Font(name='Calibri', size=12, bold=True, color='FFFFFF')
    header_fill = PatternFill(start_color='4472C4', fill_type='solid')
    thin_border = Border(
        left=Side(style='thin'), right=Side(style='thin'),
        top=Side(style='thin'), bottom=Side(style='thin')
    )

    # Write headers
    headers = ['Region', 'Q1 Sales', 'Q2 Sales', 'Q3 Sales', 'Q4 Sales', 'Total']
    for col, header in enumerate(headers, 1):
        cell = ws.cell(row=1, column=col, value=header)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal='center')
        cell.border = thin_border

    # Write data
    for row_idx, row_data in enumerate(data, 2):
        for col_idx, value in enumerate(row_data, 1):
            cell = ws.cell(row=row_idx, column=col_idx, value=value)
            cell.border = thin_border
            if col_idx > 1:
                cell.number_format = '$#,##0.00'

    # Add totals formula
    last_row = len(data) + 1
    for col in range(2, 6):
        cell = ws.cell(row=last_row + 1, column=col)
        cell.value = f'=SUM({cell.column_letter}2:{cell.column_letter}{last_row})'
        cell.font = Font(bold=True)
        cell.number_format = '$#,##0.00'

    # Add chart
    chart = BarChart()
    chart.title = "Quarterly Sales by Region"
    chart.y_axis.title = "Revenue ($)"
    categories = Reference(ws, min_col=1, min_row=2, max_row=last_row)
    values = Reference(ws, min_col=2, max_col=5, min_row=1, max_row=last_row)
    chart.add_data(values, titles_from_data=True)
    chart.set_categories(categories)
    chart.shape = 4
    ws.add_chart(chart, "A" + str(last_row + 4))

    # Auto-fit columns
    for col in ws.columns:
        max_length = max(len(str(cell.value or '')) for cell in col)
        ws.column_dimensions[col[0].column_letter].width = max_length + 4

    wb.save(output_path)

Core Concepts

Library Comparison

Library	Language	Read	Write	Formatting	Charts
openpyxl	Python	Yes	Yes	Full	Yes
xlsxwriter	Python	No	Yes	Full	Yes
pandas	Python	Yes	Yes	Basic	No
ExcelJS	Node.js	Yes	Yes	Full	No
SheetJS	Node.js	Yes	Yes	Basic	No

Data Pipeline with Pandas


import pandas as pd

class SpreadsheetPipeline:
    def read_and_clean(self, filepath):
        """Read Excel and clean data"""
        df = pd.read_excel(filepath, sheet_name=0)
        df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
        df = df.dropna(subset=['email'])
        df['date'] = pd.to_datetime(df['date'], errors='coerce')
        return df

    def transform(self, df):
        """Apply business logic transformations"""
        df['revenue'] = df['quantity'] * df['unit_price']
        df['month'] = df['date'].dt.to_period('M')
        summary = df.groupby('month').agg(
            total_revenue=('revenue', 'sum'),
            order_count=('order_id', 'nunique'),
            avg_order_value=('revenue', 'mean')
        ).reset_index()
        return summary

    def export(self, dataframes, output_path):
        """Export multiple DataFrames to sheets"""
        with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
            for name, df in dataframes.items():
                df.to_excel(writer, sheet_name=name, index=False)
                worksheet = writer.sheets[name]
                for i, col in enumerate(df.columns):
                    max_len = max(df[col].astype(str).map(len).max(), len(col)) + 2
                    worksheet.set_column(i, i, max_len)

Configuration

Option	Description	Default
`engine`	Excel engine: openpyxl, xlsxwriter	`"openpyxl"`
`sheet_name`	Default sheet name	`"Sheet1"`
`header_style`	Header row styling	Bold, colored fill
`number_format`	Default number format	`"#,##0.00"`
`date_format`	Default date format	`"YYYY-MM-DD"`
`auto_filter`	Add auto-filter to headers	`true`
`freeze_panes`	Freeze header row	`true`
`column_width`	Auto-fit column widths	`true`

Best Practices

Use pandas for data processing and openpyxl/xlsxwriter for formatting — pandas excels at data manipulation while the dedicated Excel libraries handle formatting, charts, and complex layouts
Freeze the header row and enable auto-filters on all data tables so recipients can sort and filter data without manual setup
Apply consistent number formatting for currency, percentages, and dates using Excel format codes rather than formatting values as strings — this preserves the data type for recipients who need to do calculations
Handle large datasets in chunks by reading and writing in batches to avoid memory issues; pandas' chunksize parameter and openpyxl's write_only mode support streaming large files
Name sheets descriptively and add a summary/overview sheet as the first tab so recipients understand the workbook structure immediately

Common Issues

Excel file corruption with mixed engines: Using pandas with one engine to write and openpyxl to modify the same file can corrupt it. Choose one engine for the entire write pipeline, or save the pandas output first, then reopen it with openpyxl for formatting modifications.

Dates displaying as numbers: Excel stores dates as serial numbers, and reading them without date parsing returns integers. Specify parse_dates columns in pd.read_excel(), or format date columns explicitly with cell.number_format = 'YYYY-MM-DD' when writing.

Memory exhaustion with large files: Processing Excel files with millions of rows loads everything into memory. Use pd.read_excel(chunksize=10000) for reading large files, and xlsxwriter (which streams to disk) instead of openpyxl for writing large outputs.

⚠️ Loading Issue

Complete Spreadsheet Automation Toolkit

Spreadsheet Automation

When to Use

Quick Start

Core Concepts

Library Comparison

Data Pipeline with Pandas

Configuration

Best Practices

Common Issues

Reviews

Write a review

Similar Templates

Full-Stack Code Reviewer

Test Suite Generator

Pro Architecture Workspace