Seaborn Expert

Production-ready skill for statistical visualization with Seaborn, including scatter and violin plots. Includes structured workflows, validation checks, and reusable patterns for scientific work.

Skill · Cliptics · scientific · v1.0.0 · MIT

Create publication-quality statistical visualizations with Seaborn, a Python library for informative, attractive plots from structured datasets. This skill covers categorical plots, distribution visualization, regression plots, matrix displays, multi-faceted figure grids, and custom styling for scientific publications.

When to Use This Skill

Choose Seaborn Expert when you need to:

  • Create statistical visualizations that automatically compute and display aggregations
  • Build multi-faceted plots (FacetGrid, PairGrid) for exploring relationships across subgroups
  • Produce polished publication figures with minimal boilerplate code
  • Visualize distributions, relationships, and categorical comparisons from pandas DataFrames

Consider alternatives when:

  • You need full low-level control over every plot element (use matplotlib directly)
  • You need interactive or web-based visualizations (use Plotly)
  • You need geographic or spatial visualizations (use GeoPandas or Folium)

Quick Start

    pip install seaborn pandas matplotlib numpy

    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # Set publication style
    sns.set_theme(style="whitegrid", font_scale=1.2,
                  rc={"figure.figsize": (8, 5), "savefig.dpi": 300})

    # Load example dataset
    tips = sns.load_dataset("tips")

    # Faceted visualization
    g = sns.FacetGrid(tips, col="time", row="smoker", hue="sex",
                      height=3.5, aspect=1.2, margin_titles=True)
    g.map_dataframe(sns.scatterplot, x="total_bill", y="tip", alpha=0.7)
    g.add_legend()
    g.set_axis_labels("Total Bill ($)", "Tip ($)")
    g.savefig("tips_facet.pdf")

    # Distribution comparison
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    sns.violinplot(data=tips, x="day", y="total_bill", hue="sex",
                   split=True, inner="quart", ax=axes[0])
    axes[0].set_title("Bill Distribution by Day")
    sns.boxenplot(data=tips, x="day", y="tip", palette="Set2", ax=axes[1])
    axes[1].set_title("Tip Distribution by Day")
    fig.tight_layout()
    fig.savefig("distributions.pdf")

Core Concepts

Plot Type Reference

| Function | Category | Best For |
|---|---|---|
| sns.scatterplot() | Relational | Two continuous variables |
| sns.lineplot() | Relational | Trends over continuous axis |
| sns.histplot() | Distribution | Single variable distribution |
| sns.kdeplot() | Distribution | Smooth density estimate |
| sns.ecdfplot() | Distribution | Cumulative distribution |
| sns.boxplot() | Categorical | Quartile comparison |
| sns.violinplot() | Categorical | Distribution shape comparison |
| sns.barplot() | Categorical | Mean + confidence interval |
| sns.stripplot() | Categorical | Individual data points |
| sns.heatmap() | Matrix | Correlation or expression matrices |
| sns.clustermap() | Matrix | Hierarchically clustered heatmap |
| sns.lmplot() | Regression | Scatter + fitted regression line |
| sns.jointplot() | Joint | Bivariate + marginal distributions |
| sns.pairplot() | Multi | All pairwise relationships |
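
To illustrate the "Mean + confidence interval" entry for sns.barplot(), here is a minimal sketch on synthetic data. The DataFrame and its group and value columns are made up for illustration, and the sketch assumes the default estimator (the mean). It reads the bar heights back from the axes and compares them with the per-group means computed by pandas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical): three groups of 40 observations each
rng = np.random.default_rng(0)
levels = ["a", "b", "c"]
df = pd.DataFrame({
    "group": np.repeat(levels, 40),
    "value": np.concatenate([rng.normal(loc, 1.0, 40) for loc in (2.0, 3.0, 5.0)]),
})

fig, ax = plt.subplots(figsize=(5, 4))
# barplot reduces each group to its mean and adds a bootstrapped interval
sns.barplot(data=df, x="group", y="value", order=levels, ax=ax)

# Read the bar heights back, left to right, and compare with pandas
bars = sorted((p for p in ax.patches if p.get_width() > 0), key=lambda p: p.get_x())
bar_heights = [p.get_height() for p in bars]
group_means = df.groupby("group")["value"].mean().loc[levels].tolist()
plt.close(fig)
```

The aggregation happens inside the plotting call, so no groupby is needed before plotting; the pandas computation here only serves as a cross-check.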

Advanced Multi-Panel Figures

    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.gridspec as gridspec
    import pandas as pd
    import numpy as np

    # Custom multi-panel figure
    penguins = sns.load_dataset("penguins").dropna()

    fig = plt.figure(figsize=(12, 8))
    gs = gridspec.GridSpec(2, 3, hspace=0.35, wspace=0.35)

    # Panel A: Scatter (species by color, sex by marker)
    ax1 = fig.add_subplot(gs[0, :2])
    sns.scatterplot(data=penguins, x="flipper_length_mm", y="body_mass_g",
                    hue="species", style="sex", s=60, alpha=0.7, ax=ax1)
    ax1.set_title("Flipper Length vs Body Mass", fontweight='bold')

    # Panel B: Violin
    ax2 = fig.add_subplot(gs[0, 2])
    sns.violinplot(data=penguins, y="bill_depth_mm", x="species",
                   palette="Set2", inner="box", ax=ax2)
    ax2.set_title("Bill Depth", fontweight='bold')

    # Panel C: KDE, one filled curve per species
    ax3 = fig.add_subplot(gs[1, 0])
    for species in penguins['species'].unique():
        subset = penguins[penguins['species'] == species]
        sns.kdeplot(subset['bill_length_mm'], label=species, ax=ax3,
                    fill=True, alpha=0.3)
    ax3.set_title("Bill Length Density", fontweight='bold')
    ax3.legend(fontsize=8)

    # Panel D: Heatmap correlation (numeric columns only)
    ax4 = fig.add_subplot(gs[1, 1:])
    numeric_cols = penguins.select_dtypes(include='number')
    corr = numeric_cols.corr()
    sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r', center=0,
                square=True, linewidths=0.5, ax=ax4)
    ax4.set_title("Feature Correlations", fontweight='bold')

    # Panel labels, placed just above the top-left corner of each axes
    for ax, label in zip([ax1, ax2, ax3, ax4], ['A', 'B', 'C', 'D']):
        ax.text(-0.1, 1.05, label, transform=ax.transAxes,
                fontsize=14, fontweight='bold')

    fig.savefig("penguin_analysis.pdf", bbox_inches='tight')

Configuration

| Parameter | Description | Default |
|---|---|---|
| style | Plot background style (whitegrid, darkgrid, white, dark, ticks) | "darkgrid" |
| context | Scaling for different contexts (paper, notebook, talk, poster) | "notebook" |
| font_scale | Font size multiplier | 1 |
| palette | Color palette name or list | "deep" |
| rc | matplotlib rcParams overrides | None |
| height | Height of each facet in inches | 3 for FacetGrid; 5 for catplot, relplot, lmplot |
| aspect | Width:height ratio of each facet | 1 |
| errorbar | Error bar method for aggregation plots; replaces ci, which is deprecated since seaborn 0.12 | ("ci", 95) |
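
As a rough sketch of how the theme parameters interact, the block below sets a global theme and then overrides the context temporarily for one figure. The specific values (paper context, 300 dpi) are illustrative choices, not requirements. It reads the resulting matplotlib rcParams back to show what actually changed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Global theme: applies to every figure created afterwards
sns.set_theme(style="whitegrid", context="paper", font_scale=1.1,
              palette="deep", rc={"savefig.dpi": 300})
grid_enabled = plt.rcParams["axes.grid"]   # whitegrid switches the grid on
save_dpi = plt.rcParams["savefig.dpi"]     # taken from the rc override

# Temporary override: larger text for a single slide figure
with sns.plotting_context("talk"):
    fig, ax = plt.subplots()
    talk_font = plt.rcParams["font.size"]
    plt.close(fig)

paper_font = plt.rcParams["font.size"]     # back to the paper context
```

Using sns.plotting_context (or sns.axes_style) as a context manager keeps one-off variations from leaking into later figures.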

Best Practices

  1. Set the theme once at the top of your script. Use sns.set_theme(style="whitegrid", context="paper", font_scale=1.1) before any plotting. This applies consistent styling to all subsequent plots. Use context="talk" for presentations and context="poster" for large-format output.

  2. Use figure-level functions for exploration, axes-level for publication. Figure-level functions (sns.relplot, sns.catplot, sns.lmplot) create their own figure, managed by a FacetGrid, and are great for quick exploration. For publication figures with custom GridSpec layouts, use axes-level functions (sns.scatterplot, sns.boxplot) that accept an ax parameter.

  3. Always pass data as a DataFrame with named columns. Seaborn is designed for structured data. Pass data=df with x="column_name" rather than raw arrays. This enables automatic axis labels, legends, and semantic grouping with hue, style, and size parameters.

  4. Choose the right plot for your data type. Continuous vs continuous: scatter/line. Continuous vs categorical: box/violin/strip. Distribution: histogram/KDE/ECDF. Matrix: heatmap/clustermap. Selecting the wrong plot type obscures patterns or misleads readers.

  5. Combine strip/swarm plots with box/violin for complete pictures. Aggregation plots (boxplot, barplot) hide individual data points. Overlay sns.stripplot(alpha=0.3) on top of sns.boxplot() to show both the distribution summary and raw observations, especially for small sample sizes (n < 30).
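
The overlay described in practice 5 can be sketched as follows. The data is synthetic, and the treatment and response column names are hypothetical. Both calls are axes-level and share one ax, which is also the pattern practice 2 recommends for publication layouts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic small-sample data (hypothetical): 12 observations per treatment
rng = np.random.default_rng(1)
order = ["control", "low", "high"]
df = pd.DataFrame({
    "treatment": np.repeat(order, 12),
    "response": np.concatenate([rng.normal(m, 0.8, 12) for m in (1.0, 1.5, 2.4)]),
})

fig, ax = plt.subplots(figsize=(5, 4))
# Summary layer: quartiles only, outlier markers off to avoid double-drawing points
sns.boxplot(data=df, x="treatment", y="response", order=order,
            color="lightgray", showfliers=False, ax=ax)
# Raw-data layer: every observation, jittered and semi-transparent
sns.stripplot(data=df, x="treatment", y="response", order=order,
              color="black", alpha=0.3, jitter=0.15, ax=ax)

# Count the points drawn by the strip layer
n_points = sum(len(c.get_offsets()) for c in ax.collections)
plt.close(fig)
```

Passing the same order to both calls keeps the two layers aligned on the categorical axis.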

Common Issues

Legend overlaps with plot data. Use plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left') to place the legend outside the plot area. For FacetGrid, use g.add_legend(), which positions the legend outside the grid automatically. If the legend still overlaps, reposition the existing legend after all plotting with sns.move_legend(ax, "upper left", bbox_to_anchor=(1.05, 1)).
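
One way to move a legend outside the axes is seaborn's move_legend helper, available in seaborn 0.11.2 and later. The sketch below uses synthetic data with hypothetical column names and saves to an in-memory buffer rather than a file.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical column names)
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "group": np.tile(["a", "b", "c"], 20),
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="x", y="y", hue="group",
                hue_order=["a", "b", "c"], ax=ax)

# Move the legend seaborn already drew to the right of the axes
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.02, 1), frameon=False)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]

# bbox_inches="tight" keeps the outside legend inside the saved image
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
plt.close(fig)
```

Without bbox_inches="tight" (or extra right margin), a legend anchored outside the axes can be clipped when the figure is saved.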

Colors are inconsistent across subplots. When using hue across multiple separate plots, pass the same palette and hue_order to each plot call. Seaborn assigns colors based on the data present in each plot, so if a category is missing from one subplot, colors shift. An explicit hue_order, or a dict palette that maps each level to a color, prevents this.
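
A minimal sketch of the fix, assuming synthetic data in which one panel is missing a category. The species levels A, B, and C are hypothetical. A dict palette pins each level to a color, so the mapping is the same in every panel.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical): three species, 30 points each
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "species": np.tile(["A", "B", "C"], 30),
})
full = df                            # all three species present
partial = df[df["species"] != "B"]   # species B is missing here

# Fix both the level order and the level-to-color mapping once
hue_order = ["A", "B", "C"]
palette = dict(zip(hue_order, sns.color_palette("Set2", n_colors=len(hue_order))))

fig, axes = plt.subplots(1, 2, figsize=(9, 4), sharex=True, sharey=True)
for ax, subset in zip(axes, (full, partial)):
    sns.scatterplot(data=subset, x="x", y="y", hue="species",
                    hue_order=hue_order, palette=palette, ax=ax)

# Colors actually used in the second panel (RGB only, duplicates removed)
partial_colors = np.unique(axes[1].collections[0].get_facecolors()[:, :3], axis=0)
plt.close(fig)
```

In the second panel, C keeps its own color instead of inheriting the one that B would have had.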

Plot looks cramped with many categories on x-axis. Rotate tick labels with plt.xticks(rotation=45, ha='right') or switch to horizontal orientation by putting the categorical variable on y and the numeric variable on x. For more than 8-10 categories, consider using sns.catplot with col or row faceting instead of cramming everything into one panel.
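
Both remedies are sketched side by side below, on synthetic data with twelve hypothetical category names. The left panel rotates the tick labels; the right panel puts the categories on the y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical): 12 categories with long names
rng = np.random.default_rng(4)
categories = [f"condition_{i:02d}" for i in range(12)]
df = pd.DataFrame({
    "category": np.repeat(categories, 15),
    "value": rng.normal(size=12 * 15),
})

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(12, 5))

# Option 1: keep categories on x and rotate the tick labels
sns.boxplot(data=df, x="category", y="value", order=categories, ax=ax_v)
ax_v.tick_params(axis="x", labelrotation=45)
plt.setp(ax_v.get_xticklabels(), ha="right", rotation_mode="anchor")

# Option 2: put categories on y so the labels read horizontally
sns.boxplot(data=df, x="value", y="category", order=categories, ax=ax_h)

fig.tight_layout()
rotations = [label.get_rotation() for label in ax_v.get_xticklabels()]
plt.close(fig)
```

The axes-level tick_params call is used here instead of plt.xticks because plt.xticks only affects the current axes, which is easy to get wrong in a multi-panel figure.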
