Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.
Seaborn follows these core principles:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load example dataset
df = sns.load_dataset('tips')
# Create a simple visualization
sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
plt.show()
The function interface provides specialized plotting functions organized by visualization type. Each category has axes-level functions (plot to single axes) and figure-level functions (manage entire figure with faceting).
When to use:
The seaborn.objects interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.
When to use:
from seaborn import objects as so
# Declarative syntax
(
so.Plot(data=df, x='total_bill', y='tip')
.add(so.Dot(), color='day')
.add(so.Line(), so.PolyFit())
)
Use for: Exploring how two or more variables relate to each other
scatterplot() - Display individual observations as points
lineplot() - Show trends and changes (automatically aggregates and computes CI)
relplot() - Figure-level interface with automatic faceting
Key parameters:
x, y - Primary variables
hue - Color encoding for additional categorical/continuous variable
size - Point/line size encoding
style - Marker/line style encoding
col, row - Facet into multiple subplots (figure-level only)
# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x='total_bill', y='tip',
hue='time', size='size', style='sex')
# Line plot with confidence intervals
sns.lineplot(data=timeseries, x='date', y='value', hue='category')
# Faceted relational plot
sns.relplot(data=df, x='total_bill', y='tip',
col='time', row='sex', hue='smoker', kind='scatter')
Use for: Understanding data spread, shape, and probability density
histplot() - Bar-based frequency distributions with flexible binning
kdeplot() - Smooth density estimates using Gaussian kernels
ecdfplot() - Empirical cumulative distribution (no parameters to tune)
rugplot() - Individual observation tick marks
displot() - Figure-level interface for univariate and bivariate distributions
jointplot() - Bivariate plot with marginal distributions
pairplot() - Matrix of pairwise relationships across dataset
Key parameters:
x, y - Variables (y optional for univariate)
hue - Separate distributions by category
stat - Normalization: "count", "frequency", "probability", "density"
bins / binwidth - Histogram binning control
bw_adjust - KDE bandwidth multiplier (higher = smoother)
fill - Fill area under curve
multiple - How to handle hue: "layer", "stack", "dodge", "fill"
# Histogram with density normalization
sns.histplot(data=df, x='total_bill', hue='time',
stat='density', multiple='stack')
# Bivariate KDE with contours
sns.kdeplot(data=df, x='total_bill', y='tip',
fill=True, levels=5, thresh=0.1)
# Joint plot with marginals
sns.jointplot(data=df, x='total_bill', y='tip',
kind='scatter', hue='time')
# Pairwise relationships (tips has no 'species' column, so use the iris dataset)
iris = sns.load_dataset('iris')
sns.pairplot(data=iris, hue='species', corner=True)
Use for: Comparing distributions or statistics across discrete categories
Categorical scatterplots:
stripplot() - Points with jitter to show all observations
swarmplot() - Non-overlapping points (beeswarm algorithm)
Distribution comparisons:
boxplot() - Quartiles and outliers
violinplot() - KDE + quartile information
boxenplot() - Enhanced boxplot for larger datasets
Statistical estimates:
barplot() - Mean/aggregate with confidence intervals
pointplot() - Point estimates with connecting lines
countplot() - Count of observations per category
Figure-level:
catplot() - Faceted categorical plots (set kind parameter)
Key parameters:
x, y - Variables (one typically categorical)
hue - Additional categorical grouping
order, hue_order - Control category ordering
dodge - Separate hue levels side-by-side
orient - "v" (vertical) or "h" (horizontal)
kind - Plot type for catplot: "strip", "swarm", "box", "violin", "bar", "point"
# Swarm plot showing all points
sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')
# Violin plot with split for comparison
sns.violinplot(data=df, x='day', y='total_bill',
hue='sex', split=True)
# Bar plot with error bars
sns.barplot(data=df, x='day', y='total_bill',
hue='sex', estimator='mean', errorbar='ci')
# Faceted categorical plot
sns.catplot(data=df, x='day', y='total_bill',
col='time', kind='box')
Use for: Visualizing linear regressions and residuals
regplot() - Axes-level regression plot with scatter + fit line
lmplot() - Figure-level with faceting support
residplot() - Residual plot for assessing model fit
Key parameters:
x, y - Variables to regress
order - Polynomial regression order
logistic - Fit logistic regression
robust - Use robust regression (less sensitive to outliers)
ci - Confidence interval width (default 95)
scatter_kws, line_kws - Customize scatter and line properties
# Simple linear regression
sns.regplot(data=df, x='total_bill', y='tip')
# Polynomial regression with faceting
sns.lmplot(data=df, x='total_bill', y='tip',
col='time', order=2, ci=95)
# Check residuals
sns.residplot(data=df, x='total_bill', y='tip')
Use for: Visualizing matrices, correlations, and grid-structured data
heatmap() - Color-encoded matrix with annotations
clustermap() - Hierarchically-clustered heatmap
Key parameters:
data - 2D rectangular dataset (DataFrame or array)
annot - Display values in cells
fmt - Format string for annotations (e.g., ".2f")
cmap - Colormap name
center - Value at colormap center (for diverging colormaps)
vmin, vmax - Color scale limits
square - Force square cells
linewidths - Gap between cells
# Correlation heatmap
corr = df.corr(numeric_only=True)  # tips has non-numeric columns; recent pandas raises without this
sns.heatmap(corr, annot=True, fmt='.2f',
cmap='coolwarm', center=0, square=True)
# Clustered heatmap
sns.clustermap(data, cmap='viridis',
standard_scale=1, figsize=(10, 10))
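A correlation heatmap needs a purely numeric matrix as input. The sketch below, built on an invented `toy` table rather than the real tips data, selects the numeric columns before computing correlations and then checks what `heatmap()` draws. Fixing `vmin` and `vmax` at -1 and 1 is an added choice here so the color scale spans the full correlation range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical table mixing numeric and text columns (values are made up)
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
    "size": [1, 2, 2, 4],
    "day": ["Sat", "Sun", "Sat", "Sun"],
})

# Keep only numeric columns; 'day' cannot take part in a correlation
corr = toy.select_dtypes("number").corr()

ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 center=0, vmin=-1, vmax=1, square=True)

print(corr.shape)     # one row and one column per numeric variable
print(len(ax.texts))  # annot=True writes one text label per cell
```

`select_dtypes("number")` is one way to drop text columns; passing `numeric_only=True` to `corr()` is an alternative on recent pandas versions.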
Seaborn provides grid objects for creating complex multi-panel figures:
Create subplots based on categorical variables. Most useful when called through figure-level functions (relplot, displot, catplot), but can be used directly for custom plots.
g = sns.FacetGrid(df, col='time', row='sex', hue='smoker')
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()
Show pairwise relationships between all variables in a dataset.
# tips has no 'species' column, so this example uses the iris dataset
g = sns.PairGrid(sns.load_dataset('iris'), hue='species')
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)
g.add_legend()
Combine bivariate plot with marginal distributions.
g = sns.JointGrid(data=df, x='total_bill', y='tip')
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot)
Understanding this distinction is crucial for effective seaborn usage:
Axes-level functions plot onto a single matplotlib Axes object, accept an ax= parameter for precise placement, and return that Axes object. Examples: scatterplot, histplot, boxplot, regplot, heatmap
When to use:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
sns.scatterplot(data=df, x='x', y='y', ax=axes[0, 0])
sns.histplot(data=df, x='x', ax=axes[0, 1])
sns.boxplot(data=df, x='cat', y='y', ax=axes[1, 0])
sns.kdeplot(data=df, x='x', y='y', ax=axes[1, 1])
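Figure-level functions manage the entire figure. The faceting functions (relplot, displot, catplot, lmplot) accept col and row parameters. All of them return FacetGrid, JointGrid, or PairGrid objects and are sized with height and, where supported, aspect (per subplot). Examples: relplot, displot, catplot, lmplot, jointplot, pairplot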
When to use:
# Automatic faceting
sns.relplot(data=df, x='x', y='y', col='category', row='group',
hue='type', height=3, aspect=1.2)
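The difference between the two function types shows up most clearly in what each one returns. The following sketch uses an invented `toy` table to compare an axes-level call with a figure-level call. It assumes a seaborn version that exposes `g.figure` (older releases used `g.fig`).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, made up for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 1, 4, 3],
    "category": ["a", "b", "a", "b"],
})

# Axes-level: draws onto the Axes you pass in and hands the same Axes back
fig, ax = plt.subplots()
returned = sns.scatterplot(data=toy, x="x", y="y", ax=ax)
print(returned is ax)

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=toy, x="x", y="y", col="category", height=3, aspect=1.2)
print(type(g).__name__, g.axes.shape)

# height and aspect apply to each panel, so the figure grows with the facets
print(g.figure.get_size_inches())
```

Since a figure-level function owns its figure, it does not take an `ax=` argument; place it in a larger layout by using the axes-level counterpart instead.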
Each variable is a column, each observation is a row. This "tidy" format provides maximum flexibility:
# Long-form structure
subject condition measurement
0 1 control 10.5
1 1 treatment 12.3
2 2 control 9.8
3 2 treatment 13.1
Advantages:
Variables are spread across columns. Useful for simple rectangular data:
# Wide-form structure
control treatment
0 10.5 12.3
1 9.8 13.1
Use cases:
Converting wide to long:
df_long = df.melt(var_name='condition', value_name='measurement')
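The one-line melt above discards the row index, so the subject identity is lost. As a sketch of one way to keep it, the example below rebuilds the wide table from this section with a `subject` index (that index name is an assumption, chosen to match the long-form example) and carries it through the reshape.

```python
import pandas as pd

# Wide-form table from the text, with a subject index added for illustration
wide = pd.DataFrame(
    {"control": [10.5, 9.8], "treatment": [12.3, 13.1]},
    index=pd.Index([1, 2], name="subject"),
)

# Move the index into a column, then melt while keeping it as an identifier
long = (
    wide.reset_index()
    .melt(id_vars="subject", var_name="condition", value_name="measurement")
    .sort_values(["subject", "condition"])
    .reset_index(drop=True)
)
print(long)

# pivot() reverses the reshape, which is a handy sanity check
back = long.pivot(index="subject", columns="condition", values="measurement")
print(back)
```

The resulting `long` table has one row per subject and condition, which is the layout that `hue`, `col`, and `row` mappings expect.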
Seaborn provides carefully designed color palettes for different data types:
Distinguish categories through hue variation:
"deep" - Default, vivid colors
"muted" - Softer, less saturated
"pastel" - Light, desaturated
"bright" - Highly saturated
"dark" - Dark values
"colorblind" - Safe for color vision deficiency
sns.set_palette("colorblind")
sns.color_palette("Set2")
Show progression from low to high values:
"rocket", "mako" - Wide luminance range (good for heatmaps)
"flare", "crest" - Restricted luminance (good for points/lines)
"viridis", "magma", "plasma" - Matplotlib perceptually uniform
sns.heatmap(data, cmap='rocket')
sns.kdeplot(data=df, x='x', y='y', cmap='mako', fill=True)
Emphasize deviations from a midpoint:
"vlag" - Blue to red
"icefire" - Blue to orange
"coolwarm" - Cool to warm
"Spectral" - Rainbow diverging
sns.heatmap(correlation_matrix, cmap='vlag', center=0)
# Create custom palette
custom = sns.color_palette("husl", 8)
# Light to dark gradient
palette = sns.light_palette("seagreen", as_cmap=True)
# Diverging palette from hues
palette = sns.diverging_palette(250, 10, as_cmap=True)
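It can help to see what these palette helpers actually return. The sketch below inspects the objects produced by the calls above: a discrete palette behaves like a list of RGB tuples, while `as_cmap=True` yields a matplotlib colormap suitable for the `cmap` parameter.

```python
import seaborn as sns
from matplotlib.colors import Colormap

# A discrete palette is a list-like object of (r, g, b) tuples in [0, 1]
custom = sns.color_palette("husl", 8)
print(len(custom))
print(custom[0])

# as_hex() converts the palette for use in CSS, plot configs, and similar
hex_codes = custom.as_hex()
print(hex_codes[0])

# as_cmap=True returns a continuous matplotlib colormap instead of a list
light = sns.light_palette("seagreen", as_cmap=True)
diverging = sns.diverging_palette(250, 10, as_cmap=True)
print(isinstance(light, Colormap), isinstance(diverging, Colormap))
```

As a rule of thumb, pass list-style palettes to `palette=` for categorical `hue` mappings and colormap objects to `cmap=` for continuous data.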
set_theme() controls overall appearance:
# Set complete theme
sns.set_theme(style='whitegrid', palette='pastel', font='sans-serif')
# Reset to defaults
sns.set_theme()
Control background and grid appearance:
"darkgrid" - Gray background with white grid (default)
"whitegrid" - White background with gray grid
"dark" - Gray background, no grid
"white" - White background, no grid
"ticks" - White background with axis ticks
sns.set_style("whitegrid")
# Remove spines
sns.despine(left=False, bottom=False, offset=10, trim=True)
# Temporary style
with sns.axes_style("white"):
    sns.scatterplot(data=df, x='x', y='y')
Scale elements for different use cases:
"paper" - Smallest
"notebook" - Slightly larger (default)
"talk" - Presentation slides
"poster" - Large format
sns.set_context("talk", font_scale=1.2)
# Temporary context
with sns.plotting_context("poster"):
    sns.barplot(data=df, x='category', y='value')
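A context is a dictionary of matplotlib rc parameters, so its effect can be inspected without drawing anything. The sketch below compares the base font size across the four named contexts and shows how `font_scale` multiplies it further.

```python
import seaborn as sns

# plotting_context(name) returns the rc parameters that the context would set
names = ["paper", "notebook", "talk", "poster"]
sizes = {name: sns.plotting_context(name)["font.size"] for name in names}
print(sizes)

# font_scale scales the font-related entries on top of the chosen context
scaled = sns.plotting_context("talk", font_scale=1.2)
print(scaled["font.size"])
```

Line widths and tick sizes scale with the context as well; `font_scale` only touches the font entries.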
Always use well-structured DataFrames with meaningful column names:
# Good: Named columns in DataFrame
df = pd.DataFrame({'bill': bills, 'tip': tips, 'day': days})
sns.scatterplot(data=df, x='bill', y='tip', hue='day')
# Avoid: Unnamed arrays
sns.scatterplot(x=x_array, y=y_array) # Loses axis labels
Continuous x, continuous y: scatterplot, lineplot, kdeplot, regplot
Categorical x, continuous y: violinplot, boxplot, stripplot, swarmplot
One continuous variable: histplot, kdeplot, ecdfplot
Correlations/matrices: heatmap, clustermap
Pairwise relationships: pairplot, jointplot
# Instead of manual subplot creation
sns.relplot(data=df, x='x', y='y', col='category', col_wrap=3)
# Not: Creating subplots manually for simple faceting
Use hue, size, and style to encode additional dimensions:
sns.scatterplot(data=df, x='x', y='y',
hue='category', # Color by category
size='importance', # Size by continuous variable
style='type') # Marker style by type
Many functions compute statistics automatically. Understand and customize:
# Lineplot computes mean and 95% CI by default
sns.lineplot(data=df, x='time', y='value',
errorbar='sd') # Use standard deviation instead
# Barplot computes mean by default
sns.barplot(data=df, x='category', y='value',
estimator='median', # Use median instead
errorbar=('ci', 95)) # Bootstrapped CI
Seaborn integrates seamlessly with matplotlib for fine-tuning:
ax = sns.scatterplot(data=df, x='x', y='y')
ax.set(xlabel='Custom X Label', ylabel='Custom Y Label',
title='Custom Title')
ax.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()
g = sns.relplot(data=df, x='x', y='y', col='group')  # returns a FacetGrid, not a Figure
g.savefig('figure.png', dpi=300, bbox_inches='tight')
g.savefig('figure.pdf') # Vector format for publications
# Quick overview of all relationships
sns.pairplot(data=df, hue='target', corner=True)
# Distribution exploration
sns.displot(data=df, x='variable', hue='group',
kind='kde', fill=True, col='category')
# Correlation analysis
corr = df.corr(numeric_only=True)  # skip non-numeric columns
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
g = sns.catplot(data=df, x='treatment', y='response',
col='cell_line', kind='box', height=3, aspect=1.2)
g.set_axis_labels('Treatment Condition', 'Response (μM)')
g.set_titles('{col_name}')
sns.despine(trim=True)
g.savefig('figure.pdf', dpi=300, bbox_inches='tight')
# Using matplotlib subplots with seaborn
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
sns.scatterplot(data=df, x='x1', y='y', hue='group', ax=axes[0, 0])
sns.histplot(data=df, x='x1', hue='group', ax=axes[0, 1])
sns.violinplot(data=df, x='group', y='y', ax=axes[1, 0])
sns.heatmap(df.pivot_table(values='y', index='x1', columns='x2'),
ax=axes[1, 1], cmap='viridis')
plt.tight_layout()
# Lineplot automatically aggregates and shows CI
sns.lineplot(data=timeseries, x='date', y='measurement',
hue='sensor', style='location', errorbar='sd')
# For more control
g = sns.relplot(data=timeseries, x='date', y='measurement',
col='location', hue='sensor', kind='line',
height=4, aspect=1.5, errorbar=('ci', 95))
g.set_axis_labels('Date', 'Measurement (units)')
Figure-level functions place legends outside by default. To move inside:
g = sns.relplot(data=df, x='x', y='y', hue='category')
sns.move_legend(g, "center right", bbox_to_anchor=(0.9, 0.5))  # Adjust position via the public API instead of the private g._legend
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
For figure-level functions:
sns.relplot(data=df, x='x', y='y', height=6, aspect=1.5)
For axes-level functions:
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=df, x='x', y='y', ax=ax)
# Use a different palette
sns.set_palette("bright")
# Or specify number of colors
palette = sns.color_palette("husl", n_colors=len(df['category'].unique()))
sns.scatterplot(data=df, x='x', y='y', hue='category', palette=palette)
# Adjust bandwidth
sns.kdeplot(data=df, x='x', bw_adjust=0.5) # Less smooth
sns.kdeplot(data=df, x='x', bw_adjust=2) # More smooth
This skill includes reference materials for deeper exploration:
function_reference.md - Comprehensive listing of all seaborn functions with parameters and examples
objects_interface.md - Detailed guide to the modern seaborn.objects API
examples.md - Common use cases and code patterns for different analysis scenarios
Load reference files as needed for detailed function signatures, advanced parameters, or specific examples.