Source: https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

---
name: python-dataviz-guide
description: "Publication-quality data visualization with matplotlib, seaborn, and plotly"
metadata:
  openclaw:
    emoji: "📊"
    category: "analysis"
    subcategory: "dataviz"
    keywords: ["data visualization", "chart design", "Python dataviz", "scientific figure creation", "publication quality figure"]
    source: "N/A"
---

# Python Data Visualization Guide

## Overview

Data visualization is how researchers communicate quantitative findings. A well-designed figure can convey complex relationships instantly, while a poor one buries the signal in clutter. Python's visualization ecosystem -- anchored by matplotlib, seaborn, and plotly -- provides everything needed to produce publication-quality figures for journals, conferences, and presentations.

This guide covers the three major Python visualization libraries, their strengths and trade-offs, and concrete recipes for the chart types researchers use most frequently. The examples are meant to be copied and adapted to your dataset and venue requirements; later snippets reuse the imports (`plt`, `np`, `pd`, `sns`) introduced in earlier ones.

The emphasis is on producing figures that meet journal standards: correct DPI, appropriate font sizes, accessible color palettes, and vector-format exports. We also cover interactive visualization with plotly for exploratory analysis and supplementary materials.

## Matplotlib: The Foundation

Matplotlib is the most flexible Python plotting library. Seaborn and the pandas plotting API, among other tools, are built on top of it, so matplotlib settings carry over to them.

### Setting Up Publication Defaults

```python
import matplotlib.pyplot as plt

# Publication-quality defaults
plt.rcParams.update({
    'figure.figsize': (6, 4),
    'figure.dpi': 150,
    'savefig.dpi': 300,
    'savefig.bbox': 'tight',
    'font.size': 11,
    'font.family': 'serif',
    'font.serif': ['Times New Roman', 'DejaVu Serif'],  # falls back if Times is not installed
    'axes.labelsize': 12,
    'axes.titlesize': 13,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'lines.linewidth': 1.5,
    'lines.markersize': 6,
    'axes.grid': True,
    'grid.alpha': 0.3,
})
```

### Line Plot with Error Bands

```python
import numpy as np

epochs = np.arange(1, 51)
acc_mean = 1 - 0.5 * np.exp(-epochs / 10)
acc_std = 0.03 * np.exp(-epochs / 20)

fig, ax = plt.subplots()
ax.plot(epochs, acc_mean, label='Our Method', color='#2563EB')
ax.fill_between(epochs, acc_mean - acc_std, acc_mean + acc_std,
                alpha=0.2, color='#2563EB')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.set_ylim(0.4, 1.0)
ax.legend(frameon=False)
fig.savefig('accuracy_curve.pdf')  # Vector format for papers
```
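
The example above uses an analytic mean and standard deviation. In practice the band is usually computed from repeated runs. The following is a minimal sketch under that assumption: `runs` is a hypothetical array of simulated accuracy curves with one row per random seed, not real results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 5 simulated training runs (rows) over 50 epochs (columns).
rng = np.random.default_rng(0)
epochs = np.arange(1, 51)
runs = 1 - 0.5 * np.exp(-epochs / 10) + rng.normal(0, 0.02, size=(5, epochs.size))

# Aggregate across runs: axis=0 collapses the seed dimension.
run_mean = runs.mean(axis=0)
run_std = runs.std(axis=0, ddof=1)  # sample standard deviation

fig, ax = plt.subplots()
ax.plot(epochs, run_mean, color='#2563EB', label='Mean of 5 runs')
ax.fill_between(epochs, run_mean - run_std, run_mean + run_std,
                alpha=0.2, color='#2563EB', label='±1 std')
ax.set_xlabel('Epoch')
ax.set_ylabel('Accuracy')
ax.legend(frameon=False)
fig.savefig('accuracy_runs.pdf')
```

State in the caption what the band represents (standard deviation, standard error, or a confidence interval), since the three can differ substantially for a small number of runs.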

### Multi-Panel Figures

```python
rng = np.random.default_rng(0)  # fixed seed for reproducible placeholder data

# A width of 6.9 in matches a typical double-column figure
fig, axes = plt.subplots(1, 3, figsize=(6.9, 2.6), sharey=True)

for ax, dataset, color in zip(axes, ['CIFAR-10', 'ImageNet', 'COCO'],
                              ['#2563EB', '#DC2626', '#16A34A']):
    x = rng.standard_normal(200)  # placeholder scores
    ax.hist(x, bins=30, color=color, alpha=0.7, edgecolor='white')
    ax.set_title(dataset)
    ax.set_xlabel('Score')

axes[0].set_ylabel('Count')
fig.tight_layout()
fig.savefig('multi_panel.pdf')
```
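
Multi-panel figures in papers usually carry panel labels so that the caption can refer to each subplot. The sketch below shows one possible way to add them with `ax.text` in axes-fraction coordinates. The label strings, their position, and the placeholder histogram data are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible

# constrained_layout leaves room for labels placed just above each panel.
fig, axes = plt.subplots(1, 3, figsize=(6.9, 2.4), sharey=True,
                         constrained_layout=True)

panel_labels = ['(a)', '(b)', '(c)']
for ax, label in zip(axes, panel_labels):
    ax.hist(rng.standard_normal(200), bins=30, color='#2563EB',
            alpha=0.7, edgecolor='white')
    # Axes-fraction coordinates: (0, 1) is the top-left corner of the panel.
    ax.text(0.0, 1.04, label, transform=ax.transAxes,
            fontweight='bold', va='bottom', ha='left')
    ax.set_xlabel('Score')

axes[0].set_ylabel('Count')
fig.savefig('multi_panel_labeled.pdf')
```

Because the labels use `ax.transAxes`, they stay in the same relative position when the data range or figure size changes.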

## Seaborn: Statistical Visualization

Seaborn excels at statistical graphics with minimal code. It handles data frames natively and produces polished output by default.

### Grouped Comparison Bar Chart

```python
import seaborn as sns
import pandas as pd

data = pd.DataFrame({
    'Method': ['Baseline', 'Baseline', 'Ours', 'Ours', 'Ours+FT', 'Ours+FT'],
    'Metric': ['BLEU', 'ROUGE'] * 3,
    'Score': [34.2, 45.1, 41.8, 52.3, 48.5, 58.7]
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=data, x='Metric', y='Score', hue='Method',
            palette=['#94A3B8', '#3B82F6', '#EF4444'], ax=ax)
ax.set_ylabel('Score')
ax.legend(title='Method', frameon=False)
fig.savefig('comparison.pdf')
```
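
The chart above plots one score per method and metric, so it carries no uncertainty or significance information. When replicate runs are available, error bars and a significance bracket can be added. The following is a rough sketch in plain matplotlib: the replicate scores are simulated, the helper `permutation_pvalue` is a hypothetical name, and a permutation test on the difference of means is just one of several reasonable tests.

```python
import numpy as np
import matplotlib.pyplot as plt

def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        hits += int(diff >= observed)
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical replicate scores (10 runs per method); not real results.
rng = np.random.default_rng(1)
baseline = rng.normal(34, 2, 10)
ours = rng.normal(42, 2, 10)

means = [baseline.mean(), ours.mean()]
sems = [s.std(ddof=1) / np.sqrt(s.size) for s in (baseline, ours)]
p = permutation_pvalue(baseline, ours)

fig, ax = plt.subplots(figsize=(3.3, 3))
ax.bar([0, 1], means, yerr=sems, capsize=4, color=['#94A3B8', '#3B82F6'])
ax.set_xticks([0, 1])
ax.set_xticklabels(['Baseline', 'Ours'])
ax.set_ylabel('Score')

# Bracket spanning both bars, drawn slightly above the taller one.
top = max(m + e for m, e in zip(means, sems))
y, h = top + 1.0, 0.6
ax.plot([0, 0, 1, 1], [y, y + h, y + h, y], color='black', linewidth=1)
ax.text(0.5, y + h, f'p = {p:.3g}', ha='center', va='bottom')
ax.set_ylim(0, y + h + 4)
fig.savefig('comparison_significance.pdf')
```

The error bars here are standard errors of the mean. Name the error-bar type and the test in the caption so readers can interpret the bracket.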

### Correlation Heatmap

```python
# Placeholder data: 200 observations of 8 features
corr_matrix = pd.DataFrame(
    np.random.default_rng(0).standard_normal((200, 8)),
    columns=[f'Feature {i}' for i in range(8)]
).corr()

fig, ax = plt.subplots(figsize=(8, 7))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, square=True, linewidths=0.5, ax=ax)
ax.set_title('Feature Correlation Matrix')
fig.savefig('heatmap.pdf')
```
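
A correlation matrix is symmetric, so half of the cells in the heatmap above repeat the other half. One common variant hides the upper triangle with the `mask` argument of `sns.heatmap`. The sketch below assumes random placeholder data and fixes the color scale to the full correlation range of -1 to 1.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 6 features.
features = pd.DataFrame(rng.standard_normal((200, 6)),
                        columns=[f'Feature {i}' for i in range(6)])
corr = features.corr()

# Cells where the mask is True are hidden: the upper triangle and the diagonal.
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r',
            vmin=-1, vmax=1, center=0, square=True, linewidths=0.5, ax=ax)
fig.savefig('heatmap_lower.pdf')
```

Fixing `vmin` and `vmax` keeps colors comparable across several heatmaps in the same paper.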

### Violin Plot for Distribution Comparison

```python
df = pd.DataFrame({
    'Group': np.repeat(['Control', 'Treatment A', 'Treatment B'], 100),
    'Value': np.concatenate([
        np.random.normal(50, 10, 100),
        np.random.normal(55, 8, 100),
        np.random.normal(60, 12, 100)
    ])
})

fig, ax = plt.subplots(figsize=(8, 5))
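# seaborn >= 0.13: `palette` requires `hue`; on older versions drop `hue` and `legend`
sns.violinplot(data=df, x='Group', y='Value', hue='Group', palette='Set2',
               inner='box', legend=False, ax=ax)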
ax.set_ylabel('Measurement')
fig.savefig('violin.pdf')
```

## Plotly: Interactive Visualization

Plotly is ideal for exploratory analysis and HTML-based supplementary materials.

```python
import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
                 size="pop", color="continent",
                 hover_name="country",
                 log_x=True, size_max=60,
                 title="GDP vs Life Expectancy (2007)")
fig.write_html("interactive_scatter.html")
fig.write_image("scatter.pdf")  # Requires kaleido
```

## Chart Type Selection Guide

| Data Relationship | Recommended Chart | Library |
|-------------------|-------------------|---------|
| Trend over time | Line plot | matplotlib |
| Distribution | Histogram, violin, box | seaborn |
| Comparison (categories) | Bar chart, grouped bar | seaborn |
| Correlation (2 vars) | Scatter plot | matplotlib/plotly |
| Correlation (matrix) | Heatmap | seaborn |
| Part-to-whole | Stacked bar (not pie) | matplotlib |
| High-dimensional | PCA/t-SNE scatter | plotly |
| Geospatial | Choropleth | plotly |
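
The table recommends a stacked bar for part-to-whole data, which the recipes above do not cover. The following is a minimal matplotlib sketch, assuming hypothetical percentage shares for three categories across four years. The category names, values, and colors are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical shares (percent); each group sums to 100.
groups = ['2020', '2021', '2022', '2023']
shares = {
    'Category A': np.array([50, 45, 40, 35]),
    'Category B': np.array([30, 30, 35, 35]),
    'Category C': np.array([20, 25, 25, 30]),
}
colors = ['#0072B2', '#E69F00', '#009E73']  # Okabe-Ito colorblind-safe hues

fig, ax = plt.subplots(figsize=(3.3, 3))
bottom = np.zeros(len(groups))
for (name, values), color in zip(shares.items(), colors):
    ax.bar(groups, values, bottom=bottom, label=name, color=color, width=0.6)
    bottom += values  # the next layer starts where this one ends

ax.set_ylabel('Share (%)')
ax.set_ylim(0, 100)
ax.legend(frameon=False, ncol=3, loc='lower center',
          bbox_to_anchor=(0.5, 1.0), fontsize=7)
fig.savefig('stacked_bar.pdf', bbox_inches='tight')
```

The running `bottom` array is what stacks the layers: each call to `ax.bar` starts at the cumulative height of the layers drawn before it.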

## Best Practices

- **Export as PDF or SVG for print, PNG at 300 DPI as fallback.** Never submit JPEG figures to journals.
- **Use colorblind-safe palettes.** Use `sns.color_palette("colorblind")` or choose a scheme with a tool such as ColorBrewer.
- **Label everything.** Axes, legends, and units should be readable without referring to the caption.
- **Avoid chartjunk.** Remove unnecessary gridlines, borders, and decorative elements.
- **Match the figure width to the journal column width.** Single-column is typically about 3.3 inches and double-column about 6.9 inches; check the venue's author guidelines for exact values.
- **Use consistent styling across all figures in a paper.** Define a style dictionary once and reuse it.
- **Include error bars or confidence intervals.** Raw point estimates without uncertainty are incomplete.
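
Several of these practices (consistent styling, column-width sizing, vector plus raster export) can be combined in a small helper module. The sketch below is one possible arrangement, not a standard API: `PAPER_STYLE`, `COLUMN_WIDTH`, `new_figure`, and `save_all` are hypothetical names, and the font sizes are assumptions to adjust per venue.

```python
import numpy as np
import matplotlib.pyplot as plt

# Column widths in inches, taken from the guideline above; check your venue.
COLUMN_WIDTH = {'single': 3.3, 'double': 6.9}

# Hypothetical shared style; adjust to the target journal.
PAPER_STYLE = {
    'font.size': 8,
    'axes.labelsize': 8,
    'legend.fontsize': 7,
    'savefig.dpi': 300,
    'savefig.bbox': 'tight',
}

def new_figure(column='single', aspect=0.7):
    """Create a figure sized to a journal column; height = width * aspect."""
    width = COLUMN_WIDTH[column]
    return plt.subplots(figsize=(width, width * aspect))

def save_all(fig, stem):
    """Save vector (PDF, SVG) and raster (PNG) versions; return the file names."""
    names = [f'{stem}.{ext}' for ext in ('pdf', 'svg', 'png')]
    for name in names:
        fig.savefig(name)
    return names

# rc_context applies the style only inside the with-block.
with plt.rc_context(PAPER_STYLE):
    fig, ax = new_figure('single')
    x = np.linspace(0, 10, 100)
    ax.plot(x, np.sin(x), label='sin(x)')
    ax.set_xlabel('x (rad)')
    ax.set_ylabel('Amplitude')
    ax.legend(frameon=False)
    saved = save_all(fig, 'style_demo')
```

Saving happens inside the `with` block because `savefig` reads `savefig.dpi` and `savefig.bbox` at call time; outside the block the global defaults would apply instead.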

## References

- [matplotlib Documentation](https://matplotlib.org/stable/) -- Official reference
- [seaborn Documentation](https://seaborn.pydata.org/) -- Statistical visualization
- [plotly Documentation](https://plotly.com/python/) -- Interactive charts
- [Scientific Visualization: Python + Matplotlib](https://github.com/rougier/scientific-visualization-book) -- Nicolas Rougier
- [Ten Simple Rules for Better Figures](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833) -- Rougier et al.