xlsx-generation

Show SKILL.md content (~16.3k tokens)
---
name: xlsx-generation
description: Excel spreadsheet generation and manipulation expertise for creating, reading, and automating XLSX files programmatically. Use when building report generators, automating financial models, creating data export pipelines, or manipulating spreadsheets.
summary_l0: "Generate and manipulate Excel spreadsheets with formulas, charts, and multi-library support"
overview_l1: "This skill provides comprehensive expertise in generating, reading, and manipulating Excel XLSX files programmatically across Python, JavaScript, and Java ecosystems. Use it when building automated report generators, creating data export pipelines, producing financial models, formatting business dashboards as spreadsheets, converting database query results into styled Excel workbooks, or integrating spreadsheet output into ETL workflows. Key capabilities include library selection across openpyxl, xlsxwriter, ExcelJS, and Apache POI; workbook and worksheet creation with typed cell data; rich cell styling with fonts, borders, fills, alignment, and number formats; conditional formatting rules and data validation dropdowns; formula injection with named ranges, array formulas, and cross-sheet references; chart generation including bar, line, pie, scatter, and combo charts with data-driven configuration; advanced features such as autofilters, freeze panes, print layout, password protection, and VBA macro preservation; and Pandas integration for DataFrame-to-Excel pipelines with multi-sheet export and Styler-based formatting. The expected output is production-ready XLSX files with correct formatting, formulas, charts, and data integrity suitable for business stakeholders. Trigger phrases: xlsx, Excel generation, spreadsheet, openpyxl, xlsxwriter, ExcelJS, Apache POI, Excel report, data export, Excel automation, workbook, worksheet, cell formatting, Excel chart, pivot table, Excel formula, DataFrame to Excel, spreadsheet pipeline."
---

# XLSX Generation

Structured guidance for generating, reading, and manipulating Excel XLSX files programmatically. Covers library selection, cell formatting, formulas, charts, advanced workbook features, and data pipeline integration across Python, JavaScript, and Java ecosystems.

## When to Use This Skill

Use this skill for:

- Building automated report generators that output styled Excel workbooks
- Creating data export pipelines that produce XLSX files from databases or APIs
- Generating financial models, budgets, or forecasts as spreadsheets
- Formatting business dashboards with conditional formatting, charts, and pivot-ready data
- Converting DataFrame analysis results into multi-sheet Excel deliverables
- Integrating spreadsheet output into ETL or CI/CD workflows
- Reading and transforming existing Excel files (extract data, update cells, merge workbooks)
- Producing Excel templates with data validation, dropdowns, and protected ranges

**Trigger phrases**: "xlsx", "Excel generation", "spreadsheet", "openpyxl", "xlsxwriter", "ExcelJS", "Apache POI", "Excel report", "data export", "Excel automation", "workbook", "worksheet", "cell formatting", "Excel chart", "pivot table", "Excel formula", "DataFrame to Excel", "spreadsheet pipeline"

## What This Skill Does

Provides Excel generation patterns including:

- **Library Selection**: Decision matrix for openpyxl, xlsxwriter, ExcelJS, Apache POI, and Pandas wrappers
- **Cell Formatting**: Fonts, fills, borders, alignment, number formats, merged cells, rich text
- **Formulas**: Cell references, named ranges, array formulas, cross-sheet formulas, formula auditing
- **Charts**: Bar, line, pie, scatter, combo charts with programmatic configuration and positioning
- **Data Validation**: Dropdown lists, numeric ranges, date constraints, custom formula validators
- **Advanced Features**: Autofilters, freeze panes, print setup, protection, VBA preservation, images
- **Pandas Integration**: DataFrame export, multi-sheet workbooks, Styler formatting, read/transform/write
- **Performance**: Streaming writes for large datasets, memory optimization, batch operations

## Instructions

### Step 1: Library Selection

Choosing the right library depends on whether you need read/write access, write-only performance, language ecosystem, and specific feature requirements.

**Decision Matrix**:

| Feature | openpyxl (Python) | xlsxwriter (Python) | ExcelJS (Node.js) | Apache POI (Java) | Pandas to_excel |
|---------|-------------------|---------------------|--------------------|--------------------|-----------------|
| Read XLSX | Yes | No | Yes | Yes | Yes (via openpyxl) |
| Write XLSX | Yes | Yes | Yes | Yes | Yes (via openpyxl/xlsxwriter) |
| Modify existing | Yes | No | Yes | Yes | No |
| Streaming write | Yes (write-only mode) | Yes (default) | Yes | Yes (SXSSF) | No |
| Formulas | Yes | Yes | Yes | Yes | Limited |
| Charts | Yes | Yes | Yes | Yes | No |
| Conditional formatting | Yes | Yes | Yes | Yes | Via Styler |
| VBA macro support | Yes (preserve) | Yes (xlsm) | No | Yes | No |
| Images | Yes | Yes | Yes | Yes | No |
| Memory efficiency | Moderate | High | Moderate | Low (HSSF) / High (SXSSF) | Low |
| Max rows | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 | 1,048,576 |
| Install | `pip install openpyxl` | `pip install xlsxwriter` | `npm i exceljs` | Maven/Gradle | `pip install pandas openpyxl` |

**When to Use Each Library**:

- **openpyxl**: Default choice for Python when you need both read and write, or must modify existing files. Best for template-based report generation where you load a template and fill in data
- **xlsxwriter**: Best for Python write-only scenarios requiring maximum performance and feature richness (conditional formatting, sparklines, data validation). Cannot read or modify existing files
- **ExcelJS**: The standard choice for Node.js/TypeScript projects. Supports read, write, and streaming. Good feature coverage for most business requirements
- **Apache POI**: The Java ecosystem standard. Use XSSF for full-featured access or SXSSF for streaming large datasets. Heaviest memory footprint but most mature library
- **Pandas to_excel**: Best when your data is already in DataFrames. Not a standalone Excel library; delegates to openpyxl or xlsxwriter under the hood. Use for quick exports; switch to the underlying library when you need formatting control

**Installation and Setup**:

```python
# Python: openpyxl (read/write)
pip install openpyxl

# Python: xlsxwriter (write-only, high performance)
pip install xlsxwriter

# Python: Pandas with Excel support
pip install pandas openpyxl  # or pandas xlsxwriter
```

```bash
# Node.js: ExcelJS
npm install exceljs
```

```xml
<!-- Java: Apache POI (Maven) -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.5</version>
</dependency>
```

**Choosing Based on Use Case**:

```
Need to read existing XLSX?
  ├─ Yes → openpyxl (Python), ExcelJS (Node), POI (Java)
  └─ No → Write-only?
       ├─ Yes, large dataset → xlsxwriter (Python), SXSSF (Java)
       ├─ Yes, data already in DataFrame → pandas.to_excel
       └─ Yes, Node.js project → ExcelJS

Need to preserve VBA macros?
  ├─ Yes → openpyxl (keep_vba=True), POI
  └─ No → Any library

Need charts + conditional formatting + data validation?
  ├─ Yes → xlsxwriter (Python), POI (Java), ExcelJS (Node)
  └─ Basic formatting only → Any library
```

### Step 2: Python openpyxl Fundamentals

openpyxl is the most versatile Python library for Excel manipulation. It supports reading, writing, and modifying XLSX files with full formatting control.

**Workbook and Worksheet Basics**:

```python
from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter

# Create a new workbook
wb = Workbook()
ws = wb.active  # Get the default sheet
ws.title = "Sales Report"

# Add additional sheets
ws2 = wb.create_sheet("Summary")
ws3 = wb.create_sheet("Raw Data", 0)  # Insert at position 0

# Write data to cells
ws["A1"] = "Product"
ws["B1"] = "Revenue"
ws["C1"] = "Quarter"

# Write by row and column index (1-based)
ws.cell(row=2, column=1, value="Widget A")
ws.cell(row=2, column=2, value=15000.50)
ws.cell(row=2, column=3, value="Q1 2026")

# Write rows in bulk
data = [
    ["Widget B", 22000.75, "Q1 2026"],
    ["Widget C", 8500.00, "Q1 2026"],
    ["Widget D", 31200.25, "Q1 2026"],
]
for row in data:
    ws.append(row)

# Set column widths
ws.column_dimensions["A"].width = 20
ws.column_dimensions["B"].width = 15
ws.column_dimensions["C"].width = 12

# Set row height
ws.row_dimensions[1].height = 25

# Save the workbook
wb.save("sales_report.xlsx")
```

**Cell Data Types and Number Formats**:

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from datetime import datetime, date
from decimal import Decimal

wb = Workbook()
ws = wb.active

# String values
ws["A1"] = "Revenue Report"

# Numeric values (integers and floats)
ws["A2"] = 42
ws["B2"] = 3.14159
ws["C2"] = Decimal("15000.50")  # Converted to float internally

# Date and datetime values
ws["A3"] = date(2026, 3, 15)
ws["A3"].number_format = "YYYY-MM-DD"

ws["B3"] = datetime(2026, 3, 15, 14, 30, 0)
ws["B3"].number_format = "YYYY-MM-DD HH:MM:SS"

# Currency formatting
ws["A4"] = 15000.50
ws["A4"].number_format = '"$"#,##0.00'

# Percentage formatting
ws["B4"] = 0.1575
ws["B4"].number_format = "0.00%"

# Accounting format (negative in parentheses)
ws["C4"] = -5000.00
ws["C4"].number_format = '_("$"* #,##0.00_);_("$"* (#,##0.00);_("$"* "-"??_);_(@_)'

# Custom number formats
ws["A5"] = 1234567
ws["A5"].number_format = "#,##0"  # Thousands separator

ws["B5"] = 0.5
ws["B5"].number_format = "0.0%"

# Boolean values
ws["A6"] = True  # Displays as TRUE in Excel

wb.save("data_types.xlsx")
```

**Cell Styling (Fonts, Fills, Borders, Alignment)**:

```python
from openpyxl import Workbook
from openpyxl.styles import (
    Font, PatternFill, Border, Side, Alignment, NamedStyle
)

wb = Workbook()
ws = wb.active

# Font styling
header_font = Font(
    name="Calibri",
    size=14,
    bold=True,
    italic=False,
    color="FFFFFF",  # White text
)

body_font = Font(name="Calibri", size=11, color="333333")

# Fill (background color)
header_fill = PatternFill(
    start_color="2F5496",  # Dark blue
    end_color="2F5496",
    fill_type="solid",
)

alternating_fill = PatternFill(
    start_color="D6E4F0",  # Light blue
    end_color="D6E4F0",
    fill_type="solid",
)

# Borders
thin_border = Border(
    left=Side(style="thin", color="999999"),
    right=Side(style="thin", color="999999"),
    top=Side(style="thin", color="999999"),
    bottom=Side(style="thin", color="999999"),
)

header_border = Border(
    bottom=Side(style="medium", color="2F5496"),
)

# Alignment
center_align = Alignment(
    horizontal="center",
    vertical="center",
    wrap_text=True,
)

# Apply styles to header row
headers = ["Product", "Revenue", "Cost", "Profit", "Margin"]
for col_idx, header in enumerate(headers, 1):
    cell = ws.cell(row=1, column=col_idx, value=header)
    cell.font = header_font
    cell.fill = header_fill
    cell.border = thin_border
    cell.alignment = center_align

# Apply alternating row colors
data_rows = [
    ["Widget A", 15000, 8000, 7000, 0.4667],
    ["Widget B", 22000, 12000, 10000, 0.4545],
    ["Widget C", 8500, 5000, 3500, 0.4118],
    ["Widget D", 31200, 18000, 13200, 0.4231],
]

for row_idx, row_data in enumerate(data_rows, 2):
    for col_idx, value in enumerate(row_data, 1):
        cell = ws.cell(row=row_idx, column=col_idx, value=value)
        cell.font = body_font
        cell.border = thin_border
        if row_idx % 2 == 0:
            cell.fill = alternating_fill
        # Format currency columns
        if col_idx in (2, 3, 4):
            cell.number_format = '"$"#,##0'
        # Format percentage column
        if col_idx == 5:
            cell.number_format = "0.0%"

wb.save("styled_report.xlsx")
```

**Named Styles for Reuse**:

```python
from openpyxl import Workbook
from openpyxl.styles import NamedStyle, Font, PatternFill, Border, Side, Alignment

wb = Workbook()

# Define reusable named styles
header_style = NamedStyle(name="header_style")
header_style.font = Font(bold=True, size=12, color="FFFFFF")
header_style.fill = PatternFill(start_color="2F5496", fill_type="solid")
header_style.alignment = Alignment(horizontal="center", vertical="center")
header_style.border = Border(
    bottom=Side(style="medium", color="1F3864")
)
wb.add_named_style(header_style)

currency_style = NamedStyle(name="currency_style")
currency_style.number_format = '"$"#,##0.00'
currency_style.font = Font(size=11)
currency_style.alignment = Alignment(horizontal="right")
wb.add_named_style(currency_style)

# Apply named styles by name
ws = wb.active
ws["A1"].style = "header_style"
ws["A1"].value = "Amount"
ws["A2"].style = "currency_style"
ws["A2"].value = 15000.50

wb.save("named_styles.xlsx")
```

**Merged Cells**:

```python
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment

wb = Workbook()
ws = wb.active

# Merge cells for a title row
ws.merge_cells("A1:E1")
ws["A1"] = "Quarterly Sales Report - Q1 2026"
ws["A1"].font = Font(size=16, bold=True)
ws["A1"].alignment = Alignment(horizontal="center")

# Merge cells for section headers
ws.merge_cells("A3:B3")
ws["A3"] = "Product Details"
ws["A3"].font = Font(bold=True)

ws.merge_cells("C3:E3")
ws["C3"] = "Financial Metrics"
ws["C3"].font = Font(bold=True)

# Unmerge if needed
# ws.unmerge_cells("A1:E1")

wb.save("merged_cells.xlsx")
```

**Loading and Modifying Existing Files**:

```python
from openpyxl import load_workbook

# Load an existing workbook
wb = load_workbook("template.xlsx")
ws = wb.active

# Read cell values
value = ws["A1"].value
print(f"Cell A1: {value}")

# Iterate over rows
for row in ws.iter_rows(min_row=2, max_col=5, values_only=True):
    product, revenue, cost, profit, margin = row
    print(f"{product}: ${revenue}")

# Modify cells
ws["F1"] = "Status"
for row_idx in range(2, ws.max_row + 1):
    profit = ws.cell(row=row_idx, column=4).value
    if profit and profit > 10000:
        ws.cell(row=row_idx, column=6, value="High")
    else:
        ws.cell(row=row_idx, column=6, value="Standard")

# Load with data_only to get calculated values instead of formulas
wb_values = load_workbook("report_with_formulas.xlsx", data_only=True)
# Note: data_only returns the cached value from the last Excel save,
# not a recalculated value. If the file was never opened in Excel,
# formula cells will return None.

wb.save("template_updated.xlsx")
```

### Step 3: Python xlsxwriter

xlsxwriter is a write-only library optimized for performance and feature richness. It produces XLSX files without needing to read existing ones, making it ideal for report generation pipelines.

**Basic Setup and Formatting**:

```python
import xlsxwriter

wb = xlsxwriter.Workbook("report.xlsx")
ws = wb.add_worksheet("Sales")

# Define reusable formats
header_fmt = wb.add_format({
    "bold": True,
    "font_size": 12,
    "font_color": "#FFFFFF",
    "bg_color": "#2F5496",
    "border": 1,
    "align": "center",
    "valign": "vcenter",
    "text_wrap": True,
})

currency_fmt = wb.add_format({
    "num_format": "$#,##0.00",
    "font_size": 11,
    "border": 1,
})

percent_fmt = wb.add_format({
    "num_format": "0.0%",
    "font_size": 11,
    "border": 1,
})

date_fmt = wb.add_format({
    "num_format": "yyyy-mm-dd",
    "border": 1,
})

# Write header row
headers = ["Product", "Revenue", "Cost", "Profit", "Margin", "Date"]
for col, header in enumerate(headers):
    ws.write(0, col, header, header_fmt)

# Write data rows
data = [
    ["Widget A", 15000.50, 8000, 7000.50, 0.4667, "2026-03-15"],
    ["Widget B", 22000.75, 12000, 10000.75, 0.4545, "2026-03-16"],
    ["Widget C", 8500.00, 5000, 3500.00, 0.4118, "2026-03-17"],
]

for row_idx, row_data in enumerate(data, 1):
    ws.write(row_idx, 0, row_data[0])             # String
    ws.write(row_idx, 1, row_data[1], currency_fmt) # Currency
    ws.write(row_idx, 2, row_data[2], currency_fmt) # Currency
    ws.write(row_idx, 3, row_data[3], currency_fmt) # Currency
    ws.write(row_idx, 4, row_data[4], percent_fmt)  # Percentage
    ws.write(row_idx, 5, row_data[5], date_fmt)     # Date

# Set column widths
ws.set_column("A:A", 20)
ws.set_column("B:D", 15)
ws.set_column("E:E", 10)
ws.set_column("F:F", 14)

# Set row heights
ws.set_row(0, 25)

wb.close()  # xlsxwriter uses close(), not save()
```

**Conditional Formatting**:

```python
import xlsxwriter

wb = xlsxwriter.Workbook("conditional.xlsx")
ws = wb.add_worksheet()

# Write sample data
headers = ["Product", "Revenue", "Target", "Variance"]
for col, h in enumerate(headers):
    ws.write(0, col, h)

data = [
    ["Widget A", 15000, 12000, 3000],
    ["Widget B", 8000, 12000, -4000],
    ["Widget C", 22000, 12000, 10000],
    ["Widget D", 11000, 12000, -1000],
]
for r, row in enumerate(data, 1):
    for c, val in enumerate(row):
        ws.write(r, c, val)

# Color scale: green (high) to red (low) on revenue column
ws.conditional_format("B2:B5", {
    "type": "3_color_scale",
    "min_color": "#F8696B",   # Red
    "mid_color": "#FFEB84",   # Yellow
    "max_color": "#63BE7B",   # Green
})

# Data bar on revenue column
ws.conditional_format("B2:B5", {
    "type": "data_bar",
    "bar_color": "#2F5496",
})

# Icon set on variance column
ws.conditional_format("D2:D5", {
    "type": "icon_set",
    "icon_style": "3_traffic_lights",
    "icons": [
        {"criteria": ">=", "type": "number", "value": 5000},
        {"criteria": ">=", "type": "number", "value": 0},
        {"criteria": "<",  "type": "number", "value": 0},
    ],
})

# Cell-based conditional format: highlight negative variance in red
red_fmt = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
ws.conditional_format("D2:D5", {
    "type": "cell",
    "criteria": "<",
    "value": 0,
    "format": red_fmt,
})

# Highlight cells above average
green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
ws.conditional_format("B2:B5", {
    "type": "average",
    "criteria": "above",
    "format": green_fmt,
})

# Formula-based: highlight entire row where variance is negative
row_red_fmt = wb.add_format({"bg_color": "#FFC7CE"})
ws.conditional_format("A2:D5", {
    "type": "formula",
    "criteria": "=$D2<0",
    "format": row_red_fmt,
})

wb.close()
```

**Data Validation**:

```python
import xlsxwriter

wb = xlsxwriter.Workbook("validation.xlsx")
ws = wb.add_worksheet()

# Dropdown list validation
ws.write("A1", "Status")
ws.data_validation("A2:A100", {
    "validate": "list",
    "source": ["Active", "Inactive", "Pending", "Archived"],
    "input_title": "Select Status",
    "input_message": "Choose a status from the dropdown.",
    "error_title": "Invalid Status",
    "error_message": "Please select a valid status from the list.",
})

# Numeric range validation
ws.write("B1", "Quantity")
ws.data_validation("B2:B100", {
    "validate": "integer",
    "criteria": "between",
    "minimum": 1,
    "maximum": 10000,
    "input_title": "Enter Quantity",
    "input_message": "Quantity must be between 1 and 10,000.",
    "error_type": "stop",
})

# Date range validation
ws.write("C1", "Due Date")
ws.data_validation("C2:C100", {
    "validate": "date",
    "criteria": ">=",
    "value": "2026-01-01",
    "input_title": "Enter Date",
    "input_message": "Date must be on or after 2026-01-01.",
})

# Custom formula validation (value must be unique in column)
ws.write("D1", "Code")
ws.data_validation("D2:D100", {
    "validate": "custom",
    "value": "=COUNTIF($D:$D,D2)<=1",
    "input_title": "Unique Code",
    "input_message": "Enter a unique product code.",
    "error_title": "Duplicate",
    "error_message": "This code already exists in the column.",
})

wb.close()
```

**Sparklines**:

```python
import xlsxwriter

wb = xlsxwriter.Workbook("sparklines.xlsx")
ws = wb.add_worksheet()

# Monthly revenue data
ws.write_row("A1", ["Product", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Trend"])
ws.write_row("A2", ["Widget A", 100, 120, 115, 140, 155, 170])
ws.write_row("A3", ["Widget B", 200, 190, 210, 195, 220, 240])
ws.write_row("A4", ["Widget C", 50, 60, 55, 70, 65, 80])

# Add sparklines in the Trend column
ws.add_sparkline("H2", {
    "range": "B2:G2",
    "type": "line",
    "markers": True,
    "high_point": True,
    "low_point": True,
})

ws.add_sparkline("H3", {
    "range": "B3:G3",
    "type": "column",
    "high_point": True,
})

ws.add_sparkline("H4", {
    "range": "B4:G4",
    "type": "win_loss",
})

ws.set_column("H:H", 20)
wb.close()
```

**Memory-Optimized Writing for Large Datasets**:

```python
import xlsxwriter

# Enable constant_memory mode for large datasets
# Rows are flushed to disk and cannot be revisited
wb = xlsxwriter.Workbook("large_dataset.xlsx", {"constant_memory": True})
ws = wb.add_worksheet()

header_fmt = wb.add_format({"bold": True, "bg_color": "#2F5496", "font_color": "#FFFFFF"})

headers = ["ID", "Name", "Value", "Category", "Timestamp"]
for col, h in enumerate(headers):
    ws.write(0, col, h, header_fmt)

# Write 1 million rows efficiently
for row in range(1, 1_000_001):
    ws.write_number(row, 0, row)
    ws.write_string(row, 1, f"Item {row}")
    ws.write_number(row, 2, row * 1.5)
    ws.write_string(row, 3, f"Cat-{row % 10}")
    ws.write_string(row, 4, "2026-03-15T10:30:00")

wb.close()
```

### Step 4: JavaScript ExcelJS

ExcelJS is the standard library for Excel generation in Node.js and TypeScript projects. It supports reading, writing, and streaming with comprehensive formatting options.

**Basic Workbook Creation**:

```javascript
const ExcelJS = require("exceljs");

async function createReport() {
  const wb = new ExcelJS.Workbook();
  wb.creator = "Report Generator";
  wb.created = new Date();

  const ws = wb.addWorksheet("Sales Report", {
    properties: { tabColor: { argb: "2F5496" } },
    pageSetup: {
      paperSize: 9, // A4
      orientation: "landscape",
      fitToPage: true,
    },
  });

  // Define columns with headers, keys, and widths
  ws.columns = [
    { header: "Product", key: "product", width: 25 },
    { header: "Revenue", key: "revenue", width: 15, style: { numFmt: "$#,##0.00" } },
    { header: "Cost", key: "cost", width: 15, style: { numFmt: "$#,##0.00" } },
    { header: "Profit", key: "profit", width: 15, style: { numFmt: "$#,##0.00" } },
    { header: "Margin", key: "margin", width: 12, style: { numFmt: "0.0%" } },
  ];

  // Add rows using key-value objects
  ws.addRow({ product: "Widget A", revenue: 15000.50, cost: 8000, profit: 7000.50, margin: 0.4667 });
  ws.addRow({ product: "Widget B", revenue: 22000.75, cost: 12000, profit: 10000.75, margin: 0.4545 });
  ws.addRow({ product: "Widget C", revenue: 8500.00, cost: 5000, profit: 3500.00, margin: 0.4118 });

  // Style the header row
  const headerRow = ws.getRow(1);
  headerRow.eachCell((cell) => {
    cell.font = { bold: true, size: 12, color: { argb: "FFFFFFFF" } };
    cell.fill = {
      type: "pattern",
      pattern: "solid",
      fgColor: { argb: "FF2F5496" },
    };
    cell.alignment = { horizontal: "center", vertical: "middle" };
    cell.border = {
      bottom: { style: "medium", color: { argb: "FF1F3864" } },
    };
  });
  headerRow.height = 25;

  await wb.xlsx.writeFile("sales_report.xlsx");
}

createReport();
```

**Cell Styling and Rich Text**:

```javascript
const ExcelJS = require("exceljs");

async function styledWorkbook() {
  const wb = new ExcelJS.Workbook();
  const ws = wb.addWorksheet("Styled");

  // Rich text in a single cell
  ws.getCell("A1").value = {
    richText: [
      { font: { bold: true, size: 14, color: { argb: "FF2F5496" } }, text: "Q1 2026 " },
      { font: { italic: true, size: 14, color: { argb: "FF666666" } }, text: "Sales Report" },
    ],
  };

  // Conditional fill based on value
  const data = [
    { name: "Widget A", value: 15000 },
    { name: "Widget B", value: -3000 },
    { name: "Widget C", value: 22000 },
  ];

  data.forEach((item, idx) => {
    const row = idx + 3;
    ws.getCell(`A${row}`).value = item.name;
    const valueCell = ws.getCell(`B${row}`);
    valueCell.value = item.value;
    valueCell.numFmt = "$#,##0.00";

    if (item.value < 0) {
      valueCell.font = { color: { argb: "FF9C0006" } };
      valueCell.fill = {
        type: "pattern",
        pattern: "solid",
        fgColor: { argb: "FFFFC7CE" },
      };
    } else {
      valueCell.font = { color: { argb: "FF006100" } };
      valueCell.fill = {
        type: "pattern",
        pattern: "solid",
        fgColor: { argb: "FFC6EFCE" },
      };
    }
  });

  // Data validation dropdown
  ws.getCell("C3").dataValidation = {
    type: "list",
    allowBlank: true,
    formulae: ['"Active,Inactive,Pending"'],
    showErrorMessage: true,
    errorTitle: "Invalid",
    error: "Select a valid status.",
  };

  await wb.xlsx.writeFile("styled.xlsx");
}

styledWorkbook();
```

**Streaming Writes for Large Datasets**:

```javascript
const ExcelJS = require("exceljs");
const fs = require("fs");

async function streamLargeDataset() {
  const options = {
    filename: "large_dataset.xlsx",
    useStyles: true,
    useSharedStrings: false, // Disable for better performance
  };

  const wb = new ExcelJS.stream.xlsx.WorkbookWriter(options);
  const ws = wb.addWorksheet("Data");

  // Define columns
  ws.columns = [
    { header: "ID", key: "id", width: 10 },
    { header: "Name", key: "name", width: 25 },
    { header: "Value", key: "value", width: 15 },
    { header: "Category", key: "category", width: 15 },
  ];

  // Stream 500,000 rows without holding them all in memory
  for (let i = 1; i <= 500_000; i++) {
    ws.addRow({
      id: i,
      name: `Item ${i}`,
      value: Math.round(Math.random() * 10000) / 100,
      category: `Cat-${i % 10}`,
    }).commit(); // Flush row to disk immediately
  }

  ws.commit();
  await wb.commit();
}

streamLargeDataset();
```

**Adding Images**:

```javascript
const ExcelJS = require("exceljs");

async function addImages() {
  const wb = new ExcelJS.Workbook();
  const ws = wb.addWorksheet("With Image");

  // Add image from file
  const logoId = wb.addImage({
    filename: "logo.png",
    extension: "png",
  });

  // Position image over a cell range
  ws.addImage(logoId, {
    tl: { col: 0, row: 0 },       // Top-left anchor
    br: { col: 3, row: 4 },       // Bottom-right anchor
    editAs: "oneCell",             // Resize behavior
  });

  // Add image from buffer
  const imageBuffer = fs.readFileSync("chart_screenshot.png");
  const chartId = wb.addImage({
    buffer: imageBuffer,
    extension: "png",
  });

  ws.addImage(chartId, "E1:K15");  // Shorthand range notation

  await wb.xlsx.writeFile("with_images.xlsx");
}
```

### Step 5: Formulas and Calculations

All major XLSX libraries support embedding Excel formulas in cells. Formulas are stored as text and evaluated by Excel when the file is opened.

**Cell Formulas in openpyxl**:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Simple formulas
ws["A1"] = "Revenue"
ws["A2"] = 15000
ws["A3"] = 22000
ws["A4"] = 8500
ws["A5"] = "=SUM(A2:A4)"            # Sum
ws["A6"] = "=AVERAGE(A2:A4)"        # Average
ws["A7"] = "=MAX(A2:A4)"            # Maximum
ws["A8"] = '=IF(A5>40000,"High","Low")'  # Conditional

# Cross-cell references
ws["B2"] = 8000   # Cost for row 2
ws["C2"] = "=A2-B2"  # Profit = Revenue - Cost
ws["D2"] = "=C2/A2"  # Margin = Profit / Revenue

# Fill formulas down a range
for row in range(2, 5):
    ws.cell(row=row, column=3).value = f"=A{row}-B{row}"
    ws.cell(row=row, column=4).value = f"=C{row}/A{row}"

# VLOOKUP and INDEX/MATCH
ws2 = wb.create_sheet("Lookup")
ws2["A1"] = "Product"
ws2["B1"] = "Category"
ws2["A2"] = "Widget A"
ws2["B2"] = "Electronics"
ws2["A3"] = "Widget B"
ws2["B3"] = "Hardware"

# Reference lookup from main sheet
ws["E2"] = "=VLOOKUP(A2,Lookup!A:B,2,FALSE)"
# Modern alternative: XLOOKUP (Excel 365+)
ws["F2"] = "=XLOOKUP(A2,Lookup!A:A,Lookup!B:B)"

wb.save("formulas.xlsx")
```

**Named Ranges**:

```python
from openpyxl import Workbook
from openpyxl.workbook.defined_name import DefinedName

wb = Workbook()
ws = wb.active
ws.title = "Data"

# Write data
ws["A1"] = "Revenue"
for row, val in enumerate([15000, 22000, 8500, 31200], 2):
    ws.cell(row=row, column=1, value=val)

# Create a named range
revenue_range = DefinedName("RevenueData", attr_text="Data!$A$2:$A$5")
wb.defined_names.add(revenue_range)

# Use the named range in formulas
ws["C1"] = "Total Revenue"
ws["C2"] = "=SUM(RevenueData)"
ws["C3"] = "Average Revenue"
ws["C4"] = "=AVERAGE(RevenueData)"

# Named range scoped to a specific sheet
local_range = DefinedName(
    "LocalTotal",
    attr_text="Data!$C$2",
    localSheetId=0,  # Sheet index
)
wb.defined_names.add(local_range)

# Print area as a named range
ws.print_area = "A1:C10"

# Print titles (repeat rows at top of each printed page)
ws.print_title_rows = "1:1"

wb.save("named_ranges.xlsx")
```

**Array Formulas and Dynamic Arrays**:

```python
from openpyxl import Workbook
from openpyxl.worksheet.formula import ArrayFormula

wb = Workbook()
ws = wb.active

# Data
ws["A1"] = "Price"
ws["B1"] = "Quantity"
ws["C1"] = "Total"
for row, (price, qty) in enumerate([(10, 5), (20, 3), (15, 8)], 2):
    ws.cell(row=row, column=1, value=price)
    ws.cell(row=row, column=2, value=qty)

# Legacy CSE array formula (Ctrl+Shift+Enter)
# Computes sum of element-wise multiplication
ws["D1"] = "Sum of Products"
ws["D2"] = ArrayFormula("D2", "=SUM(A2:A4*B2:B4)")

# Individual cell formulas (non-array, but referencing ranges)
ws["C2"] = "=A2*B2"
ws["C3"] = "=A3*B3"
ws["C4"] = "=A4*B4"

# Dynamic array formulas (Excel 365+, spill into adjacent cells)
# These work when opened in Excel; the library writes the formula to the anchor cell
ws2 = wb.create_sheet("Dynamic")
ws2["A1"] = "=SORT(Data!A2:A4)"            # Spills sorted values
ws2["C1"] = "=UNIQUE(Data!A2:A10)"          # Spills unique values
ws2["E1"] = "=FILTER(Data!A2:C4,Data!C2:C4>50)"  # Filtered results

wb.save("array_formulas.xlsx")
```

**Cross-Sheet References**:

```python
from openpyxl import Workbook

wb = Workbook()

# Create multiple sheets with data
regions = {
    "North": [10000, 12000, 15000],
    "South": [8000, 9000, 11000],
    "East": [14000, 13000, 16000],
    "West": [7000, 8500, 9500],
}

for region, values in regions.items():
    ws = wb.create_sheet(region)
    ws["A1"] = "Q1"
    ws["B1"] = "Q2"
    ws["C1"] = "Q3"
    for col, val in enumerate(values, 1):
        ws.cell(row=2, column=col, value=val)
    ws["D1"] = "Total"
    ws["D2"] = "=SUM(A2:C2)"

# Summary sheet with cross-sheet references
summary = wb.create_sheet("Summary", 0)
summary["A1"] = "Region"
summary["B1"] = "Q1"
summary["C1"] = "Q2"
summary["D1"] = "Q3"
summary["E1"] = "Total"

for row, region in enumerate(regions.keys(), 2):
    summary.cell(row=row, column=1, value=region)
    summary.cell(row=row, column=2).value = f"='{region}'!A2"
    summary.cell(row=row, column=3).value = f"='{region}'!B2"
    summary.cell(row=row, column=4).value = f"='{region}'!C2"
    summary.cell(row=row, column=5).value = f"='{region}'!D2"

# Grand total row
total_row = len(regions) + 2
summary.cell(row=total_row, column=1, value="Grand Total")
for col in range(2, 6):
    col_letter = chr(64 + col)  # B, C, D, E
    summary.cell(row=total_row, column=col).value = (
        f"=SUM({col_letter}2:{col_letter}{total_row - 1})"
    )

# Remove the default sheet
del wb["Sheet"]

wb.save("cross_sheet.xlsx")
```

**Formula Auditing and Validation**:

```python
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
import re

def audit_formulas(filepath: str) -> list[dict]:
    """Scan a workbook and report all formula cells with their references."""
    wb = load_workbook(filepath)
    findings = []

    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    formula = cell.value
                    # Extract cell references from the formula
                    refs = re.findall(
                        r"(?:'[^']+'\!)?\$?[A-Z]{1,3}\$?\d+(?::\$?[A-Z]{1,3}\$?\d+)?",
                        formula,
                    )
                    findings.append({
                        "sheet": ws.title,
                        "cell": cell.coordinate,
                        "formula": formula,
                        "references": refs,
                    })

    return findings

# Usage
# results = audit_formulas("complex_report.xlsx")
# for f in results:
#     print(f"{f['sheet']}!{f['cell']}: {f['formula']} -> refs: {f['references']}")
```

### Step 6: Charts and Visualization

Both openpyxl and xlsxwriter support creating Excel-native charts. Charts are embedded in the worksheet and update dynamically when the underlying data changes.

**openpyxl Charts**:

```python
from openpyxl import Workbook
from openpyxl.chart import (
    BarChart, LineChart, PieChart, ScatterChart, Reference
)
from openpyxl.chart.series import SeriesLabel
from openpyxl.chart.label import DataLabelList
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.title = "Chart Data"

# Write data
headers = ["Month", "Revenue", "Cost", "Profit"]
data = [
    ["Jan", 15000, 8000, 7000],
    ["Feb", 18000, 9500, 8500],
    ["Mar", 22000, 11000, 11000],
    ["Apr", 19000, 10000, 9000],
    ["May", 25000, 12500, 12500],
    ["Jun", 28000, 14000, 14000],
]

ws.append(headers)
for row in data:
    ws.append(row)

# --- Bar Chart ---
bar_chart = BarChart()
bar_chart.type = "col"  # "col" for vertical, "bar" for horizontal
bar_chart.title = "Monthly Revenue and Cost"
bar_chart.x_axis.title = "Month"
bar_chart.y_axis.title = "Amount ($)"
bar_chart.style = 10
bar_chart.width = 20
bar_chart.height = 12

# Data references (min_col/max_col are 1-based)
categories = Reference(ws, min_col=1, min_row=2, max_row=7)  # Month labels
revenue_data = Reference(ws, min_col=2, min_row=1, max_row=7)  # Include header
cost_data = Reference(ws, min_col=3, min_row=1, max_row=7)

bar_chart.add_data(revenue_data, titles_from_data=True)
bar_chart.add_data(cost_data, titles_from_data=True)
bar_chart.set_categories(categories)

# Customize series colors
bar_chart.series[0].graphicalProperties.solidFill = "2F5496"  # Blue
bar_chart.series[1].graphicalProperties.solidFill = "C00000"  # Red

ws.add_chart(bar_chart, "F2")

# --- Line Chart ---
line_chart = LineChart()
line_chart.title = "Profit Trend"
line_chart.x_axis.title = "Month"
line_chart.y_axis.title = "Profit ($)"
line_chart.style = 10
line_chart.width = 20
line_chart.height = 12

profit_data = Reference(ws, min_col=4, min_row=1, max_row=7)
line_chart.add_data(profit_data, titles_from_data=True)
line_chart.set_categories(categories)

# Add data labels
line_chart.series[0].graphicalProperties.line.width = 25000  # EMUs
line_chart.series[0].dLbls = DataLabelList()
line_chart.series[0].dLbls.showVal = True

ws.add_chart(line_chart, "F18")

# --- Pie Chart ---
pie_ws = wb.create_sheet("Pie Chart")
pie_ws["A1"] = "Category"
pie_ws["B1"] = "Amount"
pie_data_rows = [
    ["Electronics", 45000],
    ["Hardware", 30000],
    ["Software", 25000],
    ["Services", 15000],
]
for row in pie_data_rows:
    pie_ws.append(row)

pie_chart = PieChart()
pie_chart.title = "Revenue by Category"
pie_chart.width = 18
pie_chart.height = 14

pie_labels = Reference(pie_ws, min_col=1, min_row=2, max_row=5)
pie_values = Reference(pie_ws, min_col=2, min_row=1, max_row=5)
pie_chart.add_data(pie_values, titles_from_data=True)
pie_chart.set_categories(pie_labels)

# Show percentage labels
pie_chart.series[0].dLbls = DataLabelList()
pie_chart.series[0].dLbls.showPercent = True
pie_chart.series[0].dLbls.showCatName = True
pie_chart.series[0].dLbls.showVal = False

pie_ws.add_chart(pie_chart, "D2")

# --- Scatter Chart ---
scatter_ws = wb.create_sheet("Scatter")
scatter_ws.append(["Ad Spend", "Revenue"])
scatter_data_rows = [
    [1000, 12000], [2000, 18000], [3000, 22000],
    [4000, 28000], [5000, 32000], [6000, 35000],
]
for row in scatter_data_rows:
    scatter_ws.append(row)

scatter_chart = ScatterChart()
scatter_chart.title = "Ad Spend vs Revenue"
scatter_chart.x_axis.title = "Ad Spend ($)"
scatter_chart.y_axis.title = "Revenue ($)"
scatter_chart.width = 18
scatter_chart.height = 14

x_values = Reference(scatter_ws, min_col=1, min_row=2, max_row=7)
y_values = Reference(scatter_ws, min_col=2, min_row=2, max_row=7)
series = scatter_chart.series
from openpyxl.chart import Series
s = Series(y_values, x_values, title="Revenue")
scatter_chart.series.append(s)

# Add trendline
from openpyxl.chart.trendline import Trendline
s.trendline = Trendline(trendlineType="linear", dispRSqr=True, dispEq=True)

scatter_ws.add_chart(scatter_chart, "D2")

wb.save("charts.xlsx")
```

**xlsxwriter Charts**:

```python
import xlsxwriter

wb = xlsxwriter.Workbook("xlsxwriter_charts.xlsx")
ws = wb.add_worksheet("Data")

# Write data
headers = ["Month", "Revenue", "Cost", "Profit"]
data = [
    ["Jan", 15000, 8000, 7000],
    ["Feb", 18000, 9500, 8500],
    ["Mar", 22000, 11000, 11000],
    ["Apr", 19000, 10000, 9000],
    ["May", 25000, 12500, 12500],
    ["Jun", 28000, 14000, 14000],
]

bold = wb.add_format({"bold": True})
for col, h in enumerate(headers):
    ws.write(0, col, h, bold)
for r, row in enumerate(data, 1):
    for c, val in enumerate(row):
        ws.write(r, c, val)

# --- Clustered Bar Chart ---
bar_chart = wb.add_chart({"type": "column"})

bar_chart.add_series({
    "name": "=Data!$B$1",
    "categories": "=Data!$A$2:$A$7",
    "values": "=Data!$B$2:$B$7",
    "fill": {"color": "#2F5496"},
    "gap": 150,
})
bar_chart.add_series({
    "name": "=Data!$C$1",
    "values": "=Data!$C$2:$C$7",
    "fill": {"color": "#C00000"},
})

bar_chart.set_title({"name": "Monthly Revenue and Cost"})
bar_chart.set_x_axis({"name": "Month"})
bar_chart.set_y_axis({"name": "Amount ($)", "num_format": "$#,##0"})
bar_chart.set_size({"width": 720, "height": 400})
bar_chart.set_legend({"position": "bottom"})

ws.insert_chart("F2", bar_chart)

# --- Combo Chart (Bar + Line) ---
combo_chart = wb.add_chart({"type": "column"})

combo_chart.add_series({
    "name": "Revenue",
    "categories": "=Data!$A$2:$A$7",
    "values": "=Data!$B$2:$B$7",
    "fill": {"color": "#2F5496"},
})

# Add a line series on a secondary axis
line_series = wb.add_chart({"type": "line"})
line_series.add_series({
    "name": "Profit",
    "categories": "=Data!$A$2:$A$7",
    "values": "=Data!$D$2:$D$7",
    "line": {"color": "#00B050", "width": 2.5},
    "marker": {"type": "circle", "size": 6},
    "y2_axis": True,
})

combo_chart.combine(line_series)
combo_chart.set_title({"name": "Revenue (Bars) vs Profit (Line)"})
combo_chart.set_y_axis({"name": "Revenue ($)"})
combo_chart.set_y2_axis({"name": "Profit ($)"})
combo_chart.set_size({"width": 720, "height": 400})

ws.insert_chart("F22", combo_chart)

# --- Stacked Area Chart ---
area_chart = wb.add_chart({"type": "area", "subtype": "stacked"})
area_chart.add_series({
    "name": "Cost",
    "categories": "=Data!$A$2:$A$7",
    "values": "=Data!$C$2:$C$7",
    "fill": {"color": "#FFC000"},
})
area_chart.add_series({
    "name": "Profit",
    "values": "=Data!$D$2:$D$7",
    "fill": {"color": "#00B050"},
})
area_chart.set_title({"name": "Revenue Composition"})
area_chart.set_size({"width": 720, "height": 400})

ws.insert_chart("F42", area_chart)

wb.close()
```

**Data-Driven Chart Generation Helper**:

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, LineChart, PieChart, Reference


def add_chart_from_data(
    wb: Workbook,
    sheet_name: str,
    chart_type: str,
    title: str,
    data_range: tuple[int, int, int, int],
    category_col: int,
    anchor_cell: str = "A1",
    width: int = 18,
    height: int = 12,
) -> None:
    """Add a chart to a worksheet from a data range.

    Args:
        wb: The workbook containing the data.
        sheet_name: Name of the sheet with data.
        chart_type: One of "bar", "line", "pie".
        title: Chart title.
        data_range: Tuple of (min_col, min_row, max_col, max_row) for data series.
        category_col: Column number for category labels.
        anchor_cell: Cell where the chart top-left corner is placed.
        width: Chart width in cm.
        height: Chart height in cm.
    """
    ws = wb[sheet_name]
    min_col, min_row, max_col, max_row = data_range

    chart_classes = {
        "bar": BarChart,
        "line": LineChart,
        "pie": PieChart,
    }

    chart_cls = chart_classes.get(chart_type)
    if chart_cls is None:
        raise ValueError(f"Unsupported chart type: {chart_type}. Use: {list(chart_classes.keys())}")

    chart = chart_cls()
    chart.title = title
    chart.width = width
    chart.height = height

    data_ref = Reference(ws, min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row)
    categories = Reference(ws, min_col=category_col, min_row=min_row + 1, max_row=max_row)

    chart.add_data(data_ref, titles_from_data=True)
    chart.set_categories(categories)

    target_ws = ws
    target_ws.add_chart(chart, anchor_cell)


# Usage:
# wb = Workbook()
# ws = wb.active
# ws.append(["Month", "Revenue", "Cost"])
# for row in [["Jan", 15000, 8000], ["Feb", 18000, 9500]]:
#     ws.append(row)
# add_chart_from_data(wb, ws.title, "bar", "Revenue vs Cost", (2, 1, 3, 3), 1, "E2")
# wb.save("dynamic_chart.xlsx")
```

### Step 7: Advanced Features

Production Excel files often require features beyond basic data and formatting: autofilters for interactive exploration, freeze panes for navigation, print configuration for physical output, protection for controlled access, and VBA macro preservation for existing automation.

**Autofilters**:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

headers = ["Product", "Category", "Revenue", "Region", "Status"]
ws.append(headers)

data = [
    ["Widget A", "Electronics", 15000, "North", "Active"],
    ["Widget B", "Hardware", 22000, "South", "Active"],
    ["Widget C", "Software", 8500, "East", "Inactive"],
    ["Widget D", "Electronics", 31200, "West", "Active"],
    ["Widget E", "Hardware", 12000, "North", "Pending"],
]
for row in data:
    ws.append(row)

# Enable autofilter on the data range
ws.auto_filter.ref = f"A1:E{len(data) + 1}"

# Pre-apply a filter (visible when file is opened)
# Note: openpyxl sets the filter definition but Excel applies it on open
ws.auto_filter.add_filter_column(4, ["Active"])  # Column E (0-indexed: 4)
ws.auto_filter.add_sort_condition("C2:C6")       # Sort by Revenue

wb.save("autofilter.xlsx")
```

**Freeze Panes**:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Freeze the top row (header) and first column
ws.freeze_panes = "B2"
# "B2" means: freeze everything above row 2 and left of column B
# Result: row 1 and column A stay visible when scrolling

# Other common freeze configurations:
# ws.freeze_panes = "A2"   # Freeze top row only
# ws.freeze_panes = "B1"   # Freeze first column only
# ws.freeze_panes = "C3"   # Freeze rows 1-2 and columns A-B
# ws.freeze_panes = None    # Remove freeze panes

# Write headers and data
ws.append(["ID", "Name", "Revenue", "Cost", "Profit"])
for i in range(1, 101):
    ws.append([i, f"Product {i}", i * 100, i * 60, i * 40])

wb.save("freeze_panes.xlsx")
```

**Print Setup and Page Layout**:

```python
from openpyxl import Workbook
from openpyxl.worksheet.page import PageMargins

wb = Workbook()
ws = wb.active

# Page setup
ws.page_setup.paperSize = ws.PAPERSIZE_A4
ws.page_setup.orientation = ws.ORIENTATION_LANDSCAPE
ws.page_setup.fitToWidth = 1   # Fit all columns on one page width
ws.page_setup.fitToHeight = 0  # Allow multiple pages vertically
ws.page_setup.scale = 85       # 85% scale (ignored if fitTo is set)

# Margins (in inches)
ws.page_margins = PageMargins(
    left=0.5, right=0.5,
    top=0.75, bottom=0.75,
    header=0.3, footer=0.3,
)

# Header and footer
ws.oddHeader.center.text = "Quarterly Sales Report"
ws.oddHeader.right.text = "&D"   # Current date
ws.oddFooter.center.text = "Page &P of &N"  # Page X of Y
ws.oddFooter.left.text = "Confidential"

# Print titles: repeat row 1 on every printed page
ws.print_title_rows = "1:1"
# Repeat columns A-B on every page
ws.print_title_cols = "A:B"

# Print area: only print a specific range
ws.print_area = "A1:F50"

# Page breaks
ws.page_breaks.append(openpyxl.worksheet.pagebreak.Break(id=25))  # After row 25

# Gridlines and headings
ws.sheet_properties.pageSetUpPr.fitToPage = True
ws.print_options.gridLines = True       # Print gridlines
ws.print_options.horizontalCentered = True  # Center on page

wb.save("print_setup.xlsx")
```

**Password Protection**:

```python
from openpyxl import Workbook
from openpyxl.worksheet.protection import SheetProtection

wb = Workbook()
ws = wb.active

# Write data
ws.append(["Product", "Price", "Discount", "Final Price"])
ws.append(["Widget A", 100, 0.1, "=B2*(1-C2)"])
ws.append(["Widget B", 200, 0.15, "=B3*(1-C3)"])

# Protect the sheet with a password
ws.protection = SheetProtection(
    sheet=True,
    password=os.getenv("SHEET_PASSWORD", "changeme"),
    formatCells=False,      # Allow formatting
    formatColumns=False,    # Allow column width changes
    formatRows=False,       # Allow row height changes
    insertColumns=False,
    insertRows=False,
    insertHyperlinks=False,
    deleteColumns=True,     # Prevent column deletion
    deleteRows=True,        # Prevent row deletion
    selectLockedCells=False,
    sort=False,             # Allow sorting
    autoFilter=False,       # Allow filtering
    pivotTables=True,       # Prevent pivot table changes
    selectUnlockedCells=False,
)

# Unlock specific cells that users can edit (discount column)
from openpyxl.styles import Protection

unlocked = Protection(locked=False)
for row in range(2, 4):
    ws.cell(row=row, column=3).protection = unlocked

# Protect the workbook structure (prevent adding/removing sheets)
wb.security.workbookPassword = os.getenv("WORKBOOK_PASSWORD", "changeme")
wb.security.lockStructure = True

wb.save("protected.xlsx")
```

**VBA Macro Preservation**:

```python
from openpyxl import load_workbook

# Load a macro-enabled workbook (.xlsm) while preserving VBA
wb = load_workbook("template_with_macros.xlsm", keep_vba=True)
ws = wb.active

# Modify data without affecting macros
ws["A1"] = "Updated by automation"
ws["B1"] = 42

# Save as .xlsm to preserve macros
# IMPORTANT: Saving as .xlsx will strip all VBA code
wb.save("updated_with_macros.xlsm")
```

```python
# xlsxwriter: create a new .xlsm file with VBA from a binary
import xlsxwriter

wb = xlsxwriter.Workbook("new_macros.xlsm")
ws = wb.add_worksheet()

ws.write("A1", "Click the button to run the macro")

# Add VBA project from a .bin file extracted from an existing .xlsm
# Extract with: python -c "import zipfile; z=zipfile.ZipFile('source.xlsm'); z.extract('xl/vbaProject.bin')"
wb.add_vba_project("xl/vbaProject.bin")

# Optionally add a button that triggers a macro
# ws.insert_button("B3", {"macro": "MyMacro", "caption": "Run Report", "width": 128, "height": 30})

wb.close()
```

**Images and Embedded Objects**:

```python
from openpyxl import Workbook
from openpyxl.drawing.image import Image

wb = Workbook()
ws = wb.active

# Add an image
img = Image("company_logo.png")
img.width = 200   # Pixels
img.height = 80
ws.add_image(img, "A1")

# Add a second image positioned elsewhere
chart_img = Image("exported_chart.png")
chart_img.width = 600
chart_img.height = 400
ws.add_image(chart_img, "D5")

# Write data below the logo
ws["A6"] = "Report starts here"

wb.save("with_images.xlsx")
```

**Hyperlinks**:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# External URL
ws["A1"] = "Visit Website"
ws["A1"].hyperlink = "https://example.com"
ws["A1"].style = "Hyperlink"  # Built-in hyperlink style

# Link to another sheet in the same workbook
ws2 = wb.create_sheet("Details")
ws["A2"] = "Go to Details"
ws["A2"].hyperlink = "#Details!A1"
ws["A2"].style = "Hyperlink"

# Link to a file
ws["A3"] = "Open Document"
ws["A3"].hyperlink = "file:///C:/reports/summary.pdf"
ws["A3"].style = "Hyperlink"

# Email link
ws["A4"] = "Send Email"
ws["A4"].hyperlink = "mailto:reports@example.com?subject=Monthly%20Report"
ws["A4"].style = "Hyperlink"

wb.save("hyperlinks.xlsx")
```

### Step 8: Pandas Integration and Data Pipelines

Pandas provides the fastest path from data analysis to Excel output. The `DataFrame.to_excel()` method delegates to openpyxl or xlsxwriter under the hood, while `pd.read_excel()` handles ingestion.

**Basic DataFrame to Excel**:

```python
import pandas as pd

# Create sample data
df = pd.DataFrame({
    "Product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "Revenue": [15000.50, 22000.75, 8500.00, 31200.25],
    "Cost": [8000, 12000, 5000, 18000],
    "Units Sold": [150, 220, 85, 312],
    "Date": pd.to_datetime(["2026-01-15", "2026-02-01", "2026-02-15", "2026-03-01"]),
})
df["Profit"] = df["Revenue"] - df["Cost"]
df["Margin"] = df["Profit"] / df["Revenue"]

# Simple export
df.to_excel("basic_export.xlsx", index=False, sheet_name="Sales")

# Export with xlsxwriter engine for better formatting control
df.to_excel(
    "formatted_export.xlsx",
    index=False,
    sheet_name="Sales",
    engine="xlsxwriter",
    float_format="%.2f",
)
```

**Multi-Sheet Export**:

```python
import pandas as pd

# Multiple DataFrames to separate sheets
sales_df = pd.DataFrame({
    "Product": ["Widget A", "Widget B"],
    "Revenue": [15000, 22000],
})

inventory_df = pd.DataFrame({
    "Product": ["Widget A", "Widget B"],
    "Stock": [500, 300],
    "Reorder Point": [100, 50],
})

summary_df = pd.DataFrame({
    "Metric": ["Total Revenue", "Total Stock", "Products"],
    "Value": [37000, 800, 2],
})

# Use ExcelWriter for multi-sheet output
with pd.ExcelWriter("multi_sheet.xlsx", engine="openpyxl") as writer:
    summary_df.to_excel(writer, sheet_name="Summary", index=False)
    sales_df.to_excel(writer, sheet_name="Sales", index=False)
    inventory_df.to_excel(writer, sheet_name="Inventory", index=False)
```

**Styled Export with xlsxwriter Engine**:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "Revenue": [15000, 22000, 8500, 31200],
    "Cost": [8000, 12000, 5000, 18000],
    "Margin": [0.4667, 0.4545, 0.4118, 0.4231],
})

with pd.ExcelWriter("styled_pandas.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Report", index=False, startrow=1)

    wb = writer.book
    ws = writer.sheets["Report"]

    # Title row
    title_fmt = wb.add_format({
        "bold": True, "font_size": 16, "font_color": "#2F5496",
    })
    ws.write("A1", "Sales Performance Report", title_fmt)

    # Header formatting
    header_fmt = wb.add_format({
        "bold": True, "bg_color": "#2F5496", "font_color": "#FFFFFF",
        "border": 1, "align": "center",
    })
    for col_num, header in enumerate(df.columns):
        ws.write(1, col_num, header, header_fmt)

    # Column formats
    currency_fmt = wb.add_format({"num_format": "$#,##0", "border": 1})
    percent_fmt = wb.add_format({"num_format": "0.0%", "border": 1})

    ws.set_column("A:A", 20)
    ws.set_column("B:C", 15, currency_fmt)
    ws.set_column("D:D", 12, percent_fmt)

    # Conditional formatting on margin column
    red_fmt = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
    green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})

    ws.conditional_format("D3:D6", {
        "type": "cell", "criteria": "<", "value": 0.45, "format": red_fmt,
    })
    ws.conditional_format("D3:D6", {
        "type": "cell", "criteria": ">=", "value": 0.45, "format": green_fmt,
    })

    # Add a chart
    chart = wb.add_chart({"type": "column"})
    chart.add_series({
        "name": "Revenue",
        "categories": "=Report!$A$3:$A$6",
        "values": "=Report!$B$3:$B$6",
        "fill": {"color": "#2F5496"},
    })
    chart.set_title({"name": "Revenue by Product"})
    chart.set_size({"width": 500, "height": 300})
    ws.insert_chart("F3", chart)
```

**Pandas Styler for Conditional Formatting**:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Widget A", "Widget B", "Widget C", "Widget D"],
    "Revenue": [15000, 22000, 8500, 31200],
    "Cost": [8000, 12000, 5000, 18000],
    "Profit": [7000, 10000, 3500, 13200],
    "Margin": [0.4667, 0.4545, 0.4118, 0.4231],
})

def highlight_negative(val):
    """Highlight negative values in red."""
    color = "color: #9C0006; background-color: #FFC7CE" if val < 0 else ""
    return color

def highlight_high_margin(val):
    """Highlight margins above 45% in green."""
    if val > 0.45:
        return "color: #006100; background-color: #C6EFCE"
    return ""

# Apply styles
styled = (
    df.style
    .format({
        "Revenue": "${:,.0f}",
        "Cost": "${:,.0f}",
        "Profit": "${:,.0f}",
        "Margin": "{:.1%}",
    })
    .map(highlight_negative, subset=["Profit"])
    .map(highlight_high_margin, subset=["Margin"])
    .bar(subset=["Revenue"], color="#D6E4F0", vmin=0)
    .set_caption("Sales Performance Report")
    .set_table_styles([
        {"selector": "th", "props": [
            ("background-color", "#2F5496"),
            ("color", "white"),
            ("font-weight", "bold"),
        ]},
    ])
)

# Export to Excel (uses openpyxl engine)
styled.to_excel("styler_export.xlsx", engine="openpyxl", index=False)
```

**Read, Transform, Write Pipeline**:

```python
import pandas as pd
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill


def excel_etl_pipeline(
    input_path: str | Path,
    output_path: str | Path,
    transformations: dict | None = None,
) -> dict:
    """Read an Excel file, apply transformations, and write a formatted output.

    Args:
        input_path: Path to the source Excel file.
        output_path: Path for the output Excel file.
        transformations: Optional dict of column_name -> callable transforms.

    Returns:
        Dict with row counts and sheet names processed.
    """
    input_path = Path(input_path)
    output_path = Path(output_path)

    # Read all sheets
    all_sheets = pd.read_excel(input_path, sheet_name=None, engine="openpyxl")
    stats = {"sheets_processed": [], "total_rows": 0}

    with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
        for sheet_name, df in all_sheets.items():
            # Apply transformations
            if transformations:
                for col, transform_fn in transformations.items():
                    if col in df.columns:
                        df[col] = df[col].apply(transform_fn)

            # Remove fully empty rows
            df = df.dropna(how="all")

            # Write to output
            df.to_excel(writer, sheet_name=sheet_name, index=False)
            stats["sheets_processed"].append(sheet_name)
            stats["total_rows"] += len(df)

    # Post-process: apply formatting with openpyxl
    wb = load_workbook(output_path)
    header_font = Font(bold=True, color="FFFFFF")
    header_fill = PatternFill(start_color="2F5496", fill_type="solid")

    for ws in wb.worksheets:
        # Style headers
        for cell in ws[1]:
            cell.font = header_font
            cell.fill = header_fill

        # Auto-adjust column widths
        for col in ws.columns:
            max_length = 0
            col_letter = col[0].column_letter
            for cell in col:
                if cell.value:
                    max_length = max(max_length, len(str(cell.value)))
            ws.column_dimensions[col_letter].width = min(max_length + 4, 50)

        # Freeze header row
        ws.freeze_panes = "A2"

        # Enable autofilter
        if ws.max_row > 1:
            ws.auto_filter.ref = ws.dimensions

    wb.save(output_path)
    return stats


# Usage:
# stats = excel_etl_pipeline(
#     "raw_data.xlsx",
#     "processed_report.xlsx",
#     transformations={
#         "revenue": lambda x: round(x, 2) if pd.notna(x) else 0,
#         "name": lambda x: str(x).strip().title() if pd.notna(x) else "",
#     },
# )
# print(f"Processed {stats['total_rows']} rows across {len(stats['sheets_processed'])} sheets")
```

**Batch Report Generation**:

```python
import pandas as pd
from pathlib import Path


def generate_regional_reports(
    data: pd.DataFrame,
    output_dir: str | Path,
    group_column: str = "Region",
) -> list[Path]:
    """Generate one Excel report per group (e.g., per region).

    Args:
        data: Source DataFrame with all regions.
        output_dir: Directory to write individual report files.
        group_column: Column name to group/split by.

    Returns:
        List of generated file paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    generated_files = []

    for group_name, group_df in data.groupby(group_column):
        safe_name = str(group_name).replace(" ", "_").lower()
        filepath = output_dir / f"report_{safe_name}.xlsx"

        with pd.ExcelWriter(filepath, engine="xlsxwriter") as writer:
            # Summary sheet
            summary = group_df.describe()
            summary.to_excel(writer, sheet_name="Summary")

            # Detail sheet
            group_df.to_excel(writer, sheet_name="Detail", index=False)

            # Format the detail sheet
            wb = writer.book
            ws = writer.sheets["Detail"]

            header_fmt = wb.add_format({
                "bold": True, "bg_color": "#2F5496",
                "font_color": "#FFFFFF", "border": 1,
            })

            for col_num, col_name in enumerate(group_df.columns):
                ws.write(0, col_num, col_name, header_fmt)
                # Auto-fit column width
                max_len = max(
                    group_df[col_name].astype(str).map(len).max(),
                    len(col_name),
                )
                ws.set_column(col_num, col_num, min(max_len + 2, 40))

        generated_files.append(filepath)

    return generated_files


# Usage:
# df = pd.DataFrame({
#     "Region": ["North", "North", "South", "South", "East", "East"],
#     "Product": ["A", "B", "A", "B", "A", "B"],
#     "Revenue": [15000, 22000, 18000, 12000, 25000, 19000],
# })
# files = generate_regional_reports(df, "output/regional_reports")
# for f in files:
#     print(f"Generated: {f}")
```

**Common Pitfalls and Best Practices**:

- **Floating-point precision**: Use `Decimal` for financial data in Python. openpyxl converts to float internally, so round before writing: `float(Decimal("15000.50").quantize(Decimal("0.01")))`
- **Date handling**: Always pass `datetime` or `date` objects, not strings. Set `number_format` explicitly to control display
- **Memory with large files**: Use xlsxwriter's `constant_memory` mode or ExcelJS streaming API for datasets over 100,000 rows
- **Formula evaluation**: XLSX libraries write formula text, not computed values. The formulas are evaluated only when the file is opened in Excel. If you need pre-computed values, calculate them in Python and write the results
- **data_only mode in openpyxl**: Returns the last cached value from when the file was saved by Excel. If the file was never opened in Excel, formula cells return `None`
- **String length limit**: Excel cells support a maximum of 32,767 characters. Truncate long strings before writing
- **Sheet name limits**: Sheet names cannot exceed 31 characters and cannot contain `\ / * ? : [ ]`
- **Column limit**: XLSX supports up to 16,384 columns (XFD). Validate wide DataFrames before export
- **File locking**: On Windows, Excel locks open files. Catch `PermissionError` and prompt users to close the file
- **Encoding**: XLSX is UTF-8 internally. Special characters, CJK text, and emoji work without extra configuration
Get xlsx-generation.

vz-bench-debug

vz-scrape-runner

Think you can beat it?