
python3-data

Specialist skill for Python data engineering — pandas, polars, DuckDB, numpy, ETL pipelines, tabular data ingestion, and notebook-to-module extraction. Use when working with dataframes, data validation at ingress boundaries, merge/join operations, typed column contracts, or choosing between pandas vs polars vs DuckDB for a data task.


Source: https://github.com/Jamie-BitFlight/claude_skills


§ 02 — Install

Get python3-data.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install jamie-bitflight-claude-skills-plugins-python-engineering-skills-python3-data

Or clone the repo

$ git clone https://github.com/Jamie-BitFlight/claude_skills.git

Or copy the SKILL.md manually

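$ mkdir -p ~/.claude/skills/jamie-bitflight-claude-skills-plugins-python-engineering-skills-python3-data && cp claude_skills/SKILL.MD ~/.claude/skills/jamie-bitflight-claude-skills-plugins-python-engineering-skills-python3-data/SKILL.md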



SKILL.md content:

---
name: python3-data
description: Specialist skill for Python data engineering — pandas, polars, DuckDB, numpy, ETL pipelines, tabular data ingestion, and notebook-to-module extraction. Use when working with dataframes, data validation at ingress boundaries, merge/join operations, typed column contracts, or choosing between pandas vs polars vs DuckDB for a data task.
user-invocable: false
---

# Python Data

Load `python3-core` for standing defaults. Load `python3-typing` for boundary schemas. Load `python3-testing` for parser and edge-case tests.

## Quality Checklist

- [ ] Schema validated at first stable ingress point — not deep in transforms
- [ ] `dtype=` explicit in `pd.read_csv()` / `pd.read_excel()` — never rely on inference
- [ ] No raw `pd.DataFrame` crossing module boundaries without documented column contract
- [ ] Merge/join results checked for unexpected nulls and row count changes
- [ ] `model_config = {"strict": True}` on all Pydantic boundary models
- [ ] No `inplace=True`: it returns `None` (so `df = df.drop(..., inplace=True)` leaves `df` as `None`), and pandas has already deprecated or removed it on some methods
- [ ] Notebook logic reused 3+ times extracted into tested modules

## Gotchas

| Trap | What to do instead |
|---|---|
| `df["a"]["b"] = x` (chained indexing) | `df.loc["b", "a"] = x` (row label first, then column); chained assignment can write to a temporary copy and leave `df` unchanged |
| `.apply(lambda)` on large frames | Vectorized ops first; `.apply()` only when no vectorized path exists |
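| `pd.merge()` without post-check | Pass `validate=` (e.g. `"many_to_one"`) to reject duplicate keys, then assert the row count and check for unexpected nulls |
| `df.drop(..., inplace=True)` | `df = df.drop(...)`; `inplace=True` returns `None`, and pandas has already deprecated or removed it on some methods |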
| Bare `pd.read_csv(path)` | Always pass `dtype=` to prevent silent type inference errors |
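The merge and chained-indexing rows above can be sketched as follows. This is an illustration under assumptions: the `orders`/`customers` frames, the `region` column, and the helpers `attach_region` and `cap_quantity` are hypothetical. `validate=` is a real `DataFrame.merge` parameter; everything else is plain pandas.

```python
import pandas as pd


def attach_region(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Left-join each order to its customer's region, then check the result."""
    merged = orders.merge(
        customers,
        on="customer_id",
        how="left",
        validate="many_to_one",  # raises pd.errors.MergeError if customer_id repeats in `customers`
    )
    # validate= already implies this for a left join; the explicit check documents the invariant.
    if len(merged) != len(orders):
        raise ValueError(f"row count changed: {len(orders)} -> {len(merged)}")
    unmatched = merged["region"].isna()
    if unmatched.any():
        bad_ids = merged.loc[unmatched, "order_id"].tolist()
        raise ValueError(f"orders with no matching customer: {bad_ids}")
    return merged


def cap_quantity(df: pd.DataFrame, cap: int) -> pd.DataFrame:
    """Return a copy with quantities limited to `cap`; the caller's frame is untouched."""
    df = df.copy()
    # One .loc call with (row mask, column label). The chained form
    # df[df["quantity"] > cap]["quantity"] = cap assigns into a temporary copy.
    df.loc[df["quantity"] > cap, "quantity"] = cap
    return df
```

For this particular cap, `df["quantity"].clip(upper=cap)` is the simpler vectorized option; the `.loc[mask, column]` form is shown because it generalizes to arbitrary conditional assignments.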

## Decision Table

| Task | Use | Not |
|---|---|---|
| Tabular < 1M rows | pandas | Polars (switching cost rarely justified at this size) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | `tomllib` (stdlib since Python 3.11; binary mode `"rb"`) | `tomlkit` |
| Read/write TOML preserving comments | `tomlkit` (text mode) | `tomllib` |

## Module Layout

```text
etl/
├── ingest.py      # raw data loading (boundary)
├── validate.py    # schema validation (boundary)
├── transform.py   # business logic (typed core)
├── load.py        # output writing (boundary)
└── types.py       # shared typed models
```
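The layout above can be sketched as one runnable file, with a comment marking which module each piece would live in. This is a minimal stdlib-only illustration under assumptions: the `Order` fields, the function names, and the per-customer total are hypothetical, and in a pandas pipeline `ingest` would call `pd.read_csv(..., dtype=...)` instead of `csv.DictReader`.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from decimal import Decimal
from typing import TextIO


# types.py: shared typed model (field names are hypothetical)
@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    quantity: int
    unit_price: Decimal


# ingest.py (boundary): yield raw rows as strings, no interpretation yet
def ingest(source: TextIO) -> Iterator[dict[str, str]]:
    yield from csv.DictReader(source)


# validate.py (boundary): raw rows become typed models, or fail loudly
def validate(rows: Iterable[dict[str, str]]) -> list[Order]:
    orders: list[Order] = []
    for row_no, row in enumerate(rows, start=1):
        try:
            order = Order(
                order_id=row["order_id"],
                customer_id=row["customer_id"],
                quantity=int(row["quantity"]),
                unit_price=Decimal(row["unit_price"]),
            )
        except (KeyError, TypeError, ValueError, ArithmeticError) as exc:
            # Decimal raises InvalidOperation (an ArithmeticError) on unparsable text;
            # a short row gives None values, which raise TypeError.
            raise ValueError(f"data row {row_no} is invalid: {row!r}") from exc
        if order.quantity < 0:
            raise ValueError(f"data row {row_no}: negative quantity")
        orders.append(order)
    return orders


# transform.py (typed core): pure business logic, never sees raw strings or files
def transform(orders: Iterable[Order]) -> dict[str, Decimal]:
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for order in orders:
        totals[order.customer_id] += order.quantity * order.unit_price
    return dict(totals)


# load.py (boundary): write results out
def load(totals: dict[str, Decimal], sink: TextIO) -> None:
    writer = csv.writer(sink)
    writer.writerow(["customer_id", "total"])
    for customer_id, total in sorted(totals.items()):
        writer.writerow([customer_id, str(total)])


def run(source: TextIO, sink: TextIO) -> None:
    load(transform(validate(ingest(source))), sink)
```

The point of the split is that `transform` takes and returns typed values only, so it can be unit-tested without files, while the three boundary modules stay thin.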