Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install jamie-bitflight-claude-skills-plugins-python-engineering-skills-python3-datagit clone https://github.com/Jamie-BitFlight/claude_skills.gitcp claude_skills/SKILL.MD ~/.claude/skills/jamie-bitflight-claude-skills-plugins-python-engineering-skills-python3-data/SKILL.md---
name: python3-data
description: Specialist skill for Python data engineering — pandas, polars, DuckDB, numpy, ETL pipelines, tabular data ingestion, and notebook-to-module extraction. Use when working with dataframes, data validation at ingress boundaries, merge/join operations, typed column contracts, or choosing between pandas vs polars vs DuckDB for a data task.
user-invocable: false
---
# Python Data
Load `python3-core` for standing defaults. Load `python3-typing` for boundary schemas. Load `python3-testing` for parser and edge-case tests.
## Quality Checklist
- [ ] Schema validated at first stable ingress point — not deep in transforms
- [ ] `dtype=` explicit in `pd.read_csv()` / `pd.read_excel()` — never rely on inference
- [ ] No raw `pd.DataFrame` crossing module boundaries without documented column contract
- [ ] Merge/join results checked for unexpected nulls and row count changes
- [ ] `model_config = {"strict": True}` on all Pydantic boundary models
- [ ] No `inplace=True` — deprecated, returns `None`, causes silent bugs
- [ ] Notebook logic that survived 3+ uses extracted into tested modules
## Gotchas
| Trap | What to do instead |
|---|---|
| `df["a"]["b"] = x` (chained indexing) | `df.loc[:, "b"] = x` — chained indexing silently fails |
| `.apply(lambda)` on large frames | Vectorized ops first; `.apply()` only when no vectorized path exists |
| `pd.merge()` without post-check | Assert no unexpected nulls or duplicate keys after merge |
| `df.drop(..., inplace=True)` | `df = df.drop(...)` — `inplace` is deprecated and returns `None` |
| Bare `pd.read_csv(path)` | Always pass `dtype=` to prevent silent type inference errors |
## Decision Table
| Task | Use | Not |
|---|---|---|
| Tabular < 1M rows | pandas | Polars (overhead not justified) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | `tomllib` (stdlib, binary mode `"rb"`) | `tomlkit` |
| Read/write TOML preserving comments | `tomlkit` (text mode) | `tomllib` |
## Module Layout
```text
etl/
├── ingest.py # raw data loading (boundary)
├── validate.py # schema validation (boundary)
├── transform.py # business logic (typed core)
├── load.py # output writing (boundary)
└── types.py # shared typed models
```