Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install brycewang-stanford-awesome-agent-skills-for-empirical-research-skills-00-full-empirical-analysis-skill-statspaigit clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.gitcp Awesome-Agent-Skills-for-Empirical-Research/SKILL.MD ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-skills-00-full-empirical-analysis-skill-statspai/SKILL.md---
name: StatsPAI_skill
description: Use when the user asks to run a full empirical / causal analysis in Python — by default in the style of an applied economics paper (AER / QJE / JPE / ReStud / AEJ) with DID / RD / IV / SCM / DML / matching, written-out estimating equation + identifying assumption, Table 1 / Table 2 / event-study figure / robustness gauntlet — OR in epidemiology / public health style (target-trial emulation, IPTW + g-formula + TMLE triplet, Mendelian randomization, KM/AFT survival, E-value sensitivity, STROBE/TRIPOD reporting) — OR in ML causal inference style (DML, S/T/X/R/DR meta-learners, causal forest, Dragonnet/TARNet/CEVAE, BCF, CATE distribution, policy learning, conformal causal, fairness audit, causal discovery). Also covers exporting multi-column regression tables to Word / Excel / LaTeX (Stata outreg2 / esttab / R modelsummary equivalent) and bundling an entire replication appendix into one .docx / .xlsx / .tex file. Triggers on keywords "StatsPAI", "statspai", "AER empirical analysis", "applied micro pipeline", "Table 1 balance", "event study", "first-stage F", "Oster bound", "honest_did", "spec_curve", "callaway_santanna", "dragonnet", "text as treatment", "outreg2 in Python", "regression table to Word/Excel", "sp.regtable", "sp.collect", "sp.paper_tables", "sp.feols", "summary_col", "modelsummary", "AER style table", "QJE style table", "epidemiology pipeline", "target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE", "TRIPOD", "公共健康", "流行病学", "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "conformal causal", "policy learning", "因果机器学习", "ML causal".
triggers:
- causal inference in python
- applied microeconomics pipeline
- AER empirical analysis
- QJE style robustness
- DID IV RD SCM
- callaway_santanna
- synthetic control
- double machine learning
- causal forest
- event study plot
- first stage F-statistic
- Oster bound
- honest_did
- spec_curve
- estimand-first DSL
- LLM-assisted DAG discovery
- text as treatment
- export regression table to Word
- export regression table to Excel
- regression table docx
- regression table xlsx
- outreg2 in Python
- summary_col equivalent
- modelsummary equivalent
- AER house style table
- QJE house style table
- journal template regression
- Stata collect equivalent
- replication bundle
- sp.regtable
- sp.collect
- sp.paper_tables
- sp.feols
- sp.cite
- high-dim fixed effects
- two-way clustering
- StatsPAI
- statspai
- fmt auto regression table
- magnitude-adaptive coefficient formatting
- mixed magnitude coefficients
- sumstats by_labels
- Control Treated auto labels
- epidemiology pipeline
- public health causal inference
- target trial emulation
- g-formula
- IPTW marginal structural model
- TMLE doubly robust
- HAL-TMLE
- Mendelian randomization
- MR-Egger weighted median
- STROBE TRIPOD reporting
- E-value sensitivity
- Kaplan-Meier AFT survival
- 流行病学
- 公共健康
- ML causal inference
- double machine learning DML
- meta-learner S T X R DR
- causal forest GRF
- Dragonnet TARNet CEVAE
- Bayesian causal forest BCF
- CATE distribution
- policy tree
- off-policy evaluation
- conformal causal prediction
- fairness audit
- causal discovery PC NOTEARS
- 因果机器学习
---
# StatsPAI: Agent-Native Causal Inference & AER-Style Empirical Workflow
StatsPAI is the agent-native Python package for causal inference and applied econometrics: one `import statspai as sp`, 900+ functions behind a self-describing API, and `CausalResult` objects that export to LaTeX / Word / Excel / BibTeX.
This skill drives StatsPAI through the **canonical pipeline of an applied AER empirical paper**. Each step maps to a section of the published paper and emits a paper-ready artifact (Table 1, event-study figure, Table 2 main results, robustness panel, replication stamp).
- **Source**: https://github.com/brycewang-stanford/StatsPAI
- **Install**: `pip install statspai` (>= 1.6)
- **Paper**: submitted to JOSS (under review)
## Why for Agents
1. **Self-describing**: `sp.list_functions()` / `sp.describe_function(name)` / `sp.function_schema(name)` — every public symbol is discoverable without doc lookup.
2. **Unified result**: every estimator returns `CausalResult` with `.summary()`, `.plot()`, `.diagnostics`, `.to_latex()`, `.to_word()`, `.cite()`.
3. **One import, full pipeline**: data contract → Table 1 → estimand-first DSL → identification graphs → main table → heterogeneity → mechanisms → robustness → replication package.
4. **Estimand-first**: `sp.causal_question(...).identify()` forces the "DID vs RD vs IV?" decision *before* estimation, with the identifying assumption written down — the way a referee expects to read it.
## The AER-style empirical pipeline
The skill mirrors the canonical sections of an applied AER / QJE / AEJ paper. Each step below is one paper section and one set of artifacts on disk.
```
Paper section Step StatsPAI moves
─────────────────────────── ───── ────────────────────────────────────────────────
Pre-Analysis Plan −1 sp.power.* + freeze IdentificationPlan to disk
§1. Data 0 data_contract + sample-construction log (footnote 4)
§1.1 Descriptives (Table 1) 1 sp.sumstats · sp.balance_table · sp.describe
§2. Empirical Strategy 2 write equation + identifying assumption + sp.causal_question
(LLM-DAG addendum) 2.5 sp.llm_dag_propose · validate · constrained
§3. Identification graphics 3 event-study · first-stage F · McCrary · love plot
§4. Main Results (Table 2) 4 progressive controls + FE (sp.regtable / sp.causal)
§5. Heterogeneity (Table 3) 5 sp.subgroup_analysis · sp.continuous_did · CATE
§6. Mechanisms 6 sp.mediation · sp.decompose
§7. Robustness gauntlet 7 placebo · Oster · honest_did · E-value · 2-way / Conley SE · spec_curve
§8. Replication package 8 .to_latex() · .plot() · reproducibility stamp
```
> **All code blocks below share one running example (`training → wage`, with `worker_id / firm_id / year / age / edu / tenure`) purely for readability.** Column names, `population`, `estimand`, and `design` values are **illustrative** — substitute the user's actual columns and research question. Only `sp.*` function names and argument *shapes* are normative.
## Three domain modes (default = AER econ; alternates = epi & ML-causal)
The default playbook above is **AER-style applied econometrics** — the AEA convention: written-out estimating equation, identifying assumption table, design horse-race, full robustness gauntlet. The skill **also** ships two parallel sub-pipelines for the other two big causal-inference traditions, each reusing the same export stack (`sp.regtable / sp.collect / sp.paper_tables`) and result objects:
| Mode | Reader convention | Identification stack | Reporting stack | Jump to |
|---|---|---|---|---|
| **Default — Applied Econ (AER / QJE / AEJ)** | "Show the equation + identifying assumption + design horse-race; controls visible; clustered SE" | DID / IV / RD / SCM / matching / `feols` HDFE | AER house-style multi-column `regtable` + 8-section paper layout | §−1 → §8 (entire playbook above) |
| **Mode A — Epidemiology / Public Health** | "STROBE / TRIPOD-AI; target trial protocol; doubly-robust estimand; absolute & relative risk; KM survival" | Target-trial emulation · IPTW · g-formula · TMLE · Mendelian randomization · KM/AFT | Same `regtable` + `collect`, with risk-difference / hazard-ratio / E-value rows | §A. Epidemiology pipeline |
| **Mode B — ML Causal Inference** | "DML / meta-learners / causal forest / DR-learner; CATE distribution; policy value" | DML · S/T/X/R/DR-Learner · GRF causal forest · Dragonnet/TARNet/CEVAE · BCF · matrix completion | `regtable` ML horse-race + `cate_plot` + policy-value table + `conformal_causal` PI | §B. ML causal pipeline |
**How to invoke a non-default mode** (Claude / agent picks this up from the user's wording):
| User says... | Mode the skill switches to |
|---|---|
| "Run a DID / IV / RD / event study", "AER table", "applied micro" | Default (AER econ) |
| "Target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE / TRIPOD", "公共健康 / 流行病学", "epi pipeline", "RWE study", "cohort study", "case-control" | Mode A (Epi) |
| "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "Dragonnet", "BCF", "policy learning", "conformal causal", "ML causal", "uplift modeling", "因果机器学习" | Mode B (ML causal) |
| "Mix" (e.g. "estimate DID + then ML CATE on the heterogeneity") | Default + Mode B in sequence — every estimator returns the same `CausalResult`, drop them all into one `sp.regtable(...)` for the horse-race column |
The three modes share **the same export stack, the same `CausalResult` interface, and the same `sp.causal_question(...).identify()` estimand-first DSL** — switching modes only changes which Step 4 estimators you reach for, not the surrounding scaffolding. If you only want descriptive stats / Table 1 / a balance check, the AER `sp.sumstats` / `sp.mean_comparison` / `sp.collect` calls work in all three modes.
## Paper-ready figure & table inventory (what to produce by section)
A modern AER paper has **5–7 figures** and **3–5 main tables** + an appendix robustness table. Every step below should leave at least one numbered artifact on disk. Default file names assume parallel `.tex` / `.docx` / `.xlsx` exports (the agent should produce all three so co-authors can edit in Word / Excel and the build system can use LaTeX):
| § | Artifact | StatsPAI primitive | Filenames (write all three) |
|---|---|---|---|
| §1 | **Figure 1**: raw trends / treatment rollout | `sp.parallel_trends_plot` · `sp.treatment_rollout_plot` | `figures/fig1_trends.png` |
| §1 | **Table 1**: summary stats (full / treated / control + Δ) | `sp.sumstats` + `sp.mean_comparison(...).to_word()/.to_excel()` (or `sp.collect().add_summary().add_balance()`) | `tables/table1_summary.{tex,docx,xlsx}` |
| §3 | **Figure 2**: identification graphic (event-study / first-stage / McCrary / RD scatter / SCM trajectory) | `sp.enhanced_event_study_plot` · `sp.binscatter` · `sp.rdplot` · `sp.rddensity().plot()` · `sp.synthdid_plot` | `figures/fig2_identification.png` |
| §4 | **Table 2**: main results — progressive controls | `rt = sp.regtable(M1...M5, template="aer"); rt.to_word(...); rt.to_excel(...)` | `tables/table2_main.{tex,docx,xlsx}` |
| §4 | **Table 2-bis**: design horse-race (OLS / IV / DID / DML) | `sp.regtable(ols, iv, did, dml, ...).to_word/.to_excel` | `tables/table2b_designs.{tex,docx,xlsx}` |
| §4 | **Figure 3** (optional): coefficient plot across specs | `sp.coefplot(M1, M2, M3, M4)` | `figures/fig3_coef.png` |
| §5 | **Table 3**: heterogeneity by subgroup | `sp.regtable(g_full, g_male, g_fem, g_q1...q4).to_word/.to_excel` | `tables/table3_heterogeneity.{tex,docx,xlsx}` |
| §5 | **Figure 4**: dose-response / CATE | `sp.dose_response(...).plot()` · `sp.cate_plot` · `sp.cate_group_plot` | `figures/fig4_cate.png` |
| §6 | **Table 4**: mechanisms (mediation / decomposition) | `sp.regtable(total, direct, indirect).to_word/.to_excel` | `tables/table4_mechanisms.{tex,docx,xlsx}` |
| §7 | **Table A1**: robustness master (one row per check) | `sp.regtable(rob1...robN, panel_labels=[...]).to_word/.to_excel` — or `sp.paper_tables(robustness=[...]).to_docx()` | `tables/tableA1_robustness.{tex,docx,xlsx}` |
| §7 | **Figure 5**: spec curve | `sp.spec_curve(...).plot()` | `figures/fig5_spec_curve.png` |
| §7 | **Figure 6**: sensitivity dashboard / Cinelli–Hazlett contour | `sp.sensitivity_dashboard` · `sp.sensitivity_plot` | `figures/fig6_sensitivity.png` |
| §8 | **Replication bundle**: all tables in one Word/Excel/LaTeX file | `sp.collect("Paper").add_summary(...).add_regression(...)...save("paper.{docx,xlsx,tex}")` — or `sp.paper_tables(main=, heterogeneity=, robustness=, placebo=).to_docx/.to_xlsx` | `replication/paper.{docx,xlsx,tex}` |
> Every `CausalResult` and OLS model can be passed straight into `sp.regtable(...)`, `sp.coefplot(...)`, **and `sp.collect()`**. Don't hand-roll LaTeX, and don't render Word/Excel from pandas — the export functions apply book-tab borders, AER-style stars, and the right SE label automatically.
---
## Export cookbook — Word / Excel / LaTeX in one line
StatsPAI's export stack is the agent-native equivalent of Stata's `outreg2` / `esttab` / `collect` and R's `modelsummary` / `gtsummary`. Three tiers, picked by **scope** of what you're exporting:
| Tier | Use when | API | Hot kwargs |
|---|---|---|---|
| **1. Single multi-column table** (the outreg2 / `summary_col` equivalent) | Exporting *one* Table 2 / Table 3 / Table A1 with progressive columns | `rt = sp.regtable(M1, M2, ..., template="aer", title=...)` *(default: all coefs incl. intercept)*<br>`rt.to_word("table2.docx")`<br>`rt.to_excel("table2.xlsx")`<br>`rt.to_latex()` · `rt.to_markdown()` | `template`, `coef_labels`, `model_labels`, `panel_labels`, `dep_var_labels`, `stats`, `stars`, `add_rows`; opt-in filters: `drop=["Intercept"]` (suppress constant), `keep=[focal]` (focal-only) |
| **2. Multi-panel paper format** (Tables 2 + 3 + A1 + A2 in one file) | Producing the *paper-tables block* — main + heterogeneity + robustness + placebo as a single document | `pt = sp.paper_tables(main=[M1...M5], heterogeneity=[H1,H2,H3], robustness=[R1...Rn], placebo=[P1,P2], template="aer")`<br>`pt.to_docx("paper_tables.docx")`<br>`pt.to_xlsx("paper_tables.xlsx")`<br>`pt.to_latex(...)` | `main`, `heterogeneity`, `robustness`, `placebo`, `template`, `coef_labels`, `model_labels_<panel>`, `keep` |
| **3. Full session bundle** (Stata 15 `collect` equivalent) | Replication appendix that mixes summary stats + balance + multiple regression tables + headings + prose in **one** file | `c = sp.collect("Paper title", template="aer")`<br>`c.add_heading("§1. Descriptives")`<br>`c.add_summary(df, vars=...)`<br>`c.add_balance(df, treatment=, variables=...)`<br>`c.add_regression(M1, M2, ..., title="Table 2")`<br>`c.add_text("Notes ...")`<br>`c.save("paper.docx")` (auto-detect by extension; `.xlsx`/`.tex`/`.md`/`.html`/`.txt` all work) | `add_heading(level=)`, `add_summary(stats=, labels=)`, `add_balance(weights=, test=)`, `add_regression(**regtable_kwargs)`, `add_table(result)`, `add_text(...)` |
**Journal templates** (apply the right SE label, star levels, and notes automatically):
```python
sp.list_journal_templates()
# → ('aer', 'qje', 'econometrica', 'restat', 'jf', 'aeja', 'jpe', 'restud')
rt = sp.regtable(M1, M2, M3, template="qje") # QJE styling; default = full coef list (incl. intercept)
rt.to_word("table2_qje.docx")
# Opt-in filters:
# • drop the constant only: sp.regtable(M1, M2, M3, template="qje", drop=["Intercept"])
# • focal-coefficient only: sp.regtable(M1, M2, M3, template="qje", keep=["x"])
sp.get_journal_template("aer") # inspect a preset
# → {'label': 'American Economic Review', 'star_levels': [0.1, 0.05, 0.01],
# 'se_label': 'Standard errors', 'stats': ['N', 'R-squared'],
# 'notes_default': [...], 'font_name': 'Times New Roman'}
```
**Inline citations in prose** (drop a coefficient straight into a sentence):
```python
sp.cite(M3, "training") # → "1.239*** (0.153)"
sp.cite(M3, "training", output="latex") # → "$1.239^{***}$ ($0.153$)"
```
> **Naming gotcha**: `sp.regtable(..., output="docx")` is invalid — the enum is `{"text", "latex", "tex", "html", "markdown", "md", "qmd", "quarto", "word", "excel"}`. Use `output="word"` / `"excel"`, or — simpler — drop `output=` and call `.to_word(filename)` / `.to_excel(filename)` on the result.
---
## Step −1 — Pre-Analysis Plan (pre-data; AEA RCT Registry style)
`sp.power(design, n=..., effect_size=..., power_target=...)` is a unified dispatcher — leave one argument `None` to solve for it (sample size, MDE, or power). Convenience wrappers: `sp.power_rct`, `sp.power_did`, `sp.power_rd`, `sp.power_iv`, `sp.power_cluster_rct`, `sp.power_ols`.
```python
# Always go through the dispatcher when you want auto-solve. The
# `sp.power_<design>` wrappers (power_rct / power_did / power_rd /
# power_iv / power_cluster_rct / power_ols) accept *only* the design's
# native arguments — they will NOT solve for power_target / n / effect
# unless you go via `sp.power(design, ..., power_target=...)`.
sp.power("rct", effect_size=0.3, power_target=0.80) # → PowerResult(n=349, power=0.80)
sp.power("did", n=200, effect_size=0.15, power_target=0.80,
n_periods=4, n_treated_periods=2) # DID: solves MDE / n / power
sp.power("cluster_rct", cluster_size=50, icc=0.05,
effect_size=0.2, power_target=0.80) # Cluster RCT: solves n_clusters
sp.pretrends_power(result) # Roth (2022) pre-trends power
```
Persist the `PowerResult` next to `data_contract.json` and `empirical_strategy.md` — a referee will ask whether the design was powered before data collection, not after.
## Step 0 — Sample construction & data contract (Section "Data")
An AER §1 *Data* section has three jobs: (a) describe sources, (b) document **every** sample restriction (the "footnote 4" sample log), (c) lock the panel structure. StatsPAI assumes an **analysis-ready DataFrame** — do ETL (imputation, type coercion, merges, transforms) in pandas first, then run the 5-check contract.
### 0.1 Sample-construction log (footnote 4)
```python
sample_log = []
df0 = df_raw.copy(); sample_log.append(("0. raw", len(df0)))
df1 = df0.dropna(subset=["wage"]); sample_log.append(("1. drop missing wage", len(df1)))
df2 = df1[df1["age"].between(18, 65)]; sample_log.append(("2. drop age outside 18-65", len(df2)))
df3 = df2[df2["industry"].isin(MANUF_CODES)]; sample_log.append(("3. keep manufacturing", len(df3)))
df = df3
import json; json.dump(sample_log, open("artifacts/sample_construction.json", "w"), indent=2)
```
Paste this log verbatim as footnote 4 of your paper. AER reviewers use it to reconstruct the analysis sample.
### 0.2 Five-check data contract (go / no-go gate)
```python
import pandas as pd, numpy as np, statspai as sp
def data_contract(df, *, y, treatment, id=None, time=None, covariates=()):
"""Return a go/no-go dict. Stop the pipeline if any required check fails."""
keys = [y, treatment] + ([id, time] if id and time else []) + list(covariates)
c = {
"n_obs": len(df), # 1. shape
"dtypes": df[keys].dtypes.astype(str).to_dict(), # 2. dtypes on keys
"n_missing": df[keys].isna().sum().to_dict(), # 3. missing pattern
"n_dupes_on_keys": 0,
"panel_balanced": None,
"cohort_sizes": None,
}
if id and time:
c["n_dupes_on_keys"] = int(df.duplicated([id, time]).sum()) # 4. duplicate (id,time)
balanced = sp.balance_panel(df, entity=id, time=time) # 5. panel balance
c["panel_balanced"] = len(balanced) == len(df)
c["n_dropped_by_balance"] = len(df) - len(balanced)
if "first_treat_year" in df.columns: # staggered cohorts
c["cohort_sizes"] = (
df.drop_duplicates(id).groupby("first_treat_year").size().to_dict()
)
c["y_range"] = (float(df[y].min()), float(df[y].max()))
c["treatment_share"] = float(df[treatment].mean())
# Missingness mechanism hint (Rubin): compare covariate means between
# rows missing-on-y vs observed. Any p < 0.05 ⇒ NOT MCAR → use MI / IPW,
# not listwise deletion.
from scipy import stats
miss_y = df[y].isna()
c["mcar_hint"] = "likely MCAR (listwise OK)"
if miss_y.any() and (~miss_y).any():
for cov in covariates:
if df[cov].dtype.kind in "fi":
_, p = stats.ttest_ind(df.loc[miss_y, cov].dropna(),
df.loc[~miss_y, cov].dropna(),
equal_var=False)
if p < 0.05:
c["mcar_hint"] = f"NOT MCAR (y-miss differs on {cov}, p={p:.3f}) → use MI / IPW"
break
return c
contract = data_contract(df, y="wage", treatment="training",
id="worker_id", time="year",
covariates=["age", "edu", "tenure"])
assert contract["n_dupes_on_keys"] == 0, "duplicate (id, time) — fix before panel methods"
assert all(v == 0 for v in contract["n_missing"].values()), \
f"NaNs on keys: {contract['n_missing']}"
```
If any assertion fires, **stop** and fix it in pandas — StatsPAI estimators silently drop NaN rows, the most common source of "mysterious sample-size shrinkage" bugs. Persist:
```python
import json; json.dump(contract, open("artifacts/data_contract.json", "w"), indent=2, default=str)
```
## Step 1 — Descriptive statistics (Table 1)
The signature AER Table 1 has three column blocks plus a difference column:
| | (1) Full | (2) Treated | (3) Control | (4) Δ (t-test) |
The Imbens–Rubin rule of thumb: a normalized difference `|Δ| / √((s²₁+s²₀)/2) > 0.25` flags substantive imbalance and should trigger matching / reweighting *before* you trust an OLS comparison.
```python
# Quick text/LaTeX preview (use sumstats `output=` for a string-only render).
# When `by=` is binary 0/1 and you don't pass `by_labels=`, sumstats auto-fills
# the panel headers as **Control / Treated** so the academic Table 1 reads
# correctly out of the box. For non-0/1 codings or different wording, pass
# `by_labels={0:"Untrained", 1:"Trained"}` (or `{"A":"Control","B":"Treated"}`).
print(sp.sumstats(df, vars=["wage","edu","exp","tenure","age"],
by="training", output="text"))
# AER-style balance table → Word + Excel + LaTeX in three lines.
# `mean_comparison` returns a MeanComparisonResult that exposes the full
# export chain (.to_word / .to_excel / .to_latex / .to_markdown / .to_html).
mc = sp.mean_comparison(df,
["age","edu","tenure","firm_size"],
group="training",
test="ttest",
title="Table 1. Summary statistics by treatment status")
mc.to_word ("tables/table1_summary.docx") # editable in Word
mc.to_excel("tables/table1_summary.xlsx") # editable in Excel
open("tables/table1_summary.tex", "w").write(mc.to_latex())
sp.describe(df).to_markdown("references/codebook.md") # auto-codebook
```
### 1.1 Multi-panel Table 1 (AER convention)
Group rows into **Panel A: Outcomes**, **Panel B: Treatment intensity**, **Panel C: Controls**, **Panel D: Sample composition**. The cleanest path is to push each panel into a `sp.collect()` bundle — one `.save("file.docx")` call then writes the whole multi-panel Table 1 with AER book-tab borders, in Word **and** Excel **and** LaTeX from one source.
```python
panels = {
"A. Outcomes": ["wage", "log_wage", "weeks_employed"],
"B. Treatment": ["training", "training_hours"],
"C. Demographic controls": ["age", "edu", "female", "married"],
"D. Labor market": ["tenure", "firm_size", "industry_id"],
}
c1 = sp.collect("Table 1. Summary statistics", template="aer")
for label, vs in panels.items():
c1.add_heading(f"Panel {label}", level=2)
c1.add_summary(df, vars=vs, stats=["mean", "sd", "n"])
c1.save("tables/table1_summary.docx") # editable Word, AER book-tab borders
c1.save("tables/table1_summary.xlsx") # one sheet per panel (heading drives the sheet name)
c1.save("tables/table1_summary.tex") # multi-panel LaTeX
# Plain-text alternative (no Collection): one `sp.sumstats` per panel, concat strings.
# Useful when you only need the .tex preview without a binary export.
import io; buf = io.StringIO()
for label, vs in panels.items():
buf.write(f"\n% Panel {label}\n")
buf.write(sp.sumstats(df, vars=vs, by="training",
stats=["mean", "sd", "n"], output="latex"))
open("tables/table1_summary_flat.tex", "w").write(buf.getvalue())
```
### 1.2 Figure 1 — raw trends / treatment rollout
For DID / event-study designs, the *first* figure of an applied paper is almost always either (a) raw treated-vs-control means over time, or (b) the staggered rollout heat-strip showing which units are treated when. Both are one-liners:
```python
# (a) Raw trends with vertical line at treatment start (DID Figure 1 style)
sp.parallel_trends_plot(df, y="wage", time="year", treat="training",
treat_time=2015, ci=True,
labels={"treated":"Trained", "control":"Untrained"})\
.savefig("figures/fig1a_raw_trends.png", dpi=300)
# (b) Treatment rollout heatmap (staggered DID convention; Goodman-Bacon-friendly)
sp.treatment_rollout_plot(df, time="year", treat="training", id="worker_id",
sort_by="first_treat_year",
title="Figure 1. Treatment timing")\
.savefig("figures/fig1b_rollout.png", dpi=300)
```
For matching designs, also produce a **love plot** of standardized differences pre/post matching (Step 3.4).
## Step 2 — Empirical strategy (Section "Identification")
This is the heart of an AER paper. Before any code, **write down the equation explicitly** and **state the identifying assumption**. Vague identification language is the single most common reason a referee rejects an applied paper.
### 2.1 Equation × identifying assumption table
| Design | Estimating equation | Identifying assumption |
|---|---|---|
| 2×2 DID | `Y_it = α_i + λ_t + β·D_it + X'γ + ε_it` | parallel trends conditional on X |
| Event-study (CS / SA) | `Y_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t-G_i = e} + ε_it` | no anticipation + group-time PT |
| 2SLS | `Y_i = α + β·D_i + X'γ + ε_i; D_i = π·Z_i + X'δ + u_i` | exclusion + relevance + monotonicity |
| Sharp RD | `Y_i = α + β·1{X_i ≥ c} + f(X_i) + ε_i` (local poly) | continuity of E[Y(0)\|X] at c, no manipulation |
| SCM | `Ŷ_1t(0) = Σ_j ŵ_j Y_jt`, τ_t = `Y_1t − Ŷ_1t(0)` for t≥T_0 | pre-period fit + interpolation validity |
| DML / unconfoundedness | `Y_i = m(X_i) + β·D_i + ε_i` (Robinson partialling-out) | unconfoundedness \| X + overlap |
### 2.2 Design picker
When `design="auto"` is too opaque, use this decision tree:
```
┌─ running var + cutoff ───────────────── RDD (sp.rdrobust)
│
├─ exogenous instrument Z ─────────────── IV (sp.ivreg, sp.dml)
data + question ─┤
├─ pre/post × treat/control ─┬ 2 periods ── 2×2 DID (sp.did)
│ └ staggered ── CS / SA (sp.callaway_santanna)
│
├─ 1 treated unit + donor pool + long pre ── SCM (sp.synth, sp.sdid)
│
├─ high-dim X, selection-on-observables ── DML / Causal Forest
│
└─ none of the above ──────────────────── matching + E-value (sp.match, sp.evalue)
```
### 2.3 Estimand-first DSL = pre-registration
`sp.causal_question` declares the five-tuple (population, treatment, outcome, estimand, design) and `.identify()` picks the estimator with its assumptions written down. **Treat the `IdentificationPlan` as your pre-registration artifact** — freeze it *before* running `q.estimate()` so the analysis plan is a dated document, not a post-hoc rationalization.
```python
q = sp.causal_question(
treatment="training", outcome="wage", data=df,
population="manufacturing workers, 2010–2020",
estimand="ATT",
design="auto", # 'auto' | 'did' | 'event_study' | 'regression_discontinuity'
# | 'iv' | 'rct' | 'selection_on_observables'
# | 'synthetic_control' | 'natural_experiment'
# | 'policy_shock' | 'longitudinal_observational'
time_structure="panel", time="year", id="worker_id",
covariates=["age", "edu", "tenure"],
)
plan = q.identify() # IdentificationPlan: estimator + assumptions + fallbacks
print(plan.summary()) # human-readable Methods paragraph
print(plan.identification_story) # narrative of why this estimator identifies the estimand
# FREEZE the plan to disk BEFORE estimating — this is your pre-registration.
# `q` (CausalQuestion) carries the question (population / treatment / outcome).
# `plan` (IdentificationPlan) carries the strategy (estimator / story /
# assumptions / fallbacks / warnings). The estimating equation is *your*
# job to write down — paste it from the §2.1 table that matches plan.estimator.
from pathlib import Path
bullets = lambda xs: "\n".join(f"- {x}" for x in xs) if xs else "- (none)"
Path("artifacts/empirical_strategy.md").write_text(
f"# Empirical Strategy (pre-registration)\n\n"
f"**Population**: {q.population}\n"
f"**Treatment**: `{q.treatment}` **Outcome**: `{q.outcome}`\n"
f"**Estimand**: {plan.estimand}\n"
f"**Estimator**: `sp.{plan.estimator}`\n\n"
f"## Estimating equation (paste from §2.1 row matching `{plan.estimator}`)\n"
f"```\n<paste here>\n```\n\n"
f"## Identification story\n{plan.identification_story}\n\n"
f"## Identifying assumptions (must defend in §2)\n{bullets(plan.assumptions)}\n\n"
f"## Auto-flagged warnings\n{bullets(plan.warnings)}\n\n"
f"## Fallback estimators (Step 7 robustness)\n{bullets(plan.fallback_estimators)}\n"
)
# Machine-readable sidecar (full question, replayable):
Path("artifacts/causal_question.yaml").write_text(q.to_yaml())
result = q.estimate() # run only after the plan is committed to disk / git
```
### 2.5 (Optional) LLM-assisted DAG addendum
Useful when the user wants an explicit DAG to defend in §2 or §7. Pipe the discovered DAG into `sp.causal(..., dag=...)`.
```python
proposal = sp.llm_dag_propose(
variables=df.columns.tolist(),
domain="labor economics: training, wages, tenure",
client=my_llm_client, # .complete(prompt) -> str; None = heuristic
)
validation = sp.llm_dag_validate(proposal, df, alpha=0.05)
print(validation.edge_evidence)
discovered = sp.llm_dag_constrained(
df,
descriptions={"wage": "monthly wage USD", "training": "0/1 program"},
oracle=my_llm_client.suggest_edges, # optional; falls back to plain PC
max_iter=3,
)
# Pass into Step 4 as: sp.causal(..., dag=discovered.dag)
```
## Step 3 — Identification graphics (Section "Identification, graphical evidence")
AER convention: **the identification figure precedes the regression table**. The reader should see graphical evidence that PT holds / first stage is strong / RD jumps cleanly *before* you ask them to trust your point estimate.
### 3.1 Event-study plot + numerical pre-trends test (DID identification)
Pre-period coefficients ≈ 0 (with the −1 reference period normalized to zero) is the visual evidence for parallel trends. Pair the **figure** with a **numerical** pre-trends test so reviewers don't have to eyeball it.
```python
# Event-study estimates
es = sp.event_study(df, y="wage", treat_time="first_treat_year",
time="year", unit="worker_id",
window=(-4, 4), ref_period=-1,
covariates=["age", "edu"])
# Figure 2a — event-study coefficient plot
sp.enhanced_event_study_plot(
es, shade_pre=True,
title="Figure 2a. Event-study coefficients (95% CI; ref. period = −1)")\
.savefig("figures/fig2a_event_study.png", dpi=300)
# Numerical pre-trends test (Roth 2022 power) for the table footnote
print(sp.pretrends_summary(es)) # F-stat, p-value, max-PT bound
# Bacon decomposition figure for staggered DID (Figure 2a-bis)
bd = sp.bacon_decomposition(df, y="wage", treat="training",
time="year", id="worker_id")
sp.bacon_plot(bd, title="Figure 2a-bis. Goodman-Bacon weights")\
.savefig("figures/fig2a2_bacon.png", dpi=300)
# CS / SA dynamic effects figure (Figure 2a-ter): the post-period τ_e curve.
# Use `x=` for covariates (not `covariates=` — that kwarg does not exist).
cs = sp.callaway_santanna(df, y="wage", g="first_treat_year",
t="year", i="worker_id",
x=["age", "edu"])
sp.did_summary_plot(cs, title="Figure 2a-ter. Dynamic ATT (Callaway–Sant'Anna)")\
.savefig("figures/fig2a3_csdid.png", dpi=300)
# Borusyak–Jaravel–Spiess joint pre-trends test — needs the CS/SA result
# AND the underlying panel (NOT the event_study() output):
sp.bjs_pretrend_joint(cs, df, y="wage", group="first_treat_year",
time="year", first_treat="first_treat_year",
controls=["age", "edu"])
```
### 3.2 First-stage F-statistic + scatter (IV identification)
Rule of thumb: first-stage F ≥ 10 for OLS-style inference; F ≥ 23 for AR-equivalent inference (Stock–Yogo / Lee 2022).
```python
iv = sp.ivreg("wage ~ (training ~ Z1 + Z2) + age + edu", df, cluster="firm_id")
print(iv.summary()) # reports first-stage F (Cragg–Donald / KP)
sp.binscatter(df, y="training", x="Z1",
controls=["age", "edu"],
n_bins=20, ci=True)\
.savefig("figures/fig_first_stage.png", dpi=300)
```
### 3.3 RD: McCrary density + canonical RD plot + binscatter
The signature RD figure is `sp.rdplot` (CCT-style binned scatter with local-polynomial fit on each side), paired with the McCrary manipulation test. Together they answer: (a) is there a visual jump? (b) is the density continuous at the cutoff?
```python
# Figure 2b — canonical RD plot (binned means + local poly fit on each side)
sp.rdplot(df, y="y", x="running_var", c=0,
p=4, kernel="triangular", binselect="esmv",
shade_ci=True, ci_level=0.95)\
.savefig("figures/fig2b_rdplot.png", dpi=300)
# Figure 2b-bis — McCrary density (manipulation test)
sp.rddensity(df, x="running_var", c=0).plot()\
.savefig("figures/fig2b2_mccrary.png", dpi=300)
# Optional: covariate-adjusted binscatter (continuity in covariates is also testable)
sp.binscatter(df, y="age", x="running_var", n_bins=40, ci=True)\
.savefig("figures/fig2b3_cov_binscatter.png", dpi=300)
```
### 3.4 Matching: love plot (standardized differences)
```python
m = sp.match(df, y="wage", treat="training",
covariates=["age", "edu", "tenure"], method="nearest")
m.plot()\
.savefig("figures/fig2c_love_plot.png", dpi=300) # |std diff| pre vs post; target |Δ|<0.1
```
### 3.5 SCM: synthetic-control trajectory + gap plot
For synthetic-control designs the canonical Figure 2 is the treated-vs-synthetic time-series with treatment time annotated. `synthdid_plot` does this in one line.
```python
sc = sp.synth(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sc.plot().savefig("figures/fig2d_synth_trajectory.png", dpi=300) # treated vs synthetic + gap
sd = sp.sdid(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000)
sp.synthdid_plot(sd, title="Figure 2d. Synthetic DID")\
.savefig("figures/fig2d2_sdid.png", dpi=300)
```
### 3.6 Generic pre-flight (identification-independent)
```python
sp.diagnose(df, y="wage", x=["age", "edu", "tenure"]) # leverage, overlap, missing
```
> Identification-specific checks (PT for DID, weak-IV F, density for RD, common support for matching) **are also auto-run inside `sp.causal(...)`** in Step 4 — don't duplicate the numerics here, but DO produce the figures: a referee scans the figures first.
## Step 4 — Main results (multi-regression tables, AER style)
This is the densest section of an applied paper. A modern AER §4 typically contains **2–3 multi-regression tables and one coefficient plot**:
- **Table 2** (main): progressive controls, 4–6 columns
- **Table 2-bis** (design horse race): same coefficient under OLS / 2SLS / DID / DML
- **Table 2-ter** (multi-outcome): same treatment, several outcomes side-by-side
- **Figure 3** (coefplot): visual summary of β̂ and 95% CI across specs
> **Estimator routing** (memorize this — getting it wrong silently produces nonsense):
> - **No FE** → `sp.regress("y ~ x1 + x2", df, cluster="firm_id")`
> - **High-dim FE** → `sp.feols("y ~ x1 + x2 | fe1 + fe2", df, vcov={"CRV1":"firm_id"})`
> - **Two-way cluster** → `sp.feols(..., vcov={"CRV1":"firm_id+year"})`
> - **2SLS / IV** → `sp.ivreg("y ~ (x ~ z) + controls", df, cluster=...)`
> - **DID / event-study** → `sp.callaway_santanna(...)` / `sp.sun_abraham(...)`
>
> **Never** write `sp.regress("y ~ x | firm_id")` — `sp.regress` does not parse `|` and silently treats `x | firm_id` as a single variable name. Use `sp.feols` for any formula containing `|`.
`sp.regtable(*models, ...)` is the workhorse. Useful kwargs:
```
keep : list of coef names to display (e.g. ["training"])
drop : list of coef names to suppress (controls)
model_labels : column labels ["(1) Baseline", "(2) +Demog", ...]
dep_var_labels : dep-var-row labels (for multi-outcome tables)
panel_labels : panel-A / panel-B layout for stacked tables
coef_labels : pretty-print names for coefficients
stars : "aer" → * 0.10 ** 0.05 *** 0.01 (or "default", "none")
stats : footer rows ["N","R2","Cluster","FE","DV mean", ...]
output : "latex" | "html" | "markdown" | "text"
filename : path to write the table
```
### 4.1 Pattern A — Progressive controls (the canonical Table 2)
Stable β̂ across columns ⇒ less concern that selection on observables is driving the estimate (Oster 2019 selection-stability logic; quantified in Step 7.5). **`sp.regtable(*models)` is the StatsPAI equivalent of Stata `outreg2` / `esttab` and R `modelsummary::msummary` / `summary_col` — it consolidates N models into ONE table with one column per model.**
| | (1) Baseline | (2) +Demographics | (3) +Labor-market | (4) +Region×Industry FE | (5) +Worker FE |
|---|---|---|---|---|---|
| Controls | none | age, edu | + tenure, firm_size | high-dim FE | individual FE |
```python
# RULE: pure OLS → sp.regress; high-dim FE absorption → sp.feols
# (sp.regress does NOT parse `|` as FE — it's a thin OLS wrapper. Use
# `sp.feols("y ~ x | fe1 + fe2", df, vcov={"CRV1":"firm_id"})` for FE.)
M1 = sp.regress("wage ~ training", df, cluster="firm_id")
M2 = sp.regress("wage ~ training + age + edu", df, cluster="firm_id")
M3 = sp.regress("wage ~ training + age + edu + tenure + firm_size", df, cluster="firm_id")
M4 = sp.feols ("wage ~ training + age + edu + tenure + firm_size | region + industry + year",
df, vcov={"CRV1": "firm_id"})
M5 = sp.feols ("wage ~ training + age + edu + tenure + firm_size | worker_id + year",
df, vcov={"CRV1": "firm_id"})
# Consolidate 5 models into ONE table (= Stata `outreg2 [M1..M5] using ..., replace`).
# **Default = show ALL coefficients verbatim — controls AND the intercept**
# (AER convention; readers verify the full spec). Pass NO `keep=`/`drop=` and
# `regtable` will surface every estimated parameter. Add `drop=["Intercept"]`
# only if you want to suppress the constant for paper aesthetics; add
# `keep=[focal]` only when a focal-coefficient-only table is intentional.
rt = sp.regtable(M1, M2, M3, M4, M5,
template="aer", # auto-applies SE label, star levels, font
coef_labels={"training": "Job training"},
model_labels=["(1) Baseline", "(2) +Demog.", "(3) +Labor-mkt",
"(4) Region×Ind. FE", "(5) Worker FE"],
stats=["N", "R2", "Cluster", "FE", "DV mean"],
title="Table 2. Effect of training on wages")
# Variants (all opt-in — the default above is preferred):
# • drop intercept only: sp.regtable(..., drop=["Intercept"])
# • focal-coefficient only: sp.regtable(..., keep=["training"])
# • mixed-magnitude table: sp.regtable(..., fmt="auto")
# Use whenever a single table mixes dollar-magnitude coefficients
# (e.g. earnings ≈ 1500) with elasticity-magnitude coefficients
# (e.g. log-earnings ≈ 0.09). The default fmt="%.3f" pads the dollar
# side; a fixed fmt="%.0f" rounds the elasticity side to "0" while
# significance stars survive — the silent LaLonde-style precision
# trap. fmt="auto" picks per-value precision: thousands separator
# for |β|≥1000, integer for ≥100, 1 dp for ≥10, 2 dp for ≥1, 3 dp
# below — so neither magnitude is killed.
# Export to ALL THREE in three lines — Word for co-authors, Excel for editors, LaTeX for build:
rt.to_word ("tables/table2_main.docx")
rt.to_excel("tables/table2_main.xlsx")
open("tables/table2_main.tex", "w").write(rt.to_latex())
```
### 4.2 Pattern B — Design horse race (Table 2-bis)
Show the same coefficient of interest under multiple identification strategies. This is *the* AER credibility move: convergent evidence across designs each making different identifying assumptions.
```python
ols = sp.feols ("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"}) # OLS + 2-way FE
ivr = sp.ivreg("wage ~ (training ~ Z1 + Z2) + age + edu + tenure",
df, cluster="firm_id") # 2SLS
did = sp.callaway_santanna(df, y="wage", g="first_treat_year",
t="year", i="worker_id",
x=["age","edu","tenure"]) # CS-DID (kwarg is x=)
dml = sp.dml(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], model="plr") # DML
mtch = sp.match(df, y="wage", treat="training",
covariates=["age","edu","tenure"], method="nearest") # PSM
rt = sp.regtable(ols, ivr, did, dml, mtch,
template="aer",
coef_labels={"training": "Job training (β̂)"},
model_labels=["(1) OLS+FE", "(2) 2SLS", "(3) CS-DID",
"(4) DML-PLR", "(5) PSM"],
stats=["Estimator", "Identifying assumption",
"N", "R2 / Pseudo-R2", "Cluster"],
title="Table 2-bis. Convergent evidence across designs")
rt.to_word ("tables/table2b_design_race.docx")
rt.to_excel("tables/table2b_design_race.xlsx")
open("tables/table2b_design_race.tex", "w").write(rt.to_latex())
```
### 4.3 Pattern C — Multi-outcome table (same X, several Y's)
A single treatment, several outcomes. Use `dep_var_labels` so each column carries the Y name.
```python
ys = ["wage", "log_wage", "weeks_employed", "left_firm", "promoted"]
multi_y = [sp.feols(f"{y} ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
for y in ys]
rt = sp.regtable(*multi_y,
template="aer",
dep_var_labels=ys, # column header: dep var
model_labels=["(1)","(2)","(3)","(4)","(5)"],
stats=["N","R2","DV mean","Cluster"],
title="Table 2-ter. Effect of training on multiple outcomes")
rt.to_word ("tables/table2c_multi_outcome.docx")
rt.to_excel("tables/table2c_multi_outcome.xlsx")
open("tables/table2c_multi_outcome.tex", "w").write(rt.to_latex())
```
### 4.4 Pattern D — Stacked Panel A / Panel B table
Same model family, two horizons (short-run / long-run) or two samples (pre-2015 / post-2015) stacked vertically. Use `panel_labels`.
```python
panelA = [sp.feols("wage_t1 ~ training + X | industry + year", df, vcov={"CRV1":"firm_id"}),
sp.feols("wage_t1 ~ training + X | worker_id + year", df, vcov={"CRV1":"firm_id"})]
panelB = [sp.feols("wage_t5 ~ training + X | industry + year", df, vcov={"CRV1":"firm_id"}),
sp.feols("wage_t5 ~ training + X | worker_id + year", df, vcov={"CRV1":"firm_id"})]
rt = sp.regtable(*panelA, *panelB,
template="aer",
panel_labels=["Panel A. Short-run (1 year)",
"Panel A. Short-run (1 year)",
"Panel B. Long-run (5 years)",
"Panel B. Long-run (5 years)"],
model_labels=["(1) Industry FE","(2) Worker FE"]*2,
stats=["N","R2"],
title="Table 2-quater. Short- vs long-run effects")
rt.to_word ("tables/table2d_horizons.docx")
rt.to_excel("tables/table2d_horizons.xlsx")
open("tables/table2d_horizons.tex", "w").write(rt.to_latex())
```
### 4.5 Pattern E — IV reporting triplet (first-stage / reduced-form / 2SLS)
The textbook AER IV table presents the **first stage**, the **reduced form**, and the **2SLS** in three columns so the reader can verify Wald-ratio = RF / FS.
```python
fs = sp.feols ("training ~ Z + age + edu | industry + year", df, vcov={"CRV1":"firm_id"}) # 1st stage
rf = sp.feols ("wage ~ Z + age + edu | industry + year", df, vcov={"CRV1":"firm_id"}) # reduced form
iv = sp.ivreg ("wage ~ (training ~ Z) + age + edu | industry+year",
df, cluster="firm_id") # 2SLS
rt = sp.regtable(fs, rf, iv,
template="aer",
keep=["Z", "training"], # IV triplet is intentionally focal:
# show only Z + endog so the reader can
# eyeball Wald-ratio = RF / FS. For the
# full coef list, drop the kwarg entirely.
dep_var_labels=["training", "wage", "wage"],
model_labels=["(1) First stage", "(2) Reduced form", "(3) 2SLS"],
stats=["First-stage F", "N", "R2", "Cluster"],
title="Table 2-quinto. IV reporting triplet")
rt.to_word ("tables/table2e_iv_triplet.docx")
rt.to_excel("tables/table2e_iv_triplet.xlsx")
open("tables/table2e_iv_triplet.tex", "w").write(rt.to_latex())
```
### 4.6 Pattern F — Causal-design main via `sp.causal(...)`
For DID / IV / RD / SCM mains, the `sp.causal(...)` orchestrator returns a `CausalResult` plus diagnostics and an automatic robustness preview. Pipe `.result` into `regtable`:
```python
w = sp.causal(df, y="wage", treatment="training",
id="worker_id", time="year", design="did",
covariates=["age", "edu", "tenure"],
dag=discovered.dag) # optional
print(w.diagnostics) # PT verdict + warnings
print(w.recommendation) # which estimator + why
print(w.result.summary()) # point estimate + cluster-robust SE + CI
print(w.robustness_findings) # automated robustness battery preview
```
### 4.7 Figure 3 — coefficient plot of the main table
Replace one of the wall-of-numbers tables with a coefplot in the body, push the table to the appendix. Modern AER papers increasingly do this.
```python
sp.coefplot(M1, M2, M3, M4, M5,
model_names=["(1)","(2)","(3)","(4)","(5)"],
variables=["training"],
title="Figure 3. β̂ on training across specifications (95% CI)",
alpha=0.05)\
.savefig("figures/fig3_coefplot.png", dpi=300)
```
### Reporting checklist for the Table 2 footnote (AER house style)
- Standard-error cluster level (and whether it's two-way / Conley)
- Fixed-effects absorbed — `regtable` auto-adds **one footer row per FE name** (e.g. `Industry FE: Yes / Year FE: Yes / Worker_id FE: No`) whenever any column comes from `sp.feols(... | fe1 + fe2 ...)`. Don't hand-roll these rows.
- Sample size **and number of clusters**
- Estimator (OLS / 2SLS / CS-DID / SCM / DML)
- Stars convention `* 0.10 ** 0.05 *** 0.01`
- Mean of dependent variable in the estimation sample (so β̂ can be read as a % of the base rate)
## Step 5 — Heterogeneity (Table 3 + Figure 4)
The AER §5 *Heterogeneity* combines (a) a **subgroup regression table** with one column per subgroup (binary moderators + interaction terms), and (b) a **CATE / dose-response figure** for continuous moderators. Both should appear; they answer different questions.
### 5.1 Pattern G — Subgroup `regtable` (Table 3)
One column per subgroup, with the same specification re-run on each slice. Clean, easy to read, expected by referees.
```python
slices = {
"(1) All": df,
"(2) Female": df[df["female"] == 1],
"(3) Male": df[df["female"] == 0],
"(4) Low skill": df[df["skill_quartile"].isin([1, 2])],
"(5) High skill": df[df["skill_quartile"].isin([3, 4])],
"(6) Small firm": df[df["firm_size"] < 100],
"(7) Large firm": df[df["firm_size"] >= 100],
}
gmodels = [sp.feols("wage ~ training + age + edu + tenure | industry + year",
d, vcov={"CRV1": "firm_id"}) for d in slices.values()]
rt = sp.regtable(*gmodels,
template="aer",
coef_labels={"training": "Training"},
model_labels=list(slices),
stats=["N","R2","DV mean"],
title="Table 3. Heterogeneous effects of training")
rt.to_word ("tables/table3_heterogeneity.docx")
rt.to_excel("tables/table3_heterogeneity.xlsx")
open("tables/table3_heterogeneity.tex", "w").write(rt.to_latex())
```
### 5.2 Interaction-form heterogeneity (alternative Table 3)
Test moderation formally with interaction terms — referees often ask whether the gap between subgroups is statistically significant, which requires the interaction p-value.
```python
H1 = sp.feols("wage ~ training*female + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
H2 = sp.feols("wage ~ training*C(skill_quartile) + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
H3 = sp.feols("wage ~ training*log_firm_size + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
rt = sp.regtable(H1, H2, H3,
template="aer",
keep=["training", "training:female", # interaction-form heterogeneity
"training:C(skill_quartile)[T.2]", # is intentionally focal:
"training:C(skill_quartile)[T.3]", # only the main effect + interactions
"training:C(skill_quartile)[T.4]", # are reported. Drop this kwarg
"training:log_firm_size"], # entirely to show full controls.
model_labels=["(1) ×Female", "(2) ×Skill quartile", "(3) ×log(Firm size)"],
stats=["N","R2"],
title="Table 3-bis. Interaction-form heterogeneity")
rt.to_word ("tables/table3b_interactions.docx")
rt.to_excel("tables/table3b_interactions.xlsx")
open("tables/table3b_interactions.tex", "w").write(rt.to_latex())
```
### 5.3 Figure 4 — dose-response (continuous treatment)
```python
dr = sp.dose_response(df, y="wage", treat="training_hours",
covariates=["age","edu","tenure","firm_size"],
n_dose_points=20)
dr.plot(title="Figure 4a. Dose-response: training hours → wage")\
.savefig("figures/fig4a_dose_response.png", dpi=300)
# DID-flavored continuous treatment (de Chaisemartin–D'Haultfœuille):
sp.continuous_did(df, y="wage", dose="training_hours",
time="year", id="worker_id").plot()\
.savefig("figures/fig4a2_continuous_did.png", dpi=300)
```
### 5.4 Figure 4-bis — CATE distribution (DR-Learner / causal forest)
The CATE plotters need a result that exposes per-row conditional effects.
`sp.causal_forest` returns a *summary* result without `.cate_estimates`, so
for the CATE histogram and grouped bar chart use a meta-learner (or any
DR-/X-/R-learner) and pass its CATE table to `cate_group_plot`.
```python
ml = sp.metalearner(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], learner="dr")
sp.cate_plot(ml, kind="hist",
title="Figure 4b. Distribution of conditional ATE")\
.savefig("figures/fig4b_cate_hist.png", dpi=300)
# CATE by group bar chart: first compute the group-level table, THEN plot it.
# `cate_group_plot` takes a DataFrame, not the result object.
g = sp.cate_by_group(ml, df, by="skill_quartile", n_groups=4)
sp.cate_group_plot(g, title="Figure 4c. CATE by skill quartile")\
.savefig("figures/fig4c_cate_by_group.png", dpi=300)
# Tabular summary for the appendix
print(sp.cate_summary(ml))
print(g) # group-level CATE table
```
### 5.5 Subgroup-analysis dispatcher (one-liner)
```python
sp.subgroup_analysis(df, formula="wage ~ training + age + edu + tenure",
x="training",
by={"gender": "female", "skill": "skill_quartile"},
robust="hc1") # quick subgroup β̂ table (HC1 by default; no cluster arg)
```
For continuous moderators or many subgroups, prefer:
- `sp.continuous_did(...)` — dose-response under DID
- `sp.metalearner(..., learner="dr")` + `sp.cate_plot` / `sp.cate_by_group` — DR-Learner CATE (recommended for plotting)
- `sp.causal_forest(formula="wage ~ training | X", data=df)` — CATE summary only (no per-row `.cate_estimates`)
## Step 6 — Mechanisms / channels
```python
sp.mediation(df, y="wage", d="training", m="hours_worked",
X=["age", "edu", "tenure"]) # ACME / ADE / total effect
sp.decompose(...) # Oaxaca-Blinder / RIF / FFL / KOB
```
## Step 7 — Robustness gauntlet (the AER referee gauntlet)
The seven canonical robustness blocks of an applied paper. A modern AER paper expects most of these in the body or appendix — assemble a Table A1-style robustness panel from the outputs.
### 7.1 Placebo tests
```python
sp.rdplacebo(df, y="y", x="running_var", c=0,
placebo_cutoffs=[-2, -1, 1, 2]) # RD: fake cutoffs
sp.synth_time_placebo(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000,
n_placebo_times=10) # SCM in-time placebo
sp.synthdid_placebo(...) # SDID placebo
# For DID: re-run with a fake treat year before actual treatment and confirm β̂ ≈ 0.
```
### 7.2 Alternative samples
```python
result_no_outliers = sp.causal(df.query("wage < wage.quantile(0.99)"), ...)
result_drop_early = sp.causal(df.query("first_treat_year > 2008"), ...)
result_balanced = sp.causal(sp.balance_panel(df, entity="worker_id", time="year"), ...)
```
### 7.3 Alternative specifications (spec curve)
```python
sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age", "edu"], ["age", "edu", "tenure"]],
subsets={"all": None, "manuf": df["industry"].eq("manufacturing")})
```
### 7.4 Alternative standard errors
Cluster-level choice is itself a robustness check — show the result is not driven by an over-narrow cluster.
```python
# For statsmodels-backed sp.regress / sp.ivreg results:
sp.twoway_cluster(M3, df, cluster1="firm_id", cluster2="year") # two-way clustering
sp.conley(M3, df, lat="lat", lon="lon",
dist_cutoff=100, kernel="uniform") # spatial HAC (Conley 1999)
# For pyfixest-backed sp.feols results, set 2-way cluster directly in `vcov`:
sp.feols("y ~ x | firm_id + year", df,
vcov={"CRV1": "firm_id+year"}) # 2-way: firm × year
```
### 7.5 Oster (2019) selection bound
"How big would unobserved selection have to be for β to flip sign / vanish?" The Oster δ tells you whether the bound on selection on unobservables, relative to selection on observables, has to exceed an implausible value to overturn the result.
```python
sp.oster_bounds(data=df, y="wage", treat="training",
controls=["age", "edu", "tenure"],
r_max=1.3) # β* assuming δ=1, R̃²=1.3·R²
# `oster_delta` uses x_base / x_controls (NOT treat= / controls=):
sp.oster_delta(data=df, y="wage",
x_base=["training"], # treatment(s) of interest
x_controls=["age", "edu", "tenure"], # observed controls
r_max=1.3) # δ for which β=0
```
### 7.6 Honest DID — Rambachan–Roth (2023) PT sensitivity
`honest_did` only consumes a CS / SA / `did_multiplegt` event-study result
(or `aggte(result, type='dynamic')`). Pass the `cs` object built in §3.1,
not a generic OLS/FE main-table result:
```python
sp.honest_did(cs, method="smoothness") # bound β under bounded PT violation
```
### 7.7 E-value & unified sensitivity (unmeasured confounding)
```python
sp.evalue(estimate=result.params["training"], # E-value takes point + CI, NOT result
ci=tuple(result.conf_int().loc["training"]),
measure="RR")
sp.unified_sensitivity(result, r2_treated=0.05,
r2_controlled=0.10,
include_oster=True) # Cinelli-Hazlett + Oster combined
sp.sensitivity_dashboard(result) # one-page sensitivity figure
```
### 7.8 RD-specific bandwidth / kernel sensitivity
```python
sp.rdbwsensitivity(df, y="y", x="running_var", c=0,
bw_grid=[0.5, 1.0, 1.5, 2.0]) # is β̂ stable across bandwidths?
```
### 7.9 TWFE diagnostic (staggered DID)
Goodman-Bacon decomposition flags when the TWFE estimate is contaminated by forbidden 2×2's (already-treated as control).
```python
sp.bacon_decomposition(df, y="y", treat="training",
time="year", id="worker_id")
```
### 7.10 Sequential confounder blocks (Oster-style robustness table)
```python
blocks = {
"M1 base": [],
"M2 +demographics": ["age", "edu"],
"M3 +labor-market": ["age", "edu", "tenure", "firm_size"],
"M4 +psychosocial": ["age", "edu", "tenure", "firm_size", "motivation"],
}
models = [sp.regress(f"wage ~ training + {' + '.join(c) or '1'}",
df, cluster="firm_id")
for c in blocks.values()]
rt = sp.regtable(*models,
template="aer",
model_labels=list(blocks),
title="Table 7. Selection-stability across confounder blocks")
rt.to_word ("tables/table_robust_blocks.docx")
rt.to_excel("tables/table_robust_blocks.xlsx")
open("tables/table_robust_blocks.tex", "w").write(rt.to_latex())
```
### 7.11 Pattern H — Robustness master table (Table A1, one row per check)
The canonical AER appendix Table A1 stacks every robustness specification next to the baseline so reviewers see at a glance that β̂ survives. `sp.regtable` accepts any mix of `EconometricResults` / `CausalResult`, so build the list dynamically:
```python
baseline = sp.feols("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"})
rob = {
"(1) Baseline": baseline,
"(2) Drop top 1% wage": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df.query("wage < wage.quantile(0.99)"),
vcov={"CRV1": "firm_id"}),
"(3) Balanced panel": sp.feols("wage ~ training + age + edu + tenure | industry + year",
sp.balance_panel(df, entity="worker_id", time="year"),
vcov={"CRV1": "firm_id"}),
"(4) Drop early cohorts": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df.query("first_treat_year > 2008"),
vcov={"CRV1": "firm_id"}),
"(5) Worker FE": sp.feols("wage ~ training + age + edu + tenure | worker_id + year",
df, vcov={"CRV1": "firm_id"}),
"(6) 2-way cluster": sp.feols("wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id+year"}), # 2-way: firm × year
"(7) Conley spatial SE": sp.conley(baseline, df,
lat="lat", lon="lon", dist_cutoff=100),
"(8) Log outcome": sp.feols("log_wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"}),
"(9) IHS outcome": sp.feols("ihs_wage ~ training + age + edu + tenure | industry + year",
df, vcov={"CRV1": "firm_id"}),
"(10) PSM-weighted": sp.match(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"],
method="nearest"),
"(11) Entropy balance": sp.ebalance(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"]),
"(12) DML-PLR": sp.dml(df, y="wage", treat="training",
covariates=["age","edu","tenure","firm_size"], model="plr"),
}
# Robustness master = AER Table A1 — readers MUST see every coefficient
# across every spec to verify nothing is hiding behind `keep=`. Default to
# the full coef table (intercept included); only switch to
# `keep=["training"]` if a referee has explicitly asked for a focal-only
# summary, or add `drop=["Intercept"]` if you want the constant suppressed.
rt = sp.regtable(*rob.values(),
template="aer",
coef_labels={"training": "Training (β̂)"},
model_labels=list(rob),
stats=["N", "R2", "Cluster", "FE"],
title="Table A1. Robustness of the main estimate")
rt.to_word ("tables/tableA1_robustness.docx")
rt.to_excel("tables/tableA1_robustness.xlsx")
open("tables/tableA1_robustness.tex", "w").write(rt.to_latex())
# Equivalent one-shot via the paper-format multi-panel API — produces a
# single .docx / .xlsx that you can hand a co-author, with main + robustness
# (+ heterogeneity / placebo if you have them) auto-laid-out per AER style:
sp.paper_tables(main=[M1, M2, M3, M4, M5],
robustness=list(rob.values()),
template="aer",
coef_labels={"training": "Training"},
model_labels_main=["(1)","(2)","(3)","(4)","(5)"],
model_labels_robustness=list(rob),
# paper_tables only accepts `keep=`, not `drop=`. Omit both to
# show every coefficient (AER convention). Pass `keep=["training"]`
# only when a focal-only summary is desired.
).to_docx("tables/paper_tables.docx")
```
### 7.12 Figure 5 — coefficient forest plot of all robustness specs
A single visual summary that an AER referee can parse in 5 seconds: every β̂ and 95% CI on one axis. Confirms the estimate is not knife-edge.
```python
sp.coefplot(*rob.values(),
model_names=list(rob),
variables=["training"],
title="Figure 5. β̂ on training across robustness specifications",
alpha=0.05)\
.savefig("figures/fig5_robustness_forest.png", dpi=300)
```
### 7.13 Figure 5-bis — spec curve
The Simonsohn et al. (2020) specification curve plots β̂ across **every combination** of {controls × subsamples × outcome transforms × SE types}. Useful when you want to head off "what about specification X?" referee letters.
```python
sc = sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age","edu"], ["age","edu","tenure"],
["age","edu","tenure","firm_size"]],
se_types=["robust", "cluster_firm_id", "cluster_firm_id_year"],
y_transforms=["identity", "log", "ihs"],
subsets={"all": None,
"manuf": df["industry"].eq("manufacturing"),
"no99": df["wage"] < df["wage"].quantile(0.99)},
cluster_var="firm_id")
sc.plot(title="Figure 5-bis. Specification curve")\
.savefig("figures/fig5b_spec_curve.png", dpi=300)
```
### 7.14 Figure 6 — sensitivity dashboard
One-page Cinelli–Hazlett + Oster + E-value summary for the §7 closing argument.
```python
sens = sp.unified_sensitivity(baseline,
r2_treated=0.05, r2_controlled=0.10,
include_oster=True)
sp.sensitivity_plot(sens.results,
original_estimate=baseline.params["training"],
original_ci=tuple(baseline.conf_int().loc["training"]),
title="Figure 6. Sensitivity to unobserved confounding")\
.savefig("figures/fig6_sensitivity.png", dpi=300)
sp.sensitivity_dashboard(baseline)\
.savefig("figures/fig6b_sensitivity_dashboard.png", dpi=300)
```
### 7.15 One-stop robustness reporter
```python
sp.diagnose_result(result) # PT / weak-IV / overlap / leverage verdict
sp.robustness_report(df, formula="wage ~ training + age + edu",
x="training", cluster_var="firm_id")
sp.estat(result, test="all") # Stata-style postestimation battery
```
## Step 8 — Replication package
The agent's job at §8 is to produce a **single artifact a co-author can open in Word, Excel, or LaTeX without further StatsPAI calls**. There are three packaging tiers, picked by what you need to ship:
### 8.1 Per-result export (one estimator → one Word/Excel file)
```python
result.to_docx("tables/main_result.docx",
title="Table 2. Main result") # CausalResult → .docx
result.to_latex(caption="Main result", label="tab:main")
result.plot().savefig("figures/main.png", dpi=300) # publication-quality figure
print(sp.cite(result, "training")) # → "1.239*** (0.153)" ← inline citation
```
### 8.2 Per-table export (already covered in Steps 4 / 5 / 7)
Every `sp.regtable(*models)` returns a `RegtableResult` with `.to_word()` / `.to_excel()` / `.to_latex()` / `.to_markdown()` / `.to_html()`. Use these in §4–§7 so that by the time you reach §8 the `tables/` folder already has parallel `.docx` / `.xlsx` / `.tex` for every numbered table.
### 8.3 Multi-panel paper-format (Tier 2 — Tables 2 + 3 + A1 + A2 in one file)
```python
sp.paper_tables(
main = [M1, M2, M3, M4, M5], # → "Table 2. Main results"
heterogeneity = [g_full, g_fem, g_male], # → "Table 3. Heterogeneity"
robustness = list(rob.values()), # → "Table A1. Robustness"
placebo = [pb1, pb2], # → "Table A2. Placebo tests"
template = "aer",
coef_labels = {"training": "Training"},
keep = ["training"],
).to_docx("replication/paper_tables.docx") # → 4 panels in one .docx
# .to_xlsx(...) writes one sheet per panel; .to_latex(...) one .tex with section breaks.
```
### 8.4 Full session bundle (Tier 3 — the Stata `collect` equivalent)
The single most efficient §8 deliverable: descriptives + balance + main + heterogeneity + robustness + prose **in one Word file**. `sp.collect()` is the agent-native counterpart of Stata 15's `collect` and R's `gtsave`.
```python
c = sp.collect("Effect of Training on Wages — Replication", template="aer")
c.add_heading("§1. Descriptive statistics", level=1)
c.add_summary(df, vars=["wage","age","edu","tenure"],
stats=["mean","sd","n"],
title="Table 1. Summary statistics")
c.add_balance(df, treatment="training",
variables=["age","edu","tenure","firm_size"],
title="Table 1b. Balance by treatment")
c.add_heading("§4. Main results", level=1)
c.add_regression(M1, M2, M3, M4, M5,
model_labels=["(1)","(2)","(3)","(4)","(5)"],
stats=["N","R2","Cluster","FE"],
title="Table 2. Effect of training on wages")
c.add_heading("§5. Heterogeneity", level=1)
c.add_regression(*gmodels,
model_labels=list(slices),
title="Table 3. Heterogeneous effects")
c.add_heading("§7. Robustness", level=1)
c.add_regression(*rob.values(),
model_labels=list(rob),
title="Table A1. Robustness")
c.add_text(
"Standard errors clustered at the firm level. *** p<0.01, ** p<0.05, * p<0.10. "
"Sample restrictions and full variable definitions are documented in "
"artifacts/sample_construction.json and artifacts/data_contract.json.",
title="Notes",
)
# One artifact, three formats — auto-detected from the path extension:
c.save("replication/paper.docx") # editable Word, page-break between tables
c.save("replication/paper.xlsx") # one sheet per add_*() item
c.save("replication/paper.tex") # multi-section LaTeX
c.save("replication/paper.md") # GitHub-flavoured Markdown for the README
```
Inspect the bundle before saving:
```python
print(c) # → <Collection items=8 kinds=[heading, summary, balance, ...]>
print(c.list()) # DataFrame with name / kind / title for every item
```
### 8.5 Reproducibility stamp
```python
import json
json.dump({
"statspai": sp.__version__,
"seed": 42,
"n_obs": result.data_info["n_obs"],
"estimand": result.estimand,
"estimate": float(result.params["training"]),
"ci95": list(result.conf_int().loc["training"]),
"pre_registration": "artifacts/empirical_strategy.md",
"data_contract": "artifacts/data_contract.json",
"sample_log": "artifacts/sample_construction.json",
"paper_bundle": "replication/paper.docx",
}, open("artifacts/result.json", "w"), indent=2)
```
For full-draft generation (abstract + methods + results + bibliography), see `sp.paper(result, ...)` — out of scope for this skill; call it only when the user explicitly asks for a paper draft.
---
## Regtable cookbook (one-page recipe index)
`sp.regtable(*models, ...)` is the single primitive behind every multi-regression table in an AER paper. The eight patterns above map to:
| Pattern | What varies across columns | Step |
|---|---|---|
| **A. Progressive controls** | covariate set / FE depth | 4.1 — Table 2 |
| **B. Design horse race** | identification strategy (OLS / 2SLS / DID / DML / PSM) | 4.2 — Table 2-bis |
| **C. Multi-outcome** | dependent variable Y | 4.3 — Table 2-ter |
| **D. Stacked Panel A / B** | horizon / sample (panel rows × spec columns) | 4.4 — Table 2-quater |
| **E. IV reporting triplet** | first stage / reduced form / 2SLS | 4.5 — Table 2-quinto |
| **F. `sp.causal(...)` orchestrator** | 1 column, full diagnostics | 4.6 |
| **G. Subgroup table** | subsample (full / female / male / Q1…Q4) | 5.1 — Table 3 |
| **H. Robustness master** | every robustness check stacked | 7.11 — Table A1 |
Default `sp.regtable` settings for AER house style — and the export pipeline
(produce `.docx` + `.xlsx` + `.tex` from the same `RegtableResult`):
```python
rt = sp.regtable(*models,
template="aer", # journal preset: aer/qje/econometrica/restat/jf/aeja/jpe/restud
# AER convention: pass NEITHER `keep=` NOR `drop=` —
# `regtable` will then surface every estimated parameter
# (controls AND the intercept). Add `drop=["Intercept"]`
# only if you want the constant suppressed; add
# `keep=[focal]` only for an intentional focal-only table.
coef_labels={"training": "Training"},
model_labels=[...], # column labels
stats=["N", "R2", "Cluster", "FE", "DV mean"],
title="Table N. ...")
# One-call exports — never hand-roll Word/Excel from pandas:
rt.to_word ("tables/tableN.docx") # editable Word, AER book-tab borders
rt.to_excel("tables/tableN.xlsx") # editable Excel, one sheet
open("tables/tableN.tex", "w").write(rt.to_latex()) # LaTeX for the build
print(rt.to_text()) # quick terminal preview
```
For pyfixest-style native output, `sp.etable(*models, ...)` is the alternative; for stacking many tables in one `.docx`, use `sp.paper_tables(...)` (Tier 2) or `sp.collect()` (Tier 3) — see Step 8.
## Figure factory (the 12 standard AER figures)
| # | Figure | StatsPAI call | Section |
|---|---|---|---|
| 1a | Raw trends (DID Figure 1) | `sp.parallel_trends_plot(df, y, time, treat, treat_time, ci=True)` | §1 |
| 1b | Treatment rollout heatmap | `sp.treatment_rollout_plot(df, time, treat, id)` | §1 |
| 2a | Event-study coefficients | `sp.enhanced_event_study_plot(sp.event_study(...))` | §3 |
| 2a' | Bacon weights | `sp.bacon_plot(sp.bacon_decomposition(...))` | §3 |
| 2a'' | CS-DID dynamic effects | `sp.did_summary_plot(sp.callaway_santanna(...))` | §3 |
| 2b | RD canonical plot | `sp.rdplot(df, y, x, c)` | §3 |
| 2b' | McCrary density | `sp.rddensity(df, x, c).plot()` | §3 |
| 2c | Matching love plot | `sp.match(...).plot()` | §3 |
| 2d | SCM trajectory | `sp.synth(...).plot()` · `sp.synthdid_plot(sp.sdid(...))` | §3 |
| 3 | Coefficient plot of main specs | `sp.coefplot(M1...M5, variables=["x"])` | §4 |
| 4a | Dose-response | `sp.dose_response(...).plot()` | §5 |
| 4b | CATE histogram | `sp.cate_plot(ml, kind="hist")` *(ml = `sp.metalearner(..., learner='dr')`)* | §5 |
| 4c | CATE by group bar | `g = sp.cate_by_group(ml, df, by=..., n_groups=4); sp.cate_group_plot(g)` | §5 |
| 5 | Robustness forest plot | `sp.coefplot(*rob.values(), variables=["x"])` | §7 |
| 5b | Specification curve | `sp.spec_curve(...).plot()` | §7 |
| 6 | Sensitivity dashboard | `sp.sensitivity_dashboard(result)` · `sp.sensitivity_plot(...)` | §7 |
| 7 | Final result.plot() | `result.plot()` (estimator-specific) | §8 |
> Every plotting function above accepts `ax=` so panels can be combined with matplotlib subplots, and returns a Figure that supports `.savefig(path, dpi=300)` for publication output.
---
## §A. Epidemiology / public health pipeline (Mode A)
> **Convention**: STROBE (observational) / TRIPOD-AI (prediction) reporting. The modern epi gold standard is **target-trial emulation** (Hernán & Robins) — write the protocol of the hypothetical RCT first, then emulate it with observational data using a doubly-robust estimator. Outcomes are commonly **risk differences, risk ratios, hazard ratios, or restricted mean survival time**, not just OLS coefficients. The skill mirrors the AER 8-section flow but swaps the Step-4 estimator stack and adds survival/MR-specific reporting rows.
Running example: `statin_initiation → 5-yr_MACE` in an EHR cohort (`patient_id / index_date / age / sex / ldl_baseline / comorbidity_index / followup_days / event`). The exposure is time-varying, confounders are time-varying, and competing-risk censoring matters — the canonical setting where naïve OLS / Cox-with-baseline-adjustment is biased.
### A.0 Cohort construction & target-trial protocol
```python
import statspai as sp
# Eligibility, treatment-strategy, time-zero, follow-up, outcome — written down BEFORE estimation
protocol = sp.target_trial.TargetTrialProtocol(
eligibility = "adults 40-75, LDL ≥ 130, no prior MI/stroke, no statin in 12mo washout",
treatment_strategies = ["initiate statin within 30d of index", "no statin within 30d"],
assignment = "observational; emulate randomization via IPTW + g-formula",
time_zero = "index_date (first eligible cardiology visit)",
followup_end = "first MACE / death / disenrollment / index_date + 5yr",
outcome = "first MACE (composite: MI, stroke, cardiovascular death)",
causal_contrast = "per-protocol risk difference at 5 years",
analysis_plan = "IPTW-MSM + g-formula + TMLE triplet; report all three with CIs",
baseline_covariates = ["age","sex","ldl_baseline","comorbidity_index","smoker"],
time_varying_covariates = ["ldl_current"],
)
cohort = sp.target_trial_emulate(df, protocol=protocol, id="patient_id", time="followup_days",
treat="statin_initiation", event="mace")
```
### A.1 Table 1 — baseline characteristics by exposure
```python
# Same sumstats stack as AER mode; binary 0/1 by= auto-renders Control/Treated.
mc = sp.mean_comparison(cohort, ["age","sex","ldl_baseline","comorbidity_index","smoker"],
group="statin_initiation", test="ttest",
title="Table 1. Baseline characteristics by statin initiation")
mc.to_word ("tables/table1_epi.docx")
mc.to_excel("tables/table1_epi.xlsx")
```
### A.2 Identification — DAG, propensity overlap, KM curves
```python
# 2.1 DAG (manual or LLM-assisted)
dag = sp.dag(["age","sex","ldl_baseline","comorbidity_index","statin_initiation","mace"])
dag.add_edges([("age","ldl_baseline"),("age","statin_initiation"),
("ldl_baseline","statin_initiation"),("statin_initiation","mace"),
("ldl_baseline","mace"),("comorbidity_index","statin_initiation"),
("comorbidity_index","mace")])
adj = dag.adjustment_set(treatment="statin_initiation", outcome="mace") # back-door set
# 2.2 Propensity-score overlap (positivity check; epi convention before any IPW)
# Returns a pd.Series of fitted PS — draw mirrored histograms by exposure.
ps = sp.propensity_score(cohort, treatment="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
method="logit")
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6,4))
ax.hist(ps[cohort["statin_initiation"]==1], bins=40, alpha=0.5, label="Treated")
ax.hist(ps[cohort["statin_initiation"]==0], bins=40, alpha=0.5, label="Control")
ax.set_xlabel("Estimated propensity score"); ax.legend()
fig.savefig("figures/figA1_ps_overlap.png", dpi=300)
# 2.3 Crude KM curves by exposure (descriptive identification graphic)
km = sp.kaplan_meier(cohort, duration="followup_days", event="mace", group="statin_initiation")
km.plot().savefig("figures/figA2_km.png", dpi=300)
```
### A.3 Main estimate — IPTW · g-formula · TMLE triplet (the modern epi standard)
Report **all three** in one `regtable` so the reader sees convergent doubly-robust evidence — this is the epi equivalent of the AER design horse race:
```python
# (1) IPTW marginal structural model
iptw = sp.msm(cohort, y="mace", treat="statin_initiation",
id="patient_id", time="month",
time_varying=["ldl_current","comorbidity_index"],
baseline=["age","sex"])
# (2) Parametric g-formula (g-computation)
gcomp = sp.gformula(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
time_varying=["ldl_current"],
intervention="always_treat", reference="never_treat")
# (3) TMLE — doubly robust, the modern gold standard.
# Pass an sklearn-style library list for nuisance learners; statspai stacks them
# internally via SuperLearner. Keep `outcome_library` and `propensity_library`
# explicit so the reviewer can see your nuisance choices.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
sl_lib = [LogisticRegression(max_iter=1000),
GradientBoostingClassifier(),
RandomForestClassifier()]
tmle = sp.tmle(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
outcome_library=sl_lib, propensity_library=sl_lib)
# (3-bis) HAL-TMLE if you want a fully nonparametric variant
hal = sp.hal_tmle(cohort, y="mace", treat="statin_initiation",
covariates=["age","sex","ldl_baseline","comorbidity_index","smoker"],
variant="ate")
# Convergent-evidence table — risk difference at 5 years
rt = sp.regtable(iptw, gcomp, tmle, hal,
model_labels=["(1) IPTW-MSM","(2) g-formula","(3) TMLE","(4) HAL-TMLE"],
stats=["N","Effect type","Risk diff. (RD)","Risk ratio (RR)"],
title="Table 2. Effect of statin initiation on 5-yr MACE — convergent estimators")
rt.to_word ("tables/table2_epi.docx"); rt.to_excel("tables/table2_epi.xlsx")
```
### A.4 Survival outcomes — KM / AFT / restricted mean
```python
# Restricted mean survival time (RMST) at 5 yr — preferred over hazard ratio when PH fails
aft = sp.aft("Surv(followup_days, mace) ~ statin_initiation + age + sex + ldl_baseline",
cohort, family="weibull")
rt_surv = sp.regtable(aft,
stats=["N","Events","Median survival","RMST (5yr)","HR (PH)"],
title="Table 3. Survival analysis (Weibull AFT)")
rt_surv.to_word("tables/table3_survival.docx")
```
### A.5 Mendelian randomization (genetic IV — when relevant)
```python
# Standard MR triple: IVW → Egger → weighted median, on summary statistics
ivw = sp.mr_ivw (beta_exposure, beta_outcome, se_exposure, se_outcome)
egger = sp.mr_egger (beta_exposure, beta_outcome, se_exposure, se_outcome) # tests pleiotropy
median = sp.mr_median(beta_exposure, beta_outcome, se_exposure, se_outcome, penalized=True)
rt_mr = sp.regtable(ivw, egger, median,
model_labels=["IVW","MR-Egger","Weighted median"],
title="Table 4. Mendelian randomization — sensitivity stack")
rt_mr.to_word("tables/table4_mr.docx")
```
### A.6 Robustness — E-value, bounds, principal stratification
```python
# E-value: minimum strength of unmeasured confounding to explain away the result
ev = sp.evalue(estimate=tmle.point_estimate, ci=tmle.ci, measure="RR")
# → "E-value 1.84; CI E-value 1.42" (a confounder must be ~2x associated with both
# exposure and outcome to nullify the effect — interpret in your domain)
# Manski / Lee bounds when a covariate is missing-not-at-random
bds = sp.bounds(cohort, y="mace", treat="statin_initiation", method="manski")
# Principal stratification (e.g. always-takers / never-takers / compliers)
ps_strat = sp.principal_strat(cohort, y="mace", treat="statin_initiation",
instrument="zip_pharmacy_density",
strata="compliance_type")
```
### A.7 Reporting checklist (epi-specific footer for `notes=`)
When producing the Table-2 footer, include — in addition to the AER stars/SE language:
- Cohort size, person-years of follow-up, event count
- **Adjustment set** (variables in the back-door set, not just "controls")
- **Positivity diagnostic** (PS truncation rule, % of cohort with extreme weights)
- **E-value** for the main effect and its CI bound
- For survival: **proportional-hazards check** (Schoenfeld residuals p-value) or "PH violated, RMST reported instead"
- STROBE checklist completion (cite as a supplementary file)
> **Output path stays identical**: every estimator above returns a `CausalResult` and slots straight into `sp.regtable(...) / sp.collect(...) / sp.paper_tables(...)`. Doubly-robust estimators (TMLE, HAL-TMLE, AIPW) are preferred over single-robust IPTW or g-formula alone — report all three for transparency, but treat TMLE as the primary.
---
## §B. ML causal inference pipeline (Mode B)
> **Convention**: estimand-first, doubly-robust, ML-nuisance-learned, with **CATE distribution + policy value** as first-class outputs (not just a single ATE). The skill mirrors the AER skeleton but the Step-4 estimator stack is **DML + meta-learners + causal forest + neural-causal + BCF**, and Step-5 always reports a CATE distribution. Uncertainty is quantified by **conformal prediction** (`sp.conformal_causal`), not just normal-approximation SE.
Running example: a marketing uplift study — `treatment = personalized_offer`, `outcome = revenue_30d`, with 80+ covariates including text features (`prior_browsing_text`).
### B.0 Prep + nuisance super-learner
```python
import statspai as sp
# 0.1 Train/holdout split — DML uses cross-fitting internally, but holdout is for policy eval.
# statspai doesn't expose its own splitter; use sklearn directly.
from sklearn.model_selection import train_test_split
train, holdout = train_test_split(df, test_size=0.2, stratify=df["treatment"], random_state=42)
# 0.2 SuperLearner library for nuisance — stacks GBM / RF / Lasso (pass sklearn estimators).
from sklearn.linear_model import LogisticRegression, LassoCV
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier, RandomForestRegressor, RandomForestClassifier
sl_outcome = sp.super_learner(X=train[X_cols].values, y=train["revenue_30d"].values,
library=[LassoCV(), GradientBoostingRegressor(), RandomForestRegressor()],
n_folds=5, task="regression")
sl_treat = sp.super_learner(X=train[X_cols].values, y=train["treatment"].values,
library=[LogisticRegression(max_iter=1000),
GradientBoostingClassifier(), RandomForestClassifier()],
n_folds=5, task="binary")
```
### B.1 Estimand & DAG learning (Step 2 + 2.5 in ML key)
```python
q = sp.causal_question(treatment="treatment", outcome="revenue_30d",
population="marketed users", estimand="ate")
plan = q.identify(strategy="ignorability_under_X", X=X_cols)
# DAG learning (when domain DAG isn't given)
proposed = sp.llm_dag_propose(variables=X_cols + ["treatment","revenue_30d"],
domain="e-commerce uplift")
constrained = sp.pc_algorithm(train[X_cols + ["treatment","revenue_30d"]],
variables=X_cols + ["treatment","revenue_30d"], alpha=0.05)
validated = sp.llm_dag_validate(dag=proposed, data=train, alpha=0.05)
# Alternative learners: sp.notears(...), sp.causal_discovery(..., method="ges")
```
### B.2 Estimator stack — DML / meta-learner / GRF / neural / Bayesian
```python
# (1) DML — Chernozhukov double machine learning
dml = sp.dml(train, y="revenue_30d", d="treatment", X=X_cols,
model="plr", # plr / irm / iv / pliv
ml_g=sl_outcome, ml_m=sl_treat, n_folds=5)
# (2) Meta-learners — S / T / X / R / DR
ml_dr = sp.metalearner(train, y="revenue_30d", treat="treatment", covariates=X_cols,
learner="dr", # 's' / 't' / 'x' / 'r' / 'dr'
outcome_model="xgb", propensity_model="xgb")
# (3) Causal forest (GRF / honest splits)
cf = sp.causal_forest("revenue_30d ~ treatment | " + " + ".join(X_cols),
train, n_estimators=4000, honest=True)
# (4) Neural causal — Dragonnet / TARNet / CEVAE
dn = sp.dragonnet(train, y="revenue_30d", treat="treatment", covariates=X_cols,
repr_layers=(200,100), head_layers=(100,))
tar = sp.tarnet (train, y="revenue_30d", treat="treatment", covariates=X_cols)
# (5) Bayesian causal forest (full posterior over CATE)
bcf = sp.bcf(train, y="revenue_30d", treat="treatment", covariates=X_cols,
n_trees_mu=200, n_trees_tau=50)
# (6) Panel matrix completion (when units × periods)
mc = sp.matrix_completion(panel_df, y="revenue", d="treatment", unit="user_id", time="week")
# Convergent evidence table — same regtable / collect stack
rt = sp.regtable(dml, ml_dr, cf, dn, bcf,
model_labels=["(1) DML-PLR","(2) DR-Learner","(3) Causal forest",
"(4) Dragonnet","(5) BCF"],
stats=["N","ATE","CATE 5–95% range","Cross-fit folds","Nuisance R²"],
title="Table 2. ATE — ML estimator horse race")
rt.to_word ("tables/table2_ml.docx"); rt.to_excel("tables/table2_ml.xlsx")
```
### B.3 CATE distribution & subgroup view (the ML-causal headline)
```python
# 3.1 Per-row CATE — DR-learner / X-learner expose .cate_estimates directly
sp.cate_plot(ml_dr, kind="hist",
title="Figure B1. CATE distribution — DR-Learner") \
.savefig("figures/figB1_cate_dist.png", dpi=300)
# 3.2 CATE by group (skill quartiles, gender, channel, …)
g = sp.cate_by_group(ml_dr, train, by="customer_value_quartile", n_groups=4)
sp.cate_group_plot(g, title="Figure B2. CATE by customer-value quartile") \
.savefig("figures/figB2_cate_group.png", dpi=300)
# 3.3 Causal-forest local effect surface
cf.local_effects().plot(...).savefig("figures/figB3_local.png", dpi=300)
```
### B.4 Policy learning + off-policy evaluation
```python
# 4.1 Learn an interpretable policy tree from CATE estimates
pol_tree = sp.policy_tree(train, y="revenue_30d", d="treatment", X=X_cols, max_depth=3)
pol_tree.plot().savefig("figures/figB4_policy.png", dpi=300)
# 4.2 Safe policy under cost constraint
safe = sp.offline_safe_policy(holdout, state=X_cols, action="treatment",
reward="revenue_30d", cost="offer_cost", cost_threshold=2.50)
# 4.3 Off-policy evaluation on holdout — IPS / DR / SNIPS.
# sp.ope exposes estimator-level entry points: ips / direct_method / doubly_robust /
# snips / switch_dr. Each takes (X, actions, rewards, pi_b, pi_e[, reward_model]).
import numpy as np
X_test = holdout[X_cols].values
A_test = holdout["treatment"].values
R_test = holdout["revenue_30d"].values
pi_b = sl_treat.predict_proba(X_test)[:, 1] # behavior policy: SL on treatment
pi_e = pol_tree.predict(X_test) # evaluation policy (deterministic)
opv = sp.ope.doubly_robust(X_test, A_test, R_test, pi_b=pi_b, pi_e=pi_e,
reward_model=sl_outcome)
print(f"Policy value (DR): {opv.value:.3f} ± {opv.se:.3f}")
```
### B.5 Uncertainty + fairness + robustness
```python
# 5.1 Conformal prediction intervals on CATE — distribution-free coverage.
# sp.conformal_causal exposes conformal_cate / conformal_ite / conformal_continuous /
# conformal_fair / conformal_interference and more — pick by estimand.
cp = sp.conformal_causal.conformal_cate(train, y="revenue_30d", treat="treatment",
covariates=X_cols, alpha=0.10) # 90% PI
# 5.2 Subgroup fairness audit — DP / EO gaps across protected attributes.
# fairness_audit takes scored predictions on the dataset (not the model directly);
# either materialize predictions into a column or pass `predictor=`.
holdout = holdout.assign(pred=ml_dr.predict(holdout[X_cols]))
fair = sp.fairness.fairness_audit(holdout, predictions="pred",
protected="gender", labels="revenue_30d",
threshold=0.10)
# 5.3 Sensitivity dashboard — ATE robustness to unmeasured confounding
sd = sp.sensitivity_dashboard(dml, train,
dimensions=["unmeasured_confounding","positivity","model_misspec"])
sd.plot().savefig("figures/figB5_sensitivity.png", dpi=300)
# 5.4 (Reuse AER §7 robustness) Spec curve over nuisance choices
sc = sp.spec_curve(train, y="revenue_30d", x="treatment",
controls=[["age"],["age","gender"],X_cols],
se_types=["robust","cluster"])
sc.plot().savefig("figures/figB6_spec_curve.png", dpi=300)
```
### B.6 Reporting checklist (ML-causal-specific footer)
When producing the Table-2 footer, include — in addition to the AER stars/SE language:
- **Nuisance learners** used (e.g., "outcome: SuperLearner[xgb, rf, lasso, nn]; treatment: same")
- **Cross-fitting**: number of folds, sample-splitting scheme
- **Overlap diagnostic**: PS distribution range, `% trimmed`
- **CATE summary**: mean / 5–95% range / share with CATE > 0
- **Policy value**: off-policy DR value vs. random / vs. always-treat baselines
- **Conformal coverage**: empirical coverage of nominal 1−α PI on holdout
- **Fairness audit**: subgroup CATE gaps vs. acceptable thresholds
> **Doubly-robust DML / DR-Learner / TMLE are preferred over single-robust S- or T-learner alone.** Report S- or T-learner only as a baseline in the horse race. Always check overlap before reporting any IPW-flavored estimator.
---
## Method Catalog
### Classical
**Choose by FE structure:**
- **No FE / single low-cardinality FE** → `sp.regress` (statsmodels OLS wrapper)
- **High-dim FE absorption (`y ~ x | fe1 + fe2`)** → `sp.feols` (pyfixest backend, AER workhorse)
- **Two-way panel (entity × time)** → `sp.panel(...)` (linearmodels backend, standard panel diagnostics)
```python
sp.regress("y ~ x1 + x2", df, cluster="firm_id") # OLS — `|` is NOT FE here
sp.feols ("y ~ x1 + x2 | firm_id + year", df, vcov={"CRV1":"firm_id"})# OLS + 2-way FE absorbed
sp.feols ("y ~ x1 + x2 | firm_id", df, vcov={"CRV1":"firm_id+year"}) # 2-way cluster
sp.fepois ("count ~ x1 + x2 | firm_id", df, vcov={"CRV1":"firm_id"})# Poisson + FE (count outcomes)
sp.feglm ("y ~ x1 + x2 | firm_id", df, family="logit", vcov={"CRV1":"firm_id"}) # Logit + FE
sp.ivreg ("y ~ (x1 ~ z1 + z2) + x2", df, cluster="state") # IV/2SLS — (endog ~ instruments) + exog
sp.panel (df, "y ~ x1 + x2", entity="firm", time="year", method="fe") # Panel FE (within / between / RE / FD)
sp.heckman(df, y="wage", x=["age", "edu"],
select="in_labor_force", z=["marital", "kids"]) # Heckman selection
sp.qreg (df, formula="y ~ x1 + x2", quantile=0.5) # Quantile regression
```
> **`sp.regress` does NOT parse `|` as a FE separator** — it forwards the formula to statsmodels which treats `edu | firm_id` as a single garbage variable name. Use `sp.feols` (or `sp.panel`) whenever your formula has `|`. Models from `sp.regress`, `sp.feols`, `sp.ivreg`, `sp.panel`, `sp.fepois`, `sp.feglm`, `sp.qreg`, `sp.heckman` all flow through `sp.regtable / sp.coefplot / sp.collect / sp.paper_tables` — mix freely in the same table.
### Difference-in-Differences
```python
sp.did(df, y="y", treat="treated", time="post") # 2×2 DID (time = 2 values)
sp.callaway_santanna(df, y="y", g="first_treat_year", t="year", i="firm_id") # CS 2021
sp.sun_abraham(df, y="y", g="first_treat_year", t="year", i="firm_id") # SA 2021 event study
sp.bacon_decomposition(df, y="y", treat="treated", time="year", id="firm_id")# TWFE diagnostic
sp.continuous_did(df, y="y", dose="dose", time="year", id="firm_id") # Continuous treatment
sp.honest_did(cs_result, method="smoothness") # PT sensitivity (RR 2023) — needs CS/SA result
sp.event_study(df, y="y", treat_time="first_treat_year",
time="year", unit="firm_id", window=(-4, 4)) # Event-study coefficients
```
### Regression Discontinuity
```python
sp.rdrobust(df, y="y", x="running_var", c=0) # Sharp RD (CCT 2014)
sp.rdrobust(df, y="y", x="running_var", c=0, fuzzy="treatment") # Fuzzy RD
sp.rddensity(df, x="running_var", c=0) # McCrary density test
sp.rdmc(df, y="y", x="running_var", cutoffs=[0, 5, 10]) # Multi-cutoff RD
sp.rkd(df, y="y", x="running_var", c=0) # Regression kink
sp.rdplacebo(df, y="y", x="running_var", c=0,
placebo_cutoffs=[-2, -1, 1, 2]) # RD placebo
sp.rdbwsensitivity(df, y="y", x="running_var", c=0,
bw_grid=[0.5, 1.0, 1.5, 2.0]) # Bandwidth sensitivity
```
### Matching & Reweighting
```python
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="nearest") # PSM (default)
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="cem") # Coarsened EM
sp.ebalance(df, y="wage", treat="training", covariates=["age", "edu"]) # Entropy balancing
```
### Synthetic Control
```python
sp.synth(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000) # ADH SCM (method='augmented' default)
sp.sdid(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000) # Synthetic DID (Arkhangelsky et al. 2021)
sp.synth_time_placebo(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000,
n_placebo_times=10) # SCM in-time placebo
```
### ML Causal
```python
sp.dml(df, y="wage", treat="training", covariates=["age", "edu"], model="plr") # DML
sp.causal_forest(formula="wage ~ training | age + edu", data=df) # Causal Forest (formula API)
sp.metalearner(df, y="wage", treat="training", covariates=["age", "edu"], learner="dr") # DR-Learner
sp.tmle(df, y="wage", treat="training", covariates=["age", "edu"]) # Targeted MLE
sp.aipw(df, y="wage", treat="training", covariates=["age", "edu"]) # Augmented IPW
```
### Neural Causal
```python
sp.tarnet(df, y="wage", treat="training", covariates=["age", "edu"])
sp.cfrnet(df, y="wage", treat="training", covariates=["age", "edu"])
sp.dragonnet(df, y="wage", treat="training", covariates=["age", "edu"])
```
### Text Causal (v1.6 P1, experimental)
```python
sp.causal_text.text_treatment_effect(
df, text_col="doc", outcome="y", treatment="t",
covariates=["age", "edu"], embedder="hash", n_components=20) # Veitch–Wang–Blei 2020
sp.causal_text.llm_annotator_correct(
annotations_llm=df["t_llm"], # aligned pd.Series (all rows)
annotations_human=df["t_true"], # NaN where unlabelled
outcome=df["y"], covariates=df[["age", "edu"]],
method="hausman") # Egami et al. 2024
```
### Mechanisms / Decomposition
```python
sp.mediation(df, y="wage", d="training", m="hours_worked",
X=["age", "edu"]) # ACME / ADE
sp.decompose(...) # Oaxaca-Blinder / RIF / FFL / KOB
```
### Robustness, Sensitivity & Inference
```python
sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age", "edu"], ["age", "edu", "tenure"]])
sp.robustness_report(df, formula="wage ~ training + age + edu",
x="training", cluster_var="firm_id")
sp.subgroup_analysis(df, formula="wage ~ training + age + edu",
x="training", by={"gender": "female", "age_bin": "age_quartile"})
sp.oster_bounds(df, y="wage", treat="training",
controls=["age", "edu"], r_max=1.3) # Oster 2019
sp.unified_sensitivity(result, r2_treated=0.05, r2_controlled=0.10,
include_oster=True) # Cinelli-Hazlett + Oster
sp.sensitivity_dashboard(result)
sp.evalue(estimate=..., ci=(..., ...), measure="RR")
sp.twoway_cluster(result, df, cluster1="firm_id", cluster2="year") # two-way SE
sp.conley(result, df, lat="lat", lon="lon", dist_cutoff=100) # spatial HAC
fig = result.plot()
sp.interactive(fig) # WYSIWYG editor, 29 academic themes
```
---
## Common Mistakes
| Anti-pattern | Correct form |
|---|---|
| Reporting Table 2 without writing the estimating equation | Step 2 — write the equation + identifying assumption to `artifacts/empirical_strategy.md` *before* estimating |
| Skipping the event-study figure and going straight to the DID coefficient | Step 3.1 — `sp.event_study(...)` + `sp.enhanced_event_study_plot(...)` precedes the regression table |
| Reporting IV without first-stage F | Step 3.2 — `iv.summary()` reports first-stage F; bench-mark F ≥ 10 (≥ 23 for AR-equivalent inference) |
| Reporting RD without McCrary + binscatter | Step 3.3 — `sp.rddensity` + `sp.binscatter` |
| Single-spec main result with no robustness panel | Step 7 — placebo, Oster, honest_did, alt-SE, spec_curve are *expected*, not optional |
| Cluster at observation level when treatment is at firm/state level | Cluster at the level of treatment assignment; use `sp.twoway_cluster` if multi-dim |
| Raw panel → staggered DID without balance check | Run Step 0 `data_contract`; inspect `sp.balance_panel` output and cohort sizes |
| `spec_curve(controls=["a","b","c"])` (flat list) | `controls=[["a"], ["a","b"], ["a","b","c"]]` — each inner list = one spec |
| `sp.rdrobust(..., cutoff=0)` | Kwarg is `c=0` across `rdrobust` / `rkd` / `rdplacebo` / `rdbwsensitivity` |
| `sp.evalue(result)` | `sp.evalue(estimate=<point>, ci=(lo, hi), measure="RR")` |
| `sp.match(df, treat="t", y="y", ...)` | Signature is `(df, y, treat, covariates, ...)` — **y before treat** |
| `sp.sun_abraham(df, y, g, t)` — no unit id | Staggered DID **requires** `i=<unit_id>` |
| `sp.synth(..., treated_period=2000)` | Kwarg is `treatment_time=` (singular) |
| `sp.panel(df, formula, fe=True)` | Kwarg is `method="fe"` |
| `sp.robustness_report(result, ...)` | Takes `(data, formula, x, ...)` — not a result object |
| `sp.mediation(df, y, treat, mediator)` | Kwargs are `(df, y, d, m, X)` — `d` for treatment, `m` for mediator |
| Pre-computed embeddings to `text_treatment_effect` | Pass `text_col=<column_name>`; control vectorisation via `embedder=` |
| `llm_annotator_correct(df)` | Takes aligned `pd.Series` (not DataFrame); NaN for unlabelled rows |
| `sp.callaway_santanna(..., covariates=[...])` | Kwarg is `x=[...]`, not `covariates=` |
| `sp.subgroup_analysis(..., cluster=...)` | Kwarg is `robust='hc1'` (or `'hc0'`/`'hc2'`/`'hc3'`); no cluster slot |
| `sp.oster_delta(..., treat=, controls=, r_max=)` | Real signature: `(data, y, x_base, x_controls, r_max)` |
| `sp.power_did(..., power_target=...)` | Wrappers don't auto-solve. Use dispatcher: `sp.power('did', ..., power_target=..., n_periods=, n_treated_periods=)` |
| `sp.power_cluster_rct(n_clusters=..., power_target=...)` | Use dispatcher: `sp.power('cluster_rct', cluster_size=, icc=, effect_size=, power_target=)` |
| `sp.cate_group_plot(forest, group=...)` | Takes a DataFrame: `g = sp.cate_by_group(ml, df, by=..., n_groups=4); sp.cate_group_plot(g)`. Forest result lacks per-row CATEs — use `sp.metalearner(..., learner='dr')` |
| `sp.cate_plot(causal_forest_result, ...)` | Same — needs `metalearner` (or any X/DR/R-learner) result that exposes `.cate_estimates` |
| `sp.bjs_pretrend_joint(es)` | Real signature: `(cs_or_sa_result, data, y=, group=, time=, first_treat=, controls=)` — NOT `event_study()` output |
| `sp.honest_did(ols_result, ...)` | Only accepts CS / SA / `did_multiplegt` / `aggte(..., 'dynamic')` results — pass a `callaway_santanna` object |
| `sp.sumstats(df, groups={...}, ...)` | No `groups=` kwarg; loop `sp.sumstats(vars=v_panel, ...)` per panel and concat |
| `sp.sumstats(..., by="treat")` always shows numeric "0" / "1" panel headers | Binary 0/1 `by=` auto-renders as **Control / Treated** (no kwarg needed). For non-binary or alternative wording, pass `by_labels={0:"Untrained", 1:"Trained"}` |
| Fixing `fmt="%.0f"` (or any fixed format) on a regtable that mixes dollar-magnitude (~$1500) and elasticity-magnitude (~0.09) coefficients | Silently rounds the elasticities to `0` while stars survive — the LaLonde precision trap. Use `fmt="auto"` for magnitude-adaptive precision: thousands separator for ≥1000, integer for ≥100, 1 dp for ≥10, 2 dp for ≥1, 3 dp below |
| `plan.population` / `plan.equation` / `plan.threats` | Not exposed on `IdentificationPlan`. Available: `assumptions / estimand / estimator / fallback_estimators / identification_story / warnings / summary()`. Use `q.population / q.treatment / q.outcome` from the `CausalQuestion` |
| `sp.regtable(..., output="docx")` / `output="xlsx"` | Enum is `{"text","latex","tex","html","markdown","md","qmd","quarto","word","excel"}`. Either use `output="word"`/`"excel"` or — preferred — drop `output=` and call `.to_word(filename)` / `.to_excel(filename)` on the result |
| `sp.sumstats(..., output="docx")` returns plain text | `sumstats` doesn't natively emit binary docx/xlsx. For Word/Excel use `sp.collect().add_summary(...).save("file.docx")` or convert via `sp.mean_comparison(...).to_word(...)` |
| Hand-rolling Word from `pandas.DataFrame.to_string()` / writing LaTeX manually | `RegtableResult.to_word/.to_excel/.to_latex/.to_markdown/.to_html` already apply book-tab borders, AER stars, and the right SE label. `sp.collect()` bundles many such tables into one file |
| Forgetting `template="aer"` (or `qje`/`econometrica`/`restat`/`jf`/`jpe`/`restud`/`aeja`) on `regtable` | Without `template=`, you lose the journal-correct SE label, star levels, and notes. List presets via `sp.list_journal_templates()` |
| Saving each regression to its own `.tex` and stitching by hand in LaTeX | Use `sp.paper_tables(main=, heterogeneity=, robustness=, placebo=)` for a single multi-panel `.docx` / `.xlsx`, or `sp.collect()` for a full Word/Excel/Markdown bundle (Step 8.4) |
| `sp.regtable(..., keep=[focal_var])` (or `drop=["Intercept"]`) as the *default* for every table | AER convention is to **show every estimated parameter verbatim — controls AND the intercept** so the reader can verify the full spec. `regtable()` does this when you pass NEITHER `keep=` NOR `drop=`. Reserve `drop=["Intercept"]` for when you actively want to suppress the constant; reserve `keep=[focal]` for intentionally focal-only tables (IV first-stage triplet, interaction-form heterogeneity) — each with a comment explaining why |
| `sp.regress("y ~ x \| firm_id", df, cluster="firm_id")` for FE | **Silently produces wrong numbers** — `sp.regress` is a thin statsmodels OLS wrapper that does NOT parse `\|` as a FE separator; it interprets `x \| firm_id` as a single garbage variable name. Use `sp.feols("y ~ x \| firm_id", df, vcov={"CRV1":"firm_id"})` for any formula containing `\|`. Two-way cluster: `vcov={"CRV1":"firm_id+year"}` |
| `sp.feols(..., cluster="firm_id")` | feols uses pyfixest convention: `vcov={"CRV1":"firm_id"}` (one-way) or `vcov={"CRV1":"firm_id+year"}` (two-way). The `cluster=` kwarg is for `sp.regress` / `sp.ivreg` (statsmodels) only |
| `sp.twoway_cluster(feols_result, df, cluster1=, cluster2=)` | `sp.twoway_cluster` consumes statsmodels-backed results only. For feols, pass two-way directly: `sp.feols(..., vcov={"CRV1":"firm_id+year"})` |
| Trusting SEs without checking convergence / weak-IV / overlap | Always read `result.summary()` warnings and `result.diagnostics` |
---
## Agent Integration Pattern
```python
import statspai as sp
sp.list_functions() # discover
info = sp.describe_function("callaway_santanna") # understand
schema = sp.function_schema("callaway_santanna") # structured call spec
result = sp.callaway_santanna(df, y="y",
g="first_treat_year", t="year", i="firm_id")
print(result.summary())
result.to_latex("tables/did_results.tex")
```
---
## When to Use StatsPAI vs Alternatives
| Scenario | Use StatsPAI | Alternative |
|---|---|---|
| One-stop EDA → estimand → DAG → estimate → robustness pipeline | ✅ single import covers all eight AER sections | assemble pyfixest + econml + causalml + differences + ... |
| Agent-driven analysis with self-describing API | ✅ `list_functions` / `describe_function` / `function_schema` | statsmodels / pyfixest (no agent API) |
| Estimand-first "DID vs RD vs IV?" decision | ✅ `sp.causal_question` + `sp.causal` | manual judgement call |
| Stata → Python migration (same API names) | ✅ `sp.regress`, `sp.estat`, `sp.sumstats`, `sp.feols`, `sp.panel` (Stata `xtreg` → `sp.feols("y ~ x | id + year", df)` or `sp.panel(..., method="fe"/"re")`) | linearmodels (partial) |
| Full AER-style robustness gauntlet from one package | ✅ Oster / honest_did / E-value / Conley / 2-way / spec_curve / placebo all in `sp.*` | manually wire 5+ packages |
| **Epidemiology / public health** (target-trial emulation, IPTW + g-formula + TMLE triplet, MR, KM/AFT survival, E-value, STROBE/TRIPOD reporting) | ✅ `sp.target_trial.TargetTrialProtocol` + `sp.target_trial_emulate` + `sp.gformula` + `sp.msm` + `sp.tmle` + `sp.hal_tmle` + `sp.mendelian` (`sp.mr_ivw`/`sp.mr_egger`/`sp.mr_median`) + `sp.kaplan_meier` + `sp.aft` + `sp.evalue` + `sp.principal_strat` — see §A. | hand-stitched zEpid + lifelines + statsmodels + manual MR scripts |
| **ML causal inference** (DML / S/T/X/R/DR-Learner / causal forest / Dragonnet / TARNet / CEVAE / BCF / matrix completion / policy learning / OPE / conformal CATE / fairness audit / DAG learning) | ✅ `sp.dml` + `sp.metalearner` + `sp.causal_forest` + `sp.dragonnet`/`tarnet`/`cevae` + `sp.bcf` + `sp.matrix_completion` + `sp.policy_tree` + `sp.offline_safe_policy` + `sp.ope.*` + `sp.conformal_causal.*` + `sp.fairness.fairness_audit` + `sp.causal_discovery`/`pc_algorithm`/`notears`/`llm_dag_propose`+`llm_dag_validate` — see §B. | EconML + DoWhy + CausalML + GRF + zEpid + dowhy-gcm assembled by hand |