---
name: identify
description: Design or review an identification strategy for the sewage-house-prices project. Produces strategy memos with estimand, assumptions, pseudo-code, robustness plan, falsification tests, and referee objection anticipation. This skill should be used when asked to "design the strategy", "identify the effect", "write a strategy memo", or "think through identification".
argument-hint: "[research question, approach name, or 'review existing']"
allowed-tools: ["Read", "Grep", "Glob", "Write", "Agent"]
---
# Identification Strategy
Design or review an identification strategy for the sewage-house-prices project.
**Input:** `$ARGUMENTS` — a research question, approach name (e.g. "hedonic", "dry spills"), or "review existing" to audit all current strategies.
---
## Project-Specific Context
### Existing Strategies
1. **Hedonic** — Cross-sectional: `log(price) ~ spill_metrics + controls | lsoa + year_quarter`. Assumption: spill exposure is conditionally exogenous given LSOA FE.
2. **Repeat sales** — Within-property: `Δlog(price) ~ Δspill_metrics | house_id`. Eliminates time-invariant unobservables.
3. **Long difference** — Grid-level: changes in average prices within 250m grids. Eliminates level differences.
4. **News/media DiD** — Treatment = post-media-coverage × exposure. Tests whether information matters for capitalisation.
5. **Upstream/downstream** — River network topology via PostGIS. Downstream sites receive upstream pollution. Tests directionality.
6. **Dry spills** — Spills without rainfall. If dry spills affect prices, suggests awareness/stigma channel over physical damage.
7. **Hydraulic capacity instrument** — Planned IV using sewer capacity as instrument for spill frequency.
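To see why the repeat-sales design removes time-invariant unobservables, difference the levels equation from the Step 4 memo template across two sales of the same property. This is a sketch under assumptions: $s < t$ are two sale dates of property $i$, and the $\Delta$ shorthand is notation introduced here for illustration rather than taken from the project's manuscript.

```latex
\begin{aligned}
\log(p_{it}) - \log(p_{is})
  &= \beta\,(\text{SpillMetric}_{it} - \text{SpillMetric}_{is})
   + \gamma\,(X_{it} - X_{is})
   + (\mu_i - \mu_i)
   + (\delta_t - \delta_s)
   + (\varepsilon_{it} - \varepsilon_{is}) \\
\Delta \log(p_{i})
  &= \beta\,\Delta\text{SpillMetric}_{i}
   + \gamma\,\Delta X_{i}
   + (\delta_t - \delta_s)
   + \Delta\varepsilon_{i}
\end{aligned}
```

The property effect $\mu_i$ and the intercept $\alpha$ cancel, so only time-varying unobservables that move with changes in spill exposure remain a threat to identification.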
### Key Data Features
- EDM (Event Duration Monitoring) data: 2021-2024+, high-frequency (event-level)
- Land Registry: universe of transactions
- Zoopla: rental listings
- Met Office: daily rainfall at LSOA level
- River networks: PostGIS topology
- Treatment radii: 250m, 500m, 1000m, 2000m, 5000m, 10000m
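The treatment radii can be read as nested exposure rings around each property. The sketch below counts spill sites within each radius using great-circle distance. It is illustrative only: the project builds treatments in `scripts/R/utils/spill_aggregation_utils.R`, and the function names, coordinates, and the use of haversine distance here are assumptions, not the project's implementation.

```python
import math

# Treatment radii in metres, as listed above.
RADII_M = (250, 500, 1000, 2000, 5000, 10000)
EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spill_sites_within_radii(prop, sites, radii=RADII_M):
    """Count spill sites within each radius of one property (rings are cumulative)."""
    distances = [haversine_m(prop[0], prop[1], lat, lon) for lat, lon in sites]
    return {r: sum(d <= r for d in distances) for r in radii}


# Hypothetical coordinates, for illustration only.
house = (51.5, 0.0)
overflow_sites = [(51.501, 0.0), (51.503, 0.0), (51.52, 0.0), (51.6, 0.0)]
counts = spill_sites_within_radii(house, overflow_sites)
print(counts)
```

Because the rings are cumulative, a site counted at 250m is also counted at every larger radius, which matters when comparing coefficients across radii.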
---
## Workflow
### Step 1: Context Gathering
1. Read existing manuscript sections in `docs/overleaf/` for how strategies are currently described
2. Read relevant analysis scripts in `scripts/R/09_analysis/`
3. Read `scripts/R/utils/spill_aggregation_utils.R` for treatment construction
4. Check `docs/overleaf/refs.bib` for methodological references
### Step 2: Strategy Development
For a new or revised strategy, produce:
1. **Strategy memo** — Design choice, estimand (ATT/ATE/LATE), key assumptions, comparison group
2. **Estimating equation** — LaTeX-formatted with clear variable definitions
3. **Pseudo-code** — Implementation sketch (what the R code will do)
4. **Robustness plan** — Ordered list with rationale:
- Radius sensitivity (250m → 10km)
- Time period variation (prior period vs full period)
- Alternative treatment measures (count vs hours vs binary)
- Subsample analysis (sales vs rentals, urban vs rural)
5. **Falsification tests** — What SHOULD NOT show effects and why
6. **Referee objection anticipation** — Top 5 objections with pre-emptive responses
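The robustness dimensions above multiply quickly, so it can help to enumerate the specification grid before ordering it. The following is a minimal sketch: the dictionary keys and the function name are hypothetical, and it covers only radius, treatment measure, and the sales/rentals split. Ranking the checks and giving a rationale for each remains the memo's job.

```python
from itertools import product

RADII_M = (250, 500, 1000, 2000, 5000, 10000)
TREATMENT_MEASURES = ("count", "hours", "binary")
SAMPLES = ("sales", "rentals")


def robustness_grid(radii=RADII_M, measures=TREATMENT_MEASURES, samples=SAMPLES):
    """Enumerate every (sample, measure, radius) combination as a list of dicts."""
    return [
        {"sample": s, "measure": m, "radius_m": r}
        for s, m, r in product(samples, measures, radii)
    ]


grid = robustness_grid()
print(len(grid))  # 2 samples x 3 measures x 6 radii = 36
print(grid[0])
```

Listing the grid up front makes it explicit how many estimates the plan implies, which is useful when anticipating multiple-testing objections.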
### Step 3: Strategy Review
If reviewing an existing strategy:
#### Phase 1: Claim Identification
- What is the claimed design?
- What is the estimand?
- What is the treatment / comparison?
#### Phase 2: Core Design Validity
- Are identifying assumptions stated and defensible?
- Are the biggest threats acknowledged?
- Does the specification match the stated design?
#### Phase 3: Robustness Assessment
- Does the robustness plan address the right concerns?
- Are falsification tests well-chosen?
- Is there radius sensitivity analysis?
### Step 4: Present Results
```markdown
# Identification Strategy: [Approach]
**Date:** YYYY-MM-DD
**Design:** [Hedonic / Repeat Sales / Long Diff / DiD / IV / etc.]
**Estimand:** [ATT / ATE / LATE]
## Strategy Summary
[2-3 sentence description]
## Estimating Equation
$$\log(p_{it}) = \alpha + \beta \cdot \text{SpillMetric}_{it} + \gamma X_{it} + \mu_i + \delta_t + \varepsilon_{it}$$
## Key Assumptions
1. [Assumption 1] — [defense]
2. [Assumption 2] — [defense]
## Assessment: [SOUND / CONCERNS / CRITICAL ISSUES]
## Robustness Plan (ordered)
1. [Most important check]
2. [Second check]
...
## Falsification Tests
1. [Test 1] — [expected null and why]
## Anticipated Referee Objections
1. [Objection] — [Response]
## Next Steps
- [ ] Implement main specification
- [ ] Run falsification tests
- [ ] Generate pre-trend evidence
```
Save to `output/log/strategy_memo_[approach].md`.
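One way to derive the memo filename from an approach name is sketched below. The slug rule (lowercase, non-alphanumeric runs collapsed to underscores) is an assumption made for illustration; the skill itself only specifies the `strategy_memo_[approach].md` pattern.

```python
import re
from pathlib import Path

OUTPUT_DIR = Path("output/log")


def memo_path(approach):
    """Build the memo path for an approach name such as 'Dry spills'."""
    # Assumed slug rule: lowercase, collapse non-alphanumeric runs to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", approach.lower()).strip("_")
    if not slug:
        raise ValueError("approach name must contain letters or digits")
    return OUTPUT_DIR / f"strategy_memo_{slug}.md"


print(memo_path("Dry spills").as_posix())
```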
---
## Principles
- **Catch problems before coding.** A flawed strategy caught now saves weeks of wasted analysis.
- **Multiple strategies are a strength.** This paper uses 6+ approaches; consistency across them is the key argument.
- **Cross-reference approaches.** Each strategy should address threats the others cannot.
- **The user decides.** Present trade-offs; don't make choices unilaterally.
- **The strategy memo is the contract.** Once approved, analysis scripts implement it faithfully.