Researchbrycewang-stanfordFree

responsible-ai-guide

Resources for trustworthy, fair, and ethical AI research

Repo bundle on Versuzbrycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research747 indexed entries (SKILL.md and CLAUDE.md) from this repository — open the full bundle view.

Open bundle →

View on GitHub ↗</>github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research Yours? Claim it ↗

§ 01 — Stats

Stars903

Prior1164

Quality—

Score—

Tasks—

§ 02 — Install

Get responsible-ai-guide.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

npx versuz@latest install brycewang-stanford-awesome-agent-skills-for-empirical-research-skills-43-wentorai-research-plugins-skills-domains-ai-ml-respons

Or clone the repo

$git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

Or copy the SKILL.md manually

More Versuz picks

★ Featured$1.99

vz-bench-debug

Document

★ Featured$0.99

vz-scrape-runner

Web

Got something better ?Submit your skill — it enters tomorrow's cycle. No fee.

Submit yours →

§ 05 — Challenge

Think you can beat it?

$npx versuz challenge brycewang-stanford-awesome-agent-skills-for-empirical-research-skills-43-wentorai-research-plugins-skills-domains-ai-ml-respons↵

Show SKILL.md content (~1.1k tokens)

---
name: responsible-ai-guide
description: "Resources for trustworthy, fair, and ethical AI research"
metadata:
  openclaw:
    emoji: "⚖️"
    category: "domains"
    subcategory: "ai-ml"
    keywords: ["responsible AI", "AI ethics", "fairness", "trustworthy AI", "AI safety", "bias"]
    source: "https://github.com/AthenaCore/AwesomeResponsibleAI"
---

# Responsible AI Guide

## Overview

A comprehensive collection of resources for building trustworthy, fair, and ethical AI systems. Covers fairness metrics, bias detection and mitigation, explainability methods, privacy-preserving techniques, robustness testing, and governance frameworks. Essential reading for researchers working on AI safety, alignment, and deploying models in high-stakes domains.

## Topic Taxonomy

```
Responsible AI
├── Fairness
│   ├── Bias detection (data, model, outcome)
│   ├── Fairness metrics (demographic parity, equalized odds)
│   ├── Bias mitigation (pre/in/post-processing)
│   └── Intersectional fairness
├── Explainability
│   ├── Feature attribution (SHAP, LIME, IG)
│   ├── Concept-based (TCAV, concept bottleneck)
│   ├── Counterfactual explanations
│   └── Mechanistic interpretability
├── Privacy
│   ├── Differential privacy
│   ├── Federated learning
│   ├── Membership inference attacks
│   └── Machine unlearning
├── Robustness
│   ├── Adversarial attacks/defenses
│   ├── Distribution shift
│   ├── Uncertainty quantification
│   └── Out-of-distribution detection
├── Safety & Alignment
│   ├── RLHF and preference learning
│   ├── Constitutional AI
│   ├── Red teaming
│   └── Guardrails and filters
└── Governance
    ├── Model cards
    ├── Datasheets for datasets
    ├── AI impact assessments
    └── Regulatory compliance (EU AI Act)
```

## Key Tools

| Tool | Category | Purpose |
|------|----------|---------|
| **Fairlearn** | Fairness | Bias assessment + mitigation |
| **AI Fairness 360** | Fairness | IBM fairness toolkit |
| **SHAP** | Explainability | Shapley value explanations |
| **Captum** | Explainability | PyTorch interpretability |
| **Opacus** | Privacy | Differential privacy for PyTorch |
| **ART** | Robustness | Adversarial robustness toolbox |
| **Alibi** | Explainability | ML model explanations |

## Fairness Assessment

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

# Assess fairness across demographic groups
metrics = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "recall": recall_score,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=demographics,
)

print("Overall:")
print(metrics.overall)
print("\nBy group:")
print(metrics.by_group)
print("\nDifference (max - min):")
print(metrics.difference())
```

## Reading Roadmap

```markdown
### Foundations
1. "Fairness and Machine Learning" (Barocas, Hardt, Narayanan)
2. "Datasheets for Datasets" (Gebru et al., 2021)
3. "Model Cards for Model Reporting" (Mitchell et al., 2019)

### Fairness
4. "On Fairness and Calibration" (Pleiss et al., 2017)
5. "Fairness Through Awareness" (Dwork et al., 2012)

### Explainability
6. "A Unified Approach to Interpreting Model Predictions" (SHAP)
7. "Why Should I Trust You?" (LIME, Ribeiro et al., 2016)

### Safety
8. "Constitutional AI" (Bai et al., 2022)
9. "Red Teaming Language Models" (Perez et al., 2022)
10. "Scaling Monosemanticity" (Anthropic, 2024)
```

## Use Cases

1. **Bias auditing**: Check models for demographic biases
2. **Compliance**: EU AI Act and regulatory requirements
3. **Model documentation**: Model cards and impact assessments
4. **Research ethics**: Ethical considerations for AI research
5. **Course material**: Teach responsible AI principles

## References

- [AwesomeResponsibleAI](https://github.com/AthenaCore/AwesomeResponsibleAI)
- [Fairlearn](https://fairlearn.org/)
- [EU AI Act](https://artificialintelligenceact.eu/)