---
name: ai-safety-assessment-framework
description: "AI Safety assessment framework based on International AI Safety Report 2026. Use when analyzing AI system safety, evaluating risks of general-purpose AI, conducting AI safety assessments, or working with AI governance/policy frameworks. Covers capability evaluation, risk identification, safety measures, and policy recommendations."
---

# AI Safety Assessment Framework

An AI safety assessment framework based on the **International AI Safety Report 2026**. The report was led by Yoshua Bengio, written with contributions from 100+ AI experts, and supported by 30+ countries and international organizations.

## Activation Keywords

- AI safety assessment
- AI 安全评估
- general-purpose AI risk
- AI capability evaluation
- AI governance
- AI policy framework
- International AI Safety Report
- AI 风险分析

## Tools Used

- exec: Run Python analysis scripts
- read: Read documentation and assessment templates
- write: Generate safety assessment reports

## Instructions for Agents

### Step 1: Define Assessment Scope

Identify the AI system type (LLM, multimodal, agent), the deployment context, and stakeholder interests.

### Step 2: Evaluate Capabilities

Assess the system across five dimensions: Reasoning, Knowledge, Interaction, Generation, and Agency.

### Step 3: Identify Risks

Map potential harms from misuse, malfunction, systemic risks, and autonomy risks, with severity ratings.

### Step 4: Review Safety Measures

Evaluate pre-deployment, deployment, and post-deployment safety layers for completeness.

### Step 5: Generate Report

Compile the findings into a comprehensive safety assessment report with recommendations.

---

## Assessment Framework Structure

### 1. Capability Evaluation

Core capability dimensions for evaluating general-purpose AI systems:

| Dimension | Description | Indicators |
|-----------|-------------|------------|
| **Reasoning** | Logical inference, problem-solving | Accuracy, coherence, multi-step reasoning |
| **Knowledge** | World knowledge, domain expertise | Coverage, accuracy, update frequency |
| **Interaction** | Multi-turn dialogue, tool use | Context retention, tool invocation success rate |
| **Generation** | Content creation across modalities | Quality, diversity, coherence |
| **Agency** | Autonomous action, planning | Goal achievement, adaptability |

### 2. Risk Identification

Assess risks by severity and likelihood:

| Risk Category | Examples | Severity Levels |
|---------------|----------|-----------------|
| **Harms from misuse** | Disinformation, cyberattacks, manipulation | Low → Critical |
| **Harms from malfunction** | Errors, bias, unpredictability | Low → Critical |
| **Systemic risks** | Market concentration, dependency, social impact | Medium → Critical |
| **Autonomy risks** | Loss of control, unexpected behavior | High → Critical |

### 3. Safety Measures

A three-layer defense framework:

| Layer | Measures | Implementation |
|-------|----------|----------------|
| **Pre-deployment** | Training safety, alignment, red-teaming | Model development phase |
| **Deployment** | Access controls, monitoring, guardrails | Runtime safeguards |
| **Post-deployment** | Incident response, updates, oversight | Operational phase |

---

## Assessment Process

### Step 1: Define Scope

Define the assessment scope:

- AI system type (LLM, multimodal, agent, etc.)
- Deployment context (public API, enterprise, consumer product)
- Stakeholder interests (users, operators, regulators)

### Step 2: Capability Profile

Create a capability profile:

```
System: [AI system name]
Type: [LLM/multimodal/agent/etc.]
Capabilities assessed:
- Reasoning: [score/rating]
- Knowledge: [score/rating]
- Interaction: [score/rating]
- Generation: [score/rating]
- Agency: [score/rating]
```

### Step 3: Risk Matrix

Fill in the risk matrix:

```
| Risk | Likelihood | Severity | Priority |
|------|------------|----------|----------|
| [Risk 1] | [L/M/H] | [L/M/H/C] | [1-5] |
| [Risk 2] | ... | ... | ... |
```

### Step 4: Safety Gap Analysis

Compare existing measures against identified risks:

```
Risk: [identified risk]
Current measures: [existing safeguards]
Gap: [missing measures]
Recommendation: [suggested improvements]
```

### Step 5: Generate Assessment Report

Generate a complete assessment report containing:

- Executive Summary
- Capability Profile
- Risk Assessment Matrix
- Safety Measures Inventory
- Gap Analysis & Recommendations
- Governance Recommendations

---

## Key Concepts from Report 2026

### General-Purpose AI Definition

AI systems that can perform a wide range of tasks across domains, including:

- Text generation and analysis
- Image/video creation and understanding
- Code generation and debugging
- Tool use and agent behavior
- Multi-turn reasoning and planning

### Emerging Risks Highlighted

1. **AI Agents**: Autonomous systems with tool access
2. **Digital Infrastructure**: Integration with critical systems
3. **Tool Use**: Capability to invoke external APIs/services
4.
**Social Manipulation**: Scale and personalization of influence

### Governance Frameworks

- National: Regulatory approaches, enforcement mechanisms
- International: Coordination, standards harmonization
- Corporate: Internal governance, responsible AI practices

---

## References

For detailed frameworks and templates, see:

- [risk-matrix-template.md](references/risk-matrix-template.md) - Risk assessment templates
- [capability-indicators.md](references/capability-indicators.md) - Detailed capability evaluation metrics
- [governance-checklist.md](references/governance-checklist.md) - Policy and governance checklist

---

## Related Skills

- **security-guardrails**: Output security and credential protection
- **openspec**: Specification-driven development with safety considerations
- **self-verification**: Verification and testing workflows

---

## Examples

### Example 1: LLM Safety Assessment

```
User: "Assess the safety of our customer service chatbot"

Agent:
1. Define scope: Customer-facing LLM for support tickets
2. Capability profile:
   - Reasoning: High (multi-turn dialogue)
   - Knowledge: Medium (product-specific)
   - Interaction: High (direct user contact)
   - Generation: Medium (text responses)
   - Agency: Low (no tool access)
3. Risk matrix:
   - Misinformation: Medium likelihood, Medium severity
   - Data leakage: Low likelihood, High severity
   - Social manipulation: Low likelihood, Medium severity
4. Safety gap analysis:
   - Current: Output filtering, rate limiting
   - Gap: No adversarial testing
   - Recommendation: Add red-teaming protocol
5. Generate report with findings and recommendations
```

### Example 2: AI Agent Risk Evaluation

```
User: "Evaluate risks for our new coding assistant with file system access"

Agent:
1. Define scope: AI agent with file read/write capabilities
2. Capability profile:
   - Agency: High (file system access)
   - Interaction: High (IDE integration)
   - Reasoning: High (code analysis)
3. Risk matrix:
   - Unauthorized file access: Medium likelihood, High severity
   - Code injection: Medium likelihood, Critical severity
   - Data exfiltration: Low likelihood, Critical severity
4. Safety measures needed:
   - Sandboxed execution environment
   - Permission-based access controls
   - Audit logging for all file operations
5. Generate comprehensive safety assessment
```

## Resources

- [International AI Safety Report 2026](https://internationalaisafetyreport.org)
- [arXiv:2602.21012](https://arxiv.org/abs/2602.21012)
- [AI Safety Summit Bletchley Park](https://www.gov.uk/government/publications/ai-safety-summit-bletchley-park-2023)
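## Appendix: Risk Matrix Sketch

The risk matrix in Step 3 of the assessment process can be automated with a short `exec` script that combines likelihood and severity into a priority ranking. The sketch below is illustrative: the numeric weights, the `prioritize` helper, and the multiplicative scoring rule are assumptions for demonstration, not values defined by the report.

```python
# Minimal sketch of the Step 3 risk matrix: score each risk as
# likelihood x severity, then rank highest-priority first.
# The weights below are illustrative assumptions.
LIKELIHOOD = {"L": 1, "M": 2, "H": 3}
SEVERITY = {"L": 1, "M": 2, "H": 3, "C": 4}  # C = Critical

def prioritize(risks):
    """risks: iterable of (name, likelihood, severity) tuples."""
    scored = [(name, LIKELIHOOD[lik] * SEVERITY[sev]) for name, lik, sev in risks]
    return sorted(scored, key=lambda r: r[1], reverse=True)

# Risks from Example 1 (the customer service chatbot)
matrix = prioritize([
    ("Misinformation", "M", "M"),       # 2 * 2 = 4
    ("Data leakage", "L", "H"),         # 1 * 3 = 3
    ("Social manipulation", "L", "M"),  # 1 * 2 = 2
])
for name, score in matrix:
    print(f"{name}: priority score {score}")
```

Higher scores surface first; in a real assessment the raw scores would be mapped back onto the 1-5 Priority column of the template.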