# neurotrace — Agent Skill
You are an AI agent that can run **neurotrace**, an activation-patching framework for encoder-only transformer models. Use this skill when a user wants to understand where factual knowledge is stored inside a HuggingFace MLM, or to compare how two models encode domain-specific facts.
## When to use this skill
- User asks "does [model] actually know [domain] facts?"
- User wants to compare a fine-tuned model against its base model
- User wants to locate knowledge neurons or test factual recall
- User asks about activation patching, causal tracing, or knowledge localisation
## Prerequisites
The repo must be installed:
```bash
cd /path/to/neurotrace
uv venv && uv pip install -e .
source .venv/bin/activate
```
### Option A: Use the built-in Strands orchestrator agent
If you have a Strands-compatible LLM provider configured, you can delegate to the built-in orchestrator:
```python
from neurotrace.providers import get_model
from neurotrace.agents import create_orchestrator
model = get_model("anthropic", client_args={"api_key": "sk-..."})
# Also available: "openai", "bedrock", "gemini"
agent = create_orchestrator(model=model)
agent("Compare SecureBERT vs ModernBERT on cybersecurity knowledge")
```
The orchestrator has tools for dataset generation, tracing, charting, and interpretation.
It will drive the full pipeline autonomously.
### Option B: Drive the pipeline yourself (step-by-step below)
## Step-by-step protocol
### Step 1: Gather information from the user
Ask the following questions (skip any the user has already answered):
1. **Domain**: What knowledge domain? (e.g. cybersecurity, medicine, law, finance, programming)
2. **Goal**: What do you want to find out? (e.g. "Does BioBERT store drug-disease associations differently from BERT?")
3. **Models**: Which models to compare? Need display name + HuggingFace ID for each. At least one model, ideally two for comparison.
4. **Facts**: What factual associations to test? For each fact, you need:
- A sentence template with `{}` where the subject goes (e.g. "The attack that exploits {} injection")
- The correct subject and an incorrect alternative
- The target token the model should predict
- A context word that anchors the fact, and a corruption for it
5. **Variants** (optional): Custom prompt phrasing variants, or use defaults.
If the user provides a domain but no specific facts, **generate 8–12 representative facts yourself** covering diverse concepts in that domain. Ensure each fact has an unambiguous correct answer and a plausible corruption.
### Step 2: Build the DatasetSpec
Create a JSON spec and save it:
```python
from neurotrace.agent_dataset import DatasetSpec, FactSpec, ModelSpec, generate_from_spec
spec = DatasetSpec(
    domain="cybersecurity",
    description="Test whether SecureBERT stores cybersecurity facts differently from ModernBERT",
    models=[
        ModelSpec(name="ModernBERT", huggingface_id="answerdotai/ModernBERT-base"),
        ModelSpec(name="SecureBERT", huggingface_id="cisco-ai/SecureBERT2.0-base"),
    ],
    facts=[
        FactSpec(
            clean_subject="SQL",
            corrupt_subject="HTML",
            template="The attack that exploits input sanitization failures in a database is known as {} injection",
            target="SQL",
            context_from="database",
            context_to="email",
        ),
        # ... more facts ...
    ],
    output_dir="output",
)
dataset, config = generate_from_spec(spec, save_path="output/dataset.json")
```
### Step 3: Run the trace
```python
from neurotrace import CausalTracer
from neurotrace.tracer import save_results
results = []
results_dict = {}
for model_spec in spec.models:
    tracer = CausalTracer(model_spec.name, model_spec.huggingface_id)
    result = tracer.trace(dataset)
    results.append(result)
    results_dict[model_spec.name] = result.to_dict()
    tracer.release()
save_results(results, f"{spec.output_dir}/trace_results.json")
```
### Step 4: Generate charts
```python
from neurotrace import plot_comparison, plot_patching_flow, plot_heatmap, plot_peak_bars
out = spec.output_dir
# Line chart — mask-patch restoration per layer
plot_comparison(results_dict, curve="mask_patch",
                title="Mask-Patch: Probability Restoration by Layer",
                save_path=f"{out}/fig1_mask_patch.png")

# Line chart — context-patch restoration per layer
plot_comparison(results_dict, curve="context_patch",
                title="Context-Patch: Probability Restoration by Layer",
                save_path=f"{out}/fig2_context_patch.png")

# Patching flow diagram
peak = results[0].peak_layer()[0]
plot_patching_flow(num_layers=results[0].num_layers,
                   highlighted_layer=peak, save_path=f"{out}/fig3_flow.png")

# Heatmap
plot_heatmap(results_dict, save_path=f"{out}/fig4_heatmap.png")

# Peak comparison bars
plot_peak_bars(results_dict, save_path=f"{out}/fig5_peaks.png")
```
### Step 5: Interpret the results
Report to the user (a sketch for extracting these values follows the list):
1. **Peak layer**: Which layer showed the highest restoration score for mask-patching? This is where the model stores the domain knowledge.
2. **Peak magnitude**: How strong was the signal? A larger peak means the model relies more on context to retrieve the fact.
3. **Context-patch vs mask-patch**: If context-patch is flat (near zero), the model stores facts directly at the prediction position, not via context tokens.
4. **Clean-minus-corrupt baseline** (`clean_minus_corrupt` in the results): If positive, clean context helps. If negative, the model's knowledge is robust enough that corruption doesn't hurt — a sign of strong factual entrenchment.
5. **Cross-model comparison**: If two models peak at the same layer but with different magnitudes, the one with the *smaller* peak may actually have *stronger* knowledge (less to restore because corruption matters less).
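Before writing the report, it can help to print a quick summary of these values. A minimal sketch, assuming `peak_layer()` and `to_dict()` behave as used in Steps 3–4 and that the result dict exposes `context_patch` and `clean_minus_corrupt` entries; verify the exact key names against `trace_results.json`:
```python
# Quick summary per model before writing the interpretation.
# Key names below are assumptions; confirm them in trace_results.json.
for model_spec, result in zip(spec.models, results):
    peak = result.peak_layer()[0]              # layer index, as in Step 4
    d = results_dict[model_spec.name]
    print(f"{model_spec.name}: peak mask-patch layer {peak} of {result.num_layers}")
    print(f"  clean_minus_corrupt: {d.get('clean_minus_corrupt')}")
    context_curve = d.get("context_patch") or [0.0]
    print(f"  context-patch max: {max(context_curve):.3f}")
```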
## DatasetSpec JSON schema
```json
{
"domain": "string (required)",
"description": "string (required)",
"models": [
{
"name": "string",
"huggingface_id": "string"
}
],
"facts": [
{
"clean_subject": "string",
"corrupt_subject": "string",
"template": "string with {} placeholder",
"target": "string",
"context_from": "string",
"context_to": "string"
}
],
"variants": [["prefix", "suffix"], ...],
"output_dir": "string (default: 'output')"
}
```
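For reference, a minimal spec instance following this schema (mirroring the cybersecurity example from Step 2, with the optional `variants` field omitted to use the defaults):
```json
{
  "domain": "cybersecurity",
  "description": "Test whether SecureBERT stores cybersecurity facts differently from ModernBERT",
  "models": [
    {"name": "ModernBERT", "huggingface_id": "answerdotai/ModernBERT-base"},
    {"name": "SecureBERT", "huggingface_id": "cisco-ai/SecureBERT2.0-base"}
  ],
  "facts": [
    {
      "clean_subject": "SQL",
      "corrupt_subject": "HTML",
      "template": "The attack that exploits input sanitization failures in a database is known as {} injection",
      "target": "SQL",
      "context_from": "database",
      "context_to": "email"
    }
  ],
  "output_dir": "output"
}
```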
## Fact design guidelines
When generating facts for the user (an example follows this list):
- The template MUST contain `{}` where the subject goes
- `clean_subject` should be the unambiguously correct answer
- `corrupt_subject` should be a plausible but wrong alternative from the same category
- `target` is the token the model predicts (usually same as `clean_subject`)
- `context_from` is a word in the template that supports the correct answer
- `context_to` replaces it with something that shifts meaning but keeps the sentence grammatical
- Aim for 8–12 diverse facts per domain, covering different sub-topics
- Each fact should be testable as a fill-in-the-blank: "...is known as [MASK]..."
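For example, a fact that follows these guidelines (illustrative values, not taken from the repo) could look like:
```python
# Illustrative only: values chosen to satisfy the guidelines above.
FactSpec(
    clean_subject="Python",
    corrupt_subject="Ruby",          # plausible but wrong, same category
    template="The programming language created by Guido van Rossum is called {}",
    target="Python",                 # token the model should predict
    context_from="programming",      # word that anchors the correct answer
    context_to="markup",             # grammatical, but shifts the meaning
)
```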
## Output files
After a run, the output directory will contain:
| File | Contents |
|------|----------|
| `dataset.json` | The generated prompt pairs |
| `run_config.json` | Models, domain, metadata |
| `trace_results.json` | Per-layer restoration scores |
| `fig1_mask_patch.png` | Mask-patch line chart |
| `fig2_context_patch.png` | Context-patch line chart |
| `fig3_flow.png` | Patching flow diagram |
| `fig4_heatmap.png` | Layer × model heatmap |
| `fig5_peaks.png` | Peak layer bar chart |
## Error handling
- If a model fails to load, `CausalTracer.__init__` will raise an exception. Catch it and report the HuggingFace ID to the user; they may need to `pip install` a specific tokenizer or use `trust_remote_code=True`. See the sketch after this list.
- If the `[MASK]` token is not found, the tracer falls back to finding the subject token position. This works but is less precise for MLMs.
- Running on CPU is fine for base-size models (~120M params). For large models, suggest the user use a GPU.
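A minimal sketch of defensive loading, using only the `CausalTracer` calls from Step 3 (the error-message wording is an assumption; adapt it as needed):
```python
# Skip models that fail to load instead of aborting the whole run.
for model_spec in spec.models:
    try:
        tracer = CausalTracer(model_spec.name, model_spec.huggingface_id)
    except Exception as exc:
        print(f"Could not load {model_spec.huggingface_id}: {exc}")
        print("Check the model ID, install any missing tokenizer package, "
              "or retry with trust_remote_code=True if the model requires it.")
        continue
    result = tracer.trace(dataset)
    results.append(result)
    results_dict[model_spec.name] = result.to_dict()
    tracer.release()
```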