---
name: cheesebench-rodent-neuroscience
description: "CheeseBench benchmark for evaluating LLMs on classical rodent behavioral neuroscience paradigms. Includes 9 tasks covering water maze, T-maze, Morris water maze, and other established behavioral tests. Cross-paradigm evaluation for neuroscience AI systems."
version: 1.0.0
metadata:
  hermes:
    source_paper: "CheeseBench: Evaluating LLMs on Rodent Behavioral Neuroscience (arXiv:2604.13661)"
    tags: [neuroscience, benchmark, llm-evaluation, rodent, behavioral, paradigm]
---
# CheeseBench: Rodent Neuroscience LLM Evaluation
## Overview
CheeseBench is a benchmark that evaluates LLMs on classical rodent behavioral neuroscience paradigms. It contains 9 tasks covering established behavioral tests (water maze, T-maze, open field, fear conditioning, and others) for systematic evaluation of neuroscience AI systems.
## Benchmark Structure
| Task | Paradigm | Evaluation |
|------|----------|-----------|
| 1 | Water Maze | Spatial learning/memory |
| 2 | T-Maze | Working memory |
| 3 | Open Field | Locomotor activity |
| 4 | Fear Conditioning | Associative learning |
| 5 | Morris Water Maze | Spatial reference memory |
| 6 | Elevated Plus Maze | Anxiety-like behavior |
| 7 | Social Interaction | Social behavior |
| 8 | Novel Object Recognition | Recognition memory |
| 9 | Forced Swim | Behavioral despair |
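For programmatic use, the table above can be mirrored as a small lookup structure. The following is a minimal sketch only: the names `Task`, `TASKS`, and `get_task` are hypothetical and are not part of CheeseBench itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One row of the benchmark table (hypothetical container)."""
    task_id: int
    paradigm: str
    evaluation: str


# Rows copied from the Benchmark Structure table, keyed by task number.
TASKS = {
    t.task_id: t
    for t in [
        Task(1, "Water Maze", "Spatial learning/memory"),
        Task(2, "T-Maze", "Working memory"),
        Task(3, "Open Field", "Locomotor activity"),
        Task(4, "Fear Conditioning", "Associative learning"),
        Task(5, "Morris Water Maze", "Spatial reference memory"),
        Task(6, "Elevated Plus Maze", "Anxiety-like behavior"),
        Task(7, "Social Interaction", "Social behavior"),
        Task(8, "Novel Object Recognition", "Recognition memory"),
        Task(9, "Forced Swim", "Behavioral despair"),
    ]
}


def get_task(task_id: int) -> Task:
    """Look up a task by number, rejecting IDs outside the table."""
    try:
        return TASKS[task_id]
    except KeyError:
        raise ValueError(
            f"task_id must be one of {sorted(TASKS)}, got {task_id!r}"
        ) from None
```

Validating the ID up front keeps an out-of-range `task_id` from failing later inside an evaluation run.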
## Usage
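```python
def run_cheesebench(model, task_id=None):
    """Evaluate a model on one CheeseBench task, or on all tasks if task_id is None."""
    # evaluate_single_task and evaluate_all_tasks are placeholders for the
    # benchmark harness; they are not defined in this file.
    if task_id is not None:
        return evaluate_single_task(model, task_id)
    return evaluate_all_tasks(model)

# Tasks probe understanding of:
# - Experimental design in neuroscience
# - Behavioral interpretation
# - Statistical analysis
# - Translational relevance
```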
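`run_cheesebench` calls `evaluate_single_task` and `evaluate_all_tasks`, which this file does not define. One way they could look is sketched below. It assumes a model is any callable that maps a prompt string to an answer string, and that each task supplies (prompt, expected answer) pairs. The `ITEMS` store, its placeholder entries, and the exact-match scoring rule are illustrative assumptions, not the benchmark's actual data or protocol.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]   # assumption: prompt in, answer out
Item = Tuple[str, str]         # (prompt, expected answer)

# Hypothetical in-memory store: task_id -> list of items.
# The entries are placeholders; load the real benchmark data here.
ITEMS: Dict[int, List[Item]] = {
    1: [("placeholder question 1", "A"), ("placeholder question 2", "B")],
    2: [("placeholder question 3", "C")],
}


def evaluate_single_task(model: Model, task_id: int) -> float:
    """Return exact-match accuracy of the model on one task."""
    if task_id not in ITEMS:
        raise ValueError(f"unknown task_id: {task_id!r}")
    items = ITEMS[task_id]
    if not items:
        return 0.0
    correct = sum(model(prompt).strip() == expected for prompt, expected in items)
    return correct / len(items)


def evaluate_all_tasks(model: Model) -> Dict[int, float]:
    """Return a mapping of task_id to accuracy for every stored task."""
    return {tid: evaluate_single_task(model, tid) for tid in sorted(ITEMS)}


def run_cheesebench(model: Model, task_id=None):
    """Dispatch to a single task or to the full set, as in the Usage block."""
    if task_id is not None:
        return evaluate_single_task(model, task_id)
    return evaluate_all_tasks(model)


if __name__ == "__main__":
    always_a = lambda prompt: "A"                # trivial stand-in model
    print(run_cheesebench(always_a, task_id=1))  # 0.5
    print(run_cheesebench(always_a))             # {1: 0.5, 2: 0.0}
```

Exact-match scoring is used here only because it is simple to verify. Free-text answers about experimental design or behavioral interpretation would likely need rubric-based or model-graded scoring instead.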
## Applications
- **Neuroscience AI evaluation**: Benchmark domain-specific reasoning
- **LLM capability assessment**: Test scientific reasoning in neuroscience
- **Educational tools**: Validate AI teaching assistants for neuroscience
- **Research assistance**: Evaluate AI support for experimental design
## References
- Original paper: arXiv:2604.13661v1
- Published: 2026-04-15