
cheesebench-rodent-neuroscience

CheeseBench is a benchmark for evaluating LLMs on classical rodent behavioral neuroscience paradigms. It includes 9 tasks covering the water maze, T-maze, Morris water maze, and other established behavioral tests, and supports cross-paradigm evaluation of neuroscience AI systems.

View on GitHub: github.com/hiyenwong/ai_collection

§ 01 — Stats

Stars: 1

Prior: 1099

Quality: n/a

Score: n/a

Tasks: n/a

§ 02 — Install

Get cheesebench-rodent-neuroscience.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install hiyenwong-ai-collection-collection-skills-cheesebench-rodent-neuroscience

Or clone the repo

$ git clone https://github.com/hiyenwong/ai_collection.git

Or copy the SKILL.md manually

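$ mkdir -p ~/.claude/skills/hiyenwong-ai-collection-collection-skills-cheesebench-rodent-neuroscience
$ cp ai_collection/SKILL.MD ~/.claude/skills/hiyenwong-ai-collection-collection-skills-cheesebench-rodent-neuroscience/SKILL.md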
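
A plain `cp` into a skill directory that does not exist yet will fail, so the manual step needs the directory created first. The same step can be sketched in Python; `install_skill` is a hypothetical helper written for illustration, not part of Versuz or Claude Code, and the demo runs in a temporary directory rather than under `~/.claude`.

```python
import shutil
import tempfile
from pathlib import Path


def install_skill(skill_md: Path, skills_root: Path, skill_name: str) -> Path:
    """Copy a SKILL.md to <skills_root>/<skill_name>/SKILL.md, creating the directory."""
    if not skill_md.is_file():
        raise FileNotFoundError(f"no SKILL.md at {skill_md}")
    target_dir = skills_root / skill_name
    target_dir.mkdir(parents=True, exist_ok=True)  # the step a bare cp does not do
    return Path(shutil.copyfile(skill_md, target_dir / "SKILL.md"))


# Demo in a throwaway directory so nothing in the home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "SKILL.md"
    src.write_text("---\nname: cheesebench-rodent-neuroscience\n---\n")
    dest = install_skill(src, Path(tmp) / "skills", "cheesebench-rodent-neuroscience")
    installed_ok = dest.read_text() == src.read_text()

print(installed_ok)
```

For a real install, the call would take the cloned file and `Path.home() / ".claude" / "skills"` as the root, with the skill name used in the commands above.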


§ 05 — Challenge

Think you can beat it?

$ npx versuz challenge hiyenwong-ai-collection-collection-skills-cheesebench-rodent-neuroscience


---
name: cheesebench-rodent-neuroscience
description: "CheeseBench benchmark for evaluating LLMs on classical rodent behavioral neuroscience paradigms. Includes 9 tasks covering water maze, T-maze, Morris water maze, and other established behavioral tests. Cross-paradigm evaluation for neuroscience AI systems."
version: 1.0.0
metadata:
  hermes:
    source_paper: "CheeseBench: Evaluating LLMs on Rodent Behavioral Neuroscience (arXiv:2604.13661)"
    tags: [neuroscience, benchmark, llm-evaluation, rodent, behavioral, paradigm]
---

# CheeseBench: Rodent Neuroscience LLM Evaluation

## Overview

CheeseBench is a benchmark that evaluates LLMs on classical rodent behavioral neuroscience paradigms. It contains 9 tasks covering established behavioral tests (water maze, T-maze, open field, fear conditioning, and others) for systematic evaluation of neuroscience AI systems.

## Benchmark Structure

| Task | Paradigm | Construct assessed |
|------|----------|-----------|
| 1 | Water Maze | Spatial learning/memory |
| 2 | T-Maze | Working memory |
| 3 | Open Field | Locomotor activity |
| 4 | Fear Conditioning | Associative learning |
| 5 | Morris Water Maze | Spatial reference memory |
| 6 | Elevated Plus Maze | Anxiety-like behavior |
| 7 | Social Interaction | Social behavior |
| 8 | Novel Object Recognition | Recognition memory |
| 9 | Forced Swim | Behavioral despair |
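
For programmatic use, the table can be mirrored as a small lookup structure. This is a minimal sketch: the names `Task`, `CHEESEBENCH_TASKS`, and `describe` are illustrative assumptions, and only the IDs, paradigms, and third-column labels come from the table.

```python
from typing import Dict, NamedTuple


class Task(NamedTuple):
    paradigm: str
    construct: str  # what the paradigm is used to assess (third table column)


# Task IDs and labels copied from the table; the structure itself is an assumption.
CHEESEBENCH_TASKS: Dict[int, Task] = {
    1: Task("Water Maze", "Spatial learning/memory"),
    2: Task("T-Maze", "Working memory"),
    3: Task("Open Field", "Locomotor activity"),
    4: Task("Fear Conditioning", "Associative learning"),
    5: Task("Morris Water Maze", "Spatial reference memory"),
    6: Task("Elevated Plus Maze", "Anxiety-like behavior"),
    7: Task("Social Interaction", "Social behavior"),
    8: Task("Novel Object Recognition", "Recognition memory"),
    9: Task("Forced Swim", "Behavioral despair"),
}


def describe(task_id: int) -> str:
    """Return a one-line summary of a task; raises KeyError for unknown IDs."""
    task = CHEESEBENCH_TASKS[task_id]
    return f"Task {task_id}: {task.paradigm} ({task.construct})"


print(describe(6))  # Task 6: Elevated Plus Maze (Anxiety-like behavior)
```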

## Usage

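```python
from typing import Callable, Optional

TASK_IDS = range(1, 10)  # task IDs 1-9, as listed under Benchmark Structure

def evaluate_single_task(model: Callable[[str], str], task_id: int):
    """Placeholder: task prompts and scoring are not included in this file."""
    raise NotImplementedError("supply the CheeseBench task data and scoring")

def evaluate_all_tasks(model: Callable[[str], str]) -> dict:
    """Evaluate the model on every task, keyed by task ID."""
    return {tid: evaluate_single_task(model, tid) for tid in TASK_IDS}

def run_cheesebench(model: Callable[[str], str], task_id: Optional[int] = None):
    """Evaluate model (assumed: a prompt -> response callable) on one task, or all if task_id is None."""
    if task_id is not None:  # explicit None check instead of truthiness
        return evaluate_single_task(model, task_id)
    return evaluate_all_tasks(model)

# Tasks probe understanding of:
# - Experimental design in neuroscience
# - Behavioral interpretation
# - Statistical analysis
# - Translational relevance
```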
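
The usage snippet leaves per-task scoring unspecified. One way it might look is sketched below, assuming each task is a list of (prompt, expected keyword) pairs and that an answer counts as correct when it contains the keyword. The item format, the scoring rule, and the names `score_task`, `score_all`, `toy_tasks`, and `toy_model` are all assumptions for illustration; none of them are taken from the CheeseBench paper.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, str]  # (prompt, keyword expected in a correct answer); hypothetical format


def score_task(model: Callable[[str], str], items: List[Item]) -> float:
    """Return the fraction of items whose answer contains the expected keyword."""
    if not items:
        return 0.0
    hits = sum(1 for prompt, keyword in items if keyword.lower() in model(prompt).lower())
    return hits / len(items)


def score_all(model: Callable[[str], str], tasks: Dict[int, List[Item]]) -> Dict[int, float]:
    """Score every task and return accuracies keyed by task ID."""
    return {task_id: score_task(model, items) for task_id, items in tasks.items()}


# Toy data for illustration only; these are not real CheeseBench items.
toy_tasks: Dict[int, List[Item]] = {
    2: [("Which memory type does spontaneous alternation in a T-maze probe?", "working")],
    6: [("What does more time in the open arms of an elevated plus maze suggest?", "anxiety")],
}


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; always returns the same answer."""
    return "It primarily reflects working memory."


results = score_all(toy_model, toy_tasks)
print(results)  # {2: 1.0, 6: 0.0}
```

Keyword matching is deliberately crude. A real harness would likely need rubric-based or model-graded scoring for open-ended questions about experimental design or interpretation.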

## Applications

- **Neuroscience AI evaluation**: Benchmark domain-specific reasoning
- **LLM capability assessment**: Test scientific reasoning in neuroscience
- **Educational tools**: Validate AI teaching assistants for neuroscience
- **Research assistance**: Evaluate AI support for experimental design

## References

- Original paper: arXiv:2604.13661v1
- Published: 2026-04-15