---
name: brain-llm-key-neurons-grammar
description: "Brain-LLM analogy methodology for identifying grammar-specialized neurons in Large Language Models. Uses brain lesion study-inspired approaches to find POS-tag-specific neurons in Llama 3. Activation triggers: grammar neurons, LLM interpretability, part-of-speech, brain-LLM analogy, neuron identification, grammar subspace."
---
# Brain-LLM Key Neurons for Grammar Perception
> Identifying grammar-specialized neurons in Llama 3 using brain lesion study-inspired methodology
## Metadata
- **Source**: arXiv:2511.06519
- **Authors**: Sanaz Saki Norouzi, Mohammad Masjedi, Pascal Hitzler
- **Published**: 2025-11-09
- **Categories**: q-bio.NC, cs.AI, cs.CL
## Core Methodology
### Key Innovation
This research establishes a direct analogy between how the human brain processes grammatical categories and how Large Language Models (LLMs) handle part-of-speech tags. By treating the identification of specialized neurons as analogous to brain lesion studies, the authors demonstrate that:
1. **LLMs contain neurons specialized for specific grammatical categories** - Different neurons respond preferentially to different part-of-speech tags
2. **These neurons form a dedicated "grammar subspace"** - The specialized neurons occupy a coherent subspace within the model
3. **Activation patterns can reliably predict POS tags** - A classifier trained on key neuron activations achieves high accuracy on held-out data
4. **Pattern resembles brain lesion findings** - The specialization pattern mirrors findings from human brain lesion studies in neuroscience
### Technical Framework
**Step 1: Neuron Identification**
- Use Llama 3 as the target model
- Analyze activations for words belonging to different POS tags
- Identify neurons with highest activation correlation to specific grammatical categories
- Apply statistical thresholding to select "key neurons"
**Step 2: Subspace Analysis**
- Map the identified neurons to determine if they form a coherent subspace
- Analyze overlap between neuron sets for different POS tags
- Compare patterns to brain lesion study findings from neuroscience literature
**Step 3: Validation via Classification**
- Train a classifier on activation patterns of key neurons
- Use held-out test data to validate POS tag prediction capability
- Demonstrate that a small subset of neurons captures grammatical information
## Implementation Guide
### Prerequisites
```python
# Required libraries
import torch
from transformers import AutoModel, AutoTokenizer
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from collections import defaultdict
```
### Step-by-Step
1. **Load Model and Prepare Data**
```python
# Load Llama 3 model and tokenizer
model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModel.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.float16
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare a POS-tagged dataset (e.g., Penn Treebank or Universal Dependencies)
def load_pos_dataset():
    """Return parallel lists: sentences and their per-word POS tags."""
    texts = []
    pos_tags = []
    # ... load your data (see the CoNLL-U reader sketch below)
    return texts, pos_tags
```
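One concrete way to fill in `load_pos_dataset` is a minimal CoNLL-U reader; Universal Dependencies treebanks ship in this format (tab-separated ID, FORM, LEMMA, UPOS, ... columns, with a blank line between sentences). This reader is a sketch added here, not code from the paper:
```python
def load_pos_dataset(conllu_path):
    """Read a CoNLL-U file into (sentences, per-word UPOS tag lists)."""
    texts, pos_tags = [], []
    words, tags = [], []
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # blank line = sentence boundary
                if words:
                    texts.append(" ".join(words))
                    pos_tags.append(tags)
                    words, tags = [], []
            elif not line.startswith("#"):  # skip sentence metadata
                cols = line.split("\t")
                if "-" in cols[0] or "." in cols[0]:
                    continue  # skip multiword-token and empty-node lines
                words.append(cols[1])  # FORM column
                tags.append(cols[3])   # UPOS column
    if words:  # flush a final sentence with no trailing blank line
        texts.append(" ".join(words))
        pos_tags.append(tags)
    return texts, pos_tags
```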
2. **Extract Hidden State Activations**
```python
def extract_activations(text, target_word_idx, layer_idx=-1):
    """Extract hidden-state activations for a specific token position."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Hidden states from the requested layer; shape: [batch, seq_len, hidden_dim]
    activations = outputs.hidden_states[layer_idx]
    # Activations for the target token (cast to float32 for NumPy)
    return activations[0, target_word_idx, :].float().cpu().numpy()

def collect_activations(texts, pos_tags_list):
    """Collect activations for all words, grouped by their POS tags."""
    activations_by_pos = defaultdict(list)
    for text, pos_tags in zip(texts, pos_tags_list):
        # Map each word index to its first sub-token position (helper below)
        word_positions = align_tokens_to_words(text)
        for word_idx, pos_tag in enumerate(pos_tags):
            if word_idx not in word_positions:
                continue  # skip words the alignment could not place
            act = extract_activations(text, word_positions[word_idx])
            activations_by_pos[pos_tag].append(act)
    return activations_by_pos
```
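The `align_tokens_to_words` helper used above is not specified in the paper. Below is a minimal sketch that maps each whitespace-delimited word to its first sub-token via the fast tokenizer's character offsets; it assumes one POS tag per whitespace word, which matches common CoNLL-style corpora:
```python
def align_tokens_to_words(text):
    """Map word index -> position of the word's first sub-token.
    Uses the fast tokenizer's offset mapping; a heuristic alignment,
    not the paper's exact procedure."""
    enc = tokenizer(text, return_offsets_mapping=True)
    offsets = enc["offset_mapping"]
    positions, char_pos = {}, 0
    for w_idx, word in enumerate(text.split()):
        start = text.index(word, char_pos)  # char offset of this word
        char_pos = start + len(word)
        for t_idx, (s, e) in enumerate(offsets):
            if s <= start < e:  # token covers the word's first character
                positions[w_idx] = t_idx
                break
    return positions
```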
3. **Identify Key Neurons**
```python
def identify_key_neurons(activations_by_pos, top_k=100):
    """Identify the neurons most correlated with each POS tag."""
    all_pos_tags = list(activations_by_pos.keys())

    # Stack all activations and create labels
    X, y = [], []
    for pos_tag, activations in activations_by_pos.items():
        X.extend(activations)
        y.extend([pos_tag] * len(activations))
    X = np.array(X)  # shape: [num_samples, hidden_dim]

    # For each POS tag, find the neurons with the highest |correlation|
    key_neurons = {}
    for pos_tag in all_pos_tags:
        # One-vs-rest binary labels for this tag
        binary_labels = np.array([1 if label == pos_tag else 0 for label in y])
        # Point-biserial correlation of each neuron with the binary labels
        correlations = []
        for neuron_idx in range(X.shape[1]):
            corr = np.corrcoef(X[:, neuron_idx], binary_labels)[0, 1]
            # Constant neurons yield NaN; treat them as zero correlation
            correlations.append(abs(np.nan_to_num(corr)))
        # Select the top-k neurons for this tag
        top_neurons = np.argsort(correlations)[-top_k:]
        key_neurons[pos_tag] = top_neurons
    return key_neurons
```
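The per-neuron loop above is easy to read but slow for a 4096-dimensional hidden state. An equivalent vectorized computation, offered here as a drop-in sketch rather than the paper's code, looks like:
```python
def neuron_correlations(X, binary_labels):
    """|Pearson correlation| of every column of X with a 0/1 label vector;
    vectorized equivalent of the loop in identify_key_neurons."""
    Xc = X - X.mean(axis=0)                    # center activations
    lc = binary_labels - binary_labels.mean()  # center labels
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (Xc.T @ lc) / np.sqrt((Xc ** 2).sum(axis=0) * (lc ** 2).sum())
    return np.abs(np.nan_to_num(corr))         # constant neurons -> 0
```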
4. **Analyze Grammar Subspace**
```python
def analyze_subspace(key_neurons):
    """Check whether the key neurons form a coherent subspace."""
    # Union of key neurons across all POS tags
    all_key_neurons = set()
    for pos_tag, neurons in key_neurons.items():
        all_key_neurons.update(neurons)
    all_key_neurons = sorted(all_key_neurons)

    # Pairwise overlap between POS-specific neuron sets
    overlap_matrix = np.zeros((len(key_neurons), len(key_neurons)))
    pos_tags = list(key_neurons.keys())
    for i, pos1 in enumerate(pos_tags):
        for j, pos2 in enumerate(pos_tags):
            overlap = len(set(key_neurons[pos1]) & set(key_neurons[pos2]))
            overlap_matrix[i, j] = overlap / min(len(key_neurons[pos1]), len(key_neurons[pos2]))
    return all_key_neurons, overlap_matrix, pos_tags
```
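To eyeball the overlap structure (for example, whether open-class tags such as NOUN and VERB share more neurons than closed-class ones), a minimal matplotlib heatmap works. This visualization is an addition here, not part of the paper:
```python
import matplotlib.pyplot as plt

def plot_overlap(overlap_matrix, pos_tags):
    """Heatmap of pairwise key-neuron overlap between POS tags."""
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(overlap_matrix, cmap="viridis", vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(pos_tags)))
    ax.set_xticklabels(pos_tags, rotation=90)
    ax.set_yticks(range(len(pos_tags)))
    ax.set_yticklabels(pos_tags)
    fig.colorbar(im, ax=ax, label="overlap fraction")
    fig.tight_layout()
    plt.show()
```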
5. **Train Validation Classifier**
```python
def validate_with_classifier(X, y, key_neurons_combined):
    """Train a probe on the activations of key neurons only."""
    # Restrict features to the identified key neurons
    X_key = X[:, key_neurons_combined]

    # Held-out split, stratified so every POS tag appears in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        X_key, y, test_size=0.2, random_state=42, stratify=y
    )

    # Multinomial logistic regression (the default for the lbfgs solver)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)

    # Evaluate on both splits
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    return clf, train_acc, test_acc
```
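In the lesion-study spirit, it helps to confirm that the key neurons beat chance: a probe trained on an equally sized random neuron subset should do noticeably worse. A small control sketch, not from the paper:
```python
def random_neuron_baseline(X, y, n_neurons, seed=0):
    """Test accuracy of the same probe on a random neuron subset."""
    rng = np.random.default_rng(seed)
    random_subset = sorted(rng.choice(X.shape[1], size=n_neurons, replace=False))
    _, _, test_acc = validate_with_classifier(X, y, random_subset)
    return test_acc
```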
### Complete Example
```python
# Main pipeline
def analyze_grammar_neurons(texts, pos_tags_list):
    # Step 1: Collect activations
    print("Collecting activations...")
    activations_by_pos = collect_activations(texts, pos_tags_list)

    # Step 2: Identify key neurons
    print("Identifying key neurons...")
    key_neurons = identify_key_neurons(activations_by_pos, top_k=100)

    # Step 3: Analyze subspace
    print("Analyzing grammar subspace...")
    all_key_neurons, overlap_matrix, pos_tags = analyze_subspace(key_neurons)
    print(f"Total unique key neurons: {len(all_key_neurons)}")
    print(f"Subspace size: {len(all_key_neurons)} / {model.config.hidden_size}")

    # Step 4: Validate with classifier
    print("Training validation classifier...")
    X, y = [], []
    for pos_tag, activations in activations_by_pos.items():
        X.extend(activations)
        y.extend([pos_tag] * len(activations))
    X = np.array(X)
    clf, train_acc, test_acc = validate_with_classifier(X, y, all_key_neurons)
    print(f"Classifier accuracy - Train: {train_acc:.3f}, Test: {test_acc:.3f}")

    return {
        'key_neurons': key_neurons,
        'all_key_neurons': all_key_neurons,
        'overlap_matrix': overlap_matrix,
        'pos_tags': pos_tags,
        'classifier': clf,
        'accuracy': {'train': train_acc, 'test': test_acc},
    }
```
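A typical invocation, using the CoNLL-U reader sketched earlier (the file path is illustrative):
```python
texts, pos_tags_list = load_pos_dataset("en_ewt-ud-train.conllu")
# Subsample for a quick first pass; full corpora take much longer
results = analyze_grammar_neurons(texts[:500], pos_tags_list[:500])
```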
## Applications
### LLM Interpretability
- **Understanding linguistic knowledge**: Identify where grammatical knowledge resides in models
- **Neuron specialization**: Distinguish between specialized and general-purpose neurons
- **Localization**: Pinpoint which layers and neurons encode specific linguistic features
### Model Editing
- **Targeted grammar modification**: Adjust specific grammatical behaviors without retraining
- **Controlled intervention**: Steer language generation through neuron manipulation (see the lesion sketch after this list)
- **Style transfer**: Modify grammatical style by adjusting neuron activations
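As a concrete illustration of the lesion analogy, hidden units can be zeroed at inference time with a PyTorch forward hook. The layer index and the use of `all_key_neurons` below are illustrative assumptions, not a procedure from the paper:
```python
def make_lesion_hook(neuron_indices):
    """Zero the given hidden dimensions in a decoder layer's output.
    Intended for inference (run under torch.no_grad())."""
    idx = [int(i) for i in neuron_indices]
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[..., idx] = 0.0  # in-place "lesion" of the grammar neurons
    return hook

# Attach to a middle-to-late decoder layer (index 20 is an arbitrary choice;
# AutoModel returns a LlamaModel, whose decoder layers live in model.layers)
handle = model.layers[20].register_forward_hook(
    make_lesion_hook(list(all_key_neurons))
)
# ... run generation or evaluation here and compare behavior ...
handle.remove()  # restore the intact model
```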
### Neuroscience Research
- **Cross-validation**: Validate brain-LLM analogies through comparative studies
- **Hypothesis generation**: Generate testable predictions about biological neural networks
- **Bridge building**: Connect artificial and biological language processing research
### Educational Tools
- **Visualization**: Create visualizations showing how LLMs "understand" grammar
- **Teaching aids**: Demonstrate neural network interpretability concepts
- **Interactive demos**: Build tools for exploring model internals
## Pitfalls
1. **Model Specificity**: Findings may not generalize across different LLM architectures
2. **Language Limitation**: Study focuses on specific languages; grammatical structures vary across languages
3. **Static Analysis**: Identifies specialization but doesn't capture dynamic processing over time
4. **Causal Claims**: Correlation doesn't imply causation; identified neurons may be downstream effects
5. **Layer Selection**: Results depend on which layers are analyzed; deeper layers may show different patterns (see the layer-sweep sketch after this list)
6. **Dataset Bias**: POS tag distribution in training data affects neuron identification
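One way to address the layer-selection pitfall is a coarse sweep: probe each candidate layer with the full hidden state and see where POS information concentrates. A sketch built on the helpers above (the layer indices are illustrative for a 32-layer model):
```python
def layer_sweep(texts, pos_tags_list, layer_indices=(8, 16, 24, 31)):
    """POS-probe test accuracy per layer, using all neurons as features."""
    results = {}
    for layer_idx in layer_indices:
        X, y = [], []
        for text, tags in zip(texts, pos_tags_list):
            positions = align_tokens_to_words(text)
            for w_idx, tag in enumerate(tags):
                if w_idx in positions:
                    X.append(extract_activations(text, positions[w_idx], layer_idx))
                    y.append(tag)
        X = np.array(X)
        # Probe on all neurons to compare layers on equal footing
        _, _, test_acc = validate_with_classifier(X, y, list(range(X.shape[1])))
        results[layer_idx] = test_acc
    return results
```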
## Expected Results
Based on the paper:
- **Subspace size**: ~1000-2000 neurons out of 4096 (Llama 3 8B)
- **Classifier accuracy**: >85% on test data for POS tag prediction
- **Overlap**: Moderate overlap between POS-specific neuron sets (20-40%)
- **Pattern**: Grammar neurons tend to cluster in middle-to-late layers
## Related Skills
- neuroscience-of-transformers
- bleg-llm-functions-as-powerful-fmri
- contrastive-semantic-projection-neuron-labeling
- llm-concept-neurons-control
- neural-digital-twins-bci