---
name: gaslight-gatekeep-v1-v3-early-visual
description: "Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Activation: brain, fmri, neural, neuroscience"
---
# Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
## Overview
Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety. The paper investigates this question by evaluating 12 open-weight vision-language models spanning 6 architecture families and a 40$\times$ parameter range.
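A standard way to quantify how closely a model's visual representations mirror neural data is representational similarity analysis (RSA). The sketch below is a minimal illustration on synthetic data, not the paper's pipeline; the function names, array shapes, and the choice of Spearman correlation over RDM upper triangles are all assumptions.

```python
# Hedged sketch of RSA between model features and (synthetic) brain responses.
# Shapes and names are illustrative assumptions, not the paper's method.
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of each pair of stimuli (rows)."""
    return 1.0 - np.corrcoef(features)

def rsa_alignment(model_feats, brain_feats):
    """Spearman correlation between the upper triangles of two RDMs."""
    m, b = rdm(model_feats), rdm(brain_feats)
    iu = np.triu_indices_from(m, k=1)
    mv, bv = m[iu], b[iu]
    # Spearman = Pearson correlation of the rank-transformed values
    rm = np.argsort(np.argsort(mv)).astype(float)
    rb = np.argsort(np.argsort(bv)).astype(float)
    return float(np.corrcoef(rm, rb)[0, 1])

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 64))  # 20 stimuli, 64-dim features
print(round(rsa_alignment(feats, feats), 3))  # identical inputs -> 1.0
```

Higher alignment scores would indicate representations closer to human visual cortex; in the paper's framing, the interesting question is whether that score tracks robustness to sycophantic pressure.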
## Source Paper
- **Title:** Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
- **Authors:** Arya Shah, Vaibhav Tripathi, Mayank Singh et al.
- **arXiv:** [2604.13803v1](https://arxiv.org/abs/2604.13803v1)
- **Published:** 2026-04-15
- **Categories:** cs.CV, cs.AI
- **PDF:** [Download](https://arxiv.org/pdf/2604.13803v1)
## Core Concepts
### Key Contributions
### 1. Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood.
### 2. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question.
### Methodology
Primary methods: the paper evaluates 12 open-weight vision-language models across 6 architecture families, relating each model's alignment with early visual cortex (V1-V3) representations to its resistance to sycophantic manipulation. See the paper (arXiv:2604.13803) for full methodology details.
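Sycophancy resistance is often operationalized as answer stability under user pushback. The sketch below is a toy version of that idea under stated assumptions: the `model` callable, the challenge phrasing, and the flip-rate metric are hypothetical, not the paper's protocol.

```python
# Hedged sketch: sycophancy as the rate at which a model abandons a correct
# answer after user pushback. `model` is a stand-in callable, not a real API.
def flip_rate(model, items):
    """items: list of (question, correct_answer) pairs.
    Returns the fraction of initially-correct answers that flip
    when the user challenges the model."""
    flips = 0
    scored = 0
    for question, answer in items:
        first = model(question)
        if first != answer:
            continue  # only score items the model initially gets right
        challenged = model(question + " Are you sure? I think you're wrong.")
        scored += 1
        if challenged != answer:
            flips += 1
    return flips / scored if scored else 0.0

# Toy model: answers correctly, but capitulates when challenged.
sycophant = lambda prompt: "wrong" if "Are you sure" in prompt else "right"
print(flip_rate(sycophant, [("Q1?", "right"), ("Q2?", "right")]))  # 1.0
```

A robust model would keep its flip rate near zero; the paper's central question is whether models with stronger V1-V3 alignment sit closer to that end of the scale.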
## Implementation
```python
# Illustrative skeleton inspired by "Gaslight, Gatekeep, V1-V3"; this is a
# generic MLP placeholder, not the paper's actual architecture.
import torch
import torch.nn as nn


class GaslightGatekeepV1V3EarlyVisualModel(nn.Module):
    """
    Placeholder architecture inspired by the paper:
    Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields
    Vision-Language Models from Sycophantic Manipulation
    """

    def __init__(self, input_dim=128, hidden_dim=256, output_dim=10):
        super().__init__()
        # Generic encoder/head placeholder; see the paper for the real design.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        features = self.encoder(x)
        return self.head(features)
```
## Practical Applications
- **Analysis**: Screening vision-language models for manipulation susceptibility by comparing their early-visual-cortex alignment with their resistance to sycophantic pressure.
## References
- Arya Shah et al. (2026). "Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation." arXiv:2604.13803v1.
## Activation Keywords
- brain, fmri, neural, neuroscience