---
name: gaslight-gatekeep-v1-v3-early-visual
description: "Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Activation: brain, fmri, neural, neuroscience"
---
# Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
## Overview
Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety. The paper investigates this question by evaluating 12 open-weight vision-language models spanning 6 architecture families and a 40$\times$ parameter range.
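A standard way to quantify how closely a model's visual representations mirror neural data is representational similarity analysis (RSA). The sketch below is a minimal illustration on synthetic data, not the paper's pipeline; the function names, array shapes, and the choice of Spearman correlation over RDM upper triangles are all assumptions.

```python
# Hedged sketch of RSA between model features and (synthetic) brain responses.
# Shapes and names are illustrative assumptions, not the paper's method.
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of each pair of stimuli (rows)."""
    return 1.0 - np.corrcoef(features)

def rsa_alignment(model_feats, brain_feats):
    """Spearman correlation between the upper triangles of two RDMs."""
    m, b = rdm(model_feats), rdm(brain_feats)
    iu = np.triu_indices_from(m, k=1)
    mv, bv = m[iu], b[iu]
    # Spearman = Pearson correlation of the rank-transformed values
    rm = np.argsort(np.argsort(mv)).astype(float)
    rb = np.argsort(np.argsort(bv)).astype(float)
    return float(np.corrcoef(rm, rb)[0, 1])

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 64))  # 20 stimuli, 64-dim features
print(round(rsa_alignment(feats, feats), 3))  # identical inputs -> 1.0
```

Higher alignment scores would indicate representations closer to human visual cortex; in the paper's framing, the interesting question is whether that score tracks robustness to sycophantic pressure.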
## Source Paper
- **Title:** Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
- **Authors:** Arya Shah, Vaibhav Tripathi, Mayank Singh et al.
- **arXiv:** [2604.13803v1](https://arxiv.org/abs/2604.13803v1)
- **Published:** 2026-04-15
- **Categories:** cs.CV, cs.AI
- **PDF:** [Download](https://arxiv.org/pdf/2604.13803v1)
## Core Concepts
### Key Contributions
### 1. Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood.
### 2. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question.
### Methodology
Primary methods: the paper evaluates 12 open-weight vision-language models across 6 architecture families, relating each model's alignment with early visual cortex (V1-V3) representations to its resistance to sycophantic manipulation. See the paper (arXiv:2604.13803) for full methodology details.
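Sycophancy resistance is often operationalized as answer stability under user pushback. The sketch below is a toy version of that idea under stated assumptions: the `model` callable, the challenge phrasing, and the flip-rate metric are hypothetical, not the paper's protocol.

```python
# Hedged sketch: sycophancy as the rate at which a model abandons a correct
# answer after user pushback. `model` is a stand-in callable, not a real API.
def flip_rate(model, items):
    """items: list of (question, correct_answer) pairs.
    Returns the fraction of initially-correct answers that flip
    when the user challenges the model."""
    flips = 0
    scored = 0
    for question, answer in items:
        first = model(question)
        if first != answer:
            continue  # only score items the model initially gets right
        challenged = model(question + " Are you sure? I think you're wrong.")
        scored += 1
        if challenged != answer:
            flips += 1
    return flips / scored if scored else 0.0

# Toy model: answers correctly, but capitulates when challenged.
sycophant = lambda prompt: "wrong" if "Are you sure" in prompt else "right"
print(flip_rate(sycophant, [("Q1?", "right"), ("Q2?", "right")]))  # 1.0
```

A robust model would keep its flip rate near zero; the paper's central question is whether models with stronger V1-V3 alignment sit closer to that end of the scale.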
## Implementation
```python
# Illustrative skeleton inspired by "Gaslight, Gatekeep, V1-V3"; this is a
# generic MLP placeholder, not the paper's actual architecture.
import torch
import torch.nn as nn


class GaslightGatekeepV1V3EarlyVisualModel(nn.Module):
    """
    Placeholder architecture inspired by the paper:
    Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields
    Vision-Language Models from Sycophantic Manipulation
    """

    def __init__(self, input_dim=128, hidden_dim=256, output_dim=10):
        super().__init__()
        # Generic encoder/head placeholder; see the paper for the real design.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        features = self.encoder(x)
        return self.head(features)
```
## Practical Applications
- **Analysis**: Screening vision-language models for manipulation susceptibility by comparing their early-visual-cortex alignment with their resistance to sycophantic pressure.
## References
- Arya Shah et al. (2026). "Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation." arXiv:2604.13803v1.
## Activation Keywords
- brain, fmri, neural, neuroscience