---
name: event2vec-neuromorphic-representation
description: "Event2Vec: Processing neuromorphic events directly via vector representations for efficient event camera data processing compatible with Transformer architectures. Activation triggers: event camera, neuromorphic vision, event2vec, DVS, asynchronous events, sparse events, event-based vision."
---
# Event2Vec: Processing Neuromorphic Events Directly by Representations in Vector Space
> A representation that enables direct processing of asynchronous, sparse neuromorphic event data in vector space, fully compatible with Transformer architectures while preserving the event camera's advantages in temporal resolution, power efficiency, and dynamic range.
## Metadata
- **Source**: arXiv:2504.15371 [cs.CV]
- **Authors**: Wei Fang, Priyadarshini Panda
- **Published**: 2025-04-21 (v5: 2026-02-05)
- **Categories**: cs.CV (Computer Vision), cs.NE (Neural and Evolutionary Computing)
- **Code**: Available (see the arXiv abstract for the repository link)
## Core Methodology
### Key Innovation
Neuromorphic event cameras produce asynchronous, sparse event streams that are incompatible with standard deep learning pipelines. Event2Vec addresses this by:
- **Word-to-Event Analogy**: Drawing inspiration from word embeddings (Word2Vec)
- **Vector Representation**: Direct encoding of events in continuous vector space
- **Transformer Compatibility**: Seamless integration with standard Transformer architectures
- **Preserved Sparsity**: Maintains event advantages without conversion to dense frames
### Technical Framework
#### 1. Event Representation
Event cameras output asynchronous events:
$$e = (x, y, t, p)$$
Where:
- $(x, y)$: Pixel location
- $t$: Timestamp
- $p$: Polarity (+1 for ON event, -1 for OFF event)
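For concreteness, a handful of synthetic events in this format can be stacked into an `[N, 4]` array (made-up values, purely illustrative):
```python
import numpy as np

# Five synthetic events in (x, y, t, p) order; t in seconds, p in {-1, +1}
events = np.array([
    [12.0, 40.0, 0.0002, +1.0],   # ON event at pixel (12, 40)
    [13.0, 40.0, 0.0005, +1.0],
    [13.0, 41.0, 0.0009, -1.0],   # OFF event as the edge passes
    [50.0, 10.0, 0.0014, +1.0],
    [51.0, 10.0, 0.0021, -1.0],
], dtype=np.float32)
print(events.shape)  # (5, 4): N events, 4 values each
```
This `[N, 4]` layout is exactly what the `Event2Vec` module below consumes.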
#### 2. Event2Vec Embedding
Inspired by Word2Vec, events are mapped to a vector space:
- **Event Embeddings**: Each event type encoded as vector
- **Positional Embeddings**: Spatial and temporal location information
- **Contextual Embeddings**: Event relationships via attention
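A minimal sketch of how these three lookups can combine for a single event (illustrative toy dimensions; the full module below uses a learned spatial grid and a projection layer on top of the same pattern):
```python
import torch
import torch.nn as nn

dim = 128
polarity_emb = nn.Embedding(2, dim)              # ON/OFF event types
spatial_emb = nn.Embedding(128 * 128, dim // 2)  # flattened 128x128 pixel grid
temporal_emb = nn.Embedding(100, dim // 2)       # discretized time bins

# One event: pixel (12, 40), time bin 3, ON polarity (index 1)
x, y, t_bin, p_idx = 12, 40, 3, 1
e_pol = polarity_emb(torch.tensor(p_idx))        # [128]
e_spa = spatial_emb(torch.tensor(y * 128 + x))   # [64]
e_tem = temporal_emb(torch.tensor(t_bin))        # [64]

# Concatenate polarity with the spatiotemporal parts into one event vector
event_vector = torch.cat([e_pol, e_spa, e_tem])
print(event_vector.shape)  # torch.Size([256]); projected back to 128 downstream
```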
#### 3. Event Stream Encoding
```
Raw Events   →   Event2Vec Embedding   →   Transformer   →   Task Output
    ↓                    ↓                      ↓                 ↓
 Sparse,         Continuous vectors        Attention        Classification /
 asynchronous    (sparsity maintained)     processing       regression
```
#### 4. Key Advantages
- **Parameter Efficiency**: Dramatically fewer parameters than frame-based approaches
- **High Throughput**: Parallel processing of sparse events
- **Low Latency**: Direct event processing, no accumulation window
- **Scalability**: Effective at ultra-low spatial resolutions
## Implementation Guide
### Prerequisites
- Python 3.8+
- PyTorch 2.0+
- Event data library (e.g., Tonic)
- NumPy, Matplotlib
- Optional: CUDA for GPU acceleration
### Step-by-Step Implementation
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import List, Tuple, Optional
import numpy as np
class Event2Vec(nn.Module):
"""
Event2Vec: Direct event representation in vector space
"""
def __init__(
self,
embedding_dim: int = 128,
num_polarities: int = 2, # ON and OFF events
spatial_bins: Tuple[int, int] = (128, 128), # H, W
temporal_resolution: float = 1e-3, # 1ms
max_time_window: float = 0.1, # 100ms
use_positional_encoding: bool = True,
num_heads: int = 8,
num_layers: int = 6,
dropout: float = 0.1
):
super().__init__()
self.embedding_dim = embedding_dim
self.num_polarities = num_polarities
self.spatial_bins = spatial_bins
self.temporal_resolution = temporal_resolution
self.max_time_window = max_time_window
# Event type embeddings (polarity)
self.polarity_embedding = nn.Embedding(num_polarities, embedding_dim)
# Spatial embeddings (grid-based or learned)
self.spatial_embedding = nn.Parameter(
torch.randn(spatial_bins[0], spatial_bins[1], embedding_dim // 2)
)
# Temporal embeddings (learned or sinusoidal)
if use_positional_encoding:
self.temporal_embedding = SinusoidalPositionalEncoding(
embedding_dim // 2,
max_len=int(max_time_window / temporal_resolution)
)
else:
self.temporal_embedding = nn.Embedding(
int(max_time_window / temporal_resolution),
embedding_dim // 2
)
# Projection to full embedding dimension
self.event_proj = nn.Linear(embedding_dim * 2, embedding_dim)
# Transformer encoder
encoder_layer = nn.TransformerEncoderLayer(
d_model=embedding_dim,
nhead=num_heads,
dim_feedforward=embedding_dim * 4,
dropout=dropout,
batch_first=True
)
self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        # Output head (task-specific); Identity here, with task heads added
        # by wrappers such as Event2VecClassifier below
        self.output_head = nn.Identity()
def encode_events(
self,
events: torch.Tensor,
spatial_size: Optional[Tuple[int, int]] = None
) -> torch.Tensor:
"""
Encode events to vector representations
Args:
events: [N, 4] tensor of (x, y, t, p) for N events
spatial_size: (H, W) spatial dimensions
Returns:
embeddings: [N, embedding_dim] event embeddings
"""
if spatial_size is None:
spatial_size = self.spatial_bins
x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
# Discretize coordinates
x_idx = (x * self.spatial_bins[1] / spatial_size[1]).long().clamp(0, self.spatial_bins[1] - 1)
y_idx = (y * self.spatial_bins[0] / spatial_size[0]).long().clamp(0, self.spatial_bins[0] - 1)
        # Map polarity {-1, +1} to indices {0, 1} (0 = OFF, 1 = ON)
        p_idx = ((p + 1) / 2).long()
# Temporal index (relative to window start)
t_idx = (t / self.temporal_resolution).long().clamp(
0, int(self.max_time_window / self.temporal_resolution) - 1
)
# Get embeddings
polarity_emb = self.polarity_embedding(p_idx) # [N, embedding_dim]
spatial_emb = self.spatial_embedding[y_idx, x_idx] # [N, embedding_dim//2]
        # Both the sinusoidal encoder and nn.Embedding are called the same way
        temporal_emb = self.temporal_embedding(t_idx)  # [N, embedding_dim//2]
# Combine spatial and temporal
spatiotemporal_emb = torch.cat([spatial_emb, temporal_emb], dim=-1)
# Combine all: polarity + spatiotemporal
combined = torch.cat([polarity_emb, spatiotemporal_emb], dim=-1)
# Project to embedding dimension
embeddings = self.event_proj(combined)
return embeddings
def forward(
self,
events: torch.Tensor,
spatial_size: Optional[Tuple[int, int]] = None
) -> torch.Tensor:
"""
Process event stream
Args:
events: [N, 4] events
spatial_size: (H, W) spatial dimensions
Returns:
output: Task-specific output
"""
# Encode events
event_embeddings = self.encode_events(events, spatial_size)
# Transformer processing
# Add batch dimension: [1, N, embedding_dim]
embeddings = event_embeddings.unsqueeze(0)
# Self-attention over events
encoded = self.transformer(embeddings)
# Global pooling or task-specific head
output = self.output_head(encoded)
return output
class SinusoidalPositionalEncoding(nn.Module):
"""
Sinusoidal positional encoding for temporal information
"""
def __init__(self, d_model: int, max_len: int = 5000):
super().__init__()
pe = torch.zeros(max_len, d_model)
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
div_term = torch.exp(
torch.arange(0, d_model, 2).float() *
(-np.log(10000.0) / d_model)
)
pe[:, 0::2] = torch.sin(position * div_term)
pe[:, 1::2] = torch.cos(position * div_term)
self.register_buffer('pe', pe)
def forward(self, positions: torch.Tensor) -> torch.Tensor:
"""
Args:
positions: [N] temporal positions
Returns:
embeddings: [N, d_model]
"""
return self.pe[positions]
class Event2VecClassifier(nn.Module):
"""
Complete Event2Vec classifier for event-based datasets
"""
def __init__(
self,
num_classes: int,
embedding_dim: int = 128,
**event2vec_kwargs
):
super().__init__()
self.event2vec = Event2Vec(
embedding_dim=embedding_dim,
**event2vec_kwargs
)
# Classification head
self.classifier = nn.Sequential(
nn.LayerNorm(embedding_dim),
nn.Linear(embedding_dim, embedding_dim // 2),
nn.GELU(),
nn.Dropout(0.1),
nn.Linear(embedding_dim // 2, num_classes)
)
def forward(
self,
events: torch.Tensor,
spatial_size: Optional[Tuple[int, int]] = None
) -> torch.Tensor:
"""
Args:
events: [N, 4] event stream
spatial_size: (H, W) spatial size
Returns:
logits: [num_classes] classification logits
"""
# Process events
encoded = self.event2vec(events, spatial_size)
# Global average pooling over events
pooled = encoded.mean(dim=1) # [1, embedding_dim]
# Classify
logits = self.classifier(pooled.squeeze(0))
return logits
```
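Continuing from the implementation above, a quick smoke test on random synthetic events (shapes and hyperparameters are illustrative, not the paper's settings):
```python
# 11-class task (e.g., DVS Gesture); small Transformer for a fast check
model = Event2VecClassifier(
    num_classes=11,
    embedding_dim=128,
    spatial_bins=(128, 128),
    num_layers=2,
)

# 500 random events: x, y in [0, 128), t in [0, 0.1 s), p in {-1, +1}
n = 500
events = torch.stack([
    torch.randint(0, 128, (n,)).float(),        # x
    torch.randint(0, 128, (n,)).float(),        # y
    torch.rand(n) * 0.1,                        # t
    torch.randint(0, 2, (n,)).float() * 2 - 1,  # p
], dim=1)

logits = model(events, spatial_size=(128, 128))
print(logits.shape)  # torch.Size([11])
```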
### Handling Event Streams Efficiently
```python
def preprocess_event_stream(
events: np.ndarray,
time_window: float = 0.1, # 100ms
spatial_crop: Optional[Tuple[int, int, int, int]] = None
) -> torch.Tensor:
"""
Preprocess raw event stream for Event2Vec
Args:
events: [N, 4] raw events (x, y, t, p)
time_window: Time window for processing
spatial_crop: (x_min, x_max, y_min, y_max) crop region
Returns:
processed: [M, 4] processed events
"""
# Filter by time
t_start = events[:, 2].min()
mask = (events[:, 2] - t_start) < time_window
events = events[mask]
# Spatial crop if specified
if spatial_crop:
x_min, x_max, y_min, y_max = spatial_crop
mask = (
(events[:, 0] >= x_min) & (events[:, 0] < x_max) &
(events[:, 1] >= y_min) & (events[:, 1] < y_max)
)
events = events[mask]
# Normalize coordinates
events[:, 0] -= x_min
events[:, 1] -= y_min
# Normalize timestamps
events[:, 2] -= events[:, 2].min()
return torch.from_numpy(events).float()
class EventBatcher:
"""
Batch sparse events for efficient processing
"""
def __init__(self, max_events_per_sample: int = 10000):
self.max_events = max_events_per_sample
def collate_events(
self,
event_list: List[torch.Tensor]
) -> Tuple[torch.Tensor, torch.Tensor]:
"""
Collate variable-length event streams
Args:
event_list: List of [N_i, 4] event tensors
Returns:
batched_events: [total_N, 4] concatenated events
batch_indices: [total_N] batch assignment
"""
batched = []
batch_indices = []
for batch_idx, events in enumerate(event_list):
# Truncate if too many events
if len(events) > self.max_events:
events = events[:self.max_events]
batched.append(events)
batch_indices.extend([batch_idx] * len(events))
batched_events = torch.cat(batched, dim=0)
batch_indices = torch.tensor(batch_indices, dtype=torch.long)
return batched_events, batch_indices
```
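Putting the two helpers together on synthetic streams (made-up data; real recordings come from the loaders in the next section):
```python
import numpy as np

rng = np.random.default_rng(0)

def random_events(n):
    """Hypothetical [n, 4] stream with sorted timestamps in [0, 200 ms)."""
    return np.stack([
        rng.integers(0, 128, n),        # x
        rng.integers(0, 128, n),        # y
        np.sort(rng.random(n) * 0.2),   # t
        rng.choice([-1.0, 1.0], n),     # p
    ], axis=1).astype(np.float32)

stream_a = preprocess_event_stream(random_events(800), time_window=0.1)
stream_b = preprocess_event_stream(random_events(1200), time_window=0.1)

batcher = EventBatcher(max_events_per_sample=1000)
batched, indices = batcher.collate_events([stream_a, stream_b])
print(batched.shape, indices.bincount())  # [N_total, 4], events per sample
```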
### Training on Event Datasets
```python
import numpy as np
import tonic
import tonic.transforms as transforms
from torch.utils.data import DataLoader
def get_dvs_gesture_loader(
batch_size: int = 32,
data_path: str = './data'
):
"""
    Create train/test DataLoaders for the DVS Gesture dataset
"""
    # Keep raw events; DropEvent randomly removes a fraction for robustness
    transform = transforms.Compose([
        transforms.DropEvent(p=0.1),
    ])
train_dataset = tonic.datasets.DVSGesture(
save_to=data_path,
train=True,
transform=transform
)
test_dataset = tonic.datasets.DVSGesture(
save_to=data_path,
train=False,
        transform=None  # keep raw structured events for Event2Vec
)
    # Custom collate for variable-length event streams. Tonic yields
    # structured arrays with named fields; timestamps are in microseconds
    # and polarity is {0, 1}, so convert to the (x, y, t, p) layout above
    def collate_fn(batch):
        events_list = [
            torch.from_numpy(np.stack(
                [ev['x'], ev['y'],
                 (ev['t'] - ev['t'].min()) * 1e-6,     # seconds from window start
                 ev['p'].astype(np.float32) * 2 - 1],  # {0, 1} -> {-1, +1}
                axis=-1).astype(np.float32))
            for ev, _ in batch
        ]
        labels = torch.tensor([label for _, label in batch])
        batched_events, batch_indices = EventBatcher().collate_events(events_list)
        return batched_events, batch_indices, labels
    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=collate_fn,
        num_workers=4
    )
    test_loader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        collate_fn=collate_fn,
        num_workers=4
    )
    return train_loader, test_loader
def train_event2vec(
model: Event2VecClassifier,
train_loader: DataLoader,
num_epochs: int = 100,
device: str = 'cuda'
):
"""
Train Event2Vec on event dataset
"""
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, num_epochs)
criterion = nn.CrossEntropyLoss()
for epoch in range(num_epochs):
model.train()
total_loss = 0
correct = 0
total = 0
for batched_events, batch_indices, labels in train_loader:
batched_events = batched_events.to(device)
labels = labels.to(device)
optimizer.zero_grad()
# Forward (process each sample)
logits_list = []
for i in range(len(labels)):
mask = batch_indices == i
sample_events = batched_events[mask]
logits = model(sample_events)
logits_list.append(logits)
logits = torch.stack(logits_list)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
total_loss += loss.item()
correct += (logits.argmax(dim=1) == labels).sum().item()
total += len(labels)
scheduler.step()
acc = 100. * correct / total
print(f'Epoch {epoch}: Loss={total_loss/len(train_loader):.4f}, Acc={acc:.2f}%')
```
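Assuming the pieces above, a run can be wired together as follows (a sketch; depending on the Tonic version, the DVS Gesture download may require a manual step):
```python
train_loader, test_loader = get_dvs_gesture_loader(batch_size=16)

model = Event2VecClassifier(
    num_classes=11,            # DVS Gesture has 11 gesture classes
    spatial_bins=(128, 128),   # DVS128 sensor resolution
)
train_event2vec(model, train_loader, num_epochs=20, device='cuda')
```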
## Applications
### 1. Gesture Recognition
- **DVS Gesture**: 11-class hand gesture dataset
- **ASL-DVS**: American Sign Language alphabet
- **DVS Lip**: Lip reading from events
- **Advantages**: Low latency, motion blur immunity
### 2. Autonomous Navigation
- **Obstacle Detection**: Fast response to motion
- **High Dynamic Range**: Works in extreme lighting
- **Low Power**: Suitable for drones, robots
- **Event Cameras**: DAVIS, ATIS, Prophesee
### 3. Surveillance
- **Motion Detection**: Event-triggered recording
- **Privacy-Preserving**: No full-frame capture
- **24/7 Operation**: Low power always-on
- **Anomaly Detection**: Sparse event analysis
### 4. Robotics
- **SLAM**: Event-based simultaneous localization
- **Visual Servoing**: High-speed tracking
- **Collision Avoidance**: Sub-millisecond latency
- **Industrial Inspection**: High-speed quality control
## Pitfalls
1. **Spatial Quantization**: Grid-based encoding loses sub-pixel precision
- *Mitigation*: Learned continuous spatial embeddings, attention mechanisms
2. **Temporal Alignment**: Events from different sources may need synchronization
- *Mitigation*: Temporal normalization, learned time warping
3. **Variable Event Count**: Different samples have different numbers of events
   - *Mitigation*: Set-based processing, attention pooling, truncation (a padding/masking sketch follows this list)
4. **Static Scenes**: No events in static regions
- *Mitigation*: Periodic frame integration, hybrid approaches
5. **Dataset Specificity**: Hyperparameters tuned per dataset
- *Mitigation*: Meta-learning, domain adaptation
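For pitfall 3, an alternative to truncation is padding plus an attention mask, as in this sketch (hypothetical helper, not from the paper; `src_key_padding_mask` is the standard PyTorch convention):
```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_event_batch(event_list):
    """Pad a list of [N_i, 4] event tensors to [B, N_max, 4] plus a mask."""
    lengths = torch.tensor([len(ev) for ev in event_list])
    padded = pad_sequence(event_list, batch_first=True)  # [B, N_max, 4]
    # True marks padded (invalid) positions, matching the convention of
    # src_key_padding_mask in nn.TransformerEncoder
    mask = torch.arange(padded.size(1))[None, :] >= lengths[:, None]
    return padded, mask

# Usage with the encoder from the implementation section:
#   encoded = transformer(embeddings, src_key_padding_mask=mask)
```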
## Related Skills
- snn-event-processing: Spiking neural networks for events
- neuromorphic-vision: General neuromorphic vision methods
- event-camera-denoising: Noise removal from event streams
- sparse-transformer: Efficient attention for sparse data
## References
```bibtex
@article{fang2025event2vec,
title={Event2Vec: Processing Neuromorphic Events Directly by Representations in Vector Space},
author={Fang, Wei and Panda, Priyadarshini},
journal={arXiv preprint arXiv:2504.15371},
year={2025}
}
```
## Further Reading
- Event Cameras: Gallego et al., "Event-based Vision: A Survey"
- Word2Vec: Mikolov et al., "Efficient Estimation of Word Representations"
- Transformer: Vaswani et al., "Attention is All You Need"
- Neuromorphic Vision: Davies et al., "Advancing Neuromorphic Computing"