---
name: causally-emergent-alignment-hypothesis
description: >
  Causal emergence (ΦID) predicts and aligns with RL agent reward trajectories.
  The Causally Emergent Alignment Hypothesis states that successful RL agents
  exhibit causal emergence that is predictive of final reward early in training
  and whose representational dynamics align with reward improvement. Use when
  analyzing RL agent representations, measuring causal emergence in neural
  networks, studying AI-biology alignment, or investigating ΦID as a learning
  metric.
  arXiv: 2605.06746
---

# Causally Emergent Alignment Hypothesis

**arXiv:** 2605.06746 (2026-05-07)
**Authors:** Federico Pigozzi, Michael Levin
**Categories:** cs.NE

## Core Hypothesis

The **Causally Emergent Alignment Hypothesis** proposes that causal emergence, the degree to which an agent exerts unique predictive power over its own future, is consistently predictive of final reward early in RL training, and that its representational dynamics align with reward improvement.
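To make "unique predictive power over its own future" concrete, the sketch below computes a toy Gaussian proxy in the spirit of the Ψ emergence criterion from the causal-emergence literature: the mutual information between a macro variable and its own future, minus the information each individual latent dimension carries about that future. This is an illustration under strong assumptions (joint Gaussianity, the mean of the latent dimensions as the macro variable), not the paper's ΦID estimator; the function names are hypothetical.

```python
import numpy as np

def gaussian_mi(a, b):
    """Mutual information (nats) between two 1-D series,
    assuming they are jointly Gaussian."""
    rho = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

def psi_emergence(latents, lag=1):
    """Toy emergence score for a (T, D) latent trajectory.

    Uses the mean over latent dimensions as the macro variable V_t and
    returns I(V_t; V_{t+lag}) - sum_j I(X_t^j; V_{t+lag}); positive
    values suggest the macro variable carries unique predictive power
    beyond any single micro variable.
    """
    V = latents.mean(axis=1)
    future = V[lag:]
    score = gaussian_mi(V[:-lag], future)
    for j in range(latents.shape[1]):
        score -= gaussian_mi(latents[:-lag, j], future)
    return score
```

In practice `latents` would be the recorded latent-space activations of the agent at one training checkpoint; the paper's actual measure is computed with the full ΦID decomposition rather than this pairwise Gaussian shortcut.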
## Background

- Biological agents increase their causal emergence after learning new memories
- Causal emergence measures the degree of causal power an agent has over subsequent events
- The ΦID (Integrated Information Decomposition) framework quantifies causal emergence
- Gap: whether artificial RL agents exhibit similar causal emergence patterns

## Methodology

### ΦID Computation

- Compute causal emergence of RL agent latent-space representations over their lifetimes
- Use the ΦID (Integrated Information Decomposition) framework to estimate causal emergence
- Track across training trajectories in multiple environments

### Experimental Setup

- Multiple RL algorithms tested across diverse environments
- Six environments arranged on a complexity spectrum
- Different agent architectures evaluated
- Causal emergence computed consistently across all conditions

## Key Findings

1. **Early prediction:** Causal emergence predicts final reward early in training
2. **Representational alignment:** Emergence dynamics align with reward improvement in most tasks
3. **Cross-environment robustness:** The pattern holds across diverse environments and algorithms
4. **Bio-artificial alignment:** Connects the learning dynamics of biological and artificial agents

## Implications

- Causal emergence as a **previously unrecognized axis of neural representation reorganization** in RL
- Potential for causal interventions to improve RL agent training
- A bridge between biological learning and artificial agent learning
- A new metric for evaluating agent quality beyond raw reward

## Applications

- RL agent analysis and debugging
- Early prediction of training success
- Designing better RL architectures
- Studying biological-artificial learning parallels
- Causal representation analysis in neural networks

## Workflow

1. Train an RL agent in the target environment
2. Record latent-space representations over the training lifetime
3. Compute ΦID-based causal emergence at regular intervals
4. Correlate the causal emergence trajectory with the reward trajectory
5. Use early emergence patterns to predict final performance
6. Design interventions to maximize causal emergence

## Pitfalls

- ΦID computation is computationally expensive for high-dimensional representations
- Results may vary across environment complexity levels
- Causal emergence is necessary but not sufficient for good performance
- Requires careful dimensionality reduction of latent spaces for ΦID estimation

## Activation Keywords

causal emergence, ΦID, RL alignment, causal power agent, representation dynamics, biologically plausible RL, 2605.06746, Pigozzi Levin

## References

- Paper: https://arxiv.org/abs/2605.06746
- PDF: https://arxiv.org/pdf/2605.06746
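Steps 4-5 of the workflow reduce to correlating per-run emergence scores with reward. A minimal sketch, with made-up numbers for five hypothetical runs (all values are illustrative, not from the paper):

```python
import numpy as np

def early_emergence_predicts(early_scores, final_rewards):
    """Pearson correlation between early-training emergence scores
    and final rewards across independent runs."""
    return np.corrcoef(early_scores, final_rewards)[0, 1]

# Hypothetical per-run values: emergence averaged over the first
# portion of training vs. the run's final episode return.
early = np.array([0.12, 0.30, 0.05, 0.22, 0.40])
final = np.array([55.0, 120.0, 30.0, 90.0, 160.0])
r = early_emergence_predicts(early, final)
```

A strongly positive `r` across runs is the pattern the hypothesis predicts; with few runs, a rank correlation and a significance test would be more appropriate than raw Pearson.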
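The last pitfall, dimensionality reduction before ΦID estimation, is commonly handled with PCA. A minimal numpy sketch (the choice of PCA and of `k` are assumptions for illustration, not prescribed by the paper):

```python
import numpy as np

def pca_reduce(latents, k=8):
    """Project (T, D) latent trajectories onto the top-k principal
    components, since ΦID-style estimators scale poorly with the
    number of micro variables."""
    centered = latents - latents.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T
```

The reduced trajectory can then be fed to the emergence estimator in place of the raw latents; `k` should be chosen so that the retained components explain most of the variance.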