---
name: heterogeneous-stochastic-momentum-admm
description: Heterogeneous stochastic momentum ADMM for distributed nonconvex composite optimization. Decouples algorithm stability from global network properties via a node-adaptive step-size strategy, achieving robust convergence under arbitrary connected topologies. Suitable for distributed machine learning, federated learning, and network optimization. Trigger keywords: distributed optimization, ADMM, stochastic optimization, nonconvex optimization, momentum acceleration, heterogeneous network, stochastic ADMM.
user-invocable: true
---
# Heterogeneous Stochastic Momentum ADMM
Heterogeneous stochastic momentum alternating direction method of multipliers for distributed nonconvex composite optimization
## Core Methodology
**Source:** arXiv:2603.07682
**Utility:** 0.98
### Problem Background
The target setting is distributed stochastic nonconvex nonsmooth composite optimization (a standard formulation is sketched below):
- Existing stochastic methods depend on global network parameters (maximum node degree, spectral radius)
- Under heterogeneous network topologies, step sizes must be shrunk conservatively to guarantee stability
- This creates a severe performance bottleneck
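For reference, a standard formulation of this problem class (an assumption here; the exact form in arXiv:2603.07682 may differ in details) is:
$$\min_{x_1,\dots,x_n} \sum_{i=1}^{n} \big( f_i(x_i) + g_i(x_i) \big), \qquad f_i(x_i) = \mathbb{E}_{\xi_i}\big[F_i(x_i;\xi_i)\big], \qquad \text{s.t. } x_i = z_{ij} = x_j \;\; \forall (i,j) \in \mathcal{E},$$
where each $f_i$ is smooth and possibly nonconvex, each $g_i$ is convex but possibly nonsmooth (e.g. an $\ell_1$ regularizer), and the edge variables $z_{ij}$ enforce consensus over the communication graph $\mathcal{E}$.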
### HSM-ADMM Innovations
| Feature | Traditional methods | HSM-ADMM |
|------|---------|----------|
| Step-size strategy | Globally uniform | Node-adaptive |
| Network dependence | Requires global parameters | No global knowledge needed |
| Convergence complexity | O(ε⁻²) | **O(ε⁻¹·⁵)** |
| Communication overhead | Multi-variable transmission | **Single-variable transmission** |
| Loop structure | Multi-loop / double-loop | **Single-loop** |
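The step-size row is the key difference. A minimal illustration of the contrast (values are illustrative; the scaling rule mirrors the `AdaptiveStepSize` class below):
```python
import numpy as np

degrees = np.array([2, 3, 15, 4])  # heterogeneous topology: one hub node
rho_base = 1.0

# Traditional: one penalty set by the worst-case (maximum) degree, so every
# node inherits the hub's conservative effective step 1/rho.
rho_global = rho_base * np.sqrt(degrees.max()) * np.ones(len(degrees))

# HSM-ADMM: each node scales by its own degree only; low-degree nodes keep
# larger effective steps, and no global quantity is ever queried.
rho_local = rho_base * np.sqrt(degrees)

print(1.0 / rho_global)  # uniform, conservative
print(1.0 / rho_local)   # node-adaptive
```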
### Core Algorithm
```python
import numpy as np
from typing import Callable, List, Tuple


class HSMADMM:
    """
    Heterogeneous Stochastic Momentum ADMM
    for distributed nonconvex composite optimization.
    """

    def __init__(
        self,
        n_nodes: int,
        local_dims: List[int],
        adjacency: np.ndarray,
        degrees: np.ndarray,
        rho_base: float = 1.0,
        momentum_beta: float = 0.9,
    ):
        """
        Args:
            n_nodes: number of nodes
            local_dims: local variable dimension at each node
                (consensus requires neighboring nodes to share a dimension)
            adjacency: adjacency matrix
            degrees: degree of each node
            rho_base: base penalty parameter
            momentum_beta: momentum coefficient
        """
        self.n_nodes = n_nodes
        self.local_dims = local_dims
        self.adjacency = adjacency
        self.degrees = degrees
        self.rho_base = rho_base
        self.beta = momentum_beta
        # Node-adaptive step size (the core innovation): each node's penalty
        # depends only on its own degree, never on global network quantities.
        self.rho_i = rho_base * np.sqrt(degrees)  # scale by local degree
        # Initialize variables
        self.x = [np.zeros(d) for d in local_dims]         # local variables
        self.z = {}        # consensus variables (one per edge)
        self.lambda_ = {}  # dual variables
        self.momentum = [np.zeros(d) for d in local_dims]  # momentum buffers
        # Initialize edge-level variables
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                if adjacency[i, j] > 0:
                    self.z[(i, j)] = np.zeros(local_dims[i])
                    self.lambda_[(i, j)] = np.zeros(local_dims[i])

    def step(
        self,
        grad_f: Callable[[int, np.ndarray], np.ndarray],         # local gradient oracle
        prox_g: Callable[[int, np.ndarray, float], np.ndarray],  # proximal operator
        batch_size: int = 1,
    ) -> Tuple[List[np.ndarray], float]:
        """
        Run one HSM-ADMM update.

        Args:
            grad_f: (stochastic) gradient of the local objective
            prox_g: proximal operator of the nonsmooth part
            batch_size: mini-batch size (reserved; batching is left to the oracle)

        Returns:
            The updated variable list and a convergence metric.
        """
        primal_residual = 0.0
        for i in range(self.n_nodes):
            # 1. Stochastic gradient estimate (with momentum)
            g_i = grad_f(i, self.x[i])
            self.momentum[i] = self.beta * self.momentum[i] + (1 - self.beta) * g_i
            # 2. Gradient of the consensus (augmented Lagrangian) terms
            consensus_grad = np.zeros_like(self.x[i])
            neighbors = np.where(self.adjacency[i] > 0)[0]
            for j in neighbors:
                edge = (min(i, j), max(i, j))
                consensus_grad += self.lambda_[edge] + self.rho_i[i] * (self.x[i] - self.z[edge])
            # 3. Local variable update with the node-adaptive step 1 / rho_i
            total_grad = self.momentum[i] + consensus_grad
            self.x[i] = self.x[i] - (1.0 / self.rho_i[i]) * total_grad
            # 4. Proximal update (if there is a nonsmooth part)
            self.x[i] = prox_g(i, self.x[i], 1.0 / self.rho_i[i])
            # 5. Accumulate the primal residual
            for j in neighbors:
                edge = (min(i, j), max(i, j))
                primal_residual += np.linalg.norm(self.x[i] - self.z[edge])
        # 6. Consensus variable update (edge level): average of the endpoints
        for edge in self.z.keys():
            i, j = edge
            self.z[edge] = 0.5 * (self.x[i] + self.x[j])
        # 7. Dual variable update
        for edge in self.lambda_.keys():
            i, j = edge
            self.lambda_[edge] += self.rho_i[i] * (self.x[i] - self.z[edge])
        return self.x, primal_residual

    def solve(
        self,
        grad_f: Callable[[int, np.ndarray], np.ndarray],
        prox_g: Callable[[int, np.ndarray, float], np.ndarray],
        max_iter: int = 1000,
        tol: float = 1e-6,
        verbose: bool = True,
    ) -> List[np.ndarray]:
        """
        Solve the distributed optimization problem.
        """
        for k in range(max_iter):
            _, residual = self.step(grad_f, prox_g)
            if verbose and k % 100 == 0:
                print(f"Iter {k}: primal residual = {residual:.6f}")
            if residual < tol:
                if verbose:
                    print(f"Converged at iteration {k}")
                break
        return self.x


class AdaptiveStepSize:
    """
    Node-adaptive step-size strategy.

    Core innovation: tie the step size to the local degree
    rather than to global network parameters.
    """

    @staticmethod
    def compute_local_rho(
        degree: int,
        rho_base: float = 1.0,
        strategy: str = "sqrt",
    ) -> float:
        """
        Compute the node-level step size.

        Args:
            degree: node degree
            rho_base: base step size
            strategy: scaling strategy ('sqrt', 'linear', 'log')

        Returns:
            The node-adaptive step size.
        """
        if strategy == "sqrt":
            return rho_base * np.sqrt(degree)
        elif strategy == "linear":
            return rho_base * degree
        elif strategy == "log":
            return rho_base * np.log(1 + degree)
        else:
            return rho_base


class STORMMomentum:
    """
    STORM: Stochastic Recursive Momentum.

    Achieves the optimal oracle complexity O(eps^-1.5).
    """

    def __init__(self, dim: int, eta: float = 0.1, c: float = 1.0):
        self.dim = dim
        self.eta = eta
        self.c = c
        self.momentum = np.zeros(dim)
        self.prev_grad = None

    def update(
        self,
        grad: np.ndarray,
        grad_variance: float,
    ) -> np.ndarray:
        """
        Update the momentum estimate.

        Args:
            grad: current stochastic gradient
            grad_variance: estimate of the gradient variance

        Returns:
            The updated momentum.
        """
        if self.prev_grad is None:
            self.momentum = grad.copy()
        else:
            # Adaptive momentum coefficient
            beta = 1.0 - self.c * np.sqrt(grad_variance)
            beta = max(0.0, min(1.0, beta))
            # STORM recursion: d_t = g_t + beta * (d_{t-1} - g_{t-1})
            self.momentum = beta * self.momentum + (1 - beta) * grad + beta * (grad - self.prev_grad)
        self.prev_grad = grad.copy()
        return self.momentum
```
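A quick usage sketch for the `STORMMomentum` estimator on a noisy quadratic (toy data; the hard-coded variance estimate is an assumption, not part of the method):
```python
import numpy as np

rng = np.random.default_rng(0)
storm = STORMMomentum(dim=3)

x = np.ones(3)
for t in range(5):
    true_grad = 2.0 * x                                 # gradient of ||x||^2
    noisy_grad = true_grad + 0.1 * rng.standard_normal(3)
    d = storm.update(noisy_grad, grad_variance=0.01)    # assumed variance
    x = x - 0.1 * d                                     # gradient step with the estimate
    print(t, np.linalg.norm(x))
```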
### Theoretical Guarantees
**Theorem (convergence):**
- Under suitable assumptions, after $K$ iterations HSM-ADMM achieves:
$$\mathbb{E}[\|\nabla f(x_K)\|] \leq \mathcal{O}(K^{-1/3})$$
- Oracle complexity: $\mathcal{O}(\epsilon^{-1.5})$ to reach an $\epsilon$-stationary point
**Key properties:**
1. Step sizes are fully decoupled from global network properties
2. Applicable to arbitrary connected topologies
3. No global structural knowledge required
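The two statements above are consistent if $\epsilon$-stationarity is measured in the squared gradient norm, the usual convention for this rate (an assumption here; the section does not state the convention explicitly). Analyses of this type typically bound the squared norm first, and the stated norm bound follows by Jensen's inequality:
$$\mathbb{E}\big[\|\nabla f(x_K)\|^2\big] \leq \mathcal{O}(K^{-2/3}) \;\Rightarrow\; \mathbb{E}\big[\|\nabla f(x_K)\|\big] \leq \mathcal{O}(K^{-1/3}),$$
$$\mathbb{E}\big[\|\nabla f(x_K)\|^2\big] \leq \epsilon \quad \text{once} \quad K = \mathcal{O}(\epsilon^{-3/2}) = \mathcal{O}(\epsilon^{-1.5}).$$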
## Application Scenarios
### 1. Distributed Machine Learning
```python
# Distributed training example.
# generate_random_graph, sample_local_data and compute_gradient are
# placeholders for the user's own graph generator and data pipeline.
n_nodes = 10
dims = [100] * n_nodes
adjacency = generate_random_graph(n_nodes, p=0.3)  # 0/1 symmetric matrix
degrees = np.sum(adjacency, axis=1)
optimizer = HSMADMM(n_nodes, dims, adjacency, degrees)

# Local gradient (based on local data)
def local_grad(node_id, x):
    # Sample a mini-batch from the node's local data
    batch = sample_local_data(node_id, batch_size=32)
    return compute_gradient(x, batch)

# Proximal operator (e.g. L1 regularization via soft-thresholding)
def prox_l1(node_id, x, step):
    return np.sign(x) * np.maximum(np.abs(x) - 0.01 * step, 0)

# Solve
result = optimizer.solve(local_grad, prox_l1)
```
### 2. Federated Learning
- Clients perform local updates
- The server aggregates toward consensus
- Communication overhead is reduced (see the sketch after this list)
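In the federated setting the communication graph is a star: every client connects only to the server. A minimal sketch of that mapping onto `HSMADMM` (the star topology is an assumption; the paper's federated variant may differ):
```python
import numpy as np

n_clients = 4
n_nodes = n_clients + 1  # node 0 is the server

# Star topology: the server (node 0) is linked to every client
adjacency = np.zeros((n_nodes, n_nodes))
adjacency[0, 1:] = 1
adjacency[1:, 0] = 1
degrees = adjacency.sum(axis=1)  # server degree = n_clients, clients = 1

optimizer = HSMADMM(n_nodes, [10] * n_nodes, adjacency, degrees)
# The server's rho_0 scales with sqrt(n_clients); each client keeps a larger
# effective step 1/rho_i since its local degree is just 1.
```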
### 3. Network Resource Allocation
- Distributed power control
- Bandwidth allocation optimization
## Comparison with Other Methods
| Method | Complexity | Communication | Heterogeneity support |
|------|--------|--------|----------|
| DGD | O(ε⁻²) | High | No |
| EXTRA | O(ε⁻¹) | High | No |
| DLM | O(ε⁻¹) | Medium | Partial |
| **HSM-ADMM** | **O(ε⁻¹·⁵)** | **Low** | **Yes** |
## Activation Keywords
- distributed optimization
- ADMM
- stochastic optimization
- nonconvex optimization
- momentum acceleration
- heterogeneous network
- stochastic ADMM
- federated learning optimization
## Tools Used
- numpy
- scipy
- networkx (topology analysis)
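For example, networkx can generate and validate a topology before handing it to `HSMADMM` (a sketch; the graph parameters are illustrative):
```python
import networkx as nx
import numpy as np

# Draw a random topology, retrying until it is connected
G = nx.erdos_renyi_graph(n=10, p=0.3, seed=42)
while not nx.is_connected(G):
    G = nx.erdos_renyi_graph(n=10, p=0.3)

adjacency = nx.to_numpy_array(G)  # 0/1 symmetric matrix
degrees = adjacency.sum(axis=1)   # local degrees drive the adaptive steps
```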
## Instructions for Agents
1. Understand the distributed optimization problem: multiple nodes jointly optimize an objective
2. Know the ADMM framework: a decompose-and-coordinate method
3. Understand momentum acceleration: the STORM estimator
4. Mind the node-adaptive step size: scaled by local degree
5. Application scenarios: distributed machine learning, federated learning
## Examples
```python
# Usage example
import numpy as np
from heterogeneous_stochastic_momentum_admm import HSMADMM

# 1. Define the network topology
n_nodes = 5
adjacency = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
])
degrees = np.sum(adjacency, axis=1)

# 2. Create the optimizer
optimizer = HSMADMM(
    n_nodes=n_nodes,
    local_dims=[10] * n_nodes,
    adjacency=adjacency,
    degrees=degrees,
)

# 3. Solve (toy oracles: a local quadratic plus an L1 prox)
def local_grad(node_id, x):
    return x - node_id           # gradient of 0.5 * ||x - node_id * 1||^2

def prox_g(node_id, x, step):
    return np.sign(x) * np.maximum(np.abs(x) - 0.01 * step, 0)

result = optimizer.solve(local_grad, prox_g, max_iter=1000)
```
## References
- Xiong, Y., et al. (2026). "Heterogeneous Stochastic Momentum ADMM for Distributed Nonconvex Composite Optimization." arXiv:2603.07682.
- Cutkosky, A., & Orabona, F. (2019). "Momentum-Based Variance Reduction in Non-Convex SGD." (STORM)