Free SKILL.md scraped from GitHub. Install it with the command below, or clone the repo and copy the file directly into your Claude Code skills directory.
```bash
npx versuz@latest install hiyenwong-ai-collection-collection-skills-improved-convergence-decentralized-stochastic
```

Or clone the repo and copy the file manually:

```bash
git clone https://github.com/hiyenwong/ai_collection.git
cp ai_collection/SKILL.MD ~/.claude/skills/hiyenwong-ai-collection-collection-skills-improved-convergence-decentralized-stochastic/SKILL.md
```

---
name: improved-convergence-decentralized-stochastic
description: "Decentralized stochastic optimization has emerged as a fundamental paradigm for large-scale machine learning. However, practical implementations often rely on biased gradient estim... Activation: optimization, decentralized, stochastic"
---

# Improved Convergence for Decentralized Stochastic Optimization with Biased Gradients

## Overview

Decentralized stochastic optimization has emerged as a fundamental paradigm for large-scale machine learning. However, practical implementations often rely on biased gradient estimators arising from communication compression or inexact local oracles, which severely degrade convergence in the presence of data heterogeneity. To address this challenge, we propose Decentralized Momentum Tracking with Biased Gradients (Biased-DMT), a novel decentralized algorithm designed to operate reliably under biased gradient information. We establish a comprehensive convergence theory for Biased-DMT in nonconvex settings and show that it achieves linear speedup with respect to the number of agents. The theoretical analysis shows that Biased-DMT decouples the effects of network topology from those of data heterogeneity, enabling robust performance even over sparse communication networks. Notably, when the gradient oracle introduces only absolute bias, the proposed method eliminates the structural heterogeneity error and converges to the exact physical error floor. For the case of relative bias, we further characterize the convergence limit and show that the remaining error is an unavoidable physical consequence of locally injected noise. Extensive numerical experiments corroborate our theoretical analysis and demonstrate the practical effectiveness of Biased-DMT across a range of decentralized learning scenarios.

## Source Paper

- **Title**: Improved Convergence for Decentralized Stochastic Optimization with Biased Gradients
- **Authors**: Qing Xu, Yiwei Liao, Wenqi Fan, Xingxing You, Songyi Dian
- **arXiv**: 2604.08236v1
- **Published**: 2026-04-09
- **Primary Category**: math.OC

## Core Concepts

This paper studies decentralized stochastic optimization under biased gradient oracles, with focus areas including:

- Biased-DMT, a decentralized momentum-tracking algorithm that tolerates gradient bias from compression or inexact local oracles
- Nonconvex convergence theory with linear speedup in the number of agents
- Decoupling of network-topology effects from data heterogeneity
- Characterization of the achievable error floor under absolute and relative gradient bias

## Technical Contributions

1. **Novel Approach**: Biased-DMT, a momentum-tracking method designed to operate reliably under biased gradient information
2. **Theoretical Foundation**: Nonconvex convergence guarantees with linear speedup and a precise characterization of the bias-induced error floor
3. **Practical Implementation**: Numerical experiments validating the theory across decentralized learning scenarios

## Applications

- Decentralized and federated machine learning over sparse communication networks
- Distributed optimization with communication compression
- Control systems and multi-agent coordination with inexact local gradient oracles

## Implementation Guidelines

1. Review the source paper for the full algorithm and its assumptions
2. Understand the theoretical framework, in particular the absolute- versus relative-bias regimes
3. Implement the proposed approach
4. Validate with appropriate experiments

A minimal illustrative sketch of steps 3 and 4 appears at the end of this file.

## References

- Qing Xu et al. (2026). "Improved Convergence for Decentralized Stochastic Optimization with Biased Gradients." arXiv:2604.08236v1. https://arxiv.org/abs/2604.08236v1

## Activation Keywords

optimization, decentralized, stochastic
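## Illustrative Sketch

This skill file summarizes Biased-DMT but does not reproduce its pseudocode, so the following is a minimal sketch of a generic decentralized momentum-tracking round under a biased gradient oracle, not the authors' exact method. The function name `dmt_step`, the update order (momentum, tracker, gossip-plus-descent), and the parameters `lr` and `beta` are illustrative assumptions; the sketch assumes a doubly stochastic mixing matrix `W` and a per-agent oracle `grad_fn(i, x)` whose output may carry bias.

```python
import numpy as np

def dmt_step(X, Y, M, grad_fn, W, lr=0.05, beta=0.9):
    """One round of decentralized momentum tracking (illustrative sketch).

    X : (n, d) stacked agent iterates, one row per agent
    Y : (n, d) trackers estimating the network-average momentum
    M : (n, d) per-agent momentum buffers from the previous round
    grad_fn(i, x) : local stochastic gradient oracle, possibly biased
    W : (n, n) doubly stochastic mixing matrix of the communication graph
    """
    n, _ = X.shape
    # Query the (possibly biased) local oracles at the current iterates.
    G = np.stack([grad_fn(i, X[i]) for i in range(n)])
    # Momentum computed on the biased gradients.
    M_new = beta * M + (1.0 - beta) * G
    # Tracking update: gossip the old trackers, then add the local change in
    # momentum; given the initialization Y = M at round 0, this keeps
    # mean(Y_new) equal to mean(M_new) at every round.
    Y_new = W @ Y + (M_new - M)
    # Consensus step on the iterates plus a descent step along the tracker.
    X_new = W @ X - lr * Y_new
    return X_new, Y_new, M_new
```

The tracking correction `M_new - M` is what lets gradient-tracking-style methods follow the network-average descent direction, the mechanism behind decoupling topology effects from data heterogeneity.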
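To validate such an implementation (step 4 of the guidelines above), a small synthetic experiment suffices. The driver below continues the sketch and is again an illustration rather than the paper's experimental setup: heterogeneous least-squares objectives over an 8-agent ring, with a fixed per-agent offset playing the role of an absolute-bias oracle, under which the average gradient norm at the consensus point should settle near the bias-induced floor.

```python
# Usage sketch (continues the block above, which defines dmt_step).
rng = np.random.default_rng(0)
n, d, m = 8, 5, 20
A = rng.normal(size=(n, m, d))            # each agent holds private data,
b = rng.normal(size=(n, m))               # so local objectives are heterogeneous
BIAS = 1e-3 * rng.normal(size=(n, d))     # fixed oracle bias per agent

def grad_fn(i, x):
    g = A[i].T @ (A[i] @ x - b[i]) / m    # exact local least-squares gradient
    return g + BIAS[i]                    # absolute (norm-bounded) bias

# Ring topology: each agent mixes equally with itself and its two neighbours,
# which yields a symmetric, doubly stochastic W.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

X = np.zeros((n, d))
G0 = np.stack([grad_fn(i, X[i]) for i in range(n)])
Y, M = G0.copy(), G0.copy()               # tracking initialization: Y = M = G
for _ in range(1000):
    X, Y, M = dmt_step(X, Y, M, grad_fn, W, lr=0.05, beta=0.9)

x_bar = X.mean(axis=0)
g_bar = sum(A[i].T @ (A[i] @ x_bar - b[i]) / m for i in range(n)) / n
print("consensus gap :", np.linalg.norm(X - x_bar))   # agents should agree
print("avg grad norm :", np.linalg.norm(g_bar))       # near the bias floor
```

Replacing the fixed offset `BIAS[i]` with a multiplicative distortion of `g` would emulate the relative-bias regime, for which the paper characterizes a different convergence limit.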