---
name: delta-aware-multi-agent-orchestration
description: "DAOEF framework for scaling multi-agent edge systems beyond 100 agents without synergistic collapse. Three co-designed mechanisms: differential neural caching (delta-aware activation reuse), criticality-based action space pruning (O(n log n) coordination), and learned hardware affinity matching (GPU/CPU/NPU task routing). Activation: multi-agent edge orchestration, synergistic collapse, MADDPG scaling, differential caching, action space pruning, hardware affinity, edge computing latency, vision task scheduling, camera network coordination."
---
# Delta-Aware Multi-Agent Edge Orchestration (DAOEF)
> A co-designed framework that prevents synergistic collapse in multi-agent edge
> deployments (>100 agents) through three mechanisms: delta-aware neural caching,
> criticality-based action space pruning, and hardware affinity matching.
## Metadata
- **Source**: arXiv:2604.20129v1 [cs.LG, cs.DC, cs.PF, cs.SE]
- **Authors**: Samaresh Kumar Singh, Joyjit Roy
- **Published**: 2026-04-22
- **Title**: A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing
## Core Problem: Synergistic Collapse
Scaling multi-agent reinforcement learning (e.g., MADDPG) beyond ~100 agents on
edge infrastructure causes **superlinear performance degradation** — not merely
additive slowdown, but cascading failures where multiple bottlenecks compound.
**Observed failure case** (Smart City, 150 cameras):
- Deadline Satisfaction Rate: 78% → 34% (drop of 44 points)
- Annual cost overruns: ~$180K
### Three Interacting Failure Factors
| Factor | Mechanism | Impact |
|--------|-----------|--------|
| **Exponential action-space growth** | K^N for N agents, K nodes | +85-120ms decision delay per step |
| **Computational redundancy** | 65% similar features in adjacent cameras | 35-42% cache hit with naive caching |
| **Task-agnostic scheduling** | Vision tasks routed to CPUs not GPUs | 2-5x slowdown, compounding across fleet |
These factors are **synergistic**: each amplifies the others. Fixing one in
isolation yields sublinear gains; all three must be co-designed.
## DAOEF: Three Co-Designed Mechanisms
### 1. Differential Neural Caching
**Problem**: Output-level caching misses 65% of reuse opportunities because
adjacent camera frames differ slightly but produce different final outputs.
**Solution**: Cache intermediate layer activations instead. Compute only the
**input delta** (difference between current and cached input), then forward
propagate the delta through remaining layers rather than recomputing from scratch.
**How it works**:
1. Store intermediate activations at a calibrated layer boundary (e.g., after
ResNet block 3, before final classification head)
2. On new input, compute similarity score against cached input (e.g., SSIM or
L2 distance on normalized features)
3. If similarity > threshold θ: compute delta = current_input - cached_input,
forward only the delta through remaining layers
4. If similarity ≤ θ: full forward pass, update cache
**Calibrating the similarity threshold θ**:
- Too high → few cache hits, wasted storage
- Too low → accuracy degradation from accumulated approximation error
- Empirical approach: sweep θ on validation set, plot hit rate vs accuracy loss,
select θ where accuracy loss stays within 2% tolerance
- Typical effective range: θ ∈ [0.85, 0.95] for cosine similarity on feature maps
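The caching loop above can be sketched in a few lines. This is a toy version, not the paper's implementation: it assumes the layers after the cache boundary are linear (so delta propagation is exact), uses cosine similarity on the boundary features, and `DeltaCache` and its fields are illustrative names.

```python
import numpy as np

class DeltaCache:
    """Minimal sketch of differential neural caching. Assumes the
    post-boundary head is a single linear map W, so forwarding only the
    input delta is exact: W @ (x_cached + delta) = W @ x_cached + W @ delta."""

    def __init__(self, head_weights, theta=0.9):
        self.W = head_weights      # remaining (post-boundary) layers, linear
        self.theta = theta         # similarity threshold from calibration
        self.cached_feat = None    # cached boundary activation
        self.cached_out = None     # cached head output

    def forward(self, feat):
        if self.cached_feat is not None:
            # Cosine similarity between current and cached boundary features
            sim = feat @ self.cached_feat / (
                np.linalg.norm(feat) * np.linalg.norm(self.cached_feat) + 1e-9)
            if sim > self.theta:
                # Cache hit: propagate only the input delta through the head
                delta = feat - self.cached_feat
                return self.cached_out + self.W @ delta, True
        # Cache miss: full forward pass, refresh the cache
        self.cached_feat = feat
        self.cached_out = self.W @ feat
        return self.cached_out, False
```

With a nonlinear head (the realistic case), the delta pass is approximate, which is exactly why the <2% accuracy-loss tolerance governs the choice of θ.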
**Expected improvement**: 2.1x higher cache hit ratios (72% vs 35% for
output-level caching) with <2% accuracy loss.
### 2. Criticality-Based Action Space Pruning
**Problem**: Full multi-agent coordination requires O(n²) pairwise interactions.
At 150+ agents, the decision space becomes intractable.
**Solution**: Organize agents into **priority tiers** based on task criticality,
then restrict full coordination to high-criticality agents only. Lower tiers use
simplified or greedy policies.
**Three-tiered priority filtering**:
```
Tier 1 (Critical) → Full MARL coordination with all Tier 1 peers
O(k²) where k << n (typically 5-15% of agents)
Tier 2 (Important) → Coordination with Tier 1 + local greedy optimization
O(k log k) local grouping
Tier 3 (Best-effort) → Pure greedy / rule-based, no inter-agent coordination
O(1) per agent
```
**Complexity reduction**: O(n²) → O(n log n) overall coordination cost.
**Optimality loss**: <6% vs full coordination on standard benchmarks.
**Assigning criticality tiers**:
- **Tier 1**: Safety-critical tasks, SLA-bounded deadlines, high-value cameras
(e.g., traffic intersections, emergency corridors)
- **Tier 2**: Quality-of-service tasks, medium-priority monitoring
(e.g., pedestrian zones, parking lots)
- **Tier 3**: Best-effort analytics, deferred processing acceptable
(e.g., historical traffic analysis, periodic environment checks)
**Implementation pattern**:
```python
from enum import Enum

class Tier(Enum):
    CRITICAL = 1
    IMPORTANT = 2
    BEST_EFFORT = 3

# Criticality assignment (static or dynamic)
def assign_tier(agent, workload_metrics):
    if agent.task.is_safety_critical or agent.sla_deadline < 500:  # deadline in ms
        return Tier.CRITICAL
    elif agent.task.qos_weight > 0.7:
        return Tier.IMPORTANT
    else:
        return Tier.BEST_EFFORT

# Tier-aware coordination (marl_policy, greedy_with_critical, greedy_local,
# and merge_actions are deployment-specific and assumed to exist)
def coordinate(agents):
    critical = [a for a in agents if a.tier == Tier.CRITICAL]
    important = [a for a in agents if a.tier == Tier.IMPORTANT]
    best_effort = [a for a in agents if a.tier == Tier.BEST_EFFORT]

    # Full MARL coordination for critical agents only: O(k^2), k << n
    critical_actions = marl_policy(critical)
    # Important agents see critical decisions + run local greedy optimization
    important_actions = greedy_with_critical(important, critical_actions)
    # Best-effort agents: pure greedy, no inter-agent coordination
    best_effort_actions = greedy_local(best_effort)

    return merge_actions(critical_actions, important_actions, best_effort_actions)
```
**Dynamic tier reassignment**: Re-evaluate tiers periodically (e.g., every 500
steps) based on changing workload conditions. Use hysteresis to prevent
thrashing: require sustained metric change over multiple windows before moving
an agent between tiers.
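The hysteresis rule can be sketched as follows. This is a minimal illustration, assuming each agent carries `pending_tier`/`pending_count` bookkeeping fields (hypothetical names): an agent moves only after the same tier has been proposed for several consecutive evaluation windows.

```python
def update_tier(agent, proposed_tier, window=3):
    """Hysteresis sketch: adopt a proposed tier only after it has been
    proposed `window` times in a row, preventing tier thrashing."""
    if proposed_tier == agent.tier:
        # Proposal matches current tier: reset any pending move
        agent.pending_tier, agent.pending_count = None, 0
        return agent.tier
    if proposed_tier == agent.pending_tier:
        agent.pending_count += 1          # sustained change, keep counting
    else:
        agent.pending_tier, agent.pending_count = proposed_tier, 1
    if agent.pending_count >= window:
        agent.tier = proposed_tier        # change sustained long enough: move
        agent.pending_tier, agent.pending_count = None, 0
    return agent.tier
```

With `window=3` and a 500-step evaluation cadence, a metric change must persist for roughly 1500 steps before an agent actually moves tiers.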
### 3. Learned Hardware Affinity Matching
**Problem**: Task-agnostic scheduling sends compute-intensive vision tasks to
CPUs, causing 2-5x slowdowns. Simple heuristics (e.g., "always GPU for vision")
fail for mixed workloads with heterogeneous accelerators.
**Solution**: Learn a **hardware affinity model** that maps task features to
optimal accelerator type (GPU, CPU, NPU, FPGA) based on historical performance
data.
**Feature space for affinity model**:
- Task type (classification, detection, segmentation, tracking)
- Input resolution and frame rate
- Model architecture (ResNet-50, YOLOv8, ViT, etc.)
- Batch size requirements
- Deadline slack (time remaining until SLA breach)
- Current accelerator utilization levels
**Training approach**:
1. Collect execution traces: (task_features, hardware) → (latency, energy)
2. Train a lightweight classifier (e.g., gradient-boosted trees or small MLP)
to predict optimal accelerator
3. Deploy as a pre-scheduling filter before the MARL decision layer
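As a toy stand-in for the learned classifier, the trace-to-routing idea can be sketched with a lookup table keyed on task features. The paper trains a gradient-boosted-tree or MLP model over richer features; `train_affinity` and `route` below are hypothetical helpers that simply pick the accelerator with the lowest mean observed latency per feature key.

```python
from collections import defaultdict

def train_affinity(traces):
    """Build a feature-key -> accelerator map from execution traces of the
    form (task_features, hardware, latency_ms). A lookup sketch, not the
    paper's learned classifier."""
    stats = defaultdict(lambda: defaultdict(list))
    for features, hardware, latency_ms in traces:
        stats[features][hardware].append(latency_ms)
    return {
        features: min(by_hw, key=lambda hw: sum(by_hw[hw]) / len(by_hw[hw]))
        for features, by_hw in stats.items()
    }

def route(model, features, fallback="GPU"):
    # Pre-scheduling filter: unseen feature keys fall back to a heuristic
    return model.get(features, fallback)
```

The `fallback` argument mirrors the fallback-heuristic advice in the Pitfalls section: a learned model should never be the only routing path.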
**Expected improvement**: Prevents compounding mismatch penalties. Combined with
the other two mechanisms, contributes to the 1.45x multiplicative gain.
## Combined Results
When all three mechanisms are deployed together (not independently):
| Metric | Without DAOEF | With DAOEF | Improvement |
|--------|--------------|------------|-------------|
| Latency (200 agents) | 735ms | 280ms | 62% reduction |
| Latency growth | Superlinear | Sub-linear (up to 250 agents) | Scalability |
| Energy consumption | 117.5 MWh/yr | 44.7 MWh/yr | 62% savings |
| Multiplicative gain | 1.0x (baseline) | 1.45x over independent | Synergy confirmed |
The **1.45x multiplicative gain** is key evidence that the three mechanisms are
genuinely co-designed: their combined effect exceeds the sum of individual gains.
## Implementation Guide
### Prerequisites
- Multi-agent RL framework (e.g., PyMARL, PettingZoo, custom MADDPG)
- Edge cluster with heterogeneous accelerators (GPU, CPU, NPU, FPGA)
- Model serving infrastructure supporting intermediate activation access
(e.g., TorchServe, Triton with custom hooks)
- Telemetry pipeline for collecting execution traces
### Step-by-Step Deployment
#### Phase 1: Instrumentation (1-2 weeks)
1. Add hooks to model serving pipeline to expose intermediate activations
2. Deploy telemetry collection for latency, energy, and cache hit metrics
3. Baseline: measure current performance without any DAOEF mechanism
#### Phase 2: Differential Caching (2-3 weeks)
1. Identify caching layer boundary (experiment with 2-3 layer splits)
2. Implement delta computation: store (input_hash, activation, timestamp)
3. Calibrate similarity threshold θ on validation workload
4. Deploy caching, measure hit rate improvement and accuracy impact
#### Phase 3: Action Space Pruning (2-3 weeks)
1. Classify agents into criticality tiers (start with static assignment)
2. Implement tier-aware coordination in the MARL policy
3. A/B test: compare full coordination vs tiered coordination
4. Add dynamic tier reassignment with hysteresis
#### Phase 4: Hardware Affinity (2-4 weeks)
1. Collect execution traces across all accelerator types
2. Train affinity classifier on historical data
3. Deploy as pre-scheduler before the MARL decision layer
4. Monitor and retrain affinity model as workload patterns shift
### Key Design Decisions
**Caching layer boundary selection**:
- Earlier layer → more reuse opportunity, but larger activation size
- Later layer → smaller activation, but less reuse (more task-specific)
- Recommendation: split after the feature extraction backbone, before task heads
**Tier size ratios** (starting point for 150-agent deployment):
- Tier 1: 10-20 agents (7-13%) — critical coordination group
- Tier 2: 40-60 agents (27-40%) — semi-coordinated group
- Tier 3: remaining agents — independent greedy
**Affinity model retraining cadence**:
- Retrain weekly or when workload drift detected (KL divergence > threshold)
- Maintain a shadow model for safe rollout testing
## Pitfalls
- **Cascade risk from tier misclassification**: If too many agents are assigned
to Tier 3, quality degradation may be unacceptable. Start conservative (larger
Tier 1 and 2) and shrink tiers as you validate performance.
- **Cache staleness in high-churn workloads**: If input distribution shifts
rapidly (e.g., sudden weather change affecting camera feeds), cached
activations become invalid. Implement TTL-based eviction (e.g., 30-60s).
- **Affinity model overfitting**: If trained on narrow workload patterns, the
model may make poor predictions for edge cases. Maintain a fallback heuristic
(e.g., "vision tasks → GPU unless NPU utilization > 90%").
- **Multiplicative gain is not guaranteed**: The 1.45x result depends on all
three mechanisms being correctly implemented and tuned. Deploying only one or
two mechanisms yields sublinear improvements.
- **Threshold calibration drift**: The similarity threshold θ for caching may
need recalibration as models are updated or input distributions shift.
Monitor cache hit rate and accuracy loss continuously.
- **Coordination overhead at tier boundaries**: Agents at the boundary between
Tier 1 and Tier 2 may experience inconsistent policies. Use smooth transitions
(weighted combination) rather than hard cutoffs.
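The TTL-based eviction suggested for the cache-staleness pitfall can be sketched as below. `TTLActivationCache` is an illustrative name; the `now` override exists only to make expiry deterministic in tests.

```python
import time

class TTLActivationCache:
    """Sketch of TTL-based eviction for cached activations: entries older
    than ttl_s are evicted on read rather than served stale."""

    def __init__(self, ttl_s=45.0):
        self.ttl_s = ttl_s   # within the 30-60s range suggested above
        self.store = {}      # key -> (activation, timestamp)

    def put(self, key, activation, now=None):
        self.store[key] = (activation, now if now is not None else time.time())

    def get(self, key, now=None):
        entry = self.store.get(key)
        if entry is None:
            return None
        activation, ts = entry
        if (now if now is not None else time.time()) - ts > self.ttl_s:
            del self.store[key]   # stale entry: evict instead of serving it
            return None
        return activation
```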
## Related Skills
- knowledge-graph-ops
- arxiv-to-skill-research-workflow