---
namespace: aiwg
name: acquire
platforms: [all]
description: Download media from discovered sources with format selection and progress tracking
commandHint:
argumentHint: --plan <sources.yaml> | --url <URL> [--format audio|video|best] [--output <dir>] [--parallel N]
allowedTools: Bash, Read, Write, Glob, Grep
model: sonnet
category: media-curator
---
# /acquire
Download media from discovered sources with intelligent format selection, parallel execution, and comprehensive progress tracking.
## Purpose
The `/acquire` command orchestrates media downloads from various sources (YouTube, Internet Archive, Bandcamp, direct links) using the Acquisition Manager agent. It handles:
- Single URL or batch downloads from source plans
- Automatic format selection based on content type
- Parallel download execution with resource limits
- Real-time progress tracking and status reporting
- Error recovery with retry strategies
- Quality verification and metadata recording
- Network mount awareness for optimal performance
## Parameters
### Required (one of)
**`--plan <sources.yaml>`**
- Source plan file generated by `/find-sources` command
- YAML format containing URLs, metadata, and acquisition strategy
- Example: `.curator/sources/plan-001.yaml`
**`--url <URL>`**
- Single URL for one-off downloads
- Supports: YouTube, Internet Archive, Bandcamp, SoundCloud, Vimeo, direct links
- Example: `https://youtube.com/watch?v=abc123`
### Optional
**`--format <audio|video|best>`**
- Format preference for downloads
- `audio`: Extract audio only (Opus 128K default)
- `video`: Best video up to 1080p with audio
- `best`: Let agent decide based on content type (default)
- Example: `--format audio`
**`--output <directory>`**
- Base output directory for downloads
- Default: `./downloads/<timestamp>`
- Structure: `<output>/<artist>/<era>/audio/` and `/video/`
- Example: `--output /mnt/archive/music`
**`--parallel <N>`**
- Maximum concurrent downloads
- Range: 1-5 (default: 3)
- Lower for slow networks, higher for fast connections
- Example: `--parallel 5`
**`--local-work <directory>`**
- Local working directory (required for network mount outputs)
- Downloads happen here first, then copied to output
- Default: `/tmp/curator-work-$$`
- Example: `--local-work /fast-local-disk/tmp`
**`--verify-after`**
- Enable post-download verification
- Checks file integrity, size, media validity
- Adds ~5% overhead but prevents corruption
- Example: `--verify-after`
**`--extract-audio`**
- Automatically extract audio from downloaded videos
- Creates parallel audio files in audio/ directory
- Uses Opus 128K by default
- Example: `--extract-audio`
**`--resume <session-id>`**
- Resume interrupted download session
- Loads state from `.curator/sessions/<session-id>/state.json`
- Skips completed downloads, retries failed
- Example: `--resume 20260214-143022`
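The flag set above maps onto a small `while`/`case` parser. A minimal sketch follows; variable names like `PLAN_FILE` and `PARALLEL` mirror the workflow snippets later in this document but are otherwise illustrative, and defaults match the documented ones.

```shell
# Illustrative parser for the /acquire flags documented above.
FORMAT="best"; PARALLEL=3; OUTPUT_DIR=""; PLAN_FILE=""; SINGLE_URL=""
LOCAL_WORK=""; VERIFY_AFTER="false"; EXTRACT_AUDIO="false"; RESUME_ID=""

parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --plan)          PLAN_FILE="$2"; shift 2 ;;
      --url)           SINGLE_URL="$2"; shift 2 ;;
      --format)        FORMAT="$2"; shift 2 ;;
      --output)        OUTPUT_DIR="$2"; shift 2 ;;
      --parallel)      PARALLEL="$2"; shift 2 ;;
      --local-work)    LOCAL_WORK="$2"; shift 2 ;;
      --verify-after)  VERIFY_AFTER="true"; shift ;;
      --extract-audio) EXTRACT_AUDIO="true"; shift ;;
      --resume)        RESUME_ID="$2"; shift 2 ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
  done
  # Enforce "one of --plan / --url" unless resuming a session
  if [[ -z "$RESUME_ID" && -z "$PLAN_FILE" && -z "$SINGLE_URL" ]]; then
    echo "ERROR: provide --plan or --url" >&2; return 1
  fi
  # Enforce the documented 1-5 parallelism range
  if [[ "$PARALLEL" -lt 1 || "$PARALLEL" -gt 5 ]]; then
    echo "ERROR: --parallel must be 1-5" >&2; return 1
  fi
}
```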
## Usage Examples
### Single URL Download
```bash
# Download single YouTube video (audio only)
/acquire --url "https://youtube.com/watch?v=abc123" --format audio
# Download single video with metadata
/acquire --url "https://youtube.com/watch?v=abc123" \
--format video \
--output /mnt/archive/concerts \
--verify-after
```
### Batch Download from Plan
```bash
# Download entire source plan (generated by /find-sources)
/acquire --plan .curator/sources/plan-001.yaml
# Download to network mount (uses local working dir)
/acquire --plan .curator/sources/plan-001.yaml \
--output /mnt/network-storage/archive \
--local-work /fast-ssd/curator-work \
--parallel 3
```
### Audio Extraction Workflow
```bash
# Download videos and auto-extract audio
/acquire --plan .curator/sources/plan-001.yaml \
--format video \
--extract-audio \
--verify-after
# Result structure:
# downloads/artist/era/video/concert.mkv
# downloads/artist/era/audio/concert.opus
```
### Resume Interrupted Session
```bash
# Resume after network failure
/acquire --resume 20260214-143022
# Resume with different output (move completed files)
/acquire --resume 20260214-143022 \
--output /new/location
```
## Workflow
### 1. Initialization
```bash
# Validate parameters
# - Check --plan exists OR --url provided
# - Verify output directory is writable
# - Check required tools (yt-dlp, wget, curl, ffmpeg)
# - Test network mount performance if applicable
# - Create session directory structure
# Session setup
SESSION_ID=$(date +%Y%m%d-%H%M%S)
SESSION_DIR=".curator/sessions/$SESSION_ID"
mkdir -p "$SESSION_DIR"/{logs,metadata}
# Initialize state file
cat > "$SESSION_DIR/state.json" <<EOF
{
  "session_id": "$SESSION_ID",
  "started_at": "$(date -Iseconds)",
  "status": "initializing",
  "parameters": {
    "plan": "$PLAN_FILE",
    "format": "$FORMAT",
    "output": "$OUTPUT_DIR",
    "parallel": $PARALLEL
  },
  "downloads": []
}
EOF
```
### 2. Source Validation
```bash
# If --plan provided, parse YAML
if [[ -n "$PLAN_FILE" ]]; then
  # Extract URLs and metadata
  SOURCES=$(yq eval '.sources[] | .url' "$PLAN_FILE")
  TOTAL_SOURCES=$(echo "$SOURCES" | wc -l)

  # Validate each URL is accessible
  while IFS= read -r url; do
    if yt-dlp --dump-json "$url" >/dev/null 2>&1; then
      echo "VALID: $url"
    else
      echo "WARNING: Cannot access $url"
    fi
  done <<< "$SOURCES"
fi

# If --url provided, validate single URL
if [[ -n "$SINGLE_URL" ]]; then
  if ! yt-dlp --dump-json "$SINGLE_URL" >/dev/null 2>&1; then
    echo "ERROR: Cannot access URL: $SINGLE_URL"
    exit 1
  fi
fi
```
### 3. Directory Structure Creation
```bash
create_acquisition_structure() {
  local output_base="$1"
  local artist="$2"
  local era="$3"

  # Sanitize directory names
  local safe_artist=$(echo "$artist" | sed 's/[^a-zA-Z0-9_-]/_/g')
  local safe_era=$(echo "$era" | sed 's/[^a-zA-Z0-9_-]/_/g')

  # Create directory tree
  local audio_dir="$output_base/$safe_artist/$safe_era/audio"
  local video_dir="$output_base/$safe_artist/$safe_era/video"
  mkdir -p "$audio_dir/.curator"
  mkdir -p "$video_dir/.curator"
  mkdir -p "$output_base/$safe_artist/.curator"

  # Write artist metadata
  cat > "$output_base/$safe_artist/.curator/artist-info.json" <<EOF
{
  "name": "$artist",
  "era": "$era",
  "created_at": "$(date -Iseconds)",
  "session_id": "$SESSION_ID"
}
EOF
  echo "$audio_dir:$video_dir"
}
```
### 4. Launch Downloads
```bash
# Download orchestration with concurrency control
ACTIVE_DOWNLOADS=()
MAX_CONCURRENT=${PARALLEL:-3}

launch_download() {
  local url="$1"
  local output_dir="$2"
  local format="$3"
  local download_id="dl-$SESSION_ID-$(date +%s)"

  # Wait for slot availability
  while [[ ${#ACTIVE_DOWNLOADS[@]} -ge $MAX_CONCURRENT ]]; do
    check_completed_downloads
    sleep 2
  done

  # Determine target directory (audio vs video)
  local target_dir
  if [[ "$format" == "audio" || "$format" == "bestaudio" ]]; then
    target_dir="$output_dir/audio"
  else
    target_dir="$output_dir/video"
  fi

  # Launch download in background
  (
    download_with_retry "$url" "$target_dir" "$format" "$download_id"
    echo "COMPLETED:$download_id:$?" >> "$SESSION_DIR/completion.log"
  ) &
  local pid=$!
  ACTIVE_DOWNLOADS+=("$download_id:$pid")

  # Update state file
  update_state_file "add" "$download_id" "$url" "in_progress"
}

download_with_retry() {
  local url="$1"
  local target_dir="$2"
  local format="$3"
  local download_id="$4"
  local attempt=0
  local max_attempts=3

  while [[ $attempt -lt $max_attempts ]]; do
    attempt=$((attempt + 1))
    echo "[$download_id] Attempt $attempt/$max_attempts"

    # Test yt-dlp's exit status via PIPESTATUS: `if yt-dlp ... | tee`
    # would test tee's status instead, which is almost always 0
    yt-dlp -f "$format" \
      --output "$target_dir/%(title)s.%(ext)s" \
      --write-info-json \
      --write-thumbnail \
      --no-playlist \
      "$url" \
      2>&1 | tee "$SESSION_DIR/logs/$download_id.log"
    if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
      update_state_file "complete" "$download_id" "" "completed"
      return 0
    fi

    # Check error type
    local error_type=$(classify_error "$SESSION_DIR/logs/$download_id.log")
    if [[ "$error_type" == "disk_full" || "$error_type" == "permission_denied" ]]; then
      echo "FATAL: $error_type - aborting all downloads"
      pkill -P $$  # Kill all child processes of the main shell
      exit 2
    fi

    if [[ $attempt -lt $max_attempts ]]; then
      local wait_time=$((5 * attempt))
      echo "Waiting ${wait_time}s before retry..."
      sleep "$wait_time"
    fi
  done

  update_state_file "fail" "$download_id" "" "failed"
  return 1
}

check_completed_downloads() {
  local new_active=()
  for entry in "${ACTIVE_DOWNLOADS[@]}"; do
    local pid="${entry##*:}"
    if kill -0 "$pid" 2>/dev/null; then
      new_active+=("$entry")
    fi
  done
  ACTIVE_DOWNLOADS=("${new_active[@]}")
}
```
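The orchestration above calls an `update_state_file` helper that is not defined in this skill. A minimal jq-based sketch follows; the download record fields (`id`, `url`, `status`) are illustrative, matching only what step 5's queries read. Because the workers run in background subshells, a production version should serialize writes (e.g. with `flock`); that locking is omitted here for brevity.

```shell
# Hypothetical sketch of the update_state_file helper used above.
# Assumes $SESSION_DIR/state.json has the layout created in step 1.
update_state_file() {
  local action="$1" download_id="$2" url="$3" status="$4"
  local state_file="$SESSION_DIR/state.json"
  case "$action" in
    add)
      # Append a new download record
      jq --arg id "$download_id" --arg url "$url" --arg st "$status" \
        '.downloads += [{id: $id, url: $url, status: $st}]' \
        "$state_file" > "$state_file.tmp" ;;
    complete|fail)
      # Flip the status of an existing record
      jq --arg id "$download_id" --arg st "$status" \
        '(.downloads[] | select(.id == $id)).status = $st' \
        "$state_file" > "$state_file.tmp" ;;
  esac
  mv "$state_file.tmp" "$state_file"
}
```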
### 5. Track Progress
```bash
# Real-time progress monitoring
monitor_progress() {
  local session_dir="$1"
  while true; do
    # Load current state
    local state_file="$session_dir/state.json"
    local total=$(jq '.downloads | length' "$state_file")
    local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
    local in_progress=$(jq '[.downloads[] | select(.status == "in_progress")] | length' "$state_file")
    local failed=$(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")

    # Clear screen and display status
    clear
    cat <<EOF
ACQUISITION PROGRESS
====================
Session: $SESSION_ID
Started: $(jq -r '.started_at' "$state_file")
Status: $completed/$total completed ($failed failed, $in_progress in progress)

Active Downloads:
EOF
    # Show active download details
    jq -r '.downloads[] | select(.status == "in_progress") | "  [\(.progress_percent)%] \(.filename) @ \(.speed_mbps)MB/s (ETA: \(.eta_seconds)s)"' "$state_file"

    # Exit if all done
    if [[ $((completed + failed)) -eq $total ]]; then
      echo ""
      echo "All downloads completed."
      break
    fi
    sleep 5
  done
}

# Run in background
monitor_progress "$SESSION_DIR" &
MONITOR_PID=$!
```
### 6. Extract Audio (if requested)
```bash
if [[ "$EXTRACT_AUDIO" == "true" ]]; then
  echo "Extracting audio from video files..."
  # Plain variables are used here: `local` is only valid inside functions
  find "$OUTPUT_DIR" -type d -name "video" | while read -r video_dir; do
    audio_dir="${video_dir%/video}/audio"
    find "$video_dir" -type f \( -name "*.mkv" -o -name "*.mp4" -o -name "*.webm" \) | while read -r video_file; do
      base=$(basename "$video_file" | sed 's/\.[^.]*$//')
      audio_file="$audio_dir/${base}.opus"
      echo "  $video_file -> $audio_file"
      ffmpeg -i "$video_file" \
        -vn \
        -acodec libopus \
        -b:a 128K \
        "$audio_file" \
        2>&1 | tee -a "$SESSION_DIR/logs/audio-extraction.log"
      if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
        echo "SUCCESS: $audio_file"
      else
        echo "FAILED: $video_file"
      fi
    done
  done
fi
```
### 7. Verify Downloads (if requested)
```bash
if [[ "$VERIFY_AFTER" == "true" ]]; then
  echo "Verifying downloaded files..."
  verification_results="$SESSION_DIR/verification.json"
  echo '{"verified": [], "failed": []}' > "$verification_results"

  find "$OUTPUT_DIR" -type f \( -name "*.opus" -o -name "*.mkv" -o -name "*.mp4" -o -name "*.flac" \) | while read -r file; do
    echo "Verifying: $file"

    # Check file size (GNU stat, with BSD stat fallback)
    size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
    if [[ $size -eq 0 ]]; then
      echo "FAILED: Zero-byte file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    # Check media integrity with ffmpeg: with -v error, any stderr
    # output indicates a decode problem
    if [[ -n $(ffmpeg -v error -i "$file" -f null - 2>&1) ]]; then
      echo "FAILED: Corrupted media file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    echo "VERIFIED: $file ($size bytes)"
    jq --arg f "$file" '.verified += [$f]' "$verification_results" > "$verification_results.tmp"
    mv "$verification_results.tmp" "$verification_results"
  done

  verified_count=$(jq '.verified | length' "$verification_results")
  failed_count=$(jq '.failed | length' "$verification_results")
  echo ""
  echo "Verification complete: $verified_count verified, $failed_count failed"
fi
```
### 8. Copy to Network Mount (if applicable)
```bash
if [[ -n "$LOCAL_WORK" && "$OUTPUT_DIR" != "$LOCAL_WORK"* ]]; then
  echo "Copying from local work directory to network mount..."

  # Batch copy with progress
  rsync -av --progress \
    --exclude='.DS_Store' \
    --exclude='Thumbs.db' \
    "$LOCAL_WORK/" "$OUTPUT_DIR/"

  # Verify checksums
  echo "Verifying network copy..."
  (cd "$LOCAL_WORK" && find . -type f -exec sha256sum {} \; | sort) > /tmp/local-checksums
  (cd "$OUTPUT_DIR" && find . -type f -exec sha256sum {} \; | sort) > /tmp/remote-checksums

  if diff /tmp/local-checksums /tmp/remote-checksums; then
    echo "Verification SUCCESS - removing local working copy"
    rm -rf "$LOCAL_WORK"
  else
    echo "WARNING: Checksum mismatch - keeping local copy at $LOCAL_WORK"
  fi
fi
```
## Progress Display Format
Real-time console output during acquisition:
```
ACQUISITION PROGRESS
====================
Session: 20260214-143022
Started: 2026-02-14T14:30:22Z
Status: 5/12 completed (1 failed, 3 in progress)
Active Downloads:
  [67%] concert-1.mkv @ 12.5MB/s (ETA: 245s)
  [23%] album.flac @ 8.2MB/s (ETA: 680s)
  [91%] podcast-ep5.opus @ 2.1MB/s (ETA: 45s)

Recent Completions:
  [✓] documentary.mp4 (1.2GB, 8m 34s)
  [✓] live-set.opus (85MB, 1m 12s)

Recent Failures:
  [✗] unavailable-video.mkv (Video unavailable - marked for alternate source)

Total Downloaded: 4.8GB / ~12.5GB estimated
Average Speed: 8.9MB/s
Estimated Completion: 14:58 (28 minutes remaining)
```
## Error Handling
### Common Errors and Responses
| Error | Detection | Response | User Action Required |
|-------|-----------|----------|---------------------|
| **URL inaccessible** | Pre-download validation fails | Skip URL, mark for manual review | Check URL, update source plan |
| **Network timeout** | Download stalls for 60s | Retry with exponential backoff (3x) | Check network connection |
| **Rate limited** | HTTP 429 response | Wait 60s, retry (2x) | Reduce parallel count |
| **Disk full** | Write fails with ENOSPC | STOP all downloads, escalate | Free disk space |
| **Format unavailable** | yt-dlp format error | Try fallback formats | Accept lower quality or skip |
| **Corrupted download** | ffmpeg verification fails | Delete and retry (2x) | Report to source |
| **Permission denied** | Write fails with EACCES | STOP, escalate | Fix permissions |
| **Mount failure** | Network mount timeout | Fall back to local-only mode | Check mount health |
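The retry loop in step 4 calls a `classify_error` helper that is never defined in this skill. A sketch that maps the table's error rows onto log patterns follows; the grep patterns are guesses to be tuned against real yt-dlp/wget output, not documented strings.

```shell
# Hypothetical sketch of the classify_error helper referenced in
# download_with_retry. Reads a per-download log and prints one of the
# error-type labels used by the table above.
classify_error() {
  local log_file="$1"
  if grep -qiE 'no space left|ENOSPC' "$log_file"; then
    echo "disk_full"
  elif grep -qiE 'permission denied|EACCES' "$log_file"; then
    echo "permission_denied"
  elif grep -qiE 'HTTP Error 429|too many requests' "$log_file"; then
    echo "rate_limited"
  elif grep -qiE 'timed? ?out|connection reset' "$log_file"; then
    echo "network_timeout"
  elif grep -qiE 'requested format.*not available' "$log_file"; then
    echo "format_unavailable"
  elif grep -qiE 'video unavailable|HTTP Error 404' "$log_file"; then
    echo "url_inaccessible"
  else
    echo "unknown"
  fi
}
```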
### Error Log Structure
```bash
# Per-download error log: .curator/sessions/<session-id>/logs/<download-id>.log
[2026-02-14 14:35:22] Starting download: https://youtube.com/watch?v=abc123
[2026-02-14 14:35:23] Format selected: bestvideo[height<=1080]+bestaudio
[2026-02-14 14:37:45] ERROR: HTTP Error 429: Too Many Requests
[2026-02-14 14:37:45] Classified as: rate_limited
[2026-02-14 14:37:45] Applying retry strategy: wait 60s (attempt 1/2)
[2026-02-14 14:38:45] Retrying download...
[2026-02-14 14:42:10] Download completed: concert-1.mkv (1.2GB)
```
### Session Failure Recovery
```bash
# Detect stale sessions and offer recovery
detect_incomplete_sessions() {
  find .curator/sessions -name "state.json" -mtime -7 | while read -r state_file; do
    local status=$(jq -r '.status' "$state_file")
    local session_id=$(jq -r '.session_id' "$state_file")
    if [[ "$status" == "in_progress" ]]; then
      local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
      local total=$(jq '.downloads | length' "$state_file")
      echo "Incomplete session detected: $session_id ($completed/$total completed)"
      echo "Resume with: /acquire --resume $session_id"
    fi
  done
}
```
## Report Generation
### Final Session Report
```bash
generate_acquisition_report() {
  local session_dir="$1"
  local state_file="$session_dir/state.json"
  local report_file="$session_dir/report.md"

  cat > "$report_file" <<EOF
# Acquisition Session Report

**Session ID**: $(jq -r '.session_id' "$state_file")
**Started**: $(jq -r '.started_at' "$state_file")
**Completed**: $(date -Iseconds)
**Duration**: $(calculate_duration "$(jq -r '.started_at' "$state_file")" "$(date -Iseconds)")

## Summary
- **Total Downloads**: $(jq '.downloads | length' "$state_file")
- **Completed**: $(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
- **Failed**: $(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")
- **Total Size**: $(jq '[.downloads[] | select(.status == "completed") | .filesize_bytes] | (add // 0) / 1073741824' "$state_file") GB

## Successful Downloads
$(jq -r '.downloads[] | select(.status == "completed") | "- \(.filename) (\(.filesize_bytes | tonumber / 1048576 | floor)MB)"' "$state_file")

## Failed Downloads
$(jq -r '.downloads[] | select(.status == "failed") | "- \(.url)\n  Error: \(.error)"' "$state_file")

## Parameters
- **Source Plan**: $(jq -r '.parameters.plan // "N/A"' "$state_file")
- **Format Preference**: $(jq -r '.parameters.format' "$state_file")
- **Output Directory**: $(jq -r '.parameters.output' "$state_file")
- **Parallel Downloads**: $(jq -r '.parameters.parallel' "$state_file")

## File Locations
- **Session Directory**: $session_dir
- **Download Logs**: $session_dir/logs/
- **Metadata**: $session_dir/metadata/
- **Verification Results**: $session_dir/verification.json
EOF

  echo "Report generated: $report_file"
  cat "$report_file"
}
```
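The report above calls a `calculate_duration` helper that is not defined elsewhere in this skill. A minimal sketch, assuming GNU `date` (already relied on for `date -Iseconds`); on macOS, coreutils' `gdate` or `date -j -f` would be needed instead:

```shell
# Hypothetical sketch: human-readable duration between two ISO-8601
# timestamps, as interpolated into the report's Duration field.
calculate_duration() {
  local start_ts end_ts secs
  start_ts=$(date -d "$1" +%s)   # GNU date parses ISO-8601 via -d
  end_ts=$(date -d "$2" +%s)
  secs=$((end_ts - start_ts))
  printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```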
### Metadata Export
```bash
# Export metadata for external tools
export_metadata() {
local session_dir="$1"
local export_file="$session_dir/metadata-export.json"
jq '{
session_id: .session_id,
started_at: .started_at,
downloads: [
.downloads[] | select(.status == "completed") | {
url: .url,
filename: .filename,
format: .format,
filesize_bytes: .filesize_bytes,
duration_seconds: .duration_seconds,
checksum_sha256: .checksum_sha256
}
]
}' "$session_dir/state.json" > "$export_file"
echo "Metadata exported: $export_file"
}
```
## Integration with Other Commands
### Typical Workflow
```bash
# 1. Discover sources
/find-sources --artist "Pink Floyd" --era "1970s" --sources youtube,archive
# 2. Acquire media from discovered sources
/acquire --plan .curator/sources/plan-001.yaml \
--format video \
--extract-audio \
--verify-after \
--output /mnt/archive/pink-floyd
# 3. Extract metadata (runs automatically or manually)
/extract-metadata --source /mnt/archive/pink-floyd/The_Wall/audio
# 4. Organize final collection
/organize --source /mnt/archive/pink-floyd
```
## Performance Considerations
### Network Mount Optimization
- **Download to local first**: Always use `--local-work` when output is network mount
- **Batch copy**: Single rsync after all downloads complete
- **Verify checksums**: Before deleting local copy
- **Concurrent limit**: Reduce `--parallel` for network mounts (recommend 2)
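The "test network mount performance" check from step 1 can be a simple timed write. A rough sketch, assuming GNU `date +%s%N` for nanosecond timing; the 64 MB probe size is an arbitrary choice, not a documented value:

```shell
# Illustrative throughput probe for an output directory.
# Prints approximate sequential write speed in MB/s.
probe_mount_speed() {
  local mount_dir="$1" test_file start_ns end_ns mb_per_s
  test_file="$mount_dir/.curator-speedtest-$$"
  start_ns=$(date +%s%N)
  # fsync forces the data to actually reach the mount before timing stops
  dd if=/dev/zero of="$test_file" bs=1M count=64 conv=fsync 2>/dev/null || return 1
  end_ns=$(date +%s%N)
  rm -f "$test_file"
  (( end_ns > start_ns )) || end_ns=$((start_ns + 1))  # avoid divide-by-zero
  # 64 MB over elapsed nanoseconds -> MB/s
  mb_per_s=$(( 64 * 1000000000 / (end_ns - start_ns) ))
  echo "$mb_per_s"
}
```

A result under roughly 10 MB/s is a reasonable cue to drop `--parallel` to 2 and route downloads through `--local-work`.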
### Disk Space Management
- **Pre-flight check**: Estimate total size from source plan
- **Monitor during**: Watch for disk space warnings
- **Cleanup strategy**: Remove verified local copies after network copy
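The pre-flight check above can be reduced to a `df` comparison once a size estimate is in hand. The sketch below assumes the needed-MB figure is supplied by the caller; if plan entries carried a hypothetical `estimated_size_mb` field, it could come from `yq eval '[.sources[].estimated_size_mb // 0] | add' "$PLAN_FILE"` (that field name is an assumption, not part of the documented plan schema).

```shell
# Illustrative pre-flight disk check: fail unless the output filesystem
# has the needed space plus a 20% safety margin.
check_free_space() {
  local output_dir="$1" needed_mb="$2" avail_mb
  # df -Pk: POSIX output, available 1K blocks in column 4 of the data row
  avail_mb=$(( $(df -Pk "$output_dir" | awk 'NR==2 {print $4}') / 1024 ))
  if (( avail_mb < needed_mb + needed_mb / 5 )); then
    echo "ERROR: need ~${needed_mb}MB (+20%), only ${avail_mb}MB free" >&2
    return 1
  fi
  echo "Disk OK: ${avail_mb}MB free for ~${needed_mb}MB planned"
}
```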
### Network Bandwidth
- **Parallel tuning**: More concurrent downloads for high bandwidth
- **Rate limiting**: Add `--limit-rate` to yt-dlp for shared networks
- **Peak hours**: Schedule large downloads for off-peak times
## Security Considerations
- **URL validation**: Never execute arbitrary commands from URLs
- **Directory traversal**: Sanitize all path components
- **Disk quota**: Respect user/system quotas
- **Network access**: Only connect to explicitly allowed domains
- **Credential handling**: Never log or expose API keys/tokens
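The "only connect to explicitly allowed domains" rule can be enforced with a host allowlist gate before any download is launched. A sketch follows; the allowlist entries are examples, not the skill's actual policy, and the naive URL parsing ignores userinfo (`user@host`) forms.

```shell
# Illustrative host allowlist. Subdomains of each entry are accepted
# (e.g. music.youtube.com, artist.bandcamp.com).
ALLOWED_HOSTS=("youtube.com" "www.youtube.com" "archive.org" "bandcamp.com")

url_host_allowed() {
  local url="$1" host h
  # Strip scheme, then path, then port to isolate the hostname
  host=$(echo "$url" | sed -E 's#^[a-zA-Z]+://##; s#/.*$##; s#:.*$##')
  for h in "${ALLOWED_HOSTS[@]}"; do
    [[ "$host" == "$h" || "$host" == *".$h" ]] && return 0
  done
  return 1
}
```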
## References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before irreversible actions (overwrites, deletions)
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research sources before deciding on acquisition strategy
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used before acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/youtube-acquisition/SKILL.md — yt-dlp patterns; **read the Prerequisites section before any YouTube acquisition** (EJS / PO-token / pip-vs-zipapp gotchas)
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Verify downloaded files after acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Track origin and derivation of acquired media