---
namespace: aiwg
name: regression-baseline
platforms: [all]
description: Create and maintain regression test baselines for comparison and drift detection across versioned snapshots

---

# regression-baseline

Create and maintain regression test baselines for comparison and drift detection.

## Triggers


Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

- "snapshot tests" → baseline creation shorthand
- "capture current behavior" → baseline establishment

## Purpose

This skill manages regression baselines by:
- Capturing known-good system states as baselines
- Storing snapshots of expected outputs
- Maintaining versioned baseline history
- Comparing current state to baseline
- Detecting drift from established baselines
- Managing baseline updates and approvals

## Behavior

When triggered, this skill:

1. **Identifies baseline scope**:
   - Determine what to baseline (tests, outputs, performance)
   - Select baseline type (functional, visual, performance, API)
   - Identify affected components
   - Check for existing baselines

2. **Captures baseline data**:
   - Run tests and capture outputs
   - Take snapshots (visual, data, API responses)
   - Measure performance metrics
   - Record system configuration
   - Generate checksums/hashes

3. **Stores baseline**:
   - Version baseline with semantic versioning
   - Tag with git commit/release
   - Store in `.aiwg/testing/baselines/`
   - Update baseline manifest
   - Link to requirements/features

4. **Validates baseline quality**:
   - Ensure baseline is stable (run multiple times)
   - Verify baseline is reproducible
   - Check for known issues in baseline
   - Require approval for critical baselines

5. **Documents baseline**:
   - Record what is baselined
   - Note when and why baseline was created
   - Document known deviations
   - Link to issues/requirements

6. **Manages baseline lifecycle**:
   - Archive old baselines
   - Update baselines on intentional changes
   - Track baseline drift over time
   - Alert on excessive drift
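
Steps 2 and 4 above (capture outputs with checksums, then confirm stability through repeated runs) could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `checksum`, `capture_baseline`, and the `run_suite` callable are hypothetical names, and the default of three runs simply mirrors the "run 3 times" step in the usage example further down.

```python
import hashlib
import json
from typing import Any, Callable


def checksum(output: Any) -> str:
    """Hash a JSON-serializable output with SHA-256 over a canonical encoding."""
    canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_baseline(run_suite: Callable[[], dict], runs: int = 3) -> dict:
    """Run the suite `runs` times and keep the outputs only if every run agrees.

    `run_suite` is a hypothetical callable returning {test_name: output}.
    """
    first = run_suite()
    expected = {name: checksum(out) for name, out in first.items()}
    for _ in range(runs - 1):
        current = {name: checksum(out) for name, out in run_suite().items()}
        # A test is unstable if its hash changed or it appeared/disappeared.
        unstable = sorted(
            name
            for name in expected.keys() | current.keys()
            if expected.get(name) != current.get(name)
        )
        if unstable:
            raise ValueError(f"unstable outputs, not baselining: {unstable}")
    return {"outputs": first, "checksums": expected}
```

Hashing a canonical encoding (sorted keys, fixed separators) keeps the checksum independent of key order, so only real output changes register as instability.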

## Baseline Types

### Functional Baseline

```yaml
functional_baseline:
  description: Expected behavior outputs
  scope: Unit and integration tests
  format: JSON snapshots

  capture:
    - test_outputs
    - assertions
    - mock_data
    - expected_errors

  example:
    test: "user registration"
    baseline: "baselines/functional/user-registration-v1.json"
    includes:
      - success_response_format
      - validation_error_messages
      - database_state_after_registration
```
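
One way to compare a current output against a stored JSON snapshot is a recursive structural diff. The sketch below is hypothetical (the function name `diff_snapshots` and the `$`-rooted path notation are assumptions); its `allow_new_fields` parameter is modeled on the setting of the same name under Drift Thresholds.

```python
from typing import Any


def diff_snapshots(baseline: Any, current: Any, path: str = "$",
                   allow_new_fields: bool = False) -> list:
    """Return human-readable differences between two JSON-like values."""
    if isinstance(baseline, dict) and isinstance(current, dict):
        diffs = []
        for key in sorted(baseline.keys() | current.keys()):
            child = f"{path}.{key}"
            if key not in current:
                diffs.append(f"{child}: missing from current")
            elif key not in baseline:
                # New fields count as drift unless explicitly allowed.
                if not allow_new_fields:
                    diffs.append(f"{child}: new field")
            else:
                diffs.extend(diff_snapshots(baseline[key], current[key],
                                            child, allow_new_fields))
        return diffs
    if isinstance(baseline, list) and isinstance(current, list):
        if len(baseline) != len(current):
            return [f"{path}: length {len(baseline)} -> {len(current)}"]
        diffs = []
        for i, (b, c) in enumerate(zip(baseline, current)):
            diffs.extend(diff_snapshots(b, c, f"{path}[{i}]", allow_new_fields))
        return diffs
    if baseline != current:
        return [f"{path}: {baseline!r} -> {current!r}"]
    return []


baseline = {"status": 200, "body": {"user": {"id": "uuid"}}}
current = {"status": 200, "body": {"user": {"id": "uuid", "name": "John Doe"}}}
print(diff_snapshots(baseline, current))  # ['$.body.user.name: new field']
```

With `allow_new_fields=True` the same comparison returns an empty list, which is the difference between strict and additive schema matching.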

### Visual Baseline

```yaml
visual_baseline:
  description: UI screenshots for visual regression
  scope: Frontend components
  format: PNG images with metadata

  capture:
    - component_screenshots
    - page_screenshots
    - responsive_breakpoints
    - interaction_states

  example:
    component: "LoginForm"
    baseline: "baselines/visual/login-form-v2.png"
    metadata:
      viewport: 1920x1080
      theme: dark
      state: default
```
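
A visual comparison ultimately reduces to counting how many pixels changed. The sketch below assumes images have already been decoded into rows of pixel tuples (decoding PNG files would need an imaging library, which is left out here); `pixel_diff_ratio` and `within_visual_threshold` are hypothetical names, and the default limit mirrors the `pixel_diff_threshold: 0.1%` setting under Drift Thresholds.

```python
def pixel_diff_ratio(baseline, current):
    """Fraction of pixels that differ between two equally sized pixel grids."""
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        raise ValueError("image dimensions differ")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("empty image")
    changed = sum(
        pa != pb
        for row_a, row_b in zip(baseline, current)
        for pa, pb in zip(row_a, row_b)
    )
    return changed / total


def within_visual_threshold(baseline, current, threshold=0.001):
    """True if the changed-pixel fraction is at most `threshold` (0.001 = 0.1%)."""
    return pixel_diff_ratio(baseline, current) <= threshold
```

A dimension mismatch is raised as an error rather than scored, since a resized screenshot usually signals a capture problem (wrong viewport) and not a visual regression.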

### Performance Baseline

```yaml
performance_baseline:
  description: Performance benchmarks
  scope: API response times, load tests
  format: JSON metrics

  capture:
    - response_times_p50_p95_p99
    - throughput_requests_per_second
    - resource_utilization
    - database_query_times

  example:
    endpoint: "/api/users"
    baseline: "baselines/performance/api-users-v1.json"
    metrics:
      p50_response_time_ms: 45
      p95_response_time_ms: 120
      p99_response_time_ms: 250
      throughput_rps: 1000
```
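
The metrics in the example could be derived from raw latency samples along these lines. This sketch uses the nearest-rank percentile definition, which is one of several common definitions and an assumption here; `percentile` and `summarize_latencies` are hypothetical helper names, while the output keys follow the example above.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latencies(samples_ms, duration_s):
    """Build a metrics dict shaped like the performance baseline example."""
    return {
        "p50_response_time_ms": percentile(samples_ms, 50),
        "p95_response_time_ms": percentile(samples_ms, 95),
        "p99_response_time_ms": percentile(samples_ms, 99),
        "throughput_rps": len(samples_ms) / duration_s,
    }
```

Whichever percentile definition is used, the baseline and every later comparison should use the same one, otherwise the drift figures are not comparable.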

### API Contract Baseline

```yaml
api_baseline:
  description: API request/response contracts
  scope: REST/GraphQL APIs
  format: OpenAPI/JSON schemas

  capture:
    - request_schemas
    - response_schemas
    - status_codes
    - headers

  example:
    endpoint: "POST /api/auth/login"
    baseline: "baselines/api/auth-login-v3.yaml"
    contract:
      request_body_schema: LoginRequest
      success_response: 200 with token
      error_responses: [400, 401, 429]
```

## Baseline Manifest

```yaml
# .aiwg/testing/baselines/manifest.yaml

baselines:
  - id: functional-auth-v1
    type: functional
    created: 2026-01-28T10:00:00Z
    created_by: test-engineer
    git_commit: abc123def
    release: v1.2.0
    scope: Authentication flows
    files:
      - baselines/functional/login-v1.json
      - baselines/functional/logout-v1.json
      - baselines/functional/register-v1.json
    status: active
    approved_by: tech-lead
    notes: Initial baseline for auth module

  - id: visual-dashboard-v2
    type: visual
    created: 2026-01-20T14:30:00Z
    created_by: frontend-engineer
    git_commit: def456ghi
    release: v1.1.0
    scope: Dashboard UI components
    files:
      - baselines/visual/dashboard-desktop-v2.png
      - baselines/visual/dashboard-mobile-v2.png
    status: active
    approved_by: design-lead
    previous_baseline: visual-dashboard-v1
    changes: Updated color scheme per design system v2

  - id: performance-api-v1
    type: performance
    created: 2026-01-15T09:00:00Z
    created_by: devops-engineer
    git_commit: ghi789jkl
    release: v1.0.0
    scope: Core API endpoints
    files:
      - baselines/performance/api-benchmarks-v1.json
    status: active
    approved_by: architect
    notes: Pre-optimization baseline
```
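
Once the manifest is loaded into memory (parsing the YAML file is omitted here), replacing a baseline with a new version might look like the sketch below. The helper names are hypothetical, and the `archived` status value is an assumption: the manifest above only shows `active`.

```python
def supersede(manifest, old_id, new_entry):
    """Return a new manifest with `old_id` archived and `new_entry` appended as active."""
    if not any(entry["id"] == old_id for entry in manifest):
        raise KeyError(old_id)
    updated = [
        {**entry, "status": "archived"} if entry["id"] == old_id else dict(entry)
        for entry in manifest
    ]
    # Link the new version back to the one it replaces, as in visual-dashboard-v2.
    updated.append({**new_entry, "status": "active", "previous_baseline": old_id})
    return updated


def active_baselines(manifest, baseline_type=None):
    """List active entries, optionally filtered by type (functional, visual, ...)."""
    return [
        entry
        for entry in manifest
        if entry.get("status") == "active"
        and (baseline_type is None or entry.get("type") == baseline_type)
    ]
```

Returning a new list instead of mutating the input keeps the previous manifest state available until the update has been approved and written back.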

## Baseline Comparison Report

````markdown
# Baseline Comparison Report

**Date**: 2026-01-28
**Baseline**: functional-auth-v1
**Current State**: HEAD (commit xyz789)
**Comparison Type**: Functional

## Executive Summary

**Status**: ⚠️ Drift Detected
**Changes**: 3 outputs differ from baseline
**Severity**: Medium - Unexpected behavior changes

| Metric | Baseline | Current | Drift |
|--------|----------|---------|-------|
| Tests Passing | 45/45 | 43/45 | -2 |
| Output Matches | 100% | 93.3% | -6.7% |
| New Failures | 0 | 2 | +2 |

## Detailed Comparison

### Test: user-login

**Status**: ⚠️ Drift

**Baseline** (functional-auth-v1):
```json
{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com"
    }
  }
}
```

**Current**:
```json
{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com",
      "name": "John Doe"
    }
  }
}
```

**Analysis**: New `name` field added to response
**Impact**: Breaking change for clients expecting exact schema
**Recommendation**: Update baseline if intentional, or fix if bug

### Test: user-registration-invalid-email

**Status**: ❌ Failure

**Baseline**:
```json
{
  "status": 400,
  "body": {
    "error": "Invalid email format"
  }
}
```

**Current**:
```json
{
  "status": 500,
  "body": {
    "error": "Internal server error"
  }
}
```

**Analysis**: Validation now returns 500 instead of 400
**Impact**: Critical - Server error on invalid input
**Recommendation**: Fix immediately - regression in error handling

### Test: user-logout

**Status**: ✅ Match

Output matches baseline exactly. No drift detected.

## Drift Summary

| Test | Status | Drift Type | Severity |
|------|--------|------------|----------|
| user-login | ⚠️ Drift | Schema change | Medium |
| user-registration | ✅ Match | None | - |
| user-registration-invalid-email | ❌ Fail | Error code | High |
| user-logout | ✅ Match | None | - |
| token-refresh | ✅ Match | None | - |

## Root Cause Analysis

### Likely Causes

1. **user-login drift**: Intentional feature addition (commit abc123)
   - PR #789: "Add user name to login response"
   - Needs baseline update

2. **invalid-email failure**: Unintentional regression
   - Validation middleware broken
   - Returns 500 instead of 400
   - Introduced in commit def456

## Recommendations

### Immediate Actions

- [ ] Fix validation error handling (returns 500 instead of 400)
- [ ] Add test for proper error codes
- [ ] Verify error handling across all validation

### Baseline Updates

- [ ] Update baseline for user-login if name field is intentional
- [ ] Document breaking change in API changelog
- [ ] Notify API consumers of schema change

### Process Improvements

- [ ] Require baseline review for API changes
- [ ] Add baseline diff to PR template
- [ ] Run baseline comparison in CI

## Approval Required

To update baseline with new outputs:

```bash
# Review changes
aiwg baseline compare functional-auth-v1

# Update if intentional
aiwg baseline update functional-auth-v1 \
  --approve-changes user-login \
  --justification "Added name field per REQ-123"

# Or create new baseline version
aiwg baseline create functional-auth-v2 \
  --based-on v1 \
  --changes "Added user name to login response"
```
````

## Usage Examples

### Create New Baseline

```
User: "Create baseline for authentication tests"

Skill executes:
1. Identify auth test suite
2. Run tests and capture outputs
3. Validate stability (run 3 times)
4. Store baseline with version
5. Update manifest

Output:
"Baseline Created: functional-auth-v1

Scope: Authentication tests (45 tests)
Format: JSON snapshots
Location: .aiwg/testing/baselines/functional/auth/

Files:
- login-v1.json
- logout-v1.json
- register-v1.json
- token-refresh-v1.json

Status: Active
Git Commit: abc123
Release: v1.2.0

Next: Baseline will be used for regression detection"
```

### Update Existing Baseline

```
User: "Update baseline for dashboard UI"

Skill checks current vs baseline:
"Comparing to visual-dashboard-v1:
- 5 screenshots differ
- Changes: Color scheme updated

Approve changes?
- dashboard-desktop.png
- dashboard-mobile.png
- widget-panel.png

Reason for update?"

User: "Design system v2 rollout"

Skill creates:
"Baseline Updated: visual-dashboard-v2

Previous: visual-dashboard-v1 (archived)
Changes: Design system v2 colors
Approved by: [current user]
Files updated: 5

New baseline active for regression tests"
```

### Compare to Baseline

```
User: "Compare current state to baseline"

Skill analyzes:
"Baseline Comparison: functional-auth-v1

Status: ⚠️ Drift Detected

Matches: 43/45 tests (95.6%)
Drifts: 2 tests show differences

Drift 1: user-login
  - New field 'name' in response
  - Severity: Medium
  - Likely intentional (commit abc123)

Drift 2: invalid-email handling
  - Returns 500 instead of 400
  - Severity: High
  - Likely regression (needs fix)

See full report: .aiwg/testing/baseline-comparison.md"
```

## Integration

This skill uses:
- `regression-bisect`: Find commits that broke baseline
- `regression-metrics`: Track baseline drift over time
- `test-coverage`: Ensure baselines cover critical paths
- `project-awareness`: Detect test framework and conventions

## Agent Orchestration

```yaml
agents:
  creation:
    agent: test-engineer
    focus: Baseline capture and validation

  analysis:
    agent: test-architect
    focus: Drift analysis and recommendations

  approval:
    agent: tech-lead
    focus: Baseline update approval
```

## Configuration

### Baseline Storage

```yaml
baseline_config:
  storage_path: .aiwg/testing/baselines/
  structure:
    - functional/
    - visual/
    - performance/
    - api/

  versioning: semantic  # v1, v2, v3
  compression: gzip     # for large baselines
  retention: 10         # keep last 10 versions
```
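
The `retention: 10` setting implies deciding which versions stay active and which move to the archive. A minimal sketch, assuming baseline ids end in `-v<number>` as in the manifest examples (`split_for_retention` is a hypothetical name):

```python
import re


def split_for_retention(baseline_ids, retention=10):
    """Split ids into (keep, archive), keeping the `retention` highest versions."""

    def version(baseline_id):
        match = re.search(r"-v(\d+)$", baseline_id)
        if match is None:
            raise ValueError(f"no version suffix in {baseline_id!r}")
        return int(match.group(1))

    # Sort numerically so that v10 ranks above v9 (a plain string sort would not).
    ordered = sorted(baseline_ids, key=version, reverse=True)
    return ordered[:retention], ordered[retention:]
```

For twelve versions of `functional-auth`, this keeps v12 through v3 and returns v2 and v1 as candidates for `.aiwg/testing/baselines/archive/`.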

### Drift Thresholds

```yaml
drift_thresholds:
  functional:
    exact_match_required: true
    allow_new_fields: false  # strict schema matching

  visual:
    pixel_diff_threshold: 0.1%  # 0.1% difference allowed
    ignore_areas: [timestamp, dynamic-content]

  performance:
    p50_tolerance: 10%   # ±10% from baseline
    p95_tolerance: 15%   # ±15% from baseline
    p99_tolerance: 20%   # ±20% from baseline
```
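
The performance tolerances above could be applied as in the following sketch. It assumes the percentage strings are read as-is from the config and that metric names follow the performance baseline example; `parse_percent` and `check_performance` are hypothetical helpers, not part of any documented API.

```python
def parse_percent(text):
    """Convert a string such as '15%' to the fraction 0.15."""
    return float(text.strip().rstrip("%")) / 100


def check_performance(baseline, current, tolerances):
    """Return {metric: relative_drift} for every metric outside its tolerance."""
    violations = {}
    for pct in ("p50", "p95", "p99"):
        metric = f"{pct}_response_time_ms"
        allowed = parse_percent(tolerances[f"{pct}_tolerance"])
        drift = (current[metric] - baseline[metric]) / baseline[metric]
        # Tolerances are symmetric, so unusually fast results are flagged too.
        if abs(drift) > allowed:
            violations[metric] = round(drift, 4)
    return violations


tolerances = {"p50_tolerance": "10%", "p95_tolerance": "15%", "p99_tolerance": "20%"}
baseline = {"p50_response_time_ms": 45, "p95_response_time_ms": 120, "p99_response_time_ms": 250}
current = {"p50_response_time_ms": 48, "p95_response_time_ms": 150, "p99_response_time_ms": 260}
print(check_performance(baseline, current, tolerances))  # {'p95_response_time_ms': 0.25}
```

Here p50 drifts by about 6.7% and p99 by 4%, both inside their limits, while p95 drifts by 25% against a 15% tolerance.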

### Approval Requirements

```yaml
approval_config:
  require_approval_for:
    - baseline_creation: true
    - baseline_updates: true
    - baseline_deletion: true

  approvers:
    - tech-lead
    - architect
    - test-lead

  approval_via:
    - pr_review
    - issue_comment
    - command_flag: --approved-by
```

## Output Locations

- Baselines: `.aiwg/testing/baselines/{type}/`
- Manifest: `.aiwg/testing/baselines/manifest.yaml`
- Comparisons: `.aiwg/testing/baseline-comparisons/`
- Archived: `.aiwg/testing/baselines/archive/`

## References

- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/test/baseline-schema.yaml
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/baseline-create.md
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md