Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install jmagly-aiwg-plugins-research-skills-research-qualitygit clone https://github.com/jmagly/aiwg.gitcp aiwg/SKILL.MD ~/.claude/skills/jmagly-aiwg-plugins-research-skills-research-quality/SKILL.md---
namespace: aiwg
name: research-quality
platforms: [all]
description: Assess source quality using GRADE methodology
commandHint:
argumentHint: "[REF-XXX] [--update-frontmatter] [--output yaml|markdown]"
category: research-quality
---
# Research Quality Command
Perform systematic GRADE evidence quality assessment on research sources.
## Instructions
When invoked, perform rigorous quality assessment:
1. **Load Source**
- Accept REF-XXX identifier or file path
- Load PDF and finding document
- Extract frontmatter metadata
- Determine source type and baseline quality
2. **Apply GRADE Framework**
**Baseline Quality** (by source type):
- Systematic review / Meta-analysis → HIGH
- Randomized controlled trial → HIGH
- Cohort study → MODERATE
- Case-control study → MODERATE
- Case series → LOW
- Expert opinion → LOW
- Preprint / Blog post → VERY LOW
**Downgrade Factors** (each -1 level):
- Risk of bias (study design flaws, conflicts of interest)
- Inconsistency (conflicting results across studies)
- Indirectness (different population, indirect comparisons)
- Imprecision (small sample size, wide confidence intervals)
- Publication bias (selective reporting)
**Upgrade Factors** (each +1 level):
- Large effect size (strong effect, dose-response relationship)
- Confounding works against finding (makes result conservative)
- Dose-response gradient present
3. **Calculate Final GRADE**
```
Final GRADE = Baseline + Upgrades - Downgrades
HIGH: Strong confidence, unlikely to change with new evidence
MODERATE: Moderate confidence, may change with new evidence
LOW: Limited confidence, likely to change with new evidence
VERY LOW: Very uncertain, any estimate is very uncertain
```
4. **Generate Assessment Report**
- Document baseline quality
- List all downgrade/upgrade factors with justification
- Calculate final GRADE level
- Provide hedging language recommendations
- Assess AIWG applicability
5. **Save Assessment**
- Save to `.aiwg/research/quality-assessments/REF-XXX-assessment.yaml`
- Update frontmatter in finding document if --update-frontmatter
- Log in quality assessment index
6. **Check Existing Citations**
- If --check-citations, scan corpus for citations of this source
- Flag any violations (overclaiming beyond GRADE level)
- Generate remediation suggestions
## Arguments
- `[ref-id or file-path]` - Source to assess (required)
- `--output [yaml|markdown]` - Output format (default: yaml)
- `--update-frontmatter` - Update finding document frontmatter with assessment
- `--check-citations` - Scan corpus for citation policy violations
- `--interactive` - Interactive assessment with prompts for each factor
## Examples
```bash
# Basic quality assessment
/research-quality REF-022
# Assessment with frontmatter update
/research-quality REF-022 --update-frontmatter
# Interactive assessment
/research-quality REF-022 --interactive
# Assessment with citation check
/research-quality REF-022 --check-citations --output markdown
```
## Expected Output
```
Assessing Quality: REF-022 - AutoGen
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Determining baseline
Source Type: arXiv preprint (later published in conference)
Baseline Quality: MODERATE (conference paper)
Note: Upgraded from VERY LOW due to peer review
Step 2: Applying GRADE framework
Downgrade Factors:
[✓] Risk of Bias: -0 (no significant bias detected)
- Study design appropriate
- No apparent conflicts of interest
- Methodology clearly described
[✓] Inconsistency: -0 (single study, no comparison)
- No conflicting results to evaluate
[!] Indirectness: -0 (directly applicable)
- Population: Software development teams
- Intervention: Multi-agent conversation framework
- Direct relevance to AIWG agent orchestration
[!] Imprecision: -1 (limited evaluation scope)
- Small benchmark dataset
- Limited real-world validation
- No confidence intervals reported
[✓] Publication Bias: -0 (mitigated)
- Open preprint, full methodology disclosed
- Negative results discussed
Upgrade Factors:
[!] Large Effect: +0 (moderate effect size)
- Improvements shown but not exceptionally large
- Effect sizes: 10-30% improvement range
[✓] Dose-Response: +0 (not applicable)
- No dose-response relationship to evaluate
[✓] Confounding: +0 (no clear confounding against)
Step 3: Calculating final GRADE
Baseline: MODERATE
Downgrades: -1 (imprecision)
Upgrades: +0
─────────────────────────
Final GRADE: LOW
Step 4: Generating assessment report
✓ Assessment saved: .aiwg/research/quality-assessments/REF-022-assessment.yaml
✓ Frontmatter updated in finding document
✓ Quality index updated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: LOW
Confidence: Limited confidence in effect estimates
Appropriate Hedging Language:
✓ USE: "Limited evidence suggests...", "Preliminary findings indicate..."
✗ AVOID: "Research demonstrates...", "Evidence proves..."
Rationale:
While AutoGen shows promising multi-agent collaboration patterns,
the evidence base is limited to a single study with small-scale
evaluation. Real-world effectiveness at scale requires further
investigation.
AIWG Applicability:
- Patterns are directly applicable to agent orchestration (HIGH)
- Implementation risk is moderate due to limited production validation
- Recommend: Pilot implementation with monitoring
Next Steps:
1. Monitor for follow-up studies strengthening evidence base
2. Plan validation studies within AIWG context
3. Review citations of REF-022 in corpus: /research-quality REF-022 --check-citations
```
## Assessment YAML Output
```yaml
# .aiwg/research/quality-assessments/REF-022-assessment.yaml
ref_id: REF-022
assessment_date: "2026-02-03"
assessor: "quality-agent"
source_metadata:
title: "AutoGen: Enabling Next-Gen LLM Applications..."
source_type: peer_reviewed_conference
year: 2023
grade_assessment:
baseline: MODERATE
baseline_rationale: "Peer-reviewed conference paper"
downgrade_factors:
- factor: imprecision
severity: -1
rationale: "Limited evaluation scope, small benchmarks"
upgrade_factors: []
final_grade: LOW
confidence_statement: "Limited confidence in effect estimates"
hedging_language:
appropriate:
- "Limited evidence suggests"
- "Preliminary findings indicate"
- "Initial research shows"
inappropriate:
- "Research demonstrates"
- "Evidence proves"
- "Studies conclusively show"
aiwg_applicability:
relevance: HIGH
implementation_risk: MODERATE
recommendation: "Pilot implementation with validation"
citation_guidance:
template: |
Research provides preliminary evidence for [claim]
(@.aiwg/research/findings/REF-022-autogen.md), though
broader validation is needed (GRADE: LOW).
```
## Citation Policy Integration
When --check-citations is used:
```
Checking citations of REF-022 across corpus...
Found 8 citations:
✓ COMPLIANT (5):
- .aiwg/architecture/agent-orchestration-sad.md:142
"Research suggests flexible conversation patterns..."
Hedging: APPROPRIATE for LOW quality
- .aiwg/requirements/UC-174-conversable-agent.md:23
"Evidence indicates multi-agent collaboration is feasible..."
Hedging: APPROPRIATE for LOW quality
✗ VIOLATIONS (3):
- docs/agent-framework.md:78
"Research demonstrates significant improvements..."
Hedging: TOO STRONG for LOW quality
Suggestion: Change to "Limited evidence suggests..."
- .aiwg/architecture/adr-012-agent-protocol.md:45
"Studies prove that conversation patterns enable..."
Hedging: TOO STRONG for LOW quality
Suggestion: Change to "Preliminary findings indicate..."
Remediation script generated:
.aiwg/research/quality-assessments/REF-022-violations.sh
```
## Interactive Mode
When --interactive is used, prompts for each factor:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GRADE Assessment: REF-022 (Interactive Mode)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Baseline Quality: MODERATE (conference paper)
────────────────────────────────────────────────────────────────────
Factor 1: Risk of Bias
────────────────────────────────────────────────────────────────────
Evaluate study design quality, conflicts of interest, methodology clarity.
Downgrade by 1 level? [y/N]: n
Rationale: Study design appropriate, no COI detected
────────────────────────────────────────────────────────────────────
Factor 2: Inconsistency
────────────────────────────────────────────────────────────────────
Evaluate consistency across studies (if multiple).
Downgrade by 1 level? [y/N]: n
Rationale: Single study, no comparison available
[... continues for all factors ...]
```
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/quality-agent.md - Quality Assessment Agent
- @$AIWG_ROOT/src/research/services/quality-service.ts - GRADE implementation
- @.aiwg/research/docs/grade-assessment-guide.md - GRADE methodology
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/quality-dimensions.yaml - Quality schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/citation-policy.md - Hedging language requirements