
Test Design and Risk Assessment

Workflow ID: _bmad/bmm/testarch/test-design
Version: 4.0 (BMad v6)


Overview

Plans a comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow operates in two modes:

  • System-Level Mode (Phase 3): Testability review of architecture before solutioning gate check
  • Epic-Level Mode (Phase 4): Per-epic test planning with risk assessment (current behavior)

The workflow auto-detects which mode to use based on project phase.


Preflight: Detect Mode and Load Context

Critical: Determine mode before proceeding.

Mode Detection

  1. Check for sprint-status.yaml

    • If {output_folder}/bmm-sprint-status.yaml exists → Epic-Level Mode (Phase 4)
    • If NOT exists → Check workflow status
  2. Check workflow-status.yaml

    • Read {output_folder}/bmm-workflow-status.yaml
    • If implementation-readiness: required or implementation-readiness: recommended → System-Level Mode (Phase 3)
    • Otherwise → Epic-Level Mode (Phase 4 without sprint status yet)
  3. Mode-Specific Requirements

System-Level Mode (Phase 3 - Testability Review):

  • ✅ Architecture document exists (architecture.md or tech-spec)
  • ✅ PRD exists with functional and non-functional requirements
  • ✅ Epics documented (epics.md)
  • ⚠️ Output: {output_folder}/test-design-system.md

Epic-Level Mode (Phase 4 - Per-Epic Planning):

  • ✅ Story markdown with acceptance criteria available
  • ✅ PRD or epic documentation exists for context
  • ✅ Architecture documents available (optional but recommended)
  • ✅ Requirements are clear and testable
  • ⚠️ Output: {output_folder}/test-design-epic-{epic_num}.md

Halt Condition: If the mode cannot be determined or required files are missing, HALT and notify the user of the missing prerequisites.
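The detection sequence above reduces to a small decision function. A minimal sketch, assuming a Node environment with js-yaml available; the status-file schema shown is an assumption based on the fields named above:

```typescript
import * as fs from "fs";
import * as yaml from "js-yaml";

type Mode = "epic-level" | "system-level";

function detectMode(outputFolder: string): Mode {
  // 1. Sprint status present → Epic-Level Mode (Phase 4)
  if (fs.existsSync(`${outputFolder}/bmm-sprint-status.yaml`)) {
    return "epic-level";
  }
  // 2. Otherwise inspect the workflow status file
  const statusPath = `${outputFolder}/bmm-workflow-status.yaml`;
  if (fs.existsSync(statusPath)) {
    const status = yaml.load(fs.readFileSync(statusPath, "utf8")) as {
      "implementation-readiness"?: string;
    };
    const readiness = status?.["implementation-readiness"];
    if (readiness === "required" || readiness === "recommended") {
      return "system-level"; // Phase 3 testability review
    }
  }
  // 3. Default: Epic-Level Mode (Phase 4 without sprint status yet)
  return "epic-level";
}
```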


Step 1: Load Context (Mode-Aware)

Mode-Specific Loading:

System-Level Mode (Phase 3)

  1. Read Architecture Documentation

    • Load architecture.md or tech-spec (REQUIRED)
    • Load PRD.md for functional and non-functional requirements
    • Load epics.md for feature scope
    • Identify technology stack decisions (frameworks, databases, deployment targets)
    • Note integration points and external system dependencies
    • Extract NFR requirements (performance SLOs, security requirements, etc.)
  2. Check Playwright Utils Flag

Read {config_source} and check config.tea_use_playwright_utils.

If true, note that @seontechnologies/playwright-utils provides utilities for test implementation; reference it in the test design where relevant.

  3. Load Knowledge Base Fragments (System-Level)

Critical: Consult {project-root}/_bmad/bmm/testarch/tea-index.csv to load:

  • nfr-criteria.md - NFR validation approach (security, performance, reliability, maintainability)
  • test-levels-framework.md - Test levels strategy guidance
  • risk-governance.md - Testability risk identification
  • test-quality.md - Quality standards and Definition of Done
  4. Analyze Existing Test Setup (if brownfield)
    • Search for existing test directories
    • Identify current test framework (if any)
    • Note testability concerns in existing codebase

Epic-Level Mode (Phase 4)

  1. Read Requirements Documentation

    • Load PRD.md for high-level product requirements
    • Read epics.md or specific epic for feature scope
    • Read story markdown for detailed acceptance criteria
    • Identify all testable requirements
  2. Load Architecture Context

    • Read architecture.md for system design
    • Read tech-spec for implementation details
    • Read test-design-system.md (if exists from Phase 3)
    • Identify technical constraints and dependencies
    • Note integration points and external systems
  3. Analyze Existing Test Coverage

    • Search for existing test files in {test_dir}
    • Identify coverage gaps
    • Note areas with insufficient testing
    • Check for flaky or outdated tests
  4. Load Knowledge Base Fragments (Epic-Level)

Critical: Consult {project-root}/_bmad/bmm/testarch/tea-index.csv to load:

  • risk-governance.md - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
  • probability-impact.md - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
  • test-levels-framework.md - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
  • test-priorities-matrix.md - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)

Halt Condition (Epic-Level only): If story data or acceptance criteria are missing, check whether brownfield exploration is possible. If neither requirements nor exploration is available, HALT with the message: "Epic-level test design requires clear requirements, acceptance criteria, or a brownfield app URL for exploration"


Step 1.5: System-Level Testability Review (Phase 3 Only)

Skip this step if Epic-Level Mode. This step only executes in System-Level Mode.

Actions

  1. Review Architecture for Testability

Evaluate architecture against these criteria:

Controllability:

  • Can we control system state for testing? (API seeding, factories, database reset)
  • Are external dependencies mockable? (interfaces, dependency injection)
  • Can we trigger error conditions? (chaos engineering, fault injection)

Observability:

  • Can we inspect system state? (logging, metrics, traces)
  • Are test results deterministic? (no race conditions, clear success/failure)
  • Can we validate NFRs? (performance metrics, security audit logs)

Reliability:

  • Are tests isolated? (parallel-safe, stateless, cleanup discipline)
  • Can we reproduce failures? (deterministic waits, HAR capture, seed data)
  • Are components loosely coupled? (mockable, testable boundaries)
  2. Identify Architecturally Significant Requirements (ASRs)

From PRD NFRs and architecture decisions, identify quality requirements that:

  • Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
  • Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
  • Require special test environments (e.g., "Multi-region deployment" → regional test instances)

Score each ASR using risk matrix (probability × impact).

  3. Define Test Levels Strategy

Based on architecture (mobile, web, API, microservices, monolith):

  • Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
  • Identify test environment needs (local, staging, ephemeral, production-like)
  • Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)
  4. Assess NFR Testing Approach

For each NFR category:

  • Security: Auth/authz tests, OWASP validation, secret handling (Playwright E2E + security tools)
  • Performance: Load/stress/spike testing with k6, SLO/SLA thresholds
  • Reliability: Error handling, retries, circuit breakers, health checks (Playwright + API tests)
  • Maintainability: Coverage targets, code quality gates, observability validation
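For the Performance category, a threshold-driven k6 script is the typical shape: the thresholds make the SLO itself the pass/fail criterion. A minimal sketch (the endpoint, load profile, and SLO values are placeholders, not requirements from this document):

```typescript
// k6 load-test sketch; k6 scripts run on its own JavaScript/TypeScript runtime.
import http from "k6/http";
import { check } from "k6";

export const options = {
  vus: 50,          // concurrent virtual users (placeholder)
  duration: "1m",   // sustained load window (placeholder)
  thresholds: {
    http_req_duration: ["p(95)<1000"], // fail the run if p95 exceeds the SLO
    http_req_failed: ["rate<0.01"],    // require <1% request errors
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/health"); // hypothetical endpoint
  check(res, { "status is 200": (r) => r.status === 200 });
}
```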
  5. Flag Testability Concerns

Identify architecture decisions that harm testability:

  • ❌ Tight coupling (no interfaces, hard dependencies)
  • ❌ No dependency injection (can't mock external services)
  • ❌ Hardcoded configurations (can't test different envs)
  • ❌ Missing observability (can't validate NFRs)
  • ❌ Stateful designs (can't parallelize tests)

Critical: If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
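As a concrete illustration of the coupling and injection concerns above, a seam like the following keeps an external dependency mockable; all names are illustrative:

```typescript
// Testability via dependency injection: the service depends on an interface,
// so tests can substitute a controllable fake (controllability + isolation).
interface PaymentGateway {
  charge(amountCents: number): Promise<{ ok: boolean }>;
}

class CheckoutService {
  constructor(private readonly gateway: PaymentGateway) {}

  async checkout(amountCents: number): Promise<"confirmed" | "declined"> {
    const result = await this.gateway.charge(amountCents);
    return result.ok ? "confirmed" : "declined";
  }
}

// In a test, inject a fake to trigger the error path on demand:
// no network, deterministic outcome.
const failingGateway: PaymentGateway = {
  charge: async () => ({ ok: false }),
};
const service = new CheckoutService(failingGateway);
// await service.checkout(500) → "declined"
```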

  6. Output System-Level Test Design

Write to {output_folder}/test-design-system.md containing:

   # System-Level Test Design

   ## Testability Assessment

   - Controllability: [PASS/CONCERNS/FAIL with details]
   - Observability: [PASS/CONCERNS/FAIL with details]
   - Reliability: [PASS/CONCERNS/FAIL with details]

   ## Architecturally Significant Requirements (ASRs)

   [Risk-scored quality requirements]

   ## Test Levels Strategy

   - Unit: [X%] - [Rationale]
   - Integration: [Y%] - [Rationale]
   - E2E: [Z%] - [Rationale]

   ## NFR Testing Approach

   - Security: [Approach with tools]
   - Performance: [Approach with tools]
   - Reliability: [Approach with tools]
   - Maintainability: [Approach with tools]

   ## Test Environment Requirements

   [Infrastructure needs based on deployment architecture]

   ## Testability Concerns (if any)

   [Blockers or concerns that should inform solutioning gate check]

   ## Recommendations for Sprint 0

   [Specific actions for *framework and *ci workflows]

After System-Level Mode: Skip to Step 4 (Generate Deliverables) - Steps 2-3 are epic-level only.


Step 1.6: Exploratory Mode Selection (Epic-Level Only)

Actions

  1. Detect Planning Mode

Determine mode based on context:

Requirements-Based Mode (DEFAULT):

  • Have clear story/PRD with acceptance criteria
  • Uses: Existing workflow (Steps 2-4)
  • Appropriate for: Documented features, greenfield projects

Exploratory Mode (OPTIONAL - Brownfield):

  • Missing/incomplete requirements AND brownfield application exists
  • Uses: UI exploration to discover functionality
  • Appropriate for: Undocumented brownfield apps, legacy systems
  2. Requirements-Based Mode (DEFAULT - Skip to Step 2)

If requirements are clear:

  • Continue with existing workflow (Step 2: Assess and Classify Risks)
  • Use loaded requirements from Step 1
  • Proceed with risk assessment based on documented requirements
  3. Exploratory Mode (OPTIONAL - Brownfield Apps)

If exploring brownfield application:

A. Check MCP Availability

If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:

  • Use MCP-assisted exploration (Step 3.B)

If MCP unavailable OR config.tea_use_mcp_enhancements is false:

  • Use manual exploration fallback (Step 3.C)

B. MCP-Assisted Exploration (If MCP Tools Available)

Use Playwright MCP browser tools to explore UI:

Setup:

   1. Use planner_setup_page to initialize browser
   2. Navigate to {exploration_url}
   3. Capture initial state with browser_snapshot

Exploration Process:

   4. Use browser_navigate to explore different pages
   5. Use browser_click to interact with buttons, links, forms
   6. Use browser_hover to reveal hidden menus/tooltips
   7. Capture browser_snapshot at each significant state
   8. Take browser_screenshot for documentation
   9. Monitor browser_console_messages for JavaScript errors
   10. Track browser_network_requests to identify API calls
   11. Map user flows and interactive elements
   12. Document discovered functionality

Discovery Documentation:

  • Create list of discovered features (pages, workflows, forms)
  • Identify user journeys (navigation paths)
  • Map API endpoints (from network requests)
  • Note error states (from console messages)
  • Capture screenshots for visual reference

Convert to Test Scenarios:

  • Transform discoveries into testable requirements
  • Prioritize based on user flow criticality
  • Identify risks from discovered functionality
  • Continue with Step 2 (Assess and Classify Risks) using discovered requirements
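One lightweight way to structure that conversion is to capture discoveries and derived scenarios as typed records; the shapes below are illustrative, not a prescribed schema:

```typescript
// Illustrative shapes for turning exploration output into test scenarios.
interface Discovery {
  kind: "page" | "workflow" | "form" | "api";
  name: string;
  evidence: string[];      // snapshot/screenshot references, network entries
  consoleErrors: string[]; // JavaScript errors observed during exploration
}

interface TestScenario {
  requirement: string; // testable statement derived from the discovery
  criticality: "core-journey" | "secondary" | "edge";
  riskIds: string[];   // links into the Step 2 risk register
}

function toScenario(d: Discovery): TestScenario {
  return {
    requirement: `User can complete "${d.name}" without errors`,
    criticality: d.kind === "workflow" ? "core-journey" : "secondary",
    riskIds: [], // populated during Step 2 risk assessment
  };
}
```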

C. Manual Exploration Fallback (If MCP Unavailable)

If Playwright MCP is not available:

Notify User:

   Exploratory mode enabled but Playwright MCP unavailable.

   **Manual exploration required:**

   1. Open application at: {exploration_url}
   2. Explore all pages, workflows, and features
   3. Document findings in markdown:
      - List of pages/features discovered
      - User journeys identified
      - API endpoints observed (DevTools Network tab)
      - JavaScript errors noted (DevTools Console)
      - Critical workflows mapped

   4. Provide exploration findings to continue workflow

   **Alternative:** Disable exploratory_mode and provide requirements documentation

Wait for user to provide exploration findings, then:

  • Parse user-provided discovery documentation
  • Convert to testable requirements
  • Continue with Step 2 (risk assessment)
  4. Proceed to Risk Assessment

After mode selection (Requirements-Based OR Exploratory):

  • Continue to Step 2: Assess and Classify Risks
  • Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)

Step 2: Assess and Classify Risks

Actions

  1. Identify Genuine Risks

Filter requirements to isolate actual risks (not just features):

  • Unresolved technical gaps
  • Security vulnerabilities
  • Performance bottlenecks
  • Data loss or corruption potential
  • Business impact failures
  • Operational deployment issues
  2. Classify Risks by Category

Use these standard risk categories:

TECH (Technical/Architecture):

  • Architecture flaws
  • Integration failures
  • Scalability issues
  • Technical debt

SEC (Security):

  • Missing access controls
  • Authentication bypass
  • Data exposure
  • Injection vulnerabilities

PERF (Performance):

  • SLA violations
  • Response time degradation
  • Resource exhaustion
  • Scalability limits

DATA (Data Integrity):

  • Data loss
  • Data corruption
  • Inconsistent state
  • Migration failures

BUS (Business Impact):

  • User experience degradation
  • Business logic errors
  • Revenue impact
  • Compliance violations

OPS (Operations):

  • Deployment failures
  • Configuration errors
  • Monitoring gaps
  • Rollback issues
  3. Score Risk Probability

Rate likelihood (1-3):

  • 1 (Unlikely): <10% chance, edge case
  • 2 (Possible): 10-50% chance, known scenario
  • 3 (Likely): >50% chance, common occurrence
  4. Score Risk Impact

Rate severity (1-3):

  • 1 (Minor): Cosmetic, workaround exists, limited users
  • 2 (Degraded): Feature impaired, workaround difficult, affects many users
  • 3 (Critical): System failure, data loss, no workaround, blocks usage
  5. Calculate Risk Score

    Risk Score = Probability × Impact
    
    Scores:
    1-2: Low risk (monitor)
    3-4: Medium risk (plan mitigation)
    6-9: High risk (immediate mitigation required)
    
  6. Highlight High-Priority Risks

Flag all risks with score ≥6 for immediate attention.

  7. Request Clarification

If evidence is missing or assumptions required:

  • Document assumptions clearly
  • Request user clarification
  • Do NOT speculate on business impact
  8. Plan Mitigations

For each high-priority risk:

  • Define mitigation strategy
  • Assign owner (dev, QA, ops)
  • Set timeline
  • Update residual risk expectation
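Taken together, the rules in this step reduce to a small scoring model. A sketch using the 1-3 scales and thresholds defined above (the Risk shape is illustrative):

```typescript
// Risk scoring sketch implementing the probability × impact model above.
type Category = "TECH" | "SEC" | "PERF" | "DATA" | "BUS" | "OPS";
type Rating = 1 | 2 | 3;

interface Risk {
  id: string; // e.g., "R-001"
  category: Category;
  description: string;
  probability: Rating;
  impact: Rating;
  mitigation?: string;
  owner?: "dev" | "QA" | "ops";
}

function score(risk: Risk): number {
  return risk.probability * risk.impact; // possible values: 1, 2, 3, 4, 6, 9
}

function classify(risk: Risk): "low" | "medium" | "high" {
  const s = score(risk);
  if (s >= 6) return "high";   // immediate mitigation required
  if (s >= 3) return "medium"; // plan mitigation
  return "low";                // monitor
}

// Example: possible (2) × critical (3) = 6 → high, flagged for immediate attention.
const authBypass: Risk = {
  id: "R-001",
  category: "SEC",
  description: "Auth bypass",
  probability: 2,
  impact: 3,
  mitigation: "Add authz check",
  owner: "dev",
};
console.log(classify(authBypass)); // "high"
```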

Step 3: Design Test Coverage

Actions

  1. Break Down Acceptance Criteria

Convert each acceptance criterion into atomic test scenarios:

  • One scenario per testable behavior
  • Scenarios are independent
  • Scenarios are repeatable
  • Scenarios tie back to risk mitigations
  2. Select Appropriate Test Levels

Knowledge Base Reference: test-levels-framework.md

Map requirements to optimal test levels (avoid duplication):

E2E (End-to-End):

  • Critical user journeys
  • Multi-system integration
  • Production-like environment
  • Highest confidence, slowest execution

API (Integration):

  • Service contracts
  • Business logic validation
  • Fast feedback
  • Good for complex scenarios

Component:

  • UI component behavior
  • Interaction testing
  • Visual regression
  • Fast, isolated

Unit:

  • Business logic
  • Edge cases
  • Error handling
  • Fastest, most granular

Avoid duplicate coverage: Don't test the same behavior at multiple levels unless necessary.

  3. Assign Priority Levels

Knowledge Base Reference: test-priorities-matrix.md

P0 (Critical):

  • Blocks core user journey
  • High-risk areas (score ≥6)
  • Revenue-impacting
  • Security-critical
  • Run on every commit

P1 (High):

  • Important user features
  • Medium-risk areas (score 3-4)
  • Common workflows
  • Run on PR to main

P2 (Medium):

  • Secondary features
  • Low-risk areas (score 1-2)
  • Edge cases
  • Run nightly or weekly

P3 (Low):

  • Nice-to-have
  • Exploratory
  • Performance benchmarks
  • Run on-demand
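The criteria above can be approximated as a mapping from risk score and journey criticality to a priority tag. A simplified sketch; the knowledge-base matrix (test-priorities-matrix.md) weighs more factors and remains the authority:

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

function assignPriority(opts: {
  riskScore: number; // probability × impact from Step 2; 0 if no linked risk
  blocksCoreJourney: boolean;
  securityOrRevenueCritical: boolean;
}): Priority {
  if (opts.riskScore >= 6 || opts.securityOrRevenueCritical) return "P0"; // every commit
  if (opts.riskScore >= 3 || opts.blocksCoreJourney) return "P1";         // PR to main
  if (opts.riskScore >= 1) return "P2";                                   // nightly/weekly
  return "P3";                                                            // on-demand
}
```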
  4. Outline Data and Tooling Prerequisites

For each test scenario, identify:

  • Test data requirements (factories, fixtures)
  • External services (mocks, stubs)
  • Environment setup
  • Tools and dependencies
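Data prerequisites of this kind are typically packaged as fixtures with automatic cleanup (see fixture-architecture.md). A minimal Playwright fixture sketch; the seeding endpoint and data shapes are hypothetical:

```typescript
// Playwright fixture sketch: seed data before the test, clean up after,
// even when the test fails. Assumes baseURL is set in playwright.config.
import { test as base } from "@playwright/test";

interface SeededUser {
  id: string;
  email: string;
}

export const test = base.extend<{ seededUser: SeededUser }>({
  seededUser: async ({ request }, use) => {
    // Setup: create test data through a (hypothetical) seeding API
    const res = await request.post("/api/test-data/users", {
      data: { email: `user-${Date.now()}@example.test` },
    });
    const user = (await res.json()) as SeededUser;

    await use(user); // run the test with the seeded user

    // Auto-cleanup: remove the data regardless of test outcome
    await request.delete(`/api/test-data/users/${user.id}`);
  },
});
```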
  5. Define Execution Order

Recommend test execution sequence:

  1. Smoke tests (P0 subset, <5 min)
  2. P0 tests (critical paths, <10 min)
  3. P1 tests (important features, <30 min)
  4. P2/P3 tests (full regression, <60 min)
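With priority tags in test titles, this sequence maps onto staged runs. A sketch using Playwright project filters; the @smoke/@p0 tag names are assumptions from the tagging strategy, not fixed by this document:

```typescript
// playwright.config.ts sketch: one project per execution stage.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "smoke", grep: /@smoke/ },       // P0 subset, target <5 min
    { name: "p0", grep: /@p0/ },             // critical paths, <10 min
    { name: "p1", grep: /@p1/ },             // important features, <30 min
    { name: "regression", grep: /@p2|@p3/ }, // full regression, <60 min
  ],
});
// Run a single stage with: npx playwright test --project=smoke
```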

Step 4: Generate Deliverables

Actions

  1. Create Risk Assessment Matrix

Use template structure:

   | Risk ID | Category | Description | Probability | Impact | Score | Mitigation      |
   | ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
   | R-001   | SEC      | Auth bypass | 2           | 3      | 6     | Add authz check |
  2. Create Coverage Matrix

    | Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
    | ----------- | ---------- | -------- | --------- | ---------- | ----- |
    | Login flow  | E2E        | P0       | R-001     | 3          | QA    |
    
  3. Document Execution Order

    ### Smoke Tests (<5 min)
    
    - Login successful
    - Dashboard loads
    
    ### P0 Tests (<10 min)
    
    - [Full P0 list]
    
    ### P1 Tests (<30 min)
    
    - [Full P1 list]
    
  4. Include Resource Estimates

    ### Test Effort Estimates
    
    - P0 scenarios: 15 tests × 2 hours = 30 hours
    - P1 scenarios: 25 tests × 1 hour = 25 hours
    - P2 scenarios: 40 tests × 0.5 hour = 20 hours
    - **Total:** 75 hours (~10 days)
    
  5. Add Gate Criteria

    ### Quality Gate Criteria
    
    - All P0 tests pass (100%)
    - P1 tests pass rate ≥95%
    - No high-risk (score ≥6) items unmitigated
    - Test coverage ≥80% for critical paths
    
  6. Write to Output File

Save to {output_folder}/test-design-epic-{epic_num}.md using template structure.
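Because the gate criteria are numeric, they can be checked mechanically. A sketch of such an evaluation; which failed criterion maps to FAIL versus CONCERNS is an assumption, aligned with the PASS/CONCERNS/FAIL vocabulary used earlier in this document:

```typescript
// Quality-gate evaluation sketch for the criteria above.
interface GateInput {
  p0PassRate: number;           // 0..1
  p1PassRate: number;           // 0..1
  unmitigatedHighRisks: number; // risks with score >= 6 and no mitigation
  criticalPathCoverage: number; // 0..1
}

function evaluateGate(g: GateInput): "PASS" | "CONCERNS" | "FAIL" {
  if (g.p0PassRate < 1 || g.unmitigatedHighRisks > 0) return "FAIL";
  if (g.p1PassRate < 0.95 || g.criticalPathCoverage < 0.8) return "CONCERNS";
  return "PASS";
}
```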


Important Notes

Risk Category Definitions

TECH (Technical/Architecture):

  • Architecture flaws or technical debt
  • Integration complexity
  • Scalability concerns

SEC (Security):

  • Missing security controls
  • Authentication/authorization gaps
  • Data exposure risks

PERF (Performance):

  • SLA risk or performance degradation
  • Resource constraints
  • Scalability bottlenecks

DATA (Data Integrity):

  • Data loss or corruption potential
  • State consistency issues
  • Migration risks

BUS (Business Impact):

  • User experience harm
  • Business logic errors
  • Revenue or compliance impact

OPS (Operations):

  • Deployment or runtime failures
  • Configuration issues
  • Monitoring/observability gaps

Risk Scoring Methodology

Probability × Impact = Risk Score

Examples:

  • High likelihood (3) × Critical impact (3) = Score 9 (highest priority)
  • Possible (2) × Critical (3) = Score 6 (high priority threshold)
  • Unlikely (1) × Minor (1) = Score 1 (low priority)

Threshold: Scores ≥6 require immediate mitigation.

Test Level Selection Strategy

Avoid duplication:

  • Don't test the same behavior at both the E2E and API levels
  • Use E2E for critical paths only
  • Use API tests for complex business logic
  • Use unit tests for edge cases

Tradeoffs:

  • E2E: High confidence, slow execution, brittle
  • API: Good balance, fast, stable
  • Unit: Fastest feedback, narrow scope

Priority Assignment Guidelines

P0 criteria (all must be true):

  • Blocks core functionality
  • High-risk (score ≥6)
  • No workaround exists
  • Affects majority of users

P1 criteria:

  • Important feature
  • Medium risk (score 3-4)
  • Workaround exists but difficult

P2/P3: Everything else, prioritized by value

Knowledge Base Integration

Core Fragments (Auto-loaded in Step 1):

  • risk-governance.md - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
  • probability-impact.md - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
  • test-levels-framework.md - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
  • test-priorities-matrix.md - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)

Reference for Test Planning:

  • selective-testing.md - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
  • fixture-architecture.md - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)

Manual Reference (Optional):

  • Use tea-index.csv to find additional specialized fragments as needed

Evidence-Based Assessment

Critical principle: Base risk assessment on evidence, not speculation.

Evidence sources:

  • PRD and user research
  • Architecture documentation
  • Historical bug data
  • User feedback
  • Security audit results

Avoid:

  • Guessing business impact
  • Assuming user behavior
  • Inventing requirements

When uncertain: Document assumptions and request clarification from user.


Output Summary

After completing this workflow, provide a summary:

## Test Design Complete

**Epic**: {epic_num}
**Scope**: {design_level}

**Risk Assessment**:

- Total risks identified: {count}
- High-priority risks (≥6): {high_count}
- Categories: {categories}

**Coverage Plan**:

- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)

**Test Levels**:

- E2E: {e2e_count}
- API: {api_count}
- Component: {component_count}
- Unit: {unit_count}

**Quality Gate Criteria**:

- P0 pass rate: 100%
- P1 pass rate: ≥95%
- High-risk mitigations: 100%
- Coverage: ≥80%

**Output File**: {output_file}

**Next Steps**:

1. Review risk assessment with team
2. Prioritize mitigation for high-risk items (score ≥6)
3. Run `*atdd` to generate failing tests for P0 scenarios (separate workflow; not auto-run by `*test-design`)
4. Allocate resources per effort estimates
5. Set up test data factories and fixtures

Validation

After completing all steps, verify:

  • Risk assessment complete with all categories
  • All risks scored (probability × impact)
  • High-priority risks (≥6) flagged
  • Coverage matrix maps requirements to test levels
  • Priority levels assigned (P0-P3)
  • Execution order defined
  • Resource estimates provided
  • Quality gate criteria defined
  • Output file created and formatted correctly

Refer to checklist.md for comprehensive validation criteria.