This workflow performs complete project documentation (Steps 1-12).
Called by: document-project/instructions.md router
Handles: initial_scan and full_rescan modes
DATA LOADING STRATEGY - Understanding the Documentation Requirements System:
Display explanation to user:
How Project Type Detection Works:
This workflow uses a single comprehensive CSV file to intelligently document your project:
documentation-requirements.csv ({documentation_requirements_csv})
When Documentation Requirements are Loaded:
Now loading documentation requirements data for fresh start...
Load documentation-requirements.csv from: {documentation_requirements_csv}
Store all 12 rows indexed by project_type_id for project detection and requirements lookup
Display: "Loaded documentation requirements for 12 project types (web, mobile, backend, cli, library, desktop, game, data, extension, infra, embedded)"
Display: "✓ Documentation requirements loaded successfully. Ready to begin project analysis."
Check if {output_folder}/index.md exists
Read existing index.md to extract metadata (date, project structure, parts count)
Store as {{existing_doc_date}}, {{existing_structure}}
I found existing documentation generated on {{existing_doc_date}}.
What would you like to do?
1. Full rescan - regenerate the documentation from scratch
2. Deep dive - add exhaustive detail on top of the existing documentation
3. Keep the existing documentation and exit
Your choice [1/2/3]:
<action>Set workflow_mode = "full_rescan"</action>
<action>Continue to scan level selection below</action>
<action>Set workflow_mode = "deep_dive"</action>
<action>Set scan_level = "exhaustive"</action>
<action>Initialize state file with mode=deep_dive, scan_level=exhaustive</action>
<action>Jump to Step 13</action>
<action>Display message: "Keeping existing documentation. Exiting workflow."</action>
<action>Exit workflow</action>
Set workflow_mode = "initial_scan" Continue to scan level selection below
Select Scan Level
Choose your scan depth level:
1. Quick Scan (2-5 minutes) [DEFAULT]
2. Deep Scan (10-30 minutes)
3. Exhaustive Scan (30-120 minutes)
Your choice 1/2/3:
<action>Set scan_level = "quick"</action>
<action>Display: "Using Quick Scan (pattern-based, no source file reading)"</action>
<action>Set scan_level = "deep"</action>
<action>Display: "Using Deep Scan (reading critical files per project type)"</action>
<action>Set scan_level = "exhaustive"</action>
<action>Display: "Using Exhaustive Scan (reading all source files)"</action>
Initialize state file: {output_folder}/project-scan-report.json
Every time you touch the state file, record: step id, a human-readable summary of what you actually did, a precise timestamp, and any outputs written. Vague phrases are unacceptable.
Write initial state:
{
  "workflow_version": "1.2.0",
  "timestamps": {"started": "{{current_timestamp}}", "last_updated": "{{current_timestamp}}"},
  "mode": "{{workflow_mode}}",
  "scan_level": "{{scan_level}}",
  "project_root": "{{project_root_path}}",
  "output_folder": "{{output_folder}}",
  "completed_steps": [],
  "current_step": "step_1",
  "findings": {},
  "outputs_generated": ["project-scan-report.json"],
  "resume_instructions": "Starting from step 1"
}
Continue with the standard workflow from Step 1
Ask user: "What is the root directory of the project to document?" (default: current working directory) Store as {{project_root_path}}
Scan {{project_root_path}} for key indicators:
Detect if project is:
List detected parts with their paths
I detected multiple parts in this project:
{{detected_parts_list}}
Is this correct? Should I document each part separately? [y/n]
Set repository_type = "monorepo" or "multi-part" For each detected part: - Identify root path - Run project type detection using key_file_patterns from documentation-requirements.csv - Store as part in project_parts array
Ask user to specify correct parts and their paths
Set repository_type = "monolith" Create single part in project_parts array with root_path = {{project_root_path}} Run project type detection using key_file_patterns from documentation-requirements.csv
For each part, match detected technologies and file patterns against the key_file_patterns column in documentation-requirements.csv
Assign a project_type_id to each part
Load the corresponding documentation-requirements row for each part
I've classified this project: {{project_classification_summary}}
Does this look correct? [y/n/edit]
project_structure
project_parts_metadata
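As a minimal sketch (field names assumed, not prescribed elsewhere in this workflow), the stored structure metadata could take this shape, matching the client/server example used in Step 5:
```json
{
  "repository_type": "monorepo",
  "project_parts": [
    { "part_id": "client", "root_path": "client/", "project_type_id": "web" },
    { "part_id": "api", "root_path": "server/", "project_type_id": "backend" }
  ]
}
```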
IMMEDIATELY update state file with step completion:
PURGE detailed scan results from memory, keep only summary: "{{repository_type}}, {{parts_count}} parts, {{primary_tech}}"
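A completion entry satisfying the logging rule from the state file initialization might look like this, with the summary following the "{{repository_type}}, {{parts_count}} parts, {{primary_tech}}" shape above (whether completed_steps holds strings or objects is an assumption; adapt to the schema actually in use):
```json
{
  "completed_steps": [
    {
      "step": "step_1",
      "summary": "monorepo, 2 parts, React + Express",
      "timestamp": "{{current_timestamp}}",
      "outputs": []
    }
  ],
  "current_step": "step_2"
}
```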
For each part, scan for existing documentation using patterns:
Create inventory of existing_docs with:
I found these existing documentation files: {{existing_docs_list}}
Are there any other important documents or key areas I should focus on while analyzing this project? [Provide paths or guidance, or type 'none']
Store user guidance as {{user_context}}
existing_documentation_inventory
user_provided_context
Update state file:
PURGE detailed doc contents from memory, keep only: "{{existing_docs_count}} docs found"
For each part in project_parts:
Determine architecture pattern based on detected tech stack:
technology_stack
architecture_patterns
Update state file:
PURGE detailed tech analysis from memory, keep only: "{{framework}} on {{language}}"
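Before the purge, the per-part findings behind that summary might look like this sketch (field names are assumptions), where the retained summary follows the "{{framework}} on {{language}}" shape:
```json
{
  "part_id": "client",
  "language": "TypeScript",
  "framework": "React",
  "architecture_pattern": "component-based SPA",
  "summary": "React on TypeScript"
}
```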
BATCHING STRATEGY FOR DEEP/EXHAUSTIVE SCANS
This step requires file reading. Apply batching strategy:
Identify subfolders to process based on:
- scan_level == "deep": Use critical_directories from documentation_requirements
- scan_level == "exhaustive": Get ALL subfolders recursively (excluding node_modules, .git, dist, build, coverage)
For each subfolder to scan:
1. Read all files in the subfolder (consider file size - use judgment for files >5000 LOC)
2. Extract required information based on the conditional flags below
3. IMMEDIATELY write findings to the appropriate output file
4. Validate the written document (section-level validation)
5. Update state file with batch completion
6. PURGE detailed findings from context, keep only a 1-2 sentence summary
7. Move to the next subfolder
Track batches in the state file:
findings.batches_completed: [
  { "path": "{{subfolder_path}}", "files_scanned": {{count}}, "summary": "{{brief_summary}}" }
]
Use pattern matching only - do NOT read source files
Use glob/grep to identify file locations and patterns
Extract information from filenames, directory structure, and config files only
For each part, check documentation_requirements boolean flags and execute corresponding scans:
Scan for API routes and endpoints using integration_scan_patterns
Look for: controllers/, routes/, api/, handlers/, endpoints/
<action>Use glob to find route files, extract patterns from filenames and folder structure</action>
<action>Read files in batches (one subfolder at a time)</action>
<action>Extract: HTTP methods, paths, request/response types from actual code</action>
Build the API contracts catalog
IMMEDIATELY write to: {output_folder}/api-contracts-{part_id}.md
Validate the document has all required sections
Update state file with output generated
PURGE detailed API data, keep only: "{{api_count}} endpoints documented"
api_contracts_{part_id}
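One plausible shape for a catalog entry before it is rendered into api-contracts-{part_id}.md (field names and values are illustrative assumptions):
```json
{
  "method": "GET",
  "path": "/api/users/:id",
  "request": { "params": { "id": "string" } },
  "responses": { "200": "User", "404": "NotFound" },
  "auth_required": true,
  "source_file": "server/src/routes/users.ts"
}
```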
Scan for data models using schema_migration_patterns
Look for: models/, schemas/, entities/, migrations/, prisma/, ORM configs
<action>Identify schema files via glob, parse migration file names for table discovery</action>
<action>Read model files in batches (one subfolder at a time)</action>
<action>Extract: table names, fields, relationships, constraints from actual code</action>
Build the database schema documentation
IMMEDIATELY write to: {output_folder}/data-models-{part_id}.md
Validate document completeness
Update state file with output generated
PURGE detailed schema data, keep only: "{{table_count}} tables documented"
data_models_{part_id}
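A hedged sketch of one schema entry feeding data-models-{part_id}.md (names and structure are assumptions):
```json
{
  "table": "users",
  "fields": [
    { "name": "id", "type": "uuid", "constraints": ["primary key"] },
    { "name": "email", "type": "varchar(255)", "constraints": ["unique", "not null"] }
  ],
  "relationships": [
    { "type": "hasMany", "target": "orders", "foreign_key": "user_id" }
  ]
}
```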
Analyze state management patterns
Look for: Redux, Context API, MobX, Vuex, Pinia, Provider patterns
Identify: stores, reducers, actions, state structure
state_management_patterns_{part_id}
Inventory the UI component library
Scan: components/, ui/, widgets/, views/ folders
Categorize: Layout, Form, Display, Navigation, etc.
Identify: Design system, component patterns, reusable elements
ui_component_inventory_{part_id}
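An inventory record for a single component might be captured like this (a sketch; the category mirrors the list above, field names are assumed):
```json
{
  "category": "Form",
  "component": "DatePicker",
  "path": "client/src/components/forms/DatePicker.tsx",
  "reusable": true
}
```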
Look for hardware schematics using hardware_interface_patterns
This appears to be an embedded/hardware project. Do you have:
If yes, please provide paths or links. [Provide paths or type 'none']
Store hardware docs references
hardware_documentation_{part_id}
Scan and catalog assets using asset_patterns
Categorize by: Images, Audio, 3D Models, Sprites, Textures, etc.
Calculate: Total size, file counts, formats used
asset_inventory_{part_id}
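The calculated asset totals could be summarized in a structure like this (illustrative values and field names):
```json
{
  "categories": {
    "images": { "count": 142, "total_size": "18.4 MB", "formats": ["png", "svg"] },
    "audio": { "count": 12, "total_size": "6.1 MB", "formats": ["ogg"] }
  },
  "total_files": 154,
  "total_size": "24.5 MB"
}
```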
Scan for additional patterns based on doc requirements:
Apply scan_level strategy to each pattern scan (quick=glob only, deep/exhaustive=read files)
comprehensive_analysis_{part_id}
Update state file:
PURGE all detailed scan results from context. Keep only the brief one-line summaries already recorded in the state file.
For each part, generate complete directory tree using critical_directories from doc requirements
Annotate the tree with:
Show how parts are organized and where they interface
Create formatted source tree with descriptions:
project-root/
├── client/ # React frontend (Part: client)
│ ├── src/
│ │ ├── components/ # Reusable UI components
│ │ ├── pages/ # Route-based pages
│ │ └── api/ # API client layer → Calls server/
├── server/ # Express API backend (Part: api)
│ ├── src/
│ │ ├── routes/ # REST API endpoints
│ │ ├── models/ # Database models
│ │ └── services/ # Business logic
source_tree_analysis
critical_folders_summary
IMMEDIATELY write source-tree-analysis.md to disk
Validate document structure
Update state file:
Scan for development setup using key_file_patterns and existing docs:
Look for deployment configuration using ci_cd_patterns:
Extract contribution guidelines:
- Code style rules
- PR process
- Commit conventions
- Testing requirements
development_instructions
deployment_configuration
contribution_guidelines
Update state file:
Analyze how parts communicate:
Create integration_points array with:
IMMEDIATELY write integration-architecture.md to disk
Validate document completeness
integration_architecture
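Drawing on the client/server example from Step 5, a single integration point might be recorded like this (field names and values are illustrative):
```json
{
  "from": "client",
  "to": "api",
  "mechanism": "REST over HTTP",
  "description": "API client layer in client/src/api/ calls Express routes in server/src/routes/",
  "shared_contracts": ["User", "Order"]
}
```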
Update state file:
For each part in project_parts:
For each architecture file generated:
architecture_document
Update state file:
Generate project-overview.md with:
Generate source-tree-analysis.md with:
IMMEDIATELY write project-overview.md to disk
Validate document sections
Generate source-tree-analysis.md (if not already written in Step 5)
IMMEDIATELY write to disk and validate
Generate component-inventory.md (or per-part versions) with:
Generate development-guide.md (or per-part versions) with:
Generate deployment-guide.md with:
- Infrastructure requirements
- Deployment process
- Environment configuration
- CI/CD pipeline details
IMMEDIATELY write to disk and validate
Generate contribution-guide.md with:
- Code style and conventions
- PR process
- Testing requirements
- Documentation standards
IMMEDIATELY write to disk and validate
Generate api-contracts.md (or per-part) with:
- All API endpoints
- Request/response schemas
- Authentication requirements
- Example requests
IMMEDIATELY write to disk and validate
Generate data-models.md (or per-part) with:
- Database schema
- Table relationships
- Data models and entities
- Migration strategy
IMMEDIATELY write to disk and validate
Generate integration-architecture.md with:
- How parts communicate
- Integration points diagram/description
- Data flow between parts
- Shared dependencies
IMMEDIATELY write to disk and validate
Generate project-parts.json metadata file:
```json
{
  "repository_type": "monorepo",
  "parts": [ ... ],
  "integration_points": [ ... ]
}
```
IMMEDIATELY write to disk
supporting_documentation
Update state file:
PURGE all document contents from context, keep only list of files generated
INCOMPLETE DOCUMENTATION MARKER CONVENTION: When a document SHOULD be generated but wasn't (due to quick scan, missing data, or conditional requirements not met), list it in index.md and end the entry with the exact marker "(To be generated)".
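For example, an index entry for a missing per-part architecture document would read (mirroring the line_text structure parsed during incomplete-documentation detection in Step 11):
- Architecture – Server (To be generated)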
Create index.md with intelligent navigation based on project structure
Generate simple index with:
- Project name and type
- Quick reference (tech stack, architecture type)
- Links to all generated docs
- Links to discovered existing docs
- Getting started section
Generate comprehensive index with:
- Project overview and structure summary
- Part-based navigation section
- Quick reference by part
- Cross-part integration links
- Links to all generated and existing docs
- Getting started per part
Include in index.md:
{{#if single_part}}
{{#each existing_docs}}
{{getting_started_instructions}}
Before writing index.md, check which expected files actually exist:
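As a sketch, the result of that existence check might be captured as (field names assumed):
```json
{
  "expected_docs": ["project-overview.md", "api-contracts-api.md", "data-models-api.md"],
  "missing_docs": ["api-contracts-api.md"]
}
```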
IMMEDIATELY write index.md to disk with appropriate (To be generated) markers for missing files
Validate index has all required sections and links are valid
index
Update state file:
PURGE index content from context
Show summary of all generated files:
Generated in {{output_folder}}/:
{{file_list_with_sizes}}
Run validation checklist from {validation}
INCOMPLETE DOCUMENTATION DETECTION:
Read {output_folder}/index.md
Scan for incomplete documentation markers:
Step 1: Search for the exact pattern "(To be generated)" (case-sensitive)
Step 2: For each match found, extract the entire line
Step 3: Parse the line to extract: title, file_path, doc_type, and part_id (see the structure below)
Fallback fuzzy scan for alternate markers:
Search for patterns: (TBD), (TODO), (Coming soon), (Not yet generated), (Pending)
For each fuzzy match:
Combine results:
Set {{incomplete_docs_list}} = {{incomplete_docs_strict}} + {{incomplete_docs_fuzzy}}
For each item store this structure:
{
  "title": "Architecture – Server",
  "file_path": "./architecture-server.md",
  "doc_type": "architecture",
  "part_id": "server",
  "line_text": "- Architecture – Server (To be generated)",
  "fuzzy_match": false
}
Documentation generation complete!
Summary:
{{#if incomplete_docs_list.length > 0}} ⚠️ Incomplete Documentation Detected:
I found {{incomplete_docs_list.length}} item(s) marked as incomplete:
{{#each incomplete_docs_list}}
{{@index + 1}}. {{title}} ({{doc_type}}{{#if part_id}} for {{part_id}}{{/if}}){{#if fuzzy_match}} ⚠️ [non-standard marker]{{/if}}
{{/each}}
{{/if}}
Would you like to:
{{#if incomplete_docs_list.length > 0}}
Your choice:
Which incomplete items would you like to generate?
{{#each incomplete_docs_list}}
{{@index + 1}}. {{title}} ({{doc_type}}{{#if part_id}} - {{part_id}}{{/if}})
{{/each}}
{{incomplete_docs_list.length + 1}}. All of them
Enter number(s) separated by commas (e.g., "1,3,5"), or type 'all':
Parse user selection:
Display: "Generating {{selected_items.length}} document(s)..."
For each item in {{selected_items}}:
Identify the part and requirements:
Route to appropriate generation substep based on doc_type:
If doc_type == "architecture":
If doc_type == "api-contracts":
If doc_type == "data-models":
If doc_type == "component-inventory":
If doc_type == "development-guide":
If doc_type == "deployment-guide":
If doc_type == "integration-architecture":
Post-generation actions:
Handle errors:
After all selected items are processed:
Update index.md to remove markers:
Display generation summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Documentation Generation Complete!
Successfully Generated: {{#each newly_generated_docs}}
{{#if failed_generations.length > 0}}
Failed to Generate:
{{#each failed_generations}}
Updated: index.md (removed incomplete markers)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Update state file with all generation activities
Return to Step 11 menu (loop back to check for any remaining incomplete items)
Make requested modifications and regenerate affected files
Proceed to Step 12 completion
Update state file:
Update state file:
Create final summary report
Compile verification recap variables:
Display completion message:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Location: {{output_folder}}/
Master Index: {{output_folder}}/index.md
👆 This is your primary entry point for AI-assisted development
Generated Documentation: {{generated_files_list}}
Next Steps:
Verification Recap:
Brownfield PRD Command: When ready to plan new features, run the PRD workflow and provide this index as input.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FINALIZE state file:
Display: "State file saved: {{output_folder}}/project-scan-report.json"