# File Utilities ## Principle Read and validate files (CSV, XLSX, PDF, ZIP) with automatic parsing, type-safe results, and download handling. Simplify file operations in Playwright tests with built-in format support and validation helpers. ## Rationale Testing file operations in Playwright requires boilerplate: - Manual download handling - External parsing libraries for each format - No validation helpers - Type-unsafe results - Repetitive path handling The `file-utils` module provides: - **Auto-parsing**: CSV, XLSX, PDF, ZIP automatically parsed - **Download handling**: Single function for UI or API-triggered downloads - **Type-safe**: TypeScript interfaces for parsed results - **Validation helpers**: Row count, header checks, content validation - **Format support**: Multiple sheet support (XLSX), text extraction (PDF), archive extraction (ZIP) ## Why Use This Instead of Vanilla Playwright? | Vanilla Playwright | File Utils | | ------------------------------------------- | ------------------------------------------------ | | ~80 lines per CSV flow (download + parse) | ~10 lines end-to-end | | Manual event orchestration for downloads | Encapsulated in `handleDownload()` | | Manual path handling and `saveAs` | Returns a ready-to-use file path | | Manual existence checks and error handling | Centralized in one place via utility patterns | | Manual CSV parsing config (headers, typing) | `readCSV()` returns `{ data, headers }` directly | ## Pattern Examples ### Example 1: UI-Triggered CSV Download **Context**: User clicks button, CSV downloads, validate contents. **Implementation**: ```typescript import { handleDownload, readCSV } from '@seontechnologies/playwright-utils/file-utils'; import path from 'node:path'; const DOWNLOAD_DIR = path.join(__dirname, '../downloads'); test('should download and validate CSV', async ({ page }) => { const downloadPath = await handleDownload({ page, downloadDir: DOWNLOAD_DIR, trigger: () => page.getByTestId('download-button-text/csv').click(), }); const csvResult = await readCSV({ filePath: downloadPath }); // Access parsed data and headers const { data, headers } = csvResult.content; expect(headers).toEqual(['ID', 'Name', 'Email']); expect(data[0]).toMatchObject({ ID: expect.any(String), Name: expect.any(String), Email: expect.any(String), }); }); ``` **Key Points**: - `handleDownload` waits for download, returns file path - `readCSV` auto-parses to `{ headers, data }` - Type-safe access to parsed content - Clean up downloads in `afterEach` ### Example 2: XLSX with Multiple Sheets **Context**: Excel file with multiple sheets (e.g., Summary, Details, Errors). **Implementation**: ```typescript import { readXLSX } from '@seontechnologies/playwright-utils/file-utils'; test('should read multi-sheet XLSX', async () => { const downloadPath = await handleDownload({ page, downloadDir: DOWNLOAD_DIR, trigger: () => page.click('[data-testid="export-xlsx"]'), }); const xlsxResult = await readXLSX({ filePath: downloadPath }); // Verify worksheet structure expect(xlsxResult.content.worksheets.length).toBeGreaterThan(0); const worksheet = xlsxResult.content.worksheets[0]; expect(worksheet).toBeDefined(); expect(worksheet).toHaveProperty('name'); // Access sheet data const sheetData = worksheet?.data; expect(Array.isArray(sheetData)).toBe(true); // Use type assertion for type safety const firstRow = sheetData![0] as Record; expect(firstRow).toHaveProperty('id'); }); ``` **Key Points**: - `worksheets` array with `name` and `data` properties - Access sheets by name - Each sheet has its own headers and data - Type-safe sheet iteration ### Example 3: PDF Text Extraction **Context**: Validate PDF report contains expected content. **Implementation**: ```typescript import { readPDF } from '@seontechnologies/playwright-utils/file-utils'; test('should validate PDF report', async () => { const downloadPath = await handleDownload({ page, downloadDir: DOWNLOAD_DIR, trigger: () => page.getByTestId('download-button-Text-based PDF Document').click(), }); const pdfResult = await readPDF({ filePath: downloadPath }); // content is extracted text from all pages expect(pdfResult.pagesCount).toBe(1); expect(pdfResult.fileName).toContain('.pdf'); expect(pdfResult.content).toContain('All you need is the free Adobe Acrobat Reader'); }); ``` **PDF Reader Options:** ```typescript const result = await readPDF({ filePath: '/path/to/document.pdf', mergePages: false, // Keep pages separate (default: true) debug: true, // Enable debug logging maxPages: 10, // Limit processing to first 10 pages }); ``` **Important Limitation - Vector-based PDFs:** Text extraction may fail for PDFs that store text as vector graphics (e.g., those generated by jsPDF): ```typescript // Vector-based PDF example (extraction fails gracefully) const pdfResult = await readPDF({ filePath: downloadPath }); expect(pdfResult.pagesCount).toBe(1); expect(pdfResult.info.extractionNotes).toContain( 'Text extraction from vector-based PDFs is not supported.' ); ``` Such PDFs will have: - `textExtractionSuccess: false` - `isVectorBased: true` - Explanatory message in `extractionNotes` ### Example 4: ZIP Archive Validation **Context**: Validate ZIP contains expected files and extract specific file. **Implementation**: ```typescript import { readZIP } from '@seontechnologies/playwright-utils/file-utils'; test('should validate ZIP archive', async () => { const downloadPath = await handleDownload({ page, downloadDir: DOWNLOAD_DIR, trigger: () => page.click('[data-testid="download-backup"]'), }); const zipResult = await readZIP({ filePath: downloadPath }); // Check file list expect(Array.isArray(zipResult.content.entries)).toBe(true); expect(zipResult.content.entries).toContain( 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv' ); // Extract specific file const targetFile = 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv'; const zipWithExtraction = await readZIP({ filePath: downloadPath, fileToExtract: targetFile, }); // Access extracted file buffer const extractedFiles = zipWithExtraction.content.extractedFiles || {}; const fileBuffer = extractedFiles[targetFile]; expect(fileBuffer).toBeInstanceOf(Buffer); expect(fileBuffer?.length).toBeGreaterThan(0); }); ``` **Key Points**: - `content.entries` lists all files in archive - `fileToExtract` extracts specific files to Buffer - Validate archive structure - Read and parse individual files from ZIP ### Example 5: API-Triggered Download **Context**: API endpoint returns file download (not UI click). **Implementation**: ```typescript test('should download via API', async ({ page, request }) => { const downloadPath = await handleDownload({ page, // Still need page for download events downloadDir: DOWNLOAD_DIR, trigger: async () => { const response = await request.get('/api/export/csv', { headers: { Authorization: 'Bearer token' }, }); if (!response.ok()) { throw new Error(`Export failed: ${response.status()}`); } }, }); const { content } = await readCSV({ filePath: downloadPath }); expect(content.data).toHaveLength(100); }); ``` **Key Points**: - `trigger` can be async API call - API must return `Content-Disposition` header - Still need `page` for download events - Works with authenticated endpoints ### Example 6: Reading CSV from Buffer (ZIP extraction) **Context**: Read CSV content directly from a Buffer (e.g., extracted from ZIP). **Implementation**: ```typescript // Read from a Buffer (e.g., extracted from a ZIP) const zipResult = await readZIP({ filePath: 'archive.zip', fileToExtract: 'data.csv', }); const fileBuffer = zipResult.content.extractedFiles?.['data.csv']; const csvFromBuffer = await readCSV({ content: fileBuffer }); // Read from a string const csvString = 'name,age\nJohn,30\nJane,25'; const csvFromString = await readCSV({ content: csvString }); const { data, headers } = csvFromString.content; expect(headers).toContain('name'); expect(headers).toContain('age'); ``` ## API Reference ### CSV Reader Options | Option | Type | Default | Description | | -------------- | ------------------ | -------- | -------------------------------------- | | `filePath` | `string` | - | Path to CSV file (mutually exclusive) | | `content` | `string \| Buffer` | - | Direct content (mutually exclusive) | | `delimiter` | `string \| 'auto'` | `','` | Value separator, auto-detect if 'auto' | | `encoding` | `string` | `'utf8'` | File encoding | | `parseHeaders` | `boolean` | `true` | Use first row as headers | | `trim` | `boolean` | `true` | Trim whitespace from values | ### XLSX Reader Options | Option | Type | Description | | ----------- | -------- | ------------------------------ | | `filePath` | `string` | Path to XLSX file | | `sheetName` | `string` | Name of sheet to set as active | ### PDF Reader Options | Option | Type | Default | Description | | ------------ | --------- | ------- | --------------------------- | | `filePath` | `string` | - | Path to PDF file (required) | | `mergePages` | `boolean` | `true` | Merge text from all pages | | `maxPages` | `number` | - | Maximum pages to extract | | `debug` | `boolean` | `false` | Enable debug logging | ### ZIP Reader Options | Option | Type | Description | | --------------- | -------- | ---------------------------------- | | `filePath` | `string` | Path to ZIP file | | `fileToExtract` | `string` | Specific file to extract to Buffer | ### Return Values #### CSV Reader Return Value ```typescript { content: { data: Array>, // Parsed rows (excludes header row if parseHeaders: true) headers: string[] | null // Column headers (null if parseHeaders: false) } } ``` #### XLSX Reader Return Value ```typescript { content: { worksheets: Array<{ name: string, // Sheet name rows: Array>, // All rows including headers headers?: string[] // First row as headers (if present) }> } } ``` #### PDF Reader Return Value ```typescript { content: string, // Extracted text (merged or per-page based on mergePages) pagesCount: number, // Total pages in PDF fileName?: string, // Original filename if available info?: Record // PDF metadata (author, title, etc.) } ``` > **Note**: When `mergePages: false`, `content` is an array of strings (one per page). When `maxPages` is set, only that many pages are extracted. #### ZIP Reader Return Value ```typescript { content: { entries: Array<{ name: string, // File/directory path within ZIP size: number, // Uncompressed size in bytes isDirectory: boolean // True for directories }>, extractedFiles: Record // Extracted file contents by path } } ``` > **Note**: When `fileToExtract` is specified, only that file appears in `extractedFiles`. ## Download Cleanup Pattern ```typescript test.afterEach(async () => { // Clean up downloaded files await fs.remove(DOWNLOAD_DIR); }); ``` ## Comparison with Vanilla Playwright Vanilla Playwright (real test) snippet: ```typescript // ~80 lines of boilerplate! const [download] = await Promise.all([ page.waitForEvent('download'), page.getByTestId('download-button-CSV Export').click(), ]); const failure = await download.failure(); expect(failure).toBeNull(); const filePath = testInfo.outputPath(download.suggestedFilename()); await download.saveAs(filePath); await expect .poll( async () => { try { await fs.access(filePath); return true; } catch { return false; } }, { timeout: 5000, intervals: [100, 200, 500] } ) .toBe(true); const csvContent = await fs.readFile(filePath, 'utf-8'); const parseResult = parse(csvContent, { header: true, skipEmptyLines: true, dynamicTyping: true, transformHeader: (header: string) => header.trim(), }); if (parseResult.errors.length > 0) { throw new Error(`CSV parsing errors: ${JSON.stringify(parseResult.errors)}`); } const data = parseResult.data as Array>; const headers = parseResult.meta.fields || []; ``` With File Utils, the same flow becomes: ```typescript const downloadPath = await handleDownload({ page, downloadDir: DOWNLOAD_DIR, trigger: () => page.getByTestId('download-button-text/csv').click(), }); const { data, headers } = (await readCSV({ filePath: downloadPath })).content; ``` ## Related Fragments - `overview.md` - Installation and imports - `api-request.md` - API-triggered downloads - `recurse.md` - Poll for file generation completion ## Anti-Patterns **DON'T leave downloads in place:** ```typescript test('creates file', async () => { await handleDownload({ ... }) // File left in downloads folder }) ``` **DO clean up after tests:** ```typescript test.afterEach(async () => { await fs.remove(DOWNLOAD_DIR); }); ```