
File Utilities

Principle

Read and validate files (CSV, XLSX, PDF, ZIP) with automatic parsing, type-safe results, and download handling. Simplify file operations in Playwright tests with built-in format support and validation helpers.

Rationale

Testing file operations in vanilla Playwright means repeated boilerplate and missing conveniences:

  • Manual download handling
  • External parsing libraries for each format
  • No validation helpers
  • Type-unsafe results
  • Repetitive path handling

The file-utils module provides:

  • Auto-parsing: CSV, XLSX, PDF, ZIP automatically parsed
  • Download handling: Single function for UI or API-triggered downloads
  • Type-safe: TypeScript interfaces for parsed results
  • Validation helpers: Row count, header checks, content validation
  • Format support: Multiple sheet support (XLSX), text extraction (PDF), archive extraction (ZIP)
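
At a glance, every reader follows the same flow: trigger the download, get a file path back, and pass that path to the reader for the format you expect. A minimal sketch using the imports and result shapes shown in the examples below (the bare page, downloadDir, and trigger variables are placeholders):

import {
  handleDownload,
  readCSV,
  readPDF,
  readXLSX,
  readZIP,
} from '@seontechnologies/playwright-utils/file-utils';

// Same pattern for every format: download once, then pick the matching reader
const downloadPath = await handleDownload({ page, downloadDir, trigger });
const csv = await readCSV({ filePath: downloadPath });   // { content: { data, headers } }
const xlsx = await readXLSX({ filePath: downloadPath }); // { content: { worksheets } }
const pdf = await readPDF({ filePath: downloadPath });   // { content, pagesCount, info }
const zip = await readZIP({ filePath: downloadPath });   // { content: { entries, extractedFiles } }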

Why Use This Instead of Vanilla Playwright?

| Vanilla Playwright | File Utils |
| --- | --- |
| ~80 lines per CSV flow (download + parse) | ~10 lines end-to-end |
| Manual event orchestration for downloads | Encapsulated in handleDownload() |
| Manual path handling and saveAs | Returns a ready-to-use file path |
| Manual existence checks and error handling | Centralized in one place via utility patterns |
| Manual CSV parsing config (headers, typing) | readCSV() returns { data, headers } directly |

Pattern Examples

Example 1: UI-Triggered CSV Download

Context: A user clicks a button, a CSV file downloads, and the test validates its contents.

Implementation:

import { handleDownload, readCSV } from '@seontechnologies/playwright-utils/file-utils';
import path from 'node:path';

const DOWNLOAD_DIR = path.join(__dirname, '../downloads');

test('should download and validate CSV', async ({ page }) => {
  const downloadPath = await handleDownload({
    page,
    downloadDir: DOWNLOAD_DIR,
    trigger: () => page.getByTestId('download-button-text/csv').click(),
  });

  const csvResult = await readCSV({ filePath: downloadPath });

  // Access parsed data and headers
  const { data, headers } = csvResult.content;
  expect(headers).toEqual(['ID', 'Name', 'Email']);
  expect(data[0]).toMatchObject({
    ID: expect.any(String),
    Name: expect.any(String),
    Email: expect.any(String),
  });
});

Key Points:

  • handleDownload waits for download, returns file path
  • readCSV auto-parses to { headers, data }
  • Type-safe access to parsed content
  • Clean up downloads in afterEach

Example 2: XLSX with Multiple Sheets

Context: Excel file with multiple sheets (e.g., Summary, Details, Errors).

Implementation:

import { handleDownload, readXLSX } from '@seontechnologies/playwright-utils/file-utils';

test('should read multi-sheet XLSX', async ({ page }) => {
  const downloadPath = await handleDownload({
    page,
    downloadDir: DOWNLOAD_DIR,
    trigger: () => page.click('[data-testid="export-xlsx"]'),
  });

  const xlsxResult = await readXLSX({ filePath: downloadPath });

  // Verify worksheet structure
  expect(xlsxResult.content.worksheets.length).toBeGreaterThan(0);
  const worksheet = xlsxResult.content.worksheets[0];
  expect(worksheet).toBeDefined();
  expect(worksheet).toHaveProperty('name');

  // Access sheet data
  const sheetData = worksheet?.data;
  expect(Array.isArray(sheetData)).toBe(true);

  // Narrow the row type before asserting on its properties
  const firstRow = sheetData![0] as Record<string, unknown>;
  expect(firstRow).toHaveProperty('id');
});

Key Points:

  • worksheets array with name and data properties
  • Access sheets by name
  • Each sheet has its own headers and data
  • Type-safe sheet iteration
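
The "Access sheets by name" point above is not shown in the example; here is a minimal sketch of looking a sheet up in the same worksheets array (the sheet name 'Summary' is just an illustrative placeholder):

// Find a specific worksheet by its name (sheet name is hypothetical)
const summarySheet = xlsxResult.content.worksheets.find(
  (sheet) => sheet.name === 'Summary'
);
expect(summarySheet).toBeDefined();
expect(Array.isArray(summarySheet?.data)).toBe(true);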

Example 3: PDF Text Extraction

Context: Validate PDF report contains expected content.

Implementation:

import { handleDownload, readPDF } from '@seontechnologies/playwright-utils/file-utils';

test('should validate PDF report', async ({ page }) => {
  const downloadPath = await handleDownload({
    page,
    downloadDir: DOWNLOAD_DIR,
    trigger: () => page.getByTestId('download-button-Text-based PDF Document').click(),
  });

  const pdfResult = await readPDF({ filePath: downloadPath });

  // content is extracted text from all pages
  expect(pdfResult.pagesCount).toBe(1);
  expect(pdfResult.fileName).toContain('.pdf');
  expect(pdfResult.content).toContain('All you need is the free Adobe Acrobat Reader');
});

PDF Reader Options:

const result = await readPDF({
  filePath: '/path/to/document.pdf',
  mergePages: false, // Keep pages separate (default: true)
  debug: true, // Enable debug logging
  maxPages: 10, // Limit processing to first 10 pages
});

Important Limitation - Vector-based PDFs:

Text extraction may fail for PDFs that store text as vector graphics (e.g., those generated by jsPDF):

// Vector-based PDF example (extraction fails gracefully)
const pdfResult = await readPDF({ filePath: downloadPath });

expect(pdfResult.pagesCount).toBe(1);
expect(pdfResult.info.extractionNotes).toContain(
  'Text extraction from vector-based PDFs is not supported.'
);

Such PDFs will have:

  • textExtractionSuccess: false
  • isVectorBased: true
  • Explanatory message in extractionNotes
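
A short sketch of guarding assertions on those flags, assuming (as in the snippet above) that they are exposed under pdfResult.info:

// Hedged sketch: only assert on extracted text when extraction actually succeeded
const maybeVectorPdf = await readPDF({ filePath: downloadPath });
if (maybeVectorPdf.info?.isVectorBased) {
  expect(maybeVectorPdf.info.textExtractionSuccess).toBe(false);
  expect(maybeVectorPdf.info.extractionNotes).toBeDefined();
} else {
  expect(maybeVectorPdf.content.length).toBeGreaterThan(0);
}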

Example 4: ZIP Archive Validation

Context: Validate ZIP contains expected files and extract specific file.

Implementation:

import { handleDownload, readZIP } from '@seontechnologies/playwright-utils/file-utils';

test('should validate ZIP archive', async ({ page }) => {
  const downloadPath = await handleDownload({
    page,
    downloadDir: DOWNLOAD_DIR,
    trigger: () => page.click('[data-testid="download-backup"]'),
  });

  const zipResult = await readZIP({ filePath: downloadPath });

  // Check file list (entries carry name/size/isDirectory, see Return Values below)
  expect(Array.isArray(zipResult.content.entries)).toBe(true);
  const entryNames = zipResult.content.entries.map((entry) => entry.name);
  expect(entryNames).toContain(
    'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv'
  );

  // Extract specific file
  const targetFile = 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv';
  const zipWithExtraction = await readZIP({
    filePath: downloadPath,
    fileToExtract: targetFile,
  });

  // Access extracted file buffer
  const extractedFiles = zipWithExtraction.content.extractedFiles || {};
  const fileBuffer = extractedFiles[targetFile];
  expect(fileBuffer).toBeInstanceOf(Buffer);
  expect(fileBuffer?.length).toBeGreaterThan(0);
});

Key Points:

  • content.entries lists all files in archive
  • fileToExtract extracts specific files to Buffer
  • Validate archive structure
  • Read and parse individual files from ZIP

Example 5: API-Triggered Download

Context: API endpoint returns file download (not UI click).

Implementation:

test('should download via API', async ({ page, request }) => {
  const downloadPath = await handleDownload({
    page, // Still need page for download events
    downloadDir: DOWNLOAD_DIR,
    trigger: async () => {
      const response = await request.get('/api/export/csv', {
        headers: { Authorization: 'Bearer token' },
      });

      if (!response.ok()) {
        throw new Error(`Export failed: ${response.status()}`);
      }
    },
  });

  const { content } = await readCSV({ filePath: downloadPath });

  expect(content.data).toHaveLength(100);
});

Key Points:

  • trigger can be async API call
  • API must return Content-Disposition header
  • Still need page for download events
  • Works with authenticated endpoints
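
The Content-Disposition requirement is easy to verify from the API response itself; a hedged debugging sketch against the same endpoint as the example above:

// Downloads require the server to mark the response as an attachment
const response = await request.get('/api/export/csv');
expect(response.headers()['content-disposition']).toContain('attachment');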

Example 6: Reading CSV from Buffer (ZIP extraction)

Context: Read CSV content directly from a Buffer (e.g., extracted from ZIP).

Implementation:

import { readCSV, readZIP } from '@seontechnologies/playwright-utils/file-utils';

// Read from a Buffer (e.g., extracted from a ZIP)
const zipResult = await readZIP({
  filePath: 'archive.zip',
  fileToExtract: 'data.csv',
});
const fileBuffer = zipResult.content.extractedFiles?.['data.csv'];
const csvFromBuffer = await readCSV({ content: fileBuffer });

// Read from a string
const csvString = 'name,age\nJohn,30\nJane,25';
const csvFromString = await readCSV({ content: csvString });

const { data, headers } = csvFromString.content;
expect(headers).toContain('name');
expect(headers).toContain('age');

API Reference

CSV Reader Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| filePath | string | - | Path to CSV file (mutually exclusive with content) |
| content | string \| Buffer | - | Direct content (mutually exclusive with filePath) |
| delimiter | string \| 'auto' | ',' | Value separator, auto-detected when 'auto' |
| encoding | string | 'utf8' | File encoding |
| parseHeaders | boolean | true | Use first row as headers |
| trim | boolean | true | Trim whitespace from values |
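
A usage sketch of the non-default options (the semicolon-delimited file is hypothetical; option names are the ones documented above):

// Hypothetical semicolon-delimited export with no header row
const rawCsv = await readCSV({
  filePath: downloadPath,
  delimiter: ';',      // or 'auto' to detect the separator
  parseHeaders: false, // first row stays in data; headers comes back as null
  trim: true,
});
expect(rawCsv.content.headers).toBeNull();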

XLSX Reader Options

| Option | Type | Description |
| --- | --- | --- |
| filePath | string | Path to XLSX file |
| sheetName | string | Name of sheet to set as active |
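
A brief sketch of the sheetName option (the sheet name is a placeholder):

// Ask the reader to set a specific sheet as active while parsing the workbook
const detailsXlsx = await readXLSX({
  filePath: downloadPath,
  sheetName: 'Details', // hypothetical sheet name
});
expect(detailsXlsx.content.worksheets.length).toBeGreaterThan(0);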

PDF Reader Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| filePath | string | - | Path to PDF file (required) |
| mergePages | boolean | true | Merge text from all pages |
| maxPages | number | - | Maximum pages to extract |
| debug | boolean | false | Enable debug logging |

ZIP Reader Options

| Option | Type | Description |
| --- | --- | --- |
| filePath | string | Path to ZIP file |
| fileToExtract | string | Specific file to extract to Buffer |

Return Values

CSV Reader Return Value

{
  content: {
    data: Array<Record<string, unknown>>, // Parsed rows keyed by header (excludes header row when parseHeaders: true)
    headers: string[] | null              // Column headers (null if parseHeaders: false)
  }
}

XLSX Reader Return Value

{
  content: {
    worksheets: Array<{
      name: string,                       // Sheet name
      data: Array<Record<string, unknown>>, // Rows keyed by header (see Example 2)
      headers?: string[]                  // First row as headers (if present)
    }>
  }
}

PDF Reader Return Value

{
  content: string,                        // Extracted text (merged or per-page based on mergePages)
  pagesCount: number,                     // Total pages in PDF
  fileName?: string,                      // Original filename if available
  info?: Record<string, any>              // PDF metadata (author, title, etc.)
}

Note: When mergePages: false, content is an array of strings (one per page). When maxPages is set, only that many pages are extracted.
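
A sketch of the per-page form; the cast reflects the note above, since the declared type of content is string:

// With mergePages: false, content holds one text entry per page (per the note above)
const pagedPdf = await readPDF({ filePath: downloadPath, mergePages: false });
const pages = pagedPdf.content as unknown as string[];
expect(pages.length).toBe(pagedPdf.pagesCount);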

ZIP Reader Return Value

{
  content: {
    entries: Array<{
      name: string,                       // File/directory path within ZIP
      size: number,                       // Uncompressed size in bytes
      isDirectory: boolean                // True for directories
    }>,
    extractedFiles: Record<string, Buffer | string>  // Extracted file contents by path
  }
}

Note: When fileToExtract is specified, only that file appears in extractedFiles.

Download Cleanup Pattern

import fs from 'fs-extra'; // remove() comes from fs-extra, not node:fs

test.afterEach(async () => {
  // Clean up downloaded files after every test
  await fs.remove(DOWNLOAD_DIR);
});

Comparison with Vanilla Playwright

Vanilla Playwright (real test) snippet:

// ~80 lines of boilerplate!
const [download] = await Promise.all([
  page.waitForEvent('download'),
  page.getByTestId('download-button-CSV Export').click(),
]);

const failure = await download.failure();
expect(failure).toBeNull();

const filePath = testInfo.outputPath(download.suggestedFilename());
await download.saveAs(filePath);

await expect
  .poll(
    async () => {
      try {
        await fs.access(filePath);
        return true;
      } catch {
        return false;
      }
    },
    { timeout: 5000, intervals: [100, 200, 500] }
  )
  .toBe(true);

const csvContent = await fs.readFile(filePath, 'utf-8');

// parse() here is papaparse's parse
const parseResult = parse(csvContent, {
  header: true,
  skipEmptyLines: true,
  dynamicTyping: true,
  transformHeader: (header: string) => header.trim(),
});

if (parseResult.errors.length > 0) {
  throw new Error(`CSV parsing errors: ${JSON.stringify(parseResult.errors)}`);
}

const data = parseResult.data as Array<Record<string, unknown>>;
const headers = parseResult.meta.fields || [];

With File Utils, the same flow becomes:

const downloadPath = await handleDownload({
  page,
  downloadDir: DOWNLOAD_DIR,
  trigger: () => page.getByTestId('download-button-text/csv').click(),
});

const { data, headers } = (await readCSV({ filePath: downloadPath })).content;

Related Fragments

  • overview.md - Installation and imports
  • api-request.md - API-triggered downloads
  • recurse.md - Poll for file generation completion

Anti-Patterns

DON'T leave downloads in place:

test('creates file', async () => {
  await handleDownload({ ... })
  // File left in downloads folder
})

DO clean up after tests:

test.afterEach(async () => {
  await fs.remove(DOWNLOAD_DIR);
});