Read and validate files (CSV, XLSX, PDF, ZIP) with automatic parsing, type-safe results, and download handling. Simplify file operations in Playwright tests with built-in format support and validation helpers.
Testing file operations in Playwright requires boilerplate:
The file-utils module provides:
| Vanilla Playwright | File Utils |
|---|---|
| ~80 lines per CSV flow (download + parse) | ~10 lines end-to-end |
| Manual event orchestration for downloads | Encapsulated in handleDownload() |
Manual path handling and saveAs |
Returns a ready-to-use file path |
| Manual existence checks and error handling | Centralized in one place via utility patterns |
| Manual CSV parsing config (headers, typing) | readCSV() returns { data, headers } directly |
Context: User clicks button, CSV downloads, validate contents.
Implementation:
import { handleDownload, readCSV } from '@seontechnologies/playwright-utils/file-utils';
import path from 'node:path';
const DOWNLOAD_DIR = path.join(__dirname, '../downloads');
test('should download and validate CSV', async ({ page }) => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-text/csv').click(),
});
const csvResult = await readCSV({ filePath: downloadPath });
// Access parsed data and headers
const { data, headers } = csvResult.content;
expect(headers).toEqual(['ID', 'Name', 'Email']);
expect(data[0]).toMatchObject({
ID: expect.any(String),
Name: expect.any(String),
Email: expect.any(String),
});
});
Key Points:
handleDownload waits for download, returns file pathreadCSV auto-parses to { headers, data }afterEachContext: Excel file with multiple sheets (e.g., Summary, Details, Errors).
Implementation:
import { readXLSX } from '@seontechnologies/playwright-utils/file-utils';
test('should read multi-sheet XLSX', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.click('[data-testid="export-xlsx"]'),
});
const xlsxResult = await readXLSX({ filePath: downloadPath });
// Verify worksheet structure
expect(xlsxResult.content.worksheets.length).toBeGreaterThan(0);
const worksheet = xlsxResult.content.worksheets[0];
expect(worksheet).toBeDefined();
expect(worksheet).toHaveProperty('name');
// Access sheet data
const sheetData = worksheet?.data;
expect(Array.isArray(sheetData)).toBe(true);
// Use type assertion for type safety
const firstRow = sheetData![0] as Record<string, unknown>;
expect(firstRow).toHaveProperty('id');
});
Key Points:
worksheets array with name and data propertiesContext: Validate PDF report contains expected content.
Implementation:
import { readPDF } from '@seontechnologies/playwright-utils/file-utils';
test('should validate PDF report', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-Text-based PDF Document').click(),
});
const pdfResult = await readPDF({ filePath: downloadPath });
// content is extracted text from all pages
expect(pdfResult.pagesCount).toBe(1);
expect(pdfResult.fileName).toContain('.pdf');
expect(pdfResult.content).toContain('All you need is the free Adobe Acrobat Reader');
});
PDF Reader Options:
const result = await readPDF({
filePath: '/path/to/document.pdf',
mergePages: false, // Keep pages separate (default: true)
debug: true, // Enable debug logging
maxPages: 10, // Limit processing to first 10 pages
});
Important Limitation - Vector-based PDFs:
Text extraction may fail for PDFs that store text as vector graphics (e.g., those generated by jsPDF):
// Vector-based PDF example (extraction fails gracefully)
const pdfResult = await readPDF({ filePath: downloadPath });
expect(pdfResult.pagesCount).toBe(1);
expect(pdfResult.info.extractionNotes).toContain(
'Text extraction from vector-based PDFs is not supported.'
);
Such PDFs will have:
textExtractionSuccess: falseisVectorBased: trueextractionNotesContext: Validate ZIP contains expected files and extract specific file.
Implementation:
import { readZIP } from '@seontechnologies/playwright-utils/file-utils';
test('should validate ZIP archive', async () => {
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.click('[data-testid="download-backup"]'),
});
const zipResult = await readZIP({ filePath: downloadPath });
// Check file list
expect(Array.isArray(zipResult.content.entries)).toBe(true);
expect(zipResult.content.entries).toContain(
'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv'
);
// Extract specific file
const targetFile = 'Case_53125_10-19-22_AM/Case_53125_10-19-22_AM_case_data.csv';
const zipWithExtraction = await readZIP({
filePath: downloadPath,
fileToExtract: targetFile,
});
// Access extracted file buffer
const extractedFiles = zipWithExtraction.content.extractedFiles || {};
const fileBuffer = extractedFiles[targetFile];
expect(fileBuffer).toBeInstanceOf(Buffer);
expect(fileBuffer?.length).toBeGreaterThan(0);
});
Key Points:
content.entries lists all files in archivefileToExtract extracts specific files to BufferContext: API endpoint returns file download (not UI click).
Implementation:
test('should download via API', async ({ page, request }) => {
const downloadPath = await handleDownload({
page, // Still need page for download events
downloadDir: DOWNLOAD_DIR,
trigger: async () => {
const response = await request.get('/api/export/csv', {
headers: { Authorization: 'Bearer token' },
});
if (!response.ok()) {
throw new Error(`Export failed: ${response.status()}`);
}
},
});
const { content } = await readCSV({ filePath: downloadPath });
expect(content.data).toHaveLength(100);
});
Key Points:
trigger can be async API callContent-Disposition headerpage for download eventsContext: Read CSV content directly from a Buffer (e.g., extracted from ZIP).
Implementation:
// Read from a Buffer (e.g., extracted from a ZIP)
const zipResult = await readZIP({
filePath: 'archive.zip',
fileToExtract: 'data.csv',
});
const fileBuffer = zipResult.content.extractedFiles?.['data.csv'];
const csvFromBuffer = await readCSV({ content: fileBuffer });
// Read from a string
const csvString = 'name,age\nJohn,30\nJane,25';
const csvFromString = await readCSV({ content: csvString });
const { data, headers } = csvFromString.content;
expect(headers).toContain('name');
expect(headers).toContain('age');
| Option | Type | Default | Description |
|---|---|---|---|
filePath |
string |
- | Path to CSV file (mutually exclusive) |
content |
string \| Buffer |
- | Direct content (mutually exclusive) |
delimiter |
string \| 'auto' |
',' |
Value separator, auto-detect if 'auto' |
encoding |
string |
'utf8' |
File encoding |
parseHeaders |
boolean |
true |
Use first row as headers |
trim |
boolean |
true |
Trim whitespace from values |
| Option | Type | Description |
|---|---|---|
filePath |
string |
Path to XLSX file |
sheetName |
string |
Name of sheet to set as active |
| Option | Type | Default | Description |
|---|---|---|---|
filePath |
string |
- | Path to PDF file (required) |
mergePages |
boolean |
true |
Merge text from all pages |
maxPages |
number |
- | Maximum pages to extract |
debug |
boolean |
false |
Enable debug logging |
| Option | Type | Description |
|---|---|---|
filePath |
string |
Path to ZIP file |
fileToExtract |
string |
Specific file to extract to Buffer |
{
content: {
data: Array<Array<string | number>>, // Parsed rows (excludes header row if parseHeaders: true)
headers: string[] | null // Column headers (null if parseHeaders: false)
}
}
{
content: {
worksheets: Array<{
name: string, // Sheet name
rows: Array<Array<any>>, // All rows including headers
headers?: string[] // First row as headers (if present)
}>
}
}
{
content: string, // Extracted text (merged or per-page based on mergePages)
pagesCount: number, // Total pages in PDF
fileName?: string, // Original filename if available
info?: Record<string, any> // PDF metadata (author, title, etc.)
}
Note: When
mergePages: false,contentis an array of strings (one per page). WhenmaxPagesis set, only that many pages are extracted.
{
content: {
entries: Array<{
name: string, // File/directory path within ZIP
size: number, // Uncompressed size in bytes
isDirectory: boolean // True for directories
}>,
extractedFiles: Record<string, Buffer | string> // Extracted file contents by path
}
}
Note: When
fileToExtractis specified, only that file appears inextractedFiles.
test.afterEach(async () => {
// Clean up downloaded files
await fs.remove(DOWNLOAD_DIR);
});
Vanilla Playwright (real test) snippet:
// ~80 lines of boilerplate!
const [download] = await Promise.all([
page.waitForEvent('download'),
page.getByTestId('download-button-CSV Export').click(),
]);
const failure = await download.failure();
expect(failure).toBeNull();
const filePath = testInfo.outputPath(download.suggestedFilename());
await download.saveAs(filePath);
await expect
.poll(
async () => {
try {
await fs.access(filePath);
return true;
} catch {
return false;
}
},
{ timeout: 5000, intervals: [100, 200, 500] }
)
.toBe(true);
const csvContent = await fs.readFile(filePath, 'utf-8');
const parseResult = parse(csvContent, {
header: true,
skipEmptyLines: true,
dynamicTyping: true,
transformHeader: (header: string) => header.trim(),
});
if (parseResult.errors.length > 0) {
throw new Error(`CSV parsing errors: ${JSON.stringify(parseResult.errors)}`);
}
const data = parseResult.data as Array<Record<string, unknown>>;
const headers = parseResult.meta.fields || [];
With File Utils, the same flow becomes:
const downloadPath = await handleDownload({
page,
downloadDir: DOWNLOAD_DIR,
trigger: () => page.getByTestId('download-button-text/csv').click(),
});
const { data, headers } = (await readCSV({ filePath: downloadPath })).content;
overview.md - Installation and importsapi-request.md - API-triggered downloadsrecurse.md - Poll for file generation completionDON'T leave downloads in place:
test('creates file', async () => {
await handleDownload({ ... })
// File left in downloads folder
})
DO clean up after tests:
test.afterEach(async () => {
await fs.remove(DOWNLOAD_DIR);
});