01 Case Study

AI-Assisted Playwright Automation Framework

JavaScript Playwright framework with reusable fixtures, AI-agent test generation via Playwright MCP, and structured failure artifacts — built to make regression coverage scale with the product, not headcount.

  • JavaScript
  • Playwright
  • Playwright MCP
  • AI Agents
  • CI/CD

Outcomes

  • Cut regression effort by 30% across the suite
  • Migrated legacy Selenium / Cucumber tests onto Playwright JS — maintainability up 35%
  • AI-agent-assisted test generation paired with deterministic fixtures, never replacing them

Overview

Test automation has a scaling problem: the suite grows linearly with the product, but the team that maintains it usually doesn’t. The framework that wins is the one that lets a single engineer add coverage for a new feature in minutes, lets the AI agents do the boring parts, and produces failure artifacts a human can actually read at 11 PM the night before a release.

This is that framework — Playwright in JavaScript, with Playwright MCP wiring AI agents into the loop for test scaffolding, plus a deterministic fixture layer that the AI is allowed to call but not allowed to replace.

Approach

Three principles drove the design:

  • AI for scaffolding, fixtures for truth. The AI agents propose locators, generate test bodies, and draft assertions. The fixtures own auth, app state, network mocks, and any operation that has to be the same every run. This split keeps the suite stable as the AI gets better — or worse; a minimal sketch of the split follows this list.
  • Failure artifacts as a first-class output. Every failed run produces a video, a trace, the network HAR, the console log, and a screenshot at the moment of failure, named consistently. Triage means opening one folder, not five tools.
  • API and UI on the same plane. REST API tests live in the same project as UI tests, share the same fixtures, and run against the same environments. That collapses an entire category of “passes in API tests, fails in UI” mystery bugs.
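
To ground the first principle, here is a minimal sketch of the split. The fixture name, locators, and seeded credentials are illustrative assumptions rather than the framework's actual API; the point is the boundary: the fixture owns authentication, deterministically, every run.

```js
// fixtures.js: the deterministic layer. Humans own this file; AI agents may
// call these fixtures but never edit them. Names like authedPage and the
// seeded credentials below are illustrative, not the real framework API.
const base = require('@playwright/test');

exports.test = base.test.extend({
  // Yields a page already authenticated as a seeded test user. Assumes a
  // baseURL in playwright.config and a QA_USER_PASSWORD environment variable.
  authedPage: async ({ browser }, use) => {
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto('/login');
    await page.getByLabel('Email').fill('qa-user@example.com');
    await page.getByLabel('Password').fill(process.env.QA_USER_PASSWORD);
    await page.getByRole('button', { name: 'Sign in' }).click();
    await page.waitForURL('**/dashboard'); // deterministic "logged in" signal
    await use(page);
    await context.close();
  },
});
exports.expect = base.expect;
```

A test that consumes it is the kind of thing an agent drafts and a human reviews: locators and assertions proposed, auth supplied by the fixture.

```js
// orders.spec.js: AI-drafted body, human-reviewed. Locators are illustrative.
const { test, expect } = require('./fixtures');

test('user can open an existing order', async ({ authedPage }) => {
  await authedPage.getByRole('link', { name: 'Orders' }).click();
  await authedPage.getByRole('row', { name: /ORD-/ }).first().click();
  await expect(authedPage.getByRole('heading', { name: /Order/ })).toBeVisible();
});
```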

What I built

  • Reusable fixtures for authenticated session state, common test users, seeded data, and network interception — composed via Playwright’s fixture system rather than imperative beforeEach scripts.
  • API automation layer that hits the same REST endpoints the UI consumes, with JSON schema validation on every response so contract regressions surface from automation rather than from a customer (the first sketch after this list shows this check).
  • Playwright MCP integration so AI agents can drive the browser during exploratory test generation — they propose tests, run them, and report results back; humans review and merge.
  • Failure artifact pipeline wired into CI. On failure, the run uploads video + trace + HAR + console log to artifact storage with a stable URL pattern that links back from the failed PR comment; the second sketch below shows the Playwright config side.
  • Selenium / Cucumber migration tooling to convert legacy tests, preserving the assertion semantics while dropping the step-definition indirection (the last sketch below shows a before-and-after).
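
One plausible shape for the API layer's schema check. The case study names JSON schema validation but not a validator, so Ajv here is an assumption, as are the endpoint and the schema itself:

```js
// api/orders.spec.js: API test in the same project, same fixtures, same envs.
// Ajv is one choice; any JSON-schema validator fits the pattern.
const { test, expect } = require('@playwright/test');
const Ajv = require('ajv');

// Hypothetical contract for GET /api/orders.
const orderListSchema = {
  type: 'object',
  required: ['orders'],
  properties: {
    orders: {
      type: 'array',
      items: {
        type: 'object',
        required: ['id', 'status'],
        properties: {
          id: { type: 'string' },
          status: { type: 'string' },
        },
      },
    },
  },
};

test('GET /api/orders matches its contract', async ({ request }) => {
  const response = await request.get('/api/orders'); // baseURL from config
  expect(response.ok()).toBeTruthy();

  const body = await response.json();
  const validate = new Ajv().compile(orderListSchema);
  const valid = validate(body);
  // Put the schema errors in the failure message so triage starts from the diff.
  expect(valid, JSON.stringify(validate.errors)).toBeTruthy();
});
```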
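On the artifact side, most of the capture is standard Playwright configuration; a sketch under that assumption. The CI upload, the stable URL pattern, and the PR link-back live in the CI job and are elided here, and HAR capture needs a custom context fixture with a per-test path:

```js
// playwright.config.js: every failed run ships a trace, video, and screenshot.
// Upload to artifact storage and the PR-comment link-back happen in CI, not here.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    baseURL: process.env.BASE_URL,   // same environments for API and UI tests
    trace: 'retain-on-failure',      // full trace, openable in the trace viewer
    video: 'retain-on-failure',      // recording of the failed run
    screenshot: 'only-on-failure',   // snapshot at the moment of failure
  },
  // Consistent naming falls out of Playwright's default per-test output
  // layout under this directory; HAR recording is wired through a custom
  // context fixture (elided) so each test gets its own file.
  outputDir: 'test-results',
});
```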
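And a before-and-after for what the migration tooling produces, illustrative rather than lifted from the real suite, showing the same assertion with the step-definition indirection dropped:

```js
// Before (legacy Cucumber): a Gherkin scenario plus a step definition.
//   Scenario: user searches inventory
//     When the user searches for "widget"
//     Then the results list shows "widget"
//   When('the user searches for {string}', async function (term) { /* ... */ });
//
// After (Playwright): the same behavior and assertion, stated directly.
// The route and locators are illustrative.
const { test, expect } = require('@playwright/test');

test('user searches inventory', async ({ page }) => {
  await page.goto('/inventory');
  await page.getByPlaceholder('Search').fill('widget');
  await page.keyboard.press('Enter');
  await expect(page.getByRole('list', { name: 'Results' })).toContainText('widget');
});
```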

Results

  • Regression effort down 30% across the suite once the migration completed and the AI-assisted scaffolding was in place.
  • Maintainability up 35% vs. the legacy Selenium / Cucumber framework — measured as time-to-fix for a flaky test plus time-to-add for a new test.
  • AI-agent test generation worked well for happy-path coverage and locator boilerplate; the human review step stays in place for assertions and edge cases.

Lessons

The two practices that mattered: refusing to let AI agents touch fixture code, and treating failure artifacts as the actual product of a test run. The AI is great at proposals; it’s bad at invariants. The fixtures encode invariants. Keep that boundary clean and the framework stays trustworthy as everything around it churns.