Add playwright/cypress/puppeteer code dumping #419

Open
wants to merge 1 commit into
base: main


christopherhwood

why

This adds support for a popular request: dumping the Stagehand actions taken during a session as Playwright code. It also supports output as Cypress or Puppeteer, in either TypeScript or Python.

what changed

Added a new ActionRecorder, which works like a cache except that it resets its state every time it is initialized, in an attempt to hold only the state from a single session.
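A minimal sketch of the "cache that resets on init" idea, not the PR's actual implementation. The `RecordedAction` shape, the `record` and `getActions` methods, and the injected backing store are all assumptions made for illustration.

```typescript
// Hypothetical shape of one recorded step; the PR's real type may differ.
type RecordedAction =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; text: string };

// Behaves like a cache, but wipes its backing store on construction so it
// only ever holds the actions recorded by the current session.
class ActionRecorder {
  private store: RecordedAction[];

  constructor(store: RecordedAction[] = []) {
    this.store = store;
    this.store.length = 0; // discard anything left over from an earlier session
  }

  record(action: RecordedAction): void {
    this.store.push(action);
  }

  // Return a copy so callers cannot mutate the recorder's state.
  getActions(): RecordedAction[] {
    return [...this.store];
  }
}
```

Constructing a second recorder over the same store empties it again, which is the one behavior that separates this from an ordinary cache.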

The actions from this cache are then converted into playwright code via some hard-coded rules in a newly added testCodeGenerator.ts file.
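As a rough illustration of what "hard-coded rules" could look like, the sketch below maps each recorded action to one line of Playwright code. The `RecordedAction` type and both function names are hypothetical; the real rules live in testCodeGenerator.ts and may differ.

```typescript
// Hypothetical shape of one recorded step; the PR's real type may differ.
type RecordedAction =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; text: string };

// One fixed rule per action kind. JSON.stringify quotes and escapes the
// string arguments so the emitted line is valid TypeScript.
function toPlaywrightLine(action: RecordedAction): string {
  switch (action.kind) {
    case "goto":
      return `await page.goto(${JSON.stringify(action.url)});`;
    case "click":
      return `await page.locator(${JSON.stringify(action.selector)}).click();`;
    case "fill":
      return `await page.locator(${JSON.stringify(action.selector)}).fill(${JSON.stringify(action.text)});`;
  }
}

// Join the per-action lines into the body of a generated script.
function generatePlaywrightBody(actions: RecordedAction[]): string {
  return actions.map(toPlaywrightLine).join("\n");
}
```

Because the mapping is a plain switch over action kinds, the Playwright output is deterministic for a given list of recorded actions.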

If the user requests Cypress or Puppeteer output, we call the LLM to convert the Playwright code into the other test format.

test plan

Added a new example, 2048_recorder.ts. It works like the original 2048 example, with these differences: the game loop runs only once; the recorder is enabled; the direct Playwright keypress call is replaced with a call to stagehand.act; and at the end, the example dumps the generated Playwright code in TypeScript to the console.


changeset-bot bot commented Jan 21, 2025

⚠️ No Changeset found

Latest commit: 0253e1b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@kamath
Member

kamath commented Jan 21, 2025

boom!!! thanks so much for this, will take a look shortly

@sankalpgunturi
Contributor

sankalpgunturi commented Jan 31, 2025

Hi @christopherhwood,

Great work! I tested your branch and noticed that page navigations and keyboard presses are not recorded deterministically.

For example, in this script:

```typescript
await stagehand.init();

await stagehand.page.goto("https://www.saucedemo.com");
await stagehand.page.act({
  action: "type standard_user in username",
});

await stagehand.page.act({
  action: "type secret_sauce in password",
});

await stagehand.page.act({
  action: 'click "Log In"',
});
await stagehand.page.goto("https://www.bing.com");
await stagehand.page.goto("https://www.google.com");
await stagehand.page.act({
  action: "click on the search bar",
});
await stagehand.page.act({
  action: "type zebra",
});
await stagehand.page.keyboard.press("Enter");

const playwrightCode = await stagehand.dumpRecordedActionsCode({
  testFramework: "playwright",
  language: "typescript",
});
console.log("Playwright Code:\n\n", playwrightCode);

await stagehand.close();
```

The Playwright dump captures the saucedemo.com navigation correctly. However, the navigations to bing.com and google.com are missing. Interestingly, the act calls performed on google.com are captured correctly, but the final keyboard press is not recorded.

```typescript
import { chromium } from '@playwright/test';

async function run() {
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto('https://www.saucedemo.com/');

  await page.locator('xpath=//body[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/form[1]/div[1]/input[1]').fill('standard_user');
  await page.locator('xpath=//body[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/form[1]/div[2]/input[1]').type('secret_sauce');
  await page.locator('xpath=//body[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/form[1]/input[1]').click();
  await page.locator('xpath=//body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/textarea[1]').click();
  await page.locator('xpath=//body[1]/div[1]/div[3]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/textarea[1]').fill('zebra');

  await context.close();
  await browser.close();
}

run().catch(console.error);
```

@christopherhwood
Author

Thanks for testing it out @sankalpgunturi.

You're correct that it doesn't record everything done on the page. I see how that would be valuable. When I wrote this, I was focused only on the act actions, since those are the primary place where the LLM converts natural language into Playwright.

If we wanted to capture all actions, including those native to the Playwright page, we would need to add an interception layer that records these actions only when the user has indicated they want recording.
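One way such an interception layer could look is a `Proxy` around the page that logs selected native calls before forwarding them. This is only a sketch under assumptions: `PageLike` is a tiny stand-in for the real Playwright page, and `NativeCall`, `intercept`, and `withRecording` are hypothetical names that do not exist in Stagehand or Playwright.

```typescript
// Minimal stand-in for the parts of a Playwright page used in this sketch.
interface PageLike {
  goto(url: string): Promise<unknown>;
  keyboard: { press(key: string): Promise<void> };
}

// Hypothetical log entry for a native (non-act) call.
type NativeCall = { method: string; args: unknown[] };

// Wrap `target` so calls to the listed method names are logged, then forwarded.
function intercept<T extends object>(
  target: T,
  names: string[],
  label: string,
  log: NativeCall[],
  overrides: Record<string, unknown> = {},
): T {
  return new Proxy(target, {
    get(obj, prop) {
      if (typeof prop === "string" && prop in overrides) return overrides[prop];
      const value = Reflect.get(obj, prop);
      if (typeof value !== "function") return value;
      // Untracked methods are bound to the real object and passed through.
      if (typeof prop !== "string" || names.indexOf(prop) === -1) return value.bind(obj);
      return (...args: unknown[]) => {
        log.push({ method: label + prop, args });
        return value.apply(obj, args);
      };
    },
  });
}

// Recording is opt-in: when disabled, the original page is returned untouched.
function withRecording<T extends PageLike>(page: T, log: NativeCall[], enabled: boolean): T {
  if (!enabled) return page;
  const keyboard = intercept(page.keyboard, ["press"], "keyboard.", log);
  return intercept(page, ["goto"], "", log, { keyboard });
}
```

With a wrapper like this, calls such as `page.goto(...)` and `page.keyboard.press("Enter")` would land in the same log as the act actions, so the code generator could emit them in order. Whether a proxy is compatible with how Stagehand extends the page is untested here.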

As it works today, Stagehand just appends some more functions onto the Playwright page and does not do anything to the existing actions.

Unfortunately, I am entering a very busy period and am not sure when I will have the time to revisit this and add that functionality. If someone else wants to take a shot, you are more than welcome to do so.

@sankalpgunturi
Contributor

sankalpgunturi commented Feb 3, 2025 via email

@JosXa

JosXa commented Mar 13, 2025

I feel like this feature could make Stagehand much more useful for QA engineers and devs who would like to express website requirements in natural language and then generate a predictable click test using Stagehand.

I'm imagining that we specify our requirements and make sure the click test behaves correctly once, then save every LLM instruction as a set of snapshots into the test case, and re-run it in "LLM mode" when tests fail - to regenerate the snapshot.

I imagine this would need some sort of state machine or unique ID system to correctly identify which act / observe instruction matches which snapshot data stored as files in Git, but having this PR is already quite nice as an alternative to caching.
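To make the unique-ID idea concrete, here is one possible sketch: derive a stable ID from the instruction kind, the normalized instruction text, and how many times that instruction has already occurred in the test. Everything here is hypothetical, including the `snapshotId` name, the ID format, and the choice of an FNV-1a-style hash over UTF-16 code units; none of it comes from Stagehand or this PR.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
// Chosen only because it needs no imports; any stable hash would do.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Build an ID that stays the same across runs as long as the instruction
// text (ignoring case and extra whitespace) and its position among
// identical instructions do not change.
function snapshotId(
  kind: "act" | "observe",
  instruction: string,
  occurrence: number,
): string {
  const normalized = instruction.trim().toLowerCase().replace(/\s+/g, " ");
  return `${kind}-${fnv1a(normalized)}-${occurrence}`;
}
```

An ID like this could serve as the snapshot file name in Git. The `occurrence` counter keeps two identical instructions in one test from colliding, though reordering the instructions would still invalidate the mapping.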


4 participants