Changes for Groq demo #499

eastlondoner · 2025-02-18T16:31:10Z

I wanted to use Stagehand with Groq Qwen-2.5-32b. This model is not as smart as Sonnet, and it did not work initially. It ran into various problems; the changes I made to get it working are all included in this PR.

These improvements make Stagehand work better with "dumber" models like Qwen-2.5.

Added "You can only make one tool call per step, you will get the result of the tool call in the next step." to the act system prompt. This stops models from sending multiple tool calls, of which Stagehand only runs the first. It reduces model confusion and improves speed and cost (fewer return tokens needed).
Added a note to the act system prompt that discourages retrying the same action repeatedly.
Defined the main available actions and communicated them to the LLM. This makes things much clearer for the LLM and helps it avoid hallucinating tool calls.
If the LLM attempts an unavailable action, it now gets back an error listing the available actions (including those available on the chosen element). This helps models self-heal after a mistake.
Introduced a "goBack" action that calls page.goBack(). Models want to use this to recover from mistakes.
Moved the actions that the model has taken into "assistant" role messages, which is more in line with expected chat interaction behaviour. IMO this helps models understand that they took the actions and reduces repetition of the same action.
Added a new return method so the model can return data to the caller. Previously, if you asked Stagehand to return some information, it had no way to do it.
Made LLM completion handle an array of messages rather than just grabbing the first one, so response information from models is not lost.
Added handling for the case where the model's tool call includes a null element.
Added "commentary" to the steps carried out. This includes non-tool-call content from the model, allowing it to keep its side of the conversation going or pass forward thinking tokens.
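
To illustrate the "assistant" role change, here is a rough sketch of how previously taken steps could be replayed as assistant messages. The `ChatMessage` and `StepRecord` shapes, the `buildMessages` helper, and the message wording are all hypothetical and are not the types or text used in this PR.

```typescript
// Hypothetical message and step shapes, for illustration only.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface StepRecord {
  method: string;
  elementId: number | null;
  commentary?: string;
}

// Replays each earlier step as an "assistant" message so the model sees
// the actions as things it did itself, not as part of the user's request.
function buildMessages(
  systemPrompt: string,
  goal: string,
  steps: StepRecord[],
): ChatMessage[] {
  const history = steps.map((step): ChatMessage => ({
    role: "assistant",
    content:
      `I called ${step.method} on element ${step.elementId ?? "none"}.` +
      (step.commentary ? `\n${step.commentary}` : ""),
  }));
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: goal },
    ...history,
  ];
}
```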

changeset-bot · 2025-02-18T16:31:14Z

⚠️ No Changeset found

Latest commit: e9b4ea3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

eastlondoner · 2025-02-18T16:32:33Z

lib/a11y/utils.ts

+      default: {
+        throw exhaustiveMatchingGuard(action);
+      }


This makes the switch statement type-safe: every defined action must be handled, or the code will not compile.
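
For anyone unfamiliar with the pattern, a minimal sketch of how such a guard can work follows. The `Action` union and `describeAction` are made up for illustration, and the body of `exhaustiveMatchingGuard` is an assumption about its shape rather than the implementation in this PR.

```typescript
// Hypothetical action union; the real list lives in lib/actions.ts.
type Action = "click" | "fill" | "goBack";

// Assumed shape of the guard: it only accepts `never`, so the call site stops
// type-checking as soon as a member of the union is left unhandled.
function exhaustiveMatchingGuard(value: never): Error {
  return new Error(`Unhandled action: ${String(value)}`);
}

function describeAction(action: Action): string {
  switch (action) {
    case "click":
      return "clicks the element";
    case "fill":
      return "fills the element with text";
    case "goBack":
      return "navigates back in history";
    default: {
      // `action` is narrowed to `never` here only if every case is covered.
      throw exhaustiveMatchingGuard(action);
    }
  }
}
```

Adding a new member to `Action` without a matching `case` turns the `default` branch into a compile error, which is the point of the change.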

eastlondoner · 2025-02-18T16:33:30Z

lib/actions.ts

+export const actMethods = [
+  "scrollIntoView",
+  "press",
+  "click",
+  "fill",
+  "type",
+  "goBack",
+] as const;


This defines the available actions so that the LLM can be told what they are.
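
As a sketch of why the `as const` array is useful: the same list can drive both a TypeScript union type and the enum advertised to the model. `ActMethod`, `isActMethod`, and `methodParameter` are hypothetical names that do not appear in this PR.

```typescript
// Copied from the diff above.
const actMethods = [
  "scrollIntoView",
  "press",
  "click",
  "fill",
  "type",
  "goBack",
] as const;

// Union type derived from the array: "scrollIntoView" | "press" | ...
type ActMethod = (typeof actMethods)[number];

// Hypothetical type guard for narrowing a method name supplied by the model.
function isActMethod(value: string): value is ActMethod {
  return actMethods.some((method) => method === value);
}

// Hypothetical JSON-schema fragment for a tool parameter that lists the
// methods as an enum, so the model is told exactly what it may call.
const methodParameter = {
  type: "string",
  enum: [...actMethods],
  description: `One of: ${actMethods.join(", ")}`,
};
```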

eastlondoner · 2025-02-18T16:40:50Z

lib/prompt.ts

@@ -16,11 +17,13 @@ You will receive:


 ## Your Goal / Specification
-You have 2 tools that you can call: doAction, and skipSection. Do action only performs Playwright actions. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal.
+You have 3 tools that you can call: doAction, returnPlan, returnResult, and skipSection. Do action performs Playwright actions and can record information found so far. Return Plan is used to return a plan for multiple steps. Return Result is used to return either an intermediate result or a final result. Skip Section is used to skip a section of the page. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal. You can only make one tool call per step, you will get the result of the tool call in the next step.


There are a couple of changes in here. A very important one is:

You can only make one tool call per step, you will get the result of the tool call in the next step.

- Add Groq-specific error classes for better error classification: GroqError as base, GroqAPIError for API errors, GroqAuthenticationError (401), GroqRateLimitError (429), GroqTimeoutError, GroqConnectionError, and GroqValidationError
- Improve error handling in createChatCompletion with proper error mapping, retry logic, comprehensive logging, and fixed cache-related scoping
- Update TODO.md to reflect completed error handling tasks

test: Add initial Groq client integration tests
- Add basic client initialization and chat completion tests
- Create comprehensive test plan for Groq client
- Add GROQ_API_KEY to .env.example
- Set up test infrastructure with logging and environment handling

add groq groq vibes well move groq client to external

eastlondoner · 2025-02-18T16:51:38Z

lib/inference.ts

+    if (toolCalls[0].function.name === "returnPlan") {
+      const { plan } = JSON.parse(toolCalls[0].function.arguments);
+      return {
+        result: plan,
+        completed: false,
+        commentary: "This is my plan for the next steps",
+      };
+    }


This was an experiment, but it didn't make a meaningful difference.

eastlondoner · 2025-02-18T16:51:48Z

lib/llm/AnthropicClient.ts

+    console.log("sending messages to anthropic", {
+      anthropicTools: JSON.stringify(anthropicTools),
+      systemMessage: JSON.stringify(systemMessage),
+      formattedMessages: JSON.stringify(formattedMessages),
+    });


eastlondoner · 2025-02-18T16:52:33Z

lib/llm/AnthropicClient.ts

-              response.content.find((c) => c.type === "text")?.text || null,
+              response.content
+                .filter((c) => c.type === "text")
+                ?.map((c) => c.text)
+                .join("\n") || null,


The old code was throwing away response content because find only returns the first block of type text. Some models return multiple text blocks.
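
A small standalone sketch of the difference, using a simplified, hypothetical `ContentBlock` type (the Anthropic SDK's real content types are richer than this):

```typescript
// Simplified content-block shapes, for illustration only.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; name: string };
type ContentBlock = TextBlock | ToolUseBlock;

// Old behaviour: only the first text block survives.
function firstText(content: ContentBlock[]): string | null {
  const block = content.find((c) => c.type === "text");
  return block && block.type === "text" ? block.text : null;
}

// New behaviour: every text block is kept, joined with newlines.
function allText(content: ContentBlock[]): string | null {
  const texts = content
    .filter((c): c is TextBlock => c.type === "text")
    .map((c) => c.text);
  return texts.join("\n") || null;
}

const content: ContentBlock[] = [
  { type: "text", text: "Thinking about the page." },
  { type: "tool_use", name: "doAction" },
  { type: "text", text: "Clicking the login button." },
];
```

With this input, `firstText(content)` drops the second sentence, while `allText(content)` keeps both.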

eastlondoner · 2025-02-18T16:52:54Z

lib/llm/OpenAIClient.ts

+    console.log("openai response", JSON.stringify(response, null, 2));
+


eastlondoner · 2025-02-18T16:53:47Z

lib/prompt.ts

+  {
+    type: "function",
+    name: "returnPlan",
+    description: "return a plan for multiple steps",
+    parameters: {
+      type: "object",
+      required: ["plan"],
+      properties: {
+        plan: {
+          type: "string",
+          description: "The plan for the next steps",
+        },


Remove returnPlan; it didn't achieve much.

eastlondoner · 2025-02-18T17:20:10Z

lib/prompt.ts

 If the user's goal will be accomplished after running the playwright action, set completed to true. Better to have completed set to true if your are not sure.

 Note 1: If there is a popup on the page for cookies or advertising that has nothing to do with the goal, try to close it first before proceeding. As this can block the goal from being completed.
 Note 2: Sometimes what your are looking for is hidden behind and element you need to interact with. For example, sliders, buttons, etc...
+Note 3: Avoid repeating actions that have already been taken, try something different.


Added this to discourage infinite retries

eastlondoner · 2025-02-18T17:38:01Z

lib/handlers/actHandler.ts

+      // If elementId is null, use the document body as the root element
+      let xpaths: string[];
+      if (elementId === null) {
+        xpaths = ["//body"];
+      } else {
+        xpaths = selectorMap[elementId] ?? [];
+      }


Handles the case where the tool call sets elementId to null.

eastlondoner · 2025-02-18T17:38:22Z

lib/handlers/actHandler.ts

@@ -1235,18 +1312,68 @@ export class StagehandActHandler {
        }
      }

+      if ("result" in response) {


Implementation of the new response type.

eastlondoner · 2025-02-18T17:38:59Z

lib/handlers/actHandler.ts

+          if (method !== "goBack") {
+            throw new Error("None of the provided XPaths could be located.");
+          }


goBack does not require an element

eastlondoner · 2025-02-18T17:39:33Z

lib/handlers/actHandler.ts

+          (response.commentary && response.commentary?.toUpperCase() !== "NULL"
+            ? `  Commentary: ${response.commentary}\n`
+            : "") +
+          (response.findings && response.findings?.toUpperCase() !== "NULL"
+            ? `  Findings: ${response.findings}\n`
+            : "");


Added commentary and findings so the model can communicate more information to its future self.
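
The repeated "present and not the string NULL" check could be factored into a helper. This is only a sketch: `optionalLine` and `describeStep` are hypothetical names, and the `Action:` prefix is invented for the example.

```typescript
// Hypothetical helper: returns a labelled, indented line, or "" when the value
// is missing, empty, or the literal string "null" in any casing (some models
// emit the word instead of omitting the field).
function optionalLine(label: string, value?: string | null): string {
  if (!value || value.toUpperCase() === "NULL") {
    return "";
  }
  return `  ${label}: ${value}\n`;
}

// Builds the per-step summary that is fed back to the model on later steps.
function describeStep(
  action: string,
  commentary?: string | null,
  findings?: string | null,
): string {
  return (
    `Action: ${action}\n` +
    optionalLine("Commentary", commentary) +
    optionalLine("Findings", findings)
  );
}
```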

eastlondoner · 2025-02-18T17:41:21Z

lib/inference.ts

+      return {
+        result,
+        completed,
+        commentary: modelCommentary,


modelCommentary collects additional (non-tool-call) messages from the model. These are often helpful to the model in later steps, particularly with a thinking model like R1, where they may contain thinking tokens.

eastlondoner force-pushed the groq-demo branch from ab83f96 to e82fdf1 Compare February 18, 2025 16:31

eastlondoner force-pushed the groq-demo branch 2 times, most recently from 8d5e352 to 49609b0 Compare February 18, 2025 16:43

eastlondoner force-pushed the groq-demo branch from 49609b0 to 22e40e0 Compare February 18, 2025 16:44

tidy up

e9b4ea3

eastlondoner marked this pull request as ready for review February 18, 2025 17:28
