Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes for Groq demo #499

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

eastlondoner
Copy link
Contributor

@eastlondoner eastlondoner commented Feb 18, 2025

I wanted to use stagehand with Groq Qwen-2.5-32b - this is a model that's not as smart as Sonnet and it did not work initially. It ran into various problems, changes that I made to get this working are all included in this PR

These improvements make stagehand work better with "dumber" models like Qwen-2.5

  • Add "You can only make one tool call per step, you will get the result of the tool call in the next step." to the act system prompt. This stops models from sending multiple tool calls (because stagehand only runs the first tool call) this reduces model confusion and improves speed and cost (fewer return tokens needed).

  • Added a note to discourage trying the same actions repeatedly in the act system prompt

  • Define the main available actions and communicate them to the LLM. This makes things much clearer for the LLM and helps the LLM not to 'hallucinate' tool calls.

  • If the LLM attempts an unavailable action, return to it an error explaining what the available actions are (including actions available on the chosen element) this helps models to self-heal if they make a mistake.

  • introduces a "goBack" action to use page.back() models want to use this to recover from mistakes

  • moves the actions that the model has taken into "assistant" role message, this is more in line with expected chat interaction behaviour. IMO this helps the models understand that they took the actions and reduces repetition of the same action.

  • adds a new return method for the model to return data to the caller. This solves a problem where if you ask Stagehand to return some information it has no way to do it.

  • made llm completion support an array of messages rather than just grabbing the first one. This ensures response information from models is not lost

  • adds handling for case where the model tool call includes a null element

  • adds "commentary" to the steps carried out. This includes non-tool-call content from the model allowing it to keep its side of the conversation going or pass forward thinking tokens

Copy link

changeset-bot bot commented Feb 18, 2025

⚠️ No Changeset found

Latest commit: e9b4ea3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Comment on lines +700 to +702
default: {
throw exhaustiveMatchingGuard(action);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes this a type safe switch statement. Every defined action must be implemented or this will not compile

Comment on lines +5 to +12
export const actMethods = [
"scrollIntoView",
"press",
"click",
"fill",
"type",
"goBack",
] as const;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this defines the additional actions so the LLM can know what they are

lib/prompt.ts Outdated
@@ -16,11 +17,13 @@ You will receive:


## Your Goal / Specification
You have 2 tools that you can call: doAction, and skipSection. Do action only performs Playwright actions. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal.
You have 3 tools that you can call: doAction, returnPlan, returnResult, and skipSection. Do action performs Playwright actions and can record information found so far. Return Plan is used to return a plan for multiple steps. Return Result is used to return either an intermediate result or a final result. Skip Section is used to skip a section of the page. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal. You can only make one tool call per step, you will get the result of the tool call in the next step.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a couple of changes in here. A very important one is

You can only make one tool call per step, you will get the result of the tool call in the next step.

@eastlondoner eastlondoner force-pushed the groq-demo branch 2 times, most recently from 8d5e352 to 49609b0 Compare February 18, 2025 16:43
- Add Groq-specific error classes for better error classification: GroqError as base, GroqAPIError for API errors, GroqAuthenticationError (401), GroqRateLimitError (429), GroqTimeoutError, GroqConnectionError, and GroqValidationError

- Improve error handling in createChatCompletion with proper error mapping, retry logic, comprehensive logging, and fixed cache-related scoping

- Update TODO.md to reflect completed error handling tasks

test: Add initial Groq client integration tests

- Add basic client initialization and chat completion tests

- Create comprehensive test plan for Groq client

- Add GROQ_API_KEY to .env.example

- Set up test infrastructure with logging and environment handling

test: Add initial Groq client integration tests

- Add basic client initialization and chat completion tests

- Create comprehensive test plan for Groq client

- Add GROQ_API_KEY to .env.example

- Set up test infrastructure with logging and environment handling

add groq

groq vibes well

move groq client to external
lib/inference.ts Outdated
Comment on lines 132 to 139
if (toolCalls[0].function.name === "returnPlan") {
const { plan } = JSON.parse(toolCalls[0].function.arguments);
return {
result: plan,
completed: false,
commentary: "This is my plan for the next steps",
};
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was an experiment but didn't make a meaningful difference

Comment on lines 230 to 234
console.log("sending messages to anthropic", {
anthropicTools: JSON.stringify(anthropicTools),
systemMessage: JSON.stringify(systemMessage),
formattedMessages: JSON.stringify(formattedMessages),
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment on lines -268 to +276
response.content.find((c) => c.type === "text")?.text || null,
response.content
.filter((c) => c.type === "text")
?.map((c) => c.text)
.join("\n") || null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was throwing away response content because it only returns the first instance of type text. Some models return multiple response elements

Comment on lines 344 to 345
console.log("openai response", JSON.stringify(response, null, 2));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove log

lib/prompt.ts Outdated
Comment on lines 258 to 269
{
type: "function",
name: "returnPlan",
description: "return a plan for multiple steps",
parameters: {
type: "object",
required: ["plan"],
properties: {
plan: {
type: "string",
description: "The plan for the next steps",
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove returnPlan it didn't achieve much

If the user's goal will be accomplished after running the playwright action, set completed to true. Better to have completed set to true if your are not sure.

Note 1: If there is a popup on the page for cookies or advertising that has nothing to do with the goal, try to close it first before proceeding. As this can block the goal from being completed.
Note 2: Sometimes what your are looking for is hidden behind and element you need to interact with. For example, sliders, buttons, etc...
Note 3: Avoid repeating actions that have already been taken, try something different.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to discourage infinite retries

@eastlondoner eastlondoner marked this pull request as ready for review February 18, 2025 17:28
Comment on lines +1358 to +1364
// If elementId is null, use the document body as the root element
let xpaths: string[];
if (elementId === null) {
xpaths = ["//body"];
} else {
xpaths = selectorMap[elementId] ?? [];
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle the case where the tool call sets the elementId to null

@@ -1235,18 +1312,68 @@ export class StagehandActHandler {
}
}

if ("result" in response) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new response type implementation

Comment on lines +1441 to +1443
if (method !== "goBack") {
throw new Error("None of the provided XPaths could be located.");
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goBack does not require an element

Comment on lines +1473 to +1478
(response.commentary && response.commentary?.toUpperCase() !== "NULL"
? ` Commentary: ${response.commentary}\n`
: "") +
(response.findings && response.findings?.toUpperCase() !== "NULL"
? ` Findings: ${response.findings}\n`
: "");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added commentary and findings for the model to communicate more information to its future self

return {
result,
completed,
commentary: modelCommentary,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelCommentary collects additional (non tool call) messages that the model records, these are often helpful to the model in future. particularly with a thinking model like r1 if this contains thinking tokens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant