-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes for Groq demo #499
base: main
Are you sure you want to change the base?
Conversation
|
ab83f96
to
e82fdf1
Compare
default: { | ||
throw exhaustiveMatchingGuard(action); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes this a type safe switch statement. Every defined action must be implemented or this will not compile
export const actMethods = [ | ||
"scrollIntoView", | ||
"press", | ||
"click", | ||
"fill", | ||
"type", | ||
"goBack", | ||
] as const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this defines the additional actions so the LLM can know what they are
lib/prompt.ts
Outdated
@@ -16,11 +17,13 @@ You will receive: | |||
|
|||
|
|||
## Your Goal / Specification | |||
You have 2 tools that you can call: doAction, and skipSection. Do action only performs Playwright actions. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal. | |||
You have 3 tools that you can call: doAction, returnPlan, returnResult, and skipSection. Do action performs Playwright actions and can record information found so far. Return Plan is used to return a plan for multiple steps. Return Result is used to return either an intermediate result or a final result. Skip Section is used to skip a section of the page. Do exactly what the user's goal is. Do not perform any other actions or exceed the scope of the goal. You can only make one tool call per step, you will get the result of the tool call in the next step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a couple of changes in here. A very important one is
You can only make one tool call per step, you will get the result of the tool call in the next step.
8d5e352
to
49609b0
Compare
- Add Groq-specific error classes for better error classification: GroqError as base, GroqAPIError for API errors, GroqAuthenticationError (401), GroqRateLimitError (429), GroqTimeoutError, GroqConnectionError, and GroqValidationError - Improve error handling in createChatCompletion with proper error mapping, retry logic, comprehensive logging, and fixed cache-related scoping - Update TODO.md to reflect completed error handling tasks test: Add initial Groq client integration tests - Add basic client initialization and chat completion tests - Create comprehensive test plan for Groq client - Add GROQ_API_KEY to .env.example - Set up test infrastructure with logging and environment handling test: Add initial Groq client integration tests - Add basic client initialization and chat completion tests - Create comprehensive test plan for Groq client - Add GROQ_API_KEY to .env.example - Set up test infrastructure with logging and environment handling add groq groq vibes well move groq client to external
49609b0
to
22e40e0
Compare
lib/inference.ts
Outdated
if (toolCalls[0].function.name === "returnPlan") { | ||
const { plan } = JSON.parse(toolCalls[0].function.arguments); | ||
return { | ||
result: plan, | ||
completed: false, | ||
commentary: "This is my plan for the next steps", | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was an experiment but didn't make a meaningful difference
lib/llm/AnthropicClient.ts
Outdated
console.log("sending messages to anthropic", { | ||
anthropicTools: JSON.stringify(anthropicTools), | ||
systemMessage: JSON.stringify(systemMessage), | ||
formattedMessages: JSON.stringify(formattedMessages), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
response.content.find((c) => c.type === "text")?.text || null, | ||
response.content | ||
.filter((c) => c.type === "text") | ||
?.map((c) => c.text) | ||
.join("\n") || null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was throwing away response content because it only returns the first instance of type text. Some models return multiple response elements
lib/llm/OpenAIClient.ts
Outdated
console.log("openai response", JSON.stringify(response, null, 2)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove log
lib/prompt.ts
Outdated
{ | ||
type: "function", | ||
name: "returnPlan", | ||
description: "return a plan for multiple steps", | ||
parameters: { | ||
type: "object", | ||
required: ["plan"], | ||
properties: { | ||
plan: { | ||
type: "string", | ||
description: "The plan for the next steps", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove returnPlan it didn't achieve much
If the user's goal will be accomplished after running the playwright action, set completed to true. Better to have completed set to true if your are not sure. | ||
|
||
Note 1: If there is a popup on the page for cookies or advertising that has nothing to do with the goal, try to close it first before proceeding. As this can block the goal from being completed. | ||
Note 2: Sometimes what your are looking for is hidden behind and element you need to interact with. For example, sliders, buttons, etc... | ||
Note 3: Avoid repeating actions that have already been taken, try something different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to discourage infinite retries
// If elementId is null, use the document body as the root element | ||
let xpaths: string[]; | ||
if (elementId === null) { | ||
xpaths = ["//body"]; | ||
} else { | ||
xpaths = selectorMap[elementId] ?? []; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle the case where the tool call sets the elementId to null
@@ -1235,18 +1312,68 @@ export class StagehandActHandler { | |||
} | |||
} | |||
|
|||
if ("result" in response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new response type implementation
if (method !== "goBack") { | ||
throw new Error("None of the provided XPaths could be located."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goBack does not require an element
(response.commentary && response.commentary?.toUpperCase() !== "NULL" | ||
? ` Commentary: ${response.commentary}\n` | ||
: "") + | ||
(response.findings && response.findings?.toUpperCase() !== "NULL" | ||
? ` Findings: ${response.findings}\n` | ||
: ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added commentary and findings for the model to communicate more information to its future self
return { | ||
result, | ||
completed, | ||
commentary: modelCommentary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modelCommentary collects additional (non tool call) messages that the model records, these are often helpful to the model in future. particularly with a thinking model like r1 if this contains thinking tokens
I wanted to use stagehand with Groq Qwen-2.5-32b - this is a model that's not as smart as Sonnet and it did not work initially. It ran into various problems, changes that I made to get this working are all included in this PR
These improvements make stagehand work better with "dumber" models like Qwen-2.5
Add "You can only make one tool call per step, you will get the result of the tool call in the next step." to the act system prompt. This stops models from sending multiple tool calls (because stagehand only runs the first tool call) this reduces model confusion and improves speed and cost (fewer return tokens needed).
Added a note to discourage trying the same actions repeatedly in the act system prompt
Define the main available actions and communicate them to the LLM. This makes things much clearer for the LLM and helps the LLM not to 'hallucinate' tool calls.
If the LLM attempts an unavailable action, return to it an error explaining what the available actions are (including actions available on the chosen element) this helps models to self-heal if they make a mistake.
introduces a "goBack" action to use page.back() models want to use this to recover from mistakes
moves the actions that the model has taken into "assistant" role message, this is more in line with expected chat interaction behaviour. IMO this helps the models understand that they took the actions and reduces repetition of the same action.
adds a new return method for the model to return data to the caller. This solves a problem where if you ask Stagehand to return some information it has no way to do it.
made llm completion support an array of messages rather than just grabbing the first one. This ensures response information from models is not lost
adds handling for case where the model tool call includes a null element
adds "commentary" to the steps carried out. This includes non-tool-call content from the model allowing it to keep its side of the conversation going or pass forward thinking tokens