Secure mode for Invisible prompt injection and emoji DoS #525
NOTE: I have NEVER used stagehand before. Also, AI wrote all of this. I am not a ts/js guy so please review it really well. THAT SAID, I'm super proud to add this because y'all are the GOAT of ai-based browsing AND I aspire to be the AI security GOAT so it's cool to add this.
Why
Stagehand processes website content that may contain invisible Unicode characters or emoji variation selectors. These characters can be used for prompt injection or other attacks against AI systems. Filtering them out reduces that risk while preserving the functionality of the application.
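To make the idea concrete, here is a minimal sketch of this kind of filtering. It is not the code from this PR: the `SketchFilterConfig` interface, the `filterInvisible` function, and the specific code point ranges are assumptions chosen for illustration (Unicode tag characters, variation selectors, and a few zero-width characters).

```typescript
// Hypothetical config shape for illustration only; not the PR's CharacterFilterConfig.
interface SketchFilterConfig {
  stripTagCharacters: boolean;      // U+E0000..U+E007F (Unicode "Tags" block)
  stripVariationSelectors: boolean; // U+FE00..U+FE0F and U+E0100..U+E01EF
  stripZeroWidth: boolean;          // U+200B, U+2060, U+FEFF
}

const sketchDefaults: SketchFilterConfig = {
  stripTagCharacters: true,
  stripVariationSelectors: true,
  stripZeroWidth: true,
};

function filterInvisible(
  text: string,
  config: SketchFilterConfig = sketchDefaults,
): string {
  let out = "";
  // for...of iterates by code point, so astral characters are not split.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const isTag = cp >= 0xe0000 && cp <= 0xe007f;
    const isVariationSelector =
      (cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef);
    const isZeroWidth = cp === 0x200b || cp === 0x2060 || cp === 0xfeff;

    if (
      (isTag && config.stripTagCharacters) ||
      (isVariationSelector && config.stripVariationSelectors) ||
      (isZeroWidth && config.stripZeroWidth)
    ) {
      continue; // drop the character
    }
    out += ch;
  }
  return out;
}

// Usage: a tag character and a zero-width space hidden inside visible text.
const hidden = "hi" + String.fromCodePoint(0xe0041) + "\u200b" + "!";
const cleaned = filterInvisible(hidden);
```

The sketch deliberately leaves U+200C and U+200D (zero-width non-joiner and joiner) alone, since some scripts and emoji sequences rely on them. Stripping U+FE0F can change how some emoji render, which is one reason a per-range switch is useful.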
What Changed
CharacterFilterConfig interface to allow fine-grained control over which character ranges to filter
Test Plan
The implementation has been tested with several test cases:
npx tsx examples/simple_unicode_test.ts to verify the basic filtering functionality
npx tsx examples/stagehand_unicode_test.ts to test the integration with the Stagehand framework
npx tsx examples/unicode_filter_test.ts to test different filtering configurations
The tests demonstrate that:
All tests pass successfully, confirming that the character filtering system works as expected.