A web application that allows users to input prompts, which are then used to run browser-use automations. The application displays the automation steps in a chat panel in real-time.
For detailed architecture information, including data flow diagrams and dependency chains, see ARCHITECTURE.md.
- Next.js with React and TypeScript
- Shadcn UI components
- Real-time updates with WebSockets
- Node.js WebSocket server
- Python automation service
- Integration with browser-use library
- Node.js 18+
- Python 3.9+
- uv (optional, for faster Python package installation)
- Clone the repository
```bash
git clone https://github.com/yourusername/browser-use-demo.git
cd browser-use-demo
```
- Install Node.js dependencies
```bash
npm install
```
- Install Python package
```bash
# Using uv (recommended)
pip install uv
cd python_agent
uv pip install -e .
cd ..

# Or using pip
cd python_agent
pip install -e .
cd ..
```
- Create a `.env` file based on `.env.example`

```bash
cp .env.example .env
```
- Update the `.env` file with your configuration

```
PORT=3001
OPENAI_API_KEY=your_openai_api_key_if_using_openai
```
- Start the FastAPI server

```bash
npm run server
```
- In a separate terminal, start the frontend development server

```bash
npm run dev
```
- Open your browser and navigate to `http://localhost:3000`
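The environment variables from the `.env` file can be read by the Python service at startup. A minimal sketch using only the standard library (the `load_settings` helper and its fallback values are illustrative, not part of the project's actual code):

```python
import os


def load_settings() -> dict:
    """Read the .env-provided settings from the process environment.

    Variable names match the sample .env above; the default port of 3001
    is an assumption for local development.
    """
    return {
        "port": int(os.environ.get("PORT", "3001")),
        # None if the key is unset, e.g. when using Ollama locally
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),
    }
```

In practice the server would call `load_settings()` once at startup and pass the resulting dict to whatever component needs it.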
```
├── python_agent/                      # Python package for browser automation
│   ├── browser_automation_agent/
│   │   ├── __init__.py
│   │   ├── agent.py                   # Main agent implementation
│   │   ├── cli.py                     # Command-line interface
│   │   ├── server.py                  # FastAPI server
│   │   ├── models.py                  # Pydantic models
│   │   └── logger.py                  # Logging utilities
│   ├── setup.py                       # Package setup file
│   ├── pyproject.toml                 # Python project configuration
│   └── requirements.txt               # Python dependencies
├── src/
│   ├── app/                           # Next.js app directory
│   ├── components/                    # React components
│   └── lib/                           # Utility functions and libraries
├── public/                            # Static assets
└── .env                               # Environment variables
```
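The project's message schemas live in `python_agent/browser_automation_agent/models.py` as Pydantic models. To illustrate the kind of shape involved, here is a hedged sketch using a stdlib dataclass instead of Pydantic — the field names and the `to_payload` helper are assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class LogMessage:
    """Hypothetical shape of one automation log event.

    The real schema is defined in models.py; this stand-in only shows
    the general structure of a timestamped, leveled log record.
    """
    level: str    # e.g. "info", "error"
    message: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_payload(self) -> dict:
        # Plain dict, ready to be JSON-serialized for the WebSocket channel
        return asdict(self)
```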
The application supports various configuration options that can be set from the frontend:
- Ollama (local models)
- OpenAI (cloud models)
- AWS Bedrock (cloud models)
- For Ollama: llama3.2, llama3, llama2
- For OpenAI: gpt-4o, gpt-4-turbo, gpt-4
- For AWS Bedrock: Claude 3 (Sonnet, Haiku, Opus), Llama 3 70B
- Vision capabilities (on/off)
- Headless mode (on/off)
- Browser type (chromium, firefox, webkit)
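The options above arrive from the frontend, so the backend presumably validates them before launching an automation. A minimal sketch of such validation — the function name, lowercase option tokens, and default values are assumptions, not the project's actual API:

```python
# Hypothetical validation of frontend-supplied automation settings.
# Allowed values mirror the option lists documented above.
ALLOWED_PROVIDERS = {"ollama", "openai", "bedrock"}
ALLOWED_BROWSERS = {"chromium", "firefox", "webkit"}


def validate_config(config: dict) -> dict:
    provider = config.get("provider", "ollama")
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    browser = config.get("browser", "chromium")
    if browser not in ALLOWED_BROWSERS:
        raise ValueError(f"unknown browser type: {browser}")
    return {
        "provider": provider,
        "browser": browser,
        "use_vision": bool(config.get("use_vision", True)),  # vision on/off
        "headless": bool(config.get("headless", True)),      # headless on/off
    }
```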
```bash
# Run the server in debug mode
npm run server:debug
```
```bash
# Format Python code
cd python_agent
ruff format .

# Lint Python code
ruff check .
cd ..

# Lint TypeScript/JavaScript code
npm run lint
```
This project uses conventional commits. Please follow the commit convention.
When running the FastAPI server, API documentation is available at:
- Swagger UI: http://localhost:3001/docs
- ReDoc: http://localhost:3001/redoc
The WebSocket API allows real-time communication between the client and server:
- `prompt:submit`: Submit a prompt for automation
- `automation:stop`: Stop the current automation
- `automation:log`: Log message from the automation
- `automation:complete`: Automation completed
- `automation:error`: Error in automation
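One way a client might route these events is with a small dispatch table. This stdlib sketch assumes JSON message bodies with `event` and `data` fields — the actual wire format is defined by the Node.js WebSocket server, so treat the shape here as illustrative:

```python
import json


def make_dispatcher(handlers: dict):
    """Return a function that routes a raw JSON message to the handler
    registered for its event name. Unknown events are ignored."""
    def dispatch(raw: str):
        msg = json.loads(raw)
        handler = handlers.get(msg.get("event"))
        if handler is not None:
            return handler(msg.get("data"))
        return None
    return dispatch


# Example: collect automation log lines as they arrive
logs = []
dispatch = make_dispatcher({
    "automation:log": logs.append,
    "automation:complete": lambda data: logs.append("done"),
})
dispatch(json.dumps({"event": "automation:log", "data": "Opening page"}))
```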