OnCall NetOps + TIG + pyATS Demo 🚀

This Proof of Concept (PoC) demonstrates how a group of agents can work together to resolve a network problem, specifically an ISIS adjacency issue.

The TIG (Telegraf, InfluxDB, Grafana) stack monitors the devices and sends an alert to LangGraph whenever an ISIS neighbor is lost. This alert sets the agents to work; you can then review their summary in LangGraph Studio to decide the next steps.
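Under the hood, Grafana's alert reaches the graph through the grafana-to-langgraph-proxy. If you want to see what such a trigger looks like, here is a minimal curl sketch; the endpoint path and payload fields are illustrative assumptions, not the proxy's exact schema:

# Hypothetical hand-crafted alert, mimicking what Grafana would send to the proxy.
# <PROXY_HOST>, <PORT>, the /alert path, and the payload fields are assumptions for illustration.
curl -X POST http://<PROXY_HOST>:<PORT>/alert \
  -H "Content-Type: application/json" \
  -d '{"alertname": "ISIS neighbor lost", "device": "cat8000v-0", "interface": "GigabitEthernet5"}'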

You can watch the demo in action (about 7 minutes, no sound).

Demo architecture

Components

The demo is split into three separate repositories:

  • OnCall-NetOps: GitHub repo.
    • Graph of AI agents.
  • pyATS Server: GitHub repo.
    • Used by AI agents to interact with network devices.
  • Observability Stack: GitHub repo.
    • Monitors network devices and triggers alarms.

Graph 🤖

When the graph receives a request, the node_coordinator validates the info and passes it to the node_orchestrator, which decides which network agents to call. Each agent connects to devices, gathers data, and returns a report. When all agents finish, their reports go to the node_root_cause_analyzer, which determines the root cause. If more details are needed, it requests them from the node_orchestrator. Otherwise, it sends the final findings to the node_report_generator.

Network agents:

  • agent_isis: Retrieves ISIS info.
  • agent_routing: Retrieves routing info.
  • agent_log_analyzer: Checks logs.
  • agent_device_health: Retrieves device health.
  • agent_network_interface: Retrieves interfaces/config.
  • agent_interface_actions: Performs interface actions.

Graph of agents

Requirements ⚠️

  • Python 3.11 (only for the LangGraph Studio Desktop version).
  • Docker >=1.27
  • Make
  • OpenAI Key
  • LangSmith Key: Create a token and copy the LangSmith environment variables.
  • CML: Import and start the topology.

Create a .env file in the root directory and set your keys there.

Example .env file
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY=<langsmith_token>
LANGSMITH_PROJECT="oncall-netops"
OPENAI_API_KEY=<openai_token>

Build ⚙️

  1. Import the remote repositories used as git submodules:

    make build-repos
  2. Build the TIG stack, pyATS server, and webhook proxy. You can deploy each component separately (refer to their respective repositories for more info).

    make build-demo

Note

If any required environment variable is missing, the make target fails and prints which one it is.

Run LangGraph

There are two options to run the graph:

  1. LangGraph Server CLI: Run the server in the terminal without a container. You can use the web version of LangGraph Studio (a bit slower).
  2. LangGraph Studio Desktop: Desktop app (macOS only).

Review Environment Variables

  • PYATS_API_SERVER: Connects the LangGraph API server to the pyATS server. It defaults to http://host.docker.internal:57000, which uses the pyATS server's default port (57000). The demo assumes you're running the LangGraph server in a container, so adjust this value (see .env.example) if needed.
  • LANGGRAPH_API_HOST: Connects the grafana-to-langgraph-proxy to the LangGraph API server. Defaults to http://host.docker.internal:56000; adjust if needed.

If you need to adjust these environment variables, use the values below.

  • grafana-to-langgraph-proxy, pyATS server, and LangGraph API server on the same host, with LangGraph in a container:
    • PYATS_API_SERVER=http://host.docker.internal:<PORT>
    • LANGGRAPH_API_HOST=http://host.docker.internal:<PORT>
  • grafana-to-langgraph-proxy, pyATS server, and LangGraph API server on different hosts, or not running in containers:
    • PYATS_API_SERVER=http://<HOST_IP>:<PORT>
    • LANGGRAPH_API_HOST=http://<HOST_IP>:<PORT>

See the .env.example file for the rest of the environment variables used. These are set by the Makefile.
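Before running the demo, you can sanity-check that both endpoints answer from wherever each client runs. A quick sketch using the default ports; any HTTP status code proves reachability (the /docs path is the one shown later in the example output):

# Check the pyATS server (default port 57000) and the LangGraph API (default port 56000).
# Run these from the host or container that needs to reach each service.
curl -s -o /dev/null -w "pyATS server:  HTTP %{http_code}\n" http://host.docker.internal:57000
curl -s -o /dev/null -w "LangGraph API: HTTP %{http_code}\n" http://host.docker.internal:56000/docs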

Option 1: LangGraph Server CLI 💻

Install the dependencies listed in the requirements file, using a virtual environment if possible.
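For example, a typical setup (assuming the dependencies live in requirements.txt):

# Create and activate a virtual environment, then install the dependencies.
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt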

Start the server with:

make run-environment
Example output
❯ make run-environment
langgraph dev --port 56000
WARNING:langgraph_api.cli:python_dotenv is not installed. Environment variables will not be available.
INFO:langgraph_api.cli:

        Welcome to

╦  ┌─┐┌┐┌┌─┐╔═╗┬─┐┌─┐┌─┐┬ ┬
║  ├─┤││││ ┬║ ╦├┬┘├─┤├─┘├─┤
╩═╝┴ ┴┘└┘└─┘╚═╝┴└─┴ ┴┴  ┴ ┴

- 🚀 API: http://127.0.0.1:56000
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:56000
- 📚 API Docs: http://127.0.0.1:56000/docs

This in-memory server is designed for development and testing.
For production use, please use LangGraph Cloud.

Open the LangGraph Studio URL using Chrome (Firefox doesn't work).

If you have issues with the web version:

  • Make sure you are logged in to LangSmith.
  • Refresh your browser.

If you don't want to use the web version, you can still follow the operations in the terminal, but the volume of output makes them hard to follow and interact with.

Option 2: LangGraph Studio Desktop 🍏

Download the desktop version (macOS only).

Before opening the project, set the target port in the bottom bar. This project uses port 56000; if you set a different one, update the LANGGRAPH_API_PORT environment variable to match.

Select port
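For example, if you pick port 58000 in the bottom bar, mirror it in your .env file (58000 is just an example value):

LANGGRAPH_API_PORT=58000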

In LangGraph Studio, select this project and open it. This imports the code from this repo and installs everything in a dedicated container.

Select project

Note

Sometimes the build process fails; restart LangGraph Studio or retry the build.

Run

CML topology

There are three devices involved in this demo, running ISIS between them. You can inspect the topology here.

The use case built into this demo is an ISIS neighbor being lost. Grafana detects the lost neighbor and sends an automatic alert to the graph. You can replicate the scenario by shutting down an ISIS interface, such as GigabitEthernet5 on cat8000v-0, and watching what happens.
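For example, on the device console (standard IOS-XE syntax):

! On cat8000v-0: shut down the ISIS-enabled interface to break the adjacency.
configure terminal
 interface GigabitEthernet5
  shutdown
 end

Use no shutdown on the same interface to restore the adjacency once you're done.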

ISIS neighbor down

The alert triggers a background job in LangGraph Studio. You won't be able to see the graph running in the GUI until it finishes (a tool limitation at this point). Inspect the logs if you want to see what is happening.

Trigger via API
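You can also start a background run by hand instead of waiting for Grafana. Here is a hedged sketch against the server from Option 1; the assistant ID and input fields are assumptions, so check the API docs at http://127.0.0.1:56000/docs for the exact schema (requires jq):

# Create a thread, then start a background run on it.
# "agent" and the user_request field are illustrative assumptions; verify them against /docs.
THREAD_ID=$(curl -s -X POST http://127.0.0.1:56000/threads \
  -H "Content-Type: application/json" -d '{}' | jq -r .thread_id)
curl -s -X POST "http://127.0.0.1:56000/threads/$THREAD_ID/runs" \
  -H "Content-Type: application/json" \
  -d '{"assistant_id": "agent", "input": {"user_request": "Investigate the ISIS alert on cat8000v-0"}}'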

Once the graph has finished, you can see the results and interact with the agents. Threads won't auto-refresh to show the output; switch to another thread and back to see the results. Use the User Request field to interact with the graph about the alert received.

User interaction

Note

If you’re curious about the other inputs, they’re used by the agents for different tasks. This is the state shared across the agents.

You can also use the graph to interact with the network devices without an alert. If so, use the same User Request field and provide the device hostname: cat8000v-0, cat8000v-1, or cat8000v-2 (a future improvement).

Traces 🔍

Here you can see the traces from one execution of the demo, including state, runs, inputs, and outputs.

  • Graph triggered by an automatic alert: Trace
  • Graph triggered by a user request following up on the alert: Trace

FAQ

A common error with LangGraph Studio occurs when you restart the server while orphan containers from another LangGraph instance are still running, causing the server to fail.

If you hit this problem, list the orphan containers with:

docker ps --filter "name=oncall-netops"

Remove them with:

docker ps --filter "name=oncall-netops" --format "{{.ID}}" | xargs docker rm -f

Restart LangGraph Studio.

Additional Resources
