Port usage guides
mavam committed Mar 4, 2025
1 parent 67593f5 commit cc1165e
Showing 23 changed files with 1,512 additions and 6 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -46,7 +46,7 @@ can find a preview of the new documentation at
- [x] Tutorials
- [ ] Guides
- [x] Installation
- [ ] Usage
- [x] Usage
- [ ] Contribution
- [ ] Development
- [ ] Reference
45 changes: 45 additions & 0 deletions src/content/docs/guides/usage/basics/collect-metrics.mdx
@@ -0,0 +1,45 @@
---
title: Collect metrics
---

Tenzir keeps track of metrics about node resource usage, pipeline state, and
runtime performance.

Metrics are stored as internal events in the node's storage engine, allowing you
to work with metrics just like regular data. Use the
[`metrics`](../tql2/operators/metrics.md) input operator to access the metrics.
The operator documentation lists [all available
metrics](../tql2/operators/metrics#schemas) in detail.

The `metrics` operator provides a *copy* of existing metrics. You can use it
multiple times to reference the same metrics feed.

## Write metrics to a file

Export metrics continuously to a file via `metrics live=true`:

```tql
metrics live=true
write_ndjson
save_file "metrics.json", append=true
```

This attaches to the incoming metrics feed, renders the events as NDJSON, and
then writes the output to a file. Without the `live` option, the `metrics`
operator returns a snapshot of all historical metrics.
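
To make the output format concrete, here is a minimal Python sketch of what
appending NDJSON means: one JSON object per line, added to the end of the file.
The `append_ndjson` and `read_ndjson` helpers and the sample events are
hypothetical. They only illustrate the effect of `write_ndjson` followed by
`save_file` with `append=true` and are not how Tenzir implements these
operators.

```python
import json
import os
import tempfile


def append_ndjson(path, events):
    """Append each event as one JSON object per line (NDJSON)."""
    with open(path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def read_ndjson(path):
    """Read an NDJSON file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample events; real metrics have different fields.
path = os.path.join(tempfile.mkdtemp(), "metrics.json")
append_ndjson(path, [{"metric": "cpu", "value": 0.5}])
append_ndjson(path, [{"metric": "cpu", "value": 0.7}])
records = read_ndjson(path)
```

Because the file is opened in append mode, the second call adds to the existing
file instead of overwriting it.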

## Summarize metrics

You can [shape](../usage/shape-data/README.md) metrics like ordinary data,
e.g., write aggregations over metrics to compute runtime statistics suitable for
reporting or dashboarding:

```tql
metrics "operator"
where sink == true
summarize runtime=sum(duration), pipeline_id
sort -runtime
```

The above example computes the total runtime over all pipelines grouped by their
unique ID.
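
For readers less familiar with `summarize`, the following Python sketch mirrors
the logic of the pipeline above: filter sink operators, sum `duration` per
`pipeline_id`, and sort in descending order. The sample events and the use of
plain floats for durations are assumptions made for illustration only.

```python
from collections import defaultdict


def total_runtime_by_pipeline(events):
    """Sum `duration` per `pipeline_id` for sink operators, largest first."""
    totals = defaultdict(float)
    for event in events:
        if event["sink"]:
            totals[event["pipeline_id"]] += event["duration"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical operator metrics; durations are plain seconds here.
events = [
    {"pipeline_id": "a", "sink": True, "duration": 2.0},
    {"pipeline_id": "b", "sink": True, "duration": 5.0},
    {"pipeline_id": "a", "sink": True, "duration": 1.5},
    {"pipeline_id": "a", "sink": False, "duration": 9.0},
]
result = total_runtime_by_pipeline(events)
```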
Original file line number Diff line number Diff line change
@@ -2,8 +2,8 @@
title: Install a package
---

A [package](../../explanations/packages) bundles pipelines and contexts, making
it easy to deploy them as a single unit.
A [package](../../../explanations/packages) bundles pipelines and contexts,
making it easy to deploy them as a single unit.

## Install from the Tenzir Library

@@ -26,6 +26,14 @@ import { Steps } from '@astrojs/starlight/components';
To install a package interactively, use the
[`package::add`](../tql2/operators/package/add.md) operator:

```tql
package::add "demo-node"
```

This installs the package named `demo-node` from the [Community
Library on GitHub](https://github.com/tenzir/library). To install a local
package, just provide the filename instead:

```tql
package::add "package.yaml"
```
@@ -81,7 +89,7 @@ convention, the directory name is the package ID.
The node search path for packages consists of the following locations:

1. The `packages` directory in all [configuration
directories](../configuration.md#configuration-files).
directories](../../../explanations/configuration#configuration-files).
2. All directories specified in the `tenzir.package-dirs` configuration option.

As an alternative way to specify inputs visually in the app, or setting them
@@ -93,4 +101,4 @@ next to the `package.yaml` file. Here is an example that sets the inputs
inputs:
  endpoint: localhost:42000
  policy: block
```
```
44 changes: 44 additions & 0 deletions src/content/docs/guides/usage/basics/manage-a-pipeline.mdx
@@ -0,0 +1,44 @@
---
title: Manage a pipeline
---

A pipeline can be in one of the following **states** after you [run
it](../run-pipelines):

- **Created**: the pipeline has just been deployed.
- **Running**: the pipeline is actively processing data.
- **Completed**: there is no more data to process.
- **Failed**: an error occurred.
- **Paused**: the user interrupted execution, keeping in-memory state.
- **Stopped**: the user interrupted execution, resetting all in-memory state.

The [app](https://app.tenzir.com/) and the [API](/api) allow you to manage
pipeline lifecycles.

## Change the state of a pipeline

In the [app](https://app.tenzir.com/overview), an icon visualizes the current
pipeline state. Change a state as follows:

1. Click the checkbox on the left next to the pipeline, or the checkbox in the
column header to select all pipelines.
2. Click the button corresponding to the desired action, i.e., *Start*, *Pause*,
*Stop*, or *Delete*.
3. Confirm your selection.

For the [API](/api), use the following endpoints based on the desired actions:

- Start, pause, and stop:
[`/pipeline/update`](/api#/paths/~1pipeline~1update/post)
- Delete: [`/pipeline/delete`](/api#/paths/~1pipeline~1delete/post)

## Understand pipeline state transitions

The diagram below illustrates the various states, where circles correspond to
states and arrows to state transitions:

![Pipeline States](manage-a-pipeline/pipeline-states.svg)

The grey buttons indicate the actions you, as a user, can take to transition
into a different state. The orange arrows are transitions that take place
automatically based on system events.
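
As a rough mental model, the states and triggers can be written down as a small
lookup table. The Python sketch below only encodes what the state list above
says: user actions lead to *Running*, *Paused*, or *Stopped*, while system
events lead to *Completed* or *Failed*. The trigger names are hypothetical
labels, and the sketch deliberately does not model which source states permit
each action, since that is defined by the diagram. The *Delete* action removes
the pipeline and is not modeled either.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PAUSED = "paused"
    STOPPED = "stopped"


# User actions and the state they lead to.
USER_ACTIONS = {
    "start": State.RUNNING,
    "pause": State.PAUSED,
    "stop": State.STOPPED,
}

# Automatic transitions triggered by system events (hypothetical names).
SYSTEM_EVENTS = {
    "input_exhausted": State.COMPLETED,
    "error": State.FAILED,
}


def next_state(trigger):
    """Return the state that a user action or system event leads to."""
    if trigger in USER_ACTIONS:
        return USER_ACTIONS[trigger]
    if trigger in SYSTEM_EVENTS:
        return SYSTEM_EVENTS[trigger]
    raise ValueError(f"unknown trigger: {trigger}")


def resets_memory_state(state):
    """Stopping resets in-memory state, whereas pausing keeps it."""
    return state is State.STOPPED
```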
132 changes: 132 additions & 0 deletions src/content/docs/guides/usage/basics/run-pipelines.mdx
@@ -0,0 +1,132 @@
---
title: Run pipelines
---

You can run a [pipeline](../../../explanations/architecture/pipeline) in the
app, on the command line using the `tenzir` binary, or configure it to run as
code.

## In the app

Run a pipeline by typing it in the editor and hitting the *Run* button.

The following invariants apply:

1. You must start with an input operator
2. The browser is always the output operator

The diagram below illustrates these mechanics:

![Pipeline in the Browser](run-pipelines/pipeline-browser.svg)

For example, write [`version`](../../tql2/operators/version.md) and click *Run*
to see a single event arrive.

## On the command line

On the command line, run `tenzir <pipeline>` where `<pipeline>` is the
definition of the pipeline.

If the pipeline expects events as its input, an implicit `load_stdin |
read_json` will be prepended. If it expects bytes instead, only `load_stdin` is
prepended. Likewise, if the pipeline outputs events, an implicit `write_json |
save_stdout` will be appended. If it outputs bytes instead, only `save_stdout`
is appended.
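
The rules above can be summarized as a small lookup. The following Python
sketch is only an illustration of the described behavior, not the `tenzir`
binary's actual logic; the function name and the `"void"` label for "expects no
input" or "produces no output" are assumptions.

```python
def implicit_operators(input_type, output_type):
    """Return the (prefix, suffix) operators added around a pipeline.

    Each type is "events", "bytes", or "void" (nothing expected or produced).
    """
    prefixes = {
        "events": ["load_stdin", "read_json"],
        "bytes": ["load_stdin"],
        "void": [],
    }
    suffixes = {
        "events": ["write_json", "save_stdout"],
        "bytes": ["save_stdout"],
        "void": [],
    }
    return prefixes[input_type], suffixes[output_type]


# A pipeline such as `version` needs no input and outputs events.
prefix, suffix = implicit_operators("void", "events")
```

In this model, a pipeline consisting only of `version` gets nothing prepended
and `write_json | save_stdout` appended.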

The diagram below illustrates these mechanics:

![Pipeline on the command line](run-pipelines/pipeline-cli.svg)

For example, run [`tenzir 'version | drop
dependencies'`](../../tql2/operators/version.md) to see a single event in the
terminal:

```tql
{
  version: "4.22.1+g324214e6de",
  tag: "g324214e6de",
  major: 4,
  minor: 22,
  patch: 1,
  features: [],
  build: {
    type: "Release",
    tree_hash: "c4c37acb5f9dc1ce3806f40bbde17a08",
    assertions: false,
    sanitizers: {
      address: false,
      undefined_behavior: false,
    },
  },
}
```

You could also render the output differently by choosing a different format:

```sh
tenzir 'version | drop dependencies | write_csv'
tenzir 'version | drop dependencies | write_ssv'
tenzir 'version | drop dependencies | write_parquet | save_file "version.parquet"'
```

Instead of passing the pipeline description to the `tenzir` executable, you can
also load the definition from a file via `-f`:

```sh
tenzir -f pipeline.tql
```

This interprets the file contents as a pipeline and runs it.

## As Code

In addition to running pipelines interactively, you can also deploy *pipelines as
code (PaC)*. This infrastructure-as-code-like method differs from the app-based
deployment in two ways:

1. Pipelines deployed as code always start with the Tenzir node, ensuring
continuous operation.
2. To safeguard them, deletion via the user interface is disallowed.

Here's an example of deploying a pipeline through your configuration:

```yaml title="<prefix>/etc/tenzir/tenzir.yaml"
tenzir:
  pipelines:
    # A unique identifier for the pipeline that's used for metrics, diagnostics,
    # and API calls interacting with the pipeline.
    suricata-over-tcp:
      # An optional user-facing name for the pipeline. Defaults to the id.
      name: Onboard Suricata from TCP
      # An optional user-facing description of the pipeline.
      description: |
        Onboards Suricata EVE JSON from TCP port 34343.
      # The definition of the pipeline. Configured pipelines that fail to start
      # cause the node to fail to start.
      definition: |
        load_tcp "0.0.0.0:34343"
        read_suricata
        publish "suricata"
      # Pipelines that encounter an error stop running and show an error state.
      # This option causes pipelines to automatically restart when they
      # encounter an error instead. The first restart happens immediately, and
      # subsequent restarts after the configured delay, defaulting to 1 minute.
      # The following values are valid for this option:
      # - Omit the option, or set it to null or false to disable.
      # - Set the option to true to enable with the default delay of 1 minute.
      # - Set the option to a valid duration to enable with a custom delay.
      restart-on-error: 1 minute
      # Add a list of labels that are shown in the pipeline overview page at
      # app.tenzir.com.
      labels:
        - Suricata
        - Onboarding
      # Disable the pipeline.
      disabled: false
      # Pipelines that are unstoppable will run automatically and indefinitely.
      # They are not able to pause or stop.
      # If they do complete, they will end up in a failed state.
      # If `restart-on-error` is enabled, they will restart after the specified
      # duration.
      unstoppable: true
```
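
The comments on `restart-on-error` describe three kinds of values and a special
case for the first restart. The Python sketch below restates those rules in
code. It is an illustration under assumptions, not the node's implementation:
the function names are hypothetical, and durations are passed as `timedelta`
objects instead of parsing strings such as `1 minute`.

```python
from datetime import timedelta

DEFAULT_DELAY = timedelta(minutes=1)


def restart_delay(option):
    """Map a `restart-on-error` value to a delay, or None when disabled."""
    if option is None or option is False:
        return None  # Omitted, null, or false: no automatic restarts.
    if option is True:
        return DEFAULT_DELAY  # Enabled with the default delay.
    if isinstance(option, timedelta):
        return option  # Enabled with a custom delay.
    raise ValueError(f"invalid restart-on-error value: {option!r}")


def wait_before_restart(option, restart_count):
    """Return the wait before a restart; `restart_count` is 0 for the first."""
    delay = restart_delay(option)
    if delay is None:
        return None
    # The first restart happens immediately, later ones after the delay.
    return timedelta(0) if restart_count == 0 else delay
```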
77 changes: 77 additions & 0 deletions src/content/docs/guides/usage/data/deduplicate-events.mdx
@@ -0,0 +1,77 @@
---
title: Deduplicate events
---

The [`deduplicate`](../tql2/operators/deduplicate.md) operator provides a
powerful mechanism to remove duplicate events in a pipeline.

There are numerous use cases for deduplication, such as reducing noise,
optimizing costs, and making threat detection and response more efficient. Read
our [blog post](/archive/reduce-cost-and-noise-with-deduplication) for a
high-level discussion.

## Analyze unique host pairs

Let's say you're investigating an incident and would like to get a better
picture of which entities are involved in the communication. To this end, you
would like to extract all unique host pairs to identify who communicated with
whom.

Here's what this looks like with Zeek data:

```tql
export
where @schema == "zeek.conn"
deduplicate {orig_h: id.orig_h, resp_h: id.resp_h}
```

Providing `id.orig_h` and `id.resp_h` to the operator restricts the output to
all unique host pairs. Note that flipped connections occur twice here, i.e., A →
B as well as B → A are present.
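
The following Python sketch illustrates why flipped connections show up twice:
the deduplication key is the ordered pair of hosts, so (A, B) and (B, A) are
different keys. The `deduplicate` function here is a hypothetical stand-in for
the operator's basic behavior (keep the first event per key), and the sample
addresses are made up.

```python
def deduplicate(events, key):
    """Yield the first event for each distinct key value."""
    seen = set()
    for event in events:
        k = key(event)
        if k not in seen:
            seen.add(k)
            yield event


# Hypothetical connections: A -> B twice, then B -> A once.
conns = [
    {"orig_h": "10.0.0.1", "resp_h": "10.0.0.2"},
    {"orig_h": "10.0.0.1", "resp_h": "10.0.0.2"},
    {"orig_h": "10.0.0.2", "resp_h": "10.0.0.1"},
]

# Ordered key: direction matters, so the flipped connection is kept.
directed = list(deduplicate(conns, lambda e: (e["orig_h"], e["resp_h"])))

# Unordered key: direction is ignored, so only one event remains.
undirected = list(
    deduplicate(conns, lambda e: frozenset((e["orig_h"], e["resp_h"])))
)
```

If direction does not matter for your analysis, a direction-independent key, as
in the second call, collapses both directions into one pair.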

## Remove duplicate alerts

Are you overloaded with alerts, like every analyst? Let's remove some noise
from our alerts.

First, let's check what our alert dataset looks like:

```tql
export
where @schema == "suricata.alert"
top alert.signature
head 5
```

```tql
{
  alert.signature: "ET MALWARE Cobalt Strike Beacon Observed",
  count: 117369,
}
{
  alert.signature: "SURICATA STREAM ESTABLISHED packet out of window",
  count: 103198,
}
{
  alert.signature: "SURICATA STREAM Packet with invalid ack",
  count: 21960,
}
{
  alert.signature: "SURICATA STREAM ESTABLISHED invalid ack",
  count: 21920,
}
{
  alert.signature: "ET JA3 Hash - [Abuse.ch] Possible Dridex",
  count: 16870,
}
```

Hundreds of thousands of alerts! Maybe I'm just interested in one alert per
hour per affected host pair? Here's the pipeline for this:

```tql
from "/tmp/eve.json", follow=true
where @schema == "suricata.alert"
deduplicate {src: src_ip, dst: dest_ip, sig: alert.signature}, timeout=1h
import
```
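
To illustrate the idea of time-bounded deduplication, here is a simplified
Python sketch. It assumes that an event passes through when no event with the
same key was emitted within the timeout window, measured from the last emitted
event. That timer rule is an assumption of this sketch; consult the operator
documentation for the exact timeout semantics. The function name and sample
data are hypothetical.

```python
from datetime import datetime, timedelta


def deduplicate_with_timeout(events, key, timeout):
    """Yield an event unless one with the same key was emitted within `timeout`.

    `events` is an iterable of (timestamp, record) pairs ordered by timestamp.
    """
    last_emitted = {}
    for ts, record in events:
        k = key(record)
        previous = last_emitted.get(k)
        if previous is None or ts - previous >= timeout:
            last_emitted[k] = ts
            yield ts, record


t0 = datetime(2025, 3, 4, 12, 0)
alert = {"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "sig": "X"}
other = {"src_ip": "10.0.0.3", "dest_ip": "10.0.0.2", "sig": "X"}
events = [
    (t0, alert),
    (t0 + timedelta(minutes=30), alert),  # Suppressed: within one hour.
    (t0 + timedelta(minutes=45), other),  # Different key: emitted.
    (t0 + timedelta(minutes=61), alert),  # Emitted: the hour has passed.
]
emitted = list(
    deduplicate_with_timeout(
        events,
        key=lambda r: (r["src_ip"], r["dest_ip"], r["sig"]),
        timeout=timedelta(hours=1),
    )
)
```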