Commit a26267b (0 parents)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 26 changed files with 1,760 additions and 0 deletions.


@@ -0,0 +1,84 @@

```yaml
name: CI

on:
  push:
    branches: [main]
  pull_request:
  release:
    types: [published]

jobs:
  lint:
    name: lint
    runs-on: ubuntu-latest
    steps:
      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.22.3'
      - name: Check out code
        uses: actions/checkout@v3
      - name: Check formatting
        run: |
          test -z "$(gofmt -l .)"
  build-and-test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22.3'
      - name: Install dependencies
        run: go get ./src
      - name: Build
        run: go build -o ./dist/bin ./src
      - name: Test with the Go CLI
        run: go test ./src

  build-and-publish-image:
    runs-on: ubuntu-latest
    needs:
      - lint
      - build-and-test

    steps:
      - uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Tag the image
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: |
            bitovi/temporal-cloud-metrics-to-k8s
          tags: |
            type=raw,value=latest,enable=${{ github.ref_name == 'main' }}
            type=semver,pattern={{version}},enable=${{ github.event_name == 'release' }}
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        if: github.event_name != 'pull_request'
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}
      - name: Build Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
      - name: Push Docker image
        uses: docker/build-push-action@v4
        if: ${{ (github.ref_name == 'main') || (github.event_name == 'release') }}
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          push: true
```


@@ -0,0 +1,3 @@

```
certs
config.yaml
.DS_Store
```


@@ -0,0 +1,13 @@

```dockerfile
FROM golang:1.22.3

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

COPY src/*.go ./

RUN CGO_ENABLED=0 GOOS=linux go build -o ./temporal-cloud-metrics-adapter

CMD ["./temporal-cloud-metrics-adapter"]
```


@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2024 Bitovi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -0,0 +1,209 @@

# Temporal Cloud Metrics to Kubernetes

Bring Temporal Cloud Metrics into your Kubernetes cluster to inform autoscaling of your workers.

## Setup

### Prerequisites

1. A [Temporal Cloud account](https://temporal.io/) with:
   - [An mTLS certificate provisioned](https://docs.temporal.io/cloud/certificates)
   - [The metrics endpoint enabled](https://docs.temporal.io/production-deployment/cloud/metrics/general-setup)
2. A [Kubernetes](https://kubernetes.io/)-compliant cluster (also tested on [K3s](https://k3s.io/) and [minikube](https://minikube.sigs.k8s.io/))
3. The [Helm](https://helm.sh/docs/intro/install/) CLI

### Step 1: Copy mTLS Certificate

We need the client mTLS certificate for our Temporal Cloud namespace so that we can load it into our cluster for use by the metrics adapter and the worker.

1. Copy the certificate into `./certs/client.crt`
2. Copy the key into `./certs/client.key`

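The adapter's Go source isn't part of this excerpt, so the following is only a sketch of how a Go client could use the two files from Step 1: the standard library's `tls.LoadX509KeyPair` parses the PEM-encoded certificate and key, and the resulting `tls.Config` presents them during the TLS handshake. `newMTLSConfig` is a hypothetical name, and the TLS 1.2 minimum version is an assumption.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newMTLSConfig (hypothetical helper) loads a client certificate/key pair and
// returns a TLS config that presents that certificate to the server.
func newMTLSConfig(certFile, keyFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12, // assumed minimum; adjust as needed
	}, nil
}

func main() {
	cfg, err := newMTLSConfig("certs/client.crt", "certs/client.key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded", len(cfg.Certificates), "client certificate(s)")
}
```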
### Step 2: Configuration

A YAML config file defines the connection parameters and the specific metrics you'd like to pull into Kubernetes from Temporal Cloud.

There is an example configuration in [`./sample-config.yaml`](./sample-config.yaml). Copy it to `config.yaml` and make your changes there. The Helm chart uses this path by default.

__Considerations__

Autoscaling in Kubernetes is triggered when a target metric, such as CPU usage, memory usage, or request count, rises above a designated threshold. It is therefore important that the metrics we calculate are positive numbers that increase when the system is under some kind of stress.

The queries in the included example configuration were derived from queries associated with Temporal best practices, but they have been modified to align with these requirements. Let's see an example.

__Before__

```
sum by(temporal_namespace) (
  rate(
    temporal_cloud_v0_poll_success_sync_count{}[1m]
  )
)
-
sum by(temporal_namespace) (
  rate(
    temporal_cloud_v0_poll_success_count{}[1m]
  )
)
```

__After__

We've made three important changes here: (1) we've swapped the places of the two underlying metrics to invert the result, so it is now positive and increases as the Sync Match Rate falls; (2) we use `clamp_min` to set a lower bound of zero; and (3) we default the result to zero with `or vector(0)` in the event no data points are available within the specified time window.

```
sum(
  clamp_min(
    (
      sum by(temporal_namespace) (
        rate(
          temporal_cloud_v0_poll_success_count{}[1m]
        )
      )
      -
      sum by(temporal_namespace) (
        rate(
          temporal_cloud_v0_poll_success_sync_count{}[1m]
        )
      )
    ),
    0
  )
) or vector(0)
```

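For readers less familiar with PromQL, here is a hedged Go model of what the __After__ query computes at a single evaluation step, given per-namespace rates that have already been calculated. `syncMatchGap` is a hypothetical helper. It reproduces the vector matching of `-` (only namespaces present on both sides survive), the `clamp_min` floor, the outer `sum` (which adds the values across namespaces), and the `or vector(0)` fallback.

```go
package main

import "fmt"

// syncMatchGap models the "After" query for one evaluation step. success and
// sync map temporal_namespace to the per-second rates of poll_success_count
// and poll_success_sync_count.
func syncMatchGap(success, sync map[string]float64) float64 {
	total := 0.0
	for ns, s := range success {
		// Vector "-" keeps only label sets present on both sides.
		syncRate, ok := sync[ns]
		if !ok {
			continue
		}
		// clamp_min(..., 0): a negative difference becomes 0.
		total += max(s-syncRate, 0)
	}
	// The outer sum adds across namespaces; when nothing matched, the query's
	// "or vector(0)" yields 0, which is also total's zero value here.
	return total
}

func main() {
	success := map[string]float64{"xyz.123": 12.5, "abc.456": 4}
	sync := map[string]float64{"xyz.123": 10, "abc.456": 5}
	fmt.Println(syncMatchGap(success, sync)) // prints 2.5
}
```

Note how a namespace whose sync rate exceeds its poll rate contributes nothing instead of a negative value.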
### Step 3: HPA

The HPA (Horizontal Pod Autoscaler) defines the desired scaling behavior and bounds, and manages our deployment's replicas accordingly.

There is a complete example HPA in [`./chart/templates/hpa.yaml`](./chart/templates/hpa.yaml). You may use it as is or adjust it to fit your needs before installing the Helm chart.

### Step 4: Install

__Install with Existing worker__

This allows you to set up autoscaling on an existing deployment.

```bash
helm install temporal-cloud-metrics-adapter ./chart --wait \
  --namespace staging \
  --set-file=temporal.tls.cert=certs/client.crt \
  --set-file=temporal.tls.key=certs/client.key \
  --set-file=adapter.config=config.yaml \
  --set temporal.namespace=xyz.123 \
  --set worker.deployment=temporal-workers
```

__Install with Demo worker__

This is intended for testing and demos and should never be used in a production environment.

```bash
helm install temporal-cloud-metrics-adapter ./chart --wait \
  --namespace staging --create-namespace \
  --set-file=temporal.tls.cert=certs/client.crt \
  --set-file=temporal.tls.key=certs/client.key \
  --set-file=adapter.config=config.yaml \
  --set temporal.namespace=xyz.123 \
  --set temporal.address=xyz.123.tmprl.cloud:7233 \
  --set worker.demo=true
```

__Uninstall__

```bash
helm uninstall -n staging temporal-cloud-metrics-adapter
```

__Helm Values__

| Option             | Type    | Example Value              | Description                                       |
|--------------------|---------|----------------------------|---------------------------------------------------|
| temporal.tls.cert  | File    | `certs/client.crt`         | Path to the client certificate file               |
| temporal.tls.key   | File    | `certs/client.key`         | Path to the client key file                       |
| temporal.namespace | String  | `xyz.123`                  | The target Temporal Cloud namespace               |
| temporal.address   | String  | `xyz.123.tmprl.cloud:7233` | Address of the Temporal Cloud instance            |
| adapter.config     | File    | `./config.yaml`            | Path to the adapter configuration file            |
| worker.deployment  | String  | `temporal-worker`          | Name of an existing Temporal worker deployment    |
| worker.demo        | Boolean | `true` or `false`          | Whether to deploy a demo worker                   |

### Demo

This repo includes a script to create a burst of workflows to simulate load.

```bash
# Start 50 demo workflows
TEMPORAL_ADDRESS=xyz.123.tmprl.cloud:7233 \
TEMPORAL_NAMESPACE=xyz.123 \
./scripts/execute-demo-workflows 50
```

## Metric Granularity

Temporal Cloud metrics do not include labels that indicate which Workflow they are associated with. Depending on your architecture, you might need to divide your workers across unique namespaces to obtain metrics for specific Workflows.

## Tuning Scaling Behavior

__HPA Polling Interval__

By default, the `HorizontalPodAutoscaler` fetches metrics every 15 seconds. This can be configured by setting the `--horizontal-pod-autoscaler-sync-period` flag on the [kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/).

_Note: The `--horizontal-pod-autoscaler-sync-period` flag is not currently supported in K3s._

__Adjust Metrics Time Window__

You can also adjust the time window used in the queries for the Temporal Cloud metrics. To do this, change the range specified in the queries in the [adapter configuration file](./chart/templates/configuration.yaml).

Currently, the time window is set to `1m` (1 minute). Reducing it can slightly improve the responsiveness of scaling. Be cautious about going below `45s` (45 seconds) for systems with relatively low throughput, as this can result in dead zones in the resulting metrics.

__Adjust HPA Behavior__

You can adjust how quickly the cluster scales your workers up and down.

```yaml
metrics:
  - type: External
    external:
      metric:
        # The name of the metric to watch
        name: temporal_cloud_sync_match_rate
        selector:
          matchLabels:
            # Match a particular Temporal Cloud namespace
            temporal_namespace: xyz.123
      target:
        type: Value
        # Scale up when the metric exceeds 50 milli-units (0.05)
        value: 50m
behavior:
  scaleUp:
    # Use the lowest replica recommendation from the last 10 seconds when scaling up
    stabilizationWindowSeconds: 10
    selectPolicy: Max
    policies:
      # Add up to 5 pods every 10 seconds while the metric exceeds the target
      - type: Pods
        value: 5
        periodSeconds: 10
  scaleDown:
    # Use the highest replica recommendation from the last 60 seconds when scaling down
    stabilizationWindowSeconds: 60
    selectPolicy: Max
    policies:
      # Remove up to 3 pods every 30 seconds while the metric is below the target
      - type: Pods
        value: 3
        periodSeconds: 30
```

You can find a complete example in this [manifest](./chart/templates/hpa.yaml). For more detailed information on the HorizontalPodAutoscaler, refer to the official [HPA documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).

## Scaling to Zero

In some use cases, you might want your application to scale all the way down to zero. This can be achieved by configuring the [`HorizontalPodAutoscaler`](./chart/templates/hpa.yaml).

To scale to zero, set `minReplicas` to `0`. The HPA then scales the deployment down as the targeted metric falls, reaching zero once the metric itself reads zero. Kubernetes only accepts `minReplicas: 0` when the `HPAScaleToZero` feature gate is enabled and at least one `Object` or `External` metric is configured.

_Note: Scaling to zero may cause a delay in processing new tasks, as it can take time for metrics to propagate to the cluster._


@@ -0,0 +1,23 @@

```
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
```


@@ -0,0 +1,9 @@

```yaml
apiVersion: v2
name: temporal-cloud-metrics-to-k8s
description: A Helm chart to enable access to metrics from Temporal Cloud within your cluster.

type: application

version: 0.1.0

appVersion: "0.1.0"
```


@@ -0,0 +1,13 @@

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: temporal-cloud-metrics-adapter
    namespace: {{ .Release.Namespace }}
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```

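Once this `APIService` is registered, the Kubernetes API server forwards requests for the `external.metrics.k8s.io/v1beta1` group to the adapter's service. A sketch of the path a client such as the HPA requests for one metric follows; `externalMetricPath` is a hypothetical helper, and the metric name and label are the example values used in the README above.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// externalMetricPath builds the aggregated-API path for one external metric in
// a namespace, with an equality-based label selector as a query parameter.
func externalMetricPath(namespace, metric string, labels map[string]string) string {
	path := fmt.Sprintf("/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s",
		url.PathEscape(namespace), url.PathEscape(metric))
	if len(labels) == 0 {
		return path
	}
	// Sort keys so the selector (e.g. "temporal_namespace=xyz.123") is deterministic.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = k + "=" + labels[k]
	}
	q := url.Values{}
	q.Set("labelSelector", strings.Join(pairs, ","))
	return path + "?" + q.Encode()
}

func main() {
	fmt.Println(externalMetricPath("staging", "temporal_cloud_sync_match_rate",
		map[string]string{"temporal_namespace": "xyz.123"}))
	// prints /apis/external.metrics.k8s.io/v1beta1/namespaces/staging/temporal_cloud_sync_match_rate?labelSelector=temporal_namespace%3Dxyz.123
}
```

Passing such a path to `kubectl get --raw` is a quick way to check what the adapter currently returns.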

@@ -0,0 +1,8 @@

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-configuration
  namespace: {{ .Release.Namespace }}
data:
  config.yaml: |
    {{ .Values.adapter.config | nindent 4 }}
```

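The template relies on Helm's `nindent` function (from the Sprig library) to place the file passed via `--set-file=adapter.config=...` under the `config.yaml: |` block. The sketch below reimplements `indent` and `nindent` in plain Go, assuming Sprig's behavior of prefixing every line with the given number of spaces (and, for `nindent`, emitting a leading newline), and renders the `data` section of the template with `text/template`. `render`, `configMapTmpl`, and the placeholder config content are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// indent prefixes every line of s with n spaces.
func indent(n int, s string) string {
	pad := strings.Repeat(" ", n)
	return pad + strings.ReplaceAll(s, "\n", "\n"+pad)
}

// nindent is indent preceded by a newline.
func nindent(n int, s string) string { return "\n" + indent(n, s) }

// configMapTmpl mirrors the data section of the ConfigMap template.
const configMapTmpl = "data:\n  config.yaml: |\n    {{ .Config | nindent 4 }}\n"

// render executes the template with the given config file contents.
func render(config string) (string, error) {
	t, err := template.New("cm").Funcs(template.FuncMap{"nindent": nindent}).Parse(configMapTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, map[string]string{"Config": config}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render("# placeholder config\nkey: value")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```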