Commit

Move kettle from k8s-gubernator to kubernetes-public

Signed-off-by: Davanum Srinivas <[email protected]>
dims committed Apr 16, 2024
1 parent dcdd96d commit 2fcc9ce
Showing 18 changed files with 83 additions and 125 deletions.
27 changes: 27 additions & 0 deletions config/jobs/kubernetes/sig-k8s-infra/trusted/sig-test-infra.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
periodics:
- name: metrics-kettle
cluster: k8s-infra-prow-build-trusted
interval: 1h
decorate: true
extra_refs:
- org: kubernetes
repo: test-infra
base_ref: master
spec:
serviceAccountName: k8s-triage
containers:
- image: gcr.io/k8s-staging-test-infra/bigquery:v20240205-69ac5748ba
args:
- ./kettle/monitor.py
- --stale=6
- --table
- k8s_infra_kettle:build.all
- k8s_infra_kettle:build.week
- k8s_infra_kettle:build.day
annotations:
testgrid-num-failures-to-alert: '6'
testgrid-alert-stale-results-hours: '12'
testgrid-dashboards: sig-testing-misc
testgrid-alert-email: [email protected], [email protected]
testgrid-broken-column-threshold: '0.5'
description: Monitors Kettle's BigQuery database freshness.
26 changes: 0 additions & 26 deletions config/jobs/kubernetes/test-infra/test-infra-periodics.yaml
@@ -35,32 +35,6 @@ periodics:
testgrid-broken-column-threshold: '0.5'
description: Runs `make test verify` on the test-infra repo every hour

- name: metrics-kettle
interval: 1h
decorate: true
extra_refs:
- org: kubernetes
repo: test-infra
base_ref: master
spec:
serviceAccountName: triage
containers:
- image: gcr.io/k8s-staging-test-infra/bigquery:v20240205-69ac5748ba
args:
- ./kettle/monitor.py
- --stale=6
- --table
- k8s-gubernator:build.all
- k8s-gubernator:build.week
- k8s-gubernator:build.day
annotations:
testgrid-num-failures-to-alert: '6'
testgrid-alert-stale-results-hours: '12'
testgrid-dashboards: sig-testing-misc
testgrid-alert-email: [email protected], [email protected]
testgrid-broken-column-threshold: '0.5'
description: Monitors Kettle's BigQuery database freshness.

- name: job-migration-todo-report
decorate: true
interval: 24h
2 changes: 1 addition & 1 deletion docs/architecture.dot
@@ -23,7 +23,7 @@ digraph G {
Gubernator [href="https://gubernator.k8s.io"]
"Testgrid (closed)" [href="https://testgrid.k8s.io"]
Deck [href="https://prow.k8s.io"]
BigQuery [href="https://bigquery.cloud.google.com/table/k8s-gubernator:build.week"]
BigQuery [href="https://bigquery.cloud.google.com/table/k8s_infra_kettle:build.week"]

subgraph cluster_Prow {
label="Prow"
2 changes: 1 addition & 1 deletion docs/architecture.svg
(Binary change: SVG diff not rendered.)
12 changes: 6 additions & 6 deletions kettle/Makefile
@@ -20,18 +20,18 @@ IMG = gcr.io/k8s-testimages/kettle
TAG := $(shell date +v%Y%m%d)-$(shell git describe --tags --always --dirty)

# These are the usual GKE variables.
PROJECT ?= k8s-gubernator
ZONE ?= us-west1-b
CLUSTER ?= g8r
PROJECT ?= kubernetes-public
ZONE ?= us-central1
CLUSTER ?= aaa

get-cluster-credentials:
kubectl config use-context gke_k8s-gubernator_us-west1-b_g8r || gcloud container clusters get-credentials "$(CLUSTER)" --project="$(PROJECT)" --zone="$(ZONE)"
kubectl config use-context gke_kubernetes-public_us-central1_aaa || gcloud container clusters get-credentials "$(CLUSTER)" --project="$(PROJECT)" --zone="$(ZONE)"

push-prod:
../../../hack/make-rules/go-run/arbitrary.sh run ./images/builder --project=k8s-testimages --scratch-bucket=gs://k8s-testimages-scratch --build-dir=. kettle/
../hack/make-rules/go-run/arbitrary.sh run ./images/builder --project=k8s-staging-infra-tools --scratch-bucket=gs://k8s-testimages-scratch --build-dir=. kettle/

push:
../../../hack/make-rules/go-run/arbitrary.sh run ./images/builder --project=k8s-testimages --allow-dirty --build-dir=. kettle/
../hack/make-rules/go-run/arbitrary.sh run ./images/builder --project=k8s-staging-infra-tools --allow-dirty --build-dir=. kettle/

deploy: get-cluster-credentials
sed "s/:latest/:$(TAG)/g" deployment.yaml | kubectl apply -f - --record
8 changes: 4 additions & 4 deletions kettle/OVERVIEW.md
@@ -29,9 +29,9 @@ Flags:
# Create JSON Results and Upload
This stage runs for each [BigQuery] table that Kettle is tasked with uploading data to, typically one of:
- Fixed Time: `pypy3 make_json.py --days <num> | pv | gzip > build_<table>.json.gz`
and `bq load --source_format=NEWLINE_DELIMITED_JSON --max_bad_records={MAX_BAD_RECORDS} k8s-gubernator:build.<table> build_<table>.json.gz schema.json`
and `bq load --source_format=NEWLINE_DELIMITED_JSON --max_bad_records={MAX_BAD_RECORDS} k8s_infra_kettle:build.<table> build_<table>.json.gz schema.json`
- All Results: `pypy3 make_json.py | pv | gzip > build_<table>.json.gz`
and `bq load --source_format=NEWLINE_DELIMITED_JSON --max_bad_records={MAX_BAD_RECORDS} k8s-gubernator:build.<table> build_<table>.json.gz schema.json`
and `bq load --source_format=NEWLINE_DELIMITED_JSON --max_bad_records={MAX_BAD_RECORDS} k8s_infra_kettle:build.<table> build_<table>.json.gz schema.json`

### Make Json
`make_json.py` prepares an incremental table to track builds it has emitted to BQ. This table is named `build_emitted_<days>` (if the days flag is passed) or `build_emitted` otherwise. *This is important: if you change the days but NOT the table being uploaded to, you will get duplicate results.* If the `--reset_emitted` flag is passed, it refreshes the incremental table for fresh data. It then walks all of the builds within `<days>` (or since epoch if unset) and dumps each as a JSON object to a build `tar.gz`.
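The incremental-table naming rule described here is simple but easy to trip over; its logic can be sketched like this (a hypothetical helper for illustration, not the actual `make_json.py` code):

```python
def emitted_table_name(days: int = 0) -> str:
    # Each --days horizon gets its own bookkeeping table, so runs with
    # different horizons never share state -- and never deduplicate against
    # each other. Changing --days without also changing the target BigQuery
    # table therefore produces duplicate rows.
    return f"build_emitted_{days}" if days else "build_emitted"
```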
@@ -49,6 +49,6 @@ After all historical data has been uploaded, Kettle enters a Streaming phase. It
- inserts it into the tables (from flag)
- adds the data to the respective incremental tables
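The streaming phase reacts to GCS object-change notifications, and only a finalized `finished.json` marks a completed build. A hedged sketch of that filter (the attribute names follow the GCS Pub/Sub notification format; the helper itself is illustrative, not Kettle's code):

```python
def completed_builds(notifications: list[dict]) -> list[str]:
    """Pick out builds whose finished.json was just finalized."""
    done = []
    for attrs in notifications:
        # OBJECT_FINALIZE fires when an object finishes uploading; a new
        # finished.json is the signal that a build has completed.
        if (attrs.get("eventType") == "OBJECT_FINALIZE"
                and attrs.get("objectId", "").endswith("finished.json")):
            done.append(attrs["objectId"])
    return done
```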

[BigQuery]: https://console.cloud.google.com/bigquery?utm_source=bqui&utm_medium=link&utm_campaign=classic&project=k8s-gubernator
[BigQuery]: https://console.cloud.google.com/bigquery?utm_source=bqui&utm_medium=link&utm_campaign=classic&project=k8s-infra-kettle
[Buckets]: https://github.com/kubernetes/test-infra/blob/master/kettle/buckets.yaml
[Schema]: https://github.com/kubernetes/test-infra/blob/master/kettle/schema.json
24 changes: 12 additions & 12 deletions kettle/README.md
@@ -4,12 +4,12 @@ This collects test results scattered across a variety of GCS buckets,
stores them in a local SQLite database, and outputs newline-delimited
JSON files for import into BigQuery. *See [overview](./OVERVIEW.md) for more details.*

Results are stored in the [k8s-gubernator:build BigQuery dataset][Big Query Tables],
Results are stored in the [k8s_infra_kettle:build BigQuery dataset][Big Query Tables],
which is publicly accessible.
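The handoff summarized above (SQLite rows out, newline-delimited JSON in) is one JSON object per row, one row per line. A minimal sketch of that export step (the table and column names are assumptions for illustration):

```python
import json
import sqlite3

def dump_ndjson(db_path: str, out_path: str, table: str = "build") -> int:
    """Write every row of `table` as one JSON object per line; return row count."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become mapping-like, keyed by column name
    count = 0
    with open(out_path, "w") as out:
        for row in conn.execute(f"SELECT * FROM {table}"):
            out.write(json.dumps(dict(row)) + "\n")
            count += 1
    conn.close()
    return count
```

The resulting file, gzipped, is what `bq load --source_format=NEWLINE_DELIMITED_JSON` consumes.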

# Deploying

Kettle runs as a pod in the `k8s-gubernator/g8r` cluster. To drop into it's context, run `<root>$ make -C kettle get-cluster-credentials`
Kettle runs as a pod in the `kubernetes-public/aaa` cluster. To drop into its context, run `<root>$ make -C kettle get-cluster-credentials`

If you change:

@@ -18,7 +18,7 @@ If you change:
- any code: **Run from root** deploy with `make -C kettle push update`, revert with `make -C kettle rollback` if it fails
- `push` builds the container image and pushes it to the image registry
- `update` sets the image of the existing kettle *Pod* which triggers a restart cycle
- this will build the image to [Google Container Registry](https://console.cloud.google.com/gcr/images/k8s-gubernator/GLOBAL/kettle?project=k8s-gubernator&organizationId=433637338589&gcrImageListsize=30)
- this will build the image to [Google Container Registry](https://console.cloud.google.com/gcr/images/kubernetes-public/GLOBAL/kettle)
- See [Makefile](Makefile) for details

#### Note:
@@ -63,23 +63,23 @@ You can watch the pod startup and collect data from various GCS buckets by looking at:
```sh
kubectl logs -f $(kubectl get pod -l app=kettle -oname)
```
or access [log history](https://console.cloud.google.com/logs/query?project=k8s-gubernator) with the Query: `resource.labels.container_name="kettle"`.
or access [log history](https://console.cloud.google.com/logs/query?project=kubernetes-public) with the Query: `resource.labels.container_name="kettle"`.

It might take a couple of hours to be fully functional and start updating BigQuery. You can always go back to the [Gubernator BigQuery page][Big Query All] and check to see if data collection has resumed. Backfill should happen automatically.

#### Kettle Staging

`Kettle Staging` uses a similar deployment to `Kettle`, with the following differences:
- [100G SSD](https://console.cloud.google.com/compute/disksDetail/zones/us-west1-b/disks/kettle-data-staging?folder=&organizationId=&project=k8s-gubernator) vs 1001G in production
- [100G SSD](https://console.cloud.google.com/compute/disksDetail/zones/us-central1/disks/kettle-data-staging?folder=&organizationId=&project=kubernetes-public) vs 1001G in production
- Limit option for number of builds to pull from each job bucket (Default 1000 each). Set via BUILD_LIMIT env in [deployment-staging.yaml](./deployment-staging.yaml).
- writes to [build.staging](https://console.cloud.google.com/bigquery?project=k8s-gubernator&page=table&t=all&d=build&p=k8s-gubernator&redirect_from_classic=true) table only. This differs from production that writes to three tables `build.all`, `build.day`, and `build.week`.
- writes to [build.staging](https://console.cloud.google.com/bigquery?project=kubernetes-public&page=table&t=all&d=build&p=kubernetes-public&redirect_from_classic=true) table only. This differs from production that writes to three tables `build.all`, `build.day`, and `build.week`.


It can be deployed with `make -C kettle deploy-staging`. If already deployed, you may just run `make -C kettle update-staging`.

#### Adding Fields

To add fields to the BQ table, visit the [k8s_infra_kettle:build BigQuery dataset][Big Query Tables], select the table (e.g. Build > All), then Schema -> Edit Schema -> Add field. Also update [schema.json](./schema.json) to match.
To add fields to the BQ table, Visit the [k8s_infra_kettle:build BigQuery dataset][Big Query Tables] and Select the table (Ex. Build > All). Schema -> Edit Schema -> Add field. As well as update [schema.json](./schema.json)

## Adding Buckets

@@ -118,21 +118,21 @@ gcloud pubsub subscriptions create <subscription name> --topic=gcs-changes --top
```

### Auth
For Kettle to have permission, its service account needs access. When updating or changing a [subscription][Subscriptions], make sure to add `kettle@kubernetes-public.iam.gserviceaccount.com` as a `PubSub Editor`.
For kettle to have permission, kettle's user needs access. When updating or changing a [Subscription] make sure to add `kettle@kubernetes-public.iam.gserviceaccount.com` as a `PubSub Editor`.
```
gcloud pubsub subscriptions add-iam-policy-binding \
projects/kubernetes-jenkins/subscriptions/kettle-staging \
--member=serviceAccount:kettle@k8s-gubernator.iam.gserviceaccount.com \
--member=serviceAccount:kettle@kubernetes-public.iam.gserviceaccount.com \
--role=roles/pubsub.editor
```
# Known Issues
- Occasionally data from Kettle stops updating; we suspect this is due to a transient hang when contacting GCS ([#8800](https://github.com/kubernetes/test-infra/issues/8800)). If this happens, [restart kettle](#restarting).
[Big Query Tables]: https://console.cloud.google.com/bigquery?utm_source=bqui&utm_medium=link&utm_campaign=classic&project=k8s-gubernator
[Big Query All]: https://console.cloud.google.com/bigquery?project=k8s-gubernator&page=table&t=all&d=build&p=k8s-gubernator
[Big Query Staging]: https://console.cloud.google.com/bigquery?project=k8s-gubernator&page=table&t=staging&d=build&p=k8s-gubernator
[Big Query Tables]: https://console.cloud.google.com/bigquery?utm_source=bqui&utm_medium=link&utm_campaign=classic&project=kubernetes-public
[Big Query All]: https://console.cloud.google.com/bigquery?project=kubernetes-public&page=table&t=all&d=build&p=kubernetes-public
[Big Query Staging]: https://console.cloud.google.com/bigquery?project=kubernetes-public&page=table&t=staging&d=build&p=kubernetes-public
[PubSub]: https://cloud.google.com/pubsub/docs
[Subscriptions]: https://console.cloud.google.com/cloudpubsub/subscription/list?project=kubernetes-jenkins
[Topic Creation]: https://cloud.google.com/storage/docs/reporting-changes#enabling
4 changes: 2 additions & 2 deletions kettle/deployment-staging.yaml
@@ -3,7 +3,7 @@ apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: kettle@k8s-gubernator.iam.gserviceaccount.com
iam.gke.io/gcp-service-account: kettle@kubernetes-public.iam.gserviceaccount.com
name: kettle
---
apiVersion: apps/v1
@@ -23,7 +23,7 @@ spec:
serviceAccountName: kettle
containers:
- name: kettle-staging
image: gcr.io/k8s-testimages/kettle:latest
image: gcr.io/k8s-staging-infra-tools/kettle:latest
imagePullPolicy: Always
env:
- name: BUILD_LIMIT
15 changes: 12 additions & 3 deletions kettle/deployment.yaml
@@ -2,14 +2,18 @@
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
iam.gke.io/gcp-service-account: [email protected]
name: kettle
namespace: kettle
labels:
app: kettle
annotations:
iam.gke.io/gcp-service-account: [email protected]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kettle
namespace: kettle
spec:
replicas: 1
selector:
@@ -23,13 +27,18 @@ spec:
serviceAccountName: kettle
containers:
- name: kettle
image: gcr.io/k8s-testimages/kettle:latest
image: gcr.io/k8s-staging-infra-tools/kettle:latest
imagePullPolicy: Always
env:
- name: DEPLOYMENT
value: prod
- name: SUBSCRIPTION_PATH
value: kubernetes-jenkins/gcs-changes/kettle-filtered
resources:
requests:
memory: 4Gi
limits:
memory: 12Gi
volumeMounts:
- name: data
mountPath: /data
32 changes: 7 additions & 25 deletions kettle/pv.yaml
@@ -1,40 +1,22 @@
kind: PersistentVolume
apiVersion: v1
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
labels:
app: kettle
name: kettle-data
spec:
capacity:
storage: 3001Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
gcePersistentDisk:
pdName: kettle-data
fsType: ext4
name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
labels:
app: kettle
name: kettle-data
namespace: kettle
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3001Gi
storageClassName: ssd
volumeName: kettle-data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
allowVolumeExpansion: true
reclaimPolicy: Delete
2 changes: 1 addition & 1 deletion kettle/stream.py
@@ -320,7 +320,7 @@ def get_options(argv):
)
parser.add_argument(
'--dataset',
help='BigQuery dataset (e.g. k8s-gubernator:build)'
help='BigQuery dataset (e.g. k8s_infra_kettle:build)'
)
parser.add_argument(
'--tables',
12 changes: 6 additions & 6 deletions kettle/update.py
@@ -63,23 +63,23 @@ def main():

if os.getenv('DEPLOYMENT', 'staging') == "prod":
call(f'{mj_cmd} {mj_ext} --days {DAY} | pv | gzip > build_day.json.gz')
call(f'{bq_cmd} {bq_ext} k8s-gubernator:build.day build_day.json.gz schema.json')
call(f'{bq_cmd} {bq_ext} k8s_infra_kettle:build.day build_day.json.gz schema.json')

call(f'{mj_cmd} {mj_ext} --days {WEEK} | pv | gzip > build_week.json.gz')
call(f'{bq_cmd} {bq_ext} k8s-gubernator:build.week build_week.json.gz schema.json')
call(f'{bq_cmd} {bq_ext} k8s_infra_kettle:build.week build_week.json.gz schema.json')

# TODO: (MushuEE) #20024, remove 30 day limit once issue with all uploads is found
call(f'{mj_cmd} --days {MONTH} | pv | gzip > build_all.json.gz')
call(f'{bq_cmd} k8s-gubernator:build.all build_all.json.gz schema.json')
call(f'{bq_cmd} k8s_infra_kettle:build.all build_all.json.gz schema.json')

call(f'python3 stream.py --poll {SUB_PATH} ' \
f'--dataset k8s-gubernator:build ' \
f'--dataset k8s_infra_kettle:build ' \
f'--tables all:{MONTH} day:{DAY} week:{WEEK} --stop_at=1')
else:
call(f'{mj_cmd} | pv | gzip > build_staging.json.gz')
call(f'{bq_cmd} k8s-gubernator:build.staging build_staging.json.gz schema.json')
call(f'{bq_cmd} k8s_infra_kettle:build.staging build_staging.json.gz schema.json')
call(f'python3 stream.py --poll {SUB_PATH} ' \
f'--dataset k8s-gubernator:build --tables staging:0 --stop_at=1')
f'--dataset k8s_infra_kettle:build --tables staging:0 --stop_at=1')

if __name__ == '__main__':
os.chdir(os.path.dirname(__file__))
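The `call` helper that `update.py` uses for the pipelines above is presumably a thin subprocess wrapper; a plausible minimal version (an assumption, not the actual helper):

```python
import subprocess

def call(cmd: str) -> None:
    # The pipelines above rely on shell features (| and >), so shell=True
    # is required; check=True makes any failing stage abort the update run.
    subprocess.run(cmd, shell=True, check=True)
```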
3 changes: 0 additions & 3 deletions metrics/README.md
@@ -56,9 +56,6 @@ jqfilter: |
* weekly-consistency - compute overall weekly consistency for PRs
- [Config](configs/weekly-consistency-config.yaml)
- [weekly-consistency-latest.json](http://storage.googleapis.com/k8s-metrics/weekly-consistency-latest.json)
* istio-job-flakes - compute overall weekly consistency for postsubmits
- [Config](configs/istio-flakes.yaml)
- [istio-job-flakes-latest.json](http://storage.googleapis.com/k8s-metrics/istio-job-flakes-latest.json)
## Adding a new metric
31 changes: 0 additions & 31 deletions metrics/configs/istio-flakes.yaml

This file was deleted.

