[WIP] Autoscaling #566

Draft · wants to merge 21 commits into base: main
49 changes: 44 additions & 5 deletions .github/workflows/functional-tests.yaml
@@ -2,7 +2,7 @@ name: Functional tests
on: [pull_request]

jobs:
functional-tests:
operator:
runs-on: ubuntu-latest
steps:
- name: Checkout code
@@ -27,17 +27,56 @@ jobs:
## Prepare kubeconfig
k3d kubeconfig get $CLUSTER_NAME > functionaltests/kubeconfig
export KUBECONFIG=$(pwd)/functionaltests/kubeconfig


## Build controller docker image
make docker-build

## Import controller docker image
k3d image import -c $CLUSTER_NAME controller:latest

## Install helm chart
helm install opensearch-operator ../charts/opensearch-operator --set manager.image.repository=controller --set manager.image.tag=latest --set manager.image.pullPolicy=IfNotPresent --namespace default --wait
cd functionaltests

## Run tests
go test ./operatortests -timeout 30m

cluster-helm-chart:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Setup go
uses: actions/setup-go@v3
with:
go-version: '1.19'
- uses: nolar/setup-k3d-k3s@v1
with:
version: v1.22
k3d-name: opensearch-operator-tests
k3d-args: --agents 2 -p 30000-30005:30000-30005@agent:0
github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Run tests
run: |
set -e
export CLUSTER_NAME=opensearch-operator-tests
## Check disk to avoid failed shard assignments due to watermarking
df -h
cd opensearch-operator
## Prepare kubeconfig
k3d kubeconfig get $CLUSTER_NAME > functionaltests/kubeconfig
export KUBECONFIG=$(pwd)/functionaltests/kubeconfig

## Build controller docker image
make docker-build

## Import controller docker image
k3d image import -c $CLUSTER_NAME controller:latest

## Install helm chart
helm install opensearch-operator ../charts/opensearch-operator --set manager.image.repository=controller --set manager.image.tag=latest --set manager.image.pullPolicy=IfNotPresent --namespace default --wait
helm install opensearch-cluster ../charts/opensearch-cluster --set OpenSearchClusterSpec.enabled=true --wait
cd functionaltests

## Run tests
go test -timeout 30m
go test ./helmtests -timeout 15m
2 changes: 1 addition & 1 deletion README.md
@@ -55,7 +55,7 @@ The opensearch k8s operator aims to be compatible to all supported opensearch ve

| Operator Version | Min Supported Opensearch Version | Max supported Opensearch version | Comment |
|------------------|----------------------------------|----------------------------------|---------|
| 2.3 | 1.0 | 2.7 | |
| 2.3 | 1.0 | 2.8 | |
| 2.2 | 1.0 | 2.5 | |
| 2.1 | 1.0 | 2.3 | |
| 2.0 | 1.0 | 2.3 | |
154 changes: 154 additions & 0 deletions docs/designs/autoscaling.md
@@ -0,0 +1,154 @@
# Autoscaling

## Content
- [Autoscaling](#autoscaling)
- [Goals](#goals)
- [Design](#design)
- [Getting Started](#getting-started)

## Goals
1. Scale OpenSearch clusters managed by the operator up and down via monitoring metrics.
2. Support making scaling decisions from one or more metrics, with aggregations.

## Design
A separate CRD is used for defining autoscaling policies. Autoscaler resources are stateless: the operator only ever reads them and never updates them. Once an autoscaler is created, it can be referenced at either the cluster or nodepool level inside the OpensearchCluster configuration. When enabled, the autoscaler queries a Prometheus backend containing the cluster metrics and makes scaling determinations based on the user configuration.

### Requirements
To support the second goal of making scaling decisions with aggregations, there needs to be a record of cluster metrics over a time period. Since the monitoring component of the operator already leverages Prometheus, it makes sense to utilize it here as well. The autoscaler therefore requires a Prometheus instance that is scraping your cluster's metrics.

### Considerations
Some design considerations to make note of:
1. ScaleConf only contains maxReplicas and no minReplicas; this is because the number of replicas specified in the nodepool of the OpenSearch cluster is used as the minReplicas value.
2. The operator field of an Item can be any supported Prometheus comparison binary operator.
```
== (equal)
!= (not-equal)
> (greater-than)
< (less-than)
>= (greater-or-equal)
<= (less-or-equal)
```
3. The interval field of a queryOption can be an integer followed by any valid Prometheus time duration.
```
ms - milliseconds
s - seconds
m - minutes
h - hours
d - days - assuming a day always has 24h
w - weeks - assuming a week always has 7d
y - years - assuming a year always has 365d
```
4. The function field of a queryOption can be any valid singular Prometheus function.

### Autoscaler Custom Resource Reference Guide

The Autoscaler CRD is defined by kind: `Autoscaler`, group: `opensearch.opster.io` and version `v1`.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| apiVersion | string | opensearch.opster.io/v1 | true |
| kind | string | Autoscaler | true |
| metadata | object | Refer to the Kubernetes API documentation for the fields of the `metadata` field. | true |
| spec | object | AutoscalerSpec defines the desired configuration of the autoscaler. | true |


### Autoscaler.spec
AutoscalerSpec defines the desired configuration of the autoscaler.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| rules | []Rule | A list of rules defining the scaling logic. | true |


### Rule
Rule defines a single rule.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| items | []Item | A list of type Item, defining conditions for scaling. | true |
| nodeRole | string | The role of the Opensearch node type you would like to target for scaling. | true |
| behavior | Scale | The container for the scaling behavior of the ruleset. | true |

A rule may contain many items; by default, all item expressions generated from the configuration must evaluate to true for a scaling action to take place.
A nodeRole is needed primarily for the case where the autoscale policy is defined at the cluster level, so that the operator knows which nodes to scale.

### Item
Item defines a singular item in a rule.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| metric | string | A Prometheus metric to target for performing conditional operations. | true |
| operator | string | The operator to use for comparing the Prometheus query result and threshold. | true |
| threshold | string | The threshold value for taking scaling action. | true |
| queryOptions | QueryOptions | Optional additions to the Prometheus query. | false |

The operator field of an Item can be any supported Prometheus comparison binary operator.
```
== (equal)
!= (not-equal)
> (greater-than)
< (less-than)
>= (greater-or-equal)
<= (less-or-equal)
```

### QueryOptions
QueryOptions defines additional query configurations.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| labelMatchers | []string | Prometheus-supported label matchers to limit results. | false |
| interval | string | A Prometheus-supported interval of time over which to query. | false |
| function | string | A Prometheus-supported function wrapper. | false |
| aggregateEvaluation | bool | A flag to average your Prometheus query results together. | false |

The interval field of a queryOption can be an integer followed by any valid Prometheus time duration.
```
ms - milliseconds
s - seconds
m - minutes
h - hours
d - days - assuming a day always has 24h
w - weeks - assuming a week always has 7d
y - years - assuming a year always has 365d
```

The aggregateEvaluation field averages the query results from multiple nodes before comparison. This is useful when you want to scale based on an average of node metrics rather than evaluating each node individually.
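
As an illustration, an Item and its queryOptions might compose into a Prometheus comparison along the lines of the sketch below (the metric name, label matcher, and generated query shape are illustrative assumptions, not the operator's actual output):
```
# A hypothetical Item; field names follow the tables above.
metric: cpu_usage_percent
operator: ">="
threshold: "80"
queryOptions:
  labelMatchers:
    - 'cluster="my-cluster"'
  interval: 5m
  function: avg_over_time
  aggregateEvaluation: true
# Conceptually this evaluates a comparison such as:
#   avg(avg_over_time(cpu_usage_percent{cluster="my-cluster"}[5m])) >= 80
# where aggregateEvaluation averages the per-node results together.
```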

### Behavior
Behavior defines a scaling behavior for a rule.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| enable | bool | Flag to enable or disable the rule. | true |
| scaleUp | ScaleConf | Container for upscaling behavior. | false |
| scaleDown | ScaleConf | Container for downscaling behavior. | false |

scaleUp and scaleDown should never both be defined for the same rule; each rule should have only one or the other.

### ScaleConf
Scaling behavior for scaling up or down.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| maxReplicas | int32 | Maximum amount of replicas to scale up to. | false |

MaxReplicas is optional for a rule that scales down; when scaling up, however, it is required so that there is an upper boundary. MinReplicas is absent because the nodepool.Replicas value defined in the cluster spec performs this function: when scaling down, the cluster will never scale below the number of replicas defined in the cluster spec.
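
Putting the pieces together, a complete Autoscaler resource might look like the following sketch (the rule values and metric name are illustrative assumptions; the field names follow the reference tables above):
```
apiVersion: opensearch.opster.io/v1
kind: Autoscaler
metadata:
  name: data-autoscaler              # hypothetical name, referenced from the cluster spec
spec:
  rules:
    - nodeRole: data                 # target the data nodes for scaling
      behavior:
        enable: true
        scaleUp:
          maxReplicas: 10            # required upper boundary when scaling up
      items:
        - metric: cpu_usage_percent  # hypothetical metric name
          operator: ">="
          threshold: "80"
          queryOptions:
            interval: 5m
            function: avg_over_time
            aggregateEvaluation: true
```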


In addition to the autoscaler CRD, this design includes changes to the existing OpensearchCluster CRD, specifically to generalConfig and nodePools.

### OpensearchCluster.General.Autoscaler
Addition of an `Autoscaler` section under generalConfig.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| enable | boolean | Enables or disables autoscaling functionality. | false |
| prometheusEndpoint | string | A prometheus endpoint to monitor. | false |
| scaleTimeout | int | The amount of time, in minutes, to wait since the last scaling action (or cluster creation) before scaling again. | false |
| clusterPolicy | string | The override to set a cluster specific autoscale policy. | false |

### OpensearchCluster.nodePools
Addition of `AutoScalePolicy` to nodePools.
| Name | Type | Description | Required |
|--------|--------|--------|--------|
| autoScalePolicy | string | The name of an autoscaler that the user has applied. | false |

Note that clusterPolicy and autoScalePolicy are synonymous; choose one or the other depending on whether you want the policy applied at the cluster level or per node pool.
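
For reference, a sketch of how these additions might be wired into an OpensearchCluster spec (the exact field paths are assumptions based on the tables above, and the node pool layout is abbreviated):
```
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
spec:
  general:
    autoscaler:
      enable: true
      prometheusEndpoint: http://prometheus.monitoring:9090  # assumed Prometheus URL
      scaleTimeout: 10               # minutes to wait between scaling actions
  nodePools:
    - component: data
      replicas: 3                    # also acts as the minimum replica count
      roles: ["data"]
      autoScalePolicy: data-autoscaler  # name of the Autoscaler resource above
```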

## Getting Started
1. Have a Prometheus instance where metrics from your cluster are being stored.
2. Create an autoscaling policy with the CRD that meets your scaling requirements.
3. Define the autoscaling policy in your OpensearchCluster and enable it.
45 changes: 45 additions & 0 deletions docs/designs/crd.md
@@ -653,6 +653,51 @@ Monitoring TLS configuration options
</tbody>
</table>

<h3 id="Autoscaler">
Autoscaler
</h3>

Autoscaler defines the OpenSearch autoscaling configuration.

<table>
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Description</th>
<th>Required</th>
<th>default</th>
</tr>
</thead>
<tbody><tr>
<td><b>enable</b></td>
<td>bool</td>
<td>Whether to enable autoscaling for the cluster</td>
<td>false</td>
<td>-</td>
</tr><tr>
<td><b>prometheusEndpoint</b></td>
<td>string</td>
<td>The URL of a Prometheus endpoint to which monitoring metrics from the OpenSearch cluster are sent.</td>
<td>false</td>
<td>-</td>
</tr><tr>
<td><b>scaleTimeout</b></td>
<td>string</td>
<td>This interval limits how often the cluster will attempt an automatic scaling action. Notation should follow <a href="https://prometheus.io/docs/prometheus/latest/querying/basics/#time-durations">Prometheus time duration standards</a>.</td>
<td>false</td>
<td>10m</td>
</tr><tr>
<td><b>clusterAutoScalePolicy</b></td>
<td>string</td>
<td>Optional to define an autoscaling policy at the cluster level instead of nodePool.</td>
<td>false</td>
<td>-</td>
</tr></tbody>
</table>

<h3 id="GeneralConfig">
Keystore
</h3>