Commit 4ff73a0

Add safer-cluster module template files
1 parent 5d6fafc commit 4ff73a0

5 files changed: +876 -0 lines changed

autogen/safer-cluster/README.md (+278 lines)
# Safer Cluster: How to set up a GKE Kubernetes cluster with reduced exposure

This module defines an opinionated setup of a GKE cluster. We outline project
configurations, cluster settings, and basic K8s objects that permit a
safer-than-default configuration.
## Module Usage

The module fixes a set of parameters to values suggested in the
[GKE hardening guide](https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster),
the CIS framework, and other best practices.

The motivation for each setting, and its relation to hardening guides or other
recommendations, is outlined in `main.tf` as comments over individual settings.
When security-relevant settings are available for configuration, recommendations
on their values are documented in the `variables.tf` file.
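As a sketch, a minimal invocation of the module might look like the following. The module source path and all values are hypothetical placeholders; the required variables are listed in the inputs table below.

```
module "safer_cluster" {
  # Hypothetical source path; adjust to where this module lives in your setup.
  source = "../safer-cluster"

  project_id        = "myproduct-prod"   # placeholder project
  name              = "prod-cluster"
  region            = "us-central1"
  network           = "prod-vpc"
  subnetwork        = "prod-subnet"
  ip_range_pods     = "pods-range"       # name of the secondary range for pods
  ip_range_services = "services-range"   # name of the secondary range for services
}
```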
## Project Setup and Cloud IAM policy for GKE

### Applications and Clusters

- Different applications that access data with different sensitivity and do
  not need to communicate with each other (e.g., dev and prod instances of the
  same application) should be placed in different clusters.

  - This approach limits the blast radius of errors. A security problem
    in dev shouldn't impact production data.

- If applications need to communicate (e.g., a frontend system calling
  a backend), we suggest placing the two applications in the same cluster, in
  different namespaces.

  - Placing them in the same cluster provides fast network communication,
    and the different namespaces will be configured to provide some
    administrative isolation. Istio will be used to encrypt and control
    communication between applications.

- We suggest storing user or business data persistently in managed storage
  services that are inventoried and controlled by centralized teams
  (e.g., GCP storage services within a GCP organization).

  - Storing user or business data securely requires satisfying a large set of
    requirements, such as data inventory, which might be harder to satisfy at
    scale when data is stored opaquely within a cluster. Services like Cloud
    Asset Inventory give centralized teams the ability to enumerate data stores.
### Project Setup

We suggest a GKE setup composed of multiple projects to separate responsibilities
between cluster operators, who need to administer the cluster, and product
developers, who mostly just want to deploy and debug applications.

- *Cluster Projects (`project_id`):* GKE clusters storing sensitive data should
  have their own projects, so that they can be administered independently
  (e.g., dev, staging, and production clusters should go in different projects).

- *A shared GCR project (`registry_project_id`):* all clusters can share the
  same GCR project.

  - Easier to share images between environments. The same image can be
    progressively rolled out in dev, staging, and then production.
  - Easier to manage service account permissions: GCR requires authorizing
    service accounts to access certain buckets, which are created only after
    images are published. When the only service run by the project is GCR,
    we can safely give project-wide read permissions to all buckets.

- (optional) *Data Projects:* When the same cluster is shared by different
  applications managed by different teams, we suggest separating the data for
  each application by placing storage resources for each team in different
  projects (e.g., a Spanner instance for application A in one project, a GCS
  bucket for application B in a different project).

  - This permits tighter control of administrative access to the data, as the
    Cloud IAM policies for accessing the data can be managed by each
    application team, rather than by the team managing the cluster
    infrastructure.

An exception to such a setup:

- When not using Shared VPCs, resources that require direct network connectivity
  (e.g., a Cloud SQL instance) need to be placed in the same VPC (hence, project)
  as the clusters from which connections are made.
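As a sketch, the project split above could be expressed in Terraform roughly as follows; all IDs are hypothetical placeholders.

```
# One project per environment cluster, plus a shared registry project.
resource "google_project" "cluster_prod" {
  name            = "myproduct-prod"
  project_id      = "myproduct-prod"         # placeholder
  org_id          = "123456789012"           # placeholder
  billing_account = "AAAAAA-BBBBBB-CCCCCC"   # placeholder
}

resource "google_project" "registry" {
  name            = "myproduct-registry"
  project_id      = "myproduct-registry"     # shared GCR project (registry_project_id)
  org_id          = "123456789012"
  billing_account = "AAAAAA-BBBBBB-CCCCCC"
}
```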
### Google Service Accounts

We use GKE Workload Identity (BETA) to associate a GCP identity with each
workload, and to limit the permissions associated with the cluster nodes.

The Safer Cluster setup relies on several service accounts:

- The module generates a service account to run nodes. Such a service account
  has only the permissions needed to send logs and metrics, and to download
  containers from the given GCR project. The following settings in the module
  will create a service account with the above properties:

  ```
  create_service_account = true
  registry_project_id    = <the project id for your GCR project>
  grant_registry_access  = true
  ```
- A service account *for each application* running on the cluster (e.g.,
  `[email protected]`). These service
  accounts should be associated with the permissions required for running the
  application, such as access to databases.

  ```
  - email: myproduct
    displayName: Google Service Account for containers running in the myproduct k8s namespace
    policy:
      # GKE workload identity authorization. This authorizes the Kubernetes Service Account
      # myproduct/default from this project's identity namespace to impersonate the service account.
      # https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
      bindings:
      - members:
        - serviceAccount:product-prod.svc.id.goog[myproduct/default]
        role: roles/iam.workloadIdentityUser
  ```
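On the Kubernetes side, the Workload Identity binding above is completed by annotating the Kubernetes service account with the Google service account it impersonates. A sketch, assuming the Google service account lives in the `product-prod` project:

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: myproduct
  annotations:
    # Google service account impersonated by this Kubernetes service account
    # (assumed address, derived from the identity namespace above).
    iam.gke.io/gcp-service-account: myproduct@product-prod.iam.gserviceaccount.com
```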
We suggest running different applications in different namespaces within the
cluster. Each namespace should be assigned its own GCP service account to
better define the Cloud IAM permissions required for the application.

If you are using more than two projects in your setup, consider creating the
service account in a different project to keep application and infrastructure
separate. For example, service accounts could be created in each team's
project, while the cluster runs in a centrally controlled project.
<section class="zippy">
*Why?*

Separating the permissions associated with the infrastructure (GKE nodes) and
the application provides a simpler way to scale up the cluster: multiple
applications can run in the same cluster, each with tailored permissions that
limit the impact of compromises.

Such a separation of identities is enabled by a GKE feature called Workload
Identity. The feature provides additional advantages, such as better protection
of the node's metadata server against attackers.

</section>
### Cloud IAM Permissions for the GKE Cluster

We suggest relying mainly on Kubernetes RBAC to manage access control, and
using Cloud IAM only to give users the ability to configure `kubectl`
credentials.

Engineers operating applications on the cluster should only be assigned the
Cloud IAM permission `roles/container.clusterViewer`. This role allows them to
obtain credentials for the cluster, but provides no further access to the
cluster objects. All cluster objects are protected by RBAC configurations,
defined below.
<section class="zippy">
*Why?*

Both Cloud IAM and RBAC can be used to control access to GKE clusters. The two
systems are combined as an "OR": an action is authorized if the necessary
permissions are granted by either RBAC _OR_ Cloud IAM.

However, Cloud IAM permissions are defined for a project: users get assigned
the same permissions over all clusters and all namespaces within each cluster.
Such a setup makes it hard to separate responsibilities between teams in
charge of managing clusters and teams in charge of products.

By relying on RBAC instead of Cloud IAM, we have finer-grained control of the
permissions provided to engineers, and can restrict permissions to only
certain namespaces.

</section>
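For instance, a product team could be granted edit rights scoped to its own namespace with a RoleBinding along these lines; the group name and namespace are assumed placeholders, and binding a Google group this way relies on the Google Groups for RBAC feature:

```
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: myproduct-team-edit
  namespace: myproduct
subjects:
- kind: Group
  name: product-team@example.com   # placeholder group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                       # built-in role, granted only in this namespace
  apiGroup: rbac.authorization.k8s.io
```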
You can add the following binding to the `myproduct-prod` project:

```
- members:
  - group:<product team group>
  - group:<cluster team group>
  role: roles/container.clusterViewer
```
These permissions won't allow engineers to SSH into nodes as part of the regular
development workflow. Such permissions should be granted only to the cluster
team, and used only in case of emergency.

While RBAC permissions should be sufficient for most cases, we also suggest
creating an emergency superuser role that can be used, given a proper
justification, for resolving cases where regular permissions are insufficient.
For simplicity, we suggest using `roles/container.admin` and
`roles/compute.admin`, until narrower roles can be defined for your usage.

```
- members:
  - group:<oncall for cluster team>
  role: roles/container.admin
- members:
  - group:<oncall for cluster team>
  role: roles/compute.admin
```
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| authenticator\_security\_group | The name of the RBAC security group for use with Google security groups in Kubernetes RBAC. Group name must be in format [email protected] | string | `"null"` | no |
| cloudrun | (Beta) Enable CloudRun addon | string | `"false"` | no |
| cluster\_ipv4\_cidr | The IP address range of the kubernetes pods in this cluster. Default is an automatically assigned CIDR. | string | `""` | no |
| cluster\_resource\_labels | The GCE resource labels (a map of key/value pairs) to be applied to the cluster | map(string) | `<map>` | no |
| compute\_engine\_service\_account | Use the given service account for nodes rather than creating a new dedicated service account. | string | `""` | no |
| database\_encryption | Application-layer Secrets Encryption settings. The object format is {state = string, key_name = string}. Valid values of state are: "ENCRYPTED"; "DECRYPTED". key_name is the name of a CloudKMS key. | object | `<list>` | no |
| default\_max\_pods\_per\_node | The maximum number of pods to schedule per node | string | `"110"` | no |
| description | The description of the cluster | string | `""` | no |
| enable\_intranode\_visibility | Whether intra-node visibility is enabled for this cluster. This makes same-node pod-to-pod traffic visible to the VPC network | bool | `"false"` | no |
| enable\_shielded\_nodes | Enable Shielded Nodes features on all nodes in this cluster. | bool | `"true"` | no |
| enable\_vertical\_pod\_autoscaling | Vertical Pod Autoscaling automatically adjusts the resources of pods controlled by it | bool | `"false"` | no |
| grant\_registry\_access | Grants the created cluster-specific service account the storage.objectViewer role. | bool | `"false"` | no |
| horizontal\_pod\_autoscaling | Enable horizontal pod autoscaling addon | bool | `"true"` | no |
| http\_load\_balancing | Enable HTTP load balancing addon. The addon allows whoever can create Ingress objects to expose an application to a public IP. Network policies or Gatekeeper policies should be used to verify that only authorized applications are exposed. | bool | `"true"` | no |
| initial\_node\_count | The number of nodes to create in this cluster's default node pool. | number | `"0"` | no |
| ip\_range\_pods | The _name_ of the secondary subnet ip range to use for pods | string | n/a | yes |
| ip\_range\_services | The _name_ of the secondary subnet range to use for services | string | n/a | yes |
| istio | (Beta) Enable Istio addon | string | `"false"` | no |
| kubernetes\_version | The Kubernetes version of the masters. If set to 'latest', it will pull the latest available version in the selected region. The module enforces certain minimum versions to ensure that specific features are available. | string | `"latest"` | no |
| logging\_service | The logging service that the cluster should write logs to. Available options include logging.googleapis.com, logging.googleapis.com/kubernetes (beta), and none | string | `"logging.googleapis.com"` | no |
| maintenance\_start\_time | Time window specified for daily maintenance operations in RFC3339 format | string | `"05:00"` | no |
| master\_authorized\_networks | List of master authorized networks. If none are provided, disallow external access (except the cluster node IPs, which GKE automatically whitelists). | object | `<list>` | no |
| master\_ipv4\_cidr\_block | (Beta) The IP range in CIDR notation to use for the hosted master network | string | `"10.0.0.0/28"` | no |
| monitoring\_service | The monitoring service that the cluster should write metrics to. Automatically sends metrics from pods in the cluster to the Google Cloud Monitoring API. VM metrics will be collected by Google Compute Engine regardless of this setting. Available options include monitoring.googleapis.com, monitoring.googleapis.com/kubernetes (beta), and none | string | `"monitoring.googleapis.com"` | no |
| name | The name of the cluster | string | n/a | yes |
| network | The VPC network to host the cluster in | string | n/a | yes |
| network\_project\_id | The project ID of the shared VPC's host (for shared VPC support) | string | `""` | no |
| node\_pools | List of maps containing node pools | list(map(string)) | `<list>` | no |
| node\_pools\_labels | Map of maps containing node labels by node-pool name | map(map(string)) | `<map>` | no |
| node\_pools\_metadata | Map of maps containing node metadata by node-pool name | map(map(string)) | `<map>` | no |
| node\_pools\_oauth\_scopes | Map of lists containing node oauth scopes by node-pool name | map(list(string)) | `<map>` | no |
| node\_pools\_tags | Map of lists containing node network tags by node-pool name | map(list(string)) | `<map>` | no |
| node\_pools\_taints | Map of lists containing node taints by node-pool name | object | `<map>` | no |
| node\_version | The Kubernetes version of the node pools. Defaults to the kubernetes_version (master) variable and can be overridden for individual node pools by setting the `version` key on them. Must be empty or set the same as the master at cluster creation. | string | `""` | no |
| project\_id | The project ID to host the cluster in | string | n/a | yes |
| region | The region to host the cluster in | string | n/a | yes |
| regional | Whether the cluster is regional (zonal if set to false. WARNING: changing this after cluster creation is destructive!) | bool | `"true"` | no |
| registry\_project\_id | Project holding the Google Container Registry. If empty, we use the cluster project. If grant_registry_access is true, the storage.objectViewer role is assigned on this project. | string | `""` | no |
| resource\_usage\_export\_dataset\_id | The dataset id for which network egress metering for this cluster will be enabled. If enabled, a daemonset will be created in the cluster to meter network egress traffic. | string | `""` | no |
| sandbox\_enabled | (Beta) Enable GKE Sandbox (do not forget to set `image_type` = `COS_CONTAINERD` and `node_version` = `1.12.7-gke.17` or later to use it). | bool | `"false"` | no |
| service\_account | The service account to run nodes as, if not overridden in `node_pools`. The create_service_account variable default value (true) will cause a cluster-specific service account to be created. | string | `""` | no |
| stub\_domains | Map of stub domains and their resolvers to forward DNS queries for a certain domain to an external DNS server | map(list(string)) | `<map>` | no |
| subnetwork | The subnetwork to host the cluster in | string | n/a | yes |
| upstream\_nameservers | If specified, the values replace the nameservers taken by default from the node's /etc/resolv.conf | list | `<list>` | no |
| zones | The zones to host the cluster in | list(string) | `<list>` | no |

## Outputs

| Name | Description |
|------|-------------|
| ca\_certificate | Cluster CA certificate (base64 encoded) |
| endpoint | Cluster endpoint |
| horizontal\_pod\_autoscaling\_enabled | Whether horizontal pod autoscaling is enabled |
| http\_load\_balancing\_enabled | Whether HTTP load balancing is enabled |
| location | Cluster location (region if regional cluster, zone if zonal cluster) |
| logging\_service | Logging service used |
| master\_authorized\_networks\_config | Networks from which access to the master is permitted |
| master\_version | Current master Kubernetes version |
| min\_master\_version | Minimum master Kubernetes version |
| monitoring\_service | Monitoring service used |
| name | Cluster name |
| network\_policy\_enabled | Whether network policy is enabled |
| node\_pools\_names | List of node pool names |
| node\_pools\_versions | List of node pool versions |
| region | Cluster region |
| service\_account | The default service account used for running nodes, if not overridden in `node_pools`. |
| type | Cluster type (regional / zonal) |
| zones | List of zones in which the cluster resides |

<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

To provision this example, run the following from within this directory:

- `terraform init` to get the plugins
- `terraform plan` to see the infrastructure plan
- `terraform apply` to apply the infrastructure build
- `terraform destroy` to destroy the built infrastructure
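A minimal `terraform.tfvars` supplying the required inputs from the table above might look like this sketch (all values are hypothetical placeholders):

```
project_id        = "myproduct-prod"
name              = "prod-cluster"
region            = "us-central1"
network           = "prod-vpc"
subnetwork        = "prod-subnet"
ip_range_pods     = "pods-range"
ip_range_services = "services-range"
```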
