Commit: Repoint Terraform readme to Docsite (datacommonsorg#4872)

kmoscoe authored Jan 22, 2025
1 parent 48e02a2 commit 041ad35
Showing 1 changed file with 1 addition and 245 deletions.
deploy/terraform-custom-datacommons/README.md (1 addition & 245 deletions)
@@ -1,247 +1,3 @@
# Custom Data Commons Terraform Deployment

## Overview

Deploying your own Custom Data Commons instance on Google Cloud Platform (GCP) lets you host and manage your own data while leveraging [Google's Data Commons](https://datacommons.org/) repository of publicly available information. This deployment is managed using Terraform, which automates the provisioning of infrastructure and services on GCP.

## Features

* Creates the Data Commons services container as a Cloud Run service in region `us-central1`
* Creates the Data Commons data management container as a Cloud Run job
* Enables all required Google Cloud APIs
* Creates a Redis instance (optional)
* Creates a Cloud SQL MySQL instance
* Creates a new service account with the minimum required permissions
* Automatically provisions the required Google Maps API key and stores it in GCP Secret Manager
* Generates a random MySQL password and stores it in GCP Secret Manager
* Supports multiple Data Commons instances in the same GCP account

## Architecture

![Custom Data Commons Architecture](https://docs.datacommons.org/assets/images/custom_dc/customdc_setup3.png)


## Prerequisites

- **A [Google Cloud Project](https://cloud.google.com)**: You must have a GCP project where the custom Data Commons instance will be deployed.
- **[gcloud](https://cloud.google.com/sdk/docs/install)**: Ensure gcloud is installed on your local machine for the initial environment setup.
- **[Terraform](https://developer.hashicorp.com/terraform/install)**: Ensure Terraform is installed on your local machine to manage the infrastructure as code.
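
You can quickly confirm that both tools are installed and on your `PATH`:

```bash
# Both commands should print version info without errors
gcloud version
terraform version
```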

## Deployment Instructions

### 1. Configure Deployment Variables

From the root directory of the `website` repo, create a local copy of the sample `terraform.tfvars` file and fill in the required values:

```bash
cd deploy/terraform-custom-datacommons/
cp ./modules/terraform.tfvars.sample ./modules/terraform.tfvars
```

Example `terraform.tfvars`:

```
project_id = "your-gcp-project"
namespace = "dan-dev"
dc_api_key = "your-api-key"
```

#### Required Variables

- **project_id**: The Google Cloud project ID where the resources will be created.
- **namespace**: A unique namespace to differentiate multiple instances of custom Data Commons within the same project.
- **dc_api_key**: Data Commons API key. [Request an API key](https://apikeys.datacommons.org) if you don't have one.

#### Optional Configuration Variables

- **region**: The [GCP region](https://cloud.google.com/about/locations) where resources will be deployed (default: `us-central1`).
- **enable_redis**: Set to `true` to enable Redis caching (default: `false`).
- **dc_web_service_image**: Docker image to use for the services container. Default: `gcr.io/datcom-ci/datacommons-services:stable`. Set to `gcr.io/datcom-ci/datacommons-services:latest` to use the latest web service image.
- **dc_data_job_image**: Docker image to use for the data loading job. Default: `gcr.io/datcom-ci/datacommons-data:stable`. Set to `gcr.io/datcom-ci/datacommons-data:latest` to use the latest data job image.
- **make_dc_web_service_public**: By default, the Data Commons web service is publicly accessible. Set this to `false` if your GCP account has restrictions on public access. [Reference](https://cloud.google.com/run/docs/authenticating/public).
- **google_analytics_tag_id**: Set to your [Google Analytics Tag ID](https://support.google.com/analytics/answer/9539598) to enable Google Analytics tracking.

See `modules/variables.tf` for a complete list of optional variables.
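
For example, a `terraform.tfvars` that overrides a few of the optional variables might look like this (all values below are illustrative):

```
project_id = "your-gcp-project"
namespace  = "dan-dev"
dc_api_key = "your-api-key"

# Optional overrides (illustrative values)
enable_redis            = true
dc_web_service_image    = "gcr.io/datcom-ci/datacommons-services:latest"
google_analytics_tag_id = "G-XXXXXXXXXX"
```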

### 2. Set up GCP Project

Run the setup script to enable the necessary APIs in your Google Cloud project:

```bash
PROJECT_ID=your-gcp-project
./setup.sh $PROJECT_ID
```

### 3. Authenticate to GCP

Generate credentials for Google Cloud:

```bash
gcloud auth application-default login --project $PROJECT_ID
gcloud auth login
```
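
To verify which account and project are active, you can use standard gcloud checks:

```bash
# Show the active account and project
gcloud auth list
gcloud config get-value project
```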

### 4. (Optional) Provision a Docker artifact registry and push custom Data Commons images

An artifact registry can optionally be used to store custom Docker images, such as a custom Data Commons web service image.

Create a new artifact repository named `$PROJECT_ID-artifacts` in the `us-central1` region:

```bash
REGION=us-central1 # Or any other GCP region
./create_artifact_repository.sh $PROJECT_ID $REGION
```
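
If you prefer not to use the helper script, a roughly equivalent repository can be created directly with gcloud (a sketch; the script may perform additional setup):

```bash
# Create a Docker-format Artifact Registry repository directly
gcloud artifacts repositories create ${PROJECT_ID}-artifacts \
  --repository-format=docker \
  --location=$REGION
```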

Follow [these instructions to build custom Data Commons docker images](https://docs.datacommons.org/custom_dc/build_image.html).

For example, to build a custom Data Commons web service image:

```bash
docker build --platform linux/amd64 -f build/cdc_services/Dockerfile \
--tag custom-dc-services \
.
```

And follow [these instructions to push a new image to the artifact registry](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#pushing_images_to_a_repository).
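
Before pushing, Docker typically needs credentials for the Artifact Registry host (a standard one-time setup step):

```bash
# Register gcloud as a Docker credential helper for Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev
```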

For example:

```bash
PROJECT_ID=your-gcp-project
docker tag custom-dc-services:latest \
us-central1-docker.pkg.dev/$PROJECT_ID/$PROJECT_ID-artifacts/custom-dc-services:latest
docker push \
us-central1-docker.pkg.dev/$PROJECT_ID/$PROJECT_ID-artifacts/custom-dc-services:latest
```

To use this custom image in your Terraform deployment, set the `dc_web_service_image` variable to `us-central1-docker.pkg.dev/your-gcp-project/your-gcp-project-artifacts/custom-dc-services:latest`.

Example:

```
dc_web_service_image = "us-central1-docker.pkg.dev/my-gcp-project/datcom-website-dev-artifacts/custom-dc-services:latest"
```

### 5. Initialize Terraform

Change to the Terraform modules directory:

```bash
cd modules
```

Optionally use a Google Cloud Storage [backend](https://developer.hashicorp.com/terraform/language/settings/backends/configuration):

```bash
PROJECT_ID=your-gcp-project
gcloud auth application-default login --project $PROJECT_ID
gsutil mb -p $PROJECT_ID "gs://${PROJECT_ID}-datacommons-tf"

cat <<EOF >backend.tf
terraform {
  backend "gcs" {
    bucket = "${PROJECT_ID}-datacommons-tf"
    prefix = "terraform/state"
  }
}
EOF
```

Initialize Terraform and validate configuration:

```bash
terraform init
terraform plan
```
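
For a quick syntax and consistency check without computing a full plan, you can also run:

```bash
terraform validate
```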

### 6. Provision and run Data Commons in GCP

Deploy the custom Data Commons instance (this takes about 15 minutes):

```bash
terraform apply
```

Terraform will prompt you to confirm the actions before creating resources. Review the plan and type `yes` to proceed.

Once the deployment is complete, Terraform should output something like:

```
cloud_run_service_name = "<namespace>-datacommons-web-service"
cloud_run_service_url = "https://<namespace>-datacommons-web-service-abc123-uc.a.run.app"
dc_api_key = <sensitive>
gcs_data_bucket_name = "<namespace>-datacommons-data-<project-id>"
maps_api_key = <sensitive>
mysql_instance_connection_name = "<project-id>:us-central1:<namespace>-datacommons-mysql-instance"
mysql_instance_public_ip = "<mysql_ip>"
mysql_user_password = <sensitive>
redis_instance_host = "<redis_ip>"
redis_instance_port = 6379
```
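
You can re-read any of these values later with `terraform output`; sensitive values require the `-raw` flag:

```bash
terraform output cloud_run_service_url
terraform output -raw maps_api_key
```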

### 7. Load custom data

Upload the custom sample data to the GCS bucket given by the Terraform output `gcs_data_bucket_name` (`gs://<your-namespace>-datacommons-data-<your-project-id>`).
From the `website` repository's root directory, run:

```bash
# Replace `NAMESPACE` and `PROJECT_ID` with values from your `terraform.tfvars`
NAMESPACE=your-namespace
PROJECT_ID=your-project-id
DATA_BUCKET=${NAMESPACE}-datacommons-data-${PROJECT_ID}
gcloud storage cp custom_dc/sample/* gs://$DATA_BUCKET/input/
```

Load the custom data into Data Commons:

```bash
# Replace `NAMESPACE` and `REGION` with values from your `terraform.tfvars`
NAMESPACE=your-namespace
REGION=us-central1
gcloud run jobs execute ${NAMESPACE}-datacommons-data-job --region=$REGION
```
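
To check the job's status, you can list its executions with standard gcloud commands:

```bash
# List executions of the data load job to check its status
gcloud run jobs executions list \
  --job=${NAMESPACE}-datacommons-data-job --region=$REGION
```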

Restart the services to pick up the new data:

```bash
terraform apply
```

### 8. Open Data Commons

Open your Custom Data Commons instance in the browser using the
`cloud_run_service_url` from the Terraform output above (e.g.,
`https://<your-namespace>-datacommons-web-service-abc123-uc.a.run.app`).

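
To verify the service responds from the command line, you can run a simple smoke test from the `modules` directory, where the Terraform state is available:

```bash
# Prints the HTTP response headers from the deployed service
curl -I "$(terraform output -raw cloud_run_service_url)"
```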

## Using Terraform Workspaces and Namespace

If you need to deploy multiple instances of Data Commons within the same GCP project, or across different projects, you can use Terraform workspaces and the `namespace` variable.

- **Workspaces**: Terraform workspaces allow you to manage multiple instances of your Terraform configuration in the same directory. To create a new workspace:

```bash
terraform workspace new <workspace_name>
```

Switch between workspaces using:

```bash
terraform workspace select <workspace_name>
```
- **Namespace**: The `namespace` variable prefixes resource names so that they don't conflict when deploying multiple instances in the same GCP project. Set a unique `namespace` for each deployment.

### Example for Multiple Instances

To deploy multiple instances in the same project or across different projects:

1. **Different Projects**: Specify a different `project_id` for each deployment.
2. **Same Project**: Use different `namespace` values and Terraform workspaces to isolate the deployments.

```bash
terraform apply -var="namespace=instance1"
```

Repeat the above command with a different namespace to deploy another instance.
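
Putting workspaces and namespaces together, a second instance in the same project might be deployed like this (workspace and namespace names are illustrative):

```bash
# Illustrative: deploy a second, isolated instance
terraform workspace new instance2
terraform apply -var="namespace=instance2"
```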
## Conclusion

By following these steps, you can deploy and manage custom instances of Data Commons on GCP using Terraform. This setup allows you to host your own data, integrate it with Google's public datasets, and run multiple instances within the same project or across different projects.

For more details on optional configurations, refer to the `variables.tf` file in the deployment directory.

The contents of this file have moved to https://docs.datacommons.org/custom_dc/deploy_cloud.html
