This is a quick walkthrough of the possible ways to configure this project and make it run on your BigQuery project.
Please remember that this project is just a POC.
Be assured that the original author has no ill intent, but be also warned that the original author declines all responsibility for any loss, GCP billing cost increase, or other harm suffered by any user and caused by:
- Any feature, bug or error in the code
- Any malicious code introduced into this project by a third party using any means (malicious fork, dependency injection, dependency confusion, etc.)
This documentation aims to inform the user as well as possible about the supported ways to connect this project to the user's GCP account, and the risks associated with each method. To learn more about GCP authentication, please refer to the official GCP documentation: https://cloud.google.com/docs/authentication/best-practices-applications
- Documentation: https://cloud.google.com/bigquery/docs/authentication/getting-started
- Pros: easy as pie
- Cons: not very safe
- Advice:
  - Only do this with code that you trust
  - Use this with a dummy GCP account that only has access to a sandbox project.
  - Don't use this method in production
Follow the instructions here: https://cloud.google.com/sdk/docs/install
(This step is necessary if you run this project locally, but can be skipped if you run it directly from inside GCP, where the Application Default Credentials are pre-configured.)
Run this command in a terminal:

```shell
gcloud auth application-default login
```

You can revoke the credentials at any time by running:

```shell
gcloud auth application-default revoke
```
Then indicate the name of your GCP project, either directly in your Python code:

```python
import bigquery_frame
bigquery_frame.conf.GCP_PROJECT = "Name of your BigQuery project"
```

or via an environment variable:

```shell
export GCP_PROJECT="Name of your BigQuery project"
```
- Documentation: https://cloud.google.com/bigquery/docs/authentication/service-account-file
- Pros: More secure
- Cons: A little more work involved
- Advice:
- We recommend this method
- Use a dedicated service account for this project
- Only give it the minimal access necessary for your test
Go to your project's Service Account page: https://console.cloud.google.com/iam-admin/serviceaccounts
(Please make sure you select the correct project.)
Create a new service account; for example, you can call it `bigquery-frame-poc`.
You can grant it the following rights:

- `BigQuery Job User` on the project you want.
- `BigQuery Data Viewer` on the project (or just on the specific datasets) that you want.
(If you want to grant access to a specific dataset, this can be done after the service account is created, directly in the BigQuery console, by clicking the "Share Dataset" button on a Dataset's panel)
Once the service account is created, click on it, go to the "KEYS" tab, and click on the "ADD KEY" button. Choose the JSON key type; a JSON key file for this service account will be downloaded automatically. Store it somewhere on your computer.
(If you have forked this repo and stored the credentials inside, be careful not to commit it accidentally; use `.gitignore`.)
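For instance, assuming the key file is stored at the repo root under the hypothetical name `credentials.json`, a `.gitignore` entry like this keeps it out of version control:

```
# service-account key file, never commit it
credentials.json
```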
There are two possible variants here:
- Method 2.A: pass the path to the json file to bigquery-frame
- Method 2.B: pass directly the content of the json file to bigquery-frame
The first method is generally simpler for setting up a local development environment, while the second method is generally easier for setting up automated CI pipelines.
Indicate the path to your credentials file, either directly in your Python code:

```python
import bigquery_frame
bigquery_frame.conf.GCP_CREDENTIALS_PATH = "Path to your service account credentials json file"
```

or via an environment variable:

```shell
export GCP_CREDENTIALS_PATH="Path to your service account credentials json file"
```
When using this method, be careful not to accidentally get escaped newline characters `"\n"` in your json content.

Indicate the content of your credentials file, either directly in your Python code:

```python
import bigquery_frame
bigquery_frame.conf.GCP_CREDENTIALS = "Content of your credentials json file"
```

or via an environment variable:

```shell
export GCP_CREDENTIALS="Content of your credentials json file"
```
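One way to catch that mistake: after exporting the variable, check that its value still parses as valid JSON and that the private key contains real newlines once decoded. This is an illustrative stdlib-only check, not part of bigquery-frame; the fallback value below is a dummy key used only so the snippet runs without real credentials.

```python
import json
import os

# Dummy stand-in so the snippet runs without real credentials.
_DUMMY = '{"type": "service_account", "private_key": "-----BEGIN PRIVATE KEY-----\\nabc\\n-----END PRIVATE KEY-----\\n"}'

raw = os.environ.get("GCP_CREDENTIALS", _DUMMY)
info = json.loads(raw)  # raises ValueError if the shell mangled the JSON

# After json.loads, "\n" escape sequences become real newline characters;
# a literal backslash-n remaining in the key is a sign of shell escaping gone wrong.
assert "\n" in info["private_key"]
```

If `json.loads` raises, the shell most likely altered the quoting or escaping when the variable was set.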
The constructor of the `BigQueryBuilder` class takes a `google.cloud.bigquery.Client` as argument, allowing users to instantiate the client in any other way they might prefer.
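As a sketch of that last option, here is how one might wire in a custom client. The import path `from bigquery_frame import BigQueryBuilder` is an assumption based on the package name, and the sketch requires the google-cloud-bigquery package plus a valid key file; the imports are kept inside the function so the sketch can be defined without those dependencies present.

```python
def build_builder(key_path: str, project: str):
    """Sketch: create a BigQueryBuilder from an explicitly configured Client.

    Assumptions (not confirmed by the docs): the import path of
    BigQueryBuilder, and that its constructor accepts the client as the
    first positional argument.
    """
    # Local imports so this module can be loaded without GCP dependencies.
    from google.oauth2 import service_account
    from google.cloud import bigquery
    from bigquery_frame import BigQueryBuilder

    # Build credentials from the downloaded service-account key file,
    # then hand a fully configured Client to the builder.
    credentials = service_account.Credentials.from_service_account_file(key_path)
    client = bigquery.Client(project=project, credentials=credentials)
    return BigQueryBuilder(client)
```

Calling `build_builder("credentials.json", "my-project")` with a real key file would then return a ready-to-use builder.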