docs: DIA-1872: Add ADC doc in GCS connection (#7032)
Co-authored-by: nik <[email protected]>
Co-authored-by: caitlinwheeless <[email protected]>
3 people authored Feb 11, 2025
1 parent f18871a commit 36814a4
Showing 2 changed files with 21 additions and 10 deletions.
7 changes: 7 additions & 0 deletions docs/source/guide/security.md
@@ -75,6 +75,13 @@ Data in Label Studio is stored in one or two places, depending on your deploymen
- Project settings and configuration details are stored in Label Studio's internal database.
- Input data (texts, images, audio files) is hosted by external data storage and provided to Label Studio through URI links. The data is not stored in Label Studio directly; the content is retrieved client-side only.
- Project annotations are stored in the internal database, and optionally can be stored in a local file directory, a Redis database, or cloud storage buckets on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.


!!! info Tip
There are several advanced security options for AWS and GCP storage, including:
* [Application Default Credentials for GCP](storage#Application-Default-Credentials-for-enhanced-security-for-GCS) (on-prem only)
* [IP filtering for GCP storage](storage#IP-filtering-for-enhanced-security-for-GCS)
* [IP filtering and VPN for S3](storage#IP-Filtering-and-VPN-for-Enhanced-Security-for-S3-Storage)

### Secure database access

24 changes: 14 additions & 10 deletions docs/source/guide/storage.md
@@ -532,12 +532,11 @@ In the Label Studio UI, do the following to set up the connection:
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**. If your tasks contain gs://... links, they must be pre-signed in order to be displayed in the browser.
- Adjust the counter for how many minutes the pre-signed URLs are valid.
8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket. You can also use the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to specify this file. For example:
```bash
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.

**On-prem users:** Alternatively, you can use the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See [Application Default Credentials for enhanced security](#Application-Default-Credentials-for-enhanced-security-for-GCS) below.
9. Click **Add Storage**.
10. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.

After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_gcs_sync_create).

@@ -546,8 +545,17 @@ You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_gcs_create) then [sync the import storage](/api#operation/api_storages_gcs_sync_create).
- See [Create export storage](/api#operation/api_storages_export_gcs_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_gcs_sync_create).
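
The create-then-sync flow for import storage can be sketched with `curl`. This is a minimal sketch, not the documented request format: the endpoint paths are inferred from the operation names linked above, the field names mirror the UI options, and `LABEL_STUDIO_URL`, `LABEL_STUDIO_API_KEY`, the `Token` authorization scheme, and the helper function names are all assumptions to check against the API reference for your version. The payload omits explicit credentials, on the assumption that an on-prem deployment supplies them through the environment.

```bash
# Assumed base URL; replace with your own deployment address.
LABEL_STUDIO_URL="${LABEL_STUDIO_URL:-http://localhost:8080}"

# Hypothetical helper: build the JSON body for a GCS import storage connection.
# Arguments: project ID, bucket name, bucket prefix.
gcs_storage_payload() {
  local project_id="$1" bucket="$2" prefix="$3"
  printf '{"project": %s, "title": "GCS import", "bucket": "%s", "prefix": "%s", "use_blob_urls": true, "presign": true, "presign_ttl": 15}\n' \
    "$project_id" "$bucket" "$prefix"
}

# Hypothetical helper: create the import storage (assumed endpoint path).
create_gcs_import_storage() {
  curl -sS -X POST "$LABEL_STUDIO_URL/api/storages/gcs/" \
    -H "Authorization: Token $LABEL_STUDIO_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(gcs_storage_payload "$@")"
}

# Hypothetical helper: sync an existing import storage by its ID (assumed endpoint path).
sync_gcs_import_storage() {
  local storage_id="$1"
  curl -sS -X POST "$LABEL_STUDIO_URL/api/storages/gcs/$storage_id/sync" \
    -H "Authorization: Token $LABEL_STUDIO_API_KEY"
}
```

For example, `create_gcs_import_storage 1 my-gcs-bucket images/` would request a connection for project `1`, and `sync_gcs_import_storage <id>` would then sync it using the storage ID returned in the response.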

### Application Default Credentials for enhanced security for GCS

If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.

The recommended way to do this is by using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. For example:

### IP Filtering for Enhanced Security for GCS storage
```bash
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
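
Because the variable only holds a file path, a wrong or relative path tends to surface later as an authentication error. The following is a minimal sketch of a pre-flight check you could run before starting Label Studio; `check_gcp_credentials` is a hypothetical helper, not part of Label Studio or the Google Cloud tooling, and it only confirms that the file exists and is non-empty, not that the credentials are valid.

```bash
# Hypothetical helper: verify that GOOGLE_APPLICATION_CREDENTIALS points to a
# readable, non-empty file. Returns 0 on success and 1 otherwise.
check_gcp_credentials() {
  local creds_path="${GOOGLE_APPLICATION_CREDENTIALS:-}"

  if [ -z "$creds_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$creds_path" ]; then
    echo "Credentials file is missing or unreadable: $creds_path" >&2
    return 1
  fi
  if [ ! -s "$creds_path" ]; then
    echo "Credentials file is empty: $creds_path" >&2
    return 1
  fi
  echo "Using GCP credentials from $creds_path"
}
```

Calling `check_gcp_credentials` in the same shell that launches Label Studio catches the most common mistakes. Using an absolute path in the `export` line avoids depending on the working directory of the process.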

### IP filtering for enhanced security for GCS

Google Cloud Storage offers [bucket IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview) as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.

@@ -624,10 +632,6 @@ gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter

</details>

#### Application Default Credentials as Advanced Security Approach

**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.


## Microsoft Azure Blob storage
