diff --git a/docs/source/guide/security.md b/docs/source/guide/security.md index 4fa900727e92..aedd56e13e6f 100644 --- a/docs/source/guide/security.md +++ b/docs/source/guide/security.md @@ -120,75 +120,17 @@ Once Label Studio tasks are created, users can view and edit tasks in their brow #### Source storage behind your VPC -!!! warning Google Cloud Storage - Google Cloud Storage does **not** support IP or VPN restrictions for pre-signed URLs, making this approach infeasible for GCS. As an alternative security measure for GCS, you can use **signed URLs with short lifetimes**. +To ensure maximum security and isolation of your data behind a VPC, only allow access to the Label Studio backend and users within your internal network. To do this, you can use the following technique — especially effective with Label Studio SaaS (Cloud, `app.humansignal.com`): -To ensure maximum security and isolation of your data behind a VPC, only allow access to users within your VPC. To do this, you can use the following technique — especially effective with Label Studio SaaS (Cloud, `app.humansignal.com`) and AWS S3: +1. Set **IP restrictions** for your storage to **allow Label Studio to perform task synchronization and generate pre-signed URLs** for media file serving. IP restrictions enhance security by ensuring that only trusted networks can access your storage. GET (`s3:GetObject` for S3) and LIST (`s3:ListBucket` for S3) permissions are required. The IP ranges for `app.humansignal.com` can be found in the documentation [here](saas#IP-range). -1. Set **IP restrictions** for your S3 storage to allow Label Studio to perform task synchronization and generate pre-signed URLs for media file serving. IP restrictions enhance security by ensuring that only trusted networks can access your storage. GET (`s3:GetObject`) and LIST (`s3:ListBucket`) permissions are required. The IP ranges for `app.humansignal.com` can be found in the documentation [here](saas#IP-range). +2. 
**Establish a secure connection** between your storage and users' browsers: + - Configure a VPC private endpoint and route VPN traffic to it so that users' browsers can securely access the storage bucket only through your Virtual Private Network (VPN). + - Alternatively, limit storage access to specific IP addresses or VPCs. -2. **Establish your VPC Connection** between S3 Storage and Users' Browsers: - - Configure your network so that users' browsers can access the S3 bucket securely within your Virtual Private Cloud (VPC). This ensures that data transmission occurs over a private network, enhancing security by preventing exposure to the public internet. Administrators can set up this connection using AWS VPC endpoints or other networking configurations within their infrastructure. - - **Helpful Resources**: - - [AWS Documentation: VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html) - - [AWS Documentation: How to Configure VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/endpoint-services-overview.html) - - 
-Bucket Policy Example for S3 storage - -!!! warning - These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see [How to regain access to an Amazon S3 bucket](https://repost.aws/knowledge-center/s3-accidentally-denied-access). - -Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS management console. Add the following policy: - -```json -{ - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet", - "Effect": "Deny", - "Principal": { - "AWS": "arn:aws:iam::490065312183:user/rw_bucket" - }, - "Action": [ - "s3:ListBucket", - "s3:GetObject" - ], - "Resource": [ - "arn:aws:s3:::YOUR_BUCKET_NAME", - "arn:aws:s3:::YOUR_BUCKET_NAME/*" - ], - "Condition": { - "NotIpAddress": { - "aws:SourceIp": [ - //// IP ranges for app.humansignal.com from the documentation - "x.x.x.x/32", - "x.x.x.x/32", - "x.x.x.x/32" - ] - } - } - }, -//// Optional - { - "Sid": "DenyAccessUnlessFromVPNForGetObject", - "Effect": "Deny", - "Principal": "*", - "Action": "s3:GetObject", - "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*", - "Condition": { - "NotIpAddress": { - "aws:SourceIp": "YOUR_VPN_SUBNET/32" - } - } - } - ] -} -``` -
+**Configuration examples:** + - [AWS S3 Storage: IP Filtering and VPN for Enhanced Security](storage#IP-Filtering-and-VPN-for-Enhanced-Security-for-S3-storage). + - [Google Cloud Storage: IP Filtering for Enhanced Security](storage#IP-Filtering-for-Enhanced-Security-for-GCS-storage). This image shows how you can securely configure source cloud storages with Label Studio using your VPC and IP restrictions @@ -196,16 +138,7 @@ Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS managem Label Studio + Cloud Storage VPC -#### Additional Notes - -**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually. - -**AWS S3 IAM**: In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An 'external ID' is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. See [Set up an S3 connection with IAM role access](storage#Set-up-an-S3-connection-with-IAM-role-access) - -**Storage Regions**: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server. -!!! note More details on Cloud Storages - See more details on [Source storage Sync and URI resolving](storage#Source-storage-Sync-and-URI-resolving). 
### Secure access to Redis storage diff --git a/docs/source/guide/storage.md b/docs/source/guide/storage.md index d2e2ed4240ce..7058bfc94f4a 100644 --- a/docs/source/guide/storage.md +++ b/docs/source/guide/storage.md @@ -27,6 +27,7 @@ When working with an external cloud storage connection, keep the following in mi * Label Studio doesn't import the data stored in the bucket, but instead creates *references* to the objects. Therefore, you must have full access control on the data to be synced and shown on the labeling screen. * Sync operations with external buckets only goes one way. It either creates tasks from objects on the bucket (Source storage) or pushes annotations to the output bucket (Target storage). Changing something on the bucket side doesn't guarantee consistency in results. * We recommend using a separate bucket folder for each Label Studio project. +* Storage Regions: To minimize latency and improve efficiency, store data in cloud storage buckets that are geographically closer to your team rather than near the Label Studio server.
@@ -282,6 +283,14 @@ After you [configure access to your S3 bucket](#Configure-access-to-your-S3-buck After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync export storage](https://api.labelstud.io/api-reference/api-reference/export-storage/s-3/sync) +
+ +### S3 connection with IAM role access + +In Label Studio Enterprise, you can use an IAM role configured with an external ID to access S3 bucket contents securely. An 'external ID' is a unique identifier that enhances security by ensuring that only trusted entities can assume the role, reducing the risk of unauthorized access. See how to [Set up an S3 connection with IAM role access](https://docs.humansignal.com/guide/storage#Set-up-an-S3-connection-with-IAM-role-access) in the Enterprise documentation. + +
+
### Set up an S3 connection with IAM role access @@ -416,6 +425,72 @@ You can also create a storage connection using the Label Studio API. - See [Create new import storage](/api#operation/api_storages_s3_create) then [sync the import storage](/api#operation/api_storages_s3_sync_create). - See [Create export storage](/api#operation/api_storages_export_s3_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_s3_sync_create). +### IP Filtering and VPN for Enhanced Security for S3 storage + +To maximize security and data isolation behind a VPC, allow only the Label Studio backend and users on your internal network to access the bucket. Set IP restrictions on the bucket so that only trusted networks can perform task synchronization and generate pre-signed URLs. Then establish a secure connection between the storage and users' browsers, either by configuring a VPC private endpoint or by limiting storage access to specific IP addresses or VPCs. + +Read more about [Source storage behind your VPC](security.html#Source-storage-behind-your-VPC). +
+Bucket Policy Example for S3 storage +
+ +!!! warning + These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you accidentally lock yourself out, see [How to regain access to an Amazon S3 bucket](https://repost.aws/knowledge-center/s3-accidentally-denied-access). + +**Helpful Resources**: +- [AWS Documentation: VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html) +- [AWS Documentation: How to Configure VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/endpoint-services-overview.html) + +Go to your S3 bucket and then **Permissions > Bucket Policy** in the AWS management console. Add the following policy, replacing `YOUR_BUCKET_NAME` with your bucket name, the `x.x.x.x/32` placeholders with the [IP ranges for app.humansignal.com](saas#IP-range), and `YOUR_VPN_SUBNET/32` with the address range of your VPN. The second statement (`DenyAccessUnlessFromVPNForGetObject`) is optional. + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet", + "Effect": "Deny", + "Principal": { + "AWS": "arn:aws:iam::490065312183:user/rw_bucket" + }, + "Action": [ + "s3:ListBucket", + "s3:GetObject" + ], + "Resource": [ + "arn:aws:s3:::YOUR_BUCKET_NAME", + "arn:aws:s3:::YOUR_BUCKET_NAME/*" + ], + "Condition": { + "NotIpAddress": { + "aws:SourceIp": [ + "x.x.x.x/32", + "x.x.x.x/32", + "x.x.x.x/32" + ] + } + } + }, + { + "Sid": "DenyAccessUnlessFromVPNForGetObject", + "Effect": "Deny", + "Principal": "*", + "Action": "s3:GetObject", + "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*", + "Condition": { + "NotIpAddress": { + "aws:SourceIp": "YOUR_VPN_SUBNET/32" + } + } + } + ] +} +```
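Because an explicit `Deny` in a bucket policy overrides any `Allow`, it can help to check which requests the statements above would block before you save the policy. The following Python sketch is a simplified, hypothetical model, not the AWS policy evaluation engine: it only covers `Deny` statements with a `NotIpAddress` condition on `aws:SourceIp`. The `DenyUnlessFromIps` class, the `is_denied` helper, the ARN, and the IP ranges are illustrative placeholders.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyUnlessFromIps:
    """Hypothetical, simplified model of a bucket policy statement with
    "Effect": "Deny" and a "NotIpAddress" condition on aws:SourceIp."""

    principal: str                  # "*" matches every principal
    actions: frozenset[str]         # e.g. frozenset({"s3:GetObject"})
    allowed_cidrs: tuple[str, ...]  # values listed under aws:SourceIp

    def denies(self, principal: str, action: str, source_ip: str) -> bool:
        if self.principal not in ("*", principal):
            return False
        if action not in self.actions:
            return False
        ip = ipaddress.ip_address(source_ip)
        # NotIpAddress matches only when the IP falls in none of the listed ranges.
        return not any(ip in ipaddress.ip_network(cidr) for cidr in self.allowed_cidrs)


def is_denied(statements, principal: str, action: str, source_ip: str) -> bool:
    """Return True if any statement explicitly denies the request."""
    return any(s.denies(principal, action, source_ip) for s in statements)


if __name__ == "__main__":
    # Hypothetical values: replace with your IAM user ARN and real IP ranges.
    LS_USER = "arn:aws:iam::123456789012:user/example-label-studio-user"
    SAAS_IPS = ("203.0.113.10/32", "203.0.113.11/32")
    VPN_SUBNET = ("198.51.100.0/24",)

    policy = [
        DenyUnlessFromIps(LS_USER, frozenset({"s3:ListBucket", "s3:GetObject"}), SAAS_IPS),
        DenyUnlessFromIps("*", frozenset({"s3:GetObject"}), VPN_SUBNET),
    ]

    print(is_denied(policy, LS_USER, "s3:ListBucket", "203.0.113.10"))  # False
    print(is_denied(policy, LS_USER, "s3:ListBucket", "192.0.2.50"))    # True
```

When modeling pre-signed URLs, keep in mind that AWS evaluates such a request as the IAM identity that signed the URL, while `aws:SourceIp` is the address of the browser that fetches it.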
+ ## Google Cloud Storage Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html/#Secure-access-to-cloud-storage). @@ -472,17 +547,21 @@ You can also create a storage connection using the Label Studio API. - See [Create export storage](/api#operation/api_storages_export_gcs_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_gcs_sync_create). -### IP Filtering for Enhanced Security +### IP Filtering for Enhanced Security for GCS storage Google Cloud Storage offers [bucket IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview) as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets. +Read more about [Source storage behind your VPC](security.html#Source-storage-behind-your-VPC). + **Common Use Cases:** - Restrict bucket access to only your organization's IP ranges - Allow access only from specific VPC networks in your infrastructure - Secure sensitive data by limiting access to known IP addresses - Control access for third-party integrations by whitelisting their IPs -**How to Set Up IP Filtering:** +
+How to Set Up IP Filtering +
1. First, create your GCS bucket through the console or CLI 2. Create a JSON configuration file to define IP filtering rules. You have two options: @@ -543,6 +622,13 @@ gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter [Read more about GCS IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview) +
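Instead of writing the JSON configuration file from step 2 by hand, you can generate it. The following Python sketch builds the configuration and validates each CIDR range with the standard `ipaddress` module. It assumes the `mode`, `publicNetworkSource`, `vpcNetworkSources`, and `allowedIpCidrRanges` field names from the GCS IP filtering documentation, so verify them against the current GCS docs before use. The `build_ip_filter_config` function, the network path, and the IP ranges are hypothetical.

```python
import ipaddress
import json


def build_ip_filter_config(public_cidrs, vpc_sources=None):
    """Build a GCS bucket IP filtering config as a dict (hypothetical helper).

    Field names are assumed from the GCS IP filtering documentation.
    vpc_sources maps a VPC network path, such as
    "projects/PROJECT_ID/global/networks/NETWORK_NAME", to a list of CIDRs.
    """
    vpc_sources = vpc_sources or {}
    all_cidrs = list(public_cidrs) + [c for cidrs in vpc_sources.values() for c in cidrs]
    for cidr in all_cidrs:
        # Strict parsing: raises ValueError for malformed ranges or host bits set.
        ipaddress.ip_network(cidr)

    config = {
        "mode": "Enabled",
        "publicNetworkSource": {"allowedIpCidrRanges": list(public_cidrs)},
    }
    if vpc_sources:
        config["vpcNetworkSources"] = [
            {"network": network, "allowedIpCidrRanges": list(cidrs)}
            for network, cidrs in vpc_sources.items()
        ]
    return config


if __name__ == "__main__":
    # Hypothetical ranges and network path; replace with your own values.
    config = build_ip_filter_config(
        ["203.0.113.0/24"],
        {"projects/my-project/global/networks/my-vpc": ["10.0.0.0/8"]},
    )
    print(json.dumps(config, indent=2))
```

Save the printed output to a file and use it as the JSON configuration file from step 2.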
+ +### Application Default Credentials + +**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually. + + ## Microsoft Azure Blob storage Connect your [Microsoft Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) container with Label Studio. For details about how Label Studio secures access to cloud storage, see [Secure access to cloud storage](security.html#Secure-access-to-cloud-storage).