Commit

Merge branch 'fb-optic-1553' of github.com:HumanSignal/label-studio into fb-optic-1553
bmartel committed Feb 13, 2025
2 parents 4c4c486 + c7913bf commit 28e946f
Showing 57 changed files with 1,180 additions and 85 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docker-build-ontop.yml
@@ -91,7 +91,7 @@ jobs:
cat "${DOCKERFILE_PATH}"
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3.8.0
uses: docker/setup-buildx-action@v3.9.0

- name: Login to DockerHub
uses: docker/[email protected]
2 changes: 1 addition & 1 deletion .github/workflows/docker-build.yml
@@ -89,7 +89,7 @@ jobs:
echo "build_version=$version" >> $GITHUB_OUTPUT
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3.8.0
uses: docker/setup-buildx-action@v3.9.0

- name: Login to DockerHub
uses: docker/[email protected]
2 changes: 1 addition & 1 deletion .github/workflows/docker-release-promote.yml
@@ -154,7 +154,7 @@ jobs:
fi
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3.8.0
uses: docker/setup-buildx-action@v3.9.0

- name: Login to DockerHub
uses: docker/[email protected]
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: 'Follow Merge: Sync PR LSE'
name: 'Follow Merge: Dispatch'

on:
pull_request_target:
@@ -19,13 +19,10 @@ on:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}

env:
DOWNSTREAM_REPOSITORY: "label-studio-enterprise"

jobs:
sync:
name: "Sync"
if: startsWith(github.head_ref, 'fb-')
if: startsWith(github.head_ref, 'fb-') || (startsWith(github.head_ref, 'revert-') && contains(github.head_ref, '-fb-') )
runs-on: ubuntu-latest
steps:
- uses: hmarr/[email protected]
@@ -60,25 +57,20 @@ jobs:
body: [
'Hi @${{ github.actor }}!',
'',
`Unfortunately you don't have membership in ${owner} organization, your PR wasn't synced with ${owner}/${{ env.DOWNSTREAM_REPOSITORY }}.`
`Unfortunately you don't have membership in ${owner} organization, Follow Merge dispatch is skipped.`
].join('\n')
});
throw `${{ github.actor }} don't have membership in ${owner} organization`
- name: Dispatch Follow Merge Workflow
uses: actions/github-script@v7
env:
BRANCH_NAME: ${{ github.head_ref }}
- name: Checkout Actions Hub
uses: actions/checkout@v4
with:
github-token: ${{ secrets.GIT_PAT }}
script: |
const branch_name = process.env.BRANCH_NAME;
github.rest.actions.createWorkflowDispatch({
owner: "HumanSignal",
repo: "label-studio-enterprise",
workflow_id: "follow-merge-upstream-repo-sync-v2.yml",
ref: "develop",
inputs: {
branch_name: branch_name,
}
});
token: ${{ secrets.GIT_PAT }}
repository: HumanSignal/actions-hub
path: ./.github/actions-hub

- name: Dispatch label-studio-enterprise Follow Merge
uses: ./.github/actions-hub/actions/follow-merge-dispatch
with:
github_token: ${{ secrets.GIT_PAT }}
downstream_repository: "label-studio-enterprise"
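
The updated `if:` condition on the `sync` job can be read as a plain predicate over the PR's head branch name. The following is a minimal sketch of that reading, not code from the repository: `should_dispatch` is a hypothetical name, and the lower-casing assumes GitHub's documented behaviour that `startsWith` and `contains` compare strings case-insensitively.

```python
def should_dispatch(head_ref: str) -> bool:
    """Mirror the workflow condition:
    startsWith(head_ref, 'fb-') ||
    (startsWith(head_ref, 'revert-') && contains(head_ref, '-fb-'))
    """
    # Assumption: GitHub expression string functions ignore case.
    ref = head_ref.lower()
    is_feature_branch = ref.startswith("fb-")
    is_revert_of_feature_branch = ref.startswith("revert-") and "-fb-" in ref
    return is_feature_branch or is_revert_of_feature_branch
```

Under this reading, a branch such as `fb-optic-1553` still triggers the dispatch, and a revert branch does so only when its name also contains `-fb-`.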
3 changes: 1 addition & 2 deletions .github/workflows/follow-merge-upstream-repo-sync-v2.yml
@@ -26,8 +26,7 @@ jobs:
name: Sync PR
runs-on: ubuntu-latest
outputs:
adala: "${{ steps.upstream-prs.outputs.adala }}"
label-studio-query-vectordb: "${{ steps.upstream-prs.outputs.label-studio-query-vectordb }}"
label-studio-sdk: "${{ steps.upstream-prs.outputs.label-studio-sdk }}"
steps:
- uses: hmarr/[email protected]

7 changes: 7 additions & 0 deletions docs/source/guide/security.md
@@ -75,6 +75,13 @@ Data in Label Studio is stored in one or two places, depending on your deploymen
- Project settings and configuration details are stored in Label Studio's internal database.
- Input data (texts, images, audio files) is hosted by external data storage and provided to Label Studio through URI links. The data is not stored in Label Studio directly; the content is retrieved client-side only.
- Project annotations are stored in the internal database, and optionally can be stored in a local file directory, a Redis database, or cloud storage buckets on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.


!!! info Tip
There are several advanced security options for AWS and GCP storage, including:
* [Application Default Credentials for GCP](storage#Application-Default-Credentials-for-enhanced-security-for-GCS) (on-prem only)
* [IP filtering for GCP storage](storage#IP-filtering-for-enhanced-security-for-GCS)
* [IP filtering and VPN for S3](storage#IP-Filtering-and-VPN-for-Enhanced-Security-for-S3-Storage)

### Secure database access

24 changes: 14 additions & 10 deletions docs/source/guide/storage.md
@@ -532,12 +532,11 @@ In the Label Studio UI, do the following to set up the connection:
- Enable **Treat every bucket object as a source file** if your bucket contains BLOB storage files such as JPG, MP3, or similar file types. This setting creates a URL for each bucket object to use for labeling, such as `gs://my-gcs-bucket/image.jpg`. Leave this option disabled if you have multiple JSON files in the bucket with one task per JSON file.
- Choose whether to disable **Use pre-signed URLs**. If your tasks contain gs://... links, they must be pre-signed in order to be displayed in the browser.
- Adjust the counter for how many minutes the pre-signed URLs are valid.
8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket. You can also use the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to specify this file. For example:
```bash
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
8. In the **Google Application Credentials** field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.

**On-prem users:** Alternatively, you can use the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See [Application Default Credentials for enhanced security](#Application-Default-Credentials-for-enhanced-security-for-GCS) below.
9. Click **Add Storage**.
10. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.
10. Repeat these steps for **Target Storage** to sync completed data annotations to a bucket.

After adding the storage, click **Sync** to collect tasks from the bucket, or make an API call to [sync import storage](/api#operation/api_storages_gcs_sync_create).

@@ -546,8 +545,17 @@ You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_gcs_create) then [sync the import storage](/api#operation/api_storages_gcs_sync_create).
- See [Create export storage](/api#operation/api_storages_export_gcs_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_gcs_sync_create).

### Application Default Credentials for enhanced security for GCS

If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.

The recommended way to do this is by using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. For example:

### IP Filtering for Enhanced Security for GCS storage
```bash
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
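
Before starting Label Studio, it can help to confirm that the variable points at a usable file. The following is a minimal sketch, not part of Label Studio: `check_adc_key_file` is a hypothetical helper that only verifies the variable is set and refers to a readable file containing a JSON object. It does not contact Google Cloud or validate the credentials themselves.

```python
import json
import os
from pathlib import Path


def check_adc_key_file(env=os.environ) -> Path:
    """Return the credentials path if it looks usable, otherwise raise RuntimeError."""
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    path = Path(raw)
    if not path.is_file():
        raise RuntimeError(f"credentials file not found: {path}")

    try:
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise RuntimeError(f"credentials file is not valid JSON: {path}") from exc

    if not isinstance(data, dict):
        raise RuntimeError(f"credentials file does not contain a JSON object: {path}")
    return path
```

Calling `check_adc_key_file()` with no arguments reads the current process environment; passing a dictionary makes the check easy to exercise in tests.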

### IP filtering for enhanced security for GCS

Google Cloud Storage offers [bucket IP filtering](https://cloud.google.com/storage/docs/ip-filtering-overview) as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.

@@ -624,10 +632,6 @@ gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter

</details>

#### Application Default Credentials as Advanced Security Approach

**Google ADC**: If you use Label Studio on-premises with Google Cloud Storage, you can set up [Application Default Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.


## Microsoft Azure Blob storage

24 changes: 24 additions & 0 deletions label_studio/core/all_urls.json
@@ -1756,5 +1756,29 @@
"module": "django.contrib.auth.views.LogoutView",
"name": "rest_framework:logout",
"decorators": ""
},
{
"url": "/api/jwt/settings",
"module": "jwt_auth.views.JWTSettingsAPI",
"name": "jwt_auth:api-jwt-settings",
"decorators": ""
},
{
"url": "/api/token/",
"module": "jwt_auth.views.LSAPITokenView",
"name": "jwt_auth:token_manage",
"decorators": ""
},
{
"url": "/api/token/refresh/",
"module": "jwt_auth.views.DecoratedTokenRefreshView",
"name": "jwt_auth:token_refresh",
"decorators": ""
},
{
"url": "/api/token/blacklist/",
"module": "jwt_auth.views.LSTokenBlacklistView",
"name": "jwt_auth:token_blacklist",
"decorators": ""
}
]
3 changes: 0 additions & 3 deletions label_studio/core/feature_flags/stale_feature_flags.py
@@ -18,9 +18,6 @@
'ff_front_dev_2432_auto_save_polygon_draft_210622_short': True,
'ff_front_1170_outliner_030222_short': True,
'fflag_fix_front_lsdv_4620_memory_leaks_100723_short': False,
'fflag_feat_optic_198_multi_select_users_short': True,
'fflag_fix_back_lsdv_5410_temporary_disable_auto_inference_jobs_short': True,
'fflag_feat_front_prod_292_archive_workspaces_short': True,
'fflag_feat_all_lsdv_4915_async_task_import_13042023_short': True,
'fflag_fix_all_lsdv_4971_async_reimport_09052023_short': True,
# Jan 16
1 change: 1 addition & 0 deletions label_studio/core/middleware.py
@@ -205,6 +205,7 @@ def process_request(self, request) -> None:
or
# scim assign request.user implicitly, check CustomSCIMAuthCheckMiddleware
(hasattr(request, 'is_scim') and request.is_scim)
or (hasattr(request, 'is_jwt') and request.is_jwt)
):
return

5 changes: 4 additions & 1 deletion label_studio/core/settings/base.py
@@ -214,6 +214,7 @@
'annoying',
'rest_framework',
'rest_framework.authtoken',
'rest_framework_simplejwt.token_blacklist',
'drf_generators',
'core',
'users',
@@ -229,6 +230,7 @@
'labels_manager',
'ml_models',
'ml_model_providers',
'jwt_auth',
]

MIDDLEWARE = [
@@ -247,12 +249,13 @@
'core.middleware.ContextLogMiddleware',
'core.middleware.DatabaseIsLockedRetryMiddleware',
'core.current_request.ThreadLocalMiddleware',
'jwt_auth.middleware.JWTAuthenticationMiddleware',
]

REST_FRAMEWORK = {
'DEFAULT_FILTER_BACKENDS': ['django_filters.rest_framework.DjangoFilterBackend'],
'DEFAULT_AUTHENTICATION_CLASSES': (
'rest_framework.authentication.TokenAuthentication',
'jwt_auth.auth.TokenAuthenticationPhaseout',
'rest_framework.authentication.SessionAuthentication',
),
'DEFAULT_PERMISSION_CLASSES': [
1 change: 1 addition & 0 deletions label_studio/core/urls.py
@@ -112,6 +112,7 @@
path('heidi-tips/', views.heidi_tips, name='heidi_tips'),
path('__lsa/', views.collect_metrics, name='collect_metrics'),
re_path(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework')),
re_path(r'^', include('jwt_auth.urls')),
]

if settings.DEBUG:
58 changes: 56 additions & 2 deletions label_studio/feature_flags.json
@@ -784,6 +784,33 @@
"version": 4,
"deleted": false
},
"ff_front_optic_1610_ask_ai_questions": {
"key": "ff_front_optic_1610_ask_ai_questions",
"on": false,
"prerequisites": [],
"targets": [],
"contextTargets": [],
"rules": [],
"fallthrough": {
"variation": 0
},
"offVariation": 1,
"variations": [
true,
false
],
"clientSideAvailability": {
"usingMobileKey": false,
"usingEnvironmentId": false
},
"clientSide": false,
"salt": "5a8d9cd75b6c4a5ea5985316ab34b25d",
"trackEvents": false,
"trackEventsFallthrough": false,
"debugEventsUntilDate": null,
"version": 3,
"deleted": false
},
"fflag-feat-dev-2887-comments-ui-editor-short": {
"key": "fflag-feat-dev-2887-comments-ui-editor-short",
"on": false,
@@ -1743,6 +1770,33 @@
"version": 2,
"deleted": false
},
"fflag_feat_back_optic_1157_set_ground_truths_action": {
"key": "fflag_feat_back_optic_1157_set_ground_truths_action",
"on": false,
"prerequisites": [],
"targets": [],
"contextTargets": [],
"rules": [],
"fallthrough": {
"variation": 0
},
"offVariation": 1,
"variations": [
true,
false
],
"clientSideAvailability": {
"usingMobileKey": false,
"usingEnvironmentId": false
},
"clientSide": false,
"salt": "7640d1db441f4a919924518c3dfa65f9",
"trackEvents": false,
"trackEventsFallthrough": false,
"debugEventsUntilDate": null,
"version": 2,
"deleted": false
},
"fflag_feat_back_optic_1579_force_memory_profiler": {
"key": "fflag_feat_back_optic_1579_force_memory_profiler",
"on": false,
@@ -4054,7 +4108,7 @@
},
"fflag_fix_front_optic_1608_improve_video_frame_seek_precision_short": {
"key": "fflag_fix_front_optic_1608_improve_video_frame_seek_precision_short",
"on": false,
"on": true,
"prerequisites": [],
"targets": [],
"contextTargets": [],
@@ -4076,7 +4130,7 @@
"trackEvents": false,
"trackEventsFallthrough": false,
"debugEventsUntilDate": null,
"version": 3,
"version": 4,
"deleted": false
},
"fflag_fix_leap_246_multi_object_hotkeys_160124_short": {
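
For readers unfamiliar with the flag entries in this file, the effect of toggling `"on"` can be illustrated with a small evaluator. This is a sketch under stated assumptions, not the evaluation code Label Studio uses: `evaluate_flag` is a hypothetical function, it assumes the fields carry the meaning their names suggest (`offVariation` indexes `variations` when the flag is off, `fallthrough.variation` when it is on), and it ignores `prerequisites`, `targets`, `contextTargets`, and `rules`, which are empty in the entries shown here.

```python
def evaluate_flag(flag: dict) -> bool:
    """Resolve a flag entry to one of its variations (simplified)."""
    variations = flag["variations"]
    if not flag["on"]:
        # Flag is switched off: serve the designated "off" variation.
        return variations[flag["offVariation"]]
    # Flag is on and no targeting rules apply: serve the fallthrough variation.
    return variations[flag["fallthrough"]["variation"]]


# Shape copied from the entries in this diff; only the relevant fields are kept.
example_flag = {
    "on": False,
    "fallthrough": {"variation": 0},
    "offVariation": 1,
    "variations": [True, False],
}
```

With this reading, the two newly added flags resolve to `false` because they ship with `"on": false`, while changing `"on"` to `true` for the video frame seek flag makes it resolve to `true`.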
6 changes: 6 additions & 0 deletions label_studio/jwt_auth/admin.py
@@ -0,0 +1,6 @@
from django.contrib import admin
from rest_framework_simplejwt.token_blacklist.models import BlacklistedToken, OutstandingToken

# don't allow token management from admin console
admin.site.unregister(BlacklistedToken)
admin.site.unregister(OutstandingToken)
5 changes: 5 additions & 0 deletions label_studio/jwt_auth/apps.py
@@ -0,0 +1,5 @@
from django.apps import AppConfig


class JWTAuthConfig(AppConfig):
name = 'jwt_auth'
35 changes: 35 additions & 0 deletions label_studio/jwt_auth/auth.py
@@ -0,0 +1,35 @@
import logging

from rest_framework.authentication import TokenAuthentication
from rest_framework.exceptions import AuthenticationFailed

logger = logging.getLogger(__name__)


class TokenAuthenticationPhaseout(TokenAuthentication):
"""TokenAuthentication with features to help phase out legacy token auth
Logs usage and triggers a 401 if legacy token auth is not enabled for the organization."""

def authenticate(self, request):
"""Authenticate the request and log if successful."""
from core.feature_flags import flag_set

auth_result = super().authenticate(request)
JWT_ACCESS_TOKEN_ENABLED = flag_set('fflag__feature_develop__prompts__dia_1829_jwt_token_auth')
if JWT_ACCESS_TOKEN_ENABLED and (auth_result is not None):
user, _ = auth_result
org = user.active_organization
org_id = org.id if org else None

# raise 401 if legacy API token auth disabled (i.e. this token is no longer valid)
if org and (not org.jwt.legacy_api_tokens_enabled):
raise AuthenticationFailed(
'Authentication token no longer valid: legacy token authentication has been disabled for this organization'
)

logger.info(
'Legacy token authentication used',
extra={'user_id': user.id, 'organization_id': org_id, 'endpoint': request.path},
)
return auth_result
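
The decision logic in `TokenAuthenticationPhaseout.authenticate` can be exercised without Django by isolating it in a pure function. The sketch below is illustrative only: `gate_legacy_token` and the `User`, `Org`, and `JWTSettings` dataclasses are hypothetical stand-ins for the real models, the local `AuthenticationFailed` replaces the REST framework exception, and logging is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class AuthenticationFailed(Exception):
    """Stand-in for rest_framework.exceptions.AuthenticationFailed."""


@dataclass
class JWTSettings:
    legacy_api_tokens_enabled: bool


@dataclass
class Org:
    id: int
    jwt: JWTSettings


@dataclass
class User:
    id: int
    active_organization: Optional[Org]


def gate_legacy_token(auth_result: Optional[Tuple[User, str]], flag_enabled: bool):
    """Return auth_result unchanged, or raise if legacy tokens are disabled for the org."""
    if flag_enabled and auth_result is not None:
        user, _ = auth_result
        org = user.active_organization
        # Reject only when the user has an organization that turned legacy tokens off.
        if org and not org.jwt.legacy_api_tokens_enabled:
            raise AuthenticationFailed("legacy token authentication has been disabled")
    return auth_result
```

Three behaviours follow from the original code and are preserved here: a `None` result (no token supplied) passes through so later authentication classes can run, a user without an active organization is not rejected, and nothing changes while the feature flag is off.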