Mitigation Strategy: Message Signing (Cryptographic Signatures)
- **Choose a Secure Serializer:** Select `auth` as the serializer in your Celery configuration so that every message carries a cryptographic signature. Note that Celery's built-in `auth` serializer (see the Celery security documentation) signs messages with a private key and verifies them against an X.509 certificate rather than using a shared HMAC secret, so plan your key material accordingly.
- **Generate a Secret Key:** Create strong, random key material. For a shared-secret (HMAC-style) scheme, Python's `secrets` module is a good source: `python -c "import secrets; print(secrets.token_urlsafe(64))"`; the longer the key, the better. For Celery's built-in certificate-based signing, generate a private key and certificate instead (e.g., with `openssl`).
- **Securely Store the Key:** Never hardcode the key. Store it in an environment variable (e.g., `CELERY_SECRET_KEY`). For production, use a secrets management system (e.g., HashiCorp Vault, AWS Secrets Manager).
- **Configure Celery** (a sketch follows this list):
    - Set `task_serializer = 'auth'`
    - Set `result_serializer = 'auth'`
    - Set `accept_content = ['auth', 'json']` (or just `['auth']`)
    - Set the `CELERY_SECRET_KEY` environment variable in your worker and application environments.
- **Key Rotation:** Establish a process for regularly rotating the secret key: generate a new key, update the configuration in all environments, and restart the Celery workers. Keep the old key valid for a grace period so that in-flight tasks can complete.
- **Verify Configuration:** Ensure all workers and the application use the same key material and serializer settings.
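
For reference, a minimal configuration sketch following the certificate-based setup in the Celery security documentation is shown below. The file paths, the `CELERY_SECURITY_*` environment variable names, and the app name `proj` are illustrative assumptions, not settings taken from this project.

```python
# celeryconfig.py -- minimal message-signing sketch (paths are assumptions)
import os

task_serializer = 'auth'
result_serializer = 'auth'
accept_content = ['auth', 'json']  # or ['auth'] to accept only signed messages

# Celery's built-in signing uses a private key, a certificate, and a store
# of trusted certificates (see the Celery security docs):
security_key = os.environ.get('CELERY_SECURITY_KEY', '/etc/ssl/private/worker.key')
security_certificate = os.environ.get('CELERY_SECURITY_CERT', '/etc/ssl/certs/worker.pem')
security_cert_store = os.environ.get('CELERY_CERT_STORE', '/etc/ssl/certs/*.pem')
```

```python
# app.py -- the signing serializer must be registered at startup
from celery import Celery

app = Celery('proj')
app.config_from_object('celeryconfig')
app.setup_security()  # registers 'auth' and disables untrusted serializers
```
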
- **Threats Mitigated:**
    - Unauthorized Task Execution: (Severity: Critical) - Prevents attackers from injecting arbitrary tasks into the queue.
    - Message Tampering: (Severity: Critical) - Prevents attackers from modifying the content of legitimate task messages.
    - Replay Attacks: (Severity: High) - The `auth` serializer doesn't inherently prevent replay attacks, but it makes them significantly harder. Combining it with unique task IDs and short task expiration times mitigates this further, as the example after this list shows.
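
As a small illustration of the expiry point above, the producer can attach a short `expires` to each task so that a captured message is useless once the window passes. The task module and the 120-second value are illustrative assumptions:

```python
# Hypothetical producer-side call; `send_email` stands in for a real task.
from proj.tasks import send_email

result = send_email.apply_async(
    args=('user@example.com',),
    expires=120,  # seconds -- workers revoke the task if it starts after this
)
```
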
- **Impact:**
    - Unauthorized Task Execution: Risk reduced from Critical to Very Low.
    - Message Tampering: Risk reduced from Critical to Very Low.
    - Replay Attacks: Risk reduced from High to Low.
- **Currently Implemented:**
    - Serializer set to `auth` in `celeryconfig.py`.
    - Secret key stored in environment variable `CELERY_SECRET_KEY` on worker servers.
    - `accept_content` configured correctly.
- **Missing Implementation:**
    - Key rotation process is not yet defined or automated.
    - The application server (where tasks are sent from) does not yet have the `CELERY_SECRET_KEY` environment variable set, meaning it cannot send signed tasks. This needs to be added to the application server's deployment configuration.

Mitigation Strategy: Rate Limiting (Task Level)
- **Identify Rate-Limited Tasks:** Determine which tasks are susceptible to abuse or could cause performance issues if executed excessively.
- **Apply `@task(rate_limit='...')`:** Use Celery's built-in `rate_limit` task option. For example, `@task(rate_limit='10/m')` limits the task to 10 executions per minute. You can use `s` (seconds), `m` (minutes), or `h` (hours). Note that Celery enforces rate limits per worker instance, not globally across the cluster. A sketch follows this list.
- **Consider Burst Limits:** If you need to allow short bursts of activity, you may need a custom rate-limiting solution (e.g., using Redis) that implements a token-bucket or leaky-bucket algorithm. This is less directly Celery-specific.
- **Monitor and Adjust:** Monitor the effectiveness of rate limits and adjust them as needed based on observed traffic patterns.
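
A minimal sketch of the option in use; the `app` instance and the task body are illustrative assumptions:

```python
# Minimal sketch; `app` and the task body are illustrative assumptions.
from celery import Celery

app = Celery('proj')

@app.task(rate_limit='10/m')  # at most 10 executions per minute, per worker
def send_email(recipient):
    print(f'sending to {recipient}')  # stand-in for the real delivery logic
```

Keep in mind that the limit applies to each worker instance separately; with four workers, the effective cluster-wide ceiling in this sketch is 40 executions per minute.
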
- **Threats Mitigated:**
    - Denial of Service (DoS) - Task Flooding: (Severity: High) - Limits the rate at which specific tasks can be executed, preventing attackers from overwhelming workers.
    - Resource Exhaustion: (Severity: Medium) - Prevents excessive consumption of resources (CPU, memory, database connections) by frequently executed tasks.
- **Impact:**
    - Denial of Service (DoS) - Task Flooding: Risk reduced from High to Medium.
    - Resource Exhaustion: Risk reduced from Medium to Low.
- **Currently Implemented:**
    - Rate limiting is applied to the `send_email` task (`@task(rate_limit='5/m')`).
- **Missing Implementation:**
    - No rate limiting is applied to other potentially resource-intensive tasks, such as `process_large_file`. These tasks need to be assessed and have appropriate rate limits applied.

Mitigation Strategy: Task Time Limits
- **Set `task_time_limit`:** This is a hard time limit (in seconds). If a task exceeds it, the task is forcibly terminated. Set this in your Celery configuration.
- **Set `task_soft_time_limit`:** This is a soft time limit. If a task exceeds it, a `SoftTimeLimitExceeded` exception is raised inside the task. Set this in your Celery configuration.
- **Handle `SoftTimeLimitExceeded`:** Within your tasks, include `try...except` blocks to catch the `SoftTimeLimitExceeded` exception and handle it gracefully (e.g., releasing resources, logging the event). This lives in the task code, so it is less directly a Celery configuration item. See the sketch after this list.
- **Choose Appropriate Values:** Set time limits based on the expected execution time of your tasks. Start with generous limits and gradually reduce them as you gain confidence in each task's performance.
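
A sketch combining both settings with a task that cleans up on the soft limit. The `app` instance and the file-processing body are illustrative stand-ins:

```python
import time

from celery import Celery
from celery.exceptions import SoftTimeLimitExceeded

app = Celery('proj')
app.conf.task_time_limit = 300       # hard limit: task is killed after 5 minutes
app.conf.task_soft_time_limit = 240  # soft limit: exception raised at 4 minutes

@app.task
def process_large_file(path):
    processed = 0
    try:
        with open(path) as fh:
            for line in fh:
                time.sleep(0.01)  # stand-in for real per-line work
                processed += 1
    except SoftTimeLimitExceeded:
        # Soft limit reached: record progress and exit cleanly before
        # the hard limit forcibly terminates the task.
        print(f'process_large_file timed out after {processed} lines of {path}')
        raise
    return processed
```
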
- **Threats Mitigated:**
    - Denial of Service (DoS) - Long-Running Tasks: (Severity: High) - Prevents tasks from running indefinitely and consuming worker resources.
    - Resource Exhaustion: (Severity: Medium) - Limits the amount of time a task can consume resources.
    - Deadlocks/Hangs: (Severity: Medium) - Helps prevent tasks from getting stuck in infinite loops or deadlocks.
- **Impact:**
    - Denial of Service (DoS) - Long-Running Tasks: Risk reduced from High to Low.
    - Resource Exhaustion: Risk reduced from Medium to Low.
    - Deadlocks/Hangs: Risk reduced from Medium to Low.
- **Currently Implemented:**
    - `task_time_limit = 300` (5 minutes) is set globally in `celeryconfig.py`.
    - `task_soft_time_limit = 240` (4 minutes) is set globally.
- **Missing Implementation:**
    - Not all tasks have `try...except` blocks to handle `SoftTimeLimitExceeded`. These need to be added to tasks that would benefit from a graceful shutdown on timeout; in particular, the `process_large_file` task should handle this exception. (This lives in task code rather than in a Celery setting, but it matters just as much.)

Mitigation Strategy: Avoid Pickle Serializer
- **Use `json` or `auth`:** Configure Celery to use the `json` serializer for general data or the `auth` serializer for signed messages. This is done via the `task_serializer`, `result_serializer`, and `accept_content` settings.
- **Remove `pickle` from `accept_content`:** Ensure that `pickle` is not included in the `accept_content` setting. This prevents Celery from accepting pickled data (a sanity check is sketched after this list).
- **Review Existing Code:** If `pickle` was previously used, carefully review any code that might have deserialized pickled data and ensure it is no longer vulnerable.
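
As a lightweight guard, the serializer settings can be paired with a startup assertion so that a misconfigured deployment fails fast. The `proj` app name and the assertion itself are illustrative assumptions, not part of this project's configuration:

```python
from celery import Celery

app = Celery('proj')
app.conf.update(
    task_serializer='json',
    result_serializer='json',
    accept_content=['auth', 'json'],  # note: no 'pickle'
)

# Fail fast at import time if pickle ever sneaks back into the config.
assert 'pickle' not in (app.conf.accept_content or []), \
    'pickle must not be an accepted content type'
```
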
- **Threats Mitigated:**
    - Arbitrary Code Execution (via Deserialization): (Severity: Critical) - Prevents attackers from executing arbitrary code by sending malicious pickled data.
- **Impact:**
    - Arbitrary Code Execution (via Deserialization): Risk reduced from Critical to None (if `pickle` is completely avoided).
- **Currently Implemented:**
    - `json` is the default serializer, and `auth` is used for signed messages.
    - `pickle` is not in `accept_content`.
- **Missing Implementation:**
    - None. This mitigation is fully implemented.

Mitigation Strategy: Prefetch Limit (`worker_prefetch_multiplier`)
- **Understand Prefetching:** Celery workers prefetch tasks from the queue to improve efficiency. The `worker_prefetch_multiplier` setting controls how many tasks are prefetched per worker process.
- **Lower the Multiplier:** Set `worker_prefetch_multiplier = 1` (or another low value) in your Celery configuration, as in the one-line sketch after this list. This reduces the number of tasks a worker will prefetch, making it less susceptible to being overwhelmed by a sudden burst of tasks.
- **Consider Concurrency:** The total number of prefetched tasks is `worker_prefetch_multiplier` * `concurrency` (the number of worker processes).
- **Trade-off:** Lowering the prefetch multiplier can slightly reduce throughput in some scenarios, but it significantly improves resilience to DoS attacks.
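
The setting itself is a one-liner in `celeryconfig.py`; the arithmetic in the comment assumes a concurrency of 4:

```python
# With concurrency 4 and multiplier 1, each worker node reserves at most
# 4 unacknowledged tasks (worker_prefetch_multiplier * concurrency).
worker_prefetch_multiplier = 1
```

The same value can also be set per worker with the `--prefetch-multiplier` command-line option.
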
- **Threats Mitigated:**
    - Denial of Service (DoS) - Worker Overload: (Severity: High) - Reduces the impact of a large number of tasks arriving simultaneously.
    - Resource Exhaustion (Worker Level): (Severity: Medium) - Prevents a single worker from consuming excessive resources by prefetching too many tasks.
- **Impact:**
    - Denial of Service (DoS) - Worker Overload: Risk reduced from High to Medium.
    - Resource Exhaustion (Worker Level): Risk reduced from Medium to Low.
- **Currently Implemented:**
    - Not explicitly set; Celery is using the default value (usually 4).
- **Missing Implementation:**
    - `worker_prefetch_multiplier` should be set to `1` in `celeryconfig.py` to improve DoS resilience.

Mitigation Strategy: Result Expiration (`result_expires`)
- **Set `result_expires`:** In your Celery configuration, set `result_expires` to a value (in seconds) that represents how long task results should be stored in the result backend (see the sketch after this list). After this time, the results are automatically removed.
- **Choose an Appropriate Value:** Select a value that is long enough for your application to retrieve the results but short enough to minimize the window of exposure. A relatively short expiration time (e.g., a few hours or a day) is a reasonable starting point.
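
A sketch of the setting in `celeryconfig.py`; the four-hour value is an illustrative assumption, not a recommendation for this project:

```python
from datetime import timedelta

# Results vanish from the backend after four hours; an int number of
# seconds (e.g., 14400) works as well.
result_expires = timedelta(hours=4)
```

Note that for result backends without native TTL support (e.g., database backends), expiry is applied by the built-in `celery.backend_cleanup` periodic task, so `celery beat` must be running for old results to actually be deleted.
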
- **Threats Mitigated:**
    - Result Data Exposure: (Severity: Medium) - Reduces the time window during which task results are available in the result backend.
    - Result Backend Storage Exhaustion: (Severity: Low) - Prevents the result backend from filling up with old, unused results.
- **Impact:**
    - Result Data Exposure: Risk reduced from Medium to Low.
    - Result Backend Storage Exhaustion: Risk reduced from Low to Very Low.
- **Currently Implemented:**
    - `result_expires` is set to 86400 (24 hours).
- **Missing Implementation:**
    - None. The current setting is reasonable, but it could be reviewed and potentially shortened if the application doesn't need to retain results for a full day.