Mitigation Strategy: Least Privilege for Jaeger Agent
-
Description:
- Create a dedicated user: Create a new, unprivileged user account on the host system specifically for running the Jaeger Agent. Do not use an existing user account, especially not
root
or an account with administrative privileges. Example (Linux):useradd -r -s /sbin/nologin jaeger-agent
- Configure the Agent to run as this user: Modify the Agent's startup script or configuration (e.g., systemd unit file, Dockerfile) to specify the newly created user. For example, in a systemd unit file, use the
User=
andGroup=
directives. In a Dockerfile, use theUSER
instruction. - Restrict Capabilities (Containerized): If running the Agent in a container (highly recommended), use the container runtime's security features to limit the Agent's capabilities.
- Docker: Use the
--cap-drop
flag to drop unnecessary capabilities (e.g.,docker run --cap-drop=all --cap-add=net_bind_service ...
). Start with dropping all capabilities and add back only those absolutely required. - Kubernetes: Use a
securityContext
in the Pod definition to specifycapabilities: drop: ["ALL"]
and then selectively add back necessary capabilities. Also, consider usingreadOnlyRootFilesystem: true
if possible.
- Docker: Use the
- Security Contexts (seccomp, AppArmor, SELinux): Apply a security context to further restrict the Agent's system calls.
- seccomp: Create a seccomp profile that allows only the necessary system calls for the Agent and apply it to the container or process.
- AppArmor/SELinux: Create and enforce profiles that restrict the Agent's access to files, network resources, and other system resources.
- Regular Updates: Ensure the Agent is updated regularly. This can be automated through package managers or container image updates. Use official Jaeger releases.
- Monitoring: Implement monitoring to track the Agent's resource usage (CPU, memory, network), process activity, and any errors. Alert on anomalies.
- Create a dedicated user: Create a new, unprivileged user account on the host system specifically for running the Jaeger Agent. Do not use an existing user account, especially not
-
Threats Mitigated:
- Agent Compromise (Severity: High): An attacker gaining control of the Agent could manipulate trace data, potentially leading to incorrect diagnoses or masking malicious activity. They might also use the Agent's privileges to access the host system.
- Privilege Escalation (Severity: High): If the Agent runs with excessive privileges, a compromise could allow the attacker to escalate their privileges on the host system.
-
Impact:
- Agent Compromise: Significantly reduces the risk of the Agent being used as a vector for further attacks. Limits the potential damage an attacker can inflict.
- Privilege Escalation: Prevents the Agent from being used to gain higher-level access to the system.
-
Currently Implemented: [Placeholder: e.g., "Implemented in Kubernetes deployments using a dedicated service account and restricted securityContext. Not implemented for Agents running directly on VMs."]
-
Missing Implementation: [Placeholder: e.g., "Missing implementation for Agents running on bare-metal servers. Need to create dedicated user accounts and configure systemd unit files."]
Mitigation Strategy: Mutual TLS (mTLS) between Agent and Collector
-
Description:
- Generate Certificates: Generate TLS certificates for both the Jaeger Agent and the Jaeger Collector. Use a trusted Certificate Authority (CA), either a public CA or a private CA within your organization. Ensure the certificates have appropriate subject alternative names (SANs) to match the hostnames or IP addresses used for communication.
- Configure the Agent: Configure the Jaeger Agent to use its certificate and private key for TLS communication. Also, configure it to verify the Collector's certificate against the CA's certificate. This is typically done through command-line flags or configuration files (e.g.,
--reporter.grpc.tls.cert
,--reporter.grpc.tls.key
,--reporter.grpc.tls.ca
). - Configure the Collector: Configure the Jaeger Collector to use its certificate and private key for TLS communication. Also, configure it to require client certificates (mTLS) and to verify the Agent's certificate against the CA's certificate (e.g.,
--collector.grpc.tls.cert
,--collector.grpc.tls.key
,--collector.grpc.tls.ca
,--collector.grpc.tls.client-ca
). - Certificate Rotation: Implement a process for regularly rotating the certificates before they expire. This can be automated using tools like cert-manager in Kubernetes.
- Network Policies (Optional): As an additional layer of defense, use network policies to restrict traffic between Agent and Collector.
-
Threats Mitigated:
- Man-in-the-Middle (MITM) Attack (Severity: High): An attacker intercepting communication between the Agent and Collector could eavesdrop on trace data or inject malicious data.
- Unauthorized Agent Access (Severity: High): An attacker could send fabricated trace data to the Collector, potentially disrupting monitoring or masking malicious activity.
-
Impact:
- MITM Attack: Eliminates the risk of eavesdropping and data tampering during transit.
- Unauthorized Agent Access: Prevents unauthorized Agents from sending data to the Collector.
-
Currently Implemented: [Placeholder: e.g., "Implemented using self-signed certificates for testing. Planned migration to a private CA for production."]
-
Missing Implementation: [Placeholder: e.g., "Certificate rotation is not yet automated. Need to integrate with cert-manager."]
Mitigation Strategy: Secure Collector Deployment
-
Description:
- Least Privilege: Similar to the Agent, run the Collector with the least privilege necessary. Create a dedicated, unprivileged user account.
- Network Segmentation: Use a firewall and network segmentation to restrict access to the Collector's ports. Only allow connections from authorized Jaeger Agents and the Jaeger Query service. Use Kubernetes NetworkPolicies or cloud provider firewall rules.
- Rate Limiting: Implement rate limiting and throttling to prevent denial-of-service attacks. Use the Collector's built-in rate limiting features (e.g.,
--limit.max-traces
,--limit.max-spans
) or a reverse proxy (like Nginx or Envoy) in front of the Collector. - Regular Updates: Keep the Collector software up-to-date with the latest security patches. Use official Jaeger releases.
- Monitoring: Monitor the Collector's resource usage (CPU, memory, network), connection counts, and error rates. Alert on anomalies.
- Input Validation: Although Jaeger Collector primarily handles structured data, ensure any configuration options or API endpoints are validated to prevent injection attacks.
-
Threats Mitigated:
- Collector Compromise (Severity: High): An attacker gaining control of the Collector could access all trace data, potentially exposing sensitive information.
- Denial-of-Service (DoS) (Severity: Medium): An attacker could overwhelm the Collector with requests, making it unavailable to legitimate Agents.
- Unauthorized Access (Severity: High): An attacker could gain access to the Collector's API or data if network access is not properly restricted.
-
Impact:
- Collector Compromise: Reduces the attack surface and limits the potential damage from a compromise.
- DoS: Protects the Collector from being overwhelmed by malicious traffic.
- Unauthorized Access: Prevents unauthorized access to the Collector's resources.
-
Currently Implemented: [Placeholder: e.g., "Collector runs in a Kubernetes cluster with NetworkPolicies restricting access. Rate limiting is partially implemented."]
-
Missing Implementation: [Placeholder: e.g., "Need to fully configure rate limiting and implement more comprehensive monitoring."]
Mitigation Strategy: Secure Jaeger Query/UI Access
-
Description:
- Authentication: Implement strong authentication for accessing the Jaeger Query service and UI. Integrate with an existing identity provider (OAuth 2.0, OpenID Connect, LDAP) or use built-in authentication mechanisms, if available. Jaeger itself doesn't provide built-in authentication, so this often involves a reverse proxy or other external component. However, the configuration of how Jaeger Query connects to the backend and what data it exposes is Jaeger-specific.
- Authorization (RBAC): Implement role-based access control (RBAC) to restrict access to trace data based on user roles and permissions. This often requires custom implementation or integration with external systems, but the mapping of roles to trace data access is a Jaeger-specific concern.
- Rate Limiting: Implement rate limiting on the Query API to prevent denial-of-service attacks. Use
--query.max-traces
and other relevant flags provided by Jaeger. - Regular Updates: Keep the Jaeger Query software up-to-date. Use official Jaeger releases.
-
Threats Mitigated:
- Unauthorized Data Access (Severity: High): An attacker could gain access to trace data without proper authentication.
- DoS (Severity: Medium): An attacker could overwhelm the Query service with requests.
-
Impact:
- Unauthorized Data Access: Prevents unauthorized users from viewing trace data.
- DoS: Protects the Query service from being overwhelmed.
-
Currently Implemented: [Placeholder: e.g., "Authentication is implemented using OAuth 2.0 via a reverse proxy. "]
-
Missing Implementation: [Placeholder: e.g., "Need to implement RBAC to restrict trace data access based on user roles."]
Mitigation Strategy: Data Sanitization and Redaction (Agent/Client Side)
-
Description:
- Developer Education: Train developers on best practices for avoiding the inclusion of sensitive data (PII, API keys, passwords) in trace spans.
- Code Review: Implement code reviews to identify and prevent the logging of sensitive data in spans.
- Instrumentation Libraries: Use or develop instrumentation libraries that automatically redact or mask sensitive data before it is sent to the Jaeger Agent. This is a client-side mitigation, directly impacting how data is sent to Jaeger. This could involve:
- Regular Expressions: Use regular expressions to identify and replace sensitive patterns.
- Whitelisting: Define a whitelist of allowed data fields.
- Hashing/Encryption: Hash or encrypt sensitive data.
- Baggage Propagation: Encourage the use of baggage propagation for carrying contextual information without exposing sensitive details.
-
Threats Mitigated:
- Data Leakage (Severity: High): Sensitive data inadvertently included in trace spans could be exposed.
-
Impact:
- Data Leakage: Significantly reduces the risk of sensitive data being exposed in trace data.
-
Currently Implemented: [Placeholder: e.g., "Developer guidelines exist, but no automated redaction is in place."]
-
Missing Implementation: [Placeholder: e.g., "Need to develop instrumentation libraries with redaction capabilities and implement code review processes."]
Mitigation Strategy: Configure Sampling Appropriately
-
Description:
- Understand Sampling Strategies: Familiarize yourself with the different sampling strategies available in Jaeger (probabilistic, rate limiting, remote, etc.).
- Analyze Traffic Patterns: Analyze the traffic patterns and volume of your services.
- Configure Sampling: Configure sampling strategies appropriately for each service. Use the Jaeger client libraries' configuration options or environment variables.
- Probabilistic Sampling: Set a probability (e.g., 0.01 for 1% of traces).
- Rate Limiting: Set a maximum number of traces per second.
- Remote Sampling: Use the Jaeger Agent's remote sampling capabilities to dynamically adjust sampling rates.
- Monitor and Adjust: Regularly monitor the volume of trace data and adjust.
-
Threats Mitigated:
- Performance Degradation (Severity: Medium): Excessive trace data collection can impact performance.
- Storage Costs (Severity: Low): Collecting more data than necessary can increase storage costs.
- Data Exposure (Severity: Medium): Over-sampling increases the amount of data potentially exposed.
-
Impact:
- Performance Degradation: Optimizes resource usage.
- Storage Costs: Reduces storage costs.
- Data Exposure: Minimizes the amount of data at risk.
-
Currently Implemented: [Placeholder: e.g., "Probabilistic sampling is configured for most services."]
-
Missing Implementation: [Placeholder: e.g., "Need to implement remote sampling for high-traffic services and establish a review process."]
Mitigation Strategy: Jaeger-Specific Monitoring and Alerting
-
Description:
- Metrics Collection: Collect metrics specifically from Jaeger components (Agent, Collector, Query) using a monitoring system (e.g., Prometheus, Grafana, Datadog). Key Jaeger-specific metrics include those exposed by the components themselves (e.g., queue sizes, processing rates, error counts).
- Alerting: Configure alerts based on thresholds for critical Jaeger-specific metrics and error conditions. Examples:
- High Jaeger Agent or Collector CPU/memory usage.
- High Jaeger Collector queue backlogs.
- Jaeger-specific error rates (e.g., span rejection rates).
- Dashboards: Create dashboards to visualize key Jaeger metrics.
- Regular Review: Regularly review Jaeger-specific metrics and alerts.
-
Threats Mitigated:
- Undetected Compromise (Severity: High): Monitoring Jaeger-specific metrics can help detect unusual activity.
- Performance Degradation (Severity: Medium): Monitoring helps identify Jaeger-specific performance bottlenecks.
- Outages (Severity: High): Monitoring provides early warning of potential Jaeger component outages.
-
Impact:
- Undetected Compromise: Improves chances of detecting Jaeger-related security incidents.
- Performance Degradation: Allows proactive identification of Jaeger performance issues.
- Outages: Reduces the duration and impact of Jaeger outages.
-
Currently Implemented: [Placeholder: e.g., "Basic metrics collection using Prometheus and Grafana is in place for the Collector."]
-
Missing Implementation: [Placeholder: e.g., "Need to implement comprehensive monitoring for the Agent and Query service, and configure more specific alerts."]