Mitigation Strategies Analysis for facebook/rocksdb

Mitigation Strategy: Enable and Verify Checksums

Description:

  1. Configuration: In the RocksDB options (typically set when opening the database), ensure checksum verification on reads is enabled: ReadOptions::verify_checksums defaults to true, but verify the setting explicitly. The block-level checksum algorithm is selected via BlockBasedTableOptions::checksum; consider a stronger algorithm such as kxxHash64 (the default is kCRC32c) if performance allows.
  2. Table-Level Checksums: In addition to per-block checksums, consider enabling whole-file checksums for SST files (e.g., via the file_checksum_gen_factory option). This provides an additional layer of protection.
  3. Verification: Implement a periodic (e.g., daily or weekly, depending on data sensitivity and change rate) background task or scheduled job that calls DB::VerifyChecksum() on the open database instance. This actively checks the integrity of the entire database. Log the results of this verification.
  4. Error Handling: Implement robust error handling around the DB::VerifyChecksum() call. If a checksum error is detected, trigger alerts and initiate a data recovery process (e.g., restore from backup). A sketch of this configuration and verification flow follows this list.
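
A minimal C++ sketch of the steps above, assuming a block-based table format; the database path and the alerting hook are placeholders for your own infrastructure:

```cpp
#include <cstdio>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Step 1: block-level checksums. kCRC32c is the default;
  // kxxHash64 is a stronger alternative where performance allows.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.checksum = rocksdb::kxxHash64;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/checksum_demo", &db);
  if (!s.ok()) {
    std::fprintf(stderr, "open failed: %s\n", s.ToString().c_str());
    return 1;
  }

  // Steps 3 and 4: full-database verification with error handling.
  // In production this would run from a scheduled job and raise an alert.
  rocksdb::Status verify = db->VerifyChecksum();
  if (!verify.ok()) {
    std::fprintf(stderr, "checksum verification failed: %s\n",
                 verify.ToString().c_str());
    // Placeholder: trigger alerting and restore-from-backup here.
  }

  delete db;
  return 0;
}
```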

Threats Mitigated:

  • Silent Data Corruption (Severity: High): Caused by hardware errors, filesystem bugs, or even rare RocksDB bugs. Checksums detect data modifications that would otherwise go unnoticed.
  • Bit Rot (Severity: Medium): Gradual data degradation over time due to storage media issues. Checksums help detect this early.
  • Malicious Data Modification (Severity: Medium): If an attacker gains unauthorized access to the data files and modifies them, checksums might reveal the tampering. This is not their primary purpose, and an attacker who can rewrite data files can often recompute the checksums as well; encryption and file-level authentication are the better controls here.

Impact:

  • Silent Data Corruption: Risk significantly reduced. Checksums provide a high probability of detecting corruption.
  • Bit Rot: Risk significantly reduced. Early detection allows for timely intervention.
  • Malicious Data Modification: Risk slightly reduced; detection is possible, but not guaranteed.

Currently Implemented:

  • use_checksum = true is set in the main database options in database.cpp.
  • A basic DB::VerifyChecksum() call is present in a unit test (database_test.cpp), but it's not run regularly.

Missing Implementation:

  • No periodic background task or scheduled job to call DB::VerifyChecksum() in the production environment.
  • No robust error handling or alerting if DB::VerifyChecksum() detects an error.
  • No table-level checksum configuration.

Mitigation Strategy: Tune Compaction and Memory Settings

Description:

  1. Analyze Workload: Understand the application's read/write patterns and data size. Use RocksDB's built-in statistics and tools (or external monitoring) to identify potential bottlenecks.
  2. Compaction Settings:
    • level0_file_num_compaction_trigger: Adjust this to control when L0 compactions are triggered. Too low a value causes excessive compactions; too high a value lets L0 files accumulate and degrades read performance.
    • max_bytes_for_level_base: Control the maximum size of L1. A larger L1 can reduce write amplification but increase read latency.
    • target_file_size_base: Control the target size of SST files.
    • write_buffer_size: Size of a single memtable (the in-memory write buffer); larger buffers mean fewer flushes but higher memory use and longer recovery times.
    • Experiment with different compaction styles (level-based, universal, FIFO) using the RocksDB API.
  3. Memory Limits:
    • Block cache capacity: Bound the block cache by supplying a fixed-size cache (e.g., NewLRUCache) via BlockBasedTableOptions::block_cache to prevent excessive memory consumption.
    • write_buffer_size: Control the size of the memtable (write buffer) via RocksDB options.
    • max_open_files: Limit the number of open files (RocksDB option) to prevent resource exhaustion.
  4. Monitoring: Continuously monitor RocksDB's internal statistics (memory usage, compaction statistics, etc.) using the provided APIs and tools, and adjust settings as needed. A combined configuration sketch follows this list.
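
A combined C++ sketch of the compaction and memory settings above. All numeric values are illustrative starting points, not recommendations; they must be tuned against your measured workload, and the path is a placeholder:

```cpp
#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/statistics.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Compaction settings (step 2); illustrative values only.
  options.level0_file_num_compaction_trigger = 4;
  options.max_bytes_for_level_base = 256 << 20;  // 256 MB target for L1
  options.target_file_size_base = 64 << 20;      // 64 MB SST files
  options.compaction_style = rocksdb::kCompactionStyleLevel;

  // Memory and resource limits (step 3).
  options.write_buffer_size = 64 << 20;  // 64 MB memtable
  options.max_open_files = 1024;

  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(512 << 20);  // 512 MB
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  // Built-in statistics for the monitoring step.
  options.statistics = rocksdb::CreateDBStatistics();

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/tuning_demo", &db);
  if (!s.ok()) return 1;

  // Example: dump compaction/memory stats for inspection; feed this
  // (and options.statistics->ToString()) to your monitoring system.
  std::string stats;
  db->GetProperty("rocksdb.stats", &stats);

  delete db;
  return 0;
}
```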

Threats Mitigated:

  • Denial of Service (DoS) due to Write Amplification (Severity: Medium): Proper compaction tuning reduces write amplification, making the database less vulnerable to DoS attacks that flood it with writes. This is a RocksDB-specific DoS vector.
  • Denial of Service (DoS) due to Memory Exhaustion (Severity: High): Memory limits (configured within RocksDB) prevent attackers from consuming all available memory through crafted requests that exploit RocksDB's internal memory management.
  • Performance Degradation (Severity: Low): Proper tuning improves overall database performance and responsiveness.

Impact:

  • DoS due to Write Amplification: Risk moderately reduced.
  • DoS due to Memory Exhaustion: Risk significantly reduced.
  • Performance Degradation: Performance improved.

Currently Implemented:

  • Default RocksDB settings are used for compaction.
  • The block cache is set to a fixed capacity, but it hasn't been tuned based on workload.

Missing Implementation:

  • No workload analysis or performance monitoring using RocksDB's tools.
  • No tuning of compaction settings beyond defaults.
  • No limits on write_buffer_size or max_open_files within the RocksDB configuration.

Mitigation Strategy: Implement RocksDB Rate Limiting

Description:

  1. Create RateLimiter: Within your application code, create a rocksdb::RateLimiter via the NewGenericRateLimiter() factory.
  2. Set Rate Limit: Configure the RateLimiter with the desired rate limit (bytes per second), based on your system's capacity and performance requirements. The factory also accepts a mode parameter to limit reads, writes, or all I/O.
  3. Integrate with RocksDB: Pass the RateLimiter object to the RocksDB Options (the rate_limiter field) when opening the database. This applies the rate limiting directly to RocksDB's internal operations; a sketch follows this list.
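
A minimal sketch of the three steps; the path and the 50 MB/s figure are placeholders that should be sized to your disks' headroom:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/rate_limiter.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Cap background (flush + compaction) write I/O at ~50 MB/s.
  // NewGenericRateLimiter defaults to limiting writes only; a mode
  // parameter can extend this to reads or all I/O.
  options.rate_limiter.reset(
      rocksdb::NewGenericRateLimiter(50 << 20 /* bytes per second */));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/ratelimit_demo", &db);
  if (!s.ok()) return 1;

  delete db;
  return 0;
}
```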

Threats Mitigated:

  • Denial of Service (DoS) due to Resource Exhaustion (Severity: High): Rate limiting within RocksDB prevents write floods from translating into unbounded disk I/O: the generic rate limiter primarily throttles background flush and compaction writes, keeping the resource consumption under RocksDB's control bounded.
  • Performance Degradation (Severity: Low): Controlled resource usage can improve overall performance stability.

Impact:

  • DoS due to Resource Exhaustion: Risk significantly reduced.
  • Performance Degradation: Performance stability improved.

Currently Implemented:

  • No RocksDB RateLimiter is used.

Missing Implementation:

  • Complete absence of RocksDB-level rate limiting.

Mitigation Strategy: Configure Write-Ahead Log (WAL) for Durability

Description:

  1. Understand WAL Options: Familiarize yourself with RocksDB's Write-Ahead Log (WAL) options:
    • WAL_ttl_seconds: How long to keep archived WAL files before deleting them.
    • WAL_size_limit_MB: The maximum total size of archived WAL files.
    • manual_wal_flush: Whether the WAL is flushed only on explicit DB::FlushWAL() calls instead of after every write.
    • WriteOptions::sync: Whether an individual write syncs the WAL to disk before returning.
  2. Configure for Durability: Set these options in the RocksDB Options when opening the database. For high durability, consider:
    • A reasonable WAL_ttl_seconds and WAL_size_limit_MB to retain enough WAL data for recovery.
    • Setting WriteOptions::sync = true on writes that must be immediately durable (this has a performance cost).
  3. Use SyncWAL(): In your application code, call the DB::SyncWAL() method judiciously after critical operations where immediate durability is essential. Don't overuse it, as it impacts performance. A sketch follows this list.
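
A short C++ sketch of the options and both durability mechanisms; the retention values, keys, and path are placeholders chosen for illustration:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Step 2: retain archived WAL files for recovery.
  // Illustrative values: up to 1 hour and 1 GB of archived WAL.
  options.WAL_ttl_seconds = 3600;
  options.WAL_size_limit_MB = 1024;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/wal_demo", &db);
  if (!s.ok()) return 1;

  // Critical write: fsync the WAL as part of the write itself.
  rocksdb::WriteOptions durable;
  durable.sync = true;
  s = db->Put(durable, "critical-key", "critical-value");

  // Step 3: alternatively, perform ordinary writes and sync the WAL
  // once afterwards, amortizing the fsync cost.
  s = db->Put(rocksdb::WriteOptions(), "ordinary-key", "ordinary-value");
  s = db->SyncWAL();

  delete db;
  return 0;
}
```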

Threats Mitigated:

  • Data Loss due to Crashes/Power Failures (Severity: High): Properly configured WAL settings ensure that data is not lost if the application or system crashes before data is flushed to SST files. This is a core RocksDB data durability mechanism.

Impact:

  • Data Loss due to Crashes/Power Failures: Risk significantly reduced, depending on the specific WAL settings chosen (trade-off between performance and durability).

Currently Implemented:

  • Default WAL settings are used.

Missing Implementation:

  • No explicit configuration of WAL settings for the specific durability requirements of the application.
  • No use of DB::SyncWAL() for critical operations.