refactor: DRY object_storage
#1147
base: main
Conversation
LGTM
Walkthrough

The changes update multiple storage modules by removing the per-feature functions for fetching user dashboards, saved filters, and correlations, replacing them with a new, unified directory-listing function, `list_dirs_relative`.
Sequence Diagram(s)

    sequenceDiagram
        participant C as Client
        participant SP as Storage Provider
        participant DS as Directory Listing Service
        C->>SP: list_dirs_relative(relative_path)
        SP->>DS: Read directory names (from file system/blob storage)
        DS-->>SP: Return list of directory names
        SP-->>C: Return Vec<String> directory list
Actionable comments posted: 1
🔭 Outside diff range comments (1)
src/storage/s3.rs (1)
515-555: 🛠️ Refactor suggestion

Clean up the commented multipart upload code.
The commented code contains references to removed functionality and should be either:
- Updated to use the current API (preferred)
- Removed if no longer needed
🧹 Nitpick comments (7)
src/storage/localfs.rs (1)
361-380: Consider adding a dedicated unit test.

This new `list_dirs_relative` function is straightforward, but adding a test case for an empty or non-existent directory would help ensure correctness and guard against edge cases.

src/storage/azure_blob.rs (1)
668-681: Validate return of common_prefixes.

The implementation of `list_dirs_relative` utilizing `list_with_delimiter` is correct for directory-level listings. Ensure that calling code expects only top-level subdirectories (any deeper path segments are flattened).

src/storage/object_storage.rs (4)
125-148: Slight inconsistency in path building for dashboards.

Here, the path is built via string concatenation and converted back to a `RelativePathBuf`. Consider using a consistent approach (e.g., `RelativePathBuf::from_iter([USERS_ROOT_DIR, user, DASHBOARDS_DIR])`) for clarity and uniformity.
150-175: Maintain consistent path handling.

Similar to dashboards, building correlation paths involves mixing string-based `RelativePathBuf::from(format!(...))` and `RelativePathBuf::from(...)`. Using a single pattern (e.g., `RelativePathBuf::from_iter`) would strengthen clarity and reduce the chance of mistakes.
776-776: Ignoring removal errors may mask issues.

This line discards any I/O error if file removal fails. If preserving reliability is crucial, consider logging or handling this error.
783-783: Same concern regarding file removal.

As above, consider logging the potential error instead of quietly ignoring it.

src/storage/s3.rs (1)
486-512: Address the multipart upload TODO.

The multipart upload functionality is currently disabled with a TODO comment. This could impact performance when uploading large files.
Would you like me to help implement the multipart upload functionality? I can provide a solution that follows AWS best practices for handling large file uploads.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/storage/azure_blob.rs (6 hunks)
- src/storage/localfs.rs (3 hunks)
- src/storage/mod.rs (4 hunks)
- src/storage/object_storage.rs (4 hunks)
- src/storage/s3.rs (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
🔇 Additional comments (15)
src/storage/localfs.rs (3)
20-20: No concerns with added HashSet import.

The `HashSet` type aligns with usage in functions like `list_streams`.

30-30: Futures imports look appropriate.

No issues detected with the addition of concurrency-related imports.

37-39: Import reorganization looks fine.

These changes pose no functional or stylistic concerns.

src/storage/azure_blob.rs (2)
18-57: Import adjustments appear reasonable.

All newly introduced items (e.g., `Arc`, `Instant`, path usage) seem relevant to the changed code.

170-170: No issues with the registry usage.

Registering the object store with the default registry remains consistent with the rest of the code.

src/storage/object_storage.rs (4)
19-28: New imports appear relevant.

They align with new methods that handle dashboards, filters, correlations, and file I/O.

39-40: No concerns with logging imports.

Introducing `info`, `error`, and `warn` from `tracing` is beneficial for detailed logging.

43-56: Additional crate references are valid.

The newly referenced modules (e.g., `correlation`, `alerts`) align with the new methods below.

95-98: Trait signature for `list_dirs_relative`.

Defining this method in the trait ensures a uniform directory-listing approach across all storage backends.

src/storage/mod.rs (4)
19-22: LGTM!

The new imports are well-organized and necessary for the enhanced path handling functionality.

37-37: LGTM!

Making the module public aligns with the PR's DRY objectives by allowing code reuse across crates.

267-269: LGTM!

The utility function provides a clean abstraction for path conversion, promoting consistent path handling across the codebase.
81-82: Verify the timeout values against AWS best practices.

The timeout values seem reasonable, but let's verify them against AWS recommendations:
- Connection timeout: 5 seconds
- Request timeout: 300 seconds (5 minutes)
✅ Verification successful
🌐 Web query:
What are the recommended timeout values for AWS S3 SDK operations?
💡 Result:
AWS S3 SDK timeout recommendations vary by use case and SDK, but general best practices emphasize balancing fail-fast behavior with sufficient time for large operations. Key guidelines include:
General Timeout Strategies

- Large object transfers: Increase total timeout (`timeout`, `operation_timeout`) while keeping per-attempt limits low. Example: the iOS SDK sets `timeout=600s` for large files but `connection_timeout=60s` [1]
- Latency-sensitive apps:
  - Retry GET/PUT after 2 seconds with exponential backoff (+4 sec per retry) [9]
  - Track & retry the slowest 5% of large requests (>128 MB) [9]
- Avoid idle connections:
  - VPC endpoints terminate idle connections after 350 seconds [6]

SDK-Specific Configurations

| SDK | Key Parameters | Recommended Values | Source |
|-----|----------------|--------------------|--------|
| Java | `apiCallTimeout` + `apiCallAttemptTimeout` | Total: <15 min; Attempt: ~1.5-5 sec | [4][9] |
| .NET | `Timeout` + `ReadWriteTimeout` | Default 100/300 sec; max for uploads | [7] |
| Rust | `operation_timeout` + `connect_timeout` | Total: ~5 sec; Connect: ~3 sec | [2] |
| C++ | `RequestTimeout` | Increase from default 3 sec | [5][10] |
| iOS | `timeout` + `connectionTimeout` | Large files: 600 s; fast fail: 60 s | [1] |

Retry Optimization

    // Node.js Lambda example using remaining time
    req.setTimeout(context.getRemainingTimeInMillis() - 500); // Reserve 500ms

- Set integration timeouts shorter than the function timeout (e.g., API Gateway: 29 sec vs Lambda: 3-6 sec) [3]
- Use circuit breakers when exceeding downstream limits [9]

For high-throughput scenarios using KMS encryption, verify KMS request quotas align with your timeout strategy [9]. Monitor via CloudWatch/X-Ray to fine-tune values [3][9].
Citations:
- 1: https://aws.amazon.com/blogs/mobile/understanding-timeout-and-retry-options-in-the-aws-sdk-for-ios/
- 2: https://docs.aws.amazon.com/ja_jp/sdk-for-rust/latest/dg/timeouts.html
- 3: https://lumigo.io/aws-lambda-performance-optimization/aws-lambda-timeout-best-practices/
- 4: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/best-practices.html
- 5: S3: question about RequestTimeout aws/aws-sdk-cpp#346
- 6: https://repost.aws/questions/QUs9Xx9yWmTJOHk9FpwcRRbw/network-issue-to-s3-timeouts-in-s3
- 7: https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/retries-timeouts.html
- 8: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html
- 9: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/timeouts-and-retries-for-latency-sensitive-applications.html
- 10: Default Timeout is Surprising aws/aws-sdk-cpp#57
Timeout Values Review: AWS Best Practices Comparison
- Connection Timeout (5 seconds): While some AWS SDK documentation for Rust suggests a connect timeout closer to ~3 seconds for standard operations, a 5-second timeout can be acceptable if your environment anticipates network variability.
- Request Timeout (300 seconds): A longer request timeout (5 minutes) fits scenarios involving large object transfers or operations that might take longer than typical API calls. This aligns with some SDK configurations (e.g., .NET) and the general advice to use extended timeouts where needed.
Overall, the chosen values are reasonable provided they match your application's use case. If most S3 operations are lightweight, you might consider reducing the connection timeout for fail-fast behavior. Otherwise, if long transfers are common, the 300-second request timeout is justified.
src/storage/s3.rs (2)
19-58: LGTM!

The imports are well-organized, logically grouped, and necessary for the enhanced functionality.

798-811: LGTM!

The new `list_dirs_relative` method provides a clean, reusable implementation for directory listing, aligning well with the PR's DRY objectives.
    async fn get_all_saved_filters(
        &self,
    -) -> Result<HashMap<RelativePathBuf, Vec<Bytes>>, ObjectStorageError>;
    +) -> Result<HashMap<RelativePathBuf, Vec<Bytes>>, ObjectStorageError> {
    +    let mut filters: HashMap<RelativePathBuf, Vec<Bytes>> = HashMap::new();
    +
    +    let users_dir = RelativePathBuf::from_iter([USERS_ROOT_DIR]);
    +    for user in self.list_dirs_relative(&users_dir).await? {
    +        let stream_dir = RelativePathBuf::from_iter([USERS_ROOT_DIR, &user]);
    +        for stream in self.list_dirs_relative(&stream_dir).await? {
    +            let filters_path = RelativePathBuf::from(&stream);
    +            let filter_bytes = self
    +                .get_objects(
    +                    Some(&filters_path),
    +                    Box::new(|file_name| file_name.ends_with(".json")),
    +                )
    +                .await?;
    +            filters
    +                .entry(filters_path)
    +                .or_default()
    +                .extend(filter_bytes);
    +        }
    +    }
    +    Ok(filters)
    +}
Potential path construction bug for saved filters.

Within `get_all_saved_filters`, the path components for each stream folder are combined incorrectly. Currently, the code does not preserve the intermediate directory hierarchy under `USERS_ROOT_DIR/<user>`. As a result, calls to `get_objects` may target a path that doesn't reflect the intended subdirectory structure.
Below is an example fix where each found directory is appended properly:
    for user in self.list_dirs_relative(&users_dir).await? {
        let stream_dir = RelativePathBuf::from_iter([USERS_ROOT_DIR, &user]);
        for stream in self.list_dirs_relative(&stream_dir).await? {
    -       let filters_path = RelativePathBuf::from(&stream);
    +       let filters_path = stream_dir.join(&RelativePathBuf::from(&stream));
            let filter_bytes = self
                .get_objects(
                    Some(&filters_path),
                    Box::new(|file_name| file_name.ends_with(".json")),
                )
                .await?;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    async fn get_all_saved_filters(
        &self,
    ) -> Result<HashMap<RelativePathBuf, Vec<Bytes>>, ObjectStorageError> {
        let mut filters: HashMap<RelativePathBuf, Vec<Bytes>> = HashMap::new();

        let users_dir = RelativePathBuf::from_iter([USERS_ROOT_DIR]);
        for user in self.list_dirs_relative(&users_dir).await? {
            let stream_dir = RelativePathBuf::from_iter([USERS_ROOT_DIR, &user]);
            for stream in self.list_dirs_relative(&stream_dir).await? {
                let filters_path = stream_dir.join(&RelativePathBuf::from(&stream));
                let filter_bytes = self
                    .get_objects(
                        Some(&filters_path),
                        Box::new(|file_name| file_name.ends_with(".json")),
                    )
                    .await?;
                filters
                    .entry(filters_path)
                    .or_default()
                    .extend(filter_bytes);
            }
        }
        Ok(filters)
    }
Fixes #XXXX.
Description
This PR has:
Summary by CodeRabbit
New Features
Refactor