Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds support for encryption #109

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

feat: adds support for encryption #109

wants to merge 3 commits into from

Conversation

brunocalza
Copy link
Member

@brunocalza brunocalza commented Oct 31, 2024

Summary

This PR adds support for encrypting files with encrypting key provided by the client

Closes #110

@carsonfarmer marking you as a reviewer. But maybe you can review more from a high-level protocol/cryptography/correctness perspective?
@dtbuchholz marking you as a reviewer. But maybe more from a CLI/SDK methods DX perspective?
@avichalp marking you as reviewer. Maybe more from code and Rust perspective? You already gave a lot of input in the cryptographic stuff

cc @andrewxhill @sanderpick @rohhan @oed

Pre-requisites for reviewing

Implementation overview

At a high level this is everything implemented in this PR:

  • Adds the options to encrypt objects by providing a 256-bit base64 encoded key in add_from_path and add_reader methods
  • If encryption is requested, we wrap the reader with encryption logic
    • The encryption logic can be grasped by having an understanding of the pre-requisites and reading Encryption POC using MinIO's KES (although this is not about KMS and S3, the core of the encryption logic is there)
  • Adds the option to decrypt objects in the get method
  • If decryption is requested, we wrap the writer with decryption logic
  • Enhances the CLI to account for encryption and decryption
# encrypting with base64-encoded encryption key
hoku bucket add -a t2h4egii4s7p76w5fuwg6srrefnjanhkxuxiew6ha --enc-c AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= --key hello.txt $(PWD)/hello.txt
# without decrypting
hoku bucket get -a t2h4egii4s7p76w5fuwg6srrefnjanhkxuxiew6ha  hello.txt 

# decrypting
hoku bucket get -a t2h4egii4s7p76w5fuwg6srrefnjanhkxuxiew6ha --enc-c AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= hello.txt

# decrypting with range
hoku bucket get -a t2h4egii4s7p76w5fuwg6srrefnjanhkxuxiew6ha --enc-c AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= --range 2-4 hello.txt
  • Adds support for range queries over encrypted data
  • Adds integration tests that require a network running

Review orientation

  • Start at sdk/src/machine/bucket.rs, which is the crux of the SDK methods and we how they were altered to support encryption
  • Start at encryption, then go to decryption
  • Play with the CLI
  • Try writing a code as an example of how to add an encrypted object and how to retrieve it

Additional notes

only for customer supplied key

Signed-off-by: Bruno Calza <[email protected]>
@brunocalza brunocalza self-assigned this Oct 31, 2024
@@ -116,6 +116,9 @@ struct BucketPutArgs {
broadcast_mode: BroadcastMode,
#[command(flatten)]
tx_args: TxArgs,
/// Encrypt/Decrypt objects using 256-bit encryption keys. Formats: RawBase64.
#[arg(long)]
enc_c: Option<String>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new option to put comand is added (--enc-c). It's through this option that the client provides the encryption key. It must be a 256-bit base64-encoded key

@@ -171,6 +174,9 @@ struct BucketGetArgs {
/// or a specific block height, e.g., "123".
#[arg(long, value_parser = parse_query_height, default_value = "committed")]
height: FvmQueryHeight,
/// Encrypt/Decrypt objects using 256-bit encryption keys. Formats: RawBase64.
#[arg(long)]
enc_c: Option<String>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new option to get comand is added (--enc-c). It's through this option that the client provides the encryption key to decrypt the object. It must be the same used for encryption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--enc-c sounds a bit confusing name for an argument to me. is that a convention for encryption CLIs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just using the same name used by minio's client. I don't think there's a convention, aws cli uses --sse-c and google cloud cli uses --encryption-key

}

#[pin_project]
pub struct DecryptWriter<W: AsyncWrite + Unpin> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This object is probably very hard to understand. And it took me a while to get it right (mostly because of my lack of understanding of Pin).

In essence, it decorates a writer that implements AsyncWrite. The idea is that as the data is written to this writer we intercept it, decrypt it, and fill the inner writer.

This is the function that does the wrapping logic.

You can pass a Filter to this object in case of a Range query. So, if you want to apply a range query on encrypted data you use with_filter, if not you use new

The implementation follows a state machine with the following states:

  • ReadingHeader: the state where we need to grab the next 16B of the payload and decode the header
  • Decrypting: the state where grab the rest of the package 65 KiB (ciphertext) + 16B (tag), decrypt it and fill the decrypted buffer
  • Writing: the state where we asynchronously write to the inner writer`

The Filter struct means:

  • offset: any byte less than offset should be ignored and not written to the inner
  • length: How many bytes should we include from the offset? Anything more than offset + length should be ignored
  • consumed: This is just to indicate how many bytes we have already consumed (or peeked, or considered) from the beginning

}
}

fn filter_bytes<'a>(&mut self, plaintext: &'a [u8]) -> &'a [u8] {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a complicated method, but in essence, we are trying to figure out how much of the plaintext chunk (decrypted content chunk) falls within offset + length.

consumed is used as a pointer (not in a programming language sense), and we do math based if we have reached offset or if we have passed offset+ length. We use remaining to indicate how many bytes we still need to write. After remaining reaches 0, we ignore everything

I would review this last, also take a look at the test

.map_err(|err| Error::new(ErrorKind::InvalidData, err.to_string()))?;

#[allow(clippy::drop_non_drop)]
drop(this);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ugly, but I cannot call the method filter_bytes on this or self for some reason. The borrow checker complains. There's something about it I still don't understand, so I came up with this hack.

self.domain.clone()
}

pub fn unseal(&self, kek: String, object_path: &str) -> anyhow::Result<ObjectKey> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how we decrypt ObjectKey

fn size_decrypted(&self) -> u64;
}

impl EncryptedObjectExt for Object {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're extending Hoku Object to add some functionality:

  • Is this object encrypted?
  • Which kind of encryption was used (client or kms) note: kms is not really implemented
  • Give me the sealed object key
  • Give me the size of the decrypted object

This makes the client code cleaner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

let iv = sealed_object_key.iv_as_string()?;
let algorithm = sealed_object_key.algorithm();

let mut metadata = HashMap::new();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to store these metadata for decryption

use std::collections::HashMap;
use tokio::io::AsyncRead;

pub fn encrypt_reader<R: AsyncRead>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the encryption logic

  1. derives object key
  2. wraps reader
  3. encrypts object key
  4. stores metadata

end: Option<u64>,
}

impl Range {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This object is responsible for range logic

For an unencrypted object, there's no logic. We pass to Hoku whatever the user sent. For encrypted, we need to transform to a new range because encrpyted content is bigger, and also i need to retrieve the entire package to decrypt even if part of it is not withing the range. Check this test to understand this

@brunocalza brunocalza marked this pull request as ready for review October 31, 2024 21:31
}
// Read the first 10 bytes of your downloaded 100 bytes
let mut read_file = tokio::fs::File::open(&obj_path).await?;
let mut contents = vec![0; 10];
read_file.read(&mut contents).await?;
let _ = read_file.read(&mut contents).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may not need let _, if you just unwrap here. probably okay to unwrap in the example code

Copy link
Contributor

@carsonfarmer carsonfarmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow @brunocalza. This is a serious PR. I've done two passes now. I'm not 100% confident I understand all that is going on, but it looks correct from a high-level. I'm marking approved, so as not to slow things down. But I will likely want to do another pass over this to double-check nonce use etc.

}

// Encrypt the chunk and store it in the buffer
// TODO: handle error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to do this before initial merge? Or can we wait? It seems reasonable to panic here... but you might just have to wrap this error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better handling it now

fn size_decrypted(&self) -> u64;
}

impl EncryptedObjectExt for Object {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

@carsonfarmer
Copy link
Contributor

This build fails because it has no access to the dare repo. Can someone set those deploys keys for me?

Who do we need to do this for @brunocalza?

@avichalp
Copy link
Member

@brunocalza: added the deploy key and secret in the repos. the build is still failing due to some lint issue but the keys should work now: #124

@brunocalza
Copy link
Member Author

holding this off for a bit until we have more clarity when we want to land this feature

@brunocalza brunocalza marked this pull request as draft November 27, 2024 22:27
@sanderpick sanderpick added the archived Archived for now label Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
archived Archived for now
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable encryption with client encryption key
4 participants