Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/support non mandatory key file #1129

Merged

Conversation

JBOClara
Copy link
Contributor

@JBOClara JBOClara commented Dec 6, 2023

What this PR does:

While storageSecretRef is optional, there is no way to configure medusa without passing a secret. But when we want to use role with IRSA , we don't want to passe a ~/.aws/credentials file to medusa.

Which issue(s) this PR fixes:
Fixes #1152
Fixes thelastpickle/cassandra-medusa#581

Linked to thelastpickle/cassandra-medusa#691

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (a6271b4) 57.17% compared to head (289587a) 57.15%.
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1129      +/-   ##
==========================================
- Coverage   57.17%   57.15%   -0.02%     
==========================================
  Files         103      103              
  Lines       10648    10655       +7     
==========================================
+ Hits         6088     6090       +2     
- Misses       4027     4032       +5     
  Partials      533      533              
Files Coverage Δ
controllers/k8ssandra/medusa_reconciler.go 62.19% <0.00%> (-1.96%) ⬇️
pkg/medusa/reconcile.go 62.22% <34.78%> (+0.15%) ⬆️

... and 3 files with indirect coverage changes

Copy link

@laurentbardin laurentbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I know very little about Go and Cassandra so... 🤷

Copy link
Contributor

@adejanovski adejanovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JBOClara, thanks for pushing this PR!
I did a quick review and posted a few comments to start the conversation.

@JBOClara
Copy link
Contributor Author

Hello @adejanovski
Could you give me a hint about this error https://github.com/k8ssandra/k8ssandra-operator/actions/runs/7120365184/job/19387445258?pr=1129#step:6:402 ?

I've no clue why is this failing.

@adejanovski
Copy link
Contributor

I've restarted it, since we have some flakes on the K8ssandraTask tests. Let's see how it goes.

@adejanovski
Copy link
Contributor

it passed ;)

@JBOClara JBOClara marked this pull request as ready for review December 27, 2023 16:27
@JBOClara JBOClara requested a review from a team as a code owner December 27, 2023 16:27
@JBOClara
Copy link
Contributor Author

Hello @adejanovski

Could you please review my PR and let me know if there's anything to add ? Are there any additional tests or documentation updates required to support this change?

Thank you for your guidance.

@JBOClara JBOClara force-pushed the fix/support-non-mandatory-key-file branch from 25f0347 to 955baae Compare January 3, 2024 09:53
@yesterdays-vigilante
Copy link

This is blocking us at the moment as well - what's required to get this into the operator? Happy to help if I can.

@adejanovski
Copy link
Contributor

This is blocking us at the moment as well - what's required to get this into the operator? Happy to help if I can.

Hi @yesterdays-vigilante,

if you could test this PR and verify that the feature works as expected, it would be much appreciated.
I think it's going to need a rebase on top of main first to get the upgrade to the latest patch release of Medusa which supports role based auth.
@JBOClara, do you think you could rebase and force push?

@yesterdays-vigilante
Copy link

Hi @adejanovski! I've deployed this PR into our staging environment, and it seems to work alright - Medusa is able to put the backups in S3 using role-based authentication, and Cassandra still seems to be running fine.

If needed I can also rebase this branch on my own fork and make a new PR.

@adejanovski
Copy link
Contributor

thanks a lot for the quick turnaround on testing this @yesterdays-vigilante 🙏

Let's give @JBOClara a tiny bit of time to get back to this. That PR has been open for a while and I'm the one who dropped the ball on following up with it.

@JBOClara JBOClara requested a review from adejanovski January 26, 2024 16:24
@adejanovski
Copy link
Contributor

Hi @JBOClara, it looks like you still have a conflict on go.sum.

@JBOClara JBOClara force-pushed the fix/support-non-mandatory-key-file branch from 955baae to 633372c Compare January 29, 2024 19:55
@adejanovski
Copy link
Contributor

@JBOClara, for the unit tests to pass you'll need to regenerate the manifests I think (using make manifests) and commit the new CRDs before pushing again.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions

7.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

Copy link
Contributor

@adejanovski adejanovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is green and everything looks good. Thanks @JBOClara !

@adejanovski adejanovski merged commit 1f846da into k8ssandra:main Feb 1, 2024
59 of 62 checks passed
@bradfordcp
Copy link
Member

Thank you so much for working through this issue! I had worked around it in the past by providing an empty file via the referenced secret, but that always felt like a hack. Thanks again!

@c3-clement
Copy link
Contributor

c3-clement commented Apr 11, 2024

Hello @JBOClara , thank you so much for this change!
We were doing the same hack than @bradfordcp .

Is it also planned to support IRSA for Google storage and Azure storage?

We have to implement similar hacks like we did for S3 in order to support Google Cloud and Azure backups.

FYI @adejanovski

@JBOClara
Copy link
Contributor Author

JBOClara commented Apr 15, 2024

Hi @c3-clement,

I'm not able to test it on GCP or Azure.

But, have you already tried? Could you share the errors encountered?

Unfortunately, I don't plan to dedicate time to this topic.

@JamesXu2017
Copy link

Thank you so much for working through this issue! I had worked around it in the past by providing an empty file via the referenced secret, but that always felt like a hack. Thanks again!

I tried the fake the secret, and using sa I created with S3 policay and role setup, I still not able to get around on this. I am getting erros inside medusa-stanadalone
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity
Any suggestion? @adejanovski

@JBOClara
Copy link
Contributor Author

JBOClara commented Apr 18, 2024

There is no need for a fake secret. Just use role-based as credentialsType.

Then, check this is correctly configured :

  • Medusa runs with a specific k8s service account.
  • This k8s sa is annotated with the IAM Role ARN annotation for IRSA.
  • this IAM role has the permissions to write to the backup bucket
  • this role has a trusted policy, that allows to assume role with web identity, with a condition based on the namespace and k8s SA name (see AWS Doc).
  • your k8s cluster has a correctly configured IAM oidc provider https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html
  • this oidc provider is used in the trusted policy.

@adejanovski
Copy link
Contributor

That's a great explanation that we'd need to put in the docs @JBOClara !

@c3-clement
Copy link
Contributor

Hi @c3-clement,

I'm not able to test it on GCP or Azure.

But, have you already tried? Could you share the errors encountered?

Unfortunately, I don't plan to dedicate time to this topic.

Hello @JBOClara,

Yes I already tried. I had to patch k8ssandra-operator and medusa - I don't have anymore error logs, but here is what went wrong:

Azure:

medusa was crashing at this line https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/storage/azure_storage.py#L48 because it's expects a key to be provided, which is not needed with Workload identity. Only the storage account name is needed.

Also, Azure Workload Identity auth requires to set the label azure.workload.identity/use: "true" to pods, and k8ssandra-operator does not propagate labels set in data centers to the medusa standalone deployment.

Google:

In the backup secret mounted to medusa pods, we have to set a field credentials with an arbitrary value, otherwise medusa will crash.

Also, in medusa codebase, the GCS list implementation is accessing unsafely the field 'md5Hash' in the object returned by Google API
https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/storage/google_storage.py#L81

However, if the GCS bucket is KMS encrypted, the Google API does not expose the 'md5Hash' field (I have talked with Google cloud support about this)
So if the bucket is KMS encrypted, Medusa will crash. We fixed it by using the etag field instead.

I will open proper GitHub issues for all of this when I got a sec.
I will also like to offer fixes for those issues I got time @JBOClara @adejanovski

@c3-clement
Copy link
Contributor

Thank you so much for working through this issue! I had worked around it in the past by providing an empty file via the referenced secret, but that always felt like a hack. Thanks again!

I tried the fake the secret, and using sa I created with S3 policay and role setup, I still not able to get around on this. I am getting erros inside medusa-stanadalone botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity Any suggestion? @adejanovski

I believe that the serviceAccount configured in datacenters is not propagated to the Medusa standalone deployment.
Could it be the cause of your issue @JamesXu2017 ?
FYI @JBOClara

@JBOClara
Copy link
Contributor Author

Here is a PR #1293 @adejanovski

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants