CFE-1131: AWS Tags DAY2 Update #297

Open: anirudhAgniRedhat wants to merge 9 commits into base: main

@anirudhAgniRedhat anirudhAgniRedhat commented Oct 11, 2024

This PR introduces a custom EBSVolumeTagController that monitors the OpenShift Infrastructure resource for changes in AWS ResourceTags. When tags are updated, the controller automatically fetches all AWS EBS-backed PersistentVolumes (PVs) in the cluster, retrieves their volume IDs, and updates the associated EBS tags in AWS.

Key Changes:

  • Monitors the Infrastructure resource for AWS ResourceTags updates.
  • Directly fetches all PVs that use the AWS EBS CSI driver (ebs.csi.aws.com).
  • Updates AWS EBS tags by merging new and existing tags using the AWS SDK.
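
To make the key changes above concrete, here is a minimal sketch of the PV filtering and tag merging steps. It is not the PR's code: the `pv` struct is a hypothetical, trimmed-down stand-in for a PersistentVolume, and the rule that desired tags win over existing ones on conflict is an assumption.

```go
package main

import "fmt"

// pv is a hypothetical stand-in for a PersistentVolume; only the fields
// needed for this illustration are modelled.
type pv struct {
	Name         string
	Driver       string // CSI driver name, empty for non-CSI volumes
	VolumeHandle string // for the EBS CSI driver this is the EBS volume ID
}

const ebsCSIDriver = "ebs.csi.aws.com"

// ebsVolumeIDs returns the volume handles of all PVs backed by the EBS CSI driver.
func ebsVolumeIDs(pvs []pv) []string {
	var ids []string
	for _, p := range pvs {
		if p.Driver == ebsCSIDriver && p.VolumeHandle != "" {
			ids = append(ids, p.VolumeHandle)
		}
	}
	return ids
}

// mergeTags overlays desired tags on the existing ones.
// Assumption: a desired value replaces an existing value with the same key.
func mergeTags(existing, desired map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(desired))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range desired {
		merged[k] = v
	}
	return merged
}

func main() {
	pvs := []pv{
		{Name: "pv-a", Driver: ebsCSIDriver, VolumeHandle: "vol-0aaa"},
		{Name: "pv-b", Driver: "efs.csi.aws.com", VolumeHandle: "fs-0bbb"},
	}
	fmt.Println(ebsVolumeIDs(pvs))
	fmt.Println(mergeTags(
		map[string]string{"env": "dev", "team": "storage"},
		map[string]string{"env": "prod"},
	))
}
```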

openshift-ci-robot commented Oct 11, 2024

@anirudhAgniRedhat: This pull request references CFE-1131 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR introduces a custom EBSVolumeTagController that monitors the OpenShift Infrastructure resource for changes in AWS ResourceTags. When tags are updated, the controller automatically fetches all AWS EBS-backed PersistentVolumes (PVs) in the cluster, retrieves their volume IDs, and updates the associated EBS tags in AWS.

Key Changes:

Monitors Infrastructure resource for AWS ResourceTags updates.
Directly fetches all PVs using the AWS EBS CSI driver (ebs.csi.aws.com).
Updates AWS EBS tags by merging new and existing tags using the AWS SDK.
Graceful operator restart on tag changes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 11, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2024
@anirudhAgniRedhat
Author

/hold
Please don't review yet; this is still WIP.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2024
@openshift-ci openshift-ci bot requested review from dobsonj and RomanBednar October 11, 2024 11:28
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from 4d92e01 to 14ecc0e Compare October 14, 2024 12:42
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from d68d71a to 44e34e7 Compare October 14, 2024 16:00
@anirudhAgniRedhat anirudhAgniRedhat changed the title [WIP] CFE-1131: AWS Tags DAY2 Update CFE-1131: AWS Tags DAY2 Update Oct 14, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 14, 2024
@anirudhAgniRedhat
Author

/unhold
Open for reviews.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 15, 2024

go ebsTagsController.Run(ctx)

klog.Info("EBS Volume Tag Controller is running")

nit: TBD - cleanup info logs before merge.

@anirudhAgniRedhat
Author

/cc @jsafrane
PTAL!!

@openshift-ci openshift-ci bot requested a review from jsafrane October 21, 2024 04:16
Contributor

@jsafrane jsafrane left a comment

Please update cluster-storage-operator to add token-minter sidecar.

Comment on lines 240 to 241
if err != nil {
klog.Errorf("Error updating tags for volume %s: %v", volumeID, err)
Contributor

There should be a retry with exponential backoff, especially when CreateTags calls are throttled by AWS.
That probably implies a queue of PVs.
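
As an illustration of the retry suggested above, here is a minimal, dependency-free sketch of exponential backoff. It is not the PR's implementation: `retryWithBackoff`, `noSleep`, and `errThrottled` are hypothetical names, and a real controller would more likely rely on a rate-limited work queue than on a hand-rolled loop.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errThrottled is a hypothetical error standing in for an AWS throttling response.
var errThrottled = errors.New("RequestLimitExceeded")

// noSleep can be passed as the sleep function to skip real waiting, e.g. in tests.
func noSleep(time.Duration) {}

// retryWithBackoff calls op until it succeeds or attempts are exhausted,
// doubling the delay after each failed attempt.
func retryWithBackoff(attempts int, initial time.Duration, sleep func(time.Duration), op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := initial
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for a tagging call that is throttled twice before it succeeds.
	tagVolume := func() error {
		calls++
		if calls < 3 {
			return errThrottled
		}
		return nil
	}
	err := retryWithBackoff(5, 10*time.Millisecond, time.Sleep, tagVolume)
	fmt.Println("calls:", calls, "err:", err)
}
```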

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch 2 times, most recently from 146634c to 858b442 Compare October 22, 2024 18:47
infraInformer := c.commonClient.ConfigInformers.Config().V1().Infrastructures().Informer()

// Add event handler to process updates only when ResourceTags change
infraInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
Member

We don't have to do this. If we are using factory.WithInformers, we will reconcile this within the Sync function.

Author

Hey @gnufied, I was a bit confused by this. Basically, I don't want to run reconciliation on every change to the Infrastructure resource; I need to run it only when there is a change in infra.Status.PlatformStatus.AWS.ResourceTags.

Can you suggest a better way to do this, so that we can avoid the unnecessary computation?

Member

How will this work when the controller is restarted? You are also using WithInformers below, and hence any change in the infra object will still trigger Sync.

So the most bulletproof way of ensuring that we don't unnecessarily process all PVs is to persistently store the information that we have already processed these PVs.

So, what we have currently is the worst of both worlds. If I were to design this, I would probably make a hash of the sorted tags and annotate each PV with that tag hash. If the tag hash annotation on the PV matches the computed tag hash, I would not update the PV; otherwise I would.
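
A minimal sketch of the tag-hash idea described above, under stated assumptions: the annotation key `example.openshift.io/ebs-tags-hash` and the function names are hypothetical, and SHA-256 is just one reasonable choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// tagHashAnnotation is a hypothetical annotation key used only for this sketch.
const tagHashAnnotation = "example.openshift.io/ebs-tags-hash"

// hashTags returns a hash that depends only on the tag contents, not on
// map iteration order, by hashing the key/value pairs in sorted key order.
func hashTags(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep ("a", "bc") and ("ab", "c") from colliding.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(tags[k]), tags[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsTagUpdate reports whether the hash stored in the PV annotations
// differs from the hash of the tags that are about to be applied.
func needsTagUpdate(annotations, desired map[string]string) bool {
	return annotations[tagHashAnnotation] != hashTags(desired)
}

func main() {
	desired := map[string]string{"env": "prod", "team": "storage"}
	annotations := map[string]string{}

	fmt.Println(needsTagUpdate(annotations, desired)) // no hash stored yet

	// After a successful tagging call, the hash would be written back to the PV.
	annotations[tagHashAnnotation] = hashTags(desired)
	fmt.Println(needsTagUpdate(annotations, desired))
}
```

Because the hash lives on the PV object rather than in controller memory, it survives a controller restart, which is the point made in the comment above.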

Author

How will this work when the controller is restarted? You are also using WithInformers below, and hence any change in the infra object will still trigger Sync.

What I am thinking is that on every restart I would run the sync function and update the volumes if there is a change, and then on each change in the resource tags we would run the sync again.

So, what we have currently is the worst of both worlds. If I were to design this, I would probably make a hash of the sorted tags and annotate each PV with that tag hash. If the tag hash annotation on the PV matches the computed tag hash, I would not update the PV; otherwise I would.

This looks like an easy way to manage this! I will add a new field to the controller struct that holds the updated hash of the sorted map of tags. Then, in each reconciliation, I will update the tags only if the hash differs from the stored one. Does this sound better to you?

Member

But that is not what I said. I said we should store the hash of the sorted tags in the PV objects as an annotation and compare it with the hash of the tags we are about to apply. We should only apply tags in AWS if the hashes differ.

Storing them just in memory doesn't help us much.

Author

Added. Thanks for the suggestion!

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from 858b442 to fe7b17e Compare October 23, 2024 08:31
if err != nil {
return err
}
err = c.processInfrastructure(infra, ec2Client)
Member

So, does this controller need to be opt-in or a default controller? I don't assume every OCP customer wants this feature, and if tagging were to fail after an OCP upgrade, their clusters would be degraded and we would have a support nightmare.

Contributor

It is presented in the enhancement as enabled by default, with no opt-out, for all HyperShift clusters.

}

// startFailedQueueWorker runs a worker that processes failed batches independently
func (c *EBSVolumeTagsController) startFailedQueueWorker(ctx context.Context) {
Member

I am not sure if we need this function at all.

Author

The reason I brought in this change is that I would like to retry the batches that have failed to update tags. Here I would like to update the tags for the volumes serially, one by one (Discussion link). I cannot use the same sync function here; otherwise, could you suggest a better way to do this?

Member

If we are going with a per-PV hash, then we are not going to need this, right?

Author

The points I want to make sure of here are:

  1. I definitely need to batch volumes in order not to hit throttling.
  2. On failure we would need to retry.
  3. Since we are using batch APIs, the AWS SDK will return an error if any one of the volume IDs in the batch hits an error (validation, auth, permissions, etc.), and none of the volume IDs in that batch will have their tags updated. For this case I would like to add a worker queue that handles serial updates of the PVs' tags and retries them with exponential backoff.

We need to figure out the trade-off: either we update all PVs serially through the AWS APIs from the start and retry using the same sync function, or we tag batches of volumes in the sync function and process the retries serially through the queue until it is empty.
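
For the batching half of the trade-off above, a minimal sketch of splitting volume IDs into fixed-size batches might look like this. The function name `chunk` and the batch size used in `main` are illustrative assumptions, not values taken from the PR.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// The returned batches share the backing array of ids.
func chunk(ids []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := []string{"vol-1", "vol-2", "vol-3", "vol-4", "vol-5"}
	for i, batch := range chunk(ids, 2) {
		// Each batch would be sent in a single tagging request.
		fmt.Println("batch", i, batch)
	}
}
```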

Member

@gnufied gnufied Oct 26, 2024

I definitely need to batch volumes in order not to hit throttling.

That is fine. You are already batching PVs in fetchPVsAndUpdateTags.

On failure we would need to retry.

We are unnecessarily complicating this. The controller is going to resync every 20 minutes anyway, so it will try to tag all the PVs which aren't tagged. So do we even need to keep a separate queue for failed PVs? I am also afraid that your failed worker queue is going to race with the normal controller resync.

What is the point of a separate failed worker queue when, every 20 minutes, we are going to try to sync tags for all PVs which don't have a matching tag hash? If you really want a separate worker queue, you will have to redesign the whole thing so that at least they are not racy. But I don't really understand the point of doing it.

Author

@anirudhAgniRedhat anirudhAgniRedhat Oct 28, 2024

Hey @gnufied, I completely agree that retrying the failures makes the change unnecessarily complicated.

@jsafrane I had a chat with @TrilokGeer regarding this. IMHO we should drop the idea of adding a degraded condition on tag update failure, as the cluster should not be degraded for a failure to apply tags. Since the sync will retry tagging the volumes within the resync period anyway, we do not need to retry failures immediately and can remove this queue worker.
Also, we are already emitting warning events from this controller if the tag update fails, so the user will know that the update failed and why, and can also set up alerts based on that.

/cc @TrilokGeer
Can you also share your views on this?

Member

BTW are we planning to backport this PR to older releases?

Author

@gnufied I guess we would need to backport this to 4.17. Slack Thread.

Member

So I noticed that you removed the failed worker logic. The thing is, I was talking to @jsafrane offline and he has convinced me that we do need some kind of additional logic so that we can retry tagging of "failed" PVs one by one, rather than in a batch. This will ensure that one bad apple in a batch doesn't prevent tagging of the rest of the PVs.

But we need to be careful when doing this.

  • We should make sure that PVs which will be tagged via the failed worker don't get processed via the regular controller resync (so no race).
  • I would move the entire failed worker code into a separate file.
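
The "one bad apple" concern above can be sketched as a batch call with a per-volume fallback. This is only an illustration: `tagFunc` is a hypothetical stand-in for a single tagging request, `rejectVolume` is a fake used for the demo, and the sketch does not address the race with the regular resync that the comment warns about.

```go
package main

import (
	"errors"
	"fmt"
)

// tagFunc is a hypothetical stand-in for one tagging request that covers
// all given volume IDs and fails as a whole if any of them is rejected.
type tagFunc func(volumeIDs []string) error

// tagBatchWithFallback tries the whole batch first. If that fails, it retries
// every volume on its own and returns the IDs that still could not be tagged.
func tagBatchWithFallback(batch []string, tag tagFunc) (failed []string) {
	if len(batch) == 0 {
		return nil
	}
	if err := tag(batch); err == nil {
		return nil
	}
	for _, id := range batch {
		if err := tag([]string{id}); err != nil {
			failed = append(failed, id)
		}
	}
	return failed
}

// rejectVolume returns a fake tagger that fails whenever bad is part of the request.
func rejectVolume(bad string) tagFunc {
	return func(ids []string) error {
		for _, id := range ids {
			if id == bad {
				return errors.New("volume not found: " + bad)
			}
		}
		return nil
	}
}

func main() {
	batch := []string{"vol-1", "vol-bad", "vol-3"}
	// Only the bad volume remains untagged; the others succeed individually.
	fmt.Println(tagBatchWithFallback(batch, rejectVolume("vol-bad")))
}
```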

Author

@anirudhAgniRedhat anirudhAgniRedhat Oct 30, 2024

Ack.
I believe we can remove the ResyncEvery parameter from the controller builder, as it is overkill for us: why would we want to run the sync function every 20 minutes if nothing has changed in the resource tags?

Alternatively, if you really think the resync is important here, I think we can handle the race condition by using another annotation on PVs for the tagging status and filtering based on the tag hash and the status in the annotation. But this will also cost us some volume update calls, and we would further need to handle cases where we are not able to update the status within batches. WDYT?

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch 3 times, most recently from a9b8b54 to 42579da Compare October 25, 2024 11:40

// writeCredentialsToTempFile writes credentials data to a temporary file and returns the filename
func writeCredentialsToTempFile(data []byte) (string, error) {
f, err := os.CreateTemp("", "aws-shared-credentials")
Member

Will this work with the readOnlyFilesystem field we are planning to add to operator pods? cc @dfajmon @jsafrane

Member

We will have to exclude /tmp or mount /tmp as an emptyDir for this to work once readOnlyFileSystem is enabled.

Contributor

yes it's as you say, this would mean excluding the /tmp

Author

We have updated the implementation required for AWS session authentication. Now, the session is created using the role-arn data and will be re-authenticated if it expires. We will not need to create the temp file for authenticating the session.

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch 2 times, most recently from 1ec389b to 48122e1 Compare February 3, 2025 06:09
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from 48122e1 to b6cc22d Compare February 5, 2025 12:38
Comment on lines 133 to 135
c.mutex.Lock()
c.awsSession = sess
c.mutex.Unlock()
Member

Same questions about locking and if we can make this a single-threaded sync loop apply here.
#313 (comment)

Author

I have already addressed the same in #313 (comment).

PTAL

Author

@anirudhAgniRedhat anirudhAgniRedhat Feb 6, 2025

Hey @dobsonj, based on the suggestion in #313 (comment)
I have updated the implementation. Thanks for the suggestion.

Here are some points on what I changed, to make it easier to review:

  1. A single worker is used just to update the tags via AWS API calls.
  2. I am passing the updateType and the []pvNames that need to be updated via the queue worker thread.
  3. I added updateType because we wanted to handle batch and individual requests separately.
  4. Events are used to let the user know that a tag update has failed.
  5. Removed the rate limiter, as we no longer need it.
  6. We are still using a mutex to lock the queueSet map, which tracks which PVs are still in the worker queue; we do not want to push them again while they are still waiting to be updated.
  7. The complete PV objects are not passed through the queue, to save memory in the operator. Also, updating a PV annotation from a copy that has since been updated elsewhere would return an error.
  8. Added some unit tests.
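
Point 6 above, a mutex-protected set that keeps already-queued PVs from being enqueued twice, could look roughly like the following. The type and method names are hypothetical and only mirror the description, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// queueSet tracks the names of PVs that are currently waiting in the work queue.
type queueSet struct {
	mu    sync.Mutex
	names map[string]struct{}
}

func newQueueSet() *queueSet {
	return &queueSet{names: make(map[string]struct{})}
}

// addNew records the names that are not queued yet and returns only those,
// so the caller enqueues each PV at most once.
func (q *queueSet) addNew(pvNames []string) []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	var added []string
	for _, name := range pvNames {
		if _, queued := q.names[name]; !queued {
			q.names[name] = struct{}{}
			added = append(added, name)
		}
	}
	return added
}

// done removes names once the worker has finished processing them.
func (q *queueSet) done(pvNames []string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, name := range pvNames {
		delete(q.names, name)
	}
}

func main() {
	set := newQueueSet()
	fmt.Println(set.addNew([]string{"pv-a", "pv-b"})) // both are new
	fmt.Println(set.addNew([]string{"pv-b", "pv-c"})) // pv-b is still queued
	set.done([]string{"pv-b"})
	fmt.Println(set.addNew([]string{"pv-b"})) // pv-b can be queued again
}
```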

Member

Thanks a lot @anirudhAgniRedhat , this is a very nice improvement. I added a few follow-up comments, but you can mark this one resolved.

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from 610c351 to dc3328e Compare February 6, 2025 13:49
Comment on lines 47 to 48
updateTypeBatch = "batch"
updateTypeIndividual = "individual"
Member

Can you create a type for these, to prevent someone from setting updateType = "foobar" where behavior is undefined? Something like the following:

type UpdateType string

const (
    updateTypeBatch UpdateType = "batch"
    updateTypeIndividual UpdateType = "individual"
)

type pvUpdateQueueItem struct {
	updateType UpdateType
	pvNames    []string
}

Member

Please first check my other comments though, I wonder if we really need updateType.

Comment on lines 113 to 118
} else {
if len(item.pvNames) == 0 {
c.queue.Forget(item)
return
}
pv, err := c.getPersistentVolumeByName(item.pvNames[0])
Member

Broader question: do we really need to separate out the batch and individual update types? I could be missing something, but the individual case looks mostly the same as the batch case, just with an array length of 1. Can this function handle both cases without an if/else for updateType?

Member

If there are special cases for batch updates, you could use len(item.pvNames) > 1 the same as item.updateType == "batch". But it would be better to minimize those special cases and handle an array length of 1 the same as >1 as a general rule.

Author

I was thinking of doing it that way, but it makes the code a bit messy and harder to understand.
Having updateType makes explicit which kind of operation is expected, since a batch operation can also contain a batch with only 1 volume.
Some points:

  1. AWS errors are not very detailed. A batch operation returns only the first error message: if the nth volume ID does not exist, the error is reported for that volume only, while the (n+x)th volume might fail with a different error that we would not know about.
  2. Better readability of the code.
  3. Better events can be published, with distinct error messages for batch and individual updates.
  4. There can be cases where the user has changed the infrastructure tags back to the previous tags; then we would not need to make more requests for a particular PV. IMHO handling both within the same function would make it a bit confusing.

I just wanted to give my reasoning for this field; otherwise I am open to refactoring it the way you suggested.
WDYT?

Member

Ok, thanks for explaining, I'm okay with keeping the update types as you have them.

One more idea to simplify this function a bit: would it be possible to refactor the code inside the if / else block into separate methods? So that processVolumes could just have the common code and then call the right method, instead of nesting the code inside the if / else?

switch item.updateType {
case updateTypeBatch:
	c.processBatchVolumes(ctx, item, infra, ec2Client)
case updateTypeIndividual:
	c.processIndividualVolumes(ctx, item, infra, ec2Client)
default:
	// panic accepts a single value, so the message is formatted first (needs the fmt package).
	panic(fmt.Sprintf("invalid update type: %v", item.updateType))
}

Author

Thanks; I have added the same.

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from ee77c98 to 4c93dea Compare February 10, 2025 03:48
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the AWS_DAY2_TAGS_RECONCILIATION branch from 4c93dea to ceb3d9c Compare February 10, 2025 03:53
@anirudhAgniRedhat
Author

Hey @dobsonj, do you feel anything else is needed for this change?
If not, we can consider merging this.

/retest

Contributor

openshift-ci bot commented Feb 13, 2025

@anirudhAgniRedhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/hypershift-e2e-openstack-csi-cinder | 869a428 | link | true | /test hypershift-e2e-openstack-csi-cinder
ci/prow/hypershift-e2e-openstack-csi-manila | 869a428 | link | true | /test hypershift-e2e-openstack-csi-manila
ci/prow/aws-efs-operator-e2e-extended | 20a925b | link | false | /test aws-efs-operator-e2e-extended
ci/prow/smb-win2022-operator-e2e | 20a925b | link | false | /test smb-win2022-operator-e2e
ci/prow/e2e-azurestack-csi | 6afec9d | link | false | /test e2e-azurestack-csi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dobsonj
Member

dobsonj commented Feb 15, 2025

Hey @dobsonj, do you feel anything else is needed for this change? If not, we can consider merging this.

I think it looks good, thanks for all your work on this @anirudhAgniRedhat

/lgtm
/approve

/cc @ropatil010
for pre-merge testing + qe-approved label

/hold for openshift/cluster-storage-operator#528 to merge first

@openshift-ci openshift-ci bot requested a review from ropatil010 February 15, 2025 00:00
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 15, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 15, 2025
Contributor

openshift-ci bot commented Feb 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anirudhAgniRedhat, dobsonj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2025
@anirudhAgniRedhat
Author

Hey @ropatil010, have you completed the pre-merge testing, so that we can close this out?
Thanks

@anirudhAgniRedhat
Author

/retest

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
8 participants