
Labels volumesnapshot class with storageid #240

Merged · 1 commit · Nov 27, 2024

Conversation

@vbnrh (Member) commented Nov 21, 2024

This commit labels all rbd and cephfs StorageClasses and the associated VolumeSnapshotClasses with a storageid.

It also adds the storageid label to the VolumeReplicationClass being created.

The implementation has been reworked to cover both MDR and RDR scenarios.

@vbnrh (Member, Author) commented Nov 21, 2024

/test integration-test

1 similar comment

@ShyamsundarR left a comment

Other than the queries raised, this PR is functionally fine.

I have other concerns though, which are more dated (and can possibly be fixed/updated in the current cycle as Ramen starts leveraging the IDs more).

  • labelStorageClasses is currently labeling all SCs that match the provisioner name with the same fsid
    • So if a user created a new replica-2 pool or a new SC for some other pool type, this code would label it with the same fsid as the storageID
    • We should be selective while labeling the SCs; currently only the default SC should be labelled and other SCs should not be labelled at all, based on the fact that we only mirror-enable the default CephBlockPool with MCO
  • labelVolumeSnapshotClasses has the same concerns as above
  • We should specialize the fsid to be more narrow than the overall Ceph fsid and should add the poolID into the mix
    • For example, if a user creates a non-default replica-2 pool, then we would need to differentiate between the two pools' storageIDs. For such purposes, we would need to consider the pool ID in addition to the fsid when labeling this as the storageID (and similarly when crafting the replicationID as well)

As it currently stands, this will work in the narrow use-case where a user has the default SCs (and hence pools) in the internal mode of operation. It would be good to address some of the ID generation and mapping the same into the classes as a subsequent PR for the current release cycle, such that as we consider enhancements to support mirror enabling user created pools, we can distinguish their storage/replication IDs sufficiently well.
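To make the selectivity concern concrete, here is a minimal, hypothetical sketch of labeling only the StorageClass backed by the default CephBlockPool rather than every SC that matches the provisioner. The label key, the `defaultPool` argument, the `pool` parameter check (which applies to RBD StorageClasses), and the function name are illustrative assumptions, not the PR's actual code.

```go
// Hypothetical sketch: label only the StorageClass backed by the default
// CephBlockPool instead of every SC that matches the provisioner, so a
// user-created replica-2 pool does not inherit the same storageID.
package controllers

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const storageIDLabel = "ramendr.openshift.io/storageid" // assumed label key

func labelDefaultStorageClass(ctx context.Context, c client.Client, provisioner, defaultPool, storageID string) error {
	var scList storagev1.StorageClassList
	if err := c.List(ctx, &scList); err != nil {
		return fmt.Errorf("failed to list StorageClasses: %w", err)
	}
	for i := range scList.Items {
		sc := &scList.Items[i]
		// Skip SCs for other provisioners or for non-default pools.
		if sc.Provisioner != provisioner || sc.Parameters["pool"] != defaultPool {
			continue
		}
		if sc.Labels == nil {
			sc.Labels = map[string]string{}
		}
		sc.Labels[storageIDLabel] = storageID
		if err := c.Update(ctx, sc); err != nil {
			return fmt.Errorf("failed to label StorageClass %s: %w", sc.Name, err)
		}
	}
	return nil
}
```

The same filtering idea would presumably apply to labelVolumeSnapshotClasses, keyed on the snapshotter name and the backing pool.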

cc: @umangachapagain

@@ -188,6 +189,7 @@ func (r *DRPolicyReconciler) createOrUpdateManifestWorkForVRC(ctx context.Contex
labels := make(map[string]string)
labels[fmt.Sprintf(RamenLabelTemplate, ReplicationIDKey)] = replicationId
labels[fmt.Sprintf(RamenLabelTemplate, "maintenancemodes")] = "Failover"
labels[fmt.Sprintf(RamenLabelTemplate, StorageIDKey)] = clusterFSIDs[pr.ClusterName]


On upgrade, will the reconciler ensure this is added? Just checking; the code is using the appropriate CreateOrUpdate etc., but asking anyway.

@vbnrh (Member, Author) replied:

Yes, I tested it.
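For context, a minimal sketch (not the PR's actual code) of why the CreateOrUpdate pattern covers upgrades: the mutate function runs whether the object is freshly created or already exists, so a label introduced in a newer operator version gets stamped onto pre-existing objects on the next reconcile. The StorageClass type and label key below are stand-ins; the same pattern applies to the ManifestWork carrying the VolumeReplicationClass labels.

```go
// Sketch only: demonstrates the CreateOrUpdate idempotency guarantee with a
// StorageClass; the object type and label key are illustrative assumptions.
package controllers

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureStorageIDLabel stamps the storageid label onto a StorageClass that is
// assumed to already exist (created by the storage operator). CreateOrUpdate
// re-runs the mutate function on every reconcile, so the label also appears
// after an operator upgrade.
func ensureStorageIDLabel(ctx context.Context, c client.Client, scName, storageID string) error {
	sc := &storagev1.StorageClass{ObjectMeta: metav1.ObjectMeta{Name: scName}}
	_, err := controllerutil.CreateOrUpdate(ctx, c, sc, func() error {
		if sc.Labels == nil {
			sc.Labels = map[string]string{}
		}
		sc.Labels["ramendr.openshift.io/storageid"] = storageID // assumed label key
		return nil
	})
	return err
}
```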

return ctrl.Result{}, fmt.Errorf("an unknown error occurred while fetching the cluster fsids, retrying again: %v", err)
clusterFSIDs := make(map[string]string)
logger.Info("Fetching clusterFSIDs")
err = r.fetchClusterFSIDs(ctx, &mirrorPeer, clusterFSIDs)


If it is of type hasStorageClientRef, will the call to fetchClusterFSIDs work or always error out? As it was called within the if prior to this change, wondering if we should retain it as such.

@vbnrh (Member, Author) replied:

Yes, it will work. If hasStorageClientRef is true, then these agents would be created on provider clusters. It should always be able to fetch the Ceph fsids.

@vbnrh (Member, Author) commented Nov 25, 2024

> Other than the queries raised, this PR is functionally fine.
>
> I have other concerns though, which are more dated (and can possibly be fixed/updated in the current cycle as Ramen starts leveraging the IDs more).
>
> * `labelStorageClasses` is currently labeling all SCs that match the provisioner name with the same fsid
>   * So if a user created a new replica-2 pool or a new SC for some other pool type, this code would label it with the same fsid as the storageID
>   * We should be selective while labeling the SCs; currently only the default SC should be labelled and other SCs should not be labelled at all, based on the fact that we only mirror-enable the default CephBlockPool with MCO
> * `labelVolumeSnapshotClasses` has the same concerns as above
> * We should specialize the fsid to be more narrow than the overall Ceph fsid and should add the poolID into the mix
>   * For example, if a user creates a non-default replica-2 pool, then we would need to differentiate between the two pools' storageIDs. For such purposes, we would need to consider the pool ID in addition to the fsid when labeling this as the storageID (and similarly when crafting the replicationID as well)
>
> As it currently stands, this will work in the narrow use-case where a user has the default SCs (and hence pools) in the internal mode of operation. It would be good to address some of the ID generation and mapping the same into the classes as a subsequent PR for the current release cycle, such that as we consider enhancements to support mirror enabling user created pools, we can distinguish their storage/replication IDs sufficiently well.
>
> cc: @umangachapagain

Is the ask only to make the StorageID more unique? (CephFSID + PoolID)

We may need to sync it onto the hub in a similar way to the cephfsid when we apply the StorageID onto the volumereplicationclasses as well. We will have to add it to the design. I will send a subsequent PR for it. Will that work?

@ShyamsundarR replied:

> Is the ask only to make the StorageID more unique? (CephFSID + PoolID)
>
> We may need to sync it onto the hub in a similar way to the cephfsid when we apply the StorageID onto the volumereplicationclasses as well. We will have to add it to the design. I will send a subsequent PR for it. Will that work?

Ack! I suggest we complete this soon. Get this PR in, such that current DR works, but avoid upgrade issues by tackling the issue of ID uniqueness soon after within the same release.

The intention for internal mode would be the following (as it does not leverage RADOS namespaces (yet?)); a rough sketch follows the list:

  • StorageID = Ceph fsid + Pool ID (the MDS pool of CephFS or the RBD pool ID, as the case may be)
  • ReplicationID = StorageID1 + StorageID2 (I think this is the case at present, but just making sure of the same)
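A rough, hypothetical sketch of that composition is below; the hash function, separators, and ordering are illustrative choices, not the agreed design.

```go
// Hypothetical sketch of composing narrower IDs: a StorageID scoped to a pool
// (Ceph fsid + pool ID) and a ReplicationID derived from the two peers'
// StorageIDs. Hashing and ordering here are illustrative, not the design.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// makeStorageID narrows the ID from "cluster-wide Ceph fsid" to
// "this pool on this cluster" by mixing in the pool ID.
func makeStorageID(cephFSID string, poolID int) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s-%d", cephFSID, poolID)))
	return hex.EncodeToString(h[:])
}

// makeReplicationID combines the two peers' StorageIDs; sorting keeps the
// result identical regardless of which cluster computes it.
func makeReplicationID(a, b string) string {
	ids := []string{a, b}
	sort.Strings(ids)
	h := sha256.Sum256([]byte(ids[0] + "-" + ids[1]))
	return hex.EncodeToString(h[:])
}

func main() {
	s1 := makeStorageID("fsid-of-cluster-1", 1) // default RBD pool on cluster 1
	s2 := makeStorageID("fsid-of-cluster-2", 1) // default RBD pool on cluster 2
	fmt.Println("storageIDs:", s1, s2)
	fmt.Println("replicationID:", makeReplicationID(s1, s2))
}
```

Sorting the two StorageIDs before hashing would keep the ReplicationID identical no matter which cluster or hub computes it.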

@umangachapagain (Contributor) commented:

We will continue this discussion and send a follow-up patch. Merging this PR.

openshift-ci bot commented Nov 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: umangachapagain, vbnrh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [umangachapagain,vbnrh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot bot merged commit 1aeb02f into red-hat-storage:main on Nov 27, 2024
11 checks passed
@vbnrh (Member, Author) commented Dec 4, 2024

/cherry-pick release-4.18

@openshift-cherrypick-robot

@vbnrh: new pull request created: #243

In response to this:

/cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
