diff --git a/training/modules/ocs4/nav.adoc b/training/modules/ocs4/nav.adoc index 492cbec..2a5a3ca 100644 --- a/training/modules/ocs4/nav.adoc +++ b/training/modules/ocs4/nav.adoc @@ -20,3 +20,4 @@ * xref:ocs4-additionalfeatures-devtype.adoc[Mixed OSD device type configuration] * xref:ocs4-additionalfeatures-override.adoc[Ceph configuration override] * xref:ocs4-additionalfeatures-segregation.adoc[Data Segregation] +* xref:ocs4-cluster-downsize.adoc[Cluster Downsizing] diff --git a/training/modules/ocs4/pages/ocs4-cluster-downsize.adoc b/training/modules/ocs4/pages/ocs4-cluster-downsize.adoc index adeddaf..6311115 100644 --- a/training/modules/ocs4/pages/ocs4-cluster-downsize.adoc +++ b/training/modules/ocs4/pages/ocs4-cluster-downsize.adoc @@ -1,4 +1,4 @@ -= How to Downsize a Red Hat OpenShift Container Storage 4.X Internal Cluster += How to Downsize a Red Hat OpenShift Data Foundation 4.X Internal Cluster // :toc: right // :toclevels: 3 :icons: font @@ -72,17 +72,18 @@ rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Runnin rook-ceph-osd-3-cb4b7bb9c-9zncq 1/1 Running 0 3m11s rook-ceph-osd-4-75c8d6894-fp9wb 1/1 Running 0 3m10s rook-ceph-osd-5-7b4f4c6785-kgwb4 1/1 Running 0 3m9s -rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 7m20s -rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4-4xqhv 0/1 Completed 0 3m44s -rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 7m19s -rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk-lpd27 0/1 Completed 0 3m42s -rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 7m19s -rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg-dm5mx 0/1 Completed 0 3m40s +rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 7m20s +rook-ceph-osd-prepare-ocs-deviceset-0-data1-q72z4-4xqhv 0/1 Completed 0 3m44s +rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 7m19s +rook-ceph-osd-prepare-ocs-deviceset-1-data1-jv2qk-lpd27 0/1 Completed 0 3m42s +rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 7m19s +rook-ceph-osd-prepare-ocs-deviceset-2-data1-r7dwg-dm5mx 0/1 Completed 0 3m40s ---- Before you can downsize your cluster you need to validate how many `storageDeviceSets` have been deployed so you can adjust the value properly. Each `storageDeviceSets` requires 3 OSDs deployed on 3 unique OCP nodes and the minimum number in a cluster is 1. 
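If you also want to confirm that the OSDs really are spread across three unique OCP nodes before making any change, you can list each OSD pod together with the node it runs on. This is only an optional sketch and assumes the default Rook label `app=rook-ceph-osd` on the OSD pods.

[source,role="execute"]
----
## Assumes the default Rook label; the NODE column shows where each OSD is placed
# oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide
----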
The following command will provide you with the current number of `storageDeviceSets` configured in your cluster: + [source,role="execute"] ---- # deviceset=$(oc get storagecluster -n openshift-storage -o jsonpath='{.items[0].spec.storageDeviceSets[0].count}') @@ -155,15 +156,15 @@ Before you can proceed you have to identify the `storageDeviceSets` that are to ---- .Example output ---- -rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx 1/1 29s 44m -rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4 1/1 32s 40m -rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj 1/1 27s 44m -rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk 1/1 32s 40m -rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch 1/1 36s 44m -rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg 1/1 28s 40m +rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx 1/1 29s 44m +rook-ceph-osd-prepare-ocs-deviceset-0-data1-q72z4 1/1 32s 40m +rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj 1/1 27s 44m +rook-ceph-osd-prepare-ocs-deviceset-1-data1-jv2qk 1/1 32s 40m +rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch 1/1 36s 44m +rook-ceph-osd-prepare-ocs-deviceset-2-data1-r7dwg 1/1 28s 40m ---- -**Note:** Each `storageDeviceSets` has 3 jobs, one per replica. The rank of the `storageDeviceSets` is materialized by the value after `data`. If we look at the job `xxx-deviceset-0-data-0-yyy` it means the job is for the first replica (**`deviceset-0`**) for the first rank (**`data-0`**). +**Note:** Each `storageDeviceSets` has 3 jobs, one per replica. The rank of the `storageDeviceSets` is materialized by the value after `data`. If we look at the job `xxx-deviceset-0-data0-yyy` it means the job is for the first replica (**`deviceset-0`**) for the first rank (**`data0`**). We recommend that you shrink your cluster by removing the higher OSD IDs that are deployed for the higher rank `storageDeviceSets`. To identify the correct OSDs, verify which OSDs have been deployed with the following command. @@ -189,7 +190,7 @@ In the example above, the first `storageDeviceSets` correspond to OSDs 0 through ---- .Example output ---- -ocs-deviceset-1-data-1-jv2qk +ocs-deviceset-1-data1-jv2qk ---- From the example above the following objects will be removed from the cluster: @@ -197,9 +198,9 @@ From the example above the following objects will be removed from the cluster: * OSD with id 5 * OSD with id 4 * OSD with id 3 -* DeviceSet with id ocs-deviceset-2-data-1 -* DeviceSet with id ocs-deviceset-1-data-1 -* DeviceSet with id ocs-deviceset-0-data-1 +* DeviceSet with id ocs-deviceset-2-data1 +* DeviceSet with id ocs-deviceset-1-data1 +* DeviceSet with id ocs-deviceset-0-data1 === Remove OSDs from the Ceph Cluster You **MUST** remove each OSD, ONE AT A TIME, using the following set of commands. Make sure the cluster reaches `HEALTH_OK` status before removing the next OSD. @@ -229,13 +230,20 @@ Once the OSD pod has been verified, you can remove the OSD from the Ceph cluster [source,role="execute"] ---- -# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=${osd_id_to_remove} | oc create -f - +# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -f - ---- .Example output ---- job.batch/ocs-osd-removal-5 created ---- +You may watch the OSD removal job with: + +[source,role="execute"] +---- +# oc logs -n openshift-storage job/ocs-osd-removal -f +---- + ==== Step 3 - Check Cluster Status and Data Protection Check cluster status and wait until the status is `HEALTH_OK`. 
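If the `rook-ceph-tools` toolbox pod is deployed in your cluster, one way to follow the cluster status is to query Ceph directly from it. The sketch below assumes the toolbox pod carries the default `app=rook-ceph-tools` label.

[source,role="execute"]
----
## Assumes the toolbox pod is labeled app=rook-ceph-tools
# TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
# oc rsh -n openshift-storage ${TOOLS_POD} ceph health detail
----

Repeat the check until the output reports `HEALTH_OK`.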
@@ -320,7 +328,7 @@ deployment.apps/rook-ceph-osd-3 scaled [source,role="execute"] ---- # oc get pods -n openshift-storage | grep osd-${osd_id_to_remove} -# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=${osd_id_to_remove} | oc create -f - +# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -f - ---- .Example output ---- @@ -347,127 +355,27 @@ HEALTH_WARN too many PGs per OSD (288 > max 250) **Note:** Although the status of the cluster is not `HEALTH_OK` in the above example no warning or error is reported regarding the protection of the data itself. -=== Remove OSD Deployment Objects +=== Cleanup -Now that the OSDs have been removed from the Ceph cluster and the OSD pods have been removed from the OCP cluster we will remove the deployment object for each OSD we have removed. - -[source,role="execute"] ----- -for i in 5 4 3; do oc delete -n openshift-storage deployment.apps/rook-ceph-osd-${i}; done ----- -.Example output ----- -deployment.apps "rook-ceph-osd-5" deleted -deployment.apps "rook-ceph-osd-4" deleted -deployment.apps "rook-ceph-osd-3" deleted ----- - -=== Remove Prepare Jobs - -Now that the deployments have been removed we will clean up the prepare jobs that were responsible for preparing the storage devices for the OSDs that no longer exist. +The OSD deployments (`rook-ceph-osd-*`) and the "prepare" jobs (`rook-ceph-osd-prepare-*`) corresponding to the deleted OSD, should have been deleted automatically. Verify with: [source,role="execute"] ---- +# oc get pod -n openshift-storage | grep osd # oc get job -n openshift-storage | grep prepare ---- -.Example output ----- -rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx 1/1 29s 162m -rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4 1/1 32s 159m -rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj 1/1 27s 162m -rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk 1/1 32s 158m -rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch 1/1 36s 162m -rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg 1/1 28s 158m ----- - -Remove only the jobs corresponding to the `storageDeviceSets` we have removed. - -[source,role="execute"] ----- -# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg ----- -.Example output ----- -job.batch "rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg" deleted ----- - -[source,role="execute"] ----- -# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk ----- -.Example output ----- -job.batch "rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk" deleted ----- - -[source,role="execute"] ----- -# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4 ----- -.Example output ----- -job.batch "rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4" deleted ----- - -=== Remove Persistent Volume Claims - -List all PVCs created for the OSDs in the cluster. 
- -[source,role="execute"] ----- -# oc get pvc -n openshift-storage| grep deviceset ----- -.Example output ----- -ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 165m -ocs-deviceset-0-data-1-q72z4 Bound pvc-36e0a5f7-9ef3-49e6-99d5-68c791870e61 2Ti RWO gp2 162m -ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 165m -ocs-deviceset-1-data-1-jv2qk Bound pvc-fbd93d58-eb56-4ac1-b987-91a3983b9e00 2Ti RWO gp2 162m -ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 165m -ocs-deviceset-2-data-1-r7dwg Bound pvc-e100bbf6-426d-4f10-af83-83b92181fb41 2Ti RWO gp2 162m ----- - -Then delete only the PVCs corresponding to the OSDs we have removed. - -[source,role="execute"] ----- -# oc delete -n openshift-storage pvc ocs-deviceset-2-data-1-r7dwg ----- -.Example output ----- -persistentvolumeclaim "ocs-deviceset-2-data-1-r7dwg" deleted ----- - -[source,role="execute"] ----- -# oc delete -n openshift-storage pvc ocs-deviceset-1-data-1-jv2qk ----- -.Example output ----- -persistentvolumeclaim "ocs-deviceset-1-data-1-jv2qk" deleted ----- - -[source,role="execute"] ----- -# oc delete -n openshift-storage pvc ocs-deviceset-0-data-1-q72z4 ----- -.Example output ----- -persistentvolumeclaim "ocs-deviceset-0-data-1-q72z4" deleted ----- -=== Final Cleanup -Verify the physical volumes that were dynamically provisioned for the OSDs we removed have been deleted. +Also verify the persistent volume claims and physical volumes that were dynamically provisioned for the OSDs we removed have been deleted. [source,role="execute"] ---- -# oc get pvc -n openshift-storage| grep deviceset +# oc get pvc -n openshift-storage | grep deviceset ---- .Example output ---- -ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 169m -ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 169m -ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 169m +ocs-deviceset-0-data0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 169m +ocs-deviceset-1-data0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 169m +ocs-deviceset-2-data0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 169m ---- [source,role="execute"] @@ -476,9 +384,9 @@ ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b ---- .Example output ---- -pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti openshift-storage/ocs-deviceset-0-data-0-hwzhx gp2 -pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti openshift-storage/ocs-deviceset-2-data-0-d6tch gp2 -pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti openshift-storage/ocs-deviceset-1-data-0-bmpzj gp2 +pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti openshift-storage/ocs-deviceset-0-data0-hwzhx gp2 +pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti openshift-storage/ocs-deviceset-2-data0-d6tch gp2 +pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti openshift-storage/ocs-deviceset-1-data0-bmpzj gp2 ---- Delete the OSD removal jobs. 
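A minimal sketch for this last cleanup, assuming one `ocs-osd-removal-<id>` job was created per removed OSD as in the earlier example output (`job.batch/ocs-osd-removal-5 created`), is to list the jobs and delete the ones matching the OSD IDs you removed:

[source,role="execute"]
----
## Job names are an assumption based on the earlier example output; adjust them to what the listing returns
# oc get job -n openshift-storage | grep ocs-osd-removal
# oc delete -n openshift-storage job ocs-osd-removal-5 ocs-osd-removal-4 ocs-osd-removal-3
----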
@@ -547,9 +455,9 @@ rook-ceph-operator-f44596d6-lh4zq 1/1 Runnin rook-ceph-osd-0-7947c4f995-l4bx4 1/1 Running 0 178m rook-ceph-osd-1-7cd6dc86c8-484bw 1/1 Running 0 178m rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Running 0 178m -rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 179m -rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 179m -rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 179m +rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 179m +rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 179m +rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 179m rook-ceph-tools-65fcc8988c-nw8r5 1/1 Running 0 171m ---- @@ -589,12 +497,12 @@ rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Runnin rook-ceph-osd-3-5967bdf767-2ffcr 1/1 Running 0 50s rook-ceph-osd-4-f7dcc6c7f-zd6tx 1/1 Running 0 48s rook-ceph-osd-5-99885889b-z8x95 1/1 Running 0 46s -rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 3h4m -rook-ceph-osd-prepare-ocs-deviceset-0-data-1-hwwr7-ntm4w 0/1 Completed 0 78s -rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 3h4m -rook-ceph-osd-prepare-ocs-deviceset-1-data-1-zdttb-mb5fx 0/1 Completed 0 77s -rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 3h4m -rook-ceph-osd-prepare-ocs-deviceset-2-data-1-s469h-kjgdf 0/1 Completed 0 75s +rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 3h4m +rook-ceph-osd-prepare-ocs-deviceset-0-data1-hwwr7-ntm4w 0/1 Completed 0 78s +rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 3h4m +rook-ceph-osd-prepare-ocs-deviceset-1-data1-zdttb-mb5fx 0/1 Completed 0 77s +rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 3h4m +rook-ceph-osd-prepare-ocs-deviceset-2-data1-s469h-kjgdf 0/1 Completed 0 75s ---- [source,role="execute"] @@ -605,12 +513,12 @@ rook-ceph-osd-prepare-ocs-deviceset-2-data-1-s469h-kjgdf 0/1 Comple ---- NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE db-noobaa-db-0 Bound pvc-a45d2583-9ec1-4640-b2c9-8cb0d24be7f4 50Gi RWO ocs-storagecluster-ceph-rbd 3h4m -ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 3h4m -ocs-deviceset-0-data-1-hwwr7 Bound pvc-db64ec09-81c7-4e53-b91d-f089607a4824 2Ti RWO gp2 101s -ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 3h4m -ocs-deviceset-1-data-1-zdttb Bound pvc-21243378-5c7a-4df8-8605-d49559a4b01b 2Ti RWO gp2 100s -ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 3h4m -ocs-deviceset-2-data-1-s469h Bound pvc-64a6d4db-ce5c-4a5c-87b2-3bcde59c902f 2Ti RWO gp2 98s +ocs-deviceset-0-data0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 3h4m +ocs-deviceset-0-data1-hwwr7 Bound pvc-db64ec09-81c7-4e53-b91d-f089607a4824 2Ti RWO gp2 101s +ocs-deviceset-1-data0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 3h4m +ocs-deviceset-1-data1-zdttb Bound pvc-21243378-5c7a-4df8-8605-d49559a4b01b 2Ti RWO gp2 100s +ocs-deviceset-2-data0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 3h4m +ocs-deviceset-2-data1-s469h Bound pvc-64a6d4db-ce5c-4a5c-87b2-3bcde59c902f 2Ti RWO gp2 98s rook-ceph-mon-a Bound pvc-d4977e7f-8770-45de-bc12-9c213e3d0766 10Gi RWO gp2 3h6m rook-ceph-mon-b Bound pvc-2df867fc-38ff-4cb1-93fd-b3281f6c5fa2 10Gi RWO gp2 3h6m rook-ceph-mon-c Bound pvc-b70f812e-7d02-451c-a3fb-66b438a2304b 10Gi RWO gp2 3h6m
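Once the expansion shown above completes, you can confirm as a final check that the `storageDeviceSets` count matches the value you set and that one OSD pod per replica is in the `Running` state. The jsonpath query below is the same one used at the beginning of this procedure; the `app=rook-ceph-osd` label selector is an assumption based on the default Rook pod labels.

[source,role="execute"]
----
# oc get storagecluster -n openshift-storage -o jsonpath='{.items[0].spec.storageDeviceSets[0].count}'
## Label selector is an assumption (default Rook labels)
# oc get pods -n openshift-storage -l app=rook-ceph-osd
----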