Update ocs4-cluster-downsize for ODF 4.10 #276

Open
wants to merge 4 commits into base: master
1 change: 1 addition & 0 deletions training/modules/ocs4/nav.adoc
@@ -20,3 +20,4 @@
* xref:ocs4-additionalfeatures-devtype.adoc[Mixed OSD device type configuration]
* xref:ocs4-additionalfeatures-override.adoc[Ceph configuration override]
* xref:ocs4-additionalfeatures-segregation.adoc[Data Segregation]
* xref:ocs4-cluster-downsize.adoc[Cluster Downsizing]
200 changes: 54 additions & 146 deletions training/modules/ocs4/pages/ocs4-cluster-downsize.adoc
@@ -1,4 +1,4 @@
= How to Downsize a Red Hat OpenShift Container Storage 4.X Internal Cluster
= How to Downsize a Red Hat OpenShift Data Foundation 4.X Internal Cluster
// :toc: right
// :toclevels: 3
:icons: font
@@ -72,17 +72,18 @@ rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Runnin
rook-ceph-osd-3-cb4b7bb9c-9zncq 1/1 Running 0 3m11s
rook-ceph-osd-4-75c8d6894-fp9wb 1/1 Running 0 3m10s
rook-ceph-osd-5-7b4f4c6785-kgwb4 1/1 Running 0 3m9s
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 7m20s
rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4-4xqhv 0/1 Completed 0 3m44s
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 7m19s
rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk-lpd27 0/1 Completed 0 3m42s
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 7m19s
rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg-dm5mx 0/1 Completed 0 3m40s
rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 7m20s
rook-ceph-osd-prepare-ocs-deviceset-0-data1-q72z4-4xqhv 0/1 Completed 0 3m44s
rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 7m19s
rook-ceph-osd-prepare-ocs-deviceset-1-data1-jv2qk-lpd27 0/1 Completed 0 3m42s
rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 7m19s
rook-ceph-osd-prepare-ocs-deviceset-2-data1-r7dwg-dm5mx 0/1 Completed 0 3m40s
----

Before you can downsize your cluster, you need to check how many `storageDeviceSets` are currently deployed so you can adjust the value correctly. Each `storageDeviceSets` entry requires 3 OSDs deployed on 3 unique OCP nodes, and the minimum count in a cluster is 1.

The following command returns the current number of `storageDeviceSets` configured in your cluster:

[source,role="execute"]
----
# deviceset=$(oc get storagecluster -n openshift-storage -o jsonpath='{.items[0].spec.storageDeviceSets[0].count}')
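## Optional sketch: each device set yields 3 OSDs, so the expected total number of OSD pods is deviceset * 3
# echo "Expecting $((deviceset * 3)) OSD pods in the cluster"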
@@ -155,15 +156,15 @@ Before you can proceed you have to identify the `storageDeviceSets` that are to
----
.Example output
----
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx 1/1 29s 44m
rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4 1/1 32s 40m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj 1/1 27s 44m
rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk 1/1 32s 40m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch 1/1 36s 44m
rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg 1/1 28s 40m
rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx 1/1 29s 44m
rook-ceph-osd-prepare-ocs-deviceset-0-data1-q72z4 1/1 32s 40m
rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj 1/1 27s 44m
rook-ceph-osd-prepare-ocs-deviceset-1-data1-jv2qk 1/1 32s 40m
rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch 1/1 36s 44m
rook-ceph-osd-prepare-ocs-deviceset-2-data1-r7dwg 1/1 28s 40m
----

**Note:** Each `storageDeviceSets` has 3 jobs, one per replica. The rank of the `storageDeviceSets` is materialized by the value after `data`. If we look at the job `xxx-deviceset-0-data-0-yyy` it means the job is for the first replica (**`deviceset-0`**) for the first rank (**`data-0`**).
**Note:** Each `storageDeviceSets` has 3 jobs, one per replica. The rank of the `storageDeviceSets` is indicated by the value after `data`. For example, the job `xxx-deviceset-0-data0-yyy` belongs to the first replica (**`deviceset-0`**) at the first rank (**`data0`**).

We recommend that you shrink your cluster by removing the OSDs with the highest IDs, which are deployed for the highest-rank `storageDeviceSets`. To identify the correct OSDs, verify which OSDs have been deployed with the following command.
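The command referenced above is collapsed in this diff view; one way to list the OSD pods together with the nodes they run on is shown below (the `app=rook-ceph-osd` label is an assumption based on Rook's usual labelling).

[source,role="execute"]
----
# oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide
----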

@@ -189,17 +190,17 @@ In the example above, the first `storageDeviceSets` correspond to OSDs 0 through
----
.Example output
----
ocs-deviceset-1-data-1-jv2qk
ocs-deviceset-1-data1-jv2qk
----

From the example above, the following objects will be removed from the cluster (a short sketch for setting the matching shell variable follows this list):

* OSD with id 5
* OSD with id 4
* OSD with id 3
* DeviceSet with id ocs-deviceset-2-data-1
* DeviceSet with id ocs-deviceset-1-data-1
* DeviceSet with id ocs-deviceset-0-data-1
* DeviceSet with id ocs-deviceset-2-data1
* DeviceSet with id ocs-deviceset-1-data1
* DeviceSet with id ocs-deviceset-0-data1
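
A minimal sketch for stepping through the removals; the variable name `osd_id_to_remove` matches the commands used later in this document, everything else is an assumption:

[source,role="execute"]
----
# osd_id_to_remove=5  # start with the highest OSD ID; repeat with 4, then 3, one OSD at a time
----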

=== Remove OSDs from the Ceph Cluster
You **MUST** remove each OSD, ONE AT A TIME, using the following set of commands. Make sure the cluster reaches `HEALTH_OK` status before removing the next OSD.
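
The health check between removals can be run from the `rook-ceph-tools` pod that appears in the pod listings in this document; a minimal sketch, assuming `oc exec` against the toolbox deployment works in your environment:

[source,role="execute"]
----
# oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph health
----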
@@ -229,13 +230,20 @@ Once the OSD pod has been verified, you can remove the OSD from the Ceph cluster

[source,role="execute"]
----
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=${osd_id_to_remove} | oc create -f -
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -f -
----
.Example output
----
job.batch/ocs-osd-removal-5 created
----

You may watch the OSD removal job with:

[source,role="execute"]
----
# oc logs -n openshift-storage job/ocs-osd-removal -f
----
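
Before moving on, you can confirm that the removal job has completed; a minimal check (the job name pattern is taken from the example output above):

[source,role="execute"]
----
# oc get jobs -n openshift-storage | grep ocs-osd-removal
----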

==== Step 3 - Check Cluster Status and Data Protection
Check cluster status and wait until the status is `HEALTH_OK`.

@@ -320,7 +328,7 @@ deployment.apps/rook-ceph-osd-3 scaled
[source,role="execute"]
----
# oc get pods -n openshift-storage | grep osd-${osd_id_to_remove}
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=${osd_id_to_remove} | oc create -f -
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} | oc create -f -
----
.Example output
----
@@ -347,127 +355,27 @@ HEALTH_WARN too many PGs per OSD (288 > max 250)

**Note:** Although the status of the cluster is not `HEALTH_OK` in the example above, no warning or error is reported regarding the protection of the data itself.
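
To see exactly which warnings are active and confirm that none of them relate to degraded or undersized placement groups, a sketch using the toolbox again (same assumption as above about `oc exec` and the toolbox deployment):

[source,role="execute"]
----
# oc exec -n openshift-storage deploy/rook-ceph-tools -- ceph health detail
----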

=== Remove OSD Deployment Objects
=== Cleanup

Now that the OSDs have been removed from the Ceph cluster and the OSD pods have been removed from the OCP cluster we will remove the deployment object for each OSD we have removed.

[source,role="execute"]
----
for i in 5 4 3; do oc delete -n openshift-storage deployment.apps/rook-ceph-osd-${i}; done
----
.Example output
----
deployment.apps "rook-ceph-osd-5" deleted
deployment.apps "rook-ceph-osd-4" deleted
deployment.apps "rook-ceph-osd-3" deleted
----

=== Remove Prepare Jobs

Now that the deployments have been removed we will clean up the prepare jobs that were responsible for preparing the storage devices for the OSDs that no longer exist.
The OSD deployments (`rook-ceph-osd-*`) and the "prepare" jobs (`rook-ceph-osd-prepare-*`) corresponding to the deleted OSDs should have been deleted automatically. Verify with:

[source,role="execute"]
----
# oc get pod -n openshift-storage | grep osd
# oc get job -n openshift-storage | grep prepare
----
.Example output
----
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx 1/1 29s 162m
rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4 1/1 32s 159m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj 1/1 27s 162m
rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk 1/1 32s 158m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch 1/1 36s 162m
rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg 1/1 28s 158m
----

Remove only the jobs corresponding to the `storageDeviceSets` we have removed.

[source,role="execute"]
----
# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg
----
.Example output
----
job.batch "rook-ceph-osd-prepare-ocs-deviceset-2-data-1-r7dwg" deleted
----

[source,role="execute"]
----
# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk
----
.Example output
----
job.batch "rook-ceph-osd-prepare-ocs-deviceset-1-data-1-jv2qk" deleted
----

[source,role="execute"]
----
# oc delete -n openshift-storage job rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4
----
.Example output
----
job.batch "rook-ceph-osd-prepare-ocs-deviceset-0-data-1-q72z4" deleted
----

=== Remove Persistent Volume Claims

List all PVCs created for the OSDs in the cluster.

[source,role="execute"]
----
# oc get pvc -n openshift-storage| grep deviceset
----
.Example output
----
ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 165m
ocs-deviceset-0-data-1-q72z4 Bound pvc-36e0a5f7-9ef3-49e6-99d5-68c791870e61 2Ti RWO gp2 162m
ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 165m
ocs-deviceset-1-data-1-jv2qk Bound pvc-fbd93d58-eb56-4ac1-b987-91a3983b9e00 2Ti RWO gp2 162m
ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 165m
ocs-deviceset-2-data-1-r7dwg Bound pvc-e100bbf6-426d-4f10-af83-83b92181fb41 2Ti RWO gp2 162m
----

Then delete only the PVCs corresponding to the OSDs we have removed.

[source,role="execute"]
----
# oc delete -n openshift-storage pvc ocs-deviceset-2-data-1-r7dwg
----
.Example output
----
persistentvolumeclaim "ocs-deviceset-2-data-1-r7dwg" deleted
----

[source,role="execute"]
----
# oc delete -n openshift-storage pvc ocs-deviceset-1-data-1-jv2qk
----
.Example output
----
persistentvolumeclaim "ocs-deviceset-1-data-1-jv2qk" deleted
----

[source,role="execute"]
----
# oc delete -n openshift-storage pvc ocs-deviceset-0-data-1-q72z4
----
.Example output
----
persistentvolumeclaim "ocs-deviceset-0-data-1-q72z4" deleted
----

=== Final Cleanup
Verify the physical volumes that were dynamically provisioned for the OSDs we removed have been deleted.
Also verify that the persistent volume claims and persistent volumes that were dynamically provisioned for the OSDs we removed have been deleted.

[source,role="execute"]
----
# oc get pvc -n openshift-storage| grep deviceset
# oc get pvc -n openshift-storage | grep deviceset
----
.Example output
----
ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 169m
ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 169m
ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 169m
ocs-deviceset-0-data0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 169m
ocs-deviceset-1-data0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 169m
ocs-deviceset-2-data0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 169m
----

[source,role="execute"]
@@ -476,9 +384,9 @@ ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b
----
.Example output
----
pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti openshift-storage/ocs-deviceset-0-data-0-hwzhx gp2
pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti openshift-storage/ocs-deviceset-2-data-0-d6tch gp2
pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti openshift-storage/ocs-deviceset-1-data-0-bmpzj gp2
pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti openshift-storage/ocs-deviceset-0-data0-hwzhx gp2
pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti openshift-storage/ocs-deviceset-2-data0-d6tch gp2
pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti openshift-storage/ocs-deviceset-1-data0-bmpzj gp2
----

Delete the OSD removal jobs.
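
The exact command is collapsed in this diff; assuming the job name shown in the earlier example output (`ocs-osd-removal-5`), the deletion would look like:

[source,role="execute"]
----
# oc delete -n openshift-storage job ocs-osd-removal-5
----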
@@ -547,9 +455,9 @@ rook-ceph-operator-f44596d6-lh4zq 1/1 Runnin
rook-ceph-osd-0-7947c4f995-l4bx4 1/1 Running 0 178m
rook-ceph-osd-1-7cd6dc86c8-484bw 1/1 Running 0 178m
rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Running 0 178m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 179m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 179m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 179m
rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 179m
rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 179m
rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 179m
rook-ceph-tools-65fcc8988c-nw8r5 1/1 Running 0 171m
----

@@ -589,12 +497,12 @@ rook-ceph-osd-2-6b7659dd58-h5lp7 1/1 Runnin
rook-ceph-osd-3-5967bdf767-2ffcr 1/1 Running 0 50s
rook-ceph-osd-4-f7dcc6c7f-zd6tx 1/1 Running 0 48s
rook-ceph-osd-5-99885889b-z8x95 1/1 Running 0 46s
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-hwzhx-577f8 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-0-data-1-hwwr7-ntm4w 0/1 Completed 0 78s
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-bmpzj-27t5s 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-1-data-1-zdttb-mb5fx 0/1 Completed 0 77s
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-d6tch-ld7sd 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-2-data-1-s469h-kjgdf 0/1 Completed 0 75s
rook-ceph-osd-prepare-ocs-deviceset-0-data0-hwzhx-577f8 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-0-data1-hwwr7-ntm4w 0/1 Completed 0 78s
rook-ceph-osd-prepare-ocs-deviceset-1-data0-bmpzj-27t5s 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-1-data1-zdttb-mb5fx 0/1 Completed 0 77s
rook-ceph-osd-prepare-ocs-deviceset-2-data0-d6tch-ld7sd 0/1 Completed 0 3h4m
rook-ceph-osd-prepare-ocs-deviceset-2-data1-s469h-kjgdf 0/1 Completed 0 75s
----

[source,role="execute"]
@@ -605,12 +513,12 @@ rook-ceph-osd-prepare-ocs-deviceset-2-data-1-s469h-kjgdf 0/1 Comple
----
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
db-noobaa-db-0 Bound pvc-a45d2583-9ec1-4640-b2c9-8cb0d24be7f4 50Gi RWO ocs-storagecluster-ceph-rbd 3h4m
ocs-deviceset-0-data-0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 3h4m
ocs-deviceset-0-data-1-hwwr7 Bound pvc-db64ec09-81c7-4e53-b91d-f089607a4824 2Ti RWO gp2 101s
ocs-deviceset-1-data-0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 3h4m
ocs-deviceset-1-data-1-zdttb Bound pvc-21243378-5c7a-4df8-8605-d49559a4b01b 2Ti RWO gp2 100s
ocs-deviceset-2-data-0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 3h4m
ocs-deviceset-2-data-1-s469h Bound pvc-64a6d4db-ce5c-4a5c-87b2-3bcde59c902f 2Ti RWO gp2 98s
ocs-deviceset-0-data0-hwzhx Bound pvc-10930547-e0d0-47cf-ba56-d68dbe59d33c 2Ti RWO gp2 3h4m
ocs-deviceset-0-data1-hwwr7 Bound pvc-db64ec09-81c7-4e53-b91d-f089607a4824 2Ti RWO gp2 101s
ocs-deviceset-1-data0-bmpzj Bound pvc-fe3806cc-92f9-4382-8dad-026edae39906 2Ti RWO gp2 3h4m
ocs-deviceset-1-data1-zdttb Bound pvc-21243378-5c7a-4df8-8605-d49559a4b01b 2Ti RWO gp2 100s
ocs-deviceset-2-data0-d6tch Bound pvc-f523ea66-6c0b-4c00-b618-a66129af563b 2Ti RWO gp2 3h4m
ocs-deviceset-2-data1-s469h Bound pvc-64a6d4db-ce5c-4a5c-87b2-3bcde59c902f 2Ti RWO gp2 98s
rook-ceph-mon-a Bound pvc-d4977e7f-8770-45de-bc12-9c213e3d0766 10Gi RWO gp2 3h6m
rook-ceph-mon-b Bound pvc-2df867fc-38ff-4cb1-93fd-b3281f6c5fa2 10Gi RWO gp2 3h6m
rook-ceph-mon-c Bound pvc-b70f812e-7d02-451c-a3fb-66b438a2304b 10Gi RWO gp2 3h6m