Manage storage
Expand Rook-Ceph storage
https://rook.io/docs/rook/v1.9/ceph-osd-mgmt.html
Initially, each agent node in the Kubernetes cluster has 2 volumes of 100 GB each, for 6 volumes in total. You can increase this to 3 volumes per node, or any other number, by changing numberOfVolumesPerAgent in the Pulumi code (the snippet below already shows it set to 3). Unless you change the code further, the same number of volumes is always added to every agent node.
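// Attach the same number of 100 GB Hetzner Cloud volumes to every agent node.
foreach (var agent in k8sAgentServer)
{
    int numberOfVolumesPerAgent = 3; // <-- UPDATE THIS NUMBER TO ADD MORE VOLUMES
    for (var i = 1; i <= numberOfVolumesPerAgent; i++)
    {
        // Volume names follow the pattern <agent>_volume_<i>, e.g. agent1_volume_3.
        Output.Format($"{agent.Name}_volume_{i}").Apply(volumeName =>
        {
            var volume = new HCloud.Volume(volumeName, new()
            {
                Name = volumeName,
                Size = 100,                           // size in GB
                ServerId = agent.Id.Apply(int.Parse), // attach the volume to this agent
            });
            return Task.CompletedTask;
        });
    }
}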
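As a quick sanity check of the arithmetic above, the following bash sketch computes the raw capacity the agent volumes contribute. The helper names are hypothetical, the 3 agents are inferred from the 6 initial volumes and the agent1 to agent3 names in the preview output, and the usable estimate assumes a replicated pool with 3 copies (the size used in Rook's example block pool). Check the replicated size of your own CephBlockPool, and expect somewhat less usable space in practice because Ceph stops accepting writes before the disks are completely full.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating Ceph capacity from the Pulumi settings.

# Raw capacity in GB: agents x volumes per agent x volume size.
raw_capacity_gb() {
  local agents="$1" volumes_per_agent="$2" volume_size_gb="$3"
  echo $(( agents * volumes_per_agent * volume_size_gb ))
}

# Approximate usable capacity of a replicated pool (integer GB, assumption).
usable_capacity_gb() {
  local raw_gb="$1" replicas="$2"
  echo $(( raw_gb / replicas ))
}

raw_capacity_gb 3 2 100                            # 600 (default: 2 volumes per agent)
raw_capacity_gb 3 3 100                            # 900 (after raising it to 3)
usable_capacity_gb "$(raw_capacity_gb 3 3 100)" 3  # 300, assuming 3 replicas
```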
Make sure you run pulumi preview before applying the update. When going from the default 2 volumes to 3 volumes per node, the output should look like this:
Previewing update (production):

     Type                    Name                Plan
     pulumi:pulumi:Stack     hetzner-production
 +   ├─ hcloud:index:Volume  agent1_volume_3     create
 +   ├─ hcloud:index:Volume  agent2_volume_3     create
 +   └─ hcloud:index:Volume  agent3_volume_3     create

Resources:
    + 3 to create
    38 unchanged
Run pulumi up to apply the changes.
For the new volumes to be turned into OSDs in the Ceph cluster, the Rook operator has to be restarted. Delete the operator pod; its Deployment recreates it automatically:
kubectl -n rook-ceph delete pod $(kubectl -n rook-ceph get pod -l "app=rook-ceph-operator" -o jsonpath='{.items[0].metadata.name}')
Confirm that the new OSDs have been added to the cluster and are up. This may take a few minutes.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
Confirm that the Ceph cluster is healthy (the output should show health: HEALTH_OK) by running:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
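Instead of re-running ceph status by hand, you can poll until the cluster reports HEALTH_OK. This is a bash sketch: ceph_tools and wait_for_health_ok are hypothetical helper names, and the -it flags are dropped because a script needs no interactive terminal. It only checks overall health, so still use ceph osd tree to confirm that every expected OSD is up.

```shell
#!/usr/bin/env bash
# Run a ceph command in the Rook toolbox. No -it: a script needs no terminal.
ceph_tools() {
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph "$@"
}

# Poll `ceph health` until its output starts with HEALTH_OK.
# $1: seconds between polls (default 10); $2: maximum number of polls (default 60).
# Returns 0 once healthy, 1 if the cluster is still not healthy after all polls.
wait_for_health_ok() {
  local interval="${1:-10}" attempts="${2:-60}" i status
  for (( i = 1; i <= attempts; i++ )); do
    status="$(ceph_tools health 2>/dev/null)"
    if [[ "$status" == HEALTH_OK* ]]; then
      echo "Ceph is healthy: $status"
      return 0
    fi
    echo "Poll $i/$attempts: ${status:-no response}"
    (( i < attempts )) && sleep "$interval"
  done
  echo "Ceph did not reach HEALTH_OK" >&2
  return 1
}

# Example: poll every 10 seconds for up to 10 minutes.
# wait_for_health_ok 10 60
```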
Expanding PVCs
Expanding a PVC is easy, but shrinking is not supported, so choose the new size carefully.
As an example, let's expand the PVC of the test whoami workload.
Update the file k8s/whoami/whoami.yaml as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: busybox-logs
  namespace: whoami
spec:
  storageClassName: ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi # <- UPDATE THIS VALUE FROM 50Mi to 100Mi
Apply the updated manifest with kubectl:
envsubst < k8s/whoami/whoami.yaml | kubectl apply --wait -f -
Expansion will happen automatically after a few seconds.
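If you would rather not edit the manifest, the same expansion can be done with kubectl patch. This bash sketch uses hypothetical helper names, and it only works if the ceph-block StorageClass has allowVolumeExpansion set to true. Keep k8s/whoami/whoami.yaml in sync afterwards: re-applying the file with the old 50Mi value would be rejected, because a PVC's requested size cannot be decreased.

```shell
#!/usr/bin/env bash
# Build a merge patch that sets a PVC's requested storage size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Expand a PVC in place. Sizes can only grow; a smaller value is rejected.
expand_pvc() {
  local namespace="$1" pvc="$2" size="$3"
  kubectl -n "$namespace" patch pvc "$pvc" --type merge -p "$(pvc_resize_patch "$size")"
}

# Usage:
# kubectl get storageclass ceph-block -o jsonpath='{.allowVolumeExpansion}'   # should print true
# expand_pvc whoami busybox-logs 100Mi
# kubectl -n whoami get pvc busybox-logs -o jsonpath='{.status.capacity.storage}'
```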
Shrink Rook-Ceph storage
Ensure your new configuration has enough space to hold all the data.
Shrinking storage is a complex operation and can lead to data loss if not done properly.
Back up all important data before proceeding.
Also ensure that the number of volumes per node never drops below 2.
Always remove one volume per node at a time, and wait for the Ceph cluster to stabilize before continuing.
For details, see https://rook.io/docs/rook/v1.9/ceph-osd-mgmt.html#remove-an-osd
# Stop Rook operator
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
Run the following sequence of commands for one OSD at a time, and wait for the Ceph cluster to stabilize before removing the next one.
OSD IDs are zero-based: the initial 6 OSDs are numbered 0 to 5. If you are removing the volumes that were added last (for example the third volume on each node), their OSD IDs will be 6, 7, 8, and so on.
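The ID arithmetic can be written as a small bash function. It is a hypothetical helper that assumes Ceph assigned IDs sequentially, one per agent, each time a batch of volumes was added, and that no freed IDs were reused. It returns the set of IDs for a volume number, not which ID sits on which node, and it says nothing reliable about how 0 to 5 map to the initial volumes. Always confirm against ceph osd tree, which lists every OSD under its host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OSD IDs expected for the Nth volume added to every agent,
# e.g. the OSDs backing agent1_volume_3, agent2_volume_3 and agent3_volume_3.
# Assumes sequential, never-reused IDs with one new OSD per agent per batch.
osd_ids_for_volume_number() {
  local agents="$1" volume_number="$2" first id ids=()
  first=$(( agents * (volume_number - 1) ))
  for (( id = first; id < first + agents; id++ )); do
    ids+=("$id")
  done
  echo "${ids[*]}"
}

osd_ids_for_volume_number 3 3   # 6 7 8
osd_ids_for_volume_number 3 4   # 9 10 11
```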
# Stop OSD deployment
kubectl -n rook-ceph scale deployment rook-ceph-osd-{ID} --replicas=0
# Mark OSD as down (may report that the OSD is already down)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd down osd.{ID}
# Mark OSD as out
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd out osd.{ID}
# Wait for Ceph to finish backfilling to other OSDs (repeat until all PGs are active+clean)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
# Purge the OSD
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd purge {ID} --yes-i-really-mean-it
# Confirm it is removed
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
# Remove OSD deployment
kubectl delete deployment -n rook-ceph rook-ceph-osd-{ID}
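The per-OSD sequence above can be wrapped in a bash function so that every step runs in order. This is a sketch with hypothetical names (run, remove_osd, TOOLS, DRY_RUN), and it still expects the Rook operator to be stopped first. It swaps the manual ceph status check for ceph osd safe-to-destroy, which only succeeds once destroying the OSD would not reduce data durability. Try it with DRY_RUN=1 to print the commands without running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# ceph inside the Rook toolbox; no -it, since a script needs no terminal.
TOOLS=(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph)

# Remove one OSD, following the manual sequence step by step.
remove_osd() {
  local id="$1"
  run kubectl -n rook-ceph scale deployment "rook-ceph-osd-${id}" --replicas=0
  run "${TOOLS[@]}" osd down "osd.${id}"
  run "${TOOLS[@]}" osd out "osd.${id}"
  # Block until Ceph confirms the OSD's data has been moved elsewhere.
  until run "${TOOLS[@]}" osd safe-to-destroy "osd.${id}"; do
    echo "osd.${id} is not yet safe to destroy, waiting..."
    sleep 30
  done
  run "${TOOLS[@]}" osd purge "${id}" --yes-i-really-mean-it
  run "${TOOLS[@]}" osd tree
  run kubectl -n rook-ceph delete deployment "rook-ceph-osd-${id}"
}

# Usage: DRY_RUN=1 remove_osd 6   (print only)
#        remove_osd 6             (execute)
```

After each removal, wait for the cluster to return to HEALTH_OK before removing the next OSD.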
Update the Pulumi code to reduce the number of volumes. Remove only 1 volume per node per session so the Ceph cluster has time to stabilize.
int numberOfVolumesPerAgent = 2;
Update the infrastructure
# Preview changes
pulumi preview
# Apply updates
pulumi up
Start Rook operator
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1