How can I run Ceph commands on a Rook Ceph cluster running in Kubernetes / OpenShift? If you use Ceph on Kubernetes with Rook, you'll need a way to access the Ceph command-line tool to troubleshoot issues when they arise. Rook is a CNCF-certified, production-ready, open-source cloud-native storage solution for Kubernetes that eases the management of file, block, and object storage.

The rook-ceph toolbox is a pod with common tools used for Ceph debugging and testing. You can configure Ceph directly when running Rook on Kubernetes by using Ceph's CLI from the rook-ceph toolbox pod. From the toolbox container, you can change Ceph configurations, enable manager modules, create users and pools, and much more.
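For example, once inside the toolbox you could enable a Ceph Manager module, create a pool, and create a client user. The pool and user names below are only illustrative:

ceph mgr module enable pg_autoscaler
ceph osd pool create testpool 32 32
ceph auth get-or-create client.testuser mon 'allow r' osd 'allow rw pool=testpool'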

Run Ceph Toolbox on Kubernetes

The rook toolbox can run as a deployment in a Kubernetes cluster. After you ensure you have a running Kubernetes cluster with rook deployed, launch the rook-ceph-tools pod.

Create Toolbox deployment file:

$ vim toolbox.yaml

Add the following data to the file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:master
        command: ["https://computingforgeeks.com/tini"]
        args: ["-g", "--", "https://computingforgeeks.com/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_ADMIN_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: admin-secret
        volumeMounts:
          - mountPath: /etc/ceph
            name: ceph-config
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      volumes:
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints
        - name: ceph-config
          emptyDir: {}
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 5

Once you save the file, launch the rook-ceph-tools pod:

kubectl create -f toolbox.yaml

Wait for the toolbox pod to pull its container image and reach the Running state:

kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
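If you prefer to block until the pod is ready instead of polling, kubectl wait can do so (the 120s timeout here is just an example):

kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-tools --timeout=120s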

Once the rook-ceph-tools pod is running, you can connect to it with:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
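Once inside the toolbox shell, a quick status check confirms the pod can reach the cluster monitors; type exit to leave the shell when you are done:

# ceph status
# exit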

When you are done with the toolbox, you can remove the deployment:

kubectl -n rook-ceph delete deployment rook-ceph-tools

Run Ceph Toolbox on OpenShift Container Storage (OCS) v4.2

If you're running OpenShift Container Storage, which uses Rook under the hood, first enable the Ceph tools by running the command below.

oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
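You can optionally confirm that the flag was applied before proceeding:

oc get OCSInitialization ocsinit -n openshift-storage -o yaml | grep enableCephTools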

Create a new file:

$ vi toolbox.yaml

Add the following contents to the file you created.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: openshift-storage
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: registry.redhat.io/ocs4/rook-ceph-rhel8-operator:latest
        command: ["https://computingforgeeks.com/tini"]
        args: ["-g", "--", "https://computingforgeeks.com/usr/local/bin/toolbox.sh"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_ADMIN_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: admin-secret
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /dev
            name: dev
          - mountPath: /sys/bus
            name: sysbus
          - mountPath: /lib/modules
            name: libmodules
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      # if hostNetwork: false, the "rbd map" command hangs, see https://github.com/rook/rook/issues/2021
      hostNetwork: true
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: sysbus
          hostPath:
            path: /sys/bus
        - name: libmodules
          hostPath:
            path: /lib/modules
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints

Once you save the file, launch the rook-ceph-tools pod:

oc create -f toolbox.yaml

Wait for the toolbox pod to pull its container image and reach the Running state:

$ oc -n openshift-storage get pod -l "app=rook-ceph-tools"
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-tools-86cbb6dddb-vnht9   1/1     Running   0          6m49s

Once the rook-ceph toolbox pod is running, you can connect to it with:

oc -n openshift-storage exec -it $(oc -n openshift-storage get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

Common Ceph Commands for Troubleshooting

Here are some common commands to troubleshoot a Ceph cluster:

  • ceph status
  • ceph osd status
  • ceph osd df
  • ceph osd utilization
  • ceph osd pool stats
  • ceph osd tree
  • ceph pg stat

All of these commands can be executed from the toolbox container. See the examples below.

# ceph -s
  cluster:
    id:     58a41eac-5550-42a2-b7b2-b97c7909a833
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            1 rack (1 osds) down
            Degraded data redundancy: 91080/273240 objects degraded (33.333%), 80 pgs degraded, 104 pgs undersized
 
  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 2 up (since 2h), 3 in (since 4w)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
 
  task status:
 
  data:
    pools:   10 pools, 104 pgs
    objects: 91.08k objects, 335 GiB
    usage:   670 GiB used, 3.3 TiB / 4.0 TiB avail
    pgs:     91080/273240 objects degraded (33.333%)
             80 active undersized degraded
             24 active undersized
 
  io:
    client:   7.7 KiB/s rd, 24 MiB/s wr, 3 op/s rd, 236 op/s wr
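Since the example status above reports HEALTH_WARN with one OSD down, you can list the specific health checks behind the warning:

# ceph health detail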

Check OSD tree.

#  ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                            STATUS REWEIGHT PRI-AFF 
 -1       5.99698 root default                                                 
 -4       1.99899     rack rack0                                               
 -3       1.99899         host ocs-deviceset-0-0-prf65                         
  0   ssd 1.99899             osd.0                      down  1.00000 1.00000 
-12       1.99899     rack rack1                                               
-11       1.99899         host ocs-deviceset-1-0-mfgmx                         
  2   ssd 1.99899             osd.2                        up  1.00000 1.00000 
 -8       1.99899     rack rack2                                               
 -7       1.99899         host ocs-deviceset-2-0-b96pk                         
  1   ssd 1.99899             osd.1                        up  1.00000 1.00000 
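In the tree above, osd.0 is down. To see which host and devices back a given OSD, query it directly (osd.0 here matches the example output):

# ceph osd find 0
# ceph osd metadata 0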

Get a list of Pools.

# ceph osd lspools
1 ocs-storagecluster-cephblockpool
2 ocs-storagecluster-cephobjectstore.rgw.control
3 ocs-storagecluster-cephfilesystem-metadata
4 ocs-storagecluster-cephobjectstore.rgw.meta
5 ocs-storagecluster-cephfilesystem-data0
6 ocs-storagecluster-cephobjectstore.rgw.log
7 .rgw.root
8 ocs-storagecluster-cephobjectstore.rgw.buckets.index
9 ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
10 ocs-storagecluster-cephobjectstore.rgw.buckets.data
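To see capacity and usage broken down per pool, ceph df gives a quick summary, and you can inspect individual pool settings such as the replica size (the pool name below is taken from the listing above):

# ceph df
# ceph osd pool get ocs-storagecluster-cephblockpool size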

More on Ceph and Kubernetes:

Create a Pool in Ceph Storage Cluster

Ceph Persistent Storage for Kubernetes with Cephfs

Persistent Storage for Kubernetes with Ceph RBD

How To Configure AWS S3 CLI for Ceph Object Gateway Storage