<!-- == Subpage Table of Contents == -->
'''Subpage Table of Contents'''
{{Special:PrefixIndex/{{PAGENAME}}/}}
<br/>
== Ceph ==
== Hardware Recommendations ==
Hardware Recommendations — Ceph Documentation
https://docs.ceph.com/en/quincy/start/hardware-recommendations/
== Status ==
 ceph status
 # OR: ceph -s
Example:
<pre>
# ceph status
  cluster:
    id:     ff74f760-84b2-4dc4-b518-8408e3f10779
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum vm-05,vm-06,vm-07 (age 12m)
    mgr: vm-07(active, since 47m), standbys: vm-06, vm-05
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4m), 3 in (since 4m)
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 3.68k objects, 13 GiB
    usage:   38 GiB used, 3.7 TiB / 3.7 TiB avail
    pgs:     97 active+clean
  io:
    client:   107 KiB/s rd, 4.0 KiB/s wr, 0 op/s rd, 0 op/s wr
</pre>
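The same status is also available in machine-readable form via the ceph CLI's standard --format flag:
 ceph status --format json-pretty
 # short form:
 ceph -s -f json-pretty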
== Health ==
Health summary:
 ceph health
 # good health:
 HEALTH_OK
 # bad health:
 HEALTH_WARN Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 54 slow ops, oldest one blocked for 212 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.lmt-vm-05] have slow ops.
Health details:
 ceph health detail
 # good health:
 HEALTH_OK
<pre>
# bad health:
HEALTH_WARN 1 osds down; 1 host (1 osds) down; Reduced data availability: 47 pgs inactive, 47 pgs peering; 47 pgs not deep-scrubbed in time; 47 pgs not scrubbed in time; 49 slow ops, oldest one blocked for 306 sec, daemons [osd.0,osd.1,osd.2,osd.5,osd.9,mon.prox-05] have slow ops.
[WRN] OSD_DOWN: 1 osds down
    osd.5 (root=default,host=prox-06) is down
[WRN] OSD_HOST_DOWN: 1 host (1 osds) down
    host prox-06 (root=default) (1 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 47 pgs inactive, 47 pgs peering
    pg 3.0 is stuck peering for 6m, current state peering, last acting [3,5,4]
    pg 3.3 is stuck peering for 7w, current state peering, last acting [5,1,0]
    ...
</pre>
== Watch ==
Watch live changes:
 ceph -w
== OSD ==
=== List OSDs ===
==== volume lvm list ====
Note: this only lists OSDs on the local host.
 ceph-volume lvm list
Example:
<pre>
====== osd.0 =======
  [block]       /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7
      block device              /dev/ceph-64fda9eb-2342-43e3-bc3e-78e5c1bcda31/osd-block-ff991dbd-7698-44ab-ad90-102340ec05c7
      block uuid                uvsm7p-c9KU-iaVe-GJGv-NBRM-xGrr-XPf3eB
      cephx lockbox secret
      cluster fsid              ff74f760-84b2-4dc4-b518-8408e3f10779
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  ff991dbd-7698-44ab-ad90-102340ec05c7
      osd id                    0
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/fioa
</pre>
<ref>https://docs.ceph.com/en/quincy/ceph-volume/lvm/list/</ref>
==== osd tree ====
 ceph osd tree
Example:
<pre>
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         3.69246  root default
-3         1.09589      host vm-05
 0    ssd  1.09589          osd.0       up   1.00000  1.00000
-7         1.09589      host vm-06
 2    ssd  1.09589          osd.2     down         0  1.00000
-5         1.50069      host vm-07
 1    ssd  1.50069          osd.1       up   1.00000  1.00000
</pre>
List only the OSDs that are down: <ref>https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/</ref>
 ceph osd tree down
==== osd stat ====
 ceph osd stat
==== osd dump ====
 ceph osd dump
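==== osd ls / osd find ====
List just the OSD IDs, or locate a specific OSD in the CRUSH hierarchy (osd.0 below is only an example ID):
 # list all OSD IDs known to the cluster
 ceph osd ls
 # show which host / CRUSH location osd.0 belongs to
 ceph osd find 0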
=== Mark OSD In ===
 ceph osd in [OSD-NUM]
=== Mark OSD Out ===
 ceph osd out [OSD-NUM]
Note: in/out controls whether the OSD participates in data placement; whether the daemon is actually running is reported separately as up/down (see the osd tree output above).
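For example, to take osd.2 out of data placement temporarily and bring it back afterwards (osd.2 is just an example ID):
 ceph osd out osd.2
 ceph osd in osd.2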
=== Delete OSD ===
First mark it out:
 ceph osd out osd.{osd-num}
Mark it down:
 ceph osd down osd.{osd-num}
Remove it:
 ceph osd rm osd.{osd-num}
Check the tree to confirm removal:
 ceph osd tree
----
If you get an error that the OSD is still busy: <ref>https://medium.com/@george.shuklin/how-to-remove-osd-from-ceph-cluster-b4c37cc0ec87</ref>
Go to the host that has the OSD and stop the service:
 systemctl stop ceph-osd@{osd-num}
Remove it again:
 ceph osd rm osd.{osd-num}
Check the tree to confirm removal:
 ceph osd tree
If 'ceph osd tree' reports 'DNE' (does not exist), then do the following...
Remove it from the CRUSH map:
 ceph osd crush rm osd.{osd-num}
Clear its auth entry:
 ceph auth del osd.{osd-num}
ref: <ref>Adding/Removing OSDs — Ceph Documentation - https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/</ref>
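Putting the steps above together for a single OSD (osd.5 is a hypothetical ID; the systemctl step runs on the host that owns the OSD):
<pre>
# from an admin/monitor node
ceph osd out osd.5
ceph osd down osd.5
# on the host that owns osd.5
systemctl stop ceph-osd@5
# back on the admin node
ceph osd rm osd.5
ceph osd crush rm osd.5
ceph auth del osd.5
ceph osd tree    # confirm the OSD is gone
</pre>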
=== Create OSD ===
Create the OSD:<ref>https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_osd_create</ref>
 pveceph osd create /dev/sd[X]
If the disk was in use before (for example, for ZFS or as an OSD), you first need to zap all traces of that usage:
 ceph-volume lvm zap /dev/sd[X] --destroy
Create an OSD ID:
 ceph osd create
 # will generate the next ID in sequence
Create the data directory and mount the disk on it:
 mkdir /var/lib/ceph/osd/ceph-{osd-num}
 mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/ceph-{osd-num}
Initialize the data directory:
 ceph-osd -i {osd-num} --mkfs --mkkey
Register the OSD's key:
 ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-{osd-num}/keyring
Add it to the CRUSH map:
 ceph osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
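For example, with hypothetical values (OSD 3, a weight of 1.0, placed under the host bucket vm-06):
 ceph osd crush add osd.3 1.0 host=vm-06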
== File System ==
 ceph fs ls
Get fsid for mount:
 sudo ceph fsid
 sudo mount.ceph client.admin@8fc87072-5946-466f-a10a-6fa9bd6fa925.cephfs=/ /mnt
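A minimal sketch tying the two steps together (assumes the file system is named cephfs and that ceph.conf plus the client.admin keyring are available under /etc/ceph, so mount.ceph can fill in the monitors and secret):
 mkdir -p /mnt/cephfs
 mount -t ceph client.admin@$(ceph fsid).cephfs=/ /mnt/cephfs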
== Pool ==
=== Pool Stats ===
 ceph osd pool stats
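Related commands for pool and usage information (both are standard ceph subcommands):
 # list pools together with their settings (size, pg_num, ...)
 ceph osd pool ls detail
 # cluster-wide and per-pool space usage
 ceph df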
== Mounting Ceph ==
Mount CephFS using Kernel Driver — Ceph Documentation
https://docs.ceph.com/en/quincy/cephfs/mount-using-kernel-driver/
 mount -t ceph {device-string}={path-to-mounted} {mount-point} -o {key-value-args} {other-args}
 mkdir /mnt/mycephfs
 mount -t ceph <name>@<fsid>.<fs_name>=/ /mnt/mycephfs
name is the username of the CephX user used to mount CephFS, fsid is the FSID of the Ceph cluster (see the ceph fsid command), and fs_name is the file system to mount.
Example:
 mount -t ceph cephuser@b3acfc0d-575f-41d3-9c91-0e7ed3dbb3fa.cephfs=/ /mnt/mycephfs -o mon_addr=192.168.0.1:6789,secret=AQATSKdNGBnwLhAAnNDKnH65FmVKpXZJVasUeQ==
When using the mount helper, the monitor hosts and FSID are optional; mount.ceph figures out these details automatically by finding and reading the Ceph conf file, e.g.:
 mount -t ceph cephuser@.cephfs=/ /mnt/mycephfs -o secret=AQATSKdNGBnwLhAAnNDKnH65FmVKpXZJVasUeQ==
Note that the dot (.) still needs to be part of the device string.
A potential problem with the above command is that the secret key is left in your shell's command history. To prevent that, copy the secret key into a file and pass the file with the secretfile option instead of secret:
 mount -t ceph cephuser@.cephfs=/ /mnt/mycephfs -o secretfile=/etc/ceph/cephuser.secret
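To make the mount persistent across reboots, the same device string can be used in /etc/fstab; a minimal sketch assuming the secretfile setup above:
<pre>
# /etc/fstab
cephuser@.cephfs=/    /mnt/mycephfs    ceph    secretfile=/etc/ceph/cephuser.secret,noatime,_netdev    0 0
</pre>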
== References ==
{{references}}
== keywords ==