diff --git a/content/ceph-benchmarking/index.md b/content/ceph-benchmarking/index.md
index ac06270..342c403 100644
--- a/content/ceph-benchmarking/index.md
+++ b/content/ceph-benchmarking/index.md
@@ -1,6 +1,6 @@
 +++
 title = "Ceph Benchmarking"
-date = 2026-03-01
+date = 2026-02-21
 description = "The results of some of my recent ceph benchmarks"
 draft = true
 
@@ -9,6 +9,9 @@
 categories = ["Homelab"]
 tags = ["Homelab", "Ceph"]
 +++
+## Motivation
+I have been running a Ceph cluster in my homelab for about two years now, but I have never properly benchmarked it, let alone written down my findings or any conclusions.
+
 ## Setup everything for the Benchmarking
 On a machine in the ceph-cluster already:
 1. Generate a minimal ceph config using `ceph config generate-minimal-conf`
@@ -39,33 +42,100 @@ Setup the benchmark itself:
 5. `mount /dev/rbd0 /mnt/bench`
 
 ## Benchmarks
+All benchmarks are run with the same configuration, only changing the access pattern (read/write, random/sequential).
+Key configuration options are:
+- using libaio
+- direct I/O
+- 1 job
+{% details(summary="fio config") %}
+```
+[global]
+ioengine=libaio
+direct=1
+size=4G
+numjobs=1
+runtime=60s
+time_based
+startdelay=5s
+group_reporting
+stonewall
+
+name=write
+rw=write
+filename=bench
+
+[1io_4k]
+iodepth=1
+bs=4k
+[1io_8k]
+iodepth=1
+bs=8k
+[1io_64k]
+iodepth=1
+bs=64k
+[1io_4M]
+iodepth=1
+bs=4M
+
+[32io_4k]
+iodepth=32
+bs=4k
+[32io_8k]
+iodepth=32
+bs=8k
+[32io_64k]
+iodepth=32
+bs=64k
+[32io_4M]
+iodepth=32
+bs=4M
+```
+{% end %}
 
 ## Results
-{% details(summary="Random Reads - 1 Job") %}
+{% details(summary="Random Reads") %}
 {{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/random_read.json") }}
 {% end %}
 
-{% details(summary="Random Writes - 1 Job") %}
+{% details(summary="Random Writes") %}
 {{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/random_write.json") }}
 {% end %}
 
-{% details(summary="Sequential Reads - 1 Job") %}
+{% details(summary="Sequential Reads") %}
 {{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/seq_read.json") }}
 {% end %}
 
-{% details(summary="Sequential Writes - 1 Job") %}
+{% details(summary="Sequential Writes") %}
 {{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/seq_write.json") }}
 {% end %}
 
-## TODO
-- Try directly on the block device
-- Try this using xfs instead of ext4
-- Try this with and without drive caches
+## Conclusion
+1. Overall I am satisfied with the performance of the cluster for my current use case.
+2. There is a lot of room for improvement in the low queue-depth range.
+3. The network is currently not a limiting factor:
+   - None of the nodes in the cluster exceeded 500 MiB/s of TX or RX, so there is plenty of room for growth.
+   - The client I used for testing was limited by the network, evident from the fact that the highest speed achieved was ~1.2 GB/s (~10 Gb/s).
+4. My smallest node (the embedded EPYC) could be the limiting factor: in some benchmarks it reached 100% CPU usage, while my other nodes never exceeded 40%.
+## Extra Details
+{% details(summary="Cluster Hardware") %}
+- 10 Gb networking between all nodes
+- Node
+  - Ryzen 5 5500
+  - 64 GB RAM
+  - 4x 480 GB enterprise SSD
+- Node
+  - Ryzen 5 3600
+  - 64 GB RAM
+  - 4x 480 GB enterprise SSD
+- Node
+  - EPYC 3151
+  - 64 GB RAM
+  - 4x 480 GB enterprise SSD
+{% end %}
 
 
-## Details
-{% details(summary="Command to convert raw data into vis data") %}
+{% details(summary="Command to convert raw data into data for visualisation") %}
 ```bash
 jq '[.jobs[] | { iodepth: ."job options".iodepth, bs: ."job options".bs, operations: { iops: .write.iops, bw_bytes: .write.bw_bytes } }]' content/ceph-benchmarking/benchmarks/raw_random_write.json | jq '
@@ -96,3 +166,8 @@ jq '[.jobs[] | { iodepth: ."job options".iodepth, bs: ."job options".bs, operati
 '
 ```
 {% end %}
+
+## Future Work
+- Try directly on the block device
+- Try this using xfs instead of ext4
+- Try this with and without drive caches
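
The network claims in the post's conclusion can be sanity-checked with plain shell arithmetic. This sketch takes the two peak figures stated in the post (~1.2 GB/s at the client, under 500 MiB/s per node) and converts them to link terms; the exact byte values are assumed round numbers for illustration:

```shell
#!/bin/sh
# Client peak: fio reports bandwidth in bytes/s; ~1.2 GB/s (decimal)
# converted to Gb/s shows the client sitting near the 10 Gb line rate.
bw_bytes=1200000000
echo "client peak ~ $(( bw_bytes * 8 / 1000000000 )) Gb/s of a 10 Gb link"

# Per-node peak: under 500 MiB/s of TX/RX. Scale by 10 before the integer
# division to keep one decimal place in pure POSIX arithmetic.
node_gbit10=$(( 500 * 1024 * 1024 * 8 * 10 / 1000000000 ))
echo "node peak  < ${node_gbit10%?}.${node_gbit10#?} Gb/s of a 10 Gb link"
```

This supports both conclusions: the client peak works out to roughly 9.6 Gb/s (saturating its 10 Gb NIC), while each node stays below half the link capacity.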