+++
title = "Ceph Benchmarking"
date = 2026-02-21
description = "The results of some of my recent ceph benchmarks"
draft = true

categories = ["Homelab"]
tags = ["Homelab", "Ceph"]
+++

|
## Motivation

I have been running a ceph cluster in my homelab for about two years now, but I have never properly benchmarked it, let alone written down my findings or any conclusions.

## Setting up the Benchmark

On a machine that is already part of the ceph cluster:

1. Generate a minimal ceph config using `ceph config generate-minimal-conf`

Set up the benchmark itself:

5. `mount /dev/rbd0 /mnt/bench`

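For readers who have not mapped an RBD image before, a rough sketch of the preparation that leads up to that mount step (the pool name, image name, and size here are illustrative assumptions, not the values I actually used):

```shell
# Illustrative only -- pool/image names and size are assumptions.
ceph osd pool create bench-pool
rbd pool init bench-pool
rbd create bench-pool/bench --size 100G
rbd map bench-pool/bench     # the mapped image typically shows up as /dev/rbd0
mkfs.ext4 /dev/rbd0          # format before mounting
mkdir -p /mnt/bench
```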
## Benchmarks

All benchmarks are run with the same configuration, changing only the access pattern (read/write, random/sequential).

|
Key configuration options are:

- using libaio
- direct I/O
- 1 job

|
{% details(summary="fio config") %}

```
[global]
ioengine=libaio
direct=1
size=4G
numjobs=1
runtime=60s
time_based
startdelay=5s
group_reporting
stonewall

name=write
rw=write
filename=bench

[1io_4k]
iodepth=1
bs=4k
[1io_8k]
iodepth=1
bs=8k
[1io_64k]
iodepth=1
bs=64k
[1io_4M]
iodepth=1
bs=4M

[32io_4k]
iodepth=32
bs=4k
[32io_8k]
iodepth=32
bs=8k
[32io_64k]
iodepth=32
bs=64k
[32io_4M]
iodepth=32
bs=4M
```

{% end %}
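To produce the JSON that the result visualisations are rendered from, fio can be pointed at this config with JSON output enabled. A sketch of the invocation (the file names here are assumptions):

```shell
# Run fio from the mounted benchmark filesystem, writing
# machine-readable results; file names are illustrative.
cd /mnt/bench
fio --output-format=json --output=raw_random_write.json bench.fio
```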
|
|
## Results

{% details(summary="Random Reads") %}
{{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/random_read.json") }}
{% end %}

{% details(summary="Random Writes") %}
{{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/random_write.json") }}
{% end %}

{% details(summary="Sequential Reads") %}
{{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/seq_read.json") }}
{% end %}

{% details(summary="Sequential Writes") %}
{{ fio_benchmark(path="content/ceph-benchmarking/benchmarks/seq_write.json") }}
{% end %}

## Conclusion

1. Overall, I am satisfied with the performance of the cluster for my current use-case.
2. There is a lot of room for improvement in the low queue-depth range.
3. The network is currently not really a limiting factor.
   - None of the nodes in the cluster exceeded 500 MiB/s of TX or RX, so there is plenty of room for growth.
   - The client I used for testing was limited by the network, evident from the fact that the highest speed achieved was ~1.2 GB/s (~10 Gb/s).
4. My smallest node (the embedded EPYC) could be a limiting factor: in some benchmarks it reached 100% CPU usage, while my other nodes never exceeded 40%.
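The network ceiling in point 3 is simple arithmetic: a 10 Gb/s link divided by 8 bits per byte gives 1250 MB/s, which lines up with the ~1.2 GB/s the client reached. A quick sanity check:

```shell
# 10 Gb/s link speed expressed in Mb/s, divided by 8 bits
# per byte, gives the theoretical throughput ceiling in MB/s.
echo "$((10000 / 8)) MB/s"   # prints "1250 MB/s"
```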
|
|
## Extra Details

{% details(summary="Cluster Hardware") %}

- 10 Gb networking between all nodes
- Node
  - Ryzen 5 5500
  - 64GB RAM
  - 4x 480GB enterprise SSD
- Node
  - Ryzen 5 3600
  - 64GB RAM
  - 4x 480GB enterprise SSD
- Node
  - EPYC 3151
  - 64GB RAM
  - 4x 480GB enterprise SSD

{% end %}

{% details(summary="Command to convert raw data into data for visualisation") %}

```bash
jq '[.jobs[] | { iodepth: ."job options".iodepth, bs: ."job options".bs, operations: { iops: .write.iops, bw_bytes: .write.bw_bytes } }]' content/ceph-benchmarking/benchmarks/raw_random_write.json | jq '
'
```

{% end %}
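As a toy illustration of the first jq stage above, here it is run against a minimal stand-in for fio's JSON output (real fio output has many more fields per job; the sample values are made up):

```shell
# Minimal stand-in for fio's --output-format=json structure (values invented).
cat > /tmp/fio_sample.json <<'EOF'
{"jobs":[{"job options":{"iodepth":"1","bs":"4k"},"write":{"iops":2500,"bw_bytes":10240000}}]}
EOF

# Same first-stage filter as above: one compact summary object per job,
# keeping only queue depth, block size, and the write IOPS/bandwidth.
jq -c '[.jobs[] | { iodepth: ."job options".iodepth, bs: ."job options".bs, operations: { iops: .write.iops, bw_bytes: .write.bw_bytes } }]' /tmp/fio_sample.json
# -> [{"iodepth":"1","bs":"4k","operations":{"iops":2500,"bw_bytes":10240000}}]
```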
|
|
## Future Work

- Try directly on the block device
- Try this using xfs instead of ext4
- Try this with and without drive caches