Benchmarking ratarmount

Ratarmount is an excellent tool for mounting archives as filesystems, and I use it a lot. Mostly for union-mounting tar.xz telemetry bundles created by sos report. The ratarmount README suggests to prefer indexed tar.xz archives created using pixz for performance, so let’s see what’s the best compression to use.


TL;DRs

  • On huge archives, tar.gz is always fast and fastest, no optimization required.
  • Best to recompress tar.xz to tar.gz for best performance. Recompressing with pixz yields an improvement, but not as much as gzip.
  • On tiny archives close to the host’s RAM size, performance is hard to predict and may put gzip behind.

The “Backup” use-case

For my test to have a bit of a sizable workload, I pick a reasonably-sized tar.gz, a remnant historical backup of a long-gone server:

-rw-r----- 1 root root 1.7G Aug  6 10:16 example.tar.gz

This is stored on a RAID-1 of 7200 rpm hard drives, which should amplify all seek performance issues. 8 GB RAM, 6 physical CPU cores, 12 threads.

I prepare a list of 1000 random files from the archive that I’ll be reading from the mounted archive.

tar ztf example.tar.gz | egrep -v '(/$|(sys|proc|dev|run))' | shuf | head -1000 > example.list

Now, I mount the tar.gz for my baseline measurement.

umount ./mnt; ratarmount example.tar.gz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m13.974s
user    0m1.178s
sys     0m0.984s

I recompress the tar from gzip to pixz and measure again:

gzip -dc example.tar.gz | pixz > example.tar.pxz
umount ./mnt; ratarmount example.tar.pxz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m57.408s
user    0m1.216s
sys     0m0.904s

Multiple times slower! Back to ratarmount‘s README: “In contrast to bzip2 and gzip compressed files, true seeking on XZ and ZStandard files is only possible at block or frame boundaries.”Are you telling me gzip is not the issue and only naively compressed xz is? A quick recompress using vanilla xz and a comparison of that to the pixz compressed archive:

gzip -dc example.tar.gz | xz --threads=$(nproc) > example.tar.xz
umount ./mnt; ratarmount example.tar.xz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    1m33.549s
user    0m1.109s
sys     0m0.838s

Indeed a noticable, although not huge, penalty compared to pixz. Now that I’m here and wasted this much time, a final measurement using bzip2:

gzip -dc example.tar.gz | bzip2 > example.tar.bz2
umount ./mnt; ratarmount example.tar.bz2 ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m44.410s
user    0m1.301s
sys     0m1.164s

So ratarmount handles bzip2 around the same speed as xz created by pixz.

Gzip is always fastest, even without any special treatment, and I assume this is because multi-threaded rapidgzip literally is ratarmount’s sister project.


The “Telemetry” use-case

Back to my tiny sosreport files in tar.xz format, still on the 7200-rpm HDD system. For consistency, I’ll use the same md5sum benchmark on 1000 archive members as above.

ls -lh sosreport.tar.xz
-rw------- 1 root root 9.3M Aug 14 16:07 sosreport.tar.xz
tar Jtf sosreport.tar.xz | egrep -v '(/$|(sys|proc|dev|run))' | shuf | head -1000 > sosreport.list
umount ./mnt; ratarmount sosreport.tar.xz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m4.979s
user    0m0.940s
sys     0m0.625s

A conversion to pixz:

xz -dc sosreport.tar.xz | pixz > sosreport.tar.pxz
umount ./mnt; ratarmount sosreport.tar.pxz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m6.847s
user    0m0.829s
sys     0m0.553s

And a conversion to tar.gz:

xz -dc sosreport.tar.xz | gzip > sosreport.tar.gz
umount ./mnt; ratarmount sosreport.tar.gz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m13.202s
user    0m0.918s
sys     0m0.598s

gzip is suddenly slower here, and I believe it’s because the file turned out more than 50% larger than the xz versions, both of which are close to the hosts’s RAM size of 8 GB:

-rw-r--r-- 1 root root  14M Aug 14 16:17 sosreport.tar.gz
-rw-r--r-- 1 root root 7.8M Aug 14 16:16 sosreport.tar.pxz
-rw------- 1 root root 9.3M Aug 14 16:07 sosreport.tar.xz

My initial notes on how to install ratarmount in a python virtualenv are documented in Too good to #0013.