Benchmarking ratarmount

Ratarmount is an excellent tool for mounting archives as filesystems, and I use it a lot, mostly for union-mounting tar.xz telemetry bundles created by sos report. The ratarmount README suggests preferring indexed tar.xz archives created with pixz for performance, so let’s see which compression works best.


TL;DRs

  • On large archives, tar.gz is the fastest format tested, no optimization required.
  • For best performance, recompress tar.xz to tar.gz. Recompressing with pixz yields an improvement over plain xz, but not as much as gzip.
  • On tiny archives that fit into the host’s RAM many times over, performance is hard to predict and may put gzip behind.

The “Backup” use-case

To give my test a reasonably sizable workload, I pick a tar.gz left over from a historical backup of a long-gone server:

-rw-r----- 1 root root 1.7G Aug  6 10:16 example.tar.gz

This is stored on a RAID-1 of 7200 rpm hard drives, which should amplify all seek performance issues. 8 GB RAM, 6 physical CPU cores, 12 threads.

I prepare a list of 1000 random files from the archive that I’ll be reading from the mounted archive.

tar ztf example.tar.gz | grep -Ev '(/$|(sys|proc|dev|run))' | shuf | head -n 1000 > example.list
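
The filter above drops every path that contains sys, proc, dev or run anywhere, so a file under home/developer/ would be excluded as well. A minimal sketch of a stricter variant, assuming GNU grep and shuf; the function names filter_members and sample_members are made up for illustration:

```shell
# Keep only file-like members: drop directories (trailing slash) and any
# path that has sys, proc, dev or run as a whole path component.
filter_members() {
    grep -Ev '(/$|(^|/)(sys|proc|dev|run)(/|$))'
}

# Print up to N random lines from stdin (shuf is part of GNU coreutils).
sample_members() {
    shuf -n "${1:-1000}"
}

# Usage, mirroring the command above:
#   tar ztf example.tar.gz | filter_members | sample_members 1000 > example.list
```

Whether the stricter match is wanted depends on the archive; for a full-system backup it should mostly keep a few more candidate files in the pool.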

Now, I mount the tar.gz for my baseline measurement.

umount ./mnt; ratarmount example.tar.gz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m13.974s
user    0m1.178s
sys     0m0.984s
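
Since the same measurement is repeated for every format, it can be wrapped in a small helper. This is only a sketch, not what was used for the numbers in this post: bench_read is a hypothetical name, and the nanosecond timestamp relies on GNU date (+%s%N).

```shell
# Read every file named in LIST (one path per line, relative to DIR) through
# md5sum and print the elapsed wall-clock time in milliseconds.
bench_read() {
    bench_dir=$1
    bench_list=$2
    bench_start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    while IFS= read -r member; do
        md5sum "$bench_dir/$member" > /dev/null || echo "unreadable: $member" >&2
    done < "$bench_list"
    bench_end=$(date +%s%N)
    echo $(( (bench_end - bench_start) / 1000000 ))
}

# Usage against a mounted archive:
#   umount ./mnt; ratarmount example.tar.gz ./mnt
#   bench_read ./mnt example.list
```

Unlike xargs -I{}, the read loop also copes with member names that contain quotes or spaces. Repeated runs on the same file may partly be served from the page cache, which is worth keeping in mind when comparing numbers.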

I recompress the tar from gzip to pixz and measure again:

gzip -dc example.tar.gz | pixz > example.tar.pxz
umount ./mnt; ratarmount example.tar.pxz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m57.408s
user    0m1.216s
sys     0m0.904s

About four times slower! Back to ratarmount’s README: “In contrast to bzip2 and gzip compressed files, true seeking on XZ and ZStandard files is only possible at block or frame boundaries.” Are you telling me gzip is not the issue and only naively compressed xz is? A quick recompress using vanilla xz, to compare against the pixz-compressed archive:

gzip -dc example.tar.gz | xz --threads=$(nproc) > example.tar.xz
umount ./mnt; ratarmount example.tar.xz ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    1m33.549s
user    0m1.109s
sys     0m0.838s

Indeed a noticeable, although not huge, penalty compared to pixz (about 1.6 times slower). Now that I’m here and have wasted this much time, a final measurement using bzip2:

gzip -dc example.tar.gz | bzip2 > example.tar.bz2
umount ./mnt; ratarmount example.tar.bz2 ./mnt
time xargs -I{} md5sum ./mnt/{} < example.list
...
real    0m44.410s
user    0m1.301s
sys     0m1.164s

So ratarmount handles bzip2 in the same ballpark as xz created by pixz, if somewhat faster.

Gzip is the fastest in this test by a wide margin, even without any special treatment, and I assume this is because multi-threaded rapidgzip literally is ratarmount’s sister project.
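
To put the four wall-clock times from this section side by side, the "real" values can be converted to seconds and divided by the gzip baseline. A small sketch using awk; the helper names to_seconds and slowdown are made up for illustration:

```shell
# Convert the "real" value printed by time (e.g. 1m33.549s) to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Slowdown factor of a run relative to a baseline, both in time's format.
slowdown() {
    awk -v run="$(to_seconds "$1")" -v base="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", run / base }'
}

# Wall-clock times measured above, relative to the tar.gz baseline:
for entry in pixz:0m57.408s xz:1m33.549s bzip2:0m44.410s; do
    printf '%-6s %sx\n' "${entry%%:*}" "$(slowdown "${entry#*:}" 0m13.974s)"
done
# pixz   4.11x
# xz     6.69x
# bzip2  3.18x
```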


The “Telemetry” use-case

Back to my tiny sosreport files in tar.xz format, still on the 7200-rpm HDD system. For consistency, I’ll use the same md5sum benchmark on 1000 archive members as above.

ls -lh sosreport.tar.xz
-rw------- 1 root root 9.3M Aug 14 16:07 sosreport.tar.xz
tar Jtf sosreport.tar.xz | grep -Ev '(/$|(sys|proc|dev|run))' | shuf | head -n 1000 > sosreport.list
umount ./mnt; ratarmount sosreport.tar.xz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m4.979s
user    0m0.940s
sys     0m0.625s

A conversion to pixz:

xz -dc sosreport.tar.xz | pixz > sosreport.tar.pxz
umount ./mnt; ratarmount sosreport.tar.pxz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m6.847s
user    0m0.829s
sys     0m0.553s

And a conversion to tar.gz:

xz -dc sosreport.tar.xz | gzip > sosreport.tar.gz
umount ./mnt; ratarmount sosreport.tar.gz ./mnt
time xargs -I{} md5sum ./mnt/{} < sosreport.list
...
real    0m13.202s
user    0m0.918s
sys     0m0.598s

gzip is suddenly the slowest here, and I believe it’s because the file turned out more than 50% larger than the xz versions, even though all three are tiny compared to the host’s RAM size of 8 GB:

-rw-r--r-- 1 root root  14M Aug 14 16:17 sosreport.tar.gz
-rw-r--r-- 1 root root 7.8M Aug 14 16:16 sosreport.tar.pxz
-rw------- 1 root root 9.3M Aug 14 16:07 sosreport.tar.xz
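
The size difference can be worked out from the listing. A quick sketch; pct_larger is a made-up helper, and the inputs are the rounded sizes printed by ls -lh, so the results are approximate:

```shell
# Percentage by which size A exceeds size B (same unit for both).
pct_larger() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a / b - 1) * 100 }'
}

pct_larger 14 9.3   # tar.gz vs. tar.xz  -> 51
pct_larger 14 7.8   # tar.gz vs. tar.pxz -> 79

# For exact byte counts, GNU stat can supply the inputs:
#   pct_larger "$(stat -c %s sosreport.tar.gz)" "$(stat -c %s sosreport.tar.xz)"
```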

My initial notes on how to install ratarmount in a python virtualenv are documented in Too good to #0013.

Too good to #0013

In this installment:

  • Pipewire Easyeffects with RNNoise on Debian 12 / Bookworm
  • Gnome Night Light from Sunrise to Sunset, without Location Services
  • Revisiting ratarmount

Pipewire Easyeffects with RNNoise on Debian 12 / Bookworm

RNNoise for removing background sound on your microphone is not included in Debian 12, due to licensing issues around its training data. The least painful way to work around this is to install the Easyeffects Flatpak instead of the packaged easyeffects:

sudo apt-get install flatpak gnome-software-plugin-flatpak
sudo apt-get remove easyeffects
flatpak remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
flatpak install flathub com.github.wwmm.easyeffects

Gnome Night Light from Sunrise to Sunset, without Location Services

Set latitude and longitude manually (coordinates shown are for Frankfurt, Germany: 50° 6′ 38″ N, 8° 40′ 56″ E):

gsettings set \
  org.gnome.settings-daemon.plugins.color \
  night-light-last-coordinates '(50.0, 9.0)'
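
The setting takes decimal degrees, while coordinates are often published in degrees, minutes and seconds. A small conversion sketch; dms_to_decimal is a made-up helper and only covers northern latitudes and eastern longitudes (negate the result for S and W):

```shell
# Convert degrees, minutes, seconds to decimal degrees.
dms_to_decimal() {
    awk -v d="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.4f\n", d + m / 60 + s / 3600 }'
}

dms_to_decimal 50 6 38    # latitude  -> 50.1106
dms_to_decimal 8 40 56    # longitude -> 8.6822
```

The command above uses values rounded to whole degrees. If more precision is wanted, '(50.1106, 8.6822)' should work in the same place.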

Revisiting ratarmount

Since my previous look at ratarmount, it went from 0.6.3 to 1.0.0, saw some very interesting developments, and now mounts archives via HTTP:

virtualenv ~/.local/ratarmount
~/.local/ratarmount/bin/pip3 install -U 'ratarmount[fsspec]'
~/.local/ratarmount/bin/ratarmount
install -D ~/.local/ratarmount/bin/ratarmount ~/bin/

ratarmount -f https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.13.4.tar.xz ~/mnt