For those unfamiliar with the nuts and bolts of ZFS, one of its distinguishing features is the use of the ARC (Adaptive Replacement Cache) algorithm for its read cache. Standard filesystem LRU (Least Recently Used) caches, used in NTFS, ext4, XFS, HFS+, APFS, and pretty much anything else you've ever heard of, will readily evict "hot" (frequently accessed) storage blocks if large volumes of data are read once.
For the last three months I have been working on getting L2ARC persistence to work in ZFSonLinux.

This work is based on previous work by Saso Kiselkov (@skiselkov) in Illumos (https://www.illumos.org/issues/3525), which was later ported by Yuxuan Shui (@yshui) to ZoL (https://github.com/zfsonlinux/zfs/pull/2672), subsequently modified by Jorgen Lundman (@lundman), and rebased with multiple additions and changes by me (@gamanakis).

The end result is in: https://github.com/zfsonlinux/zfs/pull/9582
By contrast, each time a block is re-read within the ARC, it becomes more heavily prioritized and more difficult to push out of cache as new data is read in. The ARC also tracks recently evicted blocks, so if a block keeps getting read back into cache after eviction, this too will make it harder to evict. This leads to much higher cache hit rates, and therefore lower latencies and more throughput and IOPS available from the actual disks, for many real-world workloads.
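The contrast can be sketched with a toy model. The `LRUCache` and `TwoListCache` classes below are hypothetical illustrations, not ZFS code: the two-list cache captures only the recency-plus-frequency idea, while the real ARC additionally keeps ghost lists of evicted headers and adaptively resizes the two lists.

```python
from collections import OrderedDict

class LRUCache:
    """Plain LRU: a large one-shot scan evicts even frequently read blocks."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def read(self, block):
        hit = block in self.data
        if hit:
            self.data.move_to_end(block)        # refresh recency
        else:
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)   # evict least recently used
            self.data[block] = True
        return hit

class TwoListCache:
    """Toy ARC-flavored cache: blocks read once sit in a 'recent' list;
    blocks read again are promoted to a 'frequent' list that one-shot
    scan traffic cannot displace."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()     # seen once
        self.frequent = OrderedDict()   # seen more than once

    def read(self, block):
        if block in self.frequent:
            self.frequent.move_to_end(block)
            return True
        if block in self.recent:
            del self.recent[block]              # promote on second hit
            self.frequent[block] = True
            return True
        # Miss: evict from the recent list first, so streaming reads
        # only ever push out other streaming reads.
        if len(self.recent) + len(self.frequent) >= self.capacity:
            victim = self.recent if self.recent else self.frequent
            victim.popitem(last=False)
        self.recent[block] = True
        return False

# A hot block is read repeatedly, then a large scan reads 100 blocks once.
for cls in (LRUCache, TwoListCache):
    cache = cls(capacity=4)
    for _ in range(3):
        cache.read("hot")
    for i in range(100):                        # one-shot scan
        cache.read(f"scan-{i}")
    print(cls.__name__, "kept hot block:", cache.read("hot"))
```

Running this, the plain LRU cache loses the hot block to the scan, while the two-list cache keeps it in its frequent list throughout.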
The primary ARC is kept in system RAM, but an L2ARC (Level 2 Adaptive Replacement Cache) device can be created from one or more fast disks. In a ZFS pool with one or more L2ARC devices, when blocks are evicted from the primary ARC in RAM, they are moved down to the L2ARC rather than being thrown away entirely. In the past, this feature has been of limited value, both because indexing a large L2ARC consumes system RAM which could have been better used for primary ARC, and because the L2ARC was not persistent across reboots.
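For readers who have not set one up, attaching an L2ARC device uses the standard `zpool` cache-vdev syntax. This is a CLI fragment, not a runnable script: the pool name `tank` and the device path are placeholders for your own pool and SSD.

```shell
# Add a fast SSD as an L2ARC (cache) device to an existing pool.
zpool add tank cache /dev/nvme0n1

# Inspect per-vdev activity; cache devices appear in their own section.
zpool iostat -v tank

# Cache devices can be removed again at any time without data loss.
zpool remove tank /dev/nvme0n1
```

Because the L2ARC holds only copies of blocks that also live in the pool, losing or removing a cache device never endangers the pool's data.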
The issue of the L2ARC index consuming too much system RAM was largely mitigated several years ago, when the L2ARC header (the part of each stored record that must be kept in RAM) was reduced from 180 bytes to 70 bytes. For a 1TiB L2ARC serving only datasets with the default 128KiB recordsize, this works out to 560MiB of RAM consumed to index the L2ARC.
Although the RAM-consumption problem is largely resolved, the value of a large, fast L2ARC was still sharply limited by its lack of persistence. After each system reboot (or other export of the pool), the L2ARC emptied. Amanakis' code fixes that, meaning that many gigabytes of data cached on fast solid-state devices will still be available after a system reboot, thereby increasing the value of an L2ARC device. At first blush, this seems mostly notable for personal systems that get rebooted often, but it also means even heavily loaded servers may need far less "babying" while they warm up their caches after a reboot.
This code has not yet been merged into master, but Brian Behlendorf, Linux platform lead of the OpenZFS project, has signed off on it, and it's awaiting another code review before merge into master, which is expected to happen sometime in the next few weeks if nothing goes wrong in further review or initial evaluation.