ZFS Quick Guide

The ZFS file system is used widely on BSD, and is coming into more use on Linux. Following are some of my notes on it.

Initial Setup (FreeBSD)

Start ZFS service

FreeBSD comes with the ZFS service installed, but not active. We need to start the service, and also tell the system to start it when the system reboots.

echo 'zfs_enable="YES"' >> /etc/rc.conf
service zfs start
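
Because >> appends unconditionally, running the echo line a second time leaves two copies of the setting in /etc/rc.conf. Below is a minimal sketch of a guard against that. The helper name enable_in_rc_conf is made up for illustration and is not part of FreeBSD; FreeBSD's own sysrc utility (sysrc zfs_enable=YES) covers the same need natively.

```shell
# Hypothetical helper: append NAME="YES" to an rc.conf-style file only if
# no line for NAME exists yet. It does not change an existing NAME="NO".
# Usage: enable_in_rc_conf FILE NAME
enable_in_rc_conf() {
    file=$1
    name=$2
    # -q: no output, -s: stay quiet if the file does not exist yet
    if ! grep -qs "^${name}=" "$file"; then
        printf '%s="YES"\n' "$name" >> "$file"
    fi
}

# Example (as root): enable_in_rc_conf /etc/rc.conf zfs_enable
```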

Create a zpool

Now that we have ZFS running, we'll create a zpool, the basic container for all of our stuff. In this case, I want to use raidz2 for redundancy (two drives' worth of space is used for parity, so the pool survives the loss of any two drives). Since I don't know the correct names for everything, I'll egrep /var/run/dmesg.boot to find them.

# find the drives on the system
egrep 'da[0-9]|cd[0-9]' /var/run/dmesg.boot | sort
# we want raidz2 (similar to RAID-6), name it storage, and use /dev/da0 through /dev/da7
zpool create -f storage raidz2 /dev/da[01234567]
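
As a rough worked example of what raidz2 costs in capacity: two drives' worth of space goes to parity, so usable space is approximately (drives - 2) times the drive size. The helper name and the 4 TB drive size below are illustrative assumptions, and real usable space will be somewhat lower because of metadata and padding overhead.

```shell
# Hypothetical helper: approximate usable capacity of a single raidz2 vdev.
# Usage: raidz2_usable DRIVES SIZE_PER_DRIVE   (same unit in, same unit out)
raidz2_usable() {
    drives=$1
    size=$2
    # a raidz2 group needs at least one more device than its two parity devices
    if [ "$drives" -lt 3 ]; then
        echo "raidz2 needs at least 3 drives" >&2
        return 1
    fi
    echo $(( (drives - 2) * size ))
}

raidz2_usable 8 4    # eight 4 TB drives -> prints 24 (TB, approximate)
```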

Create a Dataset

Datasets are just allocations in the file system for specific purposes. Unlike traditional disks and volume managers, space in ZFS is not preallocated. You can create an allocation with various options (using the -o flag). Anything not set will be inherited from the parent.

zfs create -o quota=150G -o atime=off -o compression=lz4 storage/backups/client1/server.client1
# or
zfs create -o quota=150G -o atime=off -o compression=on storage/backups/client1/server.client1

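This caps the dataset at 150G of space in the zpool storage, turns off atime, and sets compression to lz4 (or, in the second form, to ZFS's default compression algorithm). The parent datasets (storage/backups and storage/backups/client1) must already exist. The new dataset will be mounted under wherever client1 is mounted. Everything else is inherited from client1, which in turn inherits from backups, which inherits from storage.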
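
To make the inheritance chain concrete, here is a small sketch that prints the ancestors of a dataset in the order ZFS would consult them for an unset property. The function name dataset_ancestors is made up for illustration; it only manipulates the name as a string and does not talk to ZFS.

```shell
# Hypothetical helper: print the ancestors of a dataset name, nearest first.
dataset_ancestors() {
    ds=$1
    # Strip the last path component until no slash is left.
    while [ "${ds#*/}" != "$ds" ]; do
        ds=${ds%/*}
        echo "$ds"
    done
}

dataset_ancestors storage/backups/client1/server.client1
# prints:
#   storage/backups/client1
#   storage/backups
#   storage
```

On a live system, the source column of zfs get (for example zfs get -o name,property,value,source compression followed by the dataset name) reports which of these ancestors a value actually came from.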

Use a ZFS Volume for swap space

It is perfectly fine to set up swap on a ZFS volume, but we do want to turn off checksumming. Here, we'll create a 2G ZFS volume named swap (in storage, so storage/swap), add an entry to fstab, and turn it on.

zfs create -V 2G -o checksum=off storage/swap
echo '/dev/zvol/storage/swap none swap sw 0 0' >> /etc/fstab
swapon /dev/zvol/storage/swap

NOTE: You can add space to this swap area on a live system, as long as you do it at a time when the machine can keep running without that swap. Simply turn off swap, increase the volume size, then turn swap back on again.

# turn off swap
swapoff /dev/zvol/storage/swap
# increase volume size to 4G
zfs set volsize=4096M storage/swap
# check that it worked
zfs get volsize,reservation storage/swap
# turn swap back on (could also use swapon /dev/zvol/storage/swap, but I'm lazy)
swapon -aL

Using a file for swap space

Instead of using a swap partition or zvol, we can simply use a file. In this case, we can add swap space by deleting/recreating the swap file (after turning swap off), but a simpler way if you need more swap space is to simply add a second swap file.

The following code creates an 8G swap file. Note that after reading https://www.cyberciti.biz/faq/create-a-freebsd-swap-file/, I modified my old way of doing this.

# create an 8G swap file
dd if=/dev/zero of=/swap bs=1G count=8
# set permissions
chmod 0600 /swap
# look for an unused md device (i.e., one not listed)
mdconfig -lv
cp /etc/fstab /etc/fstab.save
# edit /etc/fstab and add the following line, using the correct md##:
#   md42	none	swap	sw,file=/swap	0	0
joe /etc/fstab
# save your file
# turn on swap
swapon -aq
# look at swap information
swapinfo -k
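
Picking an unused md unit by eye works, but it can also be sketched in a few lines of shell. The helper name next_free_md is made up for illustration, and the sketch assumes that mdconfig -l prints the configured device names (md0, md1, and so on) separated by whitespace.

```shell
# Hypothetical helper: print the lowest md unit number not in the given list.
# Usage: next_free_md md0 md1 md3
next_free_md() {
    unit=0
    while :; do
        in_use=no
        for name in "$@"; do
            if [ "$name" = "md$unit" ]; then
                in_use=yes
            fi
        done
        if [ "$in_use" = no ]; then
            echo "$unit"
            return 0
        fi
        unit=$((unit + 1))
    done
}

# On FreeBSD: next_free_md $(mdconfig -l)
next_free_md md0 md1 md3    # prints 2
```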

iSCSI considerations

On FreeBSD, the iSCSI config is /etc/ctl.conf, and the service is ctld.

service ctld reload # reread iSCSI exports on FreeBSD

iSCSI generally uses volumes which are then exported by the target. I have found that it is useful, from a management perspective, to place them under a dataset strictly for them, since I back up iSCSI volumes on a different timeline than I do other stuff.

A lot of the time, my iSCSI targets hold just the operating system, and maybe log files. If I also use one for dynamic data (e-mail repo, databases, web sites), I back that up through a different means, though a lot of the time I use NFS to store this.

I generally create something like storage/iscsi, then create the volumes under that.

zfs create storage/iscsi
zfs create -V 10G storage/iscsi/server1.disk0
zfs create -V 10G storage/iscsi/server1.disk1

This allows me to snapshot all of my iSCSI volumes with one command, then send them with another:

zfs snapshot -r storage/iscsi@2022-04-20_22-30
# full replication stream; for an incremental send, add -i followed by the name of the earlier snapshot
zfs send -R storage/iscsi@2022-04-20_22-30 | ssh someplace 'zfs recv -Fduv storage/backups/iscsi'
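
The snapshot name above follows a date_time pattern, which is easy to generate rather than type. A minimal sketch, assuming a made-up helper named snapshot_name; the final line only echoes the resulting command instead of running it.

```shell
# Hypothetical helper: build a snapshot name such as storage/iscsi@2022-04-20_22-30
# from a dataset name and the current local time.
snapshot_name() {
    printf '%s@%s\n' "$1" "$(date +%Y-%m-%d_%H-%M)"
}

snap=$(snapshot_name storage/iscsi)
echo "zfs snapshot -r $snap"    # echo only: shows the command without running it
```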

Your iSCSI paths will be what you would expect; in this case, the volumes appear under /dev/zvol/storage/iscsi.

Note: If you have an existing system and want to change, zfs rename is your friend.

  1. On initiator
    1. Shut down access to volume on the iSCSI initiator
    2. detach from iSCSI initiator (not sure if this is required)
  2. On iSCSI target
    1. Rename the volume
       zfs rename storage/volumename storage/iscsi/volumename
    2. Edit the config to point to new location (/etc/ctl.conf on FreeBSD)
    3. Reload the iSCSI service
       service ctld reload # on FreeBSD
  3. On initiator
    1. rescan target
    2. allow access to volume
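
The config edit in the target-side steps above can be sketched as a small sed wrapper. The helper name repoint_lun is made up for illustration, and the sketch assumes the classic ctl.conf layout in which each lun block has a line of the form "path /dev/zvol/..." with nothing after the path. Check the result by eye before reloading ctld.

```shell
# Hypothetical helper: rewrite one zvol path in a ctl.conf-style file.
# Usage: repoint_lun FILE OLD_DATASET NEW_DATASET
repoint_lun() {
    conf=$1
    old=/dev/zvol/$2
    new=/dev/zvol/$3
    cp "$conf" "$conf.save"    # keep a backup of the original file
    # '|' as the sed delimiter avoids escaping the slashes in the paths;
    # the trailing '$' keeps volumename from also matching volumename2
    sed "s|\([[:space:]]\)${old}\$|\1${new}|" "$conf.save" > "$conf"
}

# Example: repoint_lun /etc/ctl.conf storage/volumename storage/iscsi/volumename
```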

Getting and setting properties

ZFS has properties that allow you to modify the way the file system works. The following example sets a quota of 100 Megabytes for storage/varlog so our logs do not fill up the system.

Note the last line, which shows that you can set multiple properties at once by separating them with spaces.

zfs get all storage/varlog
zfs get quota storage/varlog
zfs set quota=100M storage/varlog
zfs set exec=off checksum=off storage/varlog

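If you want to return to the default behavior, where a property is inherited from the container, use zfs inherit. Note that quota is not an inheritable property, so it is cleared by setting it to none instead.

zfs inherit -r exec storage/varlog
zfs set quota=none storage/varlog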
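
When deciding what to reset, it helps to see which properties were set on the dataset itself rather than inherited. A minimal sketch, assuming a made-up helper named local_props that filters the tab-separated output of zfs get -H; the sample line fed to it here is fabricated for illustration only.

```shell
# Hypothetical helper: from `zfs get -H all DATASET` output on stdin,
# print property=value for every property whose source is "local".
local_props() {
    # -H output is tab-separated: name, property, value, source
    awk -F '\t' '$4 == "local" { print $2 "=" $3 }'
}

# On a ZFS system: zfs get -H all storage/varlog | local_props
printf 'storage/varlog\tquota\t100M\tlocal\nstorage/varlog\tatime\ton\tdefault\n' | local_props
# prints: quota=100M
```

zfs get also accepts -s local to do this filtering itself; the awk version is shown because it is easy to adapt to other columns.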

Deduplication

If you want to see what the effect of deduplication would be, you can use zdb to calculate it. NOTE: This will take a long time, as it must read every block on the disk and build a deduplication table from it. However, it is well worth doing if you are considering adding the additional complexity of deduplication.

zdb -S -U /path/to/some/cache/file poolname

You can find the name of the cache file (if configured) with

zpool get cachefile poolname

Dedup requires about 320 bytes of memory per block, so you can take the total block count reported by zdb -S, multiply by 320, and see the amount of RAM that will be required to run dedup. NOTE: This can grow as more unique blocks are allocated.
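
As a worked example of that multiplication, here is a short sketch using the 320 bytes per block figure from the text. The helper name dedup_ram_mib and the 10 million block count are illustrative assumptions, not measurements from a real pool.

```shell
# Hypothetical helper: estimate dedup table RAM in MiB from a block count.
dedup_ram_mib() {
    blocks=$1
    bytes_per_entry=320    # per-block figure quoted in the text
    echo $(( blocks * bytes_per_entry / 1048576 ))
}

dedup_ram_mib 10000000    # 10 million blocks -> prints 3051 (MiB, about 3 GiB)
```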

Find differences between two snapshots

So, you have ZFS running, and you have some automated process taking a snapshot every morning at 4am. Now, you want to see what changed in the 24 hour period.

# get a list of snapshots so we know exactly what to ask for
zfs list -r -t snapshot storage/someplace
# find the differences. The first one must be the older snapshot
zfs diff storage/someplace/subdir@20181208_041339 storage/someplace/subdir@20181209_041339
# same thing, but looking only for one subdir named joe
zfs diff storage/someplace/subdir@20181208_041339 storage/someplace/subdir@20181209_041339 | grep '/joe/'

The output is similar to Subversion: an M indicates the item was modified, a - indicates it was removed, a + indicates it was added, and an R indicates it was renamed. Note that your snapshots do not have to be consecutive; you can look at the diff between your oldest and newest snapshot, or even between a snapshot and your current copy, as this shows:

# compare snapshot taken 20181209_041339 with the current copy
zfs diff storage/someplace/subdir@20181209_041339 storage/someplace/subdir
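
On a busy dataset the diff can run to thousands of lines, so a per-type count is often a better first look. A minimal sketch, assuming a made-up helper named diff_summary and that each output line begins with the change type followed by whitespace; the sample input is fabricated for illustration.

```shell
# Hypothetical helper: count zfs diff output lines by change type (M, +, -, R).
diff_summary() {
    # awk splits on whitespace by default, so $1 is the change type
    awk '{ count[$1]++ } END { for (t in count) print t, count[t] }' | sort
}

# On a ZFS system:
#   zfs diff storage/someplace/subdir@20181209_041339 storage/someplace/subdir | diff_summary
printf 'M\t/storage/a\n+\t/storage/b\nM\t/storage/c\n' | diff_summary
```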

Useful commands

References