Monday, December 11, 2017

ZFS on CentOS : Initial setup

CentOS install

Info on the hardware I'm using can be found here.

For this demo, I'm using a minimal CentOS install. All install options are otherwise default.

Post-install, run a full system update:

# yum update

Installing ZFS

I'm following the zfsonlinux instructions here, annotated with the additional steps I had to take to get things working.

First, we need to install two required repositories: the epel-release and zfsonlinux repos.

# yum install epel-release
# yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm 

These will give us access to the various non-standard packages we need to complete the install. Now install kernel-devel and reboot; I find the reboot here helps prevent errors down the line (which I talk more about below).

# yum install kernel-devel
# reboot
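
One extra sanity check I'd suggest here (my own habit, not part of the zfsonlinux instructions): confirm that the installed kernel-devel package matches the running kernel, since a mismatch is exactly what produces the 'kernel headers cannot be found' error shown further down. For example (the version strings here are just illustrative, taken from my system):

# uname -r
3.10.0-693.11.1.el7.x86_64
# rpm -q kernel-devel
kernel-devel-3.10.0-693.11.1.el7.x86_64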

Now install zfs and reboot again (again, this helps avoid errors):

# yum install zfs
# reboot

This should complete without errors. (If you get an error about 'requires dkms >= x.x.x.x', make sure the epel-release repo is installed and working correctly.) Now let's see if it works correctly with a zpool status. We don't have any pools yet, so this won't generate any info, but it will throw an error if things didn't install correctly.

# zpool status
no pools available
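
As an extra verification step (optional, and not something the zfsonlinux guide calls for), you can also confirm the zfs kernel module is actually loaded:

# lsmod | grep zfs

If that returns nothing, the module isn't loaded and you're in the troubleshooting territory below.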

However, if we do get an error, then we'll have to do a bit of troubleshooting. I've run through this process four times now, and it hasn't worked exactly the same way twice. Here are a few of the errors I ran into.

DKMS required version

As explained above, this error shows up during the zfs install if the epel-release repo has not been set up correctly; zfs requires a newer version of dkms than is available through the standard repositories.
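
A quick way to check whether the epel repo is actually set up and enabled (this is my own check, not something from the zfsonlinux instructions):

# yum repolist enabled | grep -i epel

If nothing comes back, re-run the epel-release install before trying zfs again.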

The ZFS modules are not loaded

This error comes up after running the 'zpool status' command and indicates that the kernel modules didn't install correctly during the zfs install. As mentioned before, I haven't found a precise cause of this, but there are a few ways to work around it. First, get the status of the modules:

# dkms status

There should be two modules listed, 'spl' and 'zfs', and both should show as 'installed'. Depending on what 'dkms status' reports, try the following:

  • If both show 'installed', try using modprobe to load the module:
    • modprobe zfs
  • If spl is listed as 'added', reboot the machine; this will often allow that module to install. If spl doesn't install with a reboot, try to manually install it:
    • dkms install -m spl -v x.x.x
    • the version x.x.x will be listed in the 'dkms status' output
  • If spl is installed, but zfs is still just 'added', first try a reboot; if that doesn't work, try a manual install:
    • dkms install -m zfs -v x.x.x
      • version x.x.x will be listed in the 'dkms status' output
    • If the manual install throws an error, run the manual install again
      • This actually happened to me once: the manual install failed, I ran the exact same command again, and then it worked.
    • If the manual install continues to fail, we'll need to clear out the module and start over (a consolidated example of this sequence follows the list):
      • dkms remove -m zfs -v x.x.x --all
      • dkms add -m zfs -v x.x.x
      • dkms install -m zfs -v x.x.x
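
For reference, here's roughly what that clear-out-and-reinstall sequence looks like end to end. The 0.7.3 version number is just the one from my install (see the post below); substitute whatever 'dkms status' reports on your system:

# dkms remove -m zfs -v 0.7.3 --all
# dkms add -m zfs -v 0.7.3
# dkms install -m zfs -v 0.7.3
# modprobe zfs
# zpool status
no pools available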

ZFS on CentOS 7 : Preface and HW info

Preface

These are my notes on getting a ZFS server running on CentOS 7, with the objective of creating a software-defined storage solution without the hardware compatibility limitations of vendor offerings.

I am using a mixture of enterprise and consumer grade hardware to start, basically the spare stuff I have around the office.

Hardware info

  • Server
    • Dell Poweredge R610
    • 24GB Ram
    • 4x 1Gb Ethernet
    • HDD
      •  x2 10k SAS 300GB HDD
      • Raid-1 handled by Poweredge controller
      • These are for the OS,  ZFS will not touch these
    • SSD
      • x2 Sandisk SATA 32GB
      • Passed through Poweredge controller to be handled by ZFS
      • These will be used for various caches as I test different tuning options
    • SAS HBA connected to MD3200
  •  JBOD-ish Device
    • DELL MD3200
    • 10x 7.2K SAS 1.2TB HDD
    • Technically this is not, and cannot be, a JBOD. But I've set each drive in its own disk group, so it presents itself more-or-less like a JBOD.
  • Other Hardware
    • 4 Additional R610 servers, running XenServer to use as clients to test virtualization performance.
    • Dell Force10 10Gb switch. This is in use for other things, but should provide sufficient bandwidth since my hosts are limited to 1Gb by their hardware

>> CentOS install and ZFS install

Thursday, December 7, 2017

package: zfs-dkms requires dkms >= 2.2.0.3

Solution

First install EPEL release repo

sudo yum install epel-release

This should fix the initial install issue. However, once the install is complete, zfs may still not work properly; you may see:

# zpool status
The ZFS modules are not loaded
Try running '/sbin/modprobe zfs' as root to load them

However, running modprobe results in the following:
 
# modprobe zfs
modprobe: FATAL: Module zfs not found

Looking at DKMS, you will see the modules are added but never installed, and attempting to install them fails. This seems to be a bug that occurs if you don't install epel-release before installing kernel-devel.
 
# dkms status
spl, 0.7.3: added
zfs, 0.7.3: added

# dkms install -m spl -v 0.7.3
Error ! echo
Your kernel headers for kernel 3.10.0.693.el7.x86_64 cannot be found at
/lib/modules/3.10.0-693.el7.x86_64/build or /lib/modules/3.10.0.693.el7.x86_64/source

A reboot seems to solve the spl module issue, but the zfs module still refuses to install 

# reboot
[truncated reboot output]
 ...
[login]
 ...
# dkms status
spl, 0.7.3 3.10.0-693.11.1.el7.x86_64, x86_64: installed
zfs, 0.7.3: added

# dkms install -m zfs -v 0.7.3
[truncating error message, but there's a lot of exit code status 2, failed to clean build area, unable to remove file messages]

However, run the zfs install again and, for some reason, it works:

# dkms install -m zfs -v 0.7.3
 [truncating a really long install process output]
DKMS: install completed

# modprobe zfs
# zpool status
no pools available

This doesn't make much sense, but my best guess is that the install initially fails (due to epel-release not being installed) in a not-very-elegant way. This leaves behind junk that causes the install to fail again in the future. Rebooting and manually re-running the install clears that junk out. This is all conjecture, but I hope it helps someone!
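
If you want to poke at that junk theory yourself, dkms keeps its working tree under /var/lib/dkms, so you can look at what's left over from the failed attempt before removing the module (this is just a suggestion for investigating; I didn't need it for the fix):

# ls /var/lib/dkms/zfs/0.7.3/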
 

Additional troubleshooting

I've run through this a few times now, and the error doesn't seem to resolve the same way every time. The most recent time I did it, the restart/manual install didn't work for the zfs module (spl installed after a reboot as before), and I had to remove the zfs module and re-add it.


# dkms remove -m zfs -v 0.7.3 --all
[output from removal]

# dkms add -m zfs -v 0.7.3
[output from add]

# dkms install -m zfs -v 0.7.3
[and it should work now]
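
Once that install completes, loading the module and checking zpool status should confirm everything is working, same as in the post above:

# modprobe zfs
# zpool status
no pools available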