FR5 - fast intelligent software RAID5 linux driver


Peter T. Breuer
August 2004
This is the fast software RAID5 Linux driver, "fr5". It's intelligent. That is, it doesn't blindly resynchronize a whole array component when only a few blocks need resyncing. It resyncs only those blocks it needs to. That can save hours of resync time on a large device.

You should use this driver instead of standard kernel software RAID5 particularly if you are running with some network block devices as array components. But you can use it in quite normal circumstances too, and it will do nothing but good. The driver is entirely compatible with plain RAID5.

How does it work? The driver keeps a bitmap of missed writes in memory, and when the missing disk comes back online it updates only the marked blocks on that disk. Temporary outages are common for network block devices, where the outage will be caused by a network brownout. Or you may have deliberately taken out a disk temporarily in order to blow the dust off! Or it's part of a backup rotation strategy. Anyway, temporary disconnects of component devices are what this driver deals with. In a network setting, it stops the admin going crazy. Which is good.
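To make the idea concrete, here is a toy model in shell. It is not the driver's code (the driver is a kernel module and keeps a real bitmap); the names DIRTY, mark_missed_write and resync_on_return are made up for illustration, and a space-separated list stands in for the bitmap.

```shell
# Toy model only: remember which blocks were written while a component
# was away, then copy just those blocks when it returns.
DIRTY=""                        # block numbers with missed writes

mark_missed_write() {           # call once per write the absent disk misses
    case " $DIRTY " in
        *" $1 "*) ;;            # already marked, nothing to do
        *) DIRTY="$DIRTY $1" ;;
    esac
}

resync_on_return() {            # list the blocks to copy, then clear the marks
    for b in $DIRTY; do
        echo "resync block $b"
    done
    DIRTY=""
}

mark_missed_write 10
mark_missed_write 11
mark_missed_write 10            # a repeat write to a block is marked only once
resync_on_return                # prints "resync block 10" and "resync block 11"
```

Every block that was never marked is left alone, which is where the time saving comes from.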

The bitmap is created piece by piece on demand, so it's not expensive in memory. A terabyte-sized device with 4K blocks will cost at most 32MB of memory for the bitmap. The driver is tolerant of memory allocation failures: it'll still work if you run out of memory, just with less precision.
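The 32MB figure is what you get with one bit per 4K block. That one-bit-per-block layout is an assumption made here to check the arithmetic, not something read out of the driver source, and bitmap_bytes is a made-up helper name.

```shell
# Worst-case bitmap size in bytes, assuming one bit per block.
# Arguments: device size in bytes, block size in bytes.
bitmap_bytes() {
    blocks=$(( ($1 + $2 - 1) / $2 ))    # number of blocks, rounded up
    echo $(( (blocks + 7) / 8 ))        # eight blocks per byte, rounded up
}

# 1 terabyte (2^40 bytes) with 4K blocks
bitmap_bytes 1099511627776 4096         # prints 33554432, i.e. 32MB
```

Since the bitmap is built on demand, this is the ceiling, not what you normally pay.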

  1. DOWNLOAD
  2. HOW TO EXTRACT THE FILES FROM THE ARCHIVE
  3. HOW TO COMPILE THE DRIVER
  4. HOW TO USE IT

HOW TO EXTRACT THE FILES FROM THE ARCHIVE

Let's start with the basics. Do:
cd /tmp; tar xzvf fr5-1.0.tgz; cd fr5-1.0
or similar. Replace /tmp with the directory where you plan to do the compiling, and replace "1.0" with the actual version number on the archive.

HOW TO COMPILE THE DRIVER

  1. Edit the Makefile in the source directory: change LINUXDIR to point to the kernel source for your target kernel (that'll be /usr/local/src/linux-2.4.20, or something close to it), and if you are compiling for an SMP machine, set SMPOPTS to "-D__SMP__", otherwise set it to "" (empty string);
  2. type "make" and wait till cooked - you'll find the results of the cooking below the build/linux-blah.blah.blah/ subdir;
  3. put the freshly built fr5.o and bitmap.o modules in the misc/ subdirectory of your kernel modules in /lib/modules/blah.blah.blah/ and replace the kernel md.o module with the md.o module that just got made.
  4. run /sbin/depmod -a, if you are running under the target kernel right now.
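
If the Makefile sets LINUXDIR and SMPOPTS with plain assignments (an assumption; check the Makefile first), values given on the make command line take precedence over them, so you can try a build without editing anything. The sketch below only prints the command. The helper smpopts_for is hypothetical, and it guesses from the running kernel's "uname -v" string, which may not describe your target kernel.

```shell
# Choose SMPOPTS from a kernel version string such as the output of `uname -v`.
# Assumption: SMP kernels carry the word "SMP" in that string.
smpopts_for() {
    case " $1 " in
        *" SMP "*) echo "-D__SMP__" ;;
        *)         echo "" ;;
    esac
}

LINUXDIR=/usr/local/src/linux-2.4.20      # example path from step 1
SMPOPTS=$(smpopts_for "$(uname -v)")

# Print the command instead of running it, so you can look it over first.
echo make LINUXDIR="$LINUXDIR" SMPOPTS="$SMPOPTS"
```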

HOW TO USE IT

Insert the module into the kernel with "insmod md.o; insmod bitmap.o; modprobe xor; insmod fr5.o" or "modprobe fr5". It'll be visible with lsmod.

To use it, use the raid tools.

  1. edit /etc/raidtab and put in an entry for a typical raid5 array device for /dev/md0. Here's an example:
raiddev /dev/md0
    raid-level               5
    nr-raid-disks            3
    nr-spare-disks           0
    persistent-superblock    1
    chunk-size               4
    device                   /dev/loop0
    raid-disk                0
    device                   /dev/loop1
    raid-disk                1
    device                   /dev/loop2
    raid-disk                2
That was for a three-component array with three loop devices as components.
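
If you set up arrays like this often, an entry with the same fields can be generated from a device list. This is just a sketch: make_raidtab is a made-up helper, and it hardwires the values from the example above (no spares, persistent superblock, chunk size 4).

```shell
# Print a raid5 raidtab entry. First argument is the array device,
# the remaining arguments are the component devices in order.
make_raidtab() {
    md=$1
    shift
    echo "raiddev $md"
    echo "    raid-level               5"
    echo "    nr-raid-disks            $#"
    echo "    nr-spare-disks           0"
    echo "    persistent-superblock    1"
    echo "    chunk-size               4"
    i=0
    for dev in "$@"; do
        echo "    device                   $dev"
        echo "    raid-disk                $i"
        i=$((i + 1))
    done
}

make_raidtab /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2
```

Redirect the output to a file and look it over before putting it in /etc/raidtab.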
  2. make the array in the usual way with the mkraid utility. For example:
  mkraid --dangerous-no-resync --force /dev/md0
I don't see the point of NOT using --dangerous-no-resync. You can always do the sync a moment or two later.

At this point you can "cat /proc/mdstat" and see how things look. Here is how they should look for the raidtab configuration detailed above.

Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1] [dev 07:02][2]
        1024 blocks
  3. You can now manipulate the array with the raidsetfaulty, raidhotremove, and raidhotadd tools. Raidstop and raidstart might also be useful.
The only difference here over normal RAID5 is that a raidhotadd will WORK directly after a raidsetfaulty. You don't have to do a raidhotremove first. If you do the raidhotadd directly after a raidsetfaulty, then ONLY THE BLOCKS THAT MISSED A WRITE IN THE INTERVAL are resynced. Not the whole device. So you want to do this! It's a "hot repair". Hot repairs happen automatically when the component device drops offline and comes back online, but you can force one yourself with raidsetfaulty directly followed by raidhotadd.

For example, to fault one array component:

raidsetfaulty /dev/md0 /dev/loop1
After this, the output from /proc/mdstat will show a failed component. It won't be written to or read:
Personalities : [raid5]
read_ahead 4 sectors
fr50 : active fr5 [dev 07:00][0] [dev 07:01][1](F) [dev 07:02][2]
        1024 blocks
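
If you want to spot failed components from a script, the (F) marker is easy to pick out. A sketch, assuming the mdstat layout shown above; failed_components is a hypothetical helper that reads mdstat-style text on standard input.

```shell
# Print each component that carries the (F) marker, one per line.
failed_components() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                sub(/\(F\)$/, "", $i)      # drop the marker itself
                print $(i - 1), $i         # e.g. "[dev 07:01][1]"
            }
    }'
}

# Only try the real file if it is there.
if [ -r /proc/mdstat ]; then
    failed_components < /proc/mdstat
fi
```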
Then to put the "failed" component back on line:
raidhotadd /dev/md0 /dev/loop1
and the situation will return to normal, immediately. Only a few dirtied blocks will have been written to the newly added device.
Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1] [dev 07:02][2]
        1024 blocks
If you want to take the "failed" component fully offline, then you must follow the raidsetfaulty with a
raidhotremove /dev/md0 /dev/loop1
After this, you can still put the component back with raidhotadd, but the background resync will be total. You really want to avoid that.

Oh yes. You can now mkfs on the device, mount it, write files to it, etc. To stop (and deconfigure) the device, do

raidstop /dev/md0

If you fault one device, then write to the device, then hotadd the faulted device back in, you should be able to see from the kernel messages (use "dmesg") that the resync is intelligent. Here's some dmesg output:

raid5 resync starts on device 0 component 1 for 1024 blocks
raid5 resynced dirty blocks 0-9
raid5 resync skipped clean blocks 10-1023
raid5 resync terminates with 0 errs on device 0 component 1
raid5 hotadd component 7.1[1] to device 0
This resync only wrote across blocks 0-9, and skipped the rest.
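
To put numbers on that, you can total the block ranges in the messages. A sketch, assuming the message format shown above and that the ranges are inclusive (0-9 is ten blocks); count_blocks is a made-up name.

```shell
# Sum the block ranges of one kind ("dirty" or "clean") from
# resync messages read on standard input.
count_blocks() {
    awk -v kind="$1" 'index($0, kind " blocks") {
        split($NF, r, "-")               # last field looks like "0-9"
        total += r[2] - r[1] + 1         # inclusive range
    }
    END { print total + 0 }'
}

# Typical use (not run here):
#   dmesg | count_blocks dirty
#   dmesg | count_blocks clean
```

For the messages above this gives 10 dirty blocks and 1014 clean ones, which add up to the 1024 blocks of the device.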

While the resync is happening, /proc/mdstat will show progress, like so:

Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1](F) [dev 07:02][2]
        1024 blocks
         [=======>.............]  resync=35.5% (364/1024)
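
The percentage can be pulled out of that progress line with sed. A sketch, assuming the line format shown above; resync_percent and wait_for_resync are hypothetical helpers, and the 5 second polling interval is arbitrary.

```shell
# Print the resync percentage found in mdstat-style input, if any.
resync_percent() {
    sed -n 's/.*resync=\([0-9.]*\)%.*/\1/p'
}

# Poll /proc/mdstat until no resync line is left.
wait_for_resync() {
    while p=$(resync_percent < /proc/mdstat) && [ -n "$p" ]; do
        echo "resync at $p%"
        sleep 5
    done
}
```

Call wait_for_resync after a raidhotadd if you want a script to block until the array is whole again.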
Peter T. Breuer  (ptb@it.uc3m.es )  Jan 2003.