You should use this driver instead of the standard kernel software RAID5 driver, particularly if some of your array components are network block devices. But you can use it in quite normal circumstances too, and it will do nothing but good. The driver is entirely compatible with plain RAID5.
How does it work? The driver keeps a bitmap of missed writes in memory, and when the missing disk comes back on line, it writes out only the blocks marked in the bitmap. Temporary outages are common for network block devices, where the incident will typically be caused by a network brownout. Or you may have deliberately taken out a disk temporarily in order to blow the dust off! Or it's part of a backup rotation strategy. Anyway, temporary disconnects of component devices are what this driver deals with. In a network setting, it stops the admin going crazy. Which is good.
The bitmap is created piece by piece on demand, so it's not expensive in memory. A terabyte-sized device with blocks of 4K will cost at most 32MB of memory for the bitmap. The driver is also tolerant with respect to memory faults: it will keep working if you run out of memory, just with less precision in the bitmap, and therefore a less targeted resync.
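As a quick sanity check on that figure, assuming one bitmap bit per 4K block (the one-bit-per-block granularity is my assumption; the driver's exact bookkeeping may differ):
# 1TB / 4K = 2^28 blocks; at 1 bit per block that is 2^25 bytes = 32MB
echo $(( 1024**4 / 4096 / 8 / 1024 / 1024 ))MB    # prints 32MB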
To unpack the driver source, do
cd /tmp; tar xzvf fr5-1.0.tgz; cd fr5-1.0
or similar. Substitute /tmp with the directory where you plan to do the compiling, and substitute "1.0" with the actual version number of the archive.
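How to build it depends on the package, but assuming it follows the usual 2.4-era out-of-tree module conventions (a plain Makefile that produces a loadable module object; this is an assumption, so check the package's own README), the steps would be something like:
make            # assumption: the package ships a standard Makefile
insmod fr5.o    # assumption: 2.4-era module object named fr5.o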
To use it, you use the raid tools. Here is an example /etc/raidtab configuration for a three-component array with three loop devices as components:
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
chunk-size 4
device /dev/loop0
raid-disk 0
device /dev/loop1
raid-disk 1
device /dev/loop2
raid-disk 2
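If you want to try the example above, the loop devices need backing files first. A minimal sketch (the file names and sizes are just for illustration):
dd if=/dev/zero of=/tmp/loop0.img bs=1k count=1024    # 1MB backing file
dd if=/dev/zero of=/tmp/loop1.img bs=1k count=1024
dd if=/dev/zero of=/tmp/loop2.img bs=1k count=1024
losetup /dev/loop0 /tmp/loop0.img
losetup /dev/loop1 /tmp/loop1.img
losetup /dev/loop2 /tmp/loop2.img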
Then create the array with
mkraid --dangerous-no-resync --force /dev/md0
I don't see the point of NOT using --dangerous-no-resync. You can always do the sync a moment or two later.
At this point you can "cat /proc/mdstat" and see how things look. Here is how they should look for the raidtab configuration detailed above.
Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1] [dev 07:02][2]
1024 blocks
For example, to fault one array component:
raidsetfaulty /dev/md0 /dev/loop1
After this, the output from /proc/mdstat will show a failed component, which will not be written to or read:
Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1](F) [dev 07:02][2]
1024 blocks
Then to put the "failed" component back on line:
raidhotadd /dev/md0 /dev/loop1
and the situation will return to normal, immediately. Only a few dirtied blocks will have been written to the newly added device.
Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1] [dev 07:02][2]
1024 blocks
If you want to take the "failed" component fully offline, then you must follow the raidsetfaulty with a
raidhotremove /dev/md0 /dev/loop1
After this, you can still put the component back with raidhotadd, but the background resync will be total. You really want to avoid that.
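So taking a component fully out is a two-step affair, using only the commands above:
raidsetfaulty /dev/md0 /dev/loop1     # mark the component failed
raidhotremove /dev/md0 /dev/loop1     # then detach it from the array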
Oh yes. You can now mkfs on the device, mount it, write files to it, etc. To stop (and deconfigure) the device, do
raidstop /dev/md0
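For example, a complete trial run might look like this (the filesystem type and mount point are just for illustration):
mkfs -t ext2 /dev/md0    # make a filesystem on the array
mount /dev/md0 /mnt      # mount it
cp /etc/hosts /mnt       # write a file to it
umount /mnt
raidstop /dev/md0        # stop and deconfigure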
If you fault one component, then write to the array, then hotadd the faulted component back in, you should be able to see from the kernel messages (use "dmesg") that the resync is intelligent. Here's some dmesg output:
raid5 resync starts on device 0 component 1 for 1024 blocks
raid5 resynced dirty blocks 0-9
raid5 resync skipped clean blocks 10-1023
raid5 resync terminates with 0 errs on device 0 component 1
raid5 hotadd component 7.1[1] to device 0
This resync only wrote across blocks 0-9, and skipped the rest.
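A sequence that produces that kind of output might look like the following (the mount point and the file copied are just illustrative):
raidsetfaulty /dev/md0 /dev/loop1    # take one component out
mount /dev/md0 /mnt
cp /etc/services /mnt                # dirty a few blocks while degraded
umount /mnt
raidhotadd /dev/md0 /dev/loop1       # bring it back; intelligent resync runs
dmesg | tail                         # see the resync messages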
While the resync is happening, /proc/mdstat will show progress, like so:
Personalities : [raid5]
read_ahead 4 sectors
md0 : active raid5 [dev 07:00][0] [dev 07:01][1](F) [dev 07:02][2]
1024 blocks
[=======>.............] resync=35.5% (364/1024)

Peter T. Breuer (ptb@it.uc3m.es) Jan 2003.