Applies to platform: UTM Mercury and Macro 2.4, 2.5, 3.0, UTM Software 2.4, 2.5, UTM 3.0 with RAID
Last updated on: 9th November 2012
All Endian UTM Appliance models "Mercury Pro" and "Macro" have had built-in RAID-1 support since 2007.
Note
If you install an Endian software appliance on different hardware that has two hard disks and want to use them as a RAID array, the installer automatically detects them and gives you the choice to enable RAID-1.
This lesson guides you through diagnosing and resolving common issues that may occur with RAID-1 devices.
1. Verifying Endian UTM Appliance's RAID status
To see whether the RAID array of your Endian UTM Appliance is working correctly or needs some fixing, you can look at the /proc/mdstat special file:
root@endian:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[0]
205760448 blocks [2/2] [UU]
md1 : active raid1 sdb1[1] sda1[0]
38435392 blocks [2/2] [UU]
unused devices: <none>
The above output shows the structure of the RAID in a 2.5 Appliance: md1 is mounted directly on /, while md2 contains an LVM PV (Physical Volume) with four logical volumes called swap, log, var, and config, used as swap space and mounted on /var/log, /var, and /var/efw respectively. Whenever you see the [UU] string, the array is working correctly, while the [_U] string indicates that the array device is in a degraded state. The output for a 2.4 Appliance may slightly differ from the one above, since its RAID had a different setup.
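If you need more detail about a single array, for example which member is missing or the progress of a rebuild, mdadm can report it. A minimal example, using the md2 device from the output above (substitute the md device you want to inspect):
root@endian:~# mdadm --detail /dev/md2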
2. Re-adding a missing partition from a RAID array
You can detect that a partition (or even a whole disk) in a RAID-1 array is missing by issuing the following command from the console:
root@endian:~# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda3[1]
38893248 blocks [2/1] [_U]
md4 : active raid1 sda4[1]
77762560 blocks [2/1] [_U]
md1 : active raid1 sda1[1]
32064 blocks [2/1] [_U]
In this case, partition sdb4 is missing from its array. You should also see a message like the following one in the /var/log/messages file, showing that the sdb4 partition is no longer part of the array:
md: kicking non-fresh sdb4 from array!
While this usually indicates that the sdb4 partition is no longer working, a first troubleshooting option is to try to rebuild the array, by issuing the following command to re-add the partition sdb4:
root@endian:~# mdadm --add /dev/md4 /dev/sdb4
You can then follow the synchronization process by looking again at the /proc/mdstat file, which during the synchronization looks like:
root@endian:~# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[2] sda3[1]
38893248 blocks [2/1] [_U]
[=================>...] recovery = 85.1% (33107136/38893248) finish=2.4min speed=38939K/sec
md4 : active raid1 sdb4[2] sda4[1]
77762560 blocks [2/1] [_U]
resync=DELAYED
md1 : active raid1 sda1[0] sdb1[1]
32064 blocks [2/2] [UU]
You can also follow the process until it completes by using
root@endian:~# watch cat /proc/mdstat
This command shows the current content of the /proc/mdstat file every two seconds. To exit, press CTRL+C.
If this procedure does not help in rebuilding the array, it becomes necessary to replace the hard disk entirely (see section 3 below).
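Before replacing the disk, you may also want to confirm that the drive itself is failing. Assuming the smartmontools package is available on the appliance (this is an assumption, it may not be installed), you can query the disk's SMART health status, where /dev/sdb is the suspect disk from the example above:
root@endian:~# smartctl -H /dev/sdb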
3. Replacing a faulty hard disk
Replacing a hard disk in a RAID array is a process that takes some time, but it is quite easy to achieve in four steps:
- Remove partitions from the array.
- Remove the failed hard disk from the system and plug in the new hard disk.
- Recreate the partition table on the new hard disk and then the RAID array.
- Add the newly created partitions to the RAID array.
Note
We assume that the /dev/sda disk is the good one, while /dev/sdb is the failed one, i.e., the one that must be replaced. If you use LVM partitions, you could have devices named like /dev/mapper/<somename> instead of /dev/sd*.
- Remove partitions.
- The failed hard disk and all its partitions must be removed from the array. Each partition must be marked as failed and then removed, an operation achieved by issuing the following commands:
root@endian:~# mdadm --manage /dev/md0 --fail /dev/sdb1
root@endian:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
The same pair of commands must be repeated for each partition of the failed disk (see the sketch below). When all the partitions have been removed from the array - you can check this by looking at the /proc/mdstat file and verifying that all RAID devices (/dev/mdX) are in the [_U] state - you can proceed to replace the failed hard disk.
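A minimal sketch of the per-partition removal, assuming the layout from the example in section 2, where the failed disk's partitions sdb3 and sdb4 belong to md3 and md4 (the md device names on your system may differ):
root@endian:~# mdadm --manage /dev/md3 --fail /dev/sdb3
root@endian:~# mdadm --manage /dev/md3 --remove /dev/sdb3
root@endian:~# mdadm --manage /dev/md4 --fail /dev/sdb4
root@endian:~# mdadm --manage /dev/md4 --remove /dev/sdb4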
- Replace hard disk.
- The new disk must be physically inserted in its slot, an operation that may require the system to be turned off. Unplug the power cord, replace the old hard disk with the new one, then plug the power cord back in and boot the system.
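Once the system is back up, you can check that the new disk has been detected before proceeding, for example by listing the known block devices (the device name sdb is an assumption and may differ on your system):
root@endian:~# cat /proc/partitions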
- Clone the partition table and boot information.
- As soon as the hard disk is recognised by the system, you need to copy the partition table and MBR (Master Boot Record) exactly as they are on the "good" hard disk (i.e., /dev/sda). To achieve this, you can use the following two commands:
- Copy the MBR (Master Boot Record):
root@endian:~# dd if=/dev/sda of=/dev/sdb bs=446 count=1
- Copy the partition table:
root@endian:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
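To verify that the copy succeeded, you can dump the partition table of both disks and compare the two outputs (same device names as above):
root@endian:~# sfdisk -d /dev/sda
root@endian:~# sfdisk -d /dev/sdb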
- Add the new disk's partitions to the array.
- When the copy is done, you can add the partitions on the new disk to the array using mdadm, proceeding as in section 2 above, "Re-adding a missing partition from a RAID array" (see the sketch below).
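A minimal sketch, again assuming the layout from the example in section 2 (partitions sdb3 and sdb4 belonging to md3 and md4; adjust the device names to your system):
root@endian:~# mdadm --add /dev/md3 /dev/sdb3
root@endian:~# mdadm --add /dev/md4 /dev/sdb4
You can then monitor the synchronization with watch cat /proc/mdstat, as described in section 2.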