The tab in EVMS GUI (evmsgui) reports any software RAID devices that are defined and whether they are currently active.
A summary of RAID and status information (active/not active) is available in the /proc/mdstat file.
Open a terminal console, then log in as the root user or equivalent.
View the /proc/mdstat file by entering the following at the console prompt:
cat /proc/mdstat
Evaluate the information.
The following table shows an example output and how to interpret the information.
To view the RAID status with the mdadm command, enter the following at a terminal prompt:
mdadm -D /dev/mdx
Replace mdx with the RAID device number.
In the following example, only four of the five devices in the RAID are active (Raid Devices : 5, Total Devices : 4). When the RAID was created, its component devices were numbered 0 to 4 and ordered according to their alphabetic appearance in the list where they were chosen, such as /dev/sdg1, /dev/sdh1, /dev/sdi1, /dev/sdj1, and /dev/sdk1. From the pattern of filenames of the other devices, you can determine that the device that was removed was named /dev/sdh1.
/dev/md0:
Version : 00.90.03
Creation Time : Sun Apr 16 11:37:05 2006
Raid Level : raid5
Array Size : 35535360 (33.89 GiB 36.39 GB)
Device Size : 8883840 (8.47 GiB 9.10 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Apr 17 05:50:44 2006
State : clean, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
UUID : 2e686e87:1eb36d02:d3914df8:db197afe
Events : 0.189
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 0 1 removed
2 8 129 2 active sync /dev/sdi1
3 8 145 3 active sync /dev/sdj1
4 8 161 4 active sync /dev/sdk1
In the following mdadm report, only four of the five disks are active (Active Devices : 4), although all five are working (Working Devices : 5). The failed disk was automatically detected and removed from the RAID (Failed Devices : 0). The spare was activated as the replacement disk and has assumed the device name of the failed disk (/dev/sdh1). The faulty object (the failed disk that was removed from the RAID) is not identified in the report. The RAID is running in degraded mode (State : clean, degraded, recovering). The data is being rebuilt (spare rebuilding /dev/sdh1), and the process is 3% complete (Rebuild Status : 3% complete).
mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sun Apr 16 11:37:05 2006
Raid Level : raid5
Array Size : 35535360 (33.89 GiB 36.39 GB)
Device Size : 8883840 (8.47 GiB 9.10 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Apr 17 05:50:44 2006
State : clean, degraded, recovering
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 128K
Rebuild Status : 3% complete
UUID : 2e686e87:1eb36d02:d3914df8:db197afe
Events : 0.189
Number Major Minor RaidDevice State
0 8 97 0 active sync /dev/sdg1
1 8 113 1 spare rebuilding /dev/sdh1
2 8 129 2 active sync /dev/sdi1
3 8 145 3 active sync /dev/sdj1
4 8 161 4 active sync /dev/sdk1
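When you only need one value from a report like the ones above, such as the State or Rebuild Status line, you can filter the mdadm -D output instead of reading the whole report. The following is a minimal sketch, not part of mdadm: the helper name mdadm_field is hypothetical, and it assumes each line of the report has the form "Label : value" as shown in the examples.

```shell
#!/bin/sh
# mdadm_field is a hypothetical helper name, not an mdadm command.
# It reads `mdadm -D` output on stdin and prints the value of the first
# line whose label matches $1 (for example "State" or "Rebuild Status").
mdadm_field() {
    awk -F' : ' -v key="$1" '
        {
            # Strip the indentation that mdadm puts in front of labels.
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)
        }
        label == key { print $2; exit }
    '
}

# Typical use as root (the device name is an example):
#   mdadm -D /dev/md0 | mdadm_field "State"
#   mdadm -D /dev/md0 | mdadm_field "Rebuild Status"
```

For the second report above, the State query would print clean, degraded, recovering. If the requested label is not in the report, the function prints nothing.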
You can follow the progress of the synchronization or reconstruction process by examining the /proc/mdstat file.
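As a rough illustration of following that progress from a script, the sketch below pulls the percentage out of /proc/mdstat. The helper name mdstat_progress is hypothetical. It assumes that each array entry starts with a line such as "md0 : active ..." and that a rebuilding array has a progress line containing "recovery =" or "resync =" followed by a percentage; verify this against the /proc/mdstat output on your own system.

```shell
#!/bin/sh
# mdstat_progress is a hypothetical helper name.
# It reads /proc/mdstat-style text on stdin and prints "<array> <percent>"
# for every array that currently reports recovery or resync progress.
mdstat_progress() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ...".
        /^md[0-9]+ :/ { array = $1 }
        # On a progress line, print the first field that ends in "%".
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print array, $i; break }
        }
    '
}

# Typical use:
#   mdstat_progress < /proc/mdstat
```

Arrays that are not rebuilding produce no output, so an empty result means that no recovery or resync is in progress.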
You can control the speed of synchronization by setting parameters in the /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max files. To speed up the process, echo a larger number into the speed_limit_min file.
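The sketch below shows one cautious way to raise the minimum speed. The helper name raise_speed_limit_min and its optional directory argument are assumptions made for illustration; the directory argument exists only so that the function can be tried on a scratch directory before it is pointed at /proc/sys/dev/raid. The values are understood here to be in kilobytes per second; check the kernel documentation for your release.

```shell
#!/bin/sh
# raise_speed_limit_min is a hypothetical helper name.
# $1: new minimum speed
# $2: optional directory that holds speed_limit_min and speed_limit_max
#     (defaults to /proc/sys/dev/raid)
raise_speed_limit_min() {
    new_min=$1
    dir=${2:-/proc/sys/dev/raid}
    cur_min=$(cat "$dir/speed_limit_min") || return 1
    cur_max=$(cat "$dir/speed_limit_max") || return 1

    # This helper only speeds things up; it never lowers the minimum.
    if [ "$new_min" -le "$cur_min" ]; then
        echo "speed_limit_min is already $cur_min; leaving it unchanged" >&2
        return 0
    fi
    # A minimum above the maximum makes no sense, so refuse it.
    if [ "$new_min" -gt "$cur_max" ]; then
        echo "refusing: $new_min exceeds speed_limit_max ($cur_max)" >&2
        return 1
    fi
    echo "$new_min" > "$dir/speed_limit_min"
}

# Typical use as root (the value is an example):
#   raise_speed_limit_min 50000
```

A value written this way lasts only until the next reboot, because /proc settings are not persistent.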
You might want to configure the mdadm service to send an e-mail alert for software RAID events. Monitoring is only meaningful for RAIDs 1, 4, 5, 6, 10 or multipath arrays because only these have missing, spare, or failed drives to monitor. RAID 0 and Linear RAIDs do not provide fault tolerance so they have no interesting states to monitor.
The following table identifies RAID events and indicates which events trigger e-mail alerts. All events cause the alert program, if one is configured, to run. The program is run with two or three arguments: the event name, the array device (such as /dev/md1), and possibly a second device. For Fail, Fail Spare, and Spare Active, the second device is the relevant component device. For Move Spare, the second device is the array that the spare was moved from.
Table 6.8. RAID Events in mdadm

| RAID Event | Trigger E-Mail Alert | Description |
|---|---|---|
| Device Disappeared | No | An md array that was previously configured appears to no longer be configured. (syslog priority: Critical) If mdadm was told to monitor an array which is RAID0 or Linear, then it reports DeviceDisappeared with the extra information Wrong-Level. This is because RAID0 and Linear do not support the device-failed, hot-spare, and resynchronize operations that are monitored. |
| Rebuild Started | No | An md array started reconstruction. (syslog priority: Warning) |
| Rebuild NN | No | Where NN is 20, 40, 60, or 80. This indicates the percent completed for the rebuild. (syslog priority: Warning) |
| Rebuild Finished | No | An md array that was rebuilding is no longer rebuilding, either because it finished normally or was aborted. (syslog priority: Warning) |
| Fail | Yes | An active component device of an array has been marked as faulty. (syslog priority: Critical) |
| Fail Spare | Yes | A spare component device that was being rebuilt to replace a faulty device has failed. (syslog priority: Critical) |
| Spare Active | No | A spare component device that was being rebuilt to replace a faulty device has been successfully rebuilt and has been made active. (syslog priority: Info) |
| New Array | No | A new md array has been detected in the |
| Degraded Array | Yes | A newly noticed array appears to be degraded. This message is not generated when mdadm notices a drive failure that causes degradation. It is generated only when mdadm notices that an array is degraded when it first sees the array. (syslog priority: Critical) |
| Move Spare | No | A spare drive has been moved from one array in a spare group to another to allow a failed drive to be replaced. (syslog priority: Info) |
| Spares Missing | Yes | The |
| Test Message | Yes | An array was found at startup, and the |

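
The argument convention described before the table (event name, array device, and an optional second device) can be handled by a small script. The following is a minimal sketch, not a program shipped with mdadm. The function name handle_raid_event is hypothetical, and the sketch assumes that mdadm passes event names without spaces (for example FailSpare and MoveSpare); check the mdadm(8) man page for the exact names on your system.

```shell
#!/bin/sh
# handle_raid_event is a hypothetical name for an alert handler.
# $1: event name, $2: array device, $3: optional second device
handle_raid_event() {
    event=$1
    array=$2
    device=${3:-}
    case "$event" in
        Fail|FailSpare|SpareActive)
            # The second argument is the affected component device.
            echo "$event on $array: component device ${device:-unknown}" ;;
        MoveSpare)
            # The second argument is the array the spare was taken from.
            echo "$event on $array: spare moved from ${device:-unknown}" ;;
        *)
            echo "$event on $array" ;;
    esac
}

# In a standalone script, pass the script's own arguments through:
#   handle_raid_event "$@"
```

A real handler would typically write to a log or page an administrator instead of printing to standard output.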
To configure an e-mail alert:
At a terminal console, log in as the root user.
Edit the /etc/mdadm/mdadm.conf file to add your e-mail address for receiving alerts. For example, specify the MAILADDR value (using your own e-mail address, of course):
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2
   UUID=1c661ae4:818165c3:3f7a4661:af475fda
   devices=/dev/sdb3,/dev/sdc3
MAILADDR yourname@example.com
The MAILADDR line gives an e-mail address that alerts should be sent to when mdadm is running in --monitor mode with the --scan option. There should be only one MAILADDR line in mdadm.conf, and it should have only one address.
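Because only one MAILADDR line is expected, it can be worth checking the file after editing it. The following is a minimal sketch with a hypothetical helper name, count_mailaddr. It assumes that the keyword is written in uppercase at the start of a line, as in the example above, and it ignores commented-out lines.

```shell
#!/bin/sh
# count_mailaddr is a hypothetical helper name.
# $1: configuration file to check (defaults to /etc/mdadm/mdadm.conf)
# Prints the number of active (not commented-out) MAILADDR lines.
count_mailaddr() {
    grep -c '^[[:space:]]*MAILADDR[[:space:]]' "${1:-/etc/mdadm/mdadm.conf}"
}

# Typical use:
#   if [ "$(count_mailaddr)" -ne 1 ]; then
#       echo "expected exactly one MAILADDR line" >&2
#   fi
```

The function only counts lines; it does not check that the line holds a single, valid address.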
Start mdadm monitoring by entering the following at the terminal console prompt:
mdadm --monitor --mail=yourname@example.com --delay=1800 /dev/md0
The --monitor option causes mdadm to periodically poll a number of md arrays and to report on any events noticed. mdadm never exits once it decides that there are arrays to be checked, so it should normally be run in the background.
In addition to reporting events in this mode, mdadm might move a spare drive from one array to another if they are in the same spare-group and if the destination array has a failed drive but no spares.
Listing the devices to monitor is optional. If any devices are listed on the command line, mdadm monitors only those devices. Otherwise, all arrays listed in the configuration file are monitored. Further, if the --scan option is added to the command, any other md devices that appear in /proc/mdstat are also monitored.
For more information about using mdadm, see the mdadm(8) and mdadm.conf(5) man pages.
To configure the /etc/init.d/mdadmd service as a script:
suse:~ # egrep 'MAIL|RAIDDEVICE' /etc/sysconfig/mdadm
MDADM_MAIL="yourname@example.com"
MDADM_RAIDDEVICES="/dev/md0"
MDADM_SEND_MAIL_ON_START=no
suse:~ # chkconfig mdadmd --list
mdadmd 0:off 1:off 2:off 3:on 4:off 5:on 6:off