RAIDs 1, 4, and 5 can survive a disk failure. A RAID 1 device survives as long as at least one of its mirrored disks is still working. Its read performance is degraded because fewer data sources are available, but its write performance might actually improve because it no longer writes to the failed mirrors. During the synchronization of the replacement disk, both read and write performance are degraded. A RAID 5 can survive a single disk failure at a time. A RAID 4 can survive a single disk failure at a time if the disk is not the parity disk.
Disks can fail for many reasons such as the following:
Disk crash
Disk pulled from the system
Drive cable removed or loose
I/O errors
When a disk fails, the RAID removes the failed disk from membership in the RAID, and operates in a degraded mode until the failed disk is replaced by a spare. Degraded mode is resolved for a single disk failure in one of the following ways:
Spare Exists: If the RAID has been assigned a spare disk, the MD driver automatically activates the spare disk as a member of the RAID, then the RAID begins synchronizing (RAID 1) or reconstructing (RAID 4 or 5) the missing data.
No Spare Exists: If the RAID does not have a spare disk, the RAID operates in degraded mode until you configure and add a spare. When you add the spare, the MD driver detects the RAID’s degraded mode, automatically activates the spare as a member of the RAID, then begins synchronizing (RAID 1) or reconstructing (RAID 4 or 5) the missing data.
On failure, md automatically removes the failed drive as a component device in the RAID array. To determine which device is a problem, use mdadm and look for the device that has been reported as “removed”.
Enter the following at a terminal console prompt:
mdadm -D /dev/md1
Replace /dev/md1 with the actual path for your RAID.
For example, an mdadm report for a RAID 1 device consisting of /dev/sda2 and /dev/sdb2 might look like this:
blue6:~ # mdadm -D /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sun Jul 2 01:14:07 2006
Raid Level : raid1
Array Size : 180201024 (171.85 GiB 184.53 GB)
Device Size : 180201024 (171.85 GiB 184.53 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Aug 15 18:31:09 2006
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 8a9f3d46:3ec09d23:86e1ffbc:ee2d0dd8
Events : 0.174164
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
The “Total Devices : 1”, “Active Devices : 1”, and “Working Devices : 1” indicate that only one of the two devices is currently active. The RAID is operating in a “degraded” state.
The “Failed Devices : 0” might be confusing. This field is non-zero only during the brief period when the md driver has found a problem on the drive and is preparing to remove it from the RAID. After the failed drive is removed, the field reads “0” again.
In the devices list at the end of the report, the device with the “removed” state for Device 0 indicates that the device has been removed from the software RAID definition, not that the device has been physically removed from the system. It does not specifically identify the failed device. However, the working device (or devices) are listed. Hopefully, you have a record of which devices were members of the RAID. By the process of elimination, the failed device is /dev/sda2.
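The process of elimination described above can be sketched as a small shell helper. This is only an illustration: the function name `raid_absent_members` is hypothetical, and it assumes that working members appear in the device list as rows that start with a number, contain `active sync`, and end with a `/dev/...` path, as in the sample report. You supply the list of devices you recorded as members of the RAID.

```shell
# Hypothetical helper: print each expected member that the `mdadm -D` report
# (read from stdin) does NOT list as "active sync".
# Usage: mdadm -D /dev/md1 | raid_absent_members /dev/sda2 /dev/sdb2
raid_absent_members() {
    # Collect the device paths of rows such as "1 8 18 1 active sync /dev/sdb2".
    listed=$(awk '$1 ~ /^[0-9]+$/ && /active sync/ && $NF ~ /^\/dev\// { print $NF }')
    for dev in "$@"; do
        # -x: match the whole line, -F: treat the device path as a fixed string.
        printf '%s\n' "$listed" | grep -qxF -- "$dev" || printf '%s\n' "$dev"
    done
}
```

For the sample report, passing `/dev/sda2 /dev/sdb2` as the recorded members prints `/dev/sda2`.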
The “Spare Devices : 0” indicates that you do not have a spare assigned to the RAID. You must assign a spare device to the RAID so that it can be automatically added to the array and replace the failed device.
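If you check these fields regularly, the counts can be pulled out of the report with a short script. The following is a minimal sketch, not part of mdadm or EVMS: the function names `raid_field` and `raid_summary` are hypothetical, and the parsing assumes each field is printed on its own line in the form `Name : value`.

```shell
# Hypothetical helper: print the value of one "Name : value" field
# from an `mdadm -D` report read on stdin.
raid_field() {
    awk -v name="$1" -F' : ' '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }   # trim the field name
        key == name { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val; exit }
    '
}

# Hypothetical helper: print "missing=<n> spares=<n>" for the report on stdin,
# where missing = "Raid Devices" minus "Active Devices".
raid_summary() {
    awk -F' : ' '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == "Raid Devices"   { want   = $2 + 0 }
        key == "Active Devices" { have   = $2 + 0 }
        key == "Spare Devices"  { spares = $2 + 0 }
        END { printf "missing=%d spares=%d\n", want - have, spares }
    '
}
```

For example, `mdadm -D /dev/md1 | raid_summary` on the degraded array shown above would report one missing member and no spares, and `mdadm -D /dev/md1 | raid_field "State"` prints the State value.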
When a component device fails, the md driver replaces the failed device with a spare device assigned to the RAID. You can either keep a spare device assigned to the RAID as a hot standby to use as an automatic replacement, or assign a spare device to the RAID as needed.
Important: Even if you correct the problem that caused the disk to fail, the RAID does not automatically accept the disk back into the array, because it is a “faulty object” in the RAID and is no longer synchronized with the RAID.
If a spare is available, md automatically removes the failed disk, replaces it with the spare disk, then begins to synchronize the data (for RAID 1) or reconstruct the data from parity (for RAIDs 4 or 5).
If a spare is not available, the RAID operates in degraded mode until you assign a spare device to the RAID.
To assign a spare device to the RAID:
Prepare the disk as needed to match the other members of the RAID.
In EVMS, select the (the addspare plug-in for the EVMS GUI).
Select the RAID device you want to manage from the list of Regions, then click .
Select the device to use as the spare disk.
Click .
The md driver automatically begins the replacement and reconstruction or synchronization process.
Monitor the status of the RAID to verify the process has begun.
For information about how to monitor RAID status, see Section 6.6, “Monitoring Status for a RAID”.
Continue with Section 6.5.4, “Removing the Failed Disk”.
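One way to verify from a terminal that synchronization or reconstruction has begun is to read the progress line that the md driver writes to `/proc/mdstat`. The sketch below is an assumption-laden illustration: the function name `rebuild_progress` is hypothetical, and it assumes the usual `/proc/mdstat` layout in which each array stanza starts with a line such as `md1 : active ...`, ends at a blank line, and shows a `recovery = N%` or `resync = N%` entry while a rebuild is running.

```shell
# Hypothetical helper: print the rebuild percentage for one md array,
# reading /proc/mdstat-style text on stdin. Prints nothing if no rebuild
# is in progress for that array.
# Usage: rebuild_progress md1 < /proc/mdstat
rebuild_progress() {
    awk -v md="$1" '
        $1 == md && $2 == ":"            { in_md = 1; next }  # start of our stanza
        in_md && (NF == 0 || $2 == ":")  { exit }             # stanza has ended
        in_md && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) {
                    p = $i
                    sub(/^=/, "", p)   # the "=" can be fused to the number
                    print p
                    exit
                }
            }
        }
    '
}
```

An empty result for a degraded array suggests that no spare has been activated yet.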
You can remove the failed disk at any time after it has been replaced with the spare disk. EVMS does not make the device available for other use until you remove it from the RAID. After you remove it, the disk appears in the list in the EVMS GUI, where it can be used for any purpose.
Note: If you pull a disk or if it is totally unusable, EVMS no longer recognizes the failed disk as part of the RAID.
The RAID device can be active and in use when you remove its faulty object.