Node Level Fencing

In SUSE® Linux Enterprise High Availability Extension, the fencing implementation is STONITH (Shoot The Other Node in the Head). It provides node-level fencing. The High Availability Extension includes the stonith command line tool, an extensible interface for remotely powering down a node in the cluster. For an overview of the available options, run stonith --help or refer to the stonith man page.

STONITH Devices

To use node level fencing, you first need a fencing device. To get a list of the STONITH devices supported by the High Availability Extension, run the following command as root on any of the nodes:

stonith -L
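If you want to use this list in a script, the call can be wrapped. The following is a minimal Python sketch, assuming that stonith -L prints one device type per line. The helper names parse_device_list and list_stonith_devices are hypothetical and not part of the High Availability Extension.

```python
import shutil
import subprocess
from typing import List


def parse_device_list(output: str) -> List[str]:
    """Turn the text printed by `stonith -L` into a sorted, de-duplicated list.

    Assumption: one device type per line; blank lines are ignored.
    """
    names = {line.strip() for line in output.splitlines() if line.strip()}
    return sorted(names)


def list_stonith_devices() -> List[str]:
    """Run `stonith -L` (as root on a cluster node) and parse its output."""
    if shutil.which("stonith") is None:
        raise FileNotFoundError("stonith command not found on this host")
    result = subprocess.run(
        ["stonith", "-L"], capture_output=True, text=True, check=True
    )
    return parse_device_list(result.stdout)
```

On a cluster node, call list_stonith_devices() as root. The parsing step is kept separate so it can be tried on any text, even on a machine without the stonith tool installed.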

STONITH devices may be classified into the following categories:

Power Distribution Units (PDU)

Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.

Uninterruptible Power Supplies (UPS)

An uninterruptible power supply provides emergency power to connected equipment by supplying power from a separate source when utility power is not available.

Blade Power Control Devices

If you are running a cluster on a set of blades, the power control device in the blade enclosure is the only candidate for fencing. This device must, of course, be capable of managing individual blade computers.

Lights-out Devices

The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular, and in the future they may even become standard equipment on off-the-shelf computers. However, they are inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node loses power, the device that is supposed to control it is just as useless. In that case, the CRM will keep trying in vain to fence the node, and all other resource operations will wait indefinitely for the fencing/STONITH operation to succeed.
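The failure mode can be illustrated with a small Python model. It is a sketch only: the classes Node, LightsOutDevice, and ExternalPowerSwitch and the fence function are hypothetical, and the retry cap stands in for the CRM's unlimited retries so that the example terminates.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Node:
    name: str
    powered: bool = True


class FencingDevice(Protocol):
    def power_off(self) -> bool: ...


@dataclass
class LightsOutDevice:
    """Hypothetical controller (RSA/iLO/DRAC style) fed by its host's power supply."""
    host: Node

    def power_off(self) -> bool:
        if not self.host.powered:
            # The controller lost power together with the node: no reply,
            # so the fencing operation can never be confirmed.
            return False
        self.host.powered = False
        return True


@dataclass
class ExternalPowerSwitch:
    """Hypothetical PDU/UPS outlet with its own power source."""
    host: Node

    def power_off(self) -> bool:
        # Reachable whether or not the node has power; cutting the outlet
        # leaves the node off in either case.
        self.host.powered = False
        return True


def fence(device: FencingDevice, max_attempts: int) -> Optional[int]:
    """Return the attempt number that succeeded, or None if all attempts failed.

    The CRM itself retries without a limit while other resource operations
    wait; max_attempts only keeps this sketch finite.
    """
    for attempt in range(1, max_attempts + 1):
        if device.power_off():
            return attempt
    return None
```

Against a node that has lost power, the lights-out model never succeeds, while the independently powered switch succeeds on the first attempt.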

Testing Devices

Testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.

The choice of the STONITH device depends mainly on your budget and the kind of hardware you use.

STONITH Implementation

The STONITH implementation of SUSE® Linux Enterprise High Availability Extension consists of two components:

stonithd

stonithd is a daemon that can be accessed by local processes or over the network. It accepts commands that correspond to fencing operations: reset, power-off, and power-on. It can also check the status of the fencing device.

The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives the fencing request from the CRM. It is up to this instance and the other stonithd instances to carry out the requested fencing operation.
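The division of labor between the stonithd instances can be sketched in Python. The model is hypothetical and simplified: it assumes that the DC's instance tries itself first and then its peers, and that each instance knows which nodes its attached devices can fence. The real daemon's message exchange is not shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Op(Enum):
    """The three fencing operations stonithd accepts."""
    RESET = "reset"
    POWER_OFF = "power-off"
    POWER_ON = "power-on"


@dataclass
class StonithDaemon:
    """Hypothetical model of the stonithd instance on one cluster node."""
    node: str
    # Nodes that the fencing device(s) attached to this node can control.
    can_fence: Set[str] = field(default_factory=set)
    device_ok: bool = True
    log: List[str] = field(default_factory=list)

    def status(self) -> bool:
        """Stand-in for checking the status of the fencing device."""
        return self.device_ok

    def execute(self, op: Op, target: str) -> bool:
        """Carry out op if a local device can reach the target node."""
        if target not in self.can_fence:
            return False
        self.log.append(f"{op.value} {target}")
        return True


def handle_request(dc: StonithDaemon, peers: List[StonithDaemon],
                   op: Op, target: str) -> Optional[str]:
    """Entry point on the DC: the CRM's fencing request arrives here.

    Returns the name of the node whose stonithd carried out the
    operation, or None if no instance could do it.
    """
    for daemon in [dc, *peers]:
        if daemon.status() and daemon.execute(op, target):
            return daemon.node
    return None
```

In this model the DC does not need a fencing device of its own; it only needs some instance in the cluster whose device can reach the target and reports a healthy status.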

STONITH Plug-ins

For every supported fencing device, there is a STONITH plug-in capable of controlling that device. A STONITH plug-in is the interface to the fencing device. All STONITH plug-ins reside in /usr/lib/stonith/plugins on each node. All STONITH plug-ins look the same to stonithd, but they are quite different on the device side, reflecting the nature of the fencing device they control.

Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi), which implements the IPMI protocol and can control any device that supports this protocol.
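The idea of one uniform interface with device-specific implementations behind it can be sketched as a Python abstract base class. The StonithPlugin class, its method names, and the send callable are hypothetical illustrations, not the actual plug-in API. ProtocolPlugin shows how a single protocol-based plug-in, in the spirit of ipmilan, can serve several devices through separate instances.

```python
from abc import ABC, abstractmethod
from typing import Callable


class StonithPlugin(ABC):
    """Hypothetical common interface; stonithd only calls these methods."""

    @abstractmethod
    def reset(self, node: str) -> bool: ...

    @abstractmethod
    def power_off(self, node: str) -> bool: ...

    @abstractmethod
    def power_on(self, node: str) -> bool: ...

    @abstractmethod
    def status(self) -> bool: ...


class ProtocolPlugin(StonithPlugin):
    """One implementation of a management protocol (IPMI-style, hypothetical).

    Any device speaking the protocol is handled by creating another
    instance with that device's address; `send` stands in for the wire
    protocol and returns True when the device acknowledges.
    """

    def __init__(self, address: str, send: Callable[[str, str], bool]) -> None:
        self.address = address
        self.send = send

    def reset(self, node: str) -> bool:
        return self.send(self.address, f"reset {node}")

    def power_off(self, node: str) -> bool:
        return self.send(self.address, f"off {node}")

    def power_on(self, node: str) -> bool:
        return self.send(self.address, f"on {node}")

    def status(self) -> bool:
        return self.send(self.address, "status")


def fence(plugin: StonithPlugin, node: str) -> bool:
    """stonithd side: works with any plug-in, whatever device is behind it."""
    return plugin.status() and plugin.reset(node)
```

Two ProtocolPlugin instances with different addresses would stand for two IPMI-capable devices handled by the same plug-in code, while fence() never needs to know which kind of device it is talking to.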