Fencing and STONITH

Contents

9.1. Classes of Fencing
9.2. Node Level Fencing
9.3. STONITH Configuration
9.4. Monitoring Fencing Devices
9.5. Special Fencing Devices
9.6. For More Information

Abstract

Fencing is a very important concept in computer clusters for HA (High Availability). A cluster sometimes detects that one of the nodes is behaving strangely and needs to remove it. This is called fencing and is commonly done with a STONITH resource. Fencing may be defined as a method to bring an HA cluster to a known state.

Every resource in a cluster has a state attached. For example: resource r1 is started on node1. In an HA cluster, such a state implies that resource r1 is stopped on all nodes but node1, because an HA cluster must make sure that every resource may be started on at most one node. Every node must report every change that happens to a resource. The cluster state is thus a collection of resource states and node states.

If, for whatever reason, a state of some node or resource cannot be established with certainty, fencing comes in. Even when the cluster is not aware of what is happening on a given node, fencing can ensure that the node does not run any important resources.

Classes of Fencing

There are two classes of fencing: resource level and node level fencing. The latter is the primary subject of this chapter.

Resource Level Fencing

Using resource level fencing the cluster can ensure that a node cannot access one or more resources. One typical example is a SAN, where a fencing operation changes rules on a SAN switch to deny access from the node.

Resource level fencing may be achieved using normal resources on which the resource you want to protect depends. Such a resource would simply refuse to start on this node, and therefore resources which depend on it will not run on the same node.

Node Level Fencing

Node level fencing ensures that a node does not run any resources at all. This is usually done in a very simple, yet abrupt way: the node is reset using a power switch. This is necessary when the node becomes unresponsive.

Node Level Fencing

In SUSE® Linux Enterprise High Availability Extension, the fencing implementation is STONITH (Shoot The Other Node in the Head). It provides the node level fencing. The High Availability Extension includes the stonith command line tool, an extensible interface for remotely powering down a node in the cluster. For an overview of the available options, run stonith --help or refer to the man page of stonith for more information.

STONITH Devices

To use node level fencing, you first need to have a fencing device. To get a list of STONITH devices which are supported by the High Availability Extension, run the following command as root on any of the nodes:

stonith -L
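
As a minimal sketch, the list printed by stonith -L can be checked for a particular plug-in before you start configuring it. The function name plugin_listed is hypothetical, and the sketch assumes that the list contains one plug-in type per line.

```shell
#!/bin/sh
# Succeeds if the plug-in type given as $1 appears as a whole line
# in the device list read from standard input.
# Assumption: the list holds one plug-in type per line.
plugin_listed() {
    grep -qxF -- "$1"
}

# Usage on a cluster node (requires the stonith tool):
#   stonith -L | plugin_listed external/ipmi && echo "external/ipmi is available"
```

Matching whole lines avoids false positives, for example ssh matching external/ssh.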

STONITH devices may be classified into the following categories:

Power Distribution Units (PDU)

Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.

Uninterruptible Power Supplies (UPS)

An uninterruptible power supply provides emergency power to connected equipment by supplying power from a separate source in the event of utility power failure.

Blade Power Control Devices

If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers.

Lights-out Devices

Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular, and in the future they may even become standard on off-the-shelf computers. However, they are inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node loses power, the device that is supposed to control it is just as useless. In that case, the CRM would continue its attempts to fence the node indefinitely, as all other resource operations would wait for the fencing/STONITH operation to complete.

Testing Devices

Testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.

The choice of the STONITH device depends mainly on your budget and the kind of hardware you use.

STONITH Implementation

The STONITH implementation of SUSE® Linux Enterprise High Availability Extension consists of two components:

stonithd

stonithd is a daemon which can be accessed by local processes or over the network. It accepts the commands which correspond to fencing operations: reset, power-off, and power-on. It can also check the status of the fencing device.

The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives a fencing request from the CRM. It is up to this and other stonithd programs to carry out the desired fencing operation.

STONITH Plug-ins

For every supported fencing device there is a STONITH plug-in which is capable of controlling that device. A STONITH plug-in is the interface to the fencing device. All STONITH plug-ins reside in /usr/lib/stonith/plugins on each node. All STONITH plug-ins look the same to stonithd, but are quite different on the other side, reflecting the nature of the fencing device.

Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi) which implements the IPMI protocol and can control any device which supports this protocol.

STONITH Configuration

To set up fencing, you need to configure one or more STONITH resources—the stonithd daemon requires no configuration. All configuration is stored in the CIB. A STONITH resource is a resource of class stonith (see Section 4.2.2, “Supported Resource Agent Classes”). STONITH resources are a representation of STONITH plug-ins in the CIB. Apart from the fencing operations, the STONITH resources can be started, stopped and monitored, just like any other resource. Starting or stopping STONITH resources means enabling and disabling STONITH in this case. Starting and stopping are thus only administrative operations, and do not translate to any operation on the fencing device itself. However, monitoring does translate to device status.

STONITH resources can be configured just like any other resource. For more information about configuring resources, see Section 5.3.2, “Creating STONITH Resources” or Section 6.3.3, “Creating a STONITH Resource”.

The list of parameters (attributes) depends on the respective STONITH type. To view a list of parameters for a specific device, use the stonith command:

stonith -t stonith-device-type -n

For example, to view the parameters for the ibmhmc device type, enter the following:

stonith -t ibmhmc -n

To get a short help text for the device, use the -h option:

stonith -t stonith-device-type -h

Example STONITH Resource Configurations

In the following, find some example configurations written in the syntax of the crm command line tool. To apply them, put the sample in a text file (for example, sample.txt) and run:

crm < sample.txt

For more information about configuring resources with the crm command line tool, refer to Chapter 6, Configuring and Managing Cluster Resources (Command Line).
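
The step of feeding a sample file to crm can be wrapped in a small helper that fails early when the file or the tool is missing. This is only a sketch; the function name apply_crm_file is hypothetical.

```shell
#!/bin/sh
# Feed a crm configuration file to the crm command line tool.
# $1 = path to the file (for example, sample.txt)
apply_crm_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read configuration file: $1" >&2
        return 1
    fi
    if ! command -v crm >/dev/null 2>&1; then
        echo "crm command not found on this host" >&2
        return 1
    fi
    # Same invocation as above: crm reads the commands from standard input.
    crm < "$1"
}

# Usage:
#   apply_crm_file sample.txt
```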

Warning: Testing Configurations

Some of the examples below are for demonstration and testing purposes only. Do not use any of the Testing Configuration examples in real-life cluster scenarios.

Example 9.1. Testing Configuration

configure
primitive st-null stonith:null \
params hostlist="node1 node2"
clone fencing st-null
commit
   

Example 9.2. Testing Configuration

An alternative configuration:

configure
 primitive st-node1 stonith:null \
 params hostlist="node1"
 primitive st-node2 stonith:null \
 params hostlist="node2"
 location l-st-node1 st-node1 -inf: node1
 location l-st-node2 st-node2 -inf: node2
 commit

This configuration example is perfectly valid as far as the cluster software is concerned. The only difference from a real-world configuration is that no fencing operation takes place.


Example 9.3. Testing Configuration

A more realistic example (but still only for testing) is the following external/ssh configuration:

configure
 primitive st-ssh stonith:external/ssh \
 params hostlist="node1 node2"
 clone fencing st-ssh
 commit

This one can also reset nodes. The configuration is remarkably similar to the first one which features the null STONITH device. In this example, clones are used. They are a CRM/Pacemaker feature. A clone is basically a shortcut: instead of defining n identical, yet differently-named resources, a single cloned resource suffices. By far the most common use of clones is with STONITH resources, as long as the STONITH device is accessible from all nodes.


Example 9.4. Configuration of an IBM RSA Lights-out Device

The real device configuration is not much different, though some devices may require more attributes. An IBM RSA lights-out device might be configured like this:

configure
primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
params nodename=node1 ipaddr=192.168.0.101 \
userid=USERID passwd=PASSW0RD
primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
params nodename=node2 ipaddr=192.168.0.102 \
userid=USERID passwd=PASSW0RD
location l-st-node1 st-ibmrsa-1 -inf: node1
location l-st-node2 st-ibmrsa-2 -inf: node2
commit

In this example, location constraints are used for the following reason: there is always a certain probability that a STONITH operation will fail. Therefore, a STONITH operation in which the executing node is also the target is not reliable. If the node is reset, it cannot send the notification about the outcome of the fencing operation. The only way to do that would be to assume that the operation will succeed and send the notification beforehand, but problems could arise if the operation then fails. Therefore, stonithd refuses to kill its host.
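
For clusters with more than two nodes, the pattern of one resource per node plus one -inf location constraint can be generated instead of typed by hand. The following is a sketch only: the function name gen_rsa_fencing and the node=address argument format are hypothetical, and USERID and PASSW0RD are the placeholder credentials from Example 9.4.

```shell
#!/bin/sh
# Emit one external/ibmrsa-telnet resource per "node=address" argument.
# Each resource is barred (-inf) from running on the node it fences,
# so a node is never asked to shoot itself.
gen_rsa_fencing() {
    echo "configure"
    i=1
    for pair in "$@"; do
        node=${pair%%=*}    # text before the first "="
        addr=${pair#*=}     # text after the first "="
        printf 'primitive st-ibmrsa-%s stonith:external/ibmrsa-telnet \\\n' "$i"
        printf '    params nodename=%s ipaddr=%s \\\n' "$node" "$addr"
        printf '    userid=USERID passwd=PASSW0RD\n'
        printf 'location l-st-%s st-ibmrsa-%s -inf: %s\n' "$node" "$i" "$node"
        i=$((i + 1))
    done
    echo "commit"
}

# Usage: write the result to a file, review it, then apply it with crm.
#   gen_rsa_fencing node1=192.168.0.101 node2=192.168.0.102 > rsa.txt
```

Writing to a file first keeps a reviewable copy of what is about to be committed to the CIB.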


Example 9.5. Configuration of a UPS Fencing Device

The configuration of a UPS type of fencing device is similar to the examples above. The details are left (as an exercise) to the reader. All UPS devices employ the same mechanics for fencing, but how the device itself is accessed varies. Old UPS devices used to have just a serial port, in most cases connected at 1200 baud using a special serial cable. Many new ones still have a serial port, but often they also provide a USB or Ethernet interface. The kind of connection you can use depends on what the plug-in supports.

For example, compare the apcmaster with the apcsmart device by using the stonith -t stonith-device-type -h command:

stonith -t apcmaster -h

returns the following information:

STONITH Device: apcmaster - APC MasterSwitch (via telnet)
NOTE: The APC MasterSwitch accepts only one (telnet)
connection/session a time. When one session is active,
subsequent attempts to connect to the MasterSwitch will fail.
For more information see http://www.apc.com/
List of valid parameter names for apcmaster STONITH device:
ipaddr
login
 password

With

stonith -t apcsmart -h

you get the following output:

STONITH Device: apcsmart - APC Smart UPS
(via serial port - NOT USB!). 
Works with higher-end APC UPSes, like
Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
(Smart-UPS may have to be >= Smart-UPS 700?).
See http://www.networkupstools.org/protocols/apcsmart.html
for protocol compatibility details.
For more information see http://www.apc.com/
List of valid parameter names for apcsmart STONITH device:
ttydev
hostlist

The first plug-in supports APC UPS with a network port and telnet protocol. The second plug-in uses the APC SMART protocol over the serial line, which is supported by many different APC UPS product lines.
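
When only the parameter lists matter, the comparison can be sketched with the -n option described earlier in this chapter. The function names param_names and compare_params are hypothetical, and the sketch assumes that -n prints the parameter names separated by spaces or newlines.

```shell
#!/bin/sh
# Print the parameter names of a plug-in, one per line, sorted.
# Assumption: "stonith -t TYPE -n" prints names separated by
# spaces and/or newlines.
param_names() {
    stonith -t "$1" -n | tr -s ' ' '\n' | sed '/^$/d' | sort
}

# Compare the parameters of two plug-ins.
# comm prints three columns: only in $1, only in $2, common to both.
compare_params() {
    a=$(mktemp) && b=$(mktemp) || return 1
    param_names "$1" > "$a"
    param_names "$2" > "$b"
    comm "$a" "$b"
    rm -f "$a" "$b"
}

# Usage on a cluster node:
#   compare_params apcmaster apcsmart
```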


Constraints Versus Clones

In Section 9.3.1, “Example STONITH Resource Configurations” you learned that there are several ways to configure a STONITH resource: using constraints, clones, or both. The choice of which construct to use for configuration depends on several factors (nature of the fencing device, number of hosts managed by the device, number of cluster nodes, or personal preference).

In short: if clones are safe to use with your configuration and if they reduce the configuration, then use cloned STONITH resources.

Monitoring Fencing Devices

Just like any other resource, the STONITH class agents also support the monitoring operation which is used for checking status.

Note: Monitoring STONITH Resources

Monitoring STONITH resources is strongly recommended. Monitor them regularly, yet sparingly.

Fencing devices are an indispensable part of an HA cluster, but the less you need to utilize them, the better. Power management equipment is known to be rather fragile on the communication side. Some devices give up if there is too much broadcast traffic. Some cannot handle more than ten or so connections per minute. Some get confused if two clients try to connect at the same time. Most cannot handle more than one session at a time.

Checking the fencing devices once every couple of hours should be enough in most cases. The probability that within those few hours there will be a need for a fencing operation and that the power switch would fail is usually low.

For detailed information on how to configure monitor operations, refer to Procedure 5.3, “Adding or Modifying Meta and Instance Attributes” for the GUI approach or to Section 6.3.8, “Configuring Resource Monitoring” for the command line approach.
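
As a rough sketch of the command line approach, a monitor operation can be attached to a STONITH resource in the primitive definition itself. The function name stonith_primitive_with_monitor is hypothetical, the 60 second timeout is an illustrative value rather than a recommendation, and the op monitor syntax should be checked against Section 6.3.8 for your version.

```shell
#!/bin/sh
# Print a crm primitive definition for a STONITH resource that is
# monitored at a fixed interval.
# $1 = resource name, $2 = plug-in type, $3 = interval in seconds,
# remaining arguments = name=value parameters for the plug-in.
stonith_primitive_with_monitor() {
    name=$1 type=$2 interval=$3
    shift 3
    printf 'primitive %s stonith:%s \\\n' "$name" "$type"
    printf '    params %s \\\n' "$*"
    # timeout="60s" is an assumed example value.
    printf '    op monitor interval="%ss" timeout="60s"\n' "$interval"
}

# Usage: check the device every two hours (7200 seconds).
#   stonith_primitive_with_monitor st-ibmrsa-1 external/ibmrsa-telnet 7200 \
#       nodename=node1 ipaddr=192.168.0.101 userid=USERID passwd=PASSW0RD
```

The printed lines belong between configure and commit, as in the examples in Section 9.3.1.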

Special Fencing Devices

Apart from plug-ins which handle real STONITH devices, some STONITH plug-ins require additional explanation.

Warning: For Testing Only

Some of the STONITH plug-ins mentioned below are for demonstration and testing purposes only. Do not use any of the following devices in real-life scenarios because this may lead to data corruption and unpredictable results:

  • external/ssh

  • ssh

  • null

external/kdumpcheck

This plug-in is useful for checking if a kernel dump is in progress on a node. If that is the case, it will return true, as if the node has been fenced (it cannot run any resources at that time). This avoids fencing a node that is already down but doing a dump, which takes some time. The plug-in must be used in concert with another, real STONITH device. For more details, see /usr/share/doc/packages/cluster-glue/README_kdumpcheck.txt.

external/sbd

This is a self-fencing device. It reacts to a so-called poison pill which can be inserted into a shared disk. On shared-storage connection loss, it also makes the node cease to operate. Learn how to use this STONITH agent to implement storage based fencing in Chapter 15, Storage Protection. See also http://www.linux-ha.org/wiki/SBD_Fencing for more details.

external/ssh

Another software-based fencing mechanism. The nodes must be able to log in to each other as root without passwords. It takes a single parameter, hostlist, specifying the nodes that it will target. As it is not able to reset a truly failed node, it must not be used for real-life clusters—for testing and demonstration purposes only. Using it for shared storage would result in data corruption.

meatware

meatware requires help from a human to operate. Whenever invoked, meatware logs a CRIT severity message which shows up on the node's console. The operator then confirms that the node is down and issues a meatclient(8) command. This tells meatware that it can inform the cluster that it may consider the node dead. See /usr/share/doc/packages/cluster-glue/README.meatware for more information.

null

This is an imaginary device used in various testing scenarios. It always claims that it has shot a node, but never actually does anything. Do not use it unless you know what you are doing.

suicide

This is a software-only device, which can reboot the node it is running on, using the reboot command. This requires action by the node's operating system and can fail under certain circumstances. Therefore, avoid using this device whenever possible. However, it is safe to use on one-node clusters.

suicide and null are the only exceptions to the “do not shoot my host” rule.

For More Information

/usr/share/doc/packages/cluster-glue

In your installed system, this directory holds README files for many STONITH plug-ins and devices.

http://www.linux-ha.org/wiki/STONITH

Information about STONITH on the home page of The High Availability Linux Project.

http://www.clusterlabs.org/doc/crm_fencing.html

Information about fencing on the home page of the Pacemaker Project.

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained

Explains the concepts used to configure Pacemaker. Contains comprehensive and very detailed information for reference.

http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html

Article explaining the concepts of split brain, quorum and fencing in HA clusters.