Chapter 5. Manual Configuration of a Cluster

Contents

5.1. Configuration Basics
5.2. Configuring Resources
5.3. Configuring Constraints
5.4. Configuring CRM Options
5.5. For More Information

Manual configuration of a Heartbeat cluster is often the most effective way of creating a reliable cluster that meets specific needs. Because Heartbeat is extensively configurable and can meet a wide range of needs, it is not possible to document every possible scenario. To introduce several concepts of the Heartbeat configuration and demonstrate basic procedures, consider a real-world example of an NFS file server. The goal is to create an NFS server that can be built with very low-cost parts and is as redundant as possible. For this, set up a two-node cluster in which the NFS data resides on a drbd-replicated partition.

Before starting with the cluster configuration, set up two nodes as described in Chapter 2, Installation and Setup. In addition to the system installation, both nodes need a data partition of the same size on which to set up drbd.

The configuration splits into two main parts. First, all the resources must be configured. After this, create a set of constraints that define the starting rules for those resources.

All the configuration data is written in XML. For convenience, the example relies on snippets that may be loaded into the cluster configuration individually.

5.1. Configuration Basics

The cluster information base (CIB) is divided into two main sections, configuration and status. The status section contains the history of each resource on each node, and from this data the cluster can construct its complete current state. The authoritative source for the status section is the local resource manager (lrmd) process on each cluster node, and the cluster occasionally repopulates the entire section. For this reason, the status section is never written to disk, and administrators are advised against modifying it in any way.

The configuration section contains the more traditional information like cluster options, lists of resources and indications of where they should be placed. It is the primary focus of this document and is divided into four parts:

  • Configuration options (called crm_config)

  • Nodes

  • Resources

  • Resource relationships (called constraints)

Example 5.1. Structure of an Empty Configuration

<cib generated="true" admin_epoch="0" epoch="0" num_updates="0" have_quorum="false">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
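To check which of these sections a saved configuration actually contains, a few lines of Python with the standard xml.etree.ElementTree module are enough. This is only an illustrative sketch: the function name cib_sections is hypothetical, and it assumes the CIB has already been dumped to text, for example with cibadmin --cib_query.

```python
import xml.etree.ElementTree as ET

# The empty configuration from Example 5.1.
EMPTY_CIB = """<cib generated="true" admin_epoch="0" epoch="0" num_updates="0" have_quorum="false">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>"""


def cib_sections(cib_text):
    """Return the child tags of <configuration> and whether <status> exists."""
    cib = ET.fromstring(cib_text)
    config = cib.find("configuration")
    parts = [child.tag for child in config] if config is not None else []
    return parts, cib.find("status") is not None


if __name__ == "__main__":
    print(cib_sections(EMPTY_CIB))
```

For the empty configuration above, this prints (['crm_config', 'nodes', 'resources', 'constraints'], True).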

5.1.1. The Current State of the Cluster

Before you start to configure a cluster, it is worth knowing how to view the finished product. For this, use the crm_mon utility, which displays the current state of an active cluster. It can show the cluster status by node or by resource and can run in either single-shot or dynamically updating mode. With this tool, you can examine the cluster for irregularities and see how it responds when you cause or simulate failures.

Details on all the available options can be obtained using the crm_mon --help command.

5.1.2. Updating the Configuration

There is a basic warning for updating the cluster configuration:

[Warning]Rules For Updating the Configuration

Never edit the cib.xml file manually; otherwise, the cluster will refuse to use the configuration. Always use the cibadmin tool to change your configuration instead.

To modify your cluster configuration, use the cibadmin command which talks to a running cluster. With cibadmin, you can query, add, remove, update or replace any part of the configuration. All changes take effect immediately and there is no need to perform a reload-like operation.

The simplest way of using cibadmin is a three-step procedure:

  1. Save the current configuration to a temporary file:

    cibadmin --cib_query > /tmp/tmp.xml
  2. Edit the temporary file with your favorite text or XML editor.

    Some of the better XML editors are able to use the DTD (document type definition) to make sure that any changes you make are valid. The DTD describing the configuration can be found in /usr/lib/heartbeat/crm.dtd on your system.

  3. Upload the revised configuration:

    cibadmin --cib_replace --xml-file /tmp/tmp.xml

If you only want to modify the resources section, do the following to avoid modifying any other part of the configuration:

cibadmin --cib_query --obj_type resources > /tmp/tmp.xml
vi /tmp/tmp.xml
cibadmin --cib_replace --obj_type resources --xml-file /tmp/tmp.xml
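If you repeat this procedure often, it can be scripted. The following Python sketch is a hypothetical wrapper (the function names are not part of Heartbeat) that only assembles the cibadmin options shown above and runs them with the standard subprocess module; the editing step in between is left to you.

```python
import subprocess


def query_argv(obj_type=None):
    """argv for `cibadmin --cib_query [--obj_type TYPE]`."""
    argv = ["cibadmin", "--cib_query"]
    if obj_type:
        argv += ["--obj_type", obj_type]
    return argv


def replace_argv(xml_file, obj_type=None):
    """argv for `cibadmin --cib_replace [--obj_type TYPE] --xml-file FILE`."""
    argv = ["cibadmin", "--cib_replace"]
    if obj_type:
        argv += ["--obj_type", obj_type]
    return argv + ["--xml-file", xml_file]


def save_section(xml_file, obj_type=None):
    # Step 1: write the (partial) configuration to xml_file.
    with open(xml_file, "w") as out:
        subprocess.run(query_argv(obj_type), stdout=out, check=True)


def upload_section(xml_file, obj_type=None):
    # Step 3: replace the same part of the live configuration.
    subprocess.run(replace_argv(xml_file, obj_type), check=True)
```

On a cluster node, save_section("/tmp/tmp.xml", "resources") followed by your edit and upload_section("/tmp/tmp.xml", "resources") mirrors the three commands above.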

5.1.3. Quickly Deleting Part of the Configuration

Sometimes it is necessary to delete an object quickly. This can be done in three easy steps:

  1. Identify the object you wish to delete, for example:

    cibadmin -Q | grep stonith
     <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
     <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="1"/>
     <primitive id="child_DoFencing" class="stonith" type="external/vmware">
     <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:1" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:2" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:0" type="external/vmware" class="stonith">
     <lrm_resource id="child_DoFencing:3" type="external/vmware" class="stonith">
  2. Identify the resource’s tag name and id (in this case, primitive and child_DoFencing).

  3. Execute cibadmin:

    cibadmin --cib_delete --crm_xml '<primitive id="child_DoFencing"/>'
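Steps 1 and 2 can also be done by parsing the query output instead of grepping it. This Python sketch (delete_snippet and delete_argv are hypothetical helpers) searches only the configuration section, so the lrm_resource entries in the status section are ignored, and returns the minimal snippet for --crm_xml. Passing the XML as a separate argument also avoids shell quoting problems.

```python
import xml.etree.ElementTree as ET


def delete_snippet(cib_text, obj_id):
    """Return '<tag id="..."/>' for the configuration element with obj_id."""
    config = ET.fromstring(cib_text).find("configuration")
    if config is None:
        raise ValueError("no <configuration> section found")
    for elem in config.iter():
        if elem.get("id") == obj_id:
            return '<%s id="%s"/>' % (elem.tag, obj_id)
    raise KeyError(obj_id)


def delete_argv(snippet):
    # A list argument is not interpreted by a shell, so no quoting is needed.
    return ["cibadmin", "--cib_delete", "--crm_xml", snippet]
```

The resulting list can be handed to subprocess.run(..., check=True) on a cluster node.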

5.1.4. Updating the Configuration Without Using XML

Some common tasks can also be performed with one of the higher level tools that avoid the need to read or edit XML. Run the following command to enable STONITH, for example:

crm_attribute --attr-name stonith-enabled --attr-value true

To check whether somenode is in standby, and therefore not allowed to run resources, use:

crm_standby --get-value --node-uname somenode

To find the current location of my-test-rsc, use:

crm_resource --locate --resource my-test-rsc

5.1.5. Testing Your Configuration

It is not necessary to modify a real cluster in order to test the effect of the configuration changes. Do the following to test your modifications:

  1. Save the current configuration to a temporary file:

    cibadmin --cib_query > /tmp/tmp.xml
  2. Edit the temporary file with your favorite text or XML editor.

  3. Simulate the effect of the changes with ptest:

    ptest -VVVVV --xml-file /tmp/tmp.xml --save-graph tmp.graph --save-dotfile tmp.dot

The tool uses the same library as the live cluster to show the impact the changes would have. Its output, in addition to a significant amount of logging, is stored in two files, tmp.graph and tmp.dot. Both files represent the same thing: the cluster's response to your changes. The graph file stores the complete transition, containing a list of all actions, their parameters, and their prerequisites. Because the transition graph is not easy to read, the tool also generates a Graphviz dot file representing the same information.
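The dot file can be rendered with Graphviz (for example, dot -Tpng tmp.dot -o tmp.png), or its ordering can be read programmatically. The Python sketch below assumes that each ordering edge appears on its own line as "first action" -> "second action", optionally followed by [ ... ] attributes; that is standard dot syntax, but the exact layout of ptest's output is an assumption. It uses graphlib, which requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Assumed edge format: "first action" -> "second action" [ optional attrs ]
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def action_order(dot_text):
    """Return one valid execution order of the actions in a dot graph."""
    sorter = TopologicalSorter()
    for first, then in EDGE.findall(dot_text):
        sorter.add(then, first)  # `then` may only run after `first`
    return list(sorter.static_order())
```

Where several actions are independent, more than one order is valid, so the list shows one possible sequence rather than the exact one the cluster will use.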

5.2. Configuring Resources

There are three types of RAs (Resource Agents) available with Heartbeat. First, there are legacy Heartbeat 1 scripts. Second, Heartbeat can make use of LSB initialization scripts. Finally, Heartbeat has its own set of OCF (Open Cluster Framework) agents. This documentation concentrates on LSB scripts and OCF agents.

5.2.1. LSB Initialization Scripts

LSB scripts are commonly found in the directory /etc/init.d. They must implement at least the actions start, stop, restart, reload, force-reload, and status, as explained in http://www.linux-foundation.org/spec/refspecs/LSB_1.3.0/gLSB/gLSB/iniscrptact.html.

The configuration of those services is not standardized. If you intend to use an LSB script with Heartbeat, make sure that you understand how the respective script is configured. Documentation for this can often be found in the package's documentation directory, /usr/share/doc/packages/<package_name>.

When a service is managed by Heartbeat, it must not be controlled by any other means. This means it should not be started or stopped at boot, at reboot, or manually. However, if you want to check whether the service is configured properly, start it manually, but make sure that it is stopped again before Heartbeat takes over.

Before using an LSB resource, make sure that the configuration of this resource is present and identical on all cluster nodes. The configuration is not managed by Heartbeat. You must take care of that yourself.
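When checking a script manually, it helps to look at the exit code of its status action rather than its printed text. The LSB specification defines the status exit codes used in the table below. This Python sketch (lsb_status is a hypothetical helper) runs `<script> status` and translates the code; run it as root against a real init script, such as one in /etc/init.d, and confirm that a stopped service reports 3.

```python
import subprocess

# Exit codes the LSB specification defines for the "status" action.
LSB_STATUS = {
    0: "running",
    1: "dead, but pid file exists",
    2: "dead, but lock file exists",
    3: "not running",
    4: "unknown",
}


def lsb_status(script):
    """Run `<script> status` and return (exit code, meaning)."""
    result = subprocess.run([script, "status"],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    meaning = LSB_STATUS.get(result.returncode,
                             "other (reserved or application-specific)")
    return result.returncode, meaning
```

For example, lsb_status("/etc/init.d/nfsserver") after stopping the service should return (3, "not running") if the script follows the specification.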

5.2.2. OCF Resource Agents

All OCF agents are located in /usr/lib/ocf/resource.d/heartbeat/. These are small programs that have a functionality similar to that of LSB scripts. However, the configuration is always done with environment variables. All OCF Resource Agents are required to have at least the actions start, stop, status, monitor, and meta-data. The meta-data action retrieves information about how to configure the agent. For example, if you want to know more about the IPaddr agent, use the command:

/usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data

The output is lengthy information in a simple XML format, which you can validate against the ra-api-1.dtd DTD. This XML format has three sections: first several common descriptions, second all the available parameters, and last the available actions for this agent.

A typical parameter of an OCF RA as shown with the meta-data command looks like this:

<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="IPaddr"> <!-- 1 -->
  <!-- Some elements omitted -->
  <parameter name="ip" unique="1" required="1"> <!-- 2 -->
    <longdesc lang="en"> <!-- 3 -->
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".
    </longdesc>
    <shortdesc lang="en">IPv4 address</shortdesc>
    <content type="string" default="" /> <!-- 4 -->
  </parameter>
</resource-agent>

This is part of the IPaddr RA. The information about how to configure the parameter of this RA can be read as follows:

1

Root element for each output.

2

The name of the nvpair to configure is ip. This RA attribute is mandatory for the configuration.

3

The description of the parameter is available in a long and a short description tag.

4

The content of the value of this parameter is a string. There is no default value available for this parameter.

Find a configuration example for this RA at Chapter 3, Setting Up a Simple Resource.
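Because the meta-data output is plain XML, the parameter list can be extracted programmatically. The following Python sketch (ra_parameters is a hypothetical helper) collects each parameter's name, whether it is required, its short description, and its default. On a cluster node, you could feed it the stdout of subprocess.run(["/usr/lib/ocf/resource.d/heartbeat/IPaddr", "meta-data"], capture_output=True, text=True).

```python
import xml.etree.ElementTree as ET


def ra_parameters(metadata_xml):
    """Return (name, required, shortdesc, default) for each RA parameter."""
    root = ET.fromstring(metadata_xml)
    params = []
    for param in root.iter("parameter"):
        content = param.find("content")
        params.append((
            param.get("name"),
            param.get("required") == "1",
            (param.findtext("shortdesc") or "").strip(),
            content.get("default") if content is not None else None,
        ))
    return params
```

For the ip parameter shown above, this yields ("ip", True, "IPv4 address", "").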

5.2.3. Example Configuration for an NFS Server

To set up the NFS server, three resources are needed: a file system resource, a drbd resource, and a group consisting of an NFS server and an IP address. You can write each of the resource configurations to a separate file, then load them into the cluster with cibadmin -C -o resources -x resource_configuration_file.

5.2.3.1. Setting Up a File System Resource

The file system resource is configured as an OCF primitive resource. Its task is to mount a device on a directory on start requests and to unmount it on stop requests. In this case, the device is /dev/drbd0 and the directory to use as mount point is /srv/failover. The file system used is reiserfs.

The configuration for this resource looks like the following:

<primitive id="filesystem_resource" class="ocf" provider="heartbeat" type="Filesystem">
  <instance_attributes id="ia-filesystem_1">
    <attributes>
      <nvpair id="filesystem-nv-1" name="device" value="/dev/drbd0"/>
      <nvpair id="filesystem-nv-2" name="directory" value="/srv/failover"/>
      <nvpair id="filesystem-nv-3" name="fstype" value="reiserfs"/>
    </attributes>
  </instance_attributes>
</primitive>
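Snippets like this one can also be generated rather than typed. The Python sketch below (primitive_xml is a hypothetical helper) builds an equivalent primitive; its id scheme (ia-<resource id> and <resource id>-nv-<n>) is an assumption chosen to keep ids unique, not a Heartbeat convention.

```python
import xml.etree.ElementTree as ET


def primitive_xml(rsc_id, cls, rsc_type, params, provider=None):
    """Build a <primitive> with one nvpair per entry in params."""
    prim = ET.Element("primitive", {"id": rsc_id, "class": cls})
    if provider:
        prim.set("provider", provider)
    prim.set("type", rsc_type)
    inst = ET.SubElement(prim, "instance_attributes", id="ia-" + rsc_id)
    attrs = ET.SubElement(inst, "attributes")
    for n, (name, value) in enumerate(params.items(), start=1):
        ET.SubElement(attrs, "nvpair", id="%s-nv-%d" % (rsc_id, n),
                      name=name, value=value)
    return ET.tostring(prim, encoding="unicode")


if __name__ == "__main__":
    print(primitive_xml("filesystem_resource", "ocf", "Filesystem",
                        {"device": "/dev/drbd0",
                         "directory": "/srv/failover",
                         "fstype": "reiserfs"},
                        provider="heartbeat"))
```

Write the printed XML to a file and load it with cibadmin -C -o resources -x, as described at the start of this section.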
    

5.2.3.2. Configuring drbd

Before starting with the drbd Heartbeat configuration, set up a drbd device manually. Basically this is configuring drbd in /etc/drbd.conf and letting it synchronize. The exact procedure for configuring drbd is described in the Storage Administration Guide. For now, assume that you configured a resource r0 that may be accessed at the device /dev/drbd0 on both of your cluster nodes.

The drbd resource is an OCF master slave resource, as stated in the description in the metadata of the drbd RA. More important, however, is that the actions section of the metadata contains the actions promote and demote. These are mandatory for master slave resources and commonly not available to other resources.

For Heartbeat, master slave resources may have multiple masters on different nodes. It is even possible to have a master and a slave on the same node. Therefore, configure this resource so that there is exactly one master and one slave, each running on a different node. Do this with the meta attributes of the master_slave resource. Master slave resources are a special kind of clone resource in Heartbeat. Every master and every slave counts as a clone.

<master_slave id="drbd_resource" ordered="false"> <!-- 1 -->
  <meta_attributes>
    <attributes>
      <nvpair id="drbd-nv-1" name="clone_max" value="2"/> <!-- 2 -->
      <nvpair id="drbd-nv-2" name="clone_node_max" value="1"/> <!-- 3 -->
      <nvpair id="drbd-nv-3" name="master_max" value="1"/> <!-- 4 -->
      <nvpair id="drbd-nv-4" name="master_node_max" value="1"/> <!-- 5 -->
      <nvpair id="drbd-nv-5" name="notify" value="yes"/> <!-- 6 -->
    </attributes>
  </meta_attributes>
  <primitive id="drbd_r0" class="ocf" provider="heartbeat" type="drbd"> <!-- 7 -->
    <instance_attributes id="ia-drbd_1">
      <attributes>
        <nvpair id="drbd-nv-6" name="drbd_resource" value="r0"/> <!-- 8 -->
      </attributes>
    </instance_attributes>
  </primitive>
</master_slave>

1

The top-level element of this resource is master_slave. The complete resource is later accessed with the ID drbd_resource.

2

clone_max defines the total number of instances (masters plus slaves) that may be present in the cluster.

3

clone_node_max is the maximum number of clones (masters or slaves) that are allowed to run on a single node.

4

master_max sets how many masters may be available in the cluster.

5

master_node_max is similar to clone_node_max and defines how many master instances may run on a single node.

6

notify is used to inform the cluster before and after a clone of the master_slave resource is stopped or started. This is used to reconfigure one of the clones to be a master of this resource.

7

The RA that does the actual work inside this master slave resource is the drbd primitive.

8

The most important parameter this resource needs to know about is the name of the drbd resource to handle.
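To double-check that the meta attributes really describe one master and one slave on different nodes, the arithmetic can be made explicit. This Python sketch (master_slave_layout is a hypothetical helper) reads the meta attributes and reports the maximum number of masters, the remaining slave instances (clone_max minus master_max), and whether clone_node_max limits each node to a single instance.

```python
import xml.etree.ElementTree as ET


def master_slave_layout(ms_xml):
    """Summarize instance limits from a master_slave resource's meta attributes."""
    ms = ET.fromstring(ms_xml)
    meta = {nv.get("name"): nv.get("value")
            for nv in ms.findall("./meta_attributes/attributes/nvpair")}
    clone_max = int(meta["clone_max"])
    master_max = int(meta["master_max"])
    return {
        "masters": master_max,               # at most this many masters
        "slaves": clone_max - master_max,    # remaining instances run as slaves
        "one_instance_per_node": meta.get("clone_node_max") == "1",
    }
```

For the drbd_resource above, the result is one master, one slave, and at most one instance per node, so master and slave necessarily run on different nodes.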

5.2.3.3. NFS Server and IP Address

To make the NFS server always available at the same IP address, use an additional IP address besides the ones the machines use for their normal operation. This IP address is then assigned to the active NFS server in addition to the system's IP address.

The NFS server and the IP address of the NFS server should always be active on the same machine. In this case, the start sequence is not very important. They may even be started at the same time. These are the typical requirements for a group resource.

Before starting the Heartbeat RA configuration, configure the NFS server with YaST. Do not let the system start the NFS server; just set up the configuration file. To do this manually, see the manual page exports(5) (man 5 exports). The configuration file is /etc/exports. The NFS server is configured as an LSB resource.

Configure the IP address completely with the Heartbeat RA configuration. No additional modification is necessary in the system. The IP address RA is an OCF RA.

<group id="nfs_group"> <!-- 1 -->
  <primitive id="nfs_resource" class="lsb" type="nfsserver"/> <!-- 2 -->
  <primitive id="ip_resource" class="ocf" provider="heartbeat"
    type="IPaddr"> <!-- 3 -->
    <instance_attributes id="ia-ipaddr_1">
      <attributes>
        <nvpair id="ipaddr-nv-1" name="ip" value="10.10.0.1"/> <!-- 4 -->
      </attributes>
    </instance_attributes>
  </primitive>
</group>

1

A group resource may contain several other resources. The group itself must have an ID.

2

The nfsserver is simple. It is just the LSB script for the NFS server. The service itself must be configured in the system.

3

The IPaddr OCF RA does not need any configuration in the system. It is just configured with the following instance_attributes.

4

There is only one mandatory instance attribute in the IPaddr RA. More possible configuration options are found in the metadata of the RA.
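Because the resource snippets in this section are written and loaded separately, an id accidentally reused in two places is easy to miss. This Python sketch (duplicate_ids is a hypothetical helper) assumes, conservatively, that every id in the loaded snippets should be distinct, and lists the ones that occur more than once.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(*snippets):
    """Return the sorted ids that appear more than once across the snippets."""
    counts = Counter()
    for text in snippets:
        for elem in ET.fromstring(text).iter():
            if "id" in elem.attrib:
                counts[elem.get("id")] += 1
    return sorted(i for i, n in counts.items() if n > 1)
```

Run it over the file system, drbd, and group snippets before loading them with cibadmin; an empty list means no id was reused.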

5.3. Configuring Constraints

Having all the resources configured is only part of the job. Even if the cluster knows all needed resources, it might still not be able to handle them correctly. For example, it would be quite useless to try to mount the file system on the slave node of drbd (in fact, this would fail with drbd). To inform the cluster about these things, define constraints.

In Heartbeat, there are three different kinds of constraints available:

  • Locational constraints that define on which nodes a resource may be run (rsc_location).

  • Colocational constraints that tell the cluster which resources may or may not run together on a node (rsc_colocation).

  • Ordering constraints to define the sequence of actions (rsc_order).

5.3.1. Locational Constraints

This type of constraint may be added multiple times for each resource, and all rsc_location constraints for a given resource are evaluated. A simple example that adds a score of 100 for running the resource with the ID filesystem_1 on the node named earth is the following:

<rsc_location id="filesystem_1_location" rsc="filesystem_1"> <!-- 1 -->
   <rule id="pref_filesystem_1" score="100"> <!-- 2 -->
      <expression attribute="#uname" operation="eq" value="earth"/> <!-- 3 -->
   </rule>
</rsc_location>

1

To take effect, the rsc_location tag must define an rsc attribute. The content of this attribute must be the ID of a resource of the cluster.

2

The rule tag sets the score, which is applied only if the following expression matches. The score is used as a priority for running the resource on a node. Scores are calculated on a per-resource basis, and any node with a negative score for a resource can’t run that resource.

3

Whether a rule really is activated, changing the score, depends on the evaluation of an expression. Several different operations are defined and the special attributes #uname and #id may be used in the comparison.

It is also possible to use another rule or a date_expression. For more information, refer to crm.dtd, which is located at /usr/lib/heartbeat/crm.dtd.
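The effect of such rules on one resource can be illustrated with a deliberately simplified model; this is not the policy engine's actual algorithm. The Python sketch assumes every node starts at 0 (as with symmetric_cluster enabled), each rule is a single #uname eq expression adding its score, and, as stated above, a node with a negative score cannot run the resource. The node name mars and the helper names are hypothetical.

```python
def location_scores(nodes, rules):
    """rules: (score, uname) pairs, one per '#uname eq <uname>' rule."""
    scores = {node: 0 for node in nodes}
    for score, uname in rules:
        if uname in scores:
            scores[uname] += score
    return scores


def allowed_nodes(scores):
    # A negative score means the node can't run the resource.
    return sorted(node for node, score in scores.items() if score >= 0)


def preferred_node(scores):
    allowed = allowed_nodes(scores)
    return max(allowed, key=lambda node: scores[node]) if allowed else None
```

With the rule above, location_scores(["earth", "mars"], [(100, "earth")]) gives earth 100 and mars 0, so earth is preferred while mars remains eligible.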

5.3.2. Colocational Constraints

The rsc_colocation constraint is used to define what resources should run on the same or on different hosts. It is not possible to give a score other than INFINITY or -INFINITY, defining resources to run together always or never to run together. For example, to run the two resources with the IDs filesystem_resource and nfs_group always on the same host, use the following constraint:

<rsc_colocation 
  id="nfs_on_filesystem"
  to="filesystem_resource"
  from="nfs_group" 
  score="INFINITY"/>

For a master slave configuration, it is necessary to know if the current node is a master in addition to running the resource locally. This can be checked with an additional to_role or from_role attribute.

5.3.3. Ordering Constraints

Sometimes it is necessary to provide an order in which services must start. For example, you cannot mount a file system before the device is available to a system. Ordering constraints can be used to start or stop a service right before or after a different resource meets a special condition, such as being started, stopped, or promoted to master. An ordering constraint looks like the following:

<rsc_order id="nfs_after_filesystem" from="nfs_group" action="start"
      to="filesystem_resource" to_action="start" type="after"/>

With type="after", the action of the from resource is done after the action of the to resource.
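Reading from and to in the right direction is a common source of mistakes. This Python sketch (order_pair is a hypothetical helper) turns an rsc_order element with type="after" into an explicit (what happens first, what happens next) pair. It handles only type="after" and expects action and to_action to be present, because their defaults are not covered here.

```python
import xml.etree.ElementTree as ET


def order_pair(rsc_order_xml):
    """Return ((first_rsc, first_action), (then_rsc, then_action))."""
    c = ET.fromstring(rsc_order_xml)
    if c.get("type") != "after":
        raise ValueError("this sketch only handles type=\"after\"")
    from_side = (c.get("from"), c.get("action"))
    to_side = (c.get("to"), c.get("to_action"))
    # type="after": the `from` action runs after the `to` action.
    return to_side, from_side
```

For the constraint above, the result is (("filesystem_resource", "start"), ("nfs_group", "start")): the file system starts first, then the NFS group.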

5.3.4. Constraints for the Example Configuration

The example used for this chapter is quite useless without additional constraints. It is essential that all resources run on the same machine as the master of the drbd resource. Another thing that is critical is that the drbd resource must be master before any other resource starts. Trying to mount the drbd device when drbd is not master simply fails. The constraints that must be fulfilled look like the following:

  • The file system must always be on the same node as the master of the drbd resource.

    <rsc_colocation id="filesystem_on_master" to="drbd_resource"
           to_role="master" from="filesystem_resource" score="INFINITY"/>
         
  • The file system must be mounted on a node after the drbd resource is promoted to master on this node.

    <rsc_order id="drbd_first" from="filesystem_resource" action="start"
           to="drbd_resource" to_action="promote" type="after"/>
  • The NFS server as well as the IP address start after the file system is mounted.

    <rsc_order id="nfs_second" from="nfs_group" action="start"
           to="filesystem_resource" to_action="start" type="after"/>
  • The NFS server as well as the IP address must be on the same node as the file system.

    <rsc_colocation id="nfs_on_drbd" to="filesystem_resource"
           from="nfs_group" score="INFINITY"/>
  • In addition, add a constraint that prevents the NFS server from running on a node where drbd is running in slave mode.

    <rsc_colocation id="nfs_on_slave" to="drbd_resource"
           to_role="slave" from="nfs_group" score="-INFINITY"/>
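To see how the three colocation constraints above interact, the following Python sketch checks a candidate placement against them. It is a hypothetical model, not how the cluster evaluates constraints: placement maps each resource to a node, drbd_roles maps each node to the drbd role running there, and the node names earth and mars are invented for illustration.

```python
INFINITY = float("inf")

# (from resource, to resource, required role of `to` or None, score)
EXAMPLE_COLOCATIONS = [
    ("filesystem_resource", "drbd_resource", "master", INFINITY),
    ("nfs_group", "filesystem_resource", None, INFINITY),
    ("nfs_group", "drbd_resource", "slave", -INFINITY),
]


def violations(placement, drbd_roles, constraints=EXAMPLE_COLOCATIONS):
    """placement: resource -> node; drbd_roles: node -> 'master' or 'slave'."""
    bad = []
    for frm, to, role, score in constraints:
        node = placement[frm]
        if to == "drbd_resource":
            # Together: drbd runs on `node`, in the required role if one is given.
            together = node in drbd_roles and role in (None, drbd_roles[node])
        else:
            together = placement.get(to) == node
        if (score == INFINITY and not together) or (score == -INFINITY and together):
            bad.append((frm, to, role))
    return bad
```

With drbd master on earth, placing both the file system and nfs_group on earth yields no violations; moving nfs_group to the slave node mars violates two constraints.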

5.4. Configuring CRM Options

The CRM options define the global behavior of a cluster. In principle, the default values should be acceptable for many environments, but if you want to use special services, like STONITH devices, you must inform the cluster about them. All crm_config options are set with nvpair elements in the crm_config section of the configuration. For example, to change the cluster-delay from its default value of 60s to 120s, use the following configuration:

<cluster_property_set id="cib-bootstrap-options">
  <attributes>
     <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="120s"/>
  </attributes>
</cluster_property_set>

Write this information to a file and load it to the cluster with the command cibadmin -C -o crm_config -x filename. The following is an overview of all available configuration options:

cluster-delay (interval, default=60s)

This option used to be known as transition_idle_timeout. If no activity is recorded in this time, the transition is deemed failed, as are all sent actions that have not yet been confirmed complete. If any initiated operation has an explicitly higher time-out, the higher value applies.

symmetric_cluster (boolean, default=TRUE)

If true, resources are permitted to run anywhere by default. Otherwise, explicit constraints must be created to specify where they can run.

stonith_enabled (boolean, default=FALSE)

If true, failed nodes are fenced.

no_quorum_policy (enum, default=stop)

ignore

Pretend to have quorum.

freeze

Do not start any resources not currently in the partition. Resources in the partition may be moved to another node within the partition. Fencing is disabled.

stop

Stop all running resources in the partition. Fencing is disabled.

default_resource_stickiness (integer, default=0)

Selects whether resources prefer to remain on their current node or to be moved to a better one:

0

Resources are placed optimally in the system. This may mean they are moved when a better or less-loaded node becomes available. This option is almost equivalent to auto_failback on except that the resource may be moved to nodes other than the one on which it was previously active.

value > 0

Resources prefer to remain in their current location but may be moved if a more suitable node is available. Higher values indicate a stronger preference for resources to stay where they are.

value < 0

Resources prefer to move away from their current location. Higher absolute values indicate a stronger preference for resources to be moved.

INFINITY

Resources always remain in their current locations until forced off because the node is no longer eligible to run the resource (node shutdown, node standby, or configuration change). This option is almost equivalent to auto_failback off except that the resource may be moved to nodes other than the one on which it was previously active.

-INFINITY

Resources always move away from their current location.

is_managed_default (boolean, default=TRUE)

Unless the resource's definition says otherwise:

TRUE

Resources are started, stopped, monitored, and moved as necessary.

FALSE

Resources are not started if stopped or stopped if started, and no recurring actions are scheduled for them.

stop_orphan_resources (boolean, default=TRUE)

If a resource is found for which there is no definition:

TRUE

Stop the resource.

FALSE

Ignore the resource.

This mostly affects the CRM's behavior when a resource is deleted by an administrator without it first being stopped.

stop_orphan_actions (boolean, default=TRUE)

If a recurring action is found for which there is no definition:

TRUE

Stop the action.

FALSE

Ignore the action.
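The values of default_resource_stickiness can be pictured with a simplified model; this is an illustration, not the policy engine's algorithm. The Python sketch adds the stickiness to the current node's score and moves the resource only if another node then scores strictly higher. INFINITY and -INFINITY are represented as floating-point infinities, and choose_node and the node names are hypothetical.

```python
def choose_node(scores, current, stickiness):
    """Pick a node; `current` is where the resource runs now (or None)."""
    boosted = dict(scores)
    if current is not None:
        boosted[current] += stickiness
    best = max(boosted, key=boosted.get)
    # On a tie, staying put wins.
    if current is not None and boosted[best] == boosted[current]:
        return current
    return best
```

With scores {"earth": 0, "mars": 50} and the resource on earth, a stickiness of 0 moves it to mars, 100 keeps it on earth, and float("inf") keeps it there regardless of the other scores.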

All available options to the crm_config are summarized in Policy Engine(7).
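A cluster_property_set snippet like the one at the start of this section can be generated before loading it with cibadmin -C -o crm_config -x filename. In the following Python sketch (property_set_xml is a hypothetical helper), the set id cib-bootstrap-options and the nvpair id pattern <set id>-<option name> follow the ids visible in the cibadmin -Q output earlier in this chapter; treat that naming as an assumption rather than a requirement.

```python
import xml.etree.ElementTree as ET


def property_set_xml(options, set_id="cib-bootstrap-options"):
    """Build a <cluster_property_set> with one nvpair per option."""
    pset = ET.Element("cluster_property_set", id=set_id)
    attrs = ET.SubElement(pset, "attributes")
    for name, value in options.items():
        ET.SubElement(attrs, "nvpair", id="%s-%s" % (set_id, name),
                      name=name, value=value)
    return ET.tostring(pset, encoding="unicode")


if __name__ == "__main__":
    print(property_set_xml({"cluster-delay": "120s"}))
```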

5.5. For More Information

http://linux-ha.org

Homepage of High Availability Linux


Heartbeat Guide