Abstract
The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache web server or a database. From the user's point of view, the services do something specific when ordered to do so. To the cluster, however, they are just resources which may be started or stopped—the nature of the service is irrelevant to the cluster.
In this chapter, we will introduce some basic concepts you need to know when configuring resources and administering your cluster. The following chapters show you how to execute the main configuration and administration tasks with each of the management tools the High Availability Extension provides.
Global cluster options control how the cluster behaves when confronted with certain situations. They are grouped into sets and can be viewed and modified with the cluster management tools like Pacemaker GUI and the crm shell. The predefined values can be kept in most cases. However, to make key functions of your cluster work correctly, you need to adjust the following parameters after basic cluster setup: no-quorum-policy and stonith-enabled.
Learn how to adjust those parameters with the GUI in Procedure 5.1, “Modifying Global Cluster Options”. If you prefer the command line approach, see Section 6.2, “Configuring Global Cluster Options”.
The global option no-quorum-policy defines what to do when the cluster does not have quorum (that is, when no majority of nodes is part of the partition).
Allowed values are:
ignore
The quorum state does not influence the cluster behavior at all; resource management continues.
This setting is useful for the following scenarios:
Two-node clusters: Since a single node failure would always result in a loss of majority, usually you want the cluster to carry on regardless. Resource integrity is ensured using fencing, which also prevents split-brain scenarios.
Resource-driven clusters: For local clusters with redundant communication channels, a split-brain scenario only has a certain probability. Thus, a loss of communication with a node most likely indicates that the node has crashed, and that the surviving nodes should recover and start serving the resources again.
If no-quorum-policy is set to ignore, a 4-node cluster can sustain concurrent failure of three nodes before service is lost, whereas with the other settings, it would lose quorum after concurrent failure of two nodes, because the two remaining nodes are not a majority of four.
freeze
If quorum is lost, the cluster freezes. Resource management is continued: running resources are not stopped (but possibly restarted in response to monitor events), but no further resources are started within the affected partition.
This setting is recommended for clusters where certain resources depend on communication with other nodes (for example, OCFS2 mounts). In this case, the default setting no-quorum-policy=stop is not useful, as it would lead to the following scenario: Stopping those resources would not be possible while the peer nodes are unreachable. Instead, an attempt to stop them would eventually time out and cause a stop failure, triggering escalated recovery and fencing.
stop (default value)
If quorum is lost, all resources in the affected cluster partition are stopped in an orderly fashion.
suicide
Fence all nodes in the affected cluster partition.
The global option stonith-enabled defines whether to apply fencing, allowing STONITH devices to shoot failed nodes and nodes with resources that cannot be stopped. By default, this global option is set to true, because STONITH devices are necessary for normal cluster operation. With the default value, the cluster will refuse to start any resources if no STONITH resources have been defined.
If you need to disable fencing for any reason, set stonith-enabled to false.
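As a minimal sketch, the two options above could be set as follows for a two-node cluster. The fragment uses crm shell configuration syntax; the values are illustrative assumptions rather than recommendations for every setup, and stonith-enabled is shown at its default only for clarity.

```
# Two-node cluster: carry on without quorum and rely on fencing
# to protect resource integrity.
property no-quorum-policy="ignore" \
    stonith-enabled="true"
```

With fencing enabled, a STONITH resource must still be defined before the cluster starts any resources.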
For an overview of all global cluster options and their default values, see Pacemaker 1.0—Configuration Explained, available from http://clusterlabs.org/wiki/Documentation. Refer to section Available Cluster Options.
As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.
Before you can use a resource in the cluster, it must be set up. For example, if you want to use an Apache server as a cluster resource, set up the Apache server first and complete the Apache configuration before starting the respective resource in your cluster.
If a resource has specific environment requirements, make sure they are present and identical on all cluster nodes. This kind of configuration is not managed by the High Availability Extension. You must do this yourself.
Do Not Touch Services Managed by the Cluster
When managing a resource with the High Availability Extension, the same resource must not be started or stopped otherwise (outside of the cluster, for example manually or on boot or reboot). The High Availability Extension software is responsible for all service start or stop actions.
However, if you want to check if the service is configured properly, start it manually, but make sure that it is stopped again before High Availability takes over.
After having configured the resources in the cluster, use the cluster management tools to start, stop, clean up, remove or migrate any resources manually. For details on how to do so, refer to Chapter 5, Configuring and Managing Cluster Resources (GUI) or to Chapter 6, Configuring and Managing Cluster Resources (Command Line).
For each cluster resource you add, you need to define the standard that the resource agent conforms to. Resource agents abstract the services they provide and present an accurate status to the cluster, which allows the cluster to be non-committal about the resources it manages. The cluster relies on the resource agent to react appropriately when given a start, stop or monitor command.
Typically, resource agents come in the form of shell scripts. The High Availability Extension supports the following classes of resource agents:
Heartbeat version 1 came with its own style of resource agents. As many people have written their own agents based on its conventions, these resource agents are still supported. However, it is recommended to migrate your configurations to High Availability OCF RAs if possible.
LSB resource agents are generally provided by the operating system/distribution and are found in /etc/init.d. To be used with the cluster, they must conform to the LSB init script specification. For example, they must have several actions implemented, which are, at minimum, start, stop, restart, reload, force-reload, and status. For more information, see http://ldn.linuxfoundation.org/lsb/lsb4-resource-page%23Specification.
The configuration of those services is not standardized. If you intend to use an LSB script with High Availability, make sure that you understand how the relevant script is configured. Often you can find information about this in the documentation of the relevant package in /usr/share/doc/packages/PACKAGENAME.
OCF resource agents are best suited for use with High Availability, especially when you need master resources or special monitoring abilities. The agents are generally located in /usr/lib/ocf/resource.d/provider/. Their functionality is similar to that of LSB scripts. However, the configuration is always done with environment variables, which allow them to accept and process parameters easily. The OCF specification (as it relates to resource agents) can be found at http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup. OCF specifications have strict definitions of which exit codes must be returned by actions; see Section 8.3, “OCF Return Codes and Failure Recovery”. The cluster follows these specifications exactly. For a detailed list of all available OCF RAs, refer to Chapter 19, HA OCF Agents.
All OCF Resource Agents are required to have at least the actions start, stop, status, monitor, and meta-data. The meta-data action retrieves information about how to configure the agent. For example, if you want to know more about the IPaddr agent by the provider heartbeat, use the following command:
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data
The output is information in XML format, including several sections (general description, available parameters, available actions for the agent).
The STONITH class is used exclusively for fencing-related resources. For more information, see Chapter 9, Fencing and STONITH.
The agents supplied with the High Availability Extension are written to OCF specifications.
The following types of resources can be created:
A primitive resource, the most basic type of a resource.
Learn how to create primitive resources with the GUI in Procedure 5.2, “Adding Primitive Resources”. If you prefer the command line approach, see Section 6.3.1, “Creating Cluster Resources”.
Groups contain a set of resources that need to be located together, started sequentially and stopped in the reverse order. For more information, refer to Section 4.2.4.1, “Groups”.
Clones are resources that can be active on multiple hosts. Any resource can be cloned, provided the respective resource agent supports it. For more information, refer to Section 4.2.4.2, “Clones”.
Masters are a special type of clone resource that can have multiple modes. For more information, refer to Section 4.2.4.3, “Masters”.
Whereas primitives are the simplest kind of resources and therefore easy to configure, you will probably also need more advanced resource types for cluster configuration, such as groups, clones or masters.
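For illustration, a primitive for a virtual IP address might look like the following in crm shell configuration syntax. The resource ID web-ip and the address are hypothetical; ip is the required parameter of the ocf:heartbeat:IPaddr agent.

```
# class:provider:agent identifies the resource agent; params are passed to it.
primitive web-ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.1"
```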
Some cluster resources are dependent on other components or resources, and require that each component or resource starts in a specific order and runs together on the same server. To simplify this configuration, you can use groups.
Example 4.1. Resource Group for a Web Server
An example of a resource group would be a Web server that requires an IP address and a file system. In this case, each component is a separate cluster resource that is combined into a cluster resource group. The resource group would then run on a server or servers, and in case of a software or hardware malfunction, fail over to another server in the cluster the same as an individual cluster resource.
Groups have the following properties:
Resources are started in the order they appear in and stopped in the reverse order.
If a resource in the group cannot run anywhere, then none of the resources located after that resource in the group is allowed to run.
Groups may only contain a collection of primitive cluster resources. Groups must contain at least one resource, otherwise the configuration is not valid. To refer to the child of a group resource, use the child’s ID instead of the group’s ID.
Although it is possible to reference the group’s children in constraints, it is usually preferable to use the group’s name instead.
Stickiness is additive in groups. Every active member of the group will contribute its stickiness value to the group’s total. So if the default resource-stickiness is 100 and a group has seven members (five of which are active), then the group as a whole will prefer its current location with a score of 500.
To enable resource monitoring for a group, you must configure monitoring separately for each resource in the group that you want monitored.
Learn how to create groups with the GUI in Procedure 5.12, “Adding a Resource Group”. If you prefer the command line approach, see Section 6.3.9, “Configuring a Cluster Resource Group”.
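The Web server group from Example 4.1 could be sketched as follows in crm shell configuration syntax. The resource IDs, device, mount point, address and configuration file path are assumptions made for this illustration; adjust them to your environment.

```
primitive web-fs ocf:heartbeat:Filesystem \
    params device="/dev/sdb1" directory="/srv/www" fstype="ext3"
primitive web-ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.1"
primitive web-server ocf:heartbeat:apache \
    params configfile="/etc/apache2/httpd.conf"
# Members start in the listed order and stop in reverse order.
group g-web web-fs web-ip web-server
```

Because the group already implies that its members run together and start in sequence, no separate constraints are needed between them.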
You may want certain resources to run simultaneously on multiple nodes in your cluster. To do this you must configure a resource as a clone. Examples of resources that might be configured as clones include STONITH and cluster file systems like OCFS2. You can clone any resource, provided this is supported by the resource’s resource agent. Clone resources may even be configured differently depending on which nodes they are hosted on.
There are three types of resource clones:
Anonymous clones are the simplest type of clones. They behave identically anywhere they are running. Because of this, there can only be one instance of an anonymous clone active per machine.
Globally unique clones are distinct entities. An instance of the clone running on one node is not equivalent to another instance on another node; nor would any two instances on the same node be equivalent.
Active instances of stateful clones are divided into two states, active and passive. These are also sometimes referred to as primary and secondary, or master and slave. Stateful clones can be either anonymous or globally unique. See also Section 4.2.4.3, “Masters”.
Clones must contain exactly one group or one regular resource.
When configuring resource monitoring or constraints, clones have different requirements than simple resources. For details, see Pacemaker 1.0—Configuration Explained, available from http://clusterlabs.org/wiki/Documentation. Refer to section Clones - Resources That Should be Active on Multiple Hosts.
Learn how to create clones with the GUI in Procedure 5.14, “Adding or Modifying Clones”. If you prefer the command line approach, see Section 6.3.10, “Configuring a Clone Resource”.
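A clone wraps an existing primitive or group. The following sketch assumes that a primitive with the hypothetical ID example-rsc has already been defined, and it uses the Pacemaker clone options clone-max, clone-node-max and globally-unique.

```
# Anonymous clone: at most two instances cluster-wide, one per node.
clone cl-example example-rsc \
    meta clone-max="2" clone-node-max="1" globally-unique="false"
```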
Masters are a specialization of clones that allow the instances to be in one of two operating modes (master or slave). Masters must contain exactly one group or one regular resource.
When configuring resource monitoring or constraints, masters have different requirements than simple resources. For details, see Pacemaker 1.0—Configuration Explained, available from http://clusterlabs.org/wiki/Documentation. Refer to section Multi-state - Resources That Have Multiple Modes.
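As a sketch, a master resource for a DRBD device might be defined as shown below. The IDs and the DRBD resource name r0 are hypothetical, and the example assumes that the ocf:linbit:drbd agent is installed. Note the two monitor operations: each role gets its own operation with a distinct interval, which is one of the ways in which monitoring a master differs from monitoring a simple resource.

```
primitive drbd-r0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s" role="Master" \
    op monitor interval="30s" role="Slave"
# One master instance out of two clone instances in total.
ms ms-drbd-r0 drbd-r0 \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"
```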
For each resource you add, you can define options. Options are used by the cluster to decide how your resource should behave—they tell the CRM how to treat a specific resource. Resource options can be set with the crm_resource --meta command or with the GUI as described in Procedure 5.3, “Adding or Modifying Meta and Instance Attributes”.
Table 4.1. Options for a Primitive Resource
| Option | Description |
|---|---|
| | If not all resources can be active, the cluster will stop lower priority resources in order to keep higher priority ones active. |
| | In what state should the cluster attempt to keep this resource? Allowed values: |
| | Is the cluster allowed to start and stop the resource? Allowed values: |
| | How much does the resource prefer to stay where it is? Defaults to the value of |
| | How many failures should occur for this resource on a node before making the node ineligible to host this resource? Default: |
| | What should the cluster do if it ever finds the resource active on more than one node? Allowed values: |
| | How many seconds to wait before acting as if the failure had not occurred (and potentially allowing the resource back to the node on which it failed)? Default: |
| | Allow resource migration for resources which support |
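In crm shell configuration syntax, resource options are given after the meta keyword. The sketch below sets some of the options referred to elsewhere in this chapter (priority, resource-stickiness, migration-threshold and failure-timeout); the resource ID and all values are assumptions for illustration.

```
primitive web-ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.1" \
    meta priority="10" resource-stickiness="100" \
         migration-threshold="3" failure-timeout="60s"
```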
The scripts of all resource classes can be given parameters which determine how they behave and which instance of a service they control. If your resource agent supports parameters, you can add them with the crm_resource command or with the GUI as described in Procedure 5.3, “Adding or Modifying Meta and Instance Attributes”. In the crm command line utility, instance attributes are called params. The list of instance attributes supported by an OCF script can be found by executing the following command as root:
crm ra info [class:[provider:]]resource_agent
or, even shorter:
crm ra info resource_agent
The output lists all the supported attributes, their purpose and default values.
For example, the command
crm ra info IPaddr
returns the following output:
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".
nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought
online.
If left empty, the script will try and determine this from the
routing table.
Do NOT specify an alias interface in the form eth0:1 or anything here;
rather, specify the base interface only.
cidr_netmask (string): Netmask
The netmask for the interface in CIDR format. (ie, 24), or in
dotted quad notation 255.255.255.0).
If unspecified, the script will also try to determine this from the
routing table.
broadcast (string): Broadcast address
Broadcast address associated with the IP. If left empty, the script will
determine this from the netmask.
iflabel (string): Interface label
You can specify an additional label for your IP address here.
lvs_support (boolean, [false]): Enable support for LVS DR
Enable support for LVS Direct Routing configurations. In case a IP
address is stopped, only move it to the loopback device to allow the
local node to continue to service requests, but no longer advertise it
on the network.
local_stop_script (string):
Script called when the IP is released
local_start_script (string):
Script called when the IP is added
ARP_INTERVAL_MS (integer, [500]): milliseconds between gratuitous ARPs
milliseconds between ARPs
ARP_REPEAT (integer, [10]): repeat count
How many gratuitous ARPs to send out when bringing up a new address
ARP_BACKGROUND (boolean, [yes]): run in background
run in background (no longer any reason to do this)
ARP_NETMASK (string, [ffffffffffff]): netmask for ARP
netmask for ARP - in nonstandard hexadecimal format.
Operations' defaults (advisory minimum):
start timeout=90
stop timeout=100
monitor_0 interval=5s timeout=20s
Instance Attributes for Groups, Clones or Masters
Note that groups, clones and masters do not have instance attributes. However, any instance attributes set will be inherited by the group's, clone's or master's children.
By default, the cluster will not ensure that your resources are still healthy. To instruct the cluster to do this, you need to add a monitor operation to the resource’s definition. Monitor operations can be added for all classes of resource agents. For more information, refer to Section 4.3, “Resource Monitoring”.
Table 4.2. Resource Operations
| Operation | Description |
|---|---|
| | Your name for the action. Must be unique. (The ID is not shown). |
| | The action to perform. Common values: |
| | How frequently to perform the operation. Unit: seconds |
| | How long to wait before declaring the action has failed. |
| | What conditions need to be satisfied before this action occurs. Allowed values: |
| | The action to take if this action ever fails. Allowed values: |
| | If |
| | Run the operation only if the resource has this role. |
| | Can be set either globally or for individual resources. Makes the CIB reflect the state of “in-flight” operations on resources. |
| | Description of the operation. |
If you want to ensure that a resource is running, you must configure resource monitoring for it.
If the resource monitor detects a failure, the following takes place:
Log file messages are generated, according to the configuration specified in the logging section of /etc/corosync/corosync.conf. By default, the logs are written to syslog, usually /var/log/messages.
The failure is reflected in the cluster management tools (Pacemaker GUI, HA Web Konsole, crm_mon), and in the CIB status section.
The cluster initiates noticeable recovery actions which may include stopping the resource to repair the failed state and restarting the resource locally or on another node. The resource also may not be restarted at all, depending on the configuration and state of the cluster.
If you do not configure resource monitoring, resource failures after a successful start will not be communicated, and the cluster will always show the resource as healthy.
Learn how to add monitor operations to resources with the GUI in Procedure 5.11, “Adding or Modifying Monitor Operations”. If you prefer the command line approach, see Section 6.3.8, “Configuring Resource Monitoring”.
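In crm shell configuration syntax, a monitor operation is added with the op keyword. In this sketch, the resource ID and the interval and timeout values are illustrative assumptions; choose values that suit the agent and the service.

```
# Check the resource every 10 seconds; treat a check that takes
# longer than 20 seconds as failed.
primitive web-ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.1" \
    op monitor interval="10s" timeout="20s"
```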
Having all the resources configured is only part of the job. Even if the cluster knows all needed resources, it might still not be able to handle them correctly. Resource constraints let you specify which cluster nodes resources can run on, what order resources will load, and what other resources a specific resource is dependent on.
There are three different kinds of constraints available:
Locational constraints that define on which nodes a resource may be run, may not be run or is preferred to be run.
Collocational constraints that tell the cluster which resources may or may not run together on a node.
Ordering constraints to define the sequence of actions.
For more information on configuring constraints and detailed background information about the basic concepts of ordering and collocation, refer to the following documents available at http://clusterlabs.org/wiki/Documentation:
Pacemaker 1.0—Configuration Explained , chapter Resource Constraints
Collocation Explained
Ordering Explained
Learn how to add the various kinds of constraints with the GUI in Section 5.3.3, “Configuring Resource Constraints”. If you prefer the command line approach, see Section 6.3.4, “Configuring Resource Constraints”.
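The three kinds of constraints could be expressed as follows in crm shell configuration syntax. The constraint IDs, resource IDs and node name are hypothetical, and the resources are assumed to exist already.

```
# Location: web-ip prefers node1 with a score of 100.
location loc-web-ip web-ip 100: node1
# Collocation: web-server must run on the same node as web-ip.
colocation col-web inf: web-server web-ip
# Ordering: start web-ip before web-server.
order ord-web inf: web-ip web-server
```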
When defining constraints, you also need to deal with scores. Scores of all kinds are integral to how the cluster works. Practically everything from migrating a resource to deciding which resource to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated on a per-resource basis and any node with a negative score for a resource cannot run that resource. After calculating the scores for a resource, the cluster then chooses the node with the highest score.
INFINITY is currently defined as 1,000,000. Additions or subtractions with it stick to the following three basic rules:
Any value + INFINITY = INFINITY
Any value - INFINITY = -INFINITY
INFINITY - INFINITY = -INFINITY
When defining resource constraints, you specify a score for each constraint. The score indicates the value you are assigning to this resource constraint. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify an order for the nodes that a resource will fail over to.
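As a worked example of the score rules, consider the following location constraints for a hypothetical resource r1 on three nodes (all IDs and scores are assumptions for illustration).

```
location loc-r1-node1 r1 200: node1
location loc-r1-node2 r1 100: node2
location loc-r1-node3a r1 inf: node3
location loc-r1-node3b r1 -inf: node3
```

The two constraints for node3 add up to INFINITY - INFINITY = -INFINITY, so r1 can never run there. Of the remaining nodes, and leaving stickiness aside, node1 has the highest score and is chosen first; if node1 becomes unavailable, r1 fails over to node2.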
A resource will be automatically restarted if it fails. If that cannot be achieved on the current node, or it fails N times on the current node, it will try to fail over to another node. Each time the resource fails, its failcount is raised. You can define a number of failures for resources (a migration-threshold), after which they will migrate to a new node. If you have more than two nodes in your cluster, the node a particular resource fails over to is chosen by the High Availability software.
However, you can specify the node a resource will fail over to by configuring one or several location constraints and a migration-threshold for that resource. For detailed instructions on how to achieve this with the GUI, refer to Section 5.3.4, “Specifying Resource Failover Nodes”. If you prefer the command line approach, see Section 6.3.5, “Specifying Resource Failover Nodes”.
Example 4.2. Migration Threshold—Process Flow
For example, let us assume you have configured a location constraint for resource r1 to preferably run on node1. If it fails there, migration-threshold is checked and compared to the failcount. If failcount >= migration-threshold then the resource is migrated to the node with the next best preference.
By default, once the threshold has been reached, the node will no longer be allowed to run the failed resource until the resource's failcount is reset. This can be done manually by the cluster administrator or by setting a failure-timeout option for the resource.
For example, a setting of migration-threshold=2 and failure-timeout=60s would cause the resource to migrate to a new node after two failures and potentially allow it to move back (depending on the stickiness and constraint scores) after one minute.
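The settings from this example could be written as follows in crm shell configuration syntax. The agent, its parameter and the monitor interval are placeholders chosen for illustration; r1 and node1 are the names used above.

```
# A monitor operation is needed so that failures are detected and counted.
primitive r1 ocf:heartbeat:IPaddr \
    params ip="192.168.1.1" \
    op monitor interval="10s" \
    meta migration-threshold="2" failure-timeout="60s"
location loc-r1-node1 r1 100: node1
```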
There are two exceptions to the migration threshold concept, occurring when a resource either fails to start or fails to stop:
Start failures set the failcount to INFINITY and thus always cause an immediate migration.
Stop failures cause fencing (when stonith-enabled is set to true, which is the default). If there is no STONITH resource defined (or stonith-enabled is set to false), the resource will not migrate at all.
For details on using migration thresholds and resetting failcounts, refer to Section 5.3.4, “Specifying Resource Failover Nodes”. If you prefer the command line approach, see Section 6.3.5, “Specifying Resource Failover Nodes”.
A resource might fail back to its original node when that node is back online and in the cluster. If you want to prevent a resource from failing back to the node it was running on prior to failover, or if you want to specify a different node for the resource to fail back to, you must change its resource stickiness value. You can either specify resource stickiness when you are creating a resource, or afterwards.
Consider the following implications when specifying resource stickiness values:
Value is 0: This is the default. The resource will be placed optimally in the system. This may mean that it is moved when a “better” or less loaded node becomes available. This option is almost equivalent to automatic failback, except that the resource may be moved to a node that is not the one it was previously active on.
Value is greater than 0: The resource will prefer to remain in its current location, but may be moved if a more suitable node is available. Higher values indicate a stronger preference for a resource to stay where it is.
Value is less than 0: The resource prefers to move away from its current location. Higher absolute values indicate a stronger preference for a resource to be moved.
Value is INFINITY: The resource will always remain in its current location unless forced off because the node is no longer eligible to run the resource (node shutdown, node standby, reaching the migration-threshold, or configuration change). This option is almost equivalent to completely disabling automatic failback.
Value is -INFINITY: The resource will always move away from its current location.
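Stickiness can be set as a default for all resources and overridden for individual resources. A sketch in crm shell configuration syntax, with hypothetical IDs and values:

```
# Default for all resources: a mild preference to stay put.
rsc_defaults resource-stickiness="100"
# This resource stays where it is unless the node becomes ineligible.
primitive db-ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.2" \
    meta resource-stickiness="INFINITY"
```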
Not all resources are equal. Some, such as Xen guests, require that the node hosting them meets their capacity requirements. If resources are placed such that their combined needs exceed the provided capacity, the resources diminish in performance (or even fail).
To take this into account, the High Availability Extension allows you to specify the following parameters:
The capacity a certain node provides.
The capacity a certain resource requires.
An overall strategy for placement of resources.
Currently, these settings are static and must be configured by the administrator—they are not dynamically discovered or adjusted.
Learn how to configure these settings with the GUI in Section 5.3.6, “Configuring Placement of Resources Based on Load Impact”. If you prefer the command line approach, see Section 6.3.7, “Configuring Placement of Resources Based on Load Impact”.
A node is considered eligible for a resource if it has sufficient free capacity to satisfy the resource's requirements. The nature of the required or provided capacities is completely irrelevant for the High Availability Extension; it just makes sure that all capacity requirements of a resource are satisfied before moving a resource to a node.
To configure the resource's requirements and the capacity a node provides, use utilization attributes. You can name the utilization attributes according to your preferences and define as many name/value pairs as your configuration needs. However, the attribute's values must be integers.
The placement strategy can be specified with the placement-strategy property (in the global cluster options). The following values are available:
default (default value)
Utilization values are not considered at all. Resources are allocated according to location scoring. If scores are equal, resources are evenly distributed across nodes.
utilization
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. However, load-balancing is still done based on the number of resources allocated to a node.
minimal
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. An attempt is made to concentrate the resources on as few nodes as possible (in order to achieve power savings on the remaining nodes).
balanced
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. An attempt is made to distribute the resources evenly, thus optimizing resource performance.
Configuring Resource Priorities
The available placement strategies are best-effort: they do not yet use complex heuristic solvers to always reach optimum allocation results. Thus, set your resource priorities in a way that makes sure that your most important resources are scheduled first.
Example 4.3. Example Configuration for Load-Balanced Placing
The following example demonstrates a three-node cluster of equal nodes, with four virtual machines.
node node1 utilization memory="4000"
node node2 utilization memory="4000"
node node3 utilization memory="4000"
primitive xenA ocf:heartbeat:Xen utilization memory="3500" \
meta priority="10"
primitive xenB ocf:heartbeat:Xen utilization memory="2000" \
meta priority="1"
primitive xenC ocf:heartbeat:Xen utilization memory="2000" \
meta priority="1"
primitive xenD ocf:heartbeat:Xen utilization memory="1000" \
meta priority="5"
property placement-strategy="minimal"
With all three nodes up, resource xenA will be placed onto a node first, followed by xenD. xenB and xenC would either be allocated together or one of them with xenD.
If one node failed, too little total memory would be available to host them all (the two remaining nodes provide 8000 in total, while the four resources require 8500). xenA would be guaranteed to be allocated, as would xenD. However, only one of the remaining resources xenB or xenC could still be placed. Since their priority is equal, the outcome would be undetermined. To resolve this ambiguity as well, you would need to set a higher priority for either one.
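To resolve the ambiguity described above, one of the two definitions could be changed as in this sketch, which raises the priority of xenB. The value 2 is an arbitrary choice; it only needs to be higher than the priority of xenC.

```
primitive xenB ocf:heartbeat:Xen utilization memory="2000" \
    meta priority="2"
```

With this change, xenB would be scheduled before xenC when only two nodes are available.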
Home page of Pacemaker, the cluster resource manager shipped with the High Availability Extension.
Home page of the High Availability Linux Project.
CRM Command Line Interface: Introduction to the crm command line tool.
Pacemaker 1.0—Configuration Explained : Explains the concepts used to configure Pacemaker. Contains comprehensive and very detailed information for reference.