Abstract
The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache web server or a database. From the user's point of view, the services do something specific when ordered to do so. To the cluster, however, they are just resources which may be started or stopped—the nature of the service is irrelevant to the cluster.
In this chapter, we will introduce some basic concepts you need to know when configuring resources and administering your cluster. The following chapters show you how to execute the main configuration and administration tasks with each of the management tools the High Availability Extension provides.
Global cluster options control how the cluster behaves when confronted with certain situations. They are grouped into sets and can be viewed and modified with cluster management tools such as the Pacemaker GUI and the crm shell. The predefined values can be kept in most cases. However, to make key functions of your cluster work correctly, you need to adjust the following parameters after basic cluster setup: no-quorum-policy and stonith-enabled.
no-quorum-policy
This global option defines what to do when the cluster does not have quorum (no majority of nodes is part of the partition).
Allowed values are:
ignore
The quorum state does not influence the cluster behavior at all; resource management continues.
This setting is useful for the following scenarios:
Two-node clusters: Since a single node failure would always result in a loss of majority, usually you want the cluster to carry on regardless. Resource integrity is ensured using fencing, which also prevents split brain scenarios.
Resource-driven clusters: For local clusters with redundant communication channels, a split brain scenario has only a low probability. Thus, a loss of communication with a node most likely indicates that the node has crashed, and that the surviving nodes should recover and start serving the resources again.
If no-quorum-policy is set to
ignore, a 4-node cluster can sustain concurrent
failure of three nodes before service is lost, whereas with the
other settings, it would lose quorum after concurrent failure of
two nodes.
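The node counts above follow from simple majority arithmetic. The following bash sketch assumes that every node has exactly one vote and that quorum requires strictly more than half of all votes. The function names are hypothetical helpers and not part of any cluster tool.

```shell
# Smallest number of nodes that forms a majority in a cluster of $1 nodes.
quorum_threshold() {
  echo $(( $1 / 2 + 1 ))
}

# Number of concurrent node failures after which the surviving
# partition no longer holds a majority.
failures_until_quorum_loss() {
  local nodes=$1
  echo $(( nodes - $(quorum_threshold "$nodes") + 1 ))
}

quorum_threshold 4             # prints 3
failures_until_quorum_loss 4   # prints 2
```

For a two-node cluster the second function returns 1, which is why a single node failure always costs quorum there.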
freeze
If quorum is lost, the cluster freezes. Resource management is continued: running resources are not stopped (but possibly restarted in response to monitor events), but no further resources are started within the affected partition.
This setting is recommended for clusters where certain resources
depend on communication with other nodes (for example, OCFS2 mounts).
In this case, the default setting
no-quorum-policy=stop is not useful, as it would
lead to the following scenario: Stopping those resources would not be
possible while the peer nodes are unreachable. Instead, an attempt to
stop them would eventually time out and cause a stop
failure, triggering escalated recovery and fencing.
stop (default value)
If quorum is lost, all resources in the affected cluster partition are stopped in an orderly fashion.
suicide
Fence all nodes in the affected cluster partition.
stonith-enabled
This global option defines whether to apply fencing, allowing STONITH
devices to shoot failed nodes and nodes with resources that cannot be
stopped. By default, this global option is set to
true, because for normal cluster operation it is
necessary to use STONITH devices. With the default value, the
cluster will refuse to start any resources if no STONITH resources
have been defined.
If you need to disable fencing for any reason, set
stonith-enabled to false.
| No Support Without STONITH |
|---|
| A cluster without STONITH is not supported. |
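As a minimal sketch, both options can be set with the crm shell on a running cluster node. The value chosen for no-quorum-policy is only an example for a two-node cluster. Pick the value that matches your own setup.

```shell
# Example for a two-node cluster: keep managing resources without quorum.
crm configure property no-quorum-policy="ignore"

# Keep fencing enabled (this is already the default).
crm configure property stonith-enabled="true"

# Review the resulting configuration.
crm configure show
```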
For an overview of all global cluster options and their default values, see the section Available Cluster Options in Pacemaker Explained.
As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.
Before you can use a resource in the cluster, it must be set up. For example, if you want to use an Apache server as a cluster resource, set up the Apache server first and complete the Apache configuration before starting the respective resource in your cluster.
If a resource has specific environment requirements, make sure they are present and identical on all cluster nodes. This kind of configuration is not managed by the High Availability Extension. You must do this yourself.
| Do Not Touch Services Managed by the Cluster |
|---|
| When managing a resource with the High Availability Extension, the same resource must not be started or stopped otherwise (outside of the cluster, for example manually or on boot or reboot). The High Availability Extension software is responsible for all service start or stop actions. However, if you want to check if the service is configured properly, start it manually, but make sure that it is stopped again before High Availability takes over. |
After having configured the resources in the cluster, use the cluster management tools to start, stop, clean up, remove or migrate any resources manually.
For each cluster resource you add, you need to define the standard that the resource agent conforms to. Resource agents abstract the services they provide and present an accurate status to the cluster, which allows the cluster to be non-committal about the resources it manages. The cluster relies on the resource agent to react appropriately when given a start, stop or monitor command.
Typically, resource agents come in the form of shell scripts. The High Availability Extension supports the following classes of resource agents:
LSB resource agents are generally provided by the operating
system/distribution and are found in
/etc/init.d. To be used with the cluster, they
must conform to the LSB init script specification. For example, they
must have several actions implemented, which are, at minimum,
start, stop,
restart, reload,
force-reload, and status. For
more information, see
http://refspecs.linuxbase.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html.
The configuration of those services is not standardized. If you
intend to use an LSB script with High Availability, make sure that you understand
how the relevant script is configured. Often you can find information
about this in the documentation of the relevant package in
/usr/share/doc/packages/PACKAGENAME.
OCF resource agents are best suited for use with High Availability, especially when you
need master resources or special monitoring abilities. The agents are
generally located in
/usr/lib/ocf/resource.d/provider/.
Their functionality is similar to that of LSB scripts. However, the
configuration is always done with environment variables, which allow
them to accept and process parameters easily. The OCF specification
(as it relates to resource agents) can be found at
http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup.
OCF specifications have strict definitions of which exit codes must
be returned by actions; see Section 8.3, “OCF Return Codes and Failure Recovery”. The
cluster follows these specifications exactly. For a detailed list of
all available OCF RAs, refer to
Chapter 21, HA OCF Agents.
All OCF Resource Agents are required to have at least the actions
start, stop,
status, monitor, and
meta-data. The meta-data action
retrieves information about how to configure the agent. For example,
if you want to know more about the IPaddr agent by
the provider heartbeat, use the following command:
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data
The output is information in XML format, including several sections (general description, available parameters, available actions for the agent).
The STONITH class is used exclusively for fencing-related resources. For more information, see Chapter 9, Fencing and STONITH.
The agents supplied with the High Availability Extension are written to OCF specifications.
The following types of resources can be created:
A primitive resource, the most basic type of a resource.
Learn how to create primitive resources with your preferred cluster management tool:
Pacemaker GUI: Procedure 6.2, “Adding Primitive Resources”
crm shell: Section 7.3.1, “Creating Cluster Resources”
Groups contain a set of resources that need to be located together, started sequentially and stopped in the reverse order. For more information, refer to Section 4.2.5.1, “Groups”.
Clones are resources that can be active on multiple hosts. Any resource can be cloned, provided the respective resource agent supports it. For more information, refer to Section 4.2.5.2, “Clones”.
Masters are a special type of clone resource that can have multiple modes. For more information, refer to Section 4.2.5.3, “Masters”.
If you want to create lots of resources with similar configurations, defining a resource template is the easiest way. Once defined, it can be referenced in primitives—or in certain types of constraints, as described in Section 4.5.3, “Resource Templates and Constraints”.
If a template is referenced in a primitive, the primitive will inherit all operations, instance attributes (parameters), meta attributes, and utilization attributes defined in the template. Additionally, you can define specific operations or attributes for your primitive. If any of these are defined in both the template and the primitive, the values defined in the primitive will take precedence over the ones defined in the template.
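The inheritance described above can be sketched with the crm shell as follows. The template name public_vm, the resource name xen0 and the xmfile path are hypothetical. The operation timeouts are illustrative values only.

```shell
# Define a template that carries the shared operations.
crm configure rsc_template public_vm ocf:heartbeat:Xen \
  op start timeout="300s" \
  op stop timeout="300s" \
  op monitor interval="30s" timeout="60s"

# The primitive inherits the operations from the template and
# adds its own instance attribute.
crm configure primitive xen0 @public_vm \
  params xmfile="/etc/xen/vm/xen0"
```

Any operation or attribute defined again in the primitive would override the value from the template.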
Whereas primitives are the simplest kind of resources and therefore easy to configure, you will probably also need more advanced resource types for cluster configuration, such as groups, clones or masters.
Some cluster resources are dependent on other components or resources and require that each component or resource starts in a specific order and runs together on the same server with resources it depends on. To simplify this configuration, you can use groups.
Example 4.1. Resource Group for a Web Server
An example of a resource group would be a Web server that requires an IP address and a file system. In this case, each component is a separate cluster resource that is combined into a cluster resource group. The resource group would then run on a server or servers, and in case of a software or hardware malfunction, fail over to another server in the cluster the same as an individual cluster resource.
Groups have the following properties:
Resources are started in the order in which they appear in the group and stopped in the reverse order.
If a resource in the group cannot run anywhere, then none of the resources located after that resource in the group is allowed to run.
Groups may only contain a collection of primitive cluster resources. Groups must contain at least one resource, otherwise the configuration is not valid. To refer to the child of a group resource, use the child’s ID instead of the group’s ID.
Although it is possible to reference the group’s children in constraints, it is usually preferable to use the group’s name instead.
Stickiness is additive in groups. Every active
member of the group will contribute its stickiness value to the
group’s total. So if the default
resource-stickiness is 100 and
a group has seven members (five of which are active), then the
group as a whole will prefer its current location with a score of
500.
To enable resource monitoring for a group, you must configure monitoring separately for each resource in the group that you want monitored.
Learn how to create groups with your preferred cluster management tool:
Pacemaker GUI: Procedure 6.13, “Adding a Resource Group”
crm shell: Section 7.3.9, “Configuring a Cluster Resource Group”
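As a sketch of the Web server group from Example 4.1, the following crm shell commands create three primitives and combine them in a group. All resource IDs, the IP address, the device and the paths are hypothetical. Monitoring is configured per member because the group itself has no monitor operation.

```shell
crm configure primitive web-ip ocf:heartbeat:IPaddr2 \
  params ip="192.168.1.180" \
  op monitor interval="10s" timeout="20s"

crm configure primitive web-fs ocf:heartbeat:Filesystem \
  params device="/dev/sdb1" directory="/srv/www" fstype="ext3" \
  op monitor interval="20s" timeout="40s"

crm configure primitive web-server ocf:heartbeat:apache \
  params configfile="/etc/apache2/httpd.conf" \
  op monitor interval="30s" timeout="20s"

# Members start in the listed order and stop in the reverse order.
crm configure group g-web web-ip web-fs web-server
```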
You may want certain resources to run simultaneously on multiple nodes in your cluster. To do this, you must configure a resource as a clone. Examples of resources that might be configured as clones include STONITH and cluster file systems like OCFS2. You can clone any resource, provided this is supported by the resource’s resource agent. Clone resources may even be configured differently depending on which nodes they are hosted on.
There are three types of resource clones:
Anonymous clones: These are the simplest type of clones. They behave identically anywhere they are running. Because of this, there can only be one instance of an anonymous clone active per machine.
Globally unique clones: These resources are distinct entities. An instance of the clone running on one node is not equivalent to another instance on another node; nor would any two instances on the same node be equivalent.
Stateful clones: Active instances of these resources are divided into two states, active and passive. These are also sometimes referred to as primary and secondary, or master and slave. Stateful clones can be either anonymous or globally unique. See also Section 4.2.5.3, “Masters”.
Clones must contain exactly one group or one regular resource.
When configuring resource monitoring or constraints, clones have different requirements than simple resources. For details, see the section Clones - Resources That Get Active on Multiple Hosts in Pacemaker Explained.
Learn how to create clones with your preferred cluster management tool:
Pacemaker GUI: Procedure 6.15, “Adding or Modifying Clones”
crm shell: Section 7.3.10, “Configuring a Clone Resource”.
Masters are a specialization of clones that allow the instances to be
in one of two operating modes (master or
slave). Masters must contain exactly one group or
one regular resource.
When configuring resource monitoring or constraints, masters have different requirements than simple resources. For details, see the section Multi-state - Resources That Have Multiple Modes in Pacemaker Explained.
For each resource you add, you can define options. Options are used by the cluster to decide how your resource should behave—they tell the CRM how to treat a specific resource. Resource options can be set with the crm_resource --meta command or with the Pacemaker GUI as described in Procedure 6.3, “Adding or Modifying Meta and Instance Attributes”. Alternatively, use Hawk: Procedure 5.4, “Adding Primitive Resources”.
Table 4.1. Options for a Primitive Resource
| Option | Description | Default |
|---|---|---|
| | If not all resources can be active, the cluster will stop lower priority resources in order to keep higher priority ones active. | |
| | In what state should the cluster attempt to keep this resource? Allowed values: | |
| | Is the cluster allowed to start and stop the resource? Allowed values: | |
| resource-stickiness | How much does the resource prefer to stay where it is? Defaults to the value of | calculated |
| migration-threshold | How many failures should occur for this resource on a node before making the node ineligible to host this resource? | |
| | What should the cluster do if it ever finds the resource active on more than one node? Allowed values: | |
| failure-timeout | How many seconds to wait before acting as if the failure had not occurred (and potentially allowing the resource back to the node on which it failed)? | |
| | Allow resource migration for resources which support | |
The scripts of all resource classes can be given parameters which
determine how they behave and which instance of a service they control.
If your resource agent supports parameters, you can add them with the
crm_resource command or with the GUI as described in
Procedure 6.3, “Adding or Modifying Meta and Instance Attributes”. Alternatively, use
Hawk: Procedure 5.4, “Adding Primitive Resources”. In the
crm command line utility and in Hawk, instance
attributes are called params or
Parameter, respectively. The list of instance
attributes supported by an OCF script can be found by executing the
following command as root:
crm ra info [class:[provider:]]resource_agent
or (without the optional parts):
crm ra info resource_agent
The output lists all the supported attributes, their purpose and default values.
For example, the command
crm ra info IPaddr
returns the following output:
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
The IPv4 address to be configured in dotted quad notation, for example
"192.168.1.1".
nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought
online.
If left empty, the script will try and determine this from the
routing table.
Do NOT specify an alias interface in the form eth0:1 or anything here;
rather, specify the base interface only.
cidr_netmask (string): Netmask
The netmask for the interface in CIDR format. (ie, 24), or in
dotted quad notation 255.255.255.0).
If unspecified, the script will also try to determine this from the
routing table.
broadcast (string): Broadcast address
Broadcast address associated with the IP. If left empty, the script will
determine this from the netmask.
iflabel (string): Interface label
You can specify an additional label for your IP address here.
lvs_support (boolean, [false]): Enable support for LVS DR
Enable support for LVS Direct Routing configurations. In case a IP
address is stopped, only move it to the loopback device to allow the
local node to continue to service requests, but no longer advertise it
on the network.
local_stop_script (string):
Script called when the IP is released
local_start_script (string):
Script called when the IP is added
ARP_INTERVAL_MS (integer, [500]): milliseconds between gratuitous ARPs
milliseconds between ARPs
ARP_REPEAT (integer, [10]): repeat count
How many gratuitous ARPs to send out when bringing up a new address
ARP_BACKGROUND (boolean, [yes]): run in background
run in background (no longer any reason to do this)
ARP_NETMASK (string, [ffffffffffff]): netmask for ARP
netmask for ARP - in nonstandard hexadecimal format.
Operations' defaults (advisory minimum):
start timeout=90
stop timeout=100
monitor_0 interval=5s timeout=20s
| Instance Attributes for Groups, Clones or Masters |
|---|
| Note that groups, clones and masters do not have instance attributes. However, any instance attributes set will be inherited by the group's, clone's or master's children. |
By default, the cluster will not ensure that your resources are still healthy. To instruct the cluster to do this, you need to add a monitor operation to the resource’s definition. Monitor operations can be added for all classes of resource agents. For more information, refer to Section 4.3, “Resource Monitoring”.
Table 4.2. Resource Operation Properties
| Operation | Description |
|---|---|
| | Your name for the action. Must be unique. (The ID is not shown). |
| | The action to perform. Common values: |
| interval | How frequently to perform the operation. Unit: seconds |
| timeout | How long to wait before declaring the action has failed. |
| | What conditions need to be satisfied before this action occurs. Allowed values: |
| | The action to take if this action ever fails. Allowed values: |
| | If |
| role | Run the operation only if the resource has this role. |
| | Can be set either globally or for individual resources. Makes the CIB reflect the state of “in-flight” operations on resources. |
| | Description of the operation. |
Timeout values for resources can be influenced by the following parameters:
default-action-timeout (global cluster option),
op_defaults (global defaults for operations),
a specific timeout value defined in a resource template,
a specific timeout value defined for a resource.
Of the default values, op_defaults takes precedence
over default-action-timeout. If a specific value is
defined for a resource, it always takes precedence over any of the
defaults (and over a value defined in a resource template).
For information on how to set the default parameters, refer to the technical information document default action timeout and default op timeout. It is available at http://www.suse.com/support/kb/doc.php?id=7009584. You can also adjust the default parameters with Hawk as described in Procedure 5.2, “Modifying Global Cluster Options”.
Getting timeout values right is very important. Setting them too low will result in a lot of (unnecessary) fencing operations for the following reasons:
If a resource runs into a timeout, it fails and the cluster will try to stop it.
If stopping the resource also fails (for example because the timeout for stopping is set too low), the cluster will fence the node (it considers the node where this happens to be out of control).
The CRM executes an initial monitoring for each resource on every node,
the so-called probe, which is also executed after the
cleanup of a resource. If no specific timeout is configured for the
resource's monitoring operation, the CRM will automatically check for
any other monitoring operations. If multiple monitoring operations are
defined for a resource, the CRM will select the one with the smallest
interval and will use its timeout value as default timeout for probing.
If no monitor operation is configured at all, the cluster-wide default,
defined in op_defaults, applies. If you do not
want to rely on the automatic calculation or the
op_defaults values, define a specific timeout
for the probe by adding a monitoring operation to the respective
resource, with the interval set to
0, for example:
crm configure primitive rsc1 ocf:pacemaker:Dummy \
    op monitor interval="0" timeout="60"
The probe of rsc1 will time out in
60s, independent of the global timeout defined in
op_defaults, or any other operation timeouts
configured.
The best practice for setting timeout values is as follows:
Check how long it takes your resources to start and stop (under load).
Adjust the (default) timeout values accordingly:
For example, set the default-action-timeout to
120 seconds.
For resources that need longer periods of time, define individual timeout values.
When configuring operations for a resource, add separate
start and stop operations. When
configuring operations with Hawk or the Pacemaker GUI, both will provide
useful timeout proposals for those operations.
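The practice above might translate into crm shell commands like the following sketch. The resource ID slow-rsc and all timeout values are illustrative assumptions. Measure your own start and stop times under load before choosing values.

```shell
# Global default, as in the example above.
crm configure property default-action-timeout="120s"

# Operation defaults take precedence over default-action-timeout.
crm configure op_defaults timeout="120s"

# A resource that needs longer gets individual start and stop timeouts,
# which override both defaults.
crm configure primitive slow-rsc ocf:pacemaker:Dummy \
  op start interval="0" timeout="300s" \
  op stop interval="0" timeout="300s" \
  op monitor interval="30s" timeout="60s"
```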
If you want to ensure that a resource is running, you must configure resource monitoring for it.
If the resource monitor detects a failure, the following takes place:
Log file messages are generated, according to the configuration
specified in the logging section of
/etc/corosync/corosync.conf. By default, the logs
are written to syslog, usually /var/log/messages.
The failure is reflected in the cluster management tools (Pacemaker GUI, Hawk, crm_mon), and in the CIB status section.
The cluster initiates noticeable recovery actions which may include stopping the resource to repair the failed state and restarting the resource locally or on another node. The resource also may not be restarted at all, depending on the configuration and state of the cluster.
If you do not configure resource monitoring, resource failures after a successful start will not be communicated, and the cluster will always show the resource as healthy.
Usually, resources are only monitored by the cluster as long as they are running. However, to detect concurrency violations, also configure monitoring for resources which are stopped. For example:
primitive dummy1 ocf:heartbeat:Dummy \
    op monitor interval="300s" role="Stopped" timeout="10s" \
    op monitor interval="30s" timeout="10s"
This configuration triggers a monitoring operation every
300 seconds for the resource dummy1
as soon as it is in role="Stopped". When running, it
will be monitored every 30 seconds.
To avoid a node running out of disk space and thus being no longer able
to adequately manage any resources that have been assigned to it, the
High Availability Extension provides a resource agent,
ocf:pacemaker:SysInfo. Use it to monitor a
node's health with respect to disk partitions.
The SysInfo RA creates a node attribute named
#health_disk which will be set to
red if any of the monitored disks' free space is below
a specified limit.
To define how the CRM should react in case a node's health reaches a
critical state, use the global cluster option
node-health-strategy.
Procedure 4.1. Configuring System Health Monitoring
To automatically move resources away from a node in case the node runs out of disk space, proceed as follows:
Configure an ocf:pacemaker:SysInfo resource:
primitive sysinfo ocf:pacemaker:SysInfo \
    params disks="/tmp /var" min_disk_free="100M" disk_unit="M" \
    op monitor interval="15s"
disks: Which disk partitions to monitor. In the example above, /tmp and /var.
min_disk_free: The minimum free disk space required for those partitions. Optionally, you can specify the unit to use for measurement (in the example above, M).
disk_unit: The unit in which to report the disk space.
To complete the resource configuration, create a clone of
ocf:pacemaker:SysInfo and start it on each
cluster node.
Set the node-health-strategy to
migrate-on-red:
property node-health-strategy="migrate-on-red"
In case of a #health_disk attribute set to
red, the policy engine adds -INF
to the resources' score for that node. This will cause any resources to
move away from this node. The STONITH resource will be the last one
to be stopped but even if the STONITH resource is not running any
more, the node can still be fenced. Fencing has direct access to the
CIB and will continue to work.
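The whole procedure can be sketched as three crm shell commands run on a cluster node. The clone ID cl-sysinfo is a hypothetical name. The parameter values are the ones used in the procedure.

```shell
# Step 1: the SysInfo primitive that watches free disk space.
crm configure primitive sysinfo ocf:pacemaker:SysInfo \
  params disks="/tmp /var" min_disk_free="100M" disk_unit="M" \
  op monitor interval="15s"

# Step 2: clone it so that an instance runs on each cluster node.
crm configure clone cl-sysinfo sysinfo

# Step 3: move resources away from nodes whose health turns red.
crm configure property node-health-strategy="migrate-on-red"
```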
After a node's health status has turned to red, solve
the issue that led to the problem. Then clear the red
status to make the node eligible again for running resources. Log in to
the cluster node and use one of the following methods:
Execute the following command:
crm node status-attr NODE delete #health_disk
Restart OpenAIS on that node.
Reboot the node.
The node will be returned to service and can run resources again.
Having all the resources configured is only part of the job. Even if the cluster knows all needed resources, it might still not be able to handle them correctly. Resource constraints let you specify which cluster nodes resources can run on, what order resources will load, and what other resources a specific resource is dependent on.
There are three different kinds of constraints available:
Locational constraints that define on which nodes a resource may be run, may not be run or is preferred to be run.
Colocational constraints that tell the cluster which resources may or may not run together on a node.
Ordering constraints to define the sequence of actions.
For more information on configuring constraints and detailed background information about the basic concepts of ordering and colocation, refer to the following documents:
Pacemaker Explained, chapter Resource Constraints
Colocation Explained
Ordering Explained
When defining constraints, you also need to deal with scores. Scores of all kinds are integral to how the cluster works. Practically everything from migrating a resource to deciding which resource to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated on a per-resource basis and any node with a negative score for a resource cannot run that resource. After calculating the scores for a resource, the cluster then chooses the node with the highest score.
INFINITY is currently defined as
1,000,000. Additions or subtractions with it stick to
the following three basic rules:
Any value + INFINITY = INFINITY
Any value - INFINITY = -INFINITY
INFINITY - INFINITY = -INFINITY
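The three rules can be expressed as a small bash function. This is an illustrative sketch, not the cluster's actual implementation. The function name score_add is hypothetical, and clamping finite sums to the INFINITY range is an assumption made here for completeness.

```shell
INFINITY=1000000

# Add two scores according to the three rules above.
score_add() {
  local a=$1 b=$2
  # Rules 2 and 3: -INFINITY wins over everything, including +INFINITY.
  if [ "$a" -le "-$INFINITY" ] || [ "$b" -le "-$INFINITY" ]; then
    echo "-$INFINITY"
    return
  fi
  # Rule 1: any other value plus INFINITY stays INFINITY.
  if [ "$a" -ge "$INFINITY" ] || [ "$b" -ge "$INFINITY" ]; then
    echo "$INFINITY"
    return
  fi
  # Finite values add normally; the clamp is an assumption of this sketch.
  local sum=$(( a + b ))
  if [ "$sum" -gt "$INFINITY" ]; then sum=$INFINITY; fi
  if [ "$sum" -lt "-$INFINITY" ]; then sum="-$INFINITY"; fi
  echo "$sum"
}

score_add 500 "$INFINITY"            # prints 1000000
score_add 500 "-$INFINITY"           # prints -1000000
score_add "$INFINITY" "-$INFINITY"   # prints -1000000
```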
When defining resource constraints, you specify a score for each constraint. The score indicates the value you are assigning to this resource constraint. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify an order for the nodes that a resource will fail over to.
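As a sketch of such a failover order, two location constraints with different scores could be defined in the crm shell as follows. The resource r1, the node names and the constraint IDs are hypothetical and are assumed to exist in the cluster.

```shell
# r1 prefers node1 (higher score) and falls back to node2.
crm configure location loc-r1-node1 r1 200: node1
crm configure location loc-r1-node2 r1 100: node2
```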
If you have defined a resource template, it can be referenced in the following types of constraints:
order constraints,
colocation constraints,
rsc_ticket constraints (for multi-site clusters).
However, colocation constraints must not contain more than one reference to a template. Resource sets must not contain a reference to a template.
Resource templates referenced in constraints stand for all primitives which are derived from that template. This means, the constraint applies to all primitive resources referencing the resource template. Referencing resource templates in constraints is an alternative to resource sets and can simplify the cluster configuration considerably. For details about resource sets, refer to Procedure 5.10, “Using Resource Sets for Colocation or Order Constraints”.
A resource will be automatically restarted if it fails. If that cannot
be achieved on the current node, or it fails N times
on the current node, it will try to fail over to another node. Each time
the resource fails, its failcount is raised. You can define a number of
failures for resources (a migration-threshold), after
which they will migrate to a new node. If you have more than two nodes
in your cluster, the node a particular resource fails over to is chosen
by the High Availability software.
However, you can specify the node a resource will fail over to by
configuring one or several location constraints and a
migration-threshold for that resource.
Example 4.2. Migration Threshold - Process Flow
For example, let us assume you have configured a location constraint
for resource r1 to preferably run on
node1. If it fails there,
migration-threshold is checked and compared to the
failcount. If failcount >= migration-threshold then the resource is
migrated to the node with the next best preference.
By default, once the threshold has been reached, the node will no
longer be allowed to run the failed resource until the resource's
failcount is reset. This can be done manually by the cluster
administrator or by setting a failure-timeout option
for the resource.
For example, a setting of migration-threshold=2 and
failure-timeout=60s would cause the resource to
migrate to a new node after two failures and potentially allow it to
move back (depending on the stickiness and constraint scores) after one
minute.
There are two exceptions to the migration threshold concept, occurring when a resource either fails to start or fails to stop:
Start failures set the failcount to INFINITY and
thus always cause an immediate migration.
Stop failures cause fencing (when stonith-enabled
is set to true which is the default).
In case there is no STONITH resource defined (or
stonith-enabled is set to
false), the resource will not migrate at all.
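The settings from the example above (migration-threshold=2 and failure-timeout=60s) could be applied with the crm shell as in this sketch. The resource ID r1, the node name node1 and the monitor values are assumptions for illustration.

```shell
# Migrate after two failures; expire the failures after 60 seconds.
crm configure primitive r1 ocf:pacemaker:Dummy \
  meta migration-threshold="2" failure-timeout="60s" \
  op monitor interval="10s" timeout="20s"

# Show the current failcount of r1 on node1.
crm resource failcount r1 show node1

# Reset the failcount manually by cleaning up the resource.
crm resource cleanup r1
```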
A resource might fail back to its original node when that node is back
online and in the cluster. If you want to prevent a resource from
failing back to the node it was running on prior to failover, or if you
want to specify a different node for the resource to fail back to, you
must change its resource stickiness value. You can
either specify resource stickiness when you are creating a resource, or
afterwards.
Consider the following implications when specifying resource stickiness values:
0: This is the default. The resource will be placed optimally in the system. This may mean that it is moved when a “better” or less loaded node becomes available. This option is almost equivalent to automatic failback, except that the resource may be moved to a node that is not the one it was previously active on.
Greater than 0: The resource will prefer to remain in its current location, but may be moved if a more suitable node is available. Higher values indicate a stronger preference for a resource to stay where it is.
Less than 0: The resource prefers to move away from its current location. Higher absolute values indicate a stronger preference for a resource to be moved.
INFINITY:
The resource will always remain in its current location unless forced
off because the node is no longer eligible to run the resource (node
shutdown, node standby, reaching the
migration-threshold, or configuration change).
This option is almost equivalent to completely disabling automatic
failback.
-INFINITY: The resource will always move away from its current location.
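The following crm shell sketch shows two places where a stickiness value can be set: as a cluster-wide resource default and as a meta attribute of a single resource. The value 100, the resource ID web-server, and its agent parameters are hypothetical:

```
# Cluster-wide default: resources prefer to stay on their current node
rsc_defaults resource-stickiness="100"

# Per-resource override for a hypothetical resource: stay in place unless
# the node is no longer eligible to run it
primitive web-server ocf:heartbeat:apache \
    params configfile="/etc/apache2/httpd.conf" \
    op monitor interval="30s" timeout="20s" \
    meta resource-stickiness="INFINITY"
```

Whether a positive value is enough to keep a resource in place depends on how it compares with the location constraint scores that pull the resource elsewhere.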
Not all resources are equal. Some, such as Xen guests, require that the node hosting them meets their capacity requirements. If resources are placed such that their combined needs exceed the provided capacity, the resources diminish in performance (or even fail).
To take this into account, the High Availability Extension allows you to specify the following parameters:
The capacity a certain node provides.
The capacity a certain resource requires.
An overall strategy for placement of resources.
To learn how to configure these settings, refer to the chapter that covers your preferred cluster management tool.
A node is considered eligible for a resource if it has sufficient free capacity to satisfy the resource's requirements. The nature of the required or provided capacities is completely irrelevant to the High Availability Extension. It only makes sure that all capacity requirements of a resource are satisfied before moving the resource to a node.
To manually configure the resource's requirements and the capacity a node provides, use utilization attributes. You can name the utilization attributes according to your preferences and define as many name/value pairs as your configuration needs. However, the attribute's values must be integers.
If multiple resources with utilization attributes are grouped or have colocation constraints, the High Availability Extension takes that into account. If possible, the resources will be placed on a node that can fulfill all capacity requirements.
Note: Utilization Attributes for Groups
It is impossible to set utilization attributes directly for a resource group. However, to simplify the configuration for a group, you can add a utilization attribute with the total capacity needed to any of the resources in the group.
The High Availability Extension also provides means to detect and configure both node capacity and resource requirements automatically:
The NodeUtilization resource agent checks the
capacity of a node (regarding CPU and RAM).
To configure automatic detection, create a clone resource of the
following class, provider, and type:
ocf:pacemaker:NodeUtilization. One instance of the
clone should be running on each node. After the instance has started, a
utilization section will be added to the node's configuration in the CIB.
For automatic detection of a resource's minimal requirements (regarding
RAM and CPU) the Xen resource agent has been
improved. Upon start of a Xen resource, it will
reflect the consumption of RAM and CPU. Utilization attributes will
automatically be added to the resource configuration.
Apart from detecting the minimal requirements, the High Availability Extension also allows you to
monitor the current utilization via the
VirtualDomain resource agent. It detects CPU
and RAM use of the virtual machine. To use this feature, configure a
resource of the following class, provider, and type:
ocf:heartbeat:VirtualDomain. Add the
dynamic_utilization instance attribute (parameter)
and set its value to 1. This updates the utilization
values in the CIB during each monitoring cycle.
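The NodeUtilization clone and the VirtualDomain setting described above might be configured as in the following crm shell sketch. The resource IDs (node-util, cl-node-util, vm1), the hypervisor URI, the configuration file path, and all operation intervals and timeouts are hypothetical; dynamic_utilization is used exactly as named in the text above and should be checked against the agent's metadata on your system:

```
# One NodeUtilization instance per node, created as a clone (IDs are hypothetical)
primitive node-util ocf:pacemaker:NodeUtilization \
    op monitor interval="60s" timeout="20s"
clone cl-node-util node-util

# Hypothetical VM whose utilization values are refreshed on each monitor run
primitive vm1 ocf:heartbeat:VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
        dynamic_utilization="1" \
    op monitor interval="10" timeout="30"
```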
Regardless of whether you configure capacity and requirements manually
or automatically, the placement strategy must be specified with the
placement-strategy property (in the global cluster
options). The following values are available:
default (default value)
Utilization values are not considered at all. Resources are allocated according to location scoring. If scores are equal, resources are evenly distributed across nodes.
utilization
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. However, load-balancing is still done based on the number of resources allocated to a node.
minimal
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. An attempt is made to concentrate the resources on as few nodes as possible (in order to achieve power savings on the remaining nodes).
balanced
Utilization values are considered when deciding if a node has enough free capacity to satisfy a resource's requirements. An attempt is made to distribute the resources evenly, thus optimizing resource performance.
Note: Configuring Resource Priorities
The available placement strategies are best-effort: they do not yet use complex heuristic solvers to always reach optimum allocation results. Thus, set your resource priorities in a way that makes sure that your most important resources are scheduled first.
Example 4.3. Example Configuration for Load-Balanced Placing
The following example demonstrates a three-node cluster of equal nodes, with four virtual machines.
node node1 utilization memory="4000"
node node2 utilization memory="4000"
node node3 utilization memory="4000"
primitive xenA ocf:heartbeat:Xen utilization memory="3500" \
    meta priority="10"
primitive xenB ocf:heartbeat:Xen utilization memory="2000" \
    meta priority="1"
primitive xenC ocf:heartbeat:Xen utilization memory="2000" \
    meta priority="1"
primitive xenD ocf:heartbeat:Xen utilization memory="1000" \
    meta priority="5"
property placement-strategy="minimal"
With all three nodes up, resource xenA will be
placed onto a node first, followed by xenD.
xenB and xenC would either be
allocated together or one of them with xenD.
If one node failed, too little total memory would be available to host
them all. xenA would be guaranteed a place, as
would xenD. However, only one of the remaining
resources xenB or xenC could
still be placed. Since their priority is equal, it is undetermined
which of the two would be chosen. To resolve this ambiguity as well,
you would need to set a higher priority for either one.
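For instance, the tie could be broken in favor of xenB by raising its priority above that of xenC while keeping it below xenD. The value 2 is an arbitrary illustration; any value greater than 1 and lower than 5 would have the same effect on the ordering:

```
# xenB is now scheduled before xenC (priority 1), but still after xenD (priority 5)
primitive xenB ocf:heartbeat:Xen utilization memory="2000" \
    meta priority="2"
```

With this change, if one node failed, xenA, xenD, and xenB would be placed and xenC would remain stopped.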
Monitoring of virtual machines can be done with the VM agent (which only checks if the guest shows up in the hypervisor), or by external scripts called from the VirtualDomain or Xen agent. Up to now, more fine-grained monitoring was only possible with a full setup of the High Availability stack within the virtual machines.
By providing support for Nagios plug-ins, the High Availability Extension now also allows you to monitor services on remote hosts. You can collect external statuses on the guests without modifying the guest image. For example, VM guests might run Web services or simple network resources that need to be accessible. With the Nagios resource agents, you can now monitor the Web service or the network resource on the guest. In case these services are not reachable anymore, the High Availability Extension will trigger a restart or migration of the respective guest.
If your guests depend on a service (for example, an NFS server to be used by the guest), the service can either be an ordinary resource managed by the cluster or an external service that is not managed by the cluster but monitored with Nagios resources instead.
To configure the Nagios resources, the following packages must be installed on the host:
nagios-plugins
nagios-plugins-metadata
YaST or zypper will resolve any dependencies on further packages, if required.
A typical use case is to configure the Nagios plug-ins as resources belonging to a resource container, which usually is a VM. The container will be restarted if any of its resources has failed. Refer to Example 4.4, “Configuring Resources for Nagios Plug-ins” for a configuration example. Alternatively, Nagios resource agents can also be configured as ordinary resources if you want to use them for monitoring hosts or services via the network.
Example 4.4. Configuring Resources for Nagios Plug-ins
primitive vm1 ocf:heartbeat:VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op monitor interval="10" timeout="30"
primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \
    op start interval="0" timeout="120" \
    op monitor interval="10"
group vm1-and-services vm1 vm1-sshd \
    meta container="vm1"
The supported parameters are the same as the long options of a Nagios
plug-in. Nagios plug-ins connect to services with the parameter
As it takes some time to get the guest operating system up and its services running, the start timeout of the Nagios resource must be long enough.
A cluster resource container of type
The example above contains only one Nagios resource for the
check_tcp plug-in, but multiple Nagios resources for
different plug-in types can be configured (for example,
check_http or check_udp).
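As a sketch, a second Nagios resource for the check_http plug-in could look as follows. The resource ID vm1-httpd and the port value are assumptions; hostname and port correspond to the plug-in's long options --hostname and --port, and the timeouts mirror the check_tcp resource in Example 4.4:

```
# Hypothetical resource that checks a Web server running inside the guest vm1
primitive vm1-httpd nagios:check_http \
    params hostname="vm1" port="80" \
    op start interval="0" timeout="120" \
    op monitor interval="10"
```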
If the hostnames of the services are the same, the
hostname parameter can also be specified for the
group, instead of adding it to the individual primitives. For example:
group vm1-and-services vm1 vm1-sshd vm1-httpd \
    meta container="vm1" \
    params hostname="vm1"
If any of the services monitored by the Nagios plug-ins fail within the
VM, the cluster will detect that and restart the container resource
(the VM). Which action to take in this case can be configured by
specifying the on-fail attribute for the service's
monitoring operation. It defaults to
restart-container.
Failure counts of services will be taken into account when considering the VM's migration-threshold.
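The default behavior can also be written out explicitly on the monitoring operation, as in this sketch based on the vm1-sshd resource from Example 4.4. Only the on-fail attribute is added here; to choose a different reaction, replace the value with another on-fail value supported by your Pacemaker version:

```
# Same resource as in Example 4.4, with the default on-fail action made explicit
primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \
    op start interval="0" timeout="120" \
    op monitor interval="10" on-fail="restart-container"
```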
Home page of Pacemaker, the cluster resource manager shipped with the High Availability Extension.
Home page of the High Availability Linux Project.
Holds a number of comprehensive manuals, for example:
Pacemaker Explained: Explains the concepts used to configure Pacemaker. Contains comprehensive and very detailed information for reference.
Fencing and Stonith: How to configure and use STONITH devices.
CRM Command Line Interface: Introduction to the crm command line tool.
Features some more useful documentation, such as:
Colocation Explained
Ordering Explained