The most important software modifications from version to version are outlined in the following sections. This summary indicates, for example, whether basic settings have been completely reconfigured, configuration files have been moved to other places, or other significant changes happened.
For more details and the most recent information, refer to the release
notes of the respective product version. They are available in the
installed system at /usr/share/doc/release-notes, or
online at https://www.suse.com/releasenotes/.
With SUSE Linux Enterprise Server 11, the cluster stack has changed from Heartbeat to OpenAIS. OpenAIS implements an industry standard API, the Application Interface Specification (AIS), published by the Service Availability Forum. The cluster resource manager from SUSE Linux Enterprise Server 10 has been retained but has been significantly enhanced, ported to OpenAIS and is now known as Pacemaker.
For more details on what changed in the High Availability components from SUSE® Linux Enterprise Server 10 SP3 to SUSE Linux Enterprise Server 11, refer to the following sections.
The High Availability Extension now comes with the concept of a migration threshold and
failure timeout. You can define a number of failures for resources,
after which they will migrate to a new node. By default, the node
will no longer be allowed to run the failed resource until the
administrator manually resets the resource’s failcount. However, it
is also possible to let failures expire by setting the resource’s
failure-timeout option.
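The interaction between the migration threshold and the failure timeout can be sketched as a small model. This is an illustrative simplification, not Pacemaker's implementation: the class name `FailureTracker`, its methods, and the use of plain seconds for timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FailureTracker:
    """Toy model of per-node failure counting for one resource (hypothetical)."""
    migration_threshold: int                 # failures after which the node is banned
    failure_timeout: Optional[float] = None  # seconds; None means failures never expire
    failcount: Dict[str, int] = field(default_factory=dict)
    last_failure: Dict[str, float] = field(default_factory=dict)

    def record_failure(self, node: str, now: float) -> None:
        """Count one failure of the resource on the given node."""
        self._expire(node, now)
        self.failcount[node] = self.failcount.get(node, 0) + 1
        self.last_failure[node] = now

    def may_run_on(self, node: str, now: float) -> bool:
        """The node may run the resource while its failcount is below the threshold."""
        self._expire(node, now)
        return self.failcount.get(node, 0) < self.migration_threshold

    def reset(self, node: str) -> None:
        """Models the administrator manually resetting the failcount."""
        self.failcount.pop(node, None)
        self.last_failure.pop(node, None)

    def _expire(self, node: str, now: float) -> None:
        """Forget failures once failure_timeout has passed since the last one."""
        if self.failure_timeout is None or node not in self.last_failure:
            return
        if now - self.last_failure[node] >= self.failure_timeout:
            self.reset(node)
```

With a threshold of 2 and no failure timeout, a node that has seen two failures stays excluded until `reset()` is called. With a failure timeout set, the same node becomes eligible again once the timeout has elapsed since the last failure.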
You can now set global defaults for resource options and operations.
Often it is desirable to preview the effects of a series of changes before updating the configuration atomically. You can now create a “shadow” copy of the configuration that can be edited with the command line interface, before committing it and thus changing the active cluster configuration atomically.
Rules, instance_attributes, meta_attributes and sets of operations can be defined once and referenced in multiple places.
The CIB now accepts XPath-based create,
modify, delete operations. For
more information, refer to the cibadmin help text.
For creating a set of collocated resources, previously you could
either define a resource group (which could not always accurately
express the design) or you could define each relationship as an
individual constraint—causing a constraint explosion as the
number of resources and combinations grew. Now you can also use an
alternate form of colocation constraints by defining
resource_sets.
Provided Pacemaker is installed on a machine, it is possible to connect to the cluster even if the machine itself is not a part of it.
By default, recurring actions are scheduled relative to when the resource started, but this is not always desirable. To specify a date/time that the operation should be relative to, set the operation’s interval-origin. The cluster uses this point to calculate the correct start-delay such that the operation will occur at origin + (interval * N).
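The formula origin + (interval * N) can be illustrated with a short calculation. The following is a sketch only: the function names `next_occurrence` and `start_delay` are hypothetical helpers for this example and are not part of any cluster tool.

```python
from datetime import datetime, timedelta


def next_occurrence(origin: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the earliest time origin + interval * N that is not before `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Ceiling division using exact integer arithmetic on timedeltas.
    n = -((origin - now) // interval)
    return origin + interval * n


def start_delay(origin: datetime, interval: timedelta, now: datetime) -> timedelta:
    """Delay needed from `now` so that the operation lands on the origin-based grid."""
    return next_occurrence(origin, interval, now) - now
```

For example, with an origin of Monday 02:00 and an interval of seven days, an operation scheduled on a Wednesday at noon would be delayed until 02:00 on the following Monday, regardless of when the resource was started.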
All resource and cluster options now use dashes (-) instead of
underscores (_). For example, the master_max meta
option has been renamed to master-max.
master_slave Resource
The master_slave resource has been renamed to
master. Master resources are a special type of
clone that can operate in one of two modes.
The attributes container tag has been removed.
The pre-req operation field has been renamed to
requires.
All operations must have an interval. For start/stop actions the
interval must be set to 0 (zero).
The attributes of colocation and ordering constraints were renamed for clarity.
The resource-failure-stickiness cluster option has
been replaced by the migration-threshold cluster
option. See also Migration Threshold and Failure Timeouts.
The arguments for command-line tools have been made consistent. See also Naming Conventions for Resource and Cluster Options.
The cluster configuration is written in XML. Instead of a Document
Type Definition (DTD), now a more powerful RELAX NG schema is
used to define the pattern for the structure and content.
libxml2 is used as the parser.
id Fields
id fields are now XML IDs which have the following
limitations:
IDs cannot contain colons.
IDs cannot begin with a number.
IDs must be globally unique (not just unique for that tag).
Some fields (such as those in constraints that refer to resources) are IDREFs. This means that they must reference existing resources or objects in order for the configuration to be valid. Removing an object which is referenced elsewhere will therefore fail.
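The ID and IDREF rules listed above can be expressed as a small consistency check. This is a minimal sketch that covers only the three limitations and the reference rule stated here, not the full XML ID grammar or the RELAX NG schema. The function `check_ids` and its input shapes are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def check_ids(objects: List[Tuple[str, str]],
              references: List[Tuple[str, str]]) -> List[str]:
    """Check (tag, id) pairs and (referrer, referenced_id) pairs.

    Returns a list of human-readable problems; an empty list means the
    input satisfies the rules modeled here.
    """
    errors: List[str] = []
    seen: Dict[str, str] = {}  # id -> tag that first used it
    for tag, obj_id in objects:
        if ":" in obj_id:
            errors.append(f"{obj_id}: IDs cannot contain colons")
        if obj_id[:1].isdigit():
            errors.append(f"{obj_id}: IDs cannot begin with a number")
        if obj_id in seen:
            # Uniqueness is global, not per tag.
            errors.append(f"{obj_id}: already used by <{seen[obj_id]}>")
        else:
            seen[obj_id] = tag
    for referrer, target in references:
        if target not in seen:
            errors.append(f"{referrer}: references unknown ID {target}")
    return errors
```

Running the check again after removing an object from `objects` shows why deleting a referenced object fails: every constraint that still points to it is reported as a dangling reference.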
It is no longer possible to set resource meta-options as top-level attributes. Use meta attributes instead. See also the crm_resource man page.
Resource and operation defaults are no longer read from
crm_config.
The main cluster configuration file has changed from
/etc/ais/openais.conf to
/etc/corosync/corosync.conf. Both files are very
similar. When upgrading from SUSE Linux Enterprise High Availability Extension 11 to SP1, a script takes
care of the minor differences between those files.
In order to migrate existing clusters with minimal downtime, SUSE Linux Enterprise High Availability Extension allows you to perform a “rolling upgrade” from SUSE Linux Enterprise High Availability Extension 11 to 11 SP1. The cluster is still online while you upgrade one node after the other.
For easier cluster deployment, AutoYaST allows you to clone existing nodes. AutoYaST is a system for installing one or more SUSE Linux Enterprise systems automatically and without user intervention, using an AutoYaST profile that contains installation and configuration data. The profile tells AutoYaST what to install and how to configure the installed system to get a completely ready-to-use system in the end. This profile can be used for mass deployment in different ways.
SUSE Linux Enterprise High Availability Extension ships with Csync2, a tool for replication of configuration files across all nodes in the cluster. It can handle any number of hosts and it is also possible to synchronize files among certain subgroups of hosts only. Use YaST to configure the host names and the files that should be synchronized with Csync2.
The High Availability Extension now also includes the HA Web Konsole (Hawk), a Web-based user interface for management tasks. It allows you to monitor and administer your Linux cluster from non-Linux machines as well. It is also an ideal solution if your system does not provide or allow a graphical user interface.
When using the command line interface to create and configure resources, you can now choose from various resource templates for quicker and easier configuration.
By defining the capacity a certain node provides, the capacity a certain resource requires and by choosing one of several placement strategies in the cluster, resources can be placed according to their load impact to prevent decrease of cluster performance.
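The idea of matching resource requirements against node capacities can be sketched as follows. This is a simplified greedy illustration, not one of the cluster's actual placement strategies. The attribute names (`memory`, `cpu`), the node names, and the rule that an undefined capacity counts as zero are assumptions for this example.

```python
from typing import Dict, Optional

Capacity = Dict[str, int]


def fits(free: Capacity, required: Capacity) -> bool:
    """A resource fits if every required attribute is covered by free capacity."""
    return all(free.get(name, 0) >= amount for name, amount in required.items())


def place(resources: Dict[str, Capacity],
          nodes: Dict[str, Capacity]) -> Dict[str, Optional[str]]:
    """Assign each resource to the first node with enough remaining capacity."""
    free = {node: dict(capacity) for node, capacity in nodes.items()}
    placement: Dict[str, Optional[str]] = {}
    for resource, required in resources.items():
        placement[resource] = None  # stays None if no node can host it
        for node, remaining in free.items():
            if fits(remaining, required):
                # Reserve the capacity so later resources see the reduced amount.
                for name, amount in required.items():
                    remaining[name] = remaining.get(name, 0) - amount
                placement[resource] = node
                break
    return placement
```

A resource whose requirements exceed the free capacity of every node is left unplaced in this model, which reflects the goal of avoiding overloaded nodes.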
It is now possible to create disaster-resilient storage configurations
from two independent SANs, using
cmirrord.
For easier migration from GFS2 to OCFS2, you can mount your GFS2 file systems in read-only mode to copy the data to an OCFS2 file system. OCFS2 is fully supported by SUSE Linux Enterprise High Availability Extension.
If redundant rings are configured, OCFS2 and DLM can automatically use redundant communication paths via SCTP, independent of network device bonding.
For additional layers of security in protecting your storage from data
corruption, you can use a combination of IO fencing (with the
external/sbd fencing device) and the
sfex resource agent
to ensure exclusive storage access.
The High Availability Extension now supports CTDB, the cluster implementation of the trivial database. This allows you to configure a clustered Samba server, which provides a High Availability solution for heterogeneous environments as well.
The new module allows configuration of kernel-based load balancing
with a graphical user interface. It is a front-end for
ldirectord, a
user-space daemon for managing Linux Virtual Server and monitoring the real servers.
Apart from local clusters and metro area clusters, SUSE® Linux Enterprise High Availability Extension
11 SP4 also supports multi-site clusters. That means you can
have multiple, geographically dispersed sites with a local cluster
each. Failover between these clusters is coordinated by a higher level
entity, the so-called booth. Support for multi-site
clusters is available as a separate option to SUSE Linux Enterprise High Availability Extension.
For defining fine-grained access rights to any part of the cluster configuration, ACLs are supported. If this feature is enabled in the CRM, the available functions in the cluster management tools depend on the role and access rights assigned to a user.
For quick and easy cluster setup, use the bootstrap scripts
sleha-init and sleha-join to
get a one-node cluster up and running in a few minutes and to make
other nodes join, respectively. Any options set during the bootstrap
process can be modified later with the YaST cluster module.
While multicast is still default, using unicast for the communication between nodes is now also supported. For more information, refer to Section 3.5.2, “Defining the Communication Channels”.
Hawk's functionality has been considerably extended. Now you can
configure global cluster properties, basic and advanced types of
resources, constraints and resource monitoring. For detailed analysis
of the cluster status, Hawk generates a cluster report
(hb_report). View the cluster history or explore
potential failure scenarios with the simulator. For details, refer to
Chapter 5, Configuring and Managing Cluster Resources (Web Interface).
To ease configuration of similar resources, all cluster management tools now let you define resource templates that can be referenced in primitives or certain types of constraints.
For placing resources based on load impact, the High Availability Extension now offers automatic detection of both the capacity of a node and the capacities a resource requires. The minimal requirements of a virtual machine (for example, the memory assigned to a Xen or KVM guest or the number of CPU cores) can be detected by a resource agent. Utilization attributes (used to define the requirements or capacity) will automatically be added to the CIB. For more information, refer to Section 4.4.6, “Placing Resources Based on Their Load Impact”.
To protect a node's network connection from being overloaded by a
large number of parallel Xen or KVM live migrations, a new global
cluster property has been introduced:
migration-limit. It allows you to limit the number
of migration jobs that the TE may execute in parallel on a node. By
default, it is set to -1, which means the number of
parallel migrations is unlimited.
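The meaning of the migration-limit value can be summarized in a few lines. This is a sketch of the documented semantics only (a limit per node, with -1 meaning unlimited). The function name and the idea of passing in a count of running jobs are assumptions for this example.

```python
def may_start_migration(running_on_node: int, migration_limit: int = -1) -> bool:
    """Decide whether one more migration job may start on a node.

    migration_limit = -1 (the default) means no limit is applied.
    """
    if migration_limit == -1:
        return True
    return running_on_node < migration_limit
```

With a limit of 2, a third parallel migration on the same node would have to wait until one of the two running jobs has finished.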
To synchronize the connection status between cluster nodes, the High Availability Extension
uses the
conntrack-tools.
They allow interaction with the in-kernel Connection Tracking System
for enabling stateful packet inspection for
iptables. For more information, refer to
Section 3.5.5, “Synchronizing Connection Status Between Cluster Nodes”.
To execute commands on all cluster nodes without having to log in to each node, use pssh. For more information, refer to Section 20.5, “Miscellaneous”.
To set passwords for STONITH or other resources independent of
cib.xml, use crm resource
secret. For more information, refer to
Section 7.5, “Setting Passwords Independent of cib.xml”.
The CTDB functionality to join Active Directory Domains has been improved. For more information, refer to Section 18.3, “Joining an Active Directory Domain”.
Rear (Relax and Recover) is an administrator tool-set for creating disaster recovery images. The disaster recovery information can either be stored via the network or locally on hard disks, USB devices, DVD/CD-R, tape or similar. The backup data is stored on a network file system (NFS).
To use quotas on OCFS2 file systems, create and mount the file system
with the appropriate quota features or mount options, respectively:
usrquota (quota for individual users) or
grpquota (quota for groups).
Hawk's functionality has again been extended. For example, you can monitor multiple clusters with Hawk's new . Hawk's simulator now also allows you to change the status of nodes, add or edit resources and constraints, or change the cluster configuration for a simulator run. Apart from this, many other details in the HA Web Konsole have been enhanced.
The X11-based Pacemaker GUI is now in maintenance mode and is not scheduled to receive new functionality during the SUSE Linux Enterprise High Availability Extension 11 lifecycle. For SUSE Linux Enterprise High Availability Extension 12, the HA Web Konsole (Hawk) will become the default graphical user interface for the High Availability Extension.
The sleha-remove bootstrap script now makes it easier to remove single nodes from a cluster.
Sometimes it is necessary to put single nodes into maintenance mode. If your cluster consists of more than 3 nodes, you can easily set one node to maintenance mode, while the other nodes continue their normal operation.
The group command in the crm shell has been extended to allow modification of groups: it is possible to add resources to a group, to delete resources from a group, and to change the order of group members.
If multiple resources with utilization attributes are grouped or have colocation constraints, the High Availability Extension takes that into account. If possible, the resources will be placed on a node that can fulfill all capacity requirements. For details about utilization attributes, refer to Section 4.4.6, “Placing Resources Based on Their Load Impact”.
If no specific timeout is configured for the resource's monitoring
operation, the CRM will now automatically calculate a timeout for
probing. For details, refer to
Section 4.2.9, “Timeout Values”. Up to
now, the default timeout for probing had been inherited from the
cluster-wide default operation timeout if no
specific timeout was configured.
To avoid a node running out of disk space and thus being no longer
able to adequately manage any resources that have been assigned to it,
the High Availability Extension provides a resource agent,
ocf:pacemaker:SysInfo. Use it to monitor a
node's health with respect to disk partitions.
Both the crm shell and Hawk now offer the possibility to display a graphical representation of the nodes and the resources configured in the CIB. For details, refer to Section 7.1.7, “Cluster Diagram” and Section 5.1.2, “Main Screen: Cluster Status”.
Sometimes it is necessary to replace a bonding slave interface with another one, for example, if the respective network device constantly fails. The solution is to set up hotplugging of bonding slaves, which is now supported by YaST.
The High Availability Extension supports RAID 10 in cmirrord: it
is now possible to add multiple physical volumes per mirror leg. Also
the mirrored option of the
lvcreate command is supported, which means that
temporarily broken mirrors are much faster to resynchronize.
YaST now supports the CTDB functionality to join Active Directory domains.
Introduced a consistent naming scheme for cluster names, node names, cluster resources, and constraints and applied it to the documentation. See Appendix A, Naming Conventions (Fate#314938).
Improved the consistency of crm shell examples.
Removed chapter HA OCF Agents with the agent list. As a result, removed the part Troubleshooting and Reference, too, and moved the chapter Troubleshooting to the appendix.
Moved documentation for Geo Clustering for SUSE Linux Enterprise High Availability Extension into a separate document
(Fate#316120). For details on how to use and set up geographically dispersed
clusters, refer to the Quick Start Geo Clustering for SUSE Linux Enterprise High Availability Extension. It is available from
http://www.suse.com/documentation/ or in your installed system under
/usr/share/doc/manual/sle-ha-geo-manuals_en/.
Changed terminology for master-slave resources,
which are now called multi-state resources in the
upstream documentation.
Updated the screenshots.
Mentioned both hb_report and crm_report as command line tools for creating detailed cluster reports.
Numerous bug fixes and additions to the manual based on technical feedback.
Changed terminology from multi-site clusters to geographically dispersed (or Geo) clusters for consistency reasons.
Added section about availability of SUSE Linux Enterprise High Availability Extension and Geo Clustering for SUSE Linux Enterprise High Availability Extension as add-on products: Section 1.1, “Availability as Add-On/Extension”.
Restructured contents.
Mentioned how to create a cluster report when using a non-standard SSH port (Fate#314906). See SSH.
Added note No-Start-on-Boot Parameter (Fate#317778).
Section 4.1.3, “Option stonith-enabled” mentions policy
change in DLM services when the global cluster option
stonith-enabled is set to false
(Fate#315195).
Section 4.4.7, “Grouping Resources by Using Tags” describes a new option to group conceptually related resources, without creating any colocation or ordering relationship between them (Fate#318109).
Section 4.4.1.1, “Resource Sets” explains the concept of resource sets as an alternative format for defining constraints.
Restructured Section 4.7, “Maintenance Mode” to also cover the option of setting a whole cluster to maintenance mode.
Added attributes for pacemaker_remote service to
Table 4.1, “Options for a Primitive Resource” and added new section: Section 4.5, “Managing Services on Remote Hosts”.
Updated the chapter to reflect the new features that have been described in Chapter 4, Configuration and Administration Basics.
The description of all Hawk functions that are related to Geo clusters has been moved to a separate document. See the new Geo Clustering for SUSE Linux Enterprise High Availability Extension Quick Start, available from http://www.suse.com/documentation/.
Added Section 7.3.4.3, “Collocating Sets for Resources Without Dependency” (Fate#314917).
Added a section about tagging resources (Fate#318109): Section 7.4.5, “Grouping/Tagging Resources”.
Updated the chapter to reflect the new features that have been described in Chapter 4, Configuration and Administration Basics.
Removed cloning of STONITH resources from examples as this is no longer needed.
Removed some STONITH devices that are for testing purposes only.
Removed external/kdumpcheck resource agent and
added example configuration for the fence_kdump
resource agent instead.
Updated chapter according to the new ACL features that become available after upgrading the CIB validation version. For details, see Upgrading the CIB Syntax Version.
If you have upgraded from SUSE Linux Enterprise High Availability Extension 11 SP3 and kept your former CIB version, refer to the Access Control List chapter in the High Availability Guide for SUSE Linux Enterprise High Availability Extension 11 SP3. It is available from http://www.suse.com/documentation/.
Section 17.1.3.2, “Setting Up the Software Watchdog”: Added note about watchdog and other software that accesses the watchdog timer (https://bugzilla.suse.com/show_bug.cgi?id=891340).
Section 17.1.3.2, “Setting Up the Software Watchdog”: Added how to load the watchdog driver at boot time (https://bugzilla.suse.com/show_bug.cgi?id=892344).
Section 17.1.3.5, “Configuring the Fencing Resource”: Added advice
about length of stonith-timeout in relation to
msgwait timeout (https://bugzilla.suse.com/show_bug.cgi?id=891346).
Adjusted Section 17.1.3.3, “Starting the SBD Daemon” (https://bugzilla.suse.com/show_bug.cgi?id=891499).
Described how to avoid double fencing in clusters with
no-quorum-policy=ignore by using the
pcmk_delay_max parameter for the STONITH
resource configuration (Fate#31713).
The chapter has been completely revised and updated to Rear
version 1.16.
SUSE Linux Enterprise High Availability Extension 11 SP4 ships two Rear versions in parallel:
version 1.10.0 (included in RPM package: rear) and version 1.16
(included in RPM package rear116). For the documentation of Rear version
1.10.0, see the High Availability Guide for SUSE Linux Enterprise High Availability Extension 11
SP3. It is available from http://www.suse.com/documentation/.
Mentioned how to create a cluster report when using a non-standard SSH port (Fate#314906). See How can I create a report with an analysis of all my cluster nodes?.
This chapter has been removed. The latest information about the OCF resource agents can be viewed in the installed system as described in Section 7.1.3, “Displaying Information about OCF Resource Agents”.
New appendix explaining the naming scheme used in the documentation.