Abstract
Heartbeat is an open source server clustering system that ensures high availability and manageability of critical network resources including data, applications, and services. It is a multinode clustering product for Linux that supports failover, failback, and migration (load balancing) of individually managed cluster resources. Heartbeat is shipped with SUSE Linux Enterprise Server 10 and provides you with the means to make virtual machines (containing services) highly available.
This chapter introduces the main product features and benefits of Heartbeat. It presents several example cluster configurations and describes the components that make up a Heartbeat version 2 cluster. The last section provides an overview of the Heartbeat architecture, describing the individual architecture layers and processes within the cluster.
Heartbeat includes several important features to help you ensure and manage the availability of your network resources. These include:
Support for Fibre Channel or iSCSI storage area networks
Multi-node active cluster, containing up to 16 Linux servers. Any server in the cluster can restart resources (applications, services, IP addresses, and file systems) from a failed server in the cluster.
A single point of administration through either a graphical Heartbeat tool or a command line tool. Both tools let you configure and monitor your Heartbeat cluster.
The ability to tailor a cluster to the specific applications and hardware infrastructure that fit your organization.
Dynamic assignment and reassignment of server storage on an as-needed basis.
Time-dependent configuration enables resources to fail back to repaired nodes at specified times.
Support for shared disk systems. Although shared disk systems are supported, they are not required.
Support for cluster file systems like OCFS 2.
Support for cluster-aware logical volume managers like EVMS.
Heartbeat allows you to configure up to 16 Linux servers into a high-availability cluster, where resources can be dynamically switched or moved to any server in the cluster. Resources can be configured to automatically switch or be moved in the event of a resource server failure, or they can be moved manually to troubleshoot hardware or balance the workload.
Heartbeat provides high availability from commodity components. Lower costs are obtained through the consolidation of applications and operations onto a cluster. Heartbeat also allows you to centrally manage the complete cluster and to adjust resources to meet changing workload requirements (thus, manually “load balance” the cluster).
An equally important benefit is the potential reduction of unplanned service outages as well as planned outages for software and hardware maintenance and upgrades.
Reasons that you would want to implement a Heartbeat cluster include:
Increased availability
Improved performance
Low cost of operation
Scalability
Disaster recovery
Data protection
Server consolidation
Storage consolidation
Shared disk fault tolerance can be obtained by implementing RAID on the shared disk subsystem.
The following scenario illustrates some of the benefits Heartbeat can provide.
Suppose you have configured a three-server cluster, with a Web server installed on each of the three servers in the cluster. Each of the servers in the cluster hosts two Web sites. All the data, graphics, and Web page content for each Web site are stored on a shared disk subsystem connected to each of the servers in the cluster. The following figure depicts how this setup might look.
During normal cluster operation, each server is in constant communication with the other servers in the cluster and performs periodic polling of all registered resources to detect failure.
Suppose Web Server 1 experiences hardware or software problems and the users depending on Web Server 1 for Internet access, e-mail, and information lose their connections. The following figure shows how resources are moved when Web Server 1 fails.
Web Site A moves to Web Server 2 and Web Site B moves to Web Server 3. IP addresses and certificates also move to Web Server 2 and Web Server 3.
When you configured the cluster, you decided where the Web sites hosted on each Web server would go should a failure occur. In the previous example, you configured Web Site A to move to Web Server 2 and Web Site B to move to Web Server 3. This way, the workload once handled by Web Server 1 continues to be available and is evenly distributed between any surviving cluster members.
When Web Server 1 failed, the Heartbeat software:
Detected the failure
Remounted the shared data directories that were formerly mounted on Web Server 1 on Web Server 2 and Web Server 3
Restarted the applications that were running on Web Server 1 on Web Server 2 and Web Server 3
Transferred IP addresses to Web Server 2 and Web Server 3
In this example, the failover process happened quickly: users regained access to Web site information within seconds and, in most cases, without needing to log in again.
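The failover decision in this scenario can be sketched as a small placement table. This is only an illustrative model, not Heartbeat code: the function `fail_over`, the `FAILOVER_TARGETS` mapping, and Web Sites C through F (standing in for the sites hosted on the other two servers) are assumptions made for the example.

```python
# Illustrative model of the three-server example. All names are hypothetical;
# Heartbeat itself stores this kind of policy in the cluster configuration.

# Preconfigured failover targets, as chosen when the cluster was set up.
FAILOVER_TARGETS = {
    "Web Site A": "Web Server 2",
    "Web Site B": "Web Server 3",
}


def fail_over(placement, failed_node, targets, live_nodes):
    """Return a new placement with every resource moved off failed_node."""
    new_placement = dict(placement)
    for resource, node in placement.items():
        if node != failed_node:
            continue  # resource is unaffected by this failure
        target = targets.get(resource)
        if target not in live_nodes:
            raise RuntimeError(f"no live failover target for {resource}")
        new_placement[resource] = target
    return new_placement


# Two Web sites per server (sites C to F are assumed for illustration).
placement = {
    "Web Site A": "Web Server 1",
    "Web Site B": "Web Server 1",
    "Web Site C": "Web Server 2",
    "Web Site D": "Web Server 2",
    "Web Site E": "Web Server 3",
    "Web Site F": "Web Server 3",
}

after_failure = fail_over(
    placement,
    failed_node="Web Server 1",
    targets=FAILOVER_TARGETS,
    live_nodes={"Web Server 2", "Web Server 3"},
)
for site, server in sorted(after_failure.items()):
    print(site, "->", server)
```

After the failure, each surviving server hosts three Web sites, which matches the even distribution described above.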
Now suppose the problems with Web Server 1 are resolved, and Web Server 1 is returned to a normal operating state. Web Site A and Web Site B can either automatically fail back (move back) to Web Server 1, or they can stay where they are. This depends on how you configured the resources for them. There are advantages and disadvantages to both alternatives. Migrating the services back to Web Server 1 will incur some downtime. Heartbeat also allows you to defer the migration until a period when it will cause little or no service interruption.
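The idea of deferring failback to a quiet period can be illustrated with a simple time-window check. This sketch does not show how Heartbeat expresses such rules; the function `may_fail_back` and the window boundaries are hypothetical.

```python
from datetime import time


def may_fail_back(now, window_start, window_end):
    """Return True if `now` lies inside the allowed failback window.

    The window may wrap around midnight (for example 22:00 to 02:00).
    """
    if window_start <= window_end:
        return window_start <= now < window_end
    # Window wraps past midnight: allowed late in the day or early in the morning.
    return now >= window_start or now < window_end


# Hypothetical maintenance window between 01:00 and 04:00.
print(may_fail_back(time(2, 30), time(1, 0), time(4, 0)))
print(may_fail_back(time(12, 0), time(1, 0), time(4, 0)))
```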
Heartbeat also provides resource migration capabilities. You can move applications, Web sites, etc. to other servers in your cluster without waiting for a server to fail.
For example, you could have manually moved Web Site A or Web Site B from Web Server 1 to either of the other servers in the cluster. You might want to do this to upgrade or perform scheduled maintenance on Web Server 1, or just to increase performance or accessibility of the Web sites.
Heartbeat cluster configurations might or might not include a shared disk subsystem. The shared disk subsystem can be connected via high-speed Fibre Channel cards, cables, and switches, or it can be configured to use iSCSI. If a server fails, another designated server in the cluster automatically mounts the shared disk directories previously mounted on the failed server. This gives network users continuous access to the directories on the shared disk subsystem.
Shared Disk Subsystem with EVMS: When using a shared disk subsystem with EVMS, that subsystem must be connected to all servers in the cluster.
Typical Heartbeat resources might include data, applications, and services. The following figure shows how a typical Fibre Channel cluster configuration might look.
Although Fibre Channel provides the best performance, you can also configure your cluster to use iSCSI. iSCSI is an alternative to Fibre Channel that can be used to create a low-cost SAN. The following figure shows how a typical iSCSI cluster configuration might look.
Although most clusters include a shared disk subsystem, it is also possible to create a Heartbeat cluster without a shared disk subsystem. The following figure shows how a Heartbeat cluster without a shared disk subsystem might look.
The following components make up a Heartbeat version 2 cluster:
From 2 to 16 Linux servers, each containing at least one local disk device.
Heartbeat software running on each Linux server in the cluster.
Optional: A shared disk subsystem connected to all servers in the cluster.
Optional: High-speed Fibre Channel cards, cables, and switch used to connect the servers to the shared disk subsystem.
At least two communication media over which Heartbeat servers can communicate. Currently supported media are Ethernet (mcast, ucast, or bcast) and serial links.
A STONITH device. A STONITH device is a power switch that the cluster uses to reset nodes that are considered dead. Resetting non-heartbeating nodes is the only reliable way to ensure that a node that hangs, and therefore only appears to be dead, cannot corrupt data.
See http://linux-ha.org/wiki/STONITH for more information on STONITH.
This section provides a brief overview of the Heartbeat architecture. It identifies and provides information on the Heartbeat architectural components, and describes how those components interoperate.
Heartbeat has a layered architecture. Figure 1.6, “Heartbeat Architecture” illustrates the different layers and their associated components.
The primary or first layer is the messaging/infrastructure layer, also known as the Heartbeat layer. This layer contains components that send out the Heartbeat messages containing “I'm alive” signals, as well as other information. The Heartbeat program resides in the messaging/infrastructure layer.
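The "I'm alive" mechanism can be sketched as a timeout-based failure detector. This is a minimal illustration, not the Heartbeat implementation: the class `HeartbeatMonitor`, its `dead_after` parameter, and the node names are assumptions.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer (illustrative only)."""

    def __init__(self, peers, dead_after, now):
        # dead_after: seconds of silence after which a peer is considered dead.
        self.dead_after = dead_after
        self.last_seen = {peer: now for peer in peers}

    def record(self, peer, now):
        """Note that an "I'm alive" message arrived from `peer` at time `now`."""
        self.last_seen[peer] = now

    def dead_peers(self, now):
        """Return the peers that have been silent for longer than dead_after."""
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.dead_after
        )


monitor = HeartbeatMonitor(["node2", "node3"], dead_after=10.0, now=0.0)
monitor.record("node2", now=8.0)     # node2 keeps sending heartbeats
print(monitor.dead_peers(now=12.0))  # node3 has been silent since time 0
```

A real cluster would feed such a detector from at least two independent communication media, so that a single broken link is not mistaken for a dead node.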
The second layer is the membership layer. The membership layer is responsible for calculating the largest fully connected set of cluster nodes and synchronizing this view to all of its members. It performs this task based on the information it gets from the Heartbeat layer. The logic that takes care of this task is contained in the Cluster Consensus Membership service, which provides an organized, node-wise overview of the cluster topology to the cluster components in the higher layers.
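"Largest fully connected set" means the biggest group of nodes in which every node can communicate with every other node. The brute-force sketch below illustrates the concept only; it is not the algorithm used by the Cluster Consensus Membership service, and the function name and link representation are assumptions. With at most 16 nodes, trying all subsets remains cheap.

```python
from itertools import combinations


def largest_fully_connected(nodes, links):
    """Return the largest set of nodes that can all reach each other.

    links is a set of frozenset({a, b}) pairs with working communication.
    Candidate groups are tried from largest to smallest.
    """
    ordered = sorted(nodes)
    for size in range(len(ordered), 0, -1):
        for group in combinations(ordered, size):
            if all(frozenset(pair) in links for pair in combinations(group, 2)):
                return set(group)
    return set()


nodes = {"n1", "n2", "n3", "n4"}
links = {
    frozenset({"n1", "n2"}),
    frozenset({"n1", "n3"}),
    frozenset({"n2", "n3"}),
    frozenset({"n3", "n4"}),  # n4 can only reach n3
}
print(sorted(largest_fully_connected(nodes, links)))
```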
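The third layer is the resource allocation layer. This layer is the most complex, and consists of the following components, each described in turn below: the Cluster Resource Manager (CRM), the Cluster Information Base (CIB), the Policy Engine (PE) and Transition Engine (TE), and the Local Resource Manager (LRM).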
Every action taken in the resource allocation layer passes through the Cluster Resource Manager. If any other components of the resource allocation layer, or other components which are in a higher layer need to communicate, they do so through the local Cluster Resource Manager.
On every node, the Cluster Resource Manager maintains the Cluster Information Base, or CIB (see Cluster Information Base below). One Cluster Resource Manager in the cluster is elected as the Designated Coordinator (DC), meaning that it has the master CIB. All other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC is the only entity in the cluster that can decide that a cluster-wide change needs to be performed, such as fencing a node or moving resources around.
The Cluster Information Base or CIB is an in-memory XML representation of the entire cluster configuration and status, including node membership, resources, constraints, etc. There is one master CIB in the cluster, maintained by the DC. All the other nodes contain a CIB replica. An administrator who wants to manipulate the cluster's behavior can use either the cibadmin command line tool or the Heartbeat GUI tool.
Usage of Heartbeat GUI Tool and cibadmin: The Heartbeat GUI tool can be used from any machine with a connection to the cluster. The cibadmin command must be used on a cluster node, and is not restricted to only the DC node.
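To give a feel for the CIB as an XML document, the following sketch parses a heavily simplified CIB-like fragment with the Python standard library. The element and attribute names, node names, and the resource `cluster_ip` are assumptions for illustration; check the CIB DTD shipped with your Heartbeat version for the authoritative structure.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical CIB-like document: a configuration section
# (nodes, resources, constraints) and a status section.
CIB_XML = """
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="node1" uname="node1" type="normal"/>
      <node id="node2" uname="node2" type="normal"/>
    </nodes>
    <resources>
      <primitive id="cluster_ip" class="ocf" provider="heartbeat" type="IPaddr"/>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def node_names(cib):
    """Return the names of all nodes listed in the configuration section."""
    return [n.get("uname") for n in cib.findall("./configuration/nodes/node")]


def resource_ids(cib):
    """Return the ids of all primitive resources anywhere in the document."""
    return [p.get("id") for p in cib.iter("primitive")]


cib = ET.fromstring(CIB_XML.strip())
print(node_names(cib))
print(resource_ids(cib))
```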
Whenever the Designated Coordinator needs to make a cluster-wide change (react to a new CIB), the Policy Engine is used to calculate the next state of the cluster and the list of (resource) actions required to achieve it. The commands computed by the Policy Engine are then executed by the Transition Engine. The DC sends out messages to the relevant Cluster Resource Managers in the cluster, which then use their Local Resource Managers (see Local Resource Manager (LRM) below) to perform the necessary resource manipulations. The PE/TE pair only runs on the DC node.
The Local Resource Manager calls the local Resource Agents (see Section “Resource Layer” below) on behalf of the CRM. It can thus perform start, stop, and monitor operations and report the result to the CRM. The LRM is the authoritative source for all resource-related information on its local node.
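The division of labor between computing a target state and deriving the actions to reach it can be sketched as a diff between two placements. This is a deliberately naive model, not the Policy Engine's actual logic; `plan_transition` and the resource and node names are hypothetical.

```python
def plan_transition(current, desired):
    """Derive the ordered (action, resource, node) steps to reach `desired`.

    current and desired map resource names to the node running them.
    A resource that changes node is stopped on the old node first,
    then started on the new one.
    """
    actions = []
    for resource in sorted(set(current) | set(desired)):
        src = current.get(resource)
        dst = desired.get(resource)
        if src == dst:
            continue  # already where it should be
        if src is not None:
            actions.append(("stop", resource, src))
        if dst is not None:
            actions.append(("start", resource, dst))
    return actions


current = {"cluster_ip": "node1", "web": "node1"}
desired = {"cluster_ip": "node2", "web": "node1"}
for step in plan_transition(current, desired):
    print(step)
```

In this model, the list returned by `plan_transition` corresponds to what the Transition Engine would then carry out step by step.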
The fourth and highest layer is the Resource Layer. The Resource Layer includes one or more Resource Agents (RA). A Resource Agent is a program, usually a shell script, that has been written to start, stop, and monitor a certain kind of service (a resource). The most common Resource Agents are LSB init scripts. However, Heartbeat also supports the more flexible and powerful Open Clustering Framework Resource Agent API. The agents supplied with Heartbeat are written to OCF specifications. Resource Agents are called only by the Local Resource Manager. Third parties can include their own agents in a defined location in the file system and thus provide out-of-the-box cluster integration for their own software.
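The start, stop, and monitor contract of a Resource Agent can be illustrated with a toy agent that manages nothing but a state file. Real agents are usually shell scripts; this Python sketch only mirrors the idea. The function `dummy_agent` and the state-file approach are assumptions, and the numeric return codes follow the common OCF conventions (0 for success, 7 for "not running", 3 for an unimplemented action).

```python
import os
import tempfile

# Exit codes following common OCF conventions.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def dummy_agent(action, state_file):
    """Toy resource agent: the 'service' runs while state_file exists."""
    if action == "start":
        open(state_file, "w").close()
        return OCF_SUCCESS
    if action == "stop":
        # Stopping an already stopped resource is still a success.
        if os.path.exists(state_file):
            os.remove(state_file)
        return OCF_SUCCESS
    if action == "monitor":
        return OCF_SUCCESS if os.path.exists(state_file) else OCF_NOT_RUNNING
    return OCF_ERR_UNIMPLEMENTED


state = os.path.join(tempfile.mkdtemp(), "dummy.state")
for action in ("monitor", "start", "monitor", "stop", "monitor"):
    print(action, dummy_agent(action, state))
```

Making stop succeed on an already stopped resource keeps the operation idempotent, so the caller can safely repeat it.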
Many actions performed in the cluster will cause a cluster-wide change. These actions can include things like adding or removing a cluster resource or changing resource constraints. It is important to understand what happens in the cluster when you perform such an action.
For example, suppose you want to add a cluster IP address resource. To do this, you use either the cibadmin command line tool or the Heartbeat GUI tool to modify the master CIB. You do not need to run the cibadmin command or the GUI tool on the Designated Coordinator. You can use either tool on any node in the cluster, and the local CIB will relay the requested changes to the Designated Coordinator. The Designated Coordinator will then replicate the CIB change to all cluster nodes and will start the transition procedure.
With the help of the Policy Engine and the Transition Engine, the Designated Coordinator obtains a series of steps that need to be performed in the cluster, possibly on several nodes. The Designated Coordinator sends the corresponding commands via the messaging/infrastructure layer to the other Cluster Resource Managers.
If necessary, the other Cluster Resource Managers use their Local Resource Manager to perform resource modifications and report back to the Designated Coordinator about the result. Once the Transition Engine on the Designated Coordinator concludes that all necessary operations have been performed successfully in the cluster, the cluster will go back to the idle state and wait for further events.
If any operation was not carried out as planned, the Policy Engine is invoked again with the new information recorded in the CIB.
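The plan, execute, and re-plan cycle described above can be sketched as a loop that recomputes the outstanding work from the recorded status after every round. This is a simplified model under stated assumptions: `run_transitions`, the `execute` callback, and the round limit are hypothetical and do not reflect Heartbeat's internal interfaces.

```python
def run_transitions(status, desired, execute, max_rounds=5):
    """Drive `status` toward `desired`, re-planning after failed operations.

    execute(resource, node) returns True if the operation succeeded.
    Returns the final status once nothing is pending (the idle state).
    """
    status = dict(status)
    for _ in range(max_rounds):
        # "Policy" step: work out what still differs from the desired state.
        pending = sorted(
            (res, node) for res, node in desired.items() if status.get(res) != node
        )
        if not pending:
            return status  # idle: wait for further events
        # "Transition" step: carry out the operations and record the results.
        for resource, node in pending:
            if execute(resource, node):
                status[resource] = node
    if all(status.get(res) == node for res, node in desired.items()):
        return status
    raise RuntimeError("cluster did not converge within max_rounds")


calls = []


def flaky_execute(resource, node):
    """Hypothetical executor whose first operation fails."""
    calls.append((resource, node))
    return len(calls) > 1


final = run_transitions({"cluster_ip": "node1"}, {"cluster_ip": "node2"}, flaky_execute)
print(final, "after", len(calls), "attempts")
```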
When a service or a node dies, the same thing happens. The Designated Coordinator is informed by the Cluster Consensus Membership service (in case of a node death) or by a Local Resource Manager (in case of a failed monitor operation). The Designated Coordinator determines that actions need to be taken in order to come to a new cluster state. The new cluster state will be represented by a new CIB.