Chapter 13. Multi-Site Clusters

Contents

13.1. Challenges for Multi-Site Clusters
13.2. Conceptual Overview
13.3. Requirements
13.4. Basic Setup
13.5. Managing Multi-Site Clusters

Abstract

Apart from local clusters and metro area clusters, SUSE® Linux Enterprise High Availability Extension 11 SP2 also supports multi-site clusters. That means you can have multiple, geographically dispersed sites with a local cluster each. Failover between these clusters is coordinated by a higher level entity, the so-called booth. Support for multi-site clusters is available as a separate option to SUSE Linux Enterprise High Availability Extension.

13.1. Challenges for Multi-Site Clusters

Typically, multi-site environments are too far apart to support synchronous communication between the sites and synchronous data replication. That leads to the following challenges:

  • How to make sure that a cluster site is up and running?

  • How to make sure that resources are only started once?

  • How to make sure that quorum can be reached between the different sites and a split brain scenario can be avoided?

  • How to manage failover between the sites?

  • How to deal with high latency in case of resources that need to be stopped?

In the following sections, learn how to meet these challenges with SUSE Linux Enterprise High Availability Extension.

13.2. Conceptual Overview

Multi-site clusters based on SUSE Linux Enterprise High Availability Extension can be considered as overlay clusters where each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster is managed by the booth mechanism. It guarantees that the cluster resources will be highly available across different cluster sites. This is achieved by using so-called tickets that are treated as a failover domain between cluster sites in case a site goes down.

The following list explains the individual components and mechanisms that were introduced for multi-site clusters in more detail.

Components and Concepts

Ticket

A ticket grants the right to run certain resources on a specific cluster site. A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket—each ticket must be granted once by the cluster administrator. After that, tickets are managed by the booth for automatic failover of resources. But administrators may also intervene and grant or revoke tickets manually.

Resources can be bound to a certain ticket by dependencies. The respective resources are started only if the defined ticket is available at a site. Conversely, if the ticket is removed, the resources depending on that ticket are automatically stopped.

The presence or absence of tickets for a site is stored in the CIB as a cluster status. With regard to a certain ticket, there are only two states for a site: true (the site has the ticket) or false (the site does not have the ticket). The absence of a certain ticket (during the initial state of the multi-site cluster) is not treated differently from the situation after the ticket has been revoked: both are reflected by the value false.

A ticket within an overlay cluster is similar to a resource in a traditional cluster. But in contrast to traditional clusters, tickets are the only type of resource in an overlay cluster. They are primitive resources that need to be neither configured nor cloned.
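
The ticket semantics described above can be illustrated with a small shell model. This is only a sketch and not how booth or the CIB store tickets: the variable TICKET_OWNER and the functions grant_ticket, revoke_ticket, and ticket_state are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Hypothetical model of a single ticket. An empty owner covers both
# the initial state and the state after a revoke.
TICKET_OWNER=""

# grant_ticket SITE: refuse if a different site already holds the ticket.
grant_ticket() {
    if [ -n "$TICKET_OWNER" ] && [ "$TICKET_OWNER" != "$1" ]; then
        echo "ticket already granted to $TICKET_OWNER" >&2
        return 1
    fi
    TICKET_OWNER=$1
}

# revoke_ticket: afterwards no site holds the ticket.
revoke_ticket() {
    TICKET_OWNER=""
}

# ticket_state SITE: prints true if SITE holds the ticket, else false.
ticket_state() {
    if [ -n "$1" ] && [ "$TICKET_OWNER" = "$1" ]; then
        echo true
    else
        echo false
    fi
}

ticket_state siteA                          # initial state
grant_ticket siteA
ticket_state siteA                          # siteA now holds the ticket
grant_ticket siteB || echo "grant to siteB refused"
revoke_ticket
ticket_state siteA                          # same value as the initial state
```

From the point of view of siteA, the state before the first grant and the state after the revoke are indistinguishable, which matches the true/false model described above.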

Booth

The booth is the instance managing the ticket distribution and thus, the failover process between the sites of a multi-site cluster. Each of the participating clusters and arbitrators runs a service, the boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details. Once a ticket is granted to a site, the booth mechanism will manage the ticket automatically: If the site which holds the ticket is out of service, the booth daemons will vote on which of the other sites will get the ticket. To protect against brief connection failures, sites that lose the vote (either explicitly or implicitly by being disconnected from the voting body) need to relinquish the ticket after a time-out. This ensures that a ticket is only redistributed after it has been relinquished by the previous site. See also Dead Man Dependency (loss-policy="fence").

Arbitrator

Each site runs one booth instance that is responsible for communicating with the other sites. If you have a setup with an even number of sites, you need an additional instance to reach consensus about decisions such as failover of resources across sites. In this case, add one or more arbitrators running at additional sites. Arbitrators are single machines that run a booth instance in a special mode. As all booth instances communicate with each other, arbitrators help to make more reliable decisions about granting or revoking tickets.

An arbitrator is especially important for a two-site scenario: For example, if site A can no longer communicate with site B, there are two possible causes for that:

  • A network failure between A and B.

  • Site B is down.

However, if site C (the arbitrator) can still communicate with site A, site A must still be up and running.
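
The role of the arbitrator in this scenario can be illustrated with a plain majority rule. The sketch below is a simplification and not booth's actual voting protocol; the function name has_majority is hypothetical. A site counts itself plus every booth instance it can still reach and compares that number to the total number of booth instances.

```shell
#!/bin/sh
# has_majority REACHABLE TOTAL
# REACHABLE: booth instances this site can reach, including itself.
# TOTAL:     all booth instances (sites plus arbitrators).
# Prints "yes" if REACHABLE is a strict majority of TOTAL, else "no".
has_majority() {
    if [ $(( $1 * 2 )) -gt "$2" ]; then
        echo yes
    else
        echo no
    fi
}

# Two sites, no arbitrator: after a split, A only sees itself (1 of 2).
echo "A alone, no arbitrator:    $(has_majority 1 2)"
# Two sites plus arbitrator C: A still reaches C (2 of 3).
echo "A together with C:         $(has_majority 2 3)"
# B is cut off from both A and C (1 of 3).
echo "B isolated from A and C:   $(has_majority 1 3)"
```

Without an arbitrator, neither site can reach a majority after a split. With the arbitrator, the side that can still talk to it can.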

Dead Man Dependency (loss-policy="fence")

After a ticket is revoked, it can take a long time until all resources depending on that ticket are stopped, especially in case of cascaded resources. To cut that process short, the cluster administrator can configure a loss-policy (together with the ticket dependencies) for the case that a ticket gets revoked from a site. If the loss-policy is set to fence, the nodes that are hosting dependent resources are fenced. This considerably speeds up the recovery process of the cluster and makes sure that resources can be migrated more quickly.

Figure 13.1. Example Scenario: A Two-Site Cluster (4 Nodes + Arbitrator)

As usual, the CIB is synchronized within each cluster, but it is not synchronized across cluster sites of a multi-site cluster. You have to configure the resources that will be highly available across the multi-site cluster for every site accordingly.

13.3. Requirements

Software Requirements

  • All clusters that will be part of the multi-site cluster must be based on SUSE Linux Enterprise High Availability Extension 11 SP2.

  • SUSE® Linux Enterprise Server 11 SP2 must be installed on all arbitrators.

  • The booth package must be installed on all cluster nodes and on all arbitrators that will be part of the multi-site cluster.

The most common scenario is probably a multi-site cluster with two sites and a single arbitrator on a third site. However, technically, there are no limitations with regard to the number of sites and the number of arbitrators involved.

Nodes belonging to the same cluster site should be synchronized via NTP. However, time synchronization is not required between the individual cluster sites.

13.4. Basic Setup

Configuring a multi-site cluster takes two basic steps: configuring the cluster resources and constraints, and setting up the booth services.

13.4.1. Configuring Cluster Resources and Constraints

Procedure 13.1. Configuring Ticket Dependencies

The crm configure rsc_ticket command lets you specify the resources depending on a certain ticket. Together with the constraint, you can set a loss-policy that defines what should happen to the respective resources if the ticket is revoked. The attribute loss-policy can have the following values:

  • fence: Fence the nodes that are running the relevant resources.

  • stop: Stop the relevant resources.

  • freeze: Do nothing to the relevant resources.

  • demote: Demote relevant resources that are running in master mode to slave mode.

  1. On one of the cluster nodes, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive shell.

  3. Configure a constraint that defines which resources depend on a certain ticket. For example:

    crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: rsc1 loss-policy="fence"

    This creates a constraint with the ID rsc1-req-ticketA. It defines that the resource rsc1 depends on ticketA and that the node running the resource should be fenced in case ticketA is revoked.

    If resource rsc1 was not a primitive, but a special clone resource that can run in master or slave mode, you could also configure that rsc1 is automatically demoted to slave mode if ticketA is revoked:

    crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: rsc1:Master loss-policy="demote"

  4. If you want other resources to depend on further tickets, create as many constraints as necessary with rsc_ticket.

  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

    The constraints are saved to the CIB.
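
If many resources depend on tickets, the constraint lines can be generated instead of typed by hand. The following is a minimal sketch: the helper name make_rsc_ticket and the ID scheme RESOURCE-req-TICKET (taken from the example above) are illustrative assumptions, and the helper handles plain resource names only, not role suffixes such as rsc1:Master. It only prints text and does not call crm itself.

```shell
#!/bin/sh
# make_rsc_ticket TICKET RESOURCE LOSS_POLICY
# Prints one rsc_ticket line in the syntax used in the procedure above.
# Rejects any loss-policy other than the four values listed there.
make_rsc_ticket() {
    case "$3" in
        fence|stop|freeze|demote)
            ;;
        *)
            echo "unknown loss-policy: $3" >&2
            return 1
            ;;
    esac
    printf 'rsc_ticket %s-req-%s %s: %s loss-policy="%s"\n' \
        "$2" "$1" "$1" "$2" "$3"
}

# Example: two resources bound to two different tickets.
make_rsc_ticket ticketA rsc1 fence
make_rsc_ticket ticketB rsc2 stop
```

The printed lines can then be entered in the crm configure shell, followed by show and commit as described in the procedure.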

Procedure 13.2. Configuring a Resource Group for boothd

Each site needs to run one instance of boothd that communicates with the other booth daemons. The daemon can be started on any node, therefore it should be configured as a primitive resource. As each daemon needs a persistent IP address, configure another primitive with a virtual IP address. Group both primitives:

  1. On one of the cluster nodes, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive shell.

  3. Create both primitive resources and add them to one group, g-booth:

    crm(live)configure# primitive booth-ip ocf:heartbeat:IPaddr2 params ip="IP_ADDRESS"
    crm(live)configure# primitive booth ocf:pacemaker:booth-site
    crm(live)configure# group g-booth booth-ip booth

  4. Review your changes with show.

  5. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

  6. Repeat the resource group configuration on the other cluster sites, using a different IP address for each boothd resource group.

    With this configuration, each booth daemon will be available at its individual IP address, independent of the node the daemon is running on.
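
Because the same three configuration lines are needed on every site, with only the IP address changing, they can be printed from a small helper. This is a sketch only: the function name booth_group_config is hypothetical, and the two addresses in the loop are placeholders that need to be replaced with the virtual IP addresses reserved for booth at your sites.

```shell
#!/bin/sh
# booth_group_config IP
# Prints the three crm configure lines from the procedure above for one
# site, using IP as the virtual address of the booth resource group.
booth_group_config() {
    printf 'primitive booth-ip ocf:heartbeat:IPaddr2 params ip="%s"\n' "$1"
    printf 'primitive booth ocf:pacemaker:booth-site\n'
    printf 'group g-booth booth-ip booth\n'
}

# One virtual IP address per site (placeholder values).
for ip in 147.4.215.19 147.18.2.1; do
    echo "# site with booth IP $ip"
    booth_group_config "$ip"
done
```

Each block of output belongs to a different cluster site and has to be entered in the crm configure shell on a node of that site.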

13.4.2. Setting Up the Booth Services

After having configured the resource group for the boothd and the ticket dependencies, complete the booth setup:

Procedure 13.3. Editing The Booth Configuration File

  1. Log in to a cluster node as root or equivalent.

  2. Create /etc/sysconfig/booth and edit it according to the example below:

    Example 13.1. Example Booth Configuration File

    transport="UDP" 1
    port="6666" 2
    arbitrator="147.2.207.14" 3
    site="147.4.215.19" 4
    site="147.18.2.1" 4
    ticket="ticketA;1000" 5 6
    ticket="ticketB;1000" 5 6

    1

    Defines the transport protocol used for communication between the sites. For SP2, only UDP is supported; other transport layers will follow.

    2

    Defines the port used for communication between the sites. Choose any port that is not already used for different services. Make sure to open the port in the nodes' and arbitrators' firewalls.

    3

    Defines the IP address of the arbitrator. Insert an entry for each arbitrator you use in your setup.

    4

    Defines the IP address used for the boothd on each site. Make sure to insert the correct virtual IP addresses (IPaddr2) for each site, otherwise the booth mechanism will not work correctly.

    5

    Defines the ticket to be managed by the booth (the part before the semicolon). For each ticket, add a ticket entry.

    6

    Defines the ticket's expiry time in seconds (the part after the semicolon). A site that has been granted a ticket will renew the ticket regularly. If the booth does not receive any information about renewal of the ticket within the defined expiry time, the ticket will be revoked and granted to another site.


    An example booth configuration file is available at /etc/booth/booth.conf.example.

  3. Verify your changes and save the file.

  4. Copy /etc/sysconfig/booth to all sites and arbitrators. In case of any changes, make sure to update the file accordingly on all parties.

    Note: Synchronize Booth Configuration to All Sites and Arbitrators

    All cluster nodes and arbitrators within the multi-site cluster must use the same booth configuration. While you may need to copy the files manually to the arbitrators and to one cluster node per site, you can use Csync2 within each cluster site to synchronize the file to all nodes.
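
Before copying the file to the other machines, a quick plausibility check can help to catch typing errors. The sketch below is an assumption-laden illustration: the function name booth_conf_summary is hypothetical, it expects one key="value" entry per line and ticket entries of the form ticket="NAME;EXPIRY", and the sample file it writes uses an assumed expiry time of 1000 seconds. It works on a temporary copy and does not touch /etc/sysconfig/booth.

```shell
#!/bin/sh
# booth_conf_summary FILE
# Counts site and arbitrator entries, warns if their sum is even, and
# lists each ticket with its expiry time.
booth_conf_summary() {
    sites=$(grep -c '^site=' "$1" || true)
    arbs=$(grep -c '^arbitrator=' "$1" || true)
    echo "sites=$sites arbitrators=$arbs"
    if [ $(( (sites + arbs) % 2 )) -eq 0 ]; then
        echo "warning: even number of booth instances, consider an arbitrator"
    fi
    # ticket="NAME;EXPIRY" becomes "ticket NAME expires after EXPIRY s"
    sed -n 's/^ticket="\([^;"]*\);\([0-9]*\)".*$/ticket \1 expires after \2 s/p' "$1"
}

# Sample file with assumed values, written to a temporary location.
conf=$(mktemp)
cat > "$conf" <<'EOF'
transport="UDP"
port="6666"
arbitrator="147.2.207.14"
site="147.4.215.19"
site="147.18.2.1"
ticket="ticketA;1000"
ticket="ticketB;1000"
EOF
booth_conf_summary "$conf"
rm -f "$conf"
```

To check a real configuration, pass the path of the booth configuration file to booth_conf_summary instead of the temporary sample.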

Procedure 13.4. Starting the Booth Services

  1. Start the booth resource group on each cluster site. This starts one instance of the booth service per site.

  2. Log in to each arbitrator and start the booth service:

    /etc/init.d/booth-arbitrator start

    This starts the booth service in arbitrator mode. It can communicate with all other booth daemons but in contrast to the booth daemons running on the cluster sites, it cannot be granted a ticket.

After finishing the booth configuration and starting the booth services, you are now ready to start the ticket process.

13.5. Managing Multi-Site Clusters

Before the booth can manage a certain ticket within the multi-site cluster, you initially need to grant it to a site manually. Use the booth client command line tool to grant, list, or revoke tickets. The booth client commands work on any machine where the booth daemon is running.

Warning: crm_ticket and crm site ticket

If the booth service is not running for any reason, you may also manage tickets manually with crm_ticket or crm site ticket. Both commands are only available on cluster nodes. In case of manual intervention, use them with great care as they cannot verify if the same ticket is already granted elsewhere. For basic information about the commands, refer to their man pages.

As long as booth is up and running, only use booth client for manual intervention.

Procedure 13.5. Managing Tickets Manually

  1. To list all tickets on all sites:

    booth client list

  2. To grant a ticket to a site (for example, ticketA to the site 147.2.207.14):

    booth client grant -t ticketA -s 147.2.207.14

    The command executes a sanity check and will warn you if the same ticket is already granted to another site.

    When granting tickets, you can also specify the expiry time after which a ticket will fail over to another site if it has not been renewed. The default expiry time is 600 seconds. To specify a different value, use the -e option:

    booth client grant -t ticketA -s 147.2.207.14 -e 1000

  3. To revoke a ticket from a site (for example, ticketA from the site 147.2.207.14):

    booth client revoke -t ticketA -s 147.2.207.14

After you have initially granted a ticket to a site, the booth mechanism will take over and manage the ticket automatically. If the site holding a ticket should be out of service, the ticket will automatically be revoked after the expiry time and granted to another site. The resources that depend on that ticket will fail over to the new site holding the ticket. The nodes that have run the resources before will be treated according to the loss-policy you set within the constraint.
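
The effect of the expiry time can be sketched as a simple comparison of timestamps. This is an illustration only, not booth's internal logic: the function name ticket_expired is hypothetical, and treating a ticket as still valid when the elapsed time equals the expiry time exactly is an assumption.

```shell
#!/bin/sh
# ticket_expired LAST_RENEWAL NOW EXPIRY
# All arguments are in seconds. Prints "expired" if more than EXPIRY
# seconds have passed since the last renewal, otherwise "valid".
ticket_expired() {
    if [ $(( $2 - $1 )) -gt "$3" ]; then
        echo expired
    else
        echo valid
    fi
}

# Default expiry time of 600 seconds:
ticket_expired 0 300 600     # last renewal 300 s ago
ticket_expired 0 601 600     # no renewal for 601 s
# Expiry time raised with -e 1000:
ticket_expired 0 601 1000    # the same 601 s gap
```

The three calls print valid, expired, and valid. A longer expiry time tolerates longer interruptions between the sites, but it also delays failover when a site is really down.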