SUSE Linux Enterprise High Availability Extension

High Availability Guide

Legal Notice

Contents

About This Guide
1. Feedback
2. Documentation Conventions
I. Installation and Setup
1. Product Overview
1.1. Key Features
1.2. Benefits
1.3. Cluster Configurations: Storage
1.4. Architecture
2. Getting Started
2.1. Hardware Requirements
2.2. Software Requirements
2.3. Shared Disk System Requirements
2.4. Preparations
2.5. Overview: Installing and Setting Up a Cluster
3. Installation and Basic Setup with YaST
3.1. Installing the High Availability Extension
3.2. Initial Cluster Setup
3.3. Bringing the Cluster Online
3.4. Mass Deployment with AutoYaST
II. Configuration and Administration
4. Configuration and Administration Basics
4.1. Global Cluster Options
4.2. Cluster Resources
4.3. Resource Monitoring
4.4. Resource Constraints
4.5. For More Information
5. Configuring and Managing Cluster Resources (GUI)
5.1. Pacemaker GUI—Overview
5.2. Configuring Global Cluster Options
5.3. Configuring Cluster Resources
5.4. Managing Cluster Resources
6. Configuring and Managing Cluster Resources (Command Line)
6.1. crm Command Line Tool—Overview
6.2. Configuring Global Cluster Options
6.3. Configuring Cluster Resources
6.4. Managing Cluster Resources
7. Managing Cluster Resources with the Web Interface
7.1. Starting the HA Web Konsole and Logging In
7.2. Using HA Web Konsole
7.3. Troubleshooting
8. Adding or Modifying Resource Agents
8.1. STONITH Agents
8.2. Writing OCF Resource Agents
8.3. OCF Return Codes and Failure Recovery
9. Fencing and STONITH
9.1. Classes of Fencing
9.2. Node Level Fencing
9.3. STONITH Configuration
9.4. Monitoring Fencing Devices
9.5. Special Fencing Devices
9.6. For More Information
10. Load Balancing with Linux Virtual Server
10.1. Conceptual Overview
10.2. Configuring IP Load Balancing with YaST
10.3. Further Setup
10.4. For More Information
11. Network Device Bonding
11.1. Configuring Bonding Devices with YaST
11.2. For More Information
III. Storage and Data Replication
12. Oracle Cluster File System 2
12.1. Features and Benefits
12.2. OCFS2 Packages and Management Utilities
12.3. Configuring OCFS2 Services
12.4. Creating OCFS2 Volumes
12.5. Mounting OCFS2 Volumes
12.6. For More Information
13. Distributed Replicated Block Device (DRBD)
13.1. Conceptual Overview
13.2. Installing DRBD Services
13.3. Configuring the DRBD Service
13.4. Testing the DRBD Service
13.5. Tuning DRBD
13.6. Troubleshooting DRBD
13.7. For More Information
14. Cluster LVM
14.1. Conceptual Overview
14.2. Configuration of cLVM
14.3. Configuring Eligible LVM2 Devices Explicitly
14.4. For More Information
15. Storage Protection
15.1. Storage-based Fencing
15.2. Ensuring Exclusive Storage Activation
16. Samba Clustering
16.1. Conceptual Overview
16.2. Basic Configuration
16.3. Debugging and Testing Clustered Samba
16.4. For More Information
IV. Troubleshooting and Reference
17. Troubleshooting
17.1. Installation Problems
17.2. Debugging an HA Cluster
17.3. FAQs
17.4. For More Information
18. Cluster Management Tools
cibadmin — Provides direct access to the cluster configuration
crmadmin — Controls the Cluster Resource Manager
crm_attribute — Allows node attributes and cluster options to be queried, modified and deleted
crm_diff — Identify changes to the cluster configuration and apply patches to the configuration files
crm_failcount — Manage the counter recording each resource's failures
crm_master — Manage a master/slave resource's preference for being promoted on a given node
crm_mon — Monitor the cluster's status
crm_node — Lists the members of a cluster
crm_resource — Perform tasks related to cluster resources
crm_shadow — Perform configuration changes in a sandbox before updating the live cluster
crm_standby — Manipulate a node's standby attribute to determine whether resources can be run on this node
crm_verify — Check the CIB for consistency
19. HA OCF Agents
ocf:anything — Manages an arbitrary service
ocf:AoEtarget — Manages ATA-over-Ethernet (AoE) target exports
ocf:apache — Manages an Apache web server instance
ocf:AudibleAlarm — Emits audible beeps at a configurable interval
ocf:ClusterMon — Runs crm_mon in the background, recording the cluster status to an HTML file
ocf:CTDB — CTDB Resource Agent
ocf:db2 — Manages an IBM DB2 Universal Database instance
ocf:Delay — Waits for a defined timespan
ocf:drbd — Manages a DRBD resource (deprecated)
ocf:Dummy — Example stateless resource agent
ocf:eDir88 — Manages a Novell eDirectory directory server
ocf:Evmsd — Controls clustered EVMS volume management (deprecated)
ocf:EvmsSCC — Manages EVMS Shared Cluster Containers (SCCs) (deprecated)
ocf:Filesystem — Manages filesystem mounts
ocf:ICP — Manages an ICP Vortex clustered host drive
ocf:ids — Manages an Informix Dynamic Server (IDS) instance
ocf:IPaddr2 — Manages virtual IPv4 addresses (Linux specific version)
ocf:IPaddr — Manages virtual IPv4 addresses (portable version)
ocf:IPsrcaddr — Manages the preferred source address for outgoing IP packets
ocf:IPv6addr — Manages IPv6 aliases
ocf:iSCSILogicalUnit — Manages iSCSI Logical Units (LUs)
ocf:iSCSITarget — iSCSI target export agent
ocf:iscsi — Manages a local iSCSI initiator and its connections to iSCSI targets
ocf:ldirectord — Wrapper OCF Resource Agent for ldirectord
ocf:LinuxSCSI — Enables and disables SCSI devices through the kernel SCSI hot-plug subsystem (deprecated)
ocf:LVM — Controls the availability of an LVM Volume Group
ocf:MailTo — Notifies recipients by email in the event of resource takeover
ocf:ManageRAID — Manages RAID devices
ocf:ManageVE — Manages an OpenVZ Virtual Environment (VE)
ocf:mysql-proxy — Manages a MySQL Proxy daemon
ocf:mysql — Manages a MySQL database instance
ocf:nfsserver — Manages an NFS server
ocf:oracle — Manages an Oracle Database instance
ocf:oralsnr — Manages an Oracle TNS listener
ocf:pgsql — Manages a PostgreSQL database instance
ocf:pingd — Monitors connectivity to specific hosts or IP addresses ("ping nodes") (deprecated)
ocf:portblock — Blocks and unblocks access to TCP and UDP ports
ocf:proftpd — OCF Resource Agent compliant FTP script
ocf:Pure-FTPd — Manages a Pure-FTPd FTP server instance
ocf:Raid1 — Manages a software RAID1 device on shared storage
ocf:Route — Manages network routes
ocf:rsyncd — Manages an rsync daemon
ocf:SAPDatabase — Manages any SAP database (based on Oracle, MaxDB, or DB2)
ocf:SAPInstance — Manages a SAP instance
ocf:scsi2reservation — scsi-2 reservation
ocf:SendArp — Broadcasts unsolicited ARP announcements
ocf:ServeRAID — Enables and disables shared ServeRAID merge groups
ocf:sfex — Manages exclusive access to shared storage using Shared Disk File EXclusiveness (SF-EX)
ocf:SphinxSearchDaemon — Manages the Sphinx search daemon
ocf:Squid — Manages a Squid proxy server instance
ocf:Stateful — Example stateful resource agent
ocf:SysInfo — Records various node attributes in the CIB
ocf:syslog-ng — Syslog-ng resource agent
ocf:tomcat — Manages a Tomcat servlet environment instance
ocf:VIPArip — Manages a virtual IP address through RIP2
ocf:VirtualDomain — Manages virtual domains through the libvirt virtualization framework
ocf:vmware — Manages VMware Server 2.0 virtual machines
ocf:WAS6 — Manages a WebSphere Application Server 6 instance
ocf:WAS — Manages a WebSphere Application Server instance
ocf:WinPopup — Sends an SMB notification message to selected hosts
ocf:Xen — Manages Xen unprivileged domains (DomUs)
ocf:Xinetd — Manages an Xinetd service
V. Appendix
A. Example of Setting Up a Simple Testing Resource
A.1. Configuring a Resource with the GUI
A.2. Configuring a Resource Manually
B. Upgrading Your Cluster to the Latest Product Version
B.1. Upgrading from SLES 10 to SLEHA 11
B.2. Upgrading from SLEHA 11 to SLEHA 11 SP1
C. What's New?
C.1. Version 10 SP3 to Version 11
C.2. Version 11 to Version 11 SP1
D. GNU Licenses
D.1. GNU General Public License
D.2. GNU Free Documentation License
Terminology

List of Figures

1.1. Three-Server Cluster
1.2. Three-Server Cluster after One Server Fails
1.3. Typical Fibre Channel Cluster Configuration
1.4. Typical iSCSI Cluster Configuration
1.5. Typical Cluster Configuration Without Shared Storage
1.6. Architecture
4.1. Group Resource
5.1. Connecting to the Cluster
5.2. Pacemaker GUI - Main Window
5.3. Pacemaker GUI - Constraints
5.4. Example Configuration for Node Capacity
5.5. Example Configuration for Resource Capacity
5.6. Viewing a Resource's Failcount
5.7. Pacemaker GUI - Groups
5.8. Pacemaker GUI - Management
7.1. HA Web Konsole—Cluster Status
10.1. YaST IP Load Balancing—Global Parameters
10.2. YaST IP Load Balancing—Virtual Services
13.1. Position of DRBD within Linux
14.1. Setup of iSCSI with cLVM
16.1. Structure of a CTDB Cluster

List of Tables

4.1. Options for a Primitive Resource
4.2. Resource Operations
8.1. Failure Recovery Types
8.2. OCF Return Codes
12.1. OCFS2 Utilities
12.2. Important OCFS2 Parameters
13.1. DRBD RPM Packages
18.1. Overview of Internal Commands

List of Examples

4.1. Resource Group for a Web Server
4.2. Migration Threshold—Process Flow
4.3. Example Configuration for Load-Balanced Placing
9.1. Testing Configuration
9.2. Testing Configuration
9.3. Testing Configuration
9.4. Configuration of an IBM RSA Lights-out Device
9.5. Configuration of a UPS Fencing Device
10.1. Simple ldirectord Configuration
17.1. Stopped Resources