SUSE Linux Enterprise High Availability Extension

High Availability Guide

Publication Date: 27 Jan 2012

Authors: Tanja Roth, Thomas Schraitle

Copyright © 2006–2012 Novell, Inc. and contributors. All rights reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled GNU Free Documentation License.

For Novell trademarks, see the Novell Trademark and Service Mark list http://www.novell.com/company/legal/trademarks/tmlist.html. All other third party trademarks are the property of their respective owners. A trademark symbol (®, ™ etc.) denotes a Novell trademark; an asterisk (*) denotes a third party trademark.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither Novell, Inc., SUSE LINUX Products GmbH, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.


Contents

About This Guide
1. Feedback
2. Documentation Conventions
3. About the Making of This Manual
I. Installation and Setup
1. Product Overview
1.1. Key Features
1.2. Benefits
1.3. Cluster Configurations: Storage
1.4. Architecture
2. System Requirements
2.1. Hardware Requirements
2.2. Software Requirements
2.3. Shared Disk System Requirements
2.4. Other Requirements
3. Installation and Basic Setup
3.1. Definition of Terms
3.2. Overview
3.3. Installation as Add-on
3.4. Automatic Cluster Setup (sleha-bootstrap)
3.5. Manual Cluster Setup (YaST)
3.6. Mass Deployment with AutoYaST
II. Configuration and Administration
4. Configuration and Administration Basics
4.1. Global Cluster Options
4.2. Cluster Resources
4.3. Resource Monitoring
4.4. Resource Constraints
4.5. For More Information
5. Configuring and Managing Cluster Resources (GUI)
5.1. Pacemaker GUI—Overview
5.2. Configuring Global Cluster Options
5.3. Configuring Cluster Resources
5.4. Managing Cluster Resources
6. Configuring and Managing Cluster Resources (Web Interface)
6.1. Hawk—Overview
6.2. Configuring Global Cluster Options
6.3. Configuring Cluster Resources
6.4. Managing Cluster Resources
6.5. Troubleshooting
7. Configuring and Managing Cluster Resources (Command Line)
7.1. crm Shell—Overview
7.2. Configuring Global Cluster Options
7.3. Configuring Cluster Resources
7.4. Managing Cluster Resources
7.5. Setting Passwords Independent of cib.xml
7.6. Retrieving History Information
7.7. For More Information
8. Adding or Modifying Resource Agents
8.1. STONITH Agents
8.2. Writing OCF Resource Agents
8.3. OCF Return Codes and Failure Recovery
9. Fencing and STONITH
9.1. Classes of Fencing
9.2. Node Level Fencing
9.3. STONITH Configuration
9.4. Monitoring Fencing Devices
9.5. Special Fencing Devices
9.6. Basic Recommendations
9.7. For More Information
10. Access Control Lists
10.1. Requirements and Prerequisites
10.2. The Basics of ACLs
10.3. Configuring ACLs with the Pacemaker GUI
10.4. Configuring ACLs with the crm Shell
10.5. For More Information
11. Network Device Bonding
11.1. Configuring Bonding Devices with YaST
11.2. For More Information
12. Load Balancing with Linux Virtual Server
12.1. Conceptual Overview
12.2. Configuring IP Load Balancing with YaST
12.3. Further Setup
12.4. For More Information
13. Multi-Site Clusters
13.1. Challenges for Multi-Site Clusters
13.2. Conceptual Overview
13.3. Requirements
13.4. Basic Setup
13.5. Managing Multi-Site Clusters
III. Storage and Data Replication
14. OCFS2
14.1. Features and Benefits
14.2. OCFS2 Packages and Management Utilities
14.3. Configuring OCFS2 Services and a STONITH Resource
14.4. Creating OCFS2 Volumes
14.5. Mounting OCFS2 Volumes
14.6. Using Quotas on OCFS2 File Systems
14.7. For More Information
15. Distributed Replicated Block Device (DRBD)
15.1. Conceptual Overview
15.2. Installing DRBD Services
15.3. Configuring the DRBD Service
15.4. Testing the DRBD Service
15.5. Tuning DRBD
15.6. Troubleshooting DRBD
15.7. For More Information
16. Cluster Logical Volume Manager (cLVM)
16.1. Conceptual Overview
16.2. Configuration of cLVM
16.3. Configuring Eligible LVM2 Devices Explicitly
16.4. For More Information
17. Storage Protection
17.1. Storage-based Fencing
17.2. Ensuring Exclusive Storage Activation
18. Samba Clustering
18.1. Conceptual Overview
18.2. Basic Configuration
18.3. Debugging and Testing Clustered Samba
18.4. Joining Active Directory Domains
18.5. For More Information
19. Disaster Recovery with ReaR
19.1. Terminology
19.2. Conceptual Overview
19.3. Preparing for the Worst Scenarios: Disaster Recovery Plans
19.4. Setting Up ReaR
19.5. Setting Up rear-SUSE with AutoYaST
19.6. For More Information
IV. Troubleshooting and Reference
20. Troubleshooting
20.1. Installation and First Steps
20.2. Logging
20.3. Resources
20.4. STONITH and Fencing
20.5. Miscellaneous
20.6. For More Information
21. HA OCF Agents
ocf:anything — Manages an arbitrary service
ocf:AoEtarget — Manages ATA-over-Ethernet (AoE) target exports
ocf:apache — Manages an Apache web server instance
ocf:AudibleAlarm — Emits audible beeps at a configurable interval
ocf:ClusterMon — Runs crm_mon in the background, recording the cluster status to an HTML file
ocf:conntrackd — This resource agent manages conntrackd
ocf:CTDB — CTDB Resource Agent
ocf:db2 — Resource Agent that manages IBM DB2 LUW databases in Standard role as a primitive or in HADR roles as a master/slave configuration. Multiple partitions are supported.
ocf:Delay — Waits for a defined timespan
ocf:drbd — Manages a DRBD resource (deprecated)
ocf:Dummy — Example stateless resource agent
ocf:eDir88 — Manages a Novell eDirectory directory server
ocf:ethmonitor — Monitors network interfaces
ocf:Evmsd — Controls clustered EVMS volume management (deprecated)
ocf:EvmsSCC — Manages EVMS Shared Cluster Containers (SCCs) (deprecated)
ocf:exportfs — Manages NFS exports
ocf:Filesystem — Manages filesystem mounts
ocf:fio — fio IO load generator
ocf:ICP — Manages an ICP Vortex clustered host drive
ocf:ids — Manages an Informix Dynamic Server (IDS) instance
ocf:IPaddr2 — Manages virtual IPv4 addresses (Linux specific version)
ocf:IPaddr — Manages virtual IPv4 addresses (portable version)
ocf:IPsrcaddr — Manages the preferred source address for outgoing IP packets
ocf:IPv6addr — Manages IPv6 aliases
ocf:iSCSILogicalUnit — Manages iSCSI Logical Units (LUs)
ocf:iSCSITarget — iSCSI target export agent
ocf:iscsi — Manages a local iSCSI initiator and its connections to iSCSI targets
ocf:jboss — Manages a JBoss application server instance
ocf:ldirectord — Wrapper OCF Resource Agent for ldirectord
ocf:LinuxSCSI — Enables and disables SCSI devices through the kernel SCSI hot-plug subsystem (deprecated)
ocf:LVM — Controls the availability of an LVM Volume Group
ocf:lxc — Manages LXC containers
ocf:MailTo — Notifies recipients by email in the event of resource takeover
ocf:ManageRAID — Manages RAID devices
ocf:ManageVE — Manages an OpenVZ Virtual Environment (VE)
ocf:mysql-proxy — Manages a MySQL Proxy daemon
ocf:mysql — Manages a MySQL database instance
ocf:named — Manages a named server
ocf:nfsserver — Manages an NFS server
ocf:nginx — Manages an Nginx web/proxy server instance
ocf:oracle — Manages an Oracle Database instance
ocf:oralsnr — Manages an Oracle TNS listener
ocf:pgsql — Manages a PostgreSQL database instance
ocf:pingd — Monitors connectivity to specific hosts or IP addresses ("ping nodes") (deprecated)
ocf:portblock — Blocks and unblocks access to TCP and UDP ports
ocf:postfix — Manages a highly available Postfix mail server instance
ocf:proftpd — OCF Resource Agent compliant FTP script.
ocf:Pure-FTPd — Manages a Pure-FTPd FTP server instance
ocf:Raid1 — Manages a software RAID1 device on shared storage
ocf:Route — Manages network routes
ocf:rsyncd — Manages an rsync daemon
ocf:rsyslog — rsyslog resource agent
ocf:SAPDatabase — Manages any SAP database (based on Oracle, MaxDB, or DB2)
ocf:scsi2reservation — scsi-2 reservation
ocf:SendArp — Broadcasts unsolicited ARP announcements
ocf:ServeRAID — Enables and disables shared ServeRAID merge groups
ocf:sfex — Manages exclusive access to shared storage using Shared Disk File EXclusiveness (SF-EX)
ocf:slapd — Manages a Stand-alone LDAP Daemon (slapd) instance
ocf:SphinxSearchDaemon — Manages the Sphinx search daemon.
ocf:Squid — Manages a Squid proxy server instance
ocf:Stateful — Example stateful resource agent
ocf:symlink — Manages a symbolic link
ocf:SysInfo — Records various node attributes in the CIB
ocf:syslog-ng — Syslog-ng resource agent
ocf:tomcat — Manages a Tomcat servlet environment instance
ocf:VIPArip — Manages a virtual IP address through RIP2
ocf:VirtualDomain — Manages virtual domains through the libvirt virtualization framework
ocf:vmware — Manages VMware Server 2.0 virtual machines
ocf:WAS6 — Manages a WebSphere Application Server 6 instance
ocf:WAS — Manages a WebSphere Application Server instance
ocf:WinPopup — Sends an SMB notification message to selected hosts
ocf:Xen — Manages Xen unprivileged domains (DomUs)
ocf:Xinetd — Manages an Xinetd service
V. Appendix
A. Example of Setting Up a Simple Testing Resource
A.1. Configuring a Resource with the GUI
A.2. Configuring a Resource Manually
B. Example Configuration for OCFS2 and cLVM
C. Cluster Management Tools
D. Upgrading Your Cluster to the Latest Product Version
D.1. Upgrading from SLES 10 to SLE HA 11
D.2. Upgrading from SLE HA 11 to SLE HA 11 SP1
D.3. Upgrading from SLE HA 11 SP1 to SLE HA 11 SP2
E. What's New?
E.1. Version 10 SP3 to Version 11
E.2. Version 11 to Version 11 SP1
E.3. Version 11 SP1 to Version 11 SP2
Terminology
F. GNU Licenses
F.1. GNU General Public License
F.2. GNU Free Documentation License

List of Figures

1.1. Three-Server Cluster
1.2. Three-Server Cluster after One Server Fails
1.3. Typical Fibre Channel Cluster Configuration
1.4. Typical iSCSI Cluster Configuration
1.5. Typical Cluster Configuration Without Shared Storage
1.6. Architecture
3.1. YaST Cluster Module—Overview
3.2. YaST Cluster—Multicast Configuration
3.3. YaST Cluster—Unicast Configuration
3.4. YaST Cluster—Security
3.5. YaST Cluster—Services
3.6. YaST Cluster—Csync2
3.7. YaST Cluster—conntrackd
4.1. Group Resource
5.1. Connecting to the Cluster
5.2. Pacemaker GUI - Main Window
5.3. Pacemaker GUI - Constraints
5.4. Example Configuration for Node Capacity
5.5. Example Configuration for Resource Capacity
5.6. Viewing a Resource's Failcount
5.7. Pacemaker GUI - Groups
5.8. Pacemaker GUI - Management
6.1. Hawk—Cluster Status (Summary View)
6.2. Hawk—Global Cluster Properties
6.3. Hawk—Setup Wizard
6.4. Hawk—Primitive Resource
6.5. Hawk—Resource Template
6.6. Hawk—Location Constraint
6.7. Hawk—Colocation Constraint
6.8. Hawk—Viewing a Node's Capacity Values
6.9. Hawk—Resource Group
6.10. Hawk—Clone Resource
6.11. Hawk—History Report
6.12. Hawk History Report—Transition Graph
6.13. Hawk—Simulator
12.1. YaST IP Load Balancing—Global Parameters
12.2. YaST IP Load Balancing—Virtual Services
13.1. Example Scenario: A Two-Site Cluster (4 Nodes + Arbitrator)
15.1. Position of DRBD within Linux
16.1. Setup of iSCSI with cLVM
18.1. Structure of a CTDB Cluster

List of Tables

4.1. Options for a Primitive Resource
4.2. Resource Operations
8.1. Failure Recovery Types
8.2. OCF Return Codes
10.1. Types and XPath Expression for an Operator Role
14.1. OCFS2 Utilities
14.2. Important OCFS2 Parameters
15.1. DRBD RPM Packages

List of Examples

4.1. Resource Group for a Web Server
4.2. Migration Threshold—Process Flow
4.3. Example Configuration for Load-Balanced Placing
9.1. Testing Configuration
9.2. Testing Configuration
9.3. Testing Configuration
9.4. Configuration of an IBM RSA Lights-out Device
9.5. Configuration of a UPS Fencing Device
10.1. Excerpt of a Cluster Configuration in XML
12.1. Simple ldirectord Configuration
13.1. Example Booth Configuration File
20.1. Stopped Resources
B.1. Cluster Configuration for OCFS2 and cLVM