Especially when you start experimenting with Heartbeat, problems may occur whose cause is not obvious. Several utilities let you take a closer look at Heartbeat's internal processes.
To check the current state of your cluster, use the program crm_mon. It displays the current DC (designated coordinator) as well as all nodes and resources that are known to the current node.
If the nodes of your cluster cannot see each other, the connection between them is broken. Most often, this is caused by a badly configured firewall. A broken connection can also cause a split brain condition, in which the cluster is partitioned into groups of nodes that cannot communicate with each other.
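As a sketch of the firewall fix, the commands below accept incoming Heartbeat traffic. They assume that Heartbeat uses its default UDP port 694 (the udpport setting in ha.cf) and that the firewall is managed directly with iptables; adjust the port if your ha.cf sets a different one.

```shell
# Assumption: Heartbeat communicates over its default UDP port 694.
# Insert the rule at the top of INPUT so that no earlier DROP rule shadows it.
# Run as root on every cluster node.
iptables -I INPUT -p udp --dport 694 -j ACCEPT
# Confirm that the rule now appears in the INPUT chain.
iptables -L INPUT -n | grep 694
```

Rules added with iptables this way are lost at reboot, so make the equivalent change in your distribution's firewall configuration as well.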
Use the command crm_resource -L to list the resources currently known to the cluster.
If a configured resource keeps failing, try running its resource agent manually. For an LSB init script, run scriptname start and then scriptname stop.
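The manual LSB check can be wrapped in a small shell function that runs start, status and stop in turn and prints each exit code. This is only a sketch: the function name check_lsb_script is hypothetical, and it assumes the script follows the LSB convention of exiting with 0 on success, with status returning 0 while the service is running.

```shell
# check_lsb_script SCRIPT
# Hypothetical helper: run an LSB init script's start, status and stop
# actions by hand and print the exit code of each one.
# Returns 1 if any action exits non-zero, 0 otherwise.
check_lsb_script() {
    script=$1
    failed=0
    for action in start status stop; do
        rc=0
        # Capture the exit code without aborting under "set -e".
        "$script" "$action" || rc=$?
        echo "$action: exit code $rc"
        [ "$rc" -eq 0 ] || failed=1
    done
    return "$failed"
}
```

For example, check_lsb_script /etc/init.d/scriptname. A non-zero status code right after a successful start usually means the service did not actually come up.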
To check an OCF resource agent, first set the environment variables it needs. An OCF agent receives each of its parameters through an environment variable named after the parameter with the prefix OCF_RESKEY_. For example, to test the IPaddr OCF script, set its ip parameter through the variable OCF_RESKEY_ip, then run the script with the actions you want to check:
export OCF_RESKEY_ip=<your_ip_address>
/usr/lib/ocf/resource.d/heartbeat/IPaddr validate-all
/usr/lib/ocf/resource.d/heartbeat/IPaddr start
/usr/lib/ocf/resource.d/heartbeat/IPaddr stop
If this fails, you most likely omitted a mandatory variable or mistyped a parameter.
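The OCF steps can likewise be sketched as a shell function. The name check_ocf_agent is hypothetical. It exports every NAME=VALUE argument as OCF_RESKEY_NAME, runs validate-all, start, monitor and stop, and stops at the first non-zero exit code. Two assumptions go beyond the steps above: monitor is included because the OCF specification requires agents to implement it, and OCF_ROOT defaults to /usr/lib/ocf because agents commonly use that variable to locate their shared shell functions.

```shell
# check_ocf_agent AGENT [NAME=VALUE ...]
# Hypothetical helper: exercise an OCF resource agent by hand.
# The body runs in a subshell ( ... ), so the exported variables do not
# leak into the calling shell, and "exit" only leaves that subshell.
check_ocf_agent() (
    agent=$1
    shift
    # Pass each parameter as OCF_RESKEY_<name>, as OCF agents expect.
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        export "OCF_RESKEY_$name=$value"
    done
    # Assumption: the agent finds its helper functions below OCF_ROOT.
    : "${OCF_ROOT:=/usr/lib/ocf}"
    export OCF_ROOT
    for action in validate-all start monitor stop; do
        rc=0
        "$agent" "$action" || rc=$?
        echo "$action: exit code $rc"
        # Stop at the first failure and return the agent's exit code.
        [ "$rc" -eq 0 ] || exit "$rc"
    done
    exit 0
)
```

For example, check_ocf_agent /usr/lib/ocf/resource.d/heartbeat/IPaddr ip=<your_ip_address>. Note that start really brings the resource up; if start succeeds but monitor fails, the function stops early, so run stop yourself afterwards.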
If you only get a failure message, add the -V parameter to your commands to get more information. Each additional -V (for example, -V -V) makes the debug output more verbose.
If you know the ID of a resource (you can get the IDs with crm_resource -L), remove that resource with crm_resource -C -r <resource_id>.
For additional information about high availability on Linux and Heartbeat, including configuring cluster resources and managing and customizing a Heartbeat cluster, see http://www.linux-ha.org.