Disk Heartbeat

OCFS2 requires the nodes to be alive on the network. The O2CB cluster service sends regular keepalive packets to ensure that they are alive. It uses a private connection between nodes instead of the LAN to avoid network delays that might be interpreted as a node disappearing and thus lead to the node’s self-fencing.

The O2CB cluster service communicates the node status via a disk heartbeat. The heartbeat system file resides on the Storage Area Network (SAN), where it is available to all nodes in the cluster. The block assignments in the file correspond sequentially to each node’s slot assignment.

Each node reads the file and writes to its assigned block in the file at two-second intervals. Changes to a node’s time stamp indicate that the node is alive. A node is considered dead if it does not write to the heartbeat file for a specified number of sequential intervals, called the heartbeat threshold. Even if only a single node is alive, the O2CB cluster service must perform this check, because another node could be added dynamically at any time.
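
The liveness check described above can be illustrated with a minimal Python sketch. It is not the O2CB implementation: the function names, the list of observed time stamps, and the missed_limit parameter are hypothetical, and the sketch only models the rule that a node whose time stamp stops changing for enough consecutive reads is treated as dead.

```python
def trailing_missed(stamps):
    """Count the most recent consecutive reads in which the time stamp did not change.

    stamps is a hypothetical list of the values read from one node's block,
    one entry per two-second interval, oldest first.
    """
    missed = 0
    # Walk backwards from the newest read and stop at the first change.
    for i in range(len(stamps) - 1, 0, -1):
        if stamps[i] != stamps[i - 1]:
            break
        missed += 1
    return missed


def is_considered_dead(stamps, missed_limit):
    """Return True if the node missed at least missed_limit intervals in a row.

    missed_limit is an assumed parameter for this sketch, not an O2CB setting.
    """
    return trailing_missed(stamps) >= missed_limit


# The node wrote at the first three intervals, then stopped.
observed = [100, 102, 104, 104, 104]
print(trailing_missed(observed))
print(is_considered_dead(observed, 2))
```

A single changed time stamp resets the count, which is why only consecutive missed intervals matter.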

You can modify the disk heartbeat threshold in the /etc/sysconfig/o2cb file, using the O2CB_HEARTBEAT_THRESHOLD parameter. The wait time is calculated as follows:

(O2CB_HEARTBEAT_THRESHOLD value - 1) * 2 = wait time in seconds

For example, if O2CB_HEARTBEAT_THRESHOLD is set to the default value of 7, the wait time is 12 seconds ((7 - 1) * 2 = 12).
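
The same arithmetic can be sketched in Python, which may help when choosing a threshold for a desired wait time. The helper names and the constant are hypothetical and not part of any O2CB tool. The inverse function is derived from the formula above and assumes you want the smallest threshold whose wait time is at least the requested number of seconds.

```python
import math

# Heartbeat interval in seconds, as stated in this section.
HEARTBEAT_INTERVAL_SECONDS = 2


def wait_time_seconds(threshold):
    """Apply the formula (threshold - 1) * 2 to an O2CB_HEARTBEAT_THRESHOLD value."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for_wait(seconds):
    """Return the smallest threshold whose wait time is at least the given seconds."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    return math.ceil(seconds / HEARTBEAT_INTERVAL_SECONDS) + 1


print(wait_time_seconds(7))    # prints 12
print(threshold_for_wait(12))  # prints 7
```

An odd number of seconds is rounded up to the next full interval. For example, a requested wait of 13 seconds yields a threshold of 8, which corresponds to 14 seconds.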