Chapter 14. Oracle Cluster File System 2

Contents

14.1. O2CB Cluster Service
14.2. Disk Heartbeat
14.3. In-Memory File Systems
14.4. Management Utilities and Commands
14.5. OCFS2 Packages
14.6. Creating an OCFS2 Volume
14.7. Mounting an OCFS2 Volume
14.8. Additional Information

Oracle Cluster File System 2 (OCFS2) is a general-purpose journaling file system that is fully integrated in the Linux 2.6 kernel and later. OCFS2 allows you to store application binary files, data files, and databases on devices in a SAN. All nodes in a cluster have concurrent read and write access to the file system. A distributed lock manager helps prevent file access conflicts. OCFS2 supports up to 32,000 subdirectories and millions of files in each directory. The O2CB cluster service (a driver) runs on each node to manage the cluster.

OCFS2 was added to SUSE Linux Enterprise Server 9 to support Oracle Real Application Cluster (RAC) databases and their application files (Oracle Home). In SUSE Linux Enterprise Server 10 and later, OCFS2 can be used for any of the following storage solutions:

In addition, it is fully integrated with Heartbeat 2.

As a high-performance, symmetric, parallel cluster file system, OCFS2 supports the following functions:

OCFS2 also provides the following capabilities:

14.1. O2CB Cluster Service

The O2CB cluster service is a set of modules and in-memory file systems that are required to manage OCFS2 services and volumes. You can enable these modules to be loaded and mounted during system boot. For instructions, see Section 14.6.2, “Configuring OCFS2 Services”.

Table 14.1. O2CB Cluster Service Stack

| Service | Description |
|---|---|
| Node Manager (NM) | Keeps track of all the nodes in the /etc/ocfs2/cluster.conf file |
| Heartbeat (HB) | Issues up/down notifications when nodes join or leave the cluster |
| TCP | Handles communications between the nodes with the TCP protocol |
| Distributed Lock Manager (DLM) | Keeps track of all locks and their owners and status |
| CONFIGFS | User space configuration file system. For details, see Section 14.3, “In-Memory File Systems” |
| DLMFS | User space interface to the kernel space DLM. For details, see Section 14.3, “In-Memory File Systems” |


14.2. Disk Heartbeat

OCFS2 requires the nodes to be alive on the network. The O2CB cluster service sends regular keepalive packets to ensure that they are alive. It uses a private connection between nodes instead of the LAN to avoid network delays that might be interpreted as a node disappearing and thus lead to a node’s self-fencing.

The O2CB cluster service communicates the node status via a disk heartbeat. The heartbeat system file resides on the Storage Area Network (SAN), where it is available to all nodes in the cluster. The block assignments in the file correspond sequentially to each node’s slot assignment.

Each node reads the file and writes to its assigned block in the file at two-second intervals. Changes to a node’s time stamp indicate that the node is alive. A node is considered dead if it does not write to the heartbeat file for a specified number of sequential intervals, called the heartbeat threshold. Even if only a single node is alive, the O2CB cluster service must perform this check, because another node could be added dynamically at any time.

You can modify the disk heartbeat threshold in the /etc/sysconfig/o2cb file, using the O2CB_HEARTBEAT_THRESHOLD parameter. The wait time is calculated as follows:

(O2CB_HEARTBEAT_THRESHOLD value - 1) * 2 = threshold in seconds

For example, if the O2CB_HEARTBEAT_THRESHOLD value is set at the default value of 7, the wait time is 12 seconds ((7 - 1) * 2 = 12).
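
The calculation can be sketched as a pair of small shell functions. The function names heartbeat_wait_seconds and heartbeat_threshold_for are illustrative and are not part of the OCFS2 tools; only the formula and the O2CB_HEARTBEAT_THRESHOLD parameter come from this section.

```shell
# Wait time in seconds for a given O2CB_HEARTBEAT_THRESHOLD value,
# using the formula from this section: (threshold - 1) * 2.
heartbeat_wait_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Inverse (hypothetical helper): smallest threshold whose wait time is at
# least the requested number of seconds. Odd values are rounded up.
heartbeat_threshold_for() {
    echo $(( ($1 + 1) / 2 + 1 ))
}

heartbeat_wait_seconds 7      # prints 12
heartbeat_threshold_for 60    # prints 31
```

The second function is useful when you start from a target timeout (for example, one longer than your storage failover time) and need the value to put in /etc/sysconfig/o2cb.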

14.3. In-Memory File Systems

OCFS2 uses two in-memory file systems for communications:

Table 14.2. In-Memory File Systems Used by OCFS2

| In-Memory File System | Description | Mount Point |
|---|---|---|
| configfs | Communicates the list of nodes in the cluster to the in-kernel node manager, and communicates the resource used for the heartbeat to the in-kernel heartbeat thread | /config |
| ocfs2_dlmfs | Communicates locking and unlocking for clusterwide locks on resources to the in-kernel distributed lock manager that keeps track of all locks and their owners and status | /dlm |
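
To check whether the two in-memory file systems are mounted on a node, one option is to look for their types in /proc/mounts. The following is a minimal sketch that assumes a Linux system with /proc available; the helper name fs_type_mounted is illustrative. The mount commands in the comments use the mount points from Table 14.2.

```shell
# Succeeds if a file system of the given type appears in the mounts table.
# $1 = file system type, $2 = mounts file (defaults to /proc/mounts).
fs_type_mounted() {
    awk -v fstype="$1" '$3 == fstype { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

for fstype in configfs ocfs2_dlmfs; do
    if fs_type_mounted "$fstype"; then
        echo "$fstype is mounted"
    else
        echo "$fstype is not mounted"
    fi
done

# Loading the O2CB service normally mounts these for you. Manual
# equivalents, run as root:
#   mount -t configfs configfs /config
#   mount -t ocfs2_dlmfs ocfs2_dlmfs /dlm
```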

14.4. Management Utilities and Commands

OCFS2 stores node-specific parameter files on the node. The cluster configuration file (/etc/ocfs2/cluster.conf) resides on each node assigned to the cluster.
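
For orientation, a two-node cluster.conf typically looks like the sample written by the sketch below. The node names node1 and node2, the second IP address, and the helper list_cluster_nodes are illustrative assumptions; the cluster name ocfs2, port 7777, and address 192.168.1.1 are the defaults and examples used in Section 14.6.2. The stanza layout (unindented stanza header, tab-indented parameters, blank line between stanzas) reflects the format used by ocfs2-tools as far as we know; compare it with a file generated by ocfs2console on your system before relying on it.

```shell
# Write a sample two-node configuration to a temporary file
# (the real file lives at /etc/ocfs2/cluster.conf).
sample=$(mktemp)
cat > "$sample" <<'EOF'
node:
	ip_port = 7777
	ip_address = 192.168.1.1
	number = 0
	name = node1
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.2
	number = 1
	name = node2
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2
EOF

# Print the name of every node defined in a cluster.conf-style file.
list_cluster_nodes() {
    awk '
        /^[a-z]+:/ { stanza = $1 }                       # "node:" or "cluster:"
        stanza == "node:" && $1 == "name" { print $3 }   # name = <node name>
    ' "$1"
}

list_cluster_nodes "$sample"   # prints node1 and node2, one per line
```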

The ocfs2console utility is a GTK GUI-based interface for managing the configuration of the OCFS2 services in the cluster. Use this utility to set up and save the /etc/ocfs2/cluster.conf file to all member nodes of the cluster. In addition, you can use it to format, tune, mount, and unmount OCFS2 volumes.

[Important]

The file browser column in the ocfs2console utility is prohibitively slow and inconsistent across the cluster. We recommend that you use the ls(1) command to list files instead.

Additional OCFS2 utilities are described in the following table. For information about syntax for these commands, see their man pages.

Table 14.3. OCFS2 Utilities

| OCFS2 Utility | Description |
|---|---|
| debugfs.ocfs2 | Examines the state of the OCFS2 file system for the purpose of debugging. |
| fsck.ocfs2 | Checks the file system for errors and optionally repairs errors. |
| mkfs.ocfs2 | Creates an OCFS2 file system on a device, usually a partition on a shared physical or logical disk. This tool requires the O2CB cluster service to be up. |
| mounted.ocfs2 | Detects and lists all OCFS2 volumes on a clustered system. Detects and lists all nodes on the system that have mounted an OCFS2 device or lists all OCFS2 devices. |
| ocfs2cdsl | Creates a context-dependent symbolic link (CDSL) for a specified filename (file or directory) for a node. A CDSL filename has its own image for a specific node, but has a common name in the OCFS2 file system. |
| tunefs.ocfs2 | Changes OCFS2 file system parameters, including the volume label, number of node slots, journal size for all node slots, and volume size. |


Use the following commands to manage O2CB services. For more information about the o2cb command syntax, see its man page.

Table 14.4. O2CB Commands

| Command | Description |
|---|---|
| /etc/init.d/o2cb status | Reports whether the O2CB services are loaded and mounted |
| /etc/init.d/o2cb load | Loads the O2CB modules and in-memory file systems |
| /etc/init.d/o2cb online ocfs2 | Brings the cluster named ocfs2 online. At least one node in the cluster must be active for the cluster to be online. |
| /etc/init.d/o2cb offline ocfs2 | Takes the cluster named ocfs2 offline |
| /etc/init.d/o2cb unload | Unloads the O2CB modules and in-memory file systems |
| /etc/init.d/o2cb start ocfs2 | If the cluster is set up to load on boot, starts the cluster named ocfs2 by loading the O2CB modules and bringing the cluster online. At least one node in the cluster must be active for the cluster to be online. |
| /etc/init.d/o2cb stop ocfs2 | If the cluster is set up to load on boot, stops the cluster named ocfs2 by taking the cluster offline and unloading the O2CB modules and in-memory file systems |
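
The load, online, and status commands are often run in sequence. The following sketch wraps them in a function; the function name o2cb_bring_up and the O2CB_INIT variable are illustrative and exist only so that the sequence can be dry-run without touching the cluster.

```shell
# Load the O2CB stack, bring the named cluster online, then report status.
# Each step runs only if the previous one succeeded.
# O2CB_INIT (hypothetical) overrides the init script path, for example
# with "echo" for a dry run.
o2cb_bring_up() {
    cluster=${1:-ocfs2}
    init=${O2CB_INIT:-/etc/init.d/o2cb}
    "$init" load &&
    "$init" online "$cluster" &&
    "$init" status
}

# Dry run: print the sub-commands instead of executing them.
O2CB_INIT=echo o2cb_bring_up ocfs2
```

Without the O2CB_INIT override, the function calls /etc/init.d/o2cb directly and must be run as the root user.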


14.5. OCFS2 Packages

The OCFS2 kernel module (ocfs2) is installed automatically in SUSE Linux Enterprise Server 10 and later. To use OCFS2, use YaST (or the command line if you prefer) to install the ocfs2-tools and ocfs2console packages on each node in the cluster.

  1. Log in as the root user, then open the YaST Control Center.

  2. Select Software > Software Management.

  3. In the Search field, enter ocfs2.

    The software packages ocfs2-tools and ocfs2console should be listed in the right panel. If they are selected, the packages are already installed.

  4. If you need to install the packages, select them, then click Install and follow the on-screen instructions.
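
If you prefer the command line, a quick way to see whether the two packages are present is to query the RPM database. This is a minimal sketch; the function name report_missing_packages and the PKG_QUERY variable are illustrative, and PKG_QUERY is overridable only so that the logic can be tried on a system without rpm.

```shell
# Report which of the given packages are not installed.
report_missing_packages() {
    for pkg in "$@"; do
        "${PKG_QUERY:-rpm}" -q "$pkg" >/dev/null 2>&1 || echo "$pkg is not installed"
    done
}

report_missing_packages ocfs2-tools ocfs2console
```

No output means that both packages are installed. Anything reported as missing can then be installed with the package management command available on your release.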

14.6. Creating an OCFS2 Volume

Follow the procedures in this section to configure your system to use OCFS2 and to create OCFS2 volumes.

14.6.1. Prerequisites

Before you begin, do the following:

  • Initialize, carve, or configure RAIDs (Redundant Array of Independent Disks) on the SAN disks, as needed, to prepare the devices you plan to use for your OCFS2 volumes. Leave the devices as free space.

    We recommend that you store application files and data files on different OCFS2 volumes, but it is only mandatory to do so if your application volumes and data volumes have different requirements for mounting. For example, the Oracle RAC database volume requires the datavolume and nointr mounting options, but the Oracle Home volume should never use these options.

  • Make sure that the ocfs2console and ocfs2-tools packages are installed. Use YaST or command line methods to install them if they are not. For YaST instructions, see Section 14.5, “OCFS2 Packages”.

14.6.2. Configuring OCFS2 Services

Before you can create OCFS2 volumes, you must configure OCFS2 services. In the following procedure, you generate the /etc/ocfs2/cluster.conf file, save the cluster.conf file on all nodes, and create and start the O2CB cluster service (o2cb).

Follow the procedure in this section for one node in the cluster.

  1. Open a terminal window and log in as the root user.

  2. If the o2cb cluster service is not already enabled, enter chkconfig --add o2cb.

    When you add a new service, chkconfig ensures that the service has either a start or a kill entry in every run level.

  3. If the ocfs2 service is not already enabled, enter chkconfig --add ocfs2.

  4. Configure the o2cb cluster service driver to load on boot.

    1. Enter /etc/init.d/o2cb configure

    2. At the Load O2CB driver on boot (y/n) [n] prompt, enter y (yes) to enable load on boot.

    3. At the Cluster to start on boot (Enter “none” to clear) [ocfs2] prompt, enter none. This choice presumes that you are setting up OCFS2 for the first time or resetting the service. You specify a cluster name in the next step when you set up the /etc/ocfs2/cluster.conf file.

  5. Use the ocfs2console utility to set up and save the /etc/ocfs2/cluster.conf file to all member nodes of the cluster.

    This file should be the same on all the nodes in the cluster. Use the following steps to set up the first node. Later, you can use the ocfs2console to add new nodes to the cluster dynamically and to propagate the modified cluster.conf file to all nodes.

    However, if you change other settings, such as the cluster name and IP address, you must restart the cluster for the changes to take effect, as described in Step 6.

    1. Open the ocfs2console GUI by entering ocfs2console.

    2. In the ocfs2console, select Cluster > Cluster Nodes.

      If cluster.conf is not present, the console will create one with a default cluster name of ocfs2. Modify the cluster name as desired.

    3. In the Node Configuration dialog box, click Add to open the Add Node dialog box.

    4. In the Add Node dialog box, specify the unique name of your primary node, a unique IP address (such as 192.168.1.1), and the port number (optional, default is 7777), then click OK.

      The ocfs2console utility assigns node slot numbers sequentially from 0 to 254.

    5. In the Node Configuration dialog box, click Apply, then click Close to dismiss the Node Configuration dialog box.

    6. Click Cluster > Propagate Configuration to save the cluster.conf file to all nodes.

  6. If you need to restart the OCFS2 cluster for the changes to take effect, enter the following lines, waiting in between for the process to return a status of OK.

    /etc/init.d/o2cb stop
    /etc/init.d/o2cb start
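
Because cluster.conf should be the same on all nodes, it can be worth comparing copies after propagating the file. The following sketch assumes that you have already collected a copy of /etc/ocfs2/cluster.conf from each node into local files (how you fetch them is up to you); the function name configs_identical is illustrative, and the demo uses two temporary files in place of real copies.

```shell
# Succeeds if all given files have identical content.
configs_identical() {
    first=$1
    shift
    for other in "$@"; do
        cmp -s "$first" "$other" || return 1
    done
}

# Demo with two local copies standing in for files fetched from two nodes.
a=$(mktemp); b=$(mktemp)
printf 'cluster:\n\tnode_count = 1\n\tname = ocfs2\n' > "$a"
cp "$a" "$b"
if configs_identical "$a" "$b"; then
    echo "cluster.conf copies match"
else
    echo "cluster.conf copies differ"
fi
```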
    

14.6.3. Creating an OCFS2 Volume

Creating an OCFS2 file system and adding new nodes to the cluster should be performed on only one of the nodes in the cluster.

  1. Open a terminal window and log in as the root user.

  2. If the O2CB cluster service is offline, start it by entering the following command, then wait for the process to return a status of OK.

    /etc/init.d/o2cb online ocfs2
    

    Replace ocfs2 with the actual cluster name of your OCFS2 cluster.

    The OCFS2 cluster must be online, because the format operation must first ensure that the volume is not mounted on any node in the cluster.

  3. Create and format the volume using one of the following methods:

    • In EVMSGUI, go to the Volumes page, select Make a file system > OCFS2, then specify the configuration settings.

    • Use the mkfs.ocfs2 utility. For information about the syntax for this command, refer to the mkfs.ocfs2 man page.

    • In the ocfs2console, click Tasks > Format, select a device in the Available Devices list that you want to use for your OCFS2 volume, specify the configuration settings for the volume, then click OK to format the volume.

    See the following table for recommended settings.

    OCFS2 Parameter

    Description and Recommendation

    Volume label

    A descriptive name for the volume to make it uniquely identifiable when it is mounted on different nodes.

    Use the tunefs.ocfs2 utility to modify the label as needed.

    Cluster size

    Cluster size is the smallest unit of space allocated to a file to hold the data.

    Options are 4, 8, 16, 32, 64, 128, 256, 512, and 1024 KB. Cluster size cannot be modified after the volume is formatted.

    Oracle recommends a cluster size of 128 KB or larger for database volumes. Oracle also recommends a cluster size of 32 or 64 KB for Oracle Home.

    Number of node slots

    The maximum number of nodes that can concurrently mount a volume. On mounting, OCFS2 creates separate system files, such as the journals, for each of the nodes. Nodes that access the volume can be a combination of little-endian architectures (such as x86, x86-64, and ia64) and big-endian architectures (such as ppc64 and s390x).

    Node-specific files are referred to as local files. A node slot number is appended to the local file. For example: journal:0000 belongs to whatever node is assigned to slot number 0.

    Set each volume’s maximum number of node slots when you create it, according to how many nodes you expect to concurrently mount the volume. Use the tunefs.ocfs2 utility to increase the number of node slots as needed; the value cannot be decreased.

    Block size

    The smallest unit of space addressable by the file system. Specify the block size when you create the volume.

    Options are 512 bytes (not recommended), 1 KB, 2 KB, or 4 KB (recommended for most volumes). Block size cannot be modified after the volume is formatted.
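
As an illustration of how these settings map to the mkfs.ocfs2 command line, the following sketch builds a command from the four parameters and prints it instead of running it, so that it can be reviewed first. The function name build_mkfs_cmd, the label oradata, and the device /dev/sdX1 are placeholders. The options used (-L label, -C cluster size, -N node slots, -b block size) are those documented in the mkfs.ocfs2 man page; confirm them against the man page on your system.

```shell
# Print a mkfs.ocfs2 command line after checking the sizes against the
# options listed in this section. Nothing is formatted.
# Usage: build_mkfs_cmd LABEL CLUSTER_KB NODE_SLOTS BLOCK_SIZE DEVICE
build_mkfs_cmd() {
    label=$1 cluster_kb=$2 slots=$3 block=$4 device=$5
    case $cluster_kb in
        4|8|16|32|64|128|256|512|1024) ;;
        *) echo "invalid cluster size: ${cluster_kb}K" >&2; return 1 ;;
    esac
    case $block in
        512|1K|2K|4K) ;;
        *) echo "invalid block size: $block" >&2; return 1 ;;
    esac
    echo "mkfs.ocfs2 -L $label -C ${cluster_kb}K -N $slots -b $block $device"
}

# Database volume: 128 KB clusters, 4 KB blocks, 4 node slots.
build_mkfs_cmd oradata 128 4 4K /dev/sdX1
# prints: mkfs.ocfs2 -L oradata -C 128K -N 4 -b 4K /dev/sdX1
```

Because cluster size and block size cannot be changed after formatting, printing the command before running it gives you a chance to check those two values in particular.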

14.7. Mounting an OCFS2 Volume

  1. Open a terminal window and log in as the root user.

  2. If the O2CB cluster service is offline, start it by entering the following command, then wait for the process to return a status of OK.

    /etc/init.d/o2cb online ocfs2
    

    Replace ocfs2 with the actual cluster name of your OCFS2 cluster.

    The OCFS2 cluster must be online before the volume can be mounted.

  3. Use one of the following methods to mount the volume.

    • In the ocfs2console, select a device in the Available Devices list and click Mount. Optionally, specify the directory mount point and mount options and click OK.

    • Mount the volume from the command line, using the mount command.

    • Mount the volume from the /etc/fstab file on system boot.

    Mounting an OCFS2 volume takes about 5 seconds, depending on how long it takes for the heartbeat thread to stabilize. On a successful mount, the device list in the ocfs2console shows the mount point along with the device.

    [Tip]Adding New Nodes

    When new nodes try to connect to the cluster, they are not allowed to join because the existing nodes have not added them to their connection lists. To solve this issue, go to each existing node and issue the following command to update its connection list:

    o2cb_ctl -H -n ocfs2 -t cluster -a online=yes

    For information about mounting an OCFS2 volume using any of these methods, see the OCFS2 User Guide on the OCFS2 project at Oracle.

    When running Oracle RAC, make sure to use the datavolume and nointr mounting options for OCFS2 volumes that contain the voting disk file (CRS), cluster registry (OCR), data files, redo logs, archive logs, and control files. Do not use these options when mounting the Oracle Home volume.

    | Option | Description |
    |---|---|
    | datavolume | Ensures that the Oracle processes open the files with the O_DIRECT flag. |
    | nointr | No interruptions. Ensures that I/O is not interrupted by signals. |
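
The command-line and /etc/fstab methods can be sketched as follows. The function name ocfs2_fstab_line, the device names, and the mount points are placeholders. The default option _netdev is an assumption on our part: it is the generic mount option that marks a file system as dependent on the network, which suits a cluster file system, but check the recommendation for your release.

```shell
# Print an /etc/fstab entry for an OCFS2 volume.
# Usage: ocfs2_fstab_line DEVICE MOUNT_POINT [OPTIONS]
# OPTIONS defaults to _netdev (assumed default, see the note above).
ocfs2_fstab_line() {
    printf '%s %s ocfs2 %s 0 0\n' "$1" "$2" "${3:-_netdev}"
}

# Oracle Home volume: no datavolume or nointr.
ocfs2_fstab_line /dev/sdX1 /u01/app/oracle
# Oracle RAC data volume: add datavolume and nointr.
ocfs2_fstab_line /dev/sdX2 /u02/oradata _netdev,datavolume,nointr

# One-off mount of the data volume from the command line, run as root:
#   mount -t ocfs2 -o datavolume,nointr /dev/sdX2 /u02/oradata
```

The function only prints the entries; append them to /etc/fstab yourself after checking the device names.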

14.8. Additional Information

For information about using OCFS2, see the OCFS2 User Guide on the OCFS2 project at Oracle.