Abstract
Oracle Cluster File System 2 (OCFS2) is a general-purpose journaling file system that has been fully integrated in the Linux kernel since version 2.6. OCFS2 allows you to store application binary files, data files, and databases on devices on shared storage. All nodes in a cluster have concurrent read and write access to the file system. A user-space control daemon, managed via a clone resource, provides the integration with the HA stack, in particular with OpenAIS/Corosync and the Distributed Lock Manager (DLM).
OCFS2 can be used, for example, for the following storage solutions:
General applications and workloads.
Xen image store in a cluster. Xen virtual machines and virtual servers can be stored on OCFS2 volumes that are mounted by cluster servers. This provides quick and easy portability of Xen virtual machines between servers.
LAMP (Linux, Apache, MySQL, and PHP | Perl | Python) stacks.
As a high-performance, symmetric and parallel cluster file system, OCFS2 supports the following functions:
An application's files are available to all nodes in the cluster. Users simply install it once on an OCFS2 volume in the cluster.
All nodes can concurrently read and write directly to storage via the standard file system interface, enabling easy management of applications that run across the cluster.
File access is coordinated through DLM. DLM control is good for most cases, but an application's design might limit scalability if it contends with the DLM to coordinate file access.
Storage backup functionality is available on all back-end storage. An image of the shared application files can be easily created, which can help provide effective disaster recovery.
OCFS2 also provides the following capabilities:
Metadata caching.
Metadata journaling.
Cross-node file data consistency.
Support for multiple block sizes up to 4 KB and cluster sizes up to 1 MB, for a maximum volume size of 4 PB (petabytes).
Support for up to 16 cluster nodes.
Asynchronous and direct I/O support for database files for improved database performance.
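The 4 PB figure in the list above can be reproduced from the largest cluster size if one assumes that a volume holds at most 2^32 clusters. That 32-bit cluster count is an assumption and is not stated in this section. A minimal sketch of the arithmetic:

```shell
#!/bin/bash
# Assumption (not stated in this section): a volume holds at most 2^32 clusters.
max_clusters=$(( 1 << 32 ))
cluster_size_mb=1            # largest supported cluster size: 1 MB

# 1 PB = 1024 * 1024 * 1024 MB
max_volume_pb=$(( max_clusters * cluster_size_mb / (1024 * 1024 * 1024) ))
echo "Maximum volume size: ${max_volume_pb} PB"
```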
The OCFS2 kernel module (ocfs2) is installed automatically in the High Availability Extension on SUSE® Linux Enterprise Server 11 SP1. To use OCFS2, make sure the following packages are installed on each node in the cluster: ocfs2-tools and the matching ocfs2-kmp-* packages for your kernel.
The ocfs2-tools package provides the following utilities for management of OCFS2 volumes. For syntax information, see their man pages.
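To find out which ocfs2-kmp-* package matches the running kernel, the kernel flavor can be read from the output of uname -r. The sketch below only prints the rpm queries to run on each node. The function names are hypothetical, and the rule that the flavor is the part after the last hyphen is an assumption about the kernel release string.

```shell
#!/bin/bash
# Derive the ocfs2-kmp package name from a kernel release string.
# Assumption: the kernel flavor is the part after the last hyphen,
# as in "2.6.32.12-0.7-default" -> "default".
kmp_package_for() {
    local release="$1"
    echo "ocfs2-kmp-${release##*-}"
}

# Names of the packages this node is expected to have installed.
required_packages() {
    echo "ocfs2-tools"
    kmp_package_for "$(uname -r)"
}

# Print the queries instead of running them, so nothing is changed.
for pkg in $(required_packages); do
    echo "rpm -q $pkg"
done
```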
Table 12.1. OCFS2 Utilities
| OCFS2 Utility | Description |
|---|---|
| debugfs.ocfs2 | Examines the state of the OCFS2 file system for the purpose of debugging. |
| fsck.ocfs2 | Checks the file system for errors and optionally repairs errors. |
| mkfs.ocfs2 | Creates an OCFS2 file system on a device, usually a partition on a shared physical or logical disk. |
| mounted.ocfs2 | Detects and lists all OCFS2 volumes on a clustered system. Detects and lists all nodes on the system that have mounted an OCFS2 device or lists all OCFS2 devices. |
| tunefs.ocfs2 | Changes OCFS2 file system parameters, including the volume label, number of node slots, journal size for all node slots, and volume size. |
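As a rough illustration of the read-only utilities in Table 12.1, the following sketch prints a few typical invocations without running them. The option letters come from the respective man pages, not from the table, and /dev/sdb1 is a placeholder device.

```shell
#!/bin/bash
# Print example invocations of the read-only OCFS2 utilities.
# /dev/sdb1 is a placeholder; nothing is executed against a device.
DEVICE="${DEVICE:-/dev/sdb1}"

example_commands() {
    # mounted.ocfs2 -d : quick detect, lists OCFS2 devices
    # mounted.ocfs2 -f : full detect, lists the nodes that mounted each device
    # fsck.ocfs2 -n    : check only, answer "no" to every repair prompt
    # debugfs.ocfs2 -R : run a single debugfs command ("stats" shows the superblock)
    cat <<EOF
mounted.ocfs2 -d
mounted.ocfs2 -f
fsck.ocfs2 -n $DEVICE
debugfs.ocfs2 -R stats $DEVICE
EOF
}

example_commands
```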
Before you can create OCFS2 volumes, you must configure the following resources as services in the cluster: DLM and O2CB. OCFS2 uses the cluster membership services from Pacemaker which run in user space. Therefore, DLM and O2CB need to be configured as clone resources that are present on each node in the cluster.
Procedure 12.1. Configuring DLM and O2CB Resources
The following procedure uses the crm shell to configure the cluster resources. Follow the steps below for one node in the cluster. Alternatively, you can also use the Heartbeat to configure the resources.
Open a terminal window and log in as root or equivalent.
To add the DLM (Distributed Lock Manager) as a resource:
Start the crm shell and create a new configuration from scratch:
crm
cib new stack-glue
Create the DLM service and make it run on all machines in the cluster:
configure
primitive dlm ocf:pacemaker:controld \
  op monitor interval=120s
clone dlm-clone dlm meta globally-unique=false interleave=true
end
The dlm clone resource controls the distributed lock manager service, and makes sure this service is started on all nodes in the cluster.
Verify the changes you made before committing them to the CIB:
cib diff
configure
verify
Upload the configuration to the cluster and exit the shell:
cib commit stack-glue
quit
To add the O2CB configuration:
Start the crm shell and create a new configuration from scratch:
crm
cib new oracle-glue
Make the O2CB service start on every node in the cluster:
configure
primitive o2cb ocf:ocfs2:o2cb \
  op monitor interval=120s
clone o2cb-clone o2cb meta globally-unique=false interleave=true
To make sure that the O2CB service is only started on nodes that also have a copy of the dlm service already running, add a colocation constraint and an ordering constraint:
colocation o2cb-with-dlm INFINITY: o2cb-clone dlm-clone
order start-o2cb-after-dlm mandatory: dlm-clone o2cb-clone
Upload the configuration to the cluster and exit the shell:
cib commit oracle-glue
quit
To configure a fencing device:
Start the crm shell and create a new configuration from scratch:
crm
cib new fencing
Configure external/sbd as the fencing device, with /dev/sdb2 being a dedicated partition on the shared storage for heartbeating and fencing:
configure
primitive sbd_stonith stonith:external/sbd \
  meta target-role="Started" \
  op monitor interval=15 timeout=15 start-delay=15 \
  params sbd_device=/dev/sdb2
Upload the configuration to the cluster and exit the shell:
cib commit fencing
quit
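The resource definitions from Procedure 12.1 can also be collected in a single file and reviewed before they are loaded. The following is a minimal sketch under that assumption: the file name and the function name are hypothetical, and the script only writes the file and prints the crm command that would load it.

```shell
#!/bin/bash
# Write the DLM, O2CB, and SBD definitions from Procedure 12.1 to a file.
# The file path is a hypothetical example; adjust it as needed.
CONFIG_FILE="${CONFIG_FILE:-/tmp/ocfs2-services.crm}"

write_ocfs2_services() {
    cat > "$1" <<'EOF'
primitive dlm ocf:pacemaker:controld \
    op monitor interval=120s
clone dlm-clone dlm meta globally-unique=false interleave=true
primitive o2cb ocf:ocfs2:o2cb \
    op monitor interval=120s
clone o2cb-clone o2cb meta globally-unique=false interleave=true
colocation o2cb-with-dlm INFINITY: o2cb-clone dlm-clone
order start-o2cb-after-dlm mandatory: dlm-clone o2cb-clone
primitive sbd_stonith stonith:external/sbd \
    meta target-role="Started" \
    op monitor interval=15 timeout=15 start-delay=15 \
    params sbd_device=/dev/sdb2
EOF
}

write_ocfs2_services "$CONFIG_FILE"

# Nothing is applied here; the command below is printed for the administrator.
echo "Wrote $CONFIG_FILE. After review, load it on one node with:"
echo "  crm configure load update $CONFIG_FILE"
```

Once the configuration is loaded, a one-shot status check with crm_mon -1 should show the dlm-clone and o2cb-clone resources started on every node.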
After you have configured DLM and O2CB as cluster resources as described in Section 12.3, “Configuring OCFS2 Services”, configure your system to use OCFS2 and create OCFS2 volumes.
| OCFS2 Volumes for Application and Data Files |
|---|
| We recommend that you generally store application files and data files on different OCFS2 volumes. If your application volumes and data volumes have different requirements for mounting, it is mandatory to store them on different volumes. |
Before you begin, prepare the block devices you plan to use for your OCFS2 volumes. Leave the devices as free space.
Then create and format the OCFS2 volume with mkfs.ocfs2 as described in Procedure 12.2, “Creating and Formatting an OCFS2 Volume”. The most important parameters for the command are listed in Table 12.2, “Important OCFS2 Parameters”. For more information and the command syntax, refer to the mkfs.ocfs2 man page.
Table 12.2. Important OCFS2 Parameters
| OCFS2 Parameter | Description and Recommendation |
|---|---|
| Volume Label | A descriptive name for the volume to make it uniquely identifiable when it is mounted on different nodes. Use the tunefs.ocfs2 utility to modify the label as needed. |
| Cluster Size | Cluster size is the smallest unit of space allocated to a file to hold the data. For the available options and recommendations, refer to the mkfs.ocfs2 man page. |
| Number of Node Slots | The maximum number of nodes that can concurrently mount a volume. For each node, OCFS2 creates separate system files, such as the journals. Nodes that access the volume can be a combination of little-endian architectures (such as x86, x86-64, and ia64) and big-endian architectures (such as ppc64 and s390x). Node-specific files are referred to as local files. A node slot number is appended to the local file. Set each volume's maximum number of node slots when you create it, according to how many nodes you expect to concurrently mount the volume. Use the tunefs.ocfs2 utility to increase the number of node slots as needed. Note that the value cannot be decreased. |
| Block Size | The smallest unit of space addressable by the file system. Specify the block size when you create the volume. For the available options and recommendations, refer to the mkfs.ocfs2 man page. |
| Specific Features On/Off | A comma-separated list of feature flags can be provided. For an overview of all available flags, refer to the mkfs.ocfs2 man page. |
| Pre-Defined Features | Allows you to choose from a set of pre-determined file system features. For the available options, refer to the mkfs.ocfs2 man page. |
If you do not specify any features when creating and formatting the volume with mkfs.ocfs2, the following features are enabled by default: backup-super, sparse, inline-data, unwritten, metaecc, indexed-dirs, and xattr.
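The parameters in Table 12.2 map to options of mkfs.ocfs2. The sketch below builds, but does not run, a command line that sets the label, cluster size, block size, and number of node slots. The option letters (-L, -C, -b, -N) are taken from the mkfs.ocfs2 man page, the function name is hypothetical, and all values are examples only.

```shell
#!/bin/bash
# Build (but do not run) an mkfs.ocfs2 command line.
# Option letters are from the mkfs.ocfs2 man page; all values are examples.
build_mkfs_command() {
    local label="$1" cluster_size="$2" block_size="$3" slots="$4" device="$5"
    echo "mkfs.ocfs2 -L $label -C $cluster_size -b $block_size -N $slots $device"
}

# Example: 64 KB clusters, 4 KB blocks, 4 node slots on the placeholder /dev/sdb1.
build_mkfs_command shared_data 64K 4K 4 /dev/sdb1
```

Printing the command first makes it easy to double-check the target device before formatting, since mkfs.ocfs2 destroys existing data on it.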
Procedure 12.2. Creating and Formatting an OCFS2 Volume
Execute the following steps only on one of the cluster nodes.
Open a terminal window and log in as root.
Check if the cluster is online with the command crm_mon.
Create and format the volume using the mkfs.ocfs2 utility. For information about the syntax for this command, refer to the mkfs.ocfs2 man page.
For example, to create a new OCFS2 file system on /dev/sdb1 that supports up to 16 cluster nodes, use the following command:
mkfs.ocfs2 -N 16 /dev/sdb1
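Table 12.2 notes that the number of node slots can be increased later with tunefs.ocfs2 but never decreased. A small guard like the following can catch an accidental decrease before the tool is called. The function name is hypothetical, the -N option is taken from the tunefs.ocfs2 man page, and the command is only printed.

```shell
#!/bin/bash
# Print the tunefs.ocfs2 command that raises the node slot count,
# or refuse if the request would not increase it.
plan_slot_change() {
    local device="$1" current="$2" wanted="$3"
    if [ "$wanted" -le "$current" ]; then
        echo "error: node slots can only be increased (current: $current)" >&2
        return 1
    fi
    echo "tunefs.ocfs2 -N $wanted $device"
}

# Example: grow a volume from 4 to 8 node slots.
plan_slot_change /dev/sdb1 4 8
```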
You can either mount an OCFS2 volume manually or with the cluster manager, as described in Procedure 12.4, “Mounting an OCFS2 Volume with the Cluster Manager”.
Procedure 12.3. Manually Mounting an OCFS2 Volume
Open a terminal window and log in as root.
Check if the cluster is online with the command crm_mon.
Mount the volume from the command line, using the mount command.
| Manually Mounted OCFS2 Devices |
|---|
| If you mount the OCFS2 file system manually for testing purposes, make sure to unmount it again before starting to use it by means of OpenAIS. |
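A manual test mount as in Procedure 12.3, followed by the unmount that the note asks for, might look like the following. The device and mount point are the example values used elsewhere in this chapter, the function name is hypothetical, and the sketch prints the commands rather than running them.

```shell
#!/bin/bash
# Print the commands for a manual test mount and the matching unmount.
# Device and mount point are example values; nothing is mounted here.
DEVICE="${DEVICE:-/dev/sdb1}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/shared}"

manual_mount_commands() {
    echo "mkdir -p $MOUNT_POINT"
    echo "mount -t ocfs2 $DEVICE $MOUNT_POINT"
    # Unmount again before the cluster manager takes over the volume.
    echo "umount $MOUNT_POINT"
}

manual_mount_commands
```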
Procedure 12.4. Mounting an OCFS2 Volume with the Cluster Manager
To mount an OCFS2 volume with the High Availability software, configure an ocf file system resource in the cluster. The following procedure uses the crm shell to configure the cluster resources. Alternatively, you can also use the Heartbeat to configure the resources.
Start the crm shell and create a new configuration from scratch:
crm
cib new filesystem
Configure Pacemaker to mount the OCFS2 file system on every node in the cluster:
configure
primitive fs ocf:heartbeat:Filesystem \
params device="/dev/sdb1" directory="/mnt/shared" fstype="ocfs2" \
op monitor interval=120s
clone fs-clone fs meta interleave="true" ordered="true"
To make sure that Pacemaker only starts the fs clone resource on nodes that also have a clone of the o2cb resource already running, add a colocation constraint and an ordering constraint:
colocation fs-with-o2cb INFINITY: fs-clone o2cb-clone
order start-fs-after-o2cb mandatory: o2cb-clone fs-clone
Upload the configuration to the CIB and exit the shell:
cib commit filesystem
quit
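After the configuration is committed, each node should show the volume as mounted. One way to check this on a node is to filter the kernel's mount table for the ocfs2 file system type. This is a sketch with a hypothetical function name; it reads /proc/mounts by default and accepts another file for testing.

```shell
#!/bin/bash
# List OCFS2 mounts from a mounts table (default: /proc/mounts).
# Fields per line: device, mount point, file system type, options, ...
list_ocfs2_mounts() {
    awk '$3 == "ocfs2" { print $1 " on " $2 }' "${1:-/proc/mounts}"
}

# Only read the live mount table where it exists.
if [ -r /proc/mounts ]; then
    list_ocfs2_mounts
fi
```

On a node where the fs-clone resource is running with the example values from this procedure, the output would be expected to contain a line for /dev/sdb1 on /mnt/shared.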
For more information about OCFS2, see the following links:
OCFS2 project home page at Oracle.
OCFS2 User's Guide, available from the project documentation home page.