Abstract
When managing shared storage on a cluster, every node must be informed about changes made to the storage subsystem. The Logical Volume Manager 2 (LVM2), which is widely used to manage local storage, has been extended to support transparent management of volume groups across the whole cluster. Clustered volume groups can be managed using the same commands as local storage.
Clustered LVM is coordinated by several tools:
Distributed Lock Manager (DLM). Coordinates disk access for cLVM.
Logical Volume Manager 2 (LVM2). Enables flexible distribution of one file system over several disks. LVM provides a virtual pool of disk space.
Clustered Logical Volume Manager (cLVM). Coordinates access to the LVM2 metadata so every node knows about changes. cLVM does not coordinate access to the shared data itself; to coordinate data access, you must configure OCFS2 or other cluster-aware applications on top of the cLVM-managed storage.
Depending on your scenario, it is possible to create a RAID 1 device with cLVM using the following layers:
LVM. This is a very flexible solution if you want to increase or decrease your file system size, add more physical storage, or create snapshots of your file systems. This method is described in Section 14.2.1, “Scenario: cLVM With iSCSI on SANs”.
DRBD. This solution only provides RAID 0 (striping) and RAID 1 (mirroring). This method is described in Section 14.2.2, “Scenario: cLVM With DRBD”.
MD Devices (Linux Software RAID or mdadm). Although this solution provides all RAID levels, it does not support clusters yet.
Make sure you have fulfilled the following prerequisites:
A shared storage device is available, such as one provided by Fibre Channel, FCoE, SCSI, an iSCSI SAN, or DRBD.
In case of DRBD, both nodes must be primary (as described in the following procedure).
Check if the locking type of LVM2 is cluster-aware. The keyword locking_type in /etc/lvm/lvm.conf must contain the value 3 (this should be the default). Copy the configuration to all nodes, if necessary.
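The locking type check can be scripted before touching any volumes. The following is a minimal sketch, not an official tool: the function names `lvm_locking_type` and `lvm_is_cluster_aware` are hypothetical, and the sketch assumes that the setting appears as a plain `locking_type = N` assignment. If several active assignments exist, it simply reports the last one.

```shell
# Print the locking_type value found in an lvm.conf-style file.
# Commented-out lines are ignored; if several active assignments
# exist, this sketch reports the last one (an assumption).
lvm_locking_type() (
    conf=${1:-/etc/lvm/lvm.conf}
    sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" |
        tail -n 1
)

# Succeed only if the file configures cluster-wide locking (value 3).
lvm_is_cluster_aware() {
    [ "$(lvm_locking_type "$1")" = "3" ]
}
```

For example, `lvm_is_cluster_aware /etc/lvm/lvm.conf || echo "locking_type is not 3"` could be run on each node after copying the configuration.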
Create Cluster Resources First
First create your cluster resources, and then your LVM volumes. Otherwise it is impossible to remove the volumes later.
The following scenario uses two SAN boxes which export their iSCSI targets to several clients. The general idea is displayed in Figure 14.1, “Setup of iSCSI with cLVM”.
Data Loss
The following procedures will destroy any data on your disks!
Configure only one SAN box first. Each SAN box has to export its own iSCSI target. Proceed as follows:
Procedure 14.1. Configuring iSCSI Targets (SAN)
Run YaST and click + to start the iSCSI Server module.
If you want to start the iSCSI target whenever your computer is booted, choose , otherwise choose .
If you have a firewall running, enable .
Switch to the tab. If you need authentication enable incoming or outgoing authentication or both. In this example, we select .
Add a new iSCSI target:
Switch to the tab.
Click .
Enter a target name. The name has to be formatted like this:
iqn.DATE.DOMAIN
If you want a more descriptive name, you can change it as long as your identifier is unique between your different targets.
Click .
Enter the device name in and use a .
Click two times.
Confirm the warning box with .
Open the configuration file /etc/iscsi/iscsid.conf and change the parameter node.startup to automatic.
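The last step of the procedure can also be done non-interactively. The following is a minimal sketch under stated assumptions: the function name `set_node_startup_automatic` is hypothetical, the path of the iSCSI configuration file is passed as an argument, and the parameter is assumed to be written as a `node.startup = value` line.

```shell
# Set node.startup to "automatic" in the given iSCSI configuration file.
# An existing active assignment is rewritten; if none exists, one is appended.
set_node_startup_automatic() (
    conf=$1
    if grep -q '^[[:space:]]*node\.startup[[:space:]]*=' "$conf"; then
        tmp=$(mktemp) || exit 1
        # Write to a temporary file first, then copy back, so that
        # ownership and permissions of the original file are preserved.
        sed 's/^[[:space:]]*node\.startup[[:space:]]*=.*/node.startup = automatic/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        status=$?
        rm -f "$tmp"
        exit "$status"
    else
        printf '%s\n' 'node.startup = automatic' >> "$conf"
    fi
)
```

Keeping a backup copy of the configuration file before running such a function is advisable.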
Now set up your iSCSI initiators as follows:
Procedure 14.2. Configuring iSCSI Initiators
Run YaST and click +.
If you want to start the iSCSI initiator whenever your computer is booted, choose , otherwise set .
Change to the tab and click the button.
Enter the IP address and the port of your iSCSI target (see Procedure 14.1, “Configuring iSCSI Targets (SAN)”). Normally, you can leave the port as it is and use the default value.
If you use authentication, insert the incoming and outgoing username and password, otherwise activate .
Select . The found connections are displayed in the list.
Proceed with .
Open a shell and log in as root.
Test if the iSCSI initiator has been started successfully:
iscsiadm -m discovery -t st -p 192.168.3.100
192.168.3.100:3260,1 iqn.2010-03.de.jupiter:san1
Establish a session:
iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2010-03.de.jupiter:san2, portal: 192.168.3.100,3260]
Logging in to [iface: default, target: iqn.2010-03.de.venus:san1, portal: 192.168.3.101,3260]
Login to [iface: default, target: iqn.2010-03.de.jupiter:san2, portal: 192.168.3.100,3260]: successful
Login to [iface: default, target: iqn.2010-03.de.venus:san1, portal: 192.168.3.101,3260]: successful
See the device names with lsscsi:
...
[4:0:0:2] disk IET ... 0 /dev/sdd
[5:0:0:1] disk IET ... 0 /dev/sde
Look for entries with IET in their third column. In this case, the devices are /dev/sdd and /dev/sde.
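Picking out the IET entries by eye becomes error-prone with many disks. The following small sketch filters the lsscsi output on the third column, as described above. The function name `iet_devices` is hypothetical, and the sketch assumes that the device node is the last field of each line.

```shell
# Read lsscsi-style lines on stdin and print the device node (last field)
# of every entry whose third column (the vendor) is IET.
iet_devices() {
    awk '$3 == "IET" { print $NF }'
}
```

A typical call would be `lsscsi | iet_devices`.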
Procedure 14.3. Creating a DLM Resource
Start a shell and log in as root.
Run crm configure.
Enter the following commands:
primitive dlm ocf:pacemaker:controld
primitive clvm ocf:lvm2:clvmd \
  params daemon_timeout="30"
group dlm-clvm dlm clvm
clone dlm-clvm-clone dlm-clvm \
  meta interleave="true" ordered="true"
Review your changes with show.
If everything is correct, enter commit and leave crm with exit.
Procedure 14.4. Creating the LVM Volume Groups
Open a root shell on one of the nodes on which you have run the iSCSI initiator (see Procedure 14.2, “Configuring iSCSI Initiators”).
Prepare the physical volumes for LVM with the command pvcreate on the disks /dev/sdd and /dev/sde:
pvcreate /dev/sdd
pvcreate /dev/sde
Check if everything is correct with pvdisplay:
--- Physical volume ---
PV Name               /dev/sdd
VG Name               clustervg
PV Size               509,88 MB / not usable 1,88 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              127
Free PE               127
Allocated PE          0
PV UUID               52okH4-nv3z-2AUL-GhAN-8DAZ-GMtU-Xrn9Kh

--- Physical volume ---
PV Name               /dev/sde
VG Name               clustervg
PV Size               509,84 MB / not usable 1,84 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              127
Free PE               127
Allocated PE          0
PV UUID               Ouj3Xm-AI58-lxB1-mWm2-xn51-agM2-0UuHFC
Create the cluster-aware volume group on both disks:
vgcreate --clustered y clustervg /dev/sdd /dev/sde
Check if everything is correct with vgdisplay:
--- Volume group ---
VG Name               clustervg
System ID
Format                lvm2
Metadata Areas        2
Metadata Sequence No  1
VG Access             read/write
VG Status             resizable
Clustered             yes
Shared                no
MAX LV                0
Cur LV                0
Open LV               0
Max PV                0
Cur PV                2
Act PV                2
VG Size               1016,00 MB
PE Size               4,00 MB
Total PE              254
Alloc PE / Size       0 / 0
Free PE / Size        254 / 1016,00 MB
VG UUID               UCyWw8-2jqV-enuT-KH4d-NXQI-JhH3-J24anD
Create logical volumes as needed:
lvcreate --name clusterlv --size 500M clustervg
After you have created the volumes and started your resources, you should have a new device named /dev/dm-0. It is recommended to use a clustered file system on top of your LVM resource, for example OCFS2. For more information, see Chapter 12, Oracle Cluster File System 2.
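Because these commands destroy existing data, it can help to review the exact command sequence for a set of disks before running anything. The following sketch only prints the commands from Procedure 14.4 and does not execute them. The function name `clustered_vg_plan` and its argument order are hypothetical.

```shell
# Print (do not run) the LVM commands for a clustered volume group.
# Usage: clustered_vg_plan VG LV SIZE DEVICE...
clustered_vg_plan() (
    if [ "$#" -lt 4 ]; then
        echo "usage: clustered_vg_plan VG LV SIZE DEVICE..." >&2
        exit 1
    fi
    vg=$1
    lv=$2
    size=$3
    shift 3
    # One pvcreate per physical device
    for dev in "$@"; do
        printf 'pvcreate %s\n' "$dev"
    done
    # One cluster-aware volume group spanning all devices
    printf 'vgcreate --clustered y %s %s\n' "$vg" "$*"
    printf 'lvcreate --name %s --size %s %s\n' "$lv" "$size" "$vg"
)
```

Calling `clustered_vg_plan clustervg clusterlv 500M /dev/sdd /dev/sde` prints the same commands used in the procedure, which can then be checked against the device names reported by lsscsi.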
The following scenarios can be used if you have data centers located in different parts of your city, country, or continent.
Procedure 14.5. Creating a Cluster-Aware Volume Group With DRBD
Create a primary/primary DRBD resource:
First, set up a DRBD device as primary/secondary as described in Procedure 13.1, “Manually Configure DRBD”. Make sure the disk state is up-to-date on both nodes. Check this with cat /proc/drbd or with rcdrbd status.
Add the following options to your configuration file (usually something like /etc/drbd.d/r0.res):
resource r0 {
  startup {
    become-primary-on both;
  }
  net {
    allow-two-primaries;
  }
  ...
}
Copy the changed configuration file to the other node, for example:
scp /etc/drbd.d/r0.res venus:/etc/drbd.d/
Run the following commands on both nodes:
drbdadm disconnect r0
drbdadm connect r0
drbdadm primary r0
Check the status of your nodes:
cat /proc/drbd
...
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
Include the clvmd resource as a clone in the pacemaker configuration, and make it depend on the DLM clone resource. See Procedure 14.3, “Creating a DLM Resource” for detailed instructions. Before proceeding, confirm that these resources have started successfully on your cluster. You may use crm_mon or the GUI to check the running services.
Prepare the physical volume for LVM with the command pvcreate. For example, on the device /dev/drbd_r0 the command would look like this:
pvcreate /dev/drbd_r0
Create a cluster-aware volume group:
vgcreate --clustered y myclusterfs /dev/drbd_r0
Create logical volumes as needed. You will probably want to adjust the size of the logical volume. For example, create a 4 GB logical volume with the following command:
lvcreate --name testlv -L 4G myclusterfs
To ensure that the volume group is activated cluster-wide, configure an LVM resource as follows:
primitive vg1 ocf:heartbeat:LVM \
  params volgrpname="myclusterfs"
clone vg1-clone vg1 \
  meta interleave="true" ordered="true"
colocation colo-vg1 inf: vg1-clone dlm-clvm-clone
order order-vg1 inf: dlm-clvm-clone vg1-clone
If you want the volume group to be activated exclusively on one node, use the following example. In this case, cLVM protects all logical volumes within the VG from being activated on multiple nodes, as an additional measure of protection for non-clustered applications:
primitive vg1 ocf:heartbeat:LVM \
  params volgrpname="myclusterfs" exclusive="yes"
colocation colo-vg1 inf: vg1 dlm-clvm-clone
order order-vg1 inf: dlm-clvm-clone vg1
The logical volumes within the VG are now available for file system mounts or raw usage. Ensure that services using them have proper dependencies: colocate them with the VG resource and order them after the VG has been activated.
After finishing these configuration steps, the LVM2 configuration can be done just like on any standalone workstation.
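The DRBD status check in the procedure above can be automated before the physical volume is created. The following is a minimal sketch: the function name `drbd_dual_primary_ok` is hypothetical, and the sketch assumes the /proc/drbd line format shown in the example output (a resource line starting with a minor number followed by `cs:`, `ro:`, and `ds:` fields).

```shell
# Read /proc/drbd-style text on stdin. Succeed only if at least one
# resource line is present and every resource line reports
# cs:Connected, ro:Primary/Primary and ds:UpToDate/UpToDate.
drbd_dual_primary_ok() {
    awk '
        /^[ \t]*[0-9]+:[ \t]*cs:/ {
            seen = 1
            if ($0 !~ /cs:Connected/ ||
                $0 !~ /ro:Primary\/Primary/ ||
                $0 !~ /ds:UpToDate\/UpToDate/)
                bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

On a node, this could be used as `drbd_dual_primary_ok < /proc/drbd || echo "DRBD is not primary/primary and up-to-date"`.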
When several devices seemingly share the same physical volume signature (as can be the case for multipath devices or DRBD), it is recommended to explicitly configure the devices which LVM2 scans for PVs.
For example, if the command vgcreate uses the physical device instead of the mirrored block device, DRBD will be confused, which may result in a split brain condition for DRBD.
To deactivate a single device for LVM2, do the following:
Edit the file /etc/lvm/lvm.conf and search for the line starting with filter.
The patterns there are handled as regular expressions. A leading “a” accepts the devices matching the pattern for the scan; a leading “r” rejects the devices matching the pattern.
To remove a device named /dev/sdb1, add the following expression to the filter rule:
"r|^/dev/sdb1$|"
The complete filter line will look like the following:
filter = [ "r|^/dev/sdb1$|", "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ]
A filter line that accepts DRBD and MPIO devices but rejects all other devices would look like this:
filter = [ "a|/dev/drbd.*|", "a|/dev/.*/by-id/dm-uuid-mpath-.*|", "r/.*/" ]
Save the configuration file and copy it to all cluster nodes.
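To get a feeling for how a filter line treats a given device path, the evaluation order can be approximated in a few lines of shell. This is only a sketch under stated assumptions: the function name `lvm_filter_decision` is hypothetical, it uses grep extended regular expressions as a stand-in for LVM's own matcher, it considers a single path name per device, and it treats a path that matches no pattern as accepted. The first matching pattern decides.

```shell
# Print "accept" or "reject" for a device path, given an ordered list of
# LVM-style filter patterns such as 'r|^/dev/sdb1$|' or 'a/.*/'.
# Usage: lvm_filter_decision DEVICE PATTERN...
lvm_filter_decision() (
    dev=$1
    shift
    for p in "$@"; do
        action=${p%"${p#?}"}        # first character: a or r
        rest=${p#?}
        delim=${rest%"${rest#?}"}   # delimiter character, for example | or /
        regex=${rest#?}
        regex=${regex%"$delim"}     # strip the closing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            case $action in
                a) echo accept ;;
                *) echo reject ;;
            esac
            exit 0
        fi
    done
    echo accept   # assumption: paths matching no pattern are accepted
)
```

For example, `lvm_filter_decision /dev/sda 'a|/dev/drbd.*|' 'a|/dev/.*/by-id/dm-uuid-mpath-.*|' 'r/.*/'` checks a plain disk against the DRBD and MPIO filter shown above. Note the single quotes, which keep the shell from interpreting `$` and `|` inside the patterns.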
Thorough information is available from the Pacemaker mailing list; see http://www.clusterlabs.org/wiki/Help:Contents.
The official cLVM FAQ can be found at http://sources.redhat.com/cluster/wiki/FAQ/CLVM.