Configuring Path Failover Policies and Priorities

On a Linux host with multiple paths to a storage controller, each path appears as a separate block device, so a single LUN shows up as multiple block devices. The Device Mapper Multipath service detects multiple paths with the same LUN ID and creates a new multipath device with that ID. For example, a host with two HBAs attached to a storage controller with two ports via a single unzoned Fibre Channel switch sees four block devices: /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. The Device Mapper Multipath service creates a single block device, /dev/mpath/mpath1, that reroutes I/O through those four underlying block devices.

This section describes how to specify policies for failover and configure priorities for the paths.

Configuring the Path Failover Policies

Use the multipath command with the -p option to set the path failover policy:

multipath devicename -p policy 

Replace policy with one of the following policy options:

Table 5.4. Group Policy Options for the multipath -p Command

Policy Option         Description

failover              One path per priority group.

multibus              All paths in one priority group.

group_by_serial       One priority group per detected serial number.

group_by_prio         One priority group per path priority value. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.

group_by_node_name    One priority group per target node name. Target node names are read from /sys/class/fc_transport/target*/node_name.
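The following shell sketch shows one way to check a policy name against the options in Table 5.4 before building the command line. The function name multipath_policy_cmd and the device name mpath1 are assumptions made for illustration. The function only prints the command and does not run multipath.

```shell
# Hypothetical dry-run helper: validates the policy name against the
# options in Table 5.4 and prints the command instead of running it.
multipath_policy_cmd() {
    device=$1
    policy=$2
    case "$policy" in
        failover|multibus|group_by_serial|group_by_prio|group_by_node_name)
            printf 'multipath %s -p %s\n' "$device" "$policy"
            ;;
        *)
            printf 'unknown policy: %s\n' "$policy" >&2
            return 1
            ;;
    esac
}

# mpath1 is a sample device name, not taken from a real system.
multipath_policy_cmd mpath1 group_by_prio
```

Review the printed command before running it as root on the host.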


Configuring Failover Priorities

You must manually enter the failover priorities for the device in the /etc/multipath.conf file. Examples for all settings and options can be found in the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

Understanding Priority Groups and Attributes

A priority group is a collection of paths that go to the same physical LUN. By default, I/O is distributed in a round-robin fashion across all paths in the group. The multipath command automatically creates priority groups for each LUN in the SAN based on the path_grouping_policy setting for that SAN. The multipath command multiplies the number of paths in a group by the group’s priority to determine which group is the primary. The group with the highest calculated value is the primary. When all paths in the primary group have failed, the priority group with the next highest value becomes active.

A path priority is an integer value assigned to a path. The higher the value, the higher the priority. An external program is used to assign priorities for each path. For a given device, paths with the same priority belong to the same priority group.
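To illustrate how paths with equal priorities end up in the same priority group, the following shell sketch groups sample path and priority pairs. The function name group_by_prio_demo and the input values are hypothetical. In a real setup the priorities come from the external program, not from a hand-written list.

```shell
# Hypothetical demo: reads "path priority" pairs on standard input and
# prints one line per priority value, highest priority first.
group_by_prio_demo() {
    sort -k2,2nr -k1,1 | awk '
        # Start a new group whenever the priority value changes.
        NR == 1 || $2 != prio {
            if (NR > 1) print line
            prio = $2
            line = "prio " prio ":"
        }
        { line = line " " $1 }
        END { if (NR > 0) print line }'
}

# Sample values only.
printf '%s\n' 'sdb 50' 'sdc 10' 'sdd 50' 'sde 10' | group_by_prio_demo
```

With the sample input, sdb and sdd share one group and sdc and sde share another.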

Table 5.5. Multipath Attributes

Multipath Attribute

Description

Values

user_friendly_names

Specifies whether to use IDs or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.

yes. Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.

no. Default. Use the WWIDs shown in the /dev/disk/by-id/ location.

blacklist

Specifies the list of device names to ignore as non-multipathed devices, such as cciss, fd, hd, md, dm, sr, scd, st, ram, raw, loop.

For an example, see Section 5.4.5.4, “Blacklisting Non-Multipathed Devices in /etc/multipath.conf”.

blacklist_exceptions

Specifies the list of device names to treat as multipath devices even if they are included in the blacklist.

For an example, see the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

getuid_callout

The default program and arguments to call out to obtain a unique path identifier. Specify the program with an absolute path.

/lib/udev/scsi_id -g -u -s

This is the default location and arguments.

path_grouping_policy

Specifies the path grouping policy for a multipath device hosted by a given controller.

failover. One path is assigned per priority group so that only one path at a time is used.

multibus. (Default) All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.

group_by_prio. One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.

group_by_serial. Paths are grouped by the SCSI target serial number (controller node WWN).

group_by_node_name. One priority group is assigned per target node name. Target node names are read from /sys/class/fc_transport/target*/node_name.

path_checker

Determines the state of the path.

directio. (Default in multipath-tools version 0.4.8 and later) Reads the first sector with direct I/O. This is useful for DASD devices. Logs failure messages in /var/log/messages.

readsector0. (Default in multipath-tools version 0.4.7 and earlier) Reads the first sector of the device. Logs failure messages in /var/log/messages.

tur. Issues a SCSI test unit ready command to the device. This is the preferred setting if the LUN supports it. On failure, the command does not fill up /var/log/messages with messages.

Some SAN vendors provide custom path_checker options:

  • emc_clariion. Queries the EMC Clariion EVPD page 0xC0 to determine the path state.

  • hp_sw. Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.

  • rdac. Checks the path state for the LSI/Engenio RDAC storage controller.

path_selector

Specifies the path-selector algorithm to use for load-balancing.

round-robin 0. (Default) The load-balancing algorithm used to balance traffic across all active paths in a priority group.

This is currently the only algorithm available.

pg_timeout

Specifies path group timeout handling.

NONE (internal default)

prio_callout

Specifies the program and arguments to use to determine the layout of the multipath map.

When queried by the multipath command, the specified mpath_prio_* callout program returns the priority for a given path in relation to the entire multipath layout.

When it is used with the path_grouping_policy of group_by_prio, all paths with the same priority are grouped into one multipath group. The group with the highest aggregate priority becomes the active group.

When all paths in a group fail, the group with the next highest aggregate priority becomes active. Additionally, a failover command (as determined by the hardware handler) might be sent to the target.

The mpath_prio_* program can also be a custom script created by a vendor or administrator for a specified setup.

A %n in the command line expands to the device name in the /dev directory.

A %b expands to the device number in major:minor format in the /dev directory.

A %d expands to the device ID in the /dev/disk/by-id directory.

If devices are hot-pluggable, use the %d flag instead of %n. This addresses the short time that elapses between the time when devices are available and when udev creates the device nodes.

If no prio_callout attribute is used, all paths are equal. This is the default.

/bin/true. Use this value when the group_by_prio policy is not being used.

The prioritizer programs generate path priorities when queried by the multipath command. The program names must begin with mpath_prio_ and are named by the device type or balancing method used. Current prioritizer programs include the following:

/sbin/mpath_prio_alua %n. Generates path priorities based on the SCSI-3 ALUA settings.

/sbin/mpath_prio_balance_units. Generates the same priority for all paths.

/sbin/mpath_prio_emc %n. Generates the path priority for EMC arrays.

/sbin/mpath_prio_hds_modular %b. Generates the path priority for Hitachi HDS Modular storage arrays.

/sbin/mpath_prio_hp_sw %n. Generates the path priority for Compaq/HP controller in active/standby mode.

/sbin/mpath_prio_netapp %n. Generates the path priority for NetApp arrays.

/sbin/mpath_prio_random %n. Generates a random priority for each path.

/sbin/mpath_prio_rdac %n. Generates the path priority for LSI/Engenio RDAC controller.

/sbin/mpath_prio_tpc %n. You can optionally use a script created by a vendor or administrator that gets the priorities from a file where you specify priorities to use for each path.

/usr/local/sbin/mpath_prio_spec.sh %n. Provides the path of a user-created script that generates the priorities for multipathing based on information contained in a second data file. (This path and filename are provided as an example. Specify the location of your script instead.) The script can be created by a vendor or administrator. The script’s target file identifies each path for all multipathed devices and specifies a priority for each path. For an example, see Section 5.6.3, “Using a Script to Set Path Priorities”.

rr_min_io

Specifies the number of I/O transactions to route to a path before switching to the next path in the same path group, as determined by the specified algorithm in the path_selector setting.

n (>0).  Specify an integer value greater than 0.

1000.  Default.

rr_weight

Specifies the weighting method to use for paths.

uniform.  Default. All paths have the same round-robin weightings.

priorities.  Each path’s weighting is determined by the path’s priority times the rr_min_io setting.

no_path_retry

Specifies the behaviors to use on path failure.

n (> 0).  Specifies the number of retries until multipath stops the queuing and fails the path. Specify an integer value greater than 0.

fail.  Specifies immediate failure (no queuing).

queue.  Never stop queuing (queue forever until the path comes alive).

failback

Specifies whether to monitor the failed path recovery, and indicates the timing for group failback after failed paths return to service.

When the failed path recovers, the path is added back into the list of multipath-enabled paths based on this setting. Multipath evaluates the priority groups and changes the active priority group when the priority of the primary group exceeds that of the secondary group.

immediate.  When a path recovers, enable the path immediately.

n (> 0). When the path recovers, wait n seconds before enabling the path. Specify an integer value greater than 0.

manual.  (Default) The failed path is not monitored for recovery. The administrator runs the multipath command to update enabled paths and priority groups.
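As a rough illustration of how the attributes in Table 5.5 fit together, the following shell sketch prints a sample defaults section for /etc/multipath.conf. The function name sample_defaults is hypothetical, and the chosen values are assumptions rather than recommendations. Compare them with the annotated sample file and your storage vendor's documentation before using them.

```shell
# Hypothetical helper: prints a sample defaults section built from the
# attributes in Table 5.5. The values are illustrative assumptions.
sample_defaults() {
    cat <<'EOF'
defaults {
    user_friendly_names   no
    path_grouping_policy  group_by_prio
    path_checker          tur
    path_selector         "round-robin 0"
    prio_callout          "/sbin/mpath_prio_alua %n"
    rr_min_io             1000
    rr_weight             uniform
    no_path_retry         fail
    failback              immediate
}
EOF
}

# Print the fragment for review; nothing is written to /etc.
sample_defaults
```

Printing the fragment instead of writing it to /etc/multipath.conf keeps the sketch safe to run on a production host.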


Configuring for Round-Robin Load Balancing

All paths are active. I/O is sent over one path for a configured number of seconds or I/O transactions before moving to the next open path in the sequence.
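The effect of switching paths after a fixed number of I/O transactions, as controlled by the rr_min_io attribute, can be sketched with a small simulation. The function name round_robin_demo, the path names, and the counts are assumptions for illustration. The sketch does not talk to DM-MP.

```shell
# Hypothetical simulation: route rr_min_io transactions to one path,
# then move to the next path in the group, until total_io are sent.
round_robin_demo() {
    rr_min_io=$1
    total_io=$2
    shift 2                                   # remaining arguments: paths
    [ "$#" -gt 0 ] && [ "$rr_min_io" -gt 0 ] || return 1
    out=
    sent=0
    while [ "$sent" -lt "$total_io" ]; do
        for path in "$@"; do
            n=0
            while [ "$n" -lt "$rr_min_io" ] && [ "$sent" -lt "$total_io" ]; do
                out="${out:+$out }$path"      # record which path served this I/O
                n=$((n + 1))
                sent=$((sent + 1))
            done
        done
    done
    printf '%s\n' "$out"
}

# Sample run: 3 transactions per path, 8 transactions in total.
round_robin_demo 3 8 sda sdb
```

In the sample run, the first three transactions go to sda, the next three to sdb, and the last two to sda again.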

Configuring for Single Path Failover

A single path with the highest priority (highest value setting) is active for traffic. Other paths are available for failover, but are not used unless failover occurs.

Grouping I/O Paths for Round-Robin Load Balancing

Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.

Using a Script to Set Path Priorities

You can create a script that interacts with DM-MP to provide priorities for the paths to a LUN. Specify the script as the value of the prio_callout setting.

First, set up a text file that lists information about each device and the priority values you want to assign to each path. For example, name the file /usr/local/etc/primary-paths. Enter one line for each path in the following format:

host_wwpn target_wwpn scsi_id priority_value

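The script must return a priority value for each path on the device. Make sure that the script's FILE_PRIMARY_PATHS variable resolves to a real file with appropriate data (host WWPN, target WWPN, scsi_id, and priority value) for each device.

The contents of the primary-paths file for four LUNs with two paths each (eight paths in total) might look like the following. In this example, each line also includes the device node name and the device WWID before the priority value: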

0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563 2
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:2 sdd 3600a0b8000122c6d0000000345317524 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:3 sde 3600a0b80000fd6320000000245317593 2
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563 51
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:2 sdk 3600a0b8000122c6d0000000345317524 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:3 sdl 3600a0b80000fd6320000000245317593 51

To continue the example mentioned in Table 5.5, “Multipath Attributes”, create a script named /usr/local/sbin/path_prio.sh. You can use any path and filename. The script does the following:

  • On query from multipath, grep the device and its path from the /usr/local/etc/primary-paths file.

  • Return to multipath the priority value in the last column for that entry in the file.
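The two steps above can be sketched as follows. This is a minimal example, not the script shipped by any vendor. It assumes the six-column layout of the sample listing above, with the device node name in the fourth column, and it assumes that a path without an entry gets priority 0. The function name path_priority is hypothetical.

```shell
#!/bin/sh
# Hypothetical priority callout. Assumes the six-column layout of the
# sample primary-paths listing: device node name in column 4, priority
# in the last column.
FILE_PRIMARY_PATHS=${FILE_PRIMARY_PATHS:-/usr/local/etc/primary-paths}

path_priority() {
    # $1 is the device node name passed in for %n, such as sdb.
    awk -v dev="$1" '
        $4 == dev { print $NF; found = 1; exit }
        END { if (!found) print 0 }   # assumption: unknown paths get 0
    ' "$FILE_PRIMARY_PATHS"
}

# Run the lookup only when a device name is given on the command line.
if [ "$#" -gt 0 ]; then
    path_priority "$1"
fi
```

Matching on the device node name keeps the lookup simple, but node names can change between boots. A variant that matches on the WWPN and SCSI address columns would be more robust.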

Configuring ALUA

The mpath_prio_alua(8) command is used as a priority callout for the Linux multipath(8) command. It returns a number that is used by DM-MP to group SCSI devices with the same priority together. This path priority tool is based on ALUA (Asymmetric Logical Unit Access).

Syntax

mpath_prio_alua [-d directory] [-h] [-v] [-V] device [device...] 

Prerequisite

SCSI devices

Options

-d directory

Specifies the Linux directory path where the listed device node names can be found. The default directory is /dev. When used, specify the device node name only (such as sda) for the device or devices you want to manage.

-h

Displays help for this command, then exits.

-v

Turns on verbose output to display status in human-readable format. Output includes information about which port group the specified device is in and its current state.

-V

Displays the version number of this tool, then exits.

device

Specifies the SCSI device you want to manage. The device must be a SCSI device that supports the Report Target Port Groups (sg_rtpg(8)) command. Use one of the following formats for the device node name:

  • The full Linux directory path, such as /dev/sda. Do not use with the -d option.

  • The device node name only, such as sda. Specify the directory path using the -d option.

  • The major and minor number of the device separated by a colon (:) with no spaces, such as 8:0. This creates a temporary device node in the /dev directory with a name in the format of tmpdev-<major>:<minor>-<pid>. For example, /dev/tmpdev-8:0-<pid>.
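The three device formats can be told apart with a simple pattern check, sketched below. The function name device_arg_format is hypothetical, and the check is deliberately loose. It only looks at the shape of the argument and does not verify that the device exists.

```shell
# Hypothetical helper: reports which of the three accepted formats a
# device argument uses, based only on its shape.
device_arg_format() {
    case "$1" in
        /*)            echo "full path" ;;      # such as /dev/sda
        *[!0-9:]*)     echo "node name" ;;      # such as sda
        [0-9]*:[0-9]*) echo "major:minor" ;;    # such as 8:0
        *)             echo "unrecognized" ;;
    esac
}

for arg in /dev/sda sda 8:0; do
    printf '%s -> %s\n' "$arg" "$(device_arg_format "$arg")"
done
```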

Return Values

On success, returns a value of 0 and the priority value for the group. Table 5.6, “ALUA Priorities for Device Mapper Multipath” shows the priority values returned by the mpath_prio_alua command.

Table 5.6. ALUA Priorities for Device Mapper Multipath

Priority Value    Description

50                The device is in the active, optimized group.

10                The device is in an active but non-optimized group.

1                 The device is in the standby group.

0                 All other groups.


Values are widely spaced because of the way the multipath command handles them. It multiplies the number of paths in a group by the priority value for the group, then selects the group with the highest result. For example, if a non-optimized path group has six paths (6 x 10 = 60) and the optimized path group has a single path (1 x 50 = 50), the non-optimized group has the highest score, so multipath chooses the non-optimized group. Traffic to the device uses all six paths in the group in a round-robin fashion.
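The arithmetic in this example can be reproduced with a short shell sketch. The function names alua_priority and group_score and the state labels are assumptions chosen for illustration. Only the priority values and the multiplication rule come from this section.

```shell
# Priority values from Table 5.6, keyed by hypothetical state labels.
alua_priority() {
    case "$1" in
        active-optimized)     echo 50 ;;
        active-non-optimized) echo 10 ;;
        standby)              echo 1 ;;
        *)                    echo 0 ;;
    esac
}

# Score of a group: number of paths multiplied by the group priority.
group_score() {
    prio=$(alua_priority "$2")
    echo $(( $1 * prio ))
}

optimized=$(group_score 1 active-optimized)            # 1 x 50
non_optimized=$(group_score 6 active-non-optimized)    # 6 x 10
if [ "$non_optimized" -gt "$optimized" ]; then
    echo "non-optimized group wins: $non_optimized > $optimized"
else
    echo "optimized group wins: $optimized >= $non_optimized"
fi
```

With one optimized path and six non-optimized paths, the sketch prints that the non-optimized group wins with 60 against 50.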

On failure, returns a value of 1 to 5 indicating the cause for the command’s failure. For information, see the man page for mpath_prio_alua.

Reporting Target Port Groups

Use the SCSI Report Target Port Groups (sg_rtpg(8)) command. For information, see the man page for sg_rtpg(8).


SUSE® Linux Enterprise Server Storage Administration Guide 10