Kdump README

Prerequisites
=============

Be sure that you have installed the kexec-tools and kdump packages. For ppc64,
install kernel-kdump.rpm, too. The version of the kernel-kdump RPM must match
the version of the running system kernel.


Overview
========

Kdump uses kexec to quickly boot to a recovery kernel whenever a dump of the
system kernel's memory needs to be taken (for example, when the system panics).
The system memory image is preserved across the reboot and is accessible to the
debug kernel. You can use common Linux commands, such as cp and scp, to copy
the memory image to a dump file on the local host, or across the network to a
remote system.

Kdump and kexec are currently supported on the i386, x86-64, ia64 and ppc64
architectures.

The system kernel reserves a small section of memory for the capture
kernel at boot time of the system kernel. This ensures that ongoing
Direct Memory Access (DMA) from the system kernel does not corrupt the
capture kernel. The "kexec -p" command loads the capture kernel into
this reserved memory area.

On i386/x86-64 machines, the first 640 KB of physical memory is needed to boot,
irrespective of where the kernel loads. Therefore, kexec preserves this region
immediately before rebooting into the recovery kernel.

All of the necessary information about the system kernel's core image is
encoded in the ELF format, and stored in a reserved area of memory before a
crash. The physical address of the start of the ELF header is passed to the
recovery kernel through the "elfcorehdr=" boot parameter.

In the capture kernel, you can access the memory image from the system kernel
in two ways:

 1. Through a /dev/oldmem device interface.
    A capture utility can read the device file and write out the memory in raw
    format. This is a raw dump of memory. Analysis and capture tools must be
    intelligent enough to determine where to look for the right information.

 2. Through /proc/vmcore.
    This exports the memory dump as an ELF format file that can be written out
    using any file copy command such as cp or scp. Further, you can use
    analysis tools such as the GNU Debugger (GDB) or Crash to debug the dump
    file. This method ensures that the dump pages are ordered correctly.


Setup of Kdump
==============

You can use the kdump YaST module to set up kdump. Make sure that the package
"yast2-kdump" has been installed, and then use the module in the YaST2 Control
Center (or directly with "yast2 kdump" or "yast kdump").

If you want to set up kdump manually, follow these instructions:

Be sure the prerequisite RPMs are installed.

To enable a crash dump, you need to add an option to the boot loader to specify
the size and offset of the recovery kernel memory area.

An example of this boot loader option is "crashkernel=64M@16M". The 64M is the
size of the memory reserved for the Kdump recovery kernel, and the 16M is the
start address of the reserved area.  On ia64, the kernel calculates the start
address automatically, so you don't have to provide an @xxx offset.
 
You can add this option either with the YaST boot loader module, or by manually
editing the boot loader configuration file. 

The recommended values by architecture for the "crashkernel" option are:

        i386:   crashkernel=64M@16M
        x86-64: crashkernel=64M@16M
        ia64:   crashkernel=512M          (on small machines use 256M)
        ppc64:  crashkernel=128M@32M

NOTE: These memory sizes are only recommendations. How much memory the capture
kernel needs to boot the kdump system and take the dump depends on the amount
of memory and the number of CPUs/nodes in the system. Therefore, if the kdump
capture system doesn't come up, try increasing the memory size. Powers of two
(2^n) are *not* required, but they are the "natural" choice for memory sizes.
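
To check which reservation is in effect, the "crashkernel" option can be
picked out of a kernel command line. The following is only a sketch; the
function name parse_crashkernel is made up for this example and is not part of
the kdump package.

```shell
# Hypothetical helper: find "crashkernel=SIZE[@OFFSET]" in a command line
# string and print "SIZE OFFSET" ("-" stands for "no offset given").
parse_crashkernel() {
    for token in $1; do
        case "$token" in
            crashkernel=*)
                value=${token#crashkernel=}
                size=${value%%@*}
                case "$value" in
                    *@*) offset=${value#*@} ;;
                    *)   offset=- ;;
                esac
                echo "$size $offset"
                return 0
                ;;
        esac
    done
    # No crashkernel option found.
    return 1
}

# Usage on a running system:
#   parse_crashkernel "$(cat /proc/cmdline)"
```

An offset of "-" matches the ia64 case, where no @xxx offset is given.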

After setting the boot loader option, activate the Kdump init script, which is
not activated by default. To do this, use the YaST System Services (Runlevel)
module. Alternately, enable the service on the command line with the following
command: "/sbin/chkconfig kdump on".

** Warning ** You must activate the kdump service permanently via YaST or
chkconfig as described above.  Starting the kdump service temporarily (e.g.
"rckdump start") is not sufficient: the system reboots via kexec into the
recovery kernel, and a temporary activation is lost at the kdump boot stage.

After enabling the Kdump init script, reboot the system so that the Kdump
kernel image is loaded properly.
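
After the reboot, you may want to confirm that a capture kernel is actually
loaded. The sketch below assumes that the running kernel exposes the sysfs
file /sys/kernel/kexec_crash_loaded; older kernels may not provide it. The
function name is hypothetical.

```shell
# Sketch: succeed if a crash (capture) kernel is currently loaded.
# The optional argument overrides the sysfs path, which makes the function
# easy to try out against a test file.
crash_kernel_loaded() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Usage:
#   if crash_kernel_loaded; then echo "capture kernel is loaded"; fi
```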

Test your Kdump setup by issuing the following commands as the root
user:
 
** Warning ** This procedure will crash your system. Shut down all applications
and ensure that no users are logged on before performing this test.

  # sync
  # echo u > /proc/sysrq-trigger       (remount file systems read-only to
                                        avoid recovery after reboot)
  # echo c > /proc/sysrq-trigger

After the system recovers, verify that a vmcore file was generated in the save
dump directory. By default the vmcore file is located in
/var/log/dump/<date-string>.
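
The check can be scripted. This sketch assumes the default local layout
described above, with one date-string sub-directory per dump; latest_vmcore is
a made-up name.

```shell
# Print the path of the newest vmcore below a dump directory, or fail if
# there is none. Relies on the date-string directory names sorting
# chronologically.
latest_vmcore() {
    savedir=${1:-/var/log/dump}
    newest=$(ls -1d "$savedir"/*/vmcore 2>/dev/null | sort | tail -n 1)
    [ -n "$newest" ] && echo "$newest"
}

# Usage:
#   latest_vmcore || echo "no vmcore found"
```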

When a crash occurs, the kernel crash handler starts the second recovery kernel
that the Kdump init script loaded earlier, and reboots the system using the
reserved memory up to the $KDUMP_RUNLEVEL runlevel.

During the boot of the recovery kernel, the Kdump init script loads again, but
this time it dumps the core image for later analysis.

When a crash happens in a graphical environment, you will likely have no GUI in
the second kernel boot. If you used a VGA console, you might still have visual
output from the secondary kernel. The default behavior of the Kdump script is
to save the old vmcore image, and then reboot the system immediately. You can
adjust the behavior of the Kdump script through sysconfig variables described
later in this document.


The Default Dumper
==================

By default, the Kdump script saves the vmcore file to a unique sub-directory
consisting of $KDUMP_SAVEDIR and the date string, such as
/var/log/dump/2006-02-21-13:20/vmcore. This directory can be on the local
machine or on FTP, SSH, NFS or CIFS (see $KDUMP_SAVEDIR below).

If a local directory is used, the default dumper does some system checks before
copying the vmcore file. First, it checks the number of old dump directories
and removes them if there are more than $KDUMP_KEEP_OLD_DUMPS. Then, the dumper
checks the free disk space in the partition of the dump directory. If the free
space is less than the sum of the memory size and the value given in
$KDUMP_FREE_DISK_SIZE, then the dumper will not create a dump.
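
The free space rule can be written down as a small function. This is an
illustration of the rule as described here, not the code of the actual dumper;
the function name and argument order are assumptions.

```shell
# Decide whether a dump may be written. All sizes are in megabytes.
#   $1 = free space on the dump partition
#   $2 = size of system memory
#   $3 = KDUMP_FREE_DISK_SIZE (0 disables the check)
enough_space_for_dump() {
    free_mb=$1 mem_mb=$2 reserve_mb=$3
    [ "$reserve_mb" -eq 0 ] && return 0
    # Refuse only if free space is less than memory size plus the reserve.
    [ "$free_mb" -ge $((mem_mb + reserve_mb)) ]
}

# Usage:
#   enough_space_for_dump 5000 4096 64 && echo "dump can be saved"
```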

$KDUMP_RUNLEVEL specifies the runlevel of the Kdump (recovery) kernel boot.
When $KDUMP_IMMEDIATE_REBOOT is set to yes, then the init script automatically
reboots after saving the vmcore. By default, the dumper uses KDUMP_RUNLEVEL=1
and KDUMP_IMMEDIATE_REBOOT=yes, in order to reduce the possible risk of disk
corruption in the recovery kernel environment.

If you want Kdump to run more complex jobs than set by the default dumper
configuration, set the name of the appropriate command or script to be run via
$KDUMP_TRANSFER, and change $KDUMP_RUNLEVEL and $KDUMP_IMMEDIATE_REBOOT.

For example, setting $KDUMP_TRANSFER="scp /proc/vmcore remote:/dump" and
KDUMP_RUNLEVEL=3 will make Kdump act like a netdump. You can set
KDUMP_IMMEDIATE_REBOOT=no to prevent the immediate reboot. This could be useful
to check the system over the network, for example.
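
Put together, the netdump-like setup could look as follows in
/etc/sysconfig/kdump. The destination "remote:/dump" is the placeholder from
the example above; replace it with a host and path of your own.

```shell
# /etc/sysconfig/kdump (fragment)
# Copy the dump to a remote host instead of using the default dumper.
KDUMP_TRANSFER="scp /proc/vmcore remote:/dump"
# Runlevel 3, so that networking is available in the recovery kernel.
KDUMP_RUNLEVEL=3
# Stay up after the transfer so the system can be checked over the network.
KDUMP_IMMEDIATE_REBOOT=no
```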

Note that the available memory size for the recovery kernel is limited.
Setting KDUMP_RUNLEVEL=5 (graphical login) is not recommended.


Initrd-based Dump Saving
========================

The problem with the procedure described above is that your root file system
(or whichever partition holds KDUMP_SAVEDIR) may be corrupted.  In that case
the script may not be able to mount the device, and cannot save the dump to
disk.

To guard against this, you can set KDUMP_DUMPDEV to an unused partition that
is large enough to hold the dump, i.e. larger than the system's main memory.
Before mounting the root file system, the init script writes the dump to
that device. After rebooting, the normal boot script saves the dump from that
device to KDUMP_SAVEDIR. Because the data is saved to disk, you can safely
turn off the computer and/or repair the file system with a suitable tool
(booting from a CD to do so is no problem).

After changing that value, you have to re-run mkinitrd for the kdump kernel,
or for all kernels.


Tuning parameters
=================

You can adjust the basic behavior of the Kdump script by editing the
/etc/sysconfig/kdump file. Edit the script values with the YaST runlevel System
Services editor, or manually edit the /etc/sysconfig/kdump file, and then
restart the kdump service.


Generic options
---------------

KDUMP_KERNELVER

        This is the kernel version string for the Kdump kernel; an example is
        "2.6.16-5-kdump". The init script will use a kernel named
        /boot/vmlinux-$KDUMP_KERNELVER. The kdump configuration is located in
        the /etc/sysconfig/kdump file.

        If you do not specify a version, then the init script will try to find a
        Kdump kernel with the same version number as the running kernel. Using
        the string "kdump" will default to the most recently installed Kdump
        kernel (suitable for x86, x86-64 and ppc64).  For ia64, keep this
        string empty to use the running kernel.


KDUMP_COMMANDLINE

        This sets the command string to be passed to the Kdump kernel. This
        will usually match the contents of the grub kernel line. An example is
        KDUMP_COMMANDLINE="ro root=LABEL=/".

        If you do not give a command line, then the default will be taken from
        /proc/cmdline.


KDUMP_COMMANDLINE_APPEND

        Set this variable if you only want to _append_ values to the default
        command line string. The string is also appended if KDUMP_COMMANDLINE
        is set.
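
How KDUMP_COMMANDLINE and KDUMP_COMMANDLINE_APPEND combine can be sketched as
follows. This only mirrors the description above; the real init script may
process the command line further. The function name and its arguments are
hypothetical.

```shell
# Sketch of the resulting command line.
#   $1 = KDUMP_COMMANDLINE (may be empty)
#   $2 = KDUMP_COMMANDLINE_APPEND (may be empty)
#   $3 = command line of the running kernel, i.e. the contents of /proc/cmdline
effective_cmdline() {
    commandline=$1 append=$2 running=$3
    if [ -z "$commandline" ]; then
        commandline=$running
    fi
    if [ -n "$append" ]; then
        commandline="$commandline $append"
    fi
    echo "$commandline"
}

# Usage:
#   effective_cmdline "$KDUMP_COMMANDLINE" "$KDUMP_COMMANDLINE_APPEND" \
#       "$(cat /proc/cmdline)"
```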


KEXEC_OPTIONS

        You can use this to pass additional arguments to kexec. For i386 and
        x86-64, you likely need to pass "--args-linux" here.


KDUMP_RUNLEVEL

        This is the runlevel that the Kdump kernel boots to. The default is
        "1".  To enable network support in the Kdump recovery environment, set
        this to "3".


KDUMP_IMMEDIATE_REBOOT

        This option specifies whether to reboot immediately after saving the
        core in the Kdump kernel. This option is ignored when KDUMP_DUMPDEV is
        set to a non-empty string. The default is "yes".


KDUMP_TRANSFER

        This is an option to execute a script or command to process or transfer
        the dump image. It can read the dump image either through /proc/vmcore
        or /dev/oldmem. An empty string will use the default dumper.


Options for the Default Dumper
------------------------------

KDUMP_SAVEDIR

        This option specifies the path to the directory where the dumps are
        saved. This can be

          - a local file, for example "file:///var/log/dump" (or, deprecated,
            just "/var/log/dump")
          - an FTP server, for example "ftp://user:password@host/var/log/dump"
          - an SSH server, for example "ssh://user@host/var/log/dump". Create
            a user that needs no password, or set up public key authentication
            for the root user of the system. Otherwise you have to enter the
            password on the serial console, as the VGA console may not work!
          - an NFS share, for example "nfs://server:/export:/var/log/dump"
          - a CIFS (SMB) share, for example
            "cifs://user:password@host:/share/var/log/dump"

        For the exact URL syntax, see the kdump-url_parser(8) manual page, or
        use the YaST2 kdump module to configure this if you're unsure.

        The default is "/var/log/dump". See also KDUMP_DUMPDEV if you want to
        save the dump to a raw device first, which helps if your root file
        system is corrupted.


KDUMP_DUMPLEVEL

        Determines the dump level. If KDUMP_DUMPLEVEL != 0, then makedumpfile
        is used to strip pages that may not be necessary for analysing. 0 means
        no stripping, and 31 is the maximum dump level, i.e. 0 produces the
        largest dump files and 31 the smallest.

        The following table from makedumpfile(8) shows what each dump level
        means:

                        dump | zero | cache|cache  | user | free
                       level | page | page |private| data | page
                      -------+------+------+-------+------+------
                           0 |      |      |       |      |
                           1 |  X   |      |       |      |
                           2 |      |  X   |       |      |
                           3 |  X   |  X   |       |      |
                           4 |      |  X   |  X    |      |
                           5 |  X   |  X   |  X    |      |
                           6 |      |  X   |  X    |      |
                           7 |  X   |  X   |  X    |      |
                           8 |      |      |       |  X   |
                           9 |  X   |      |       |  X   |
                          10 |      |  X   |       |  X   |
                          11 |  X   |  X   |       |  X   |
                          12 |      |  X   |  X    |  X   |
                          13 |  X   |  X   |  X    |  X   |
                          14 |      |  X   |  X    |  X   |
                          15 |  X   |  X   |  X    |  X   |
                          16 |      |      |       |      |  X
                          17 |  X   |      |       |      |  X
                          18 |      |  X   |       |      |  X
                          19 |  X   |  X   |       |      |  X
                          20 |      |  X   |  X    |      |  X
                          21 |  X   |  X   |  X    |      |  X
                          22 |      |  X   |  X    |      |  X
                          23 |  X   |  X   |  X    |      |  X
                          24 |      |      |       |  X   |  X
                          25 |  X   |      |       |  X   |  X
                          26 |      |  X   |       |  X   |  X
                          27 |  X   |  X   |       |  X   |  X
                          28 |      |  X   |  X    |  X   |  X
                          29 |  X   |  X   |  X    |  X   |  X
                          30 |      |  X   |  X    |  X   |  X
                          31 |  X   |  X   |  X    |  X   |  X
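
The table can be read as a bitmask: 1 = zero pages, 2 = cache pages,
4 = cache private pages, 8 = user data, 16 = free pages. The following sketch
decodes a level accordingly. The function name and output labels are invented
for this example, and the rule that 4 also strips plain cache pages is taken
from the table rows above.

```shell
# Print the page types that a given dump level (0-31) strips.
describe_dumplevel() {
    level=$1
    excluded=""
    [ $((level & 1)) -ne 0 ] && excluded="$excluded zero"
    # Value 4 (cache private) also strips plain cache pages, so test 2 or 4.
    [ $((level & 6)) -ne 0 ] && excluded="$excluded cache"
    [ $((level & 4)) -ne 0 ] && excluded="$excluded cache-private"
    [ $((level & 8)) -ne 0 ] && excluded="$excluded user"
    [ $((level & 16)) -ne 0 ] && excluded="$excluded free"
    # Drop the leading space; level 0 prints an empty line.
    echo "${excluded# }"
}

# Usage:
#   describe_dumplevel "$KDUMP_DUMPLEVEL"
```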


KDUMP_DUMPFORMAT

        This variable specifies the dump format.

        "ELF" has the advantage that it's a standard format and GDB can be used
        to analyze the dumps. The disadvantage is that the dump files are
        larger.

        "compressed" is the kdump compressed format (see makedumpfile(8)) that
        produces small dumps. However, only "crash" can analyse the dumps and
        you need to have makedumpfile installed (which you need anyway if you
        set KDUMP_DUMPLEVEL != 0).


KDUMP_DUMPDEV

        Specifies the dump device that is used for saving the dump with the
        kdump kernel. The dump device normally is a disk partition. Specifying
        a dump device is optional; if none is set, the dump is written to
        KDUMP_SAVEDIR when booting from the kdump kernel.

        If KDUMP_DUMPDEV points to a device file, the dump is written to that
        device when running the kdump kernel. The advantage over writing the
        dump to the file system immediately is that you don't have to mount
        the root file system (which may be corrupted!) just to write the dump.
        So if the root file system is corrupted, you have the chance to fix the
        file system manually and reboot the system without losing the dump
        information. On the first normal boot which is able to successfully
        mount the root file system, the dump is saved to KDUMP_SAVEDIR as
        usual.

        ** Warning ** The KDUMP_DUMPDEV is overwritten by kdump, so don't use
        it for saving any data. Also don't use the currently used swap
        partition.


KDUMP_KEEP_OLD_DUMPS

        This option specifies how many previous dumps are kept. If the number
        of saved dump files exceeds this number, the dumper removes older
        dumps.  You can prevent automatic removal by setting this to "0"
        (zero).  Set KDUMP_KEEP_OLD_DUMPS to "-2" if you want to delete all old
        dumps before saving the new dump. The default value is "5".
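
The three cases (a positive count, "0" and "-2") can be sketched as below.
This is one reading of the rule above, not the dumper's actual code; in
particular, whether exactly the newest N dumps survive is an assumption, and
dumps_to_remove is a made-up name.

```shell
# List the dump directories that would be removed, oldest first.
#   $1 = local dump directory
#   $2 = KDUMP_KEEP_OLD_DUMPS
dumps_to_remove() {
    savedir=$1 keep=$2
    # 0 disables automatic removal.
    [ "$keep" -eq 0 ] && return 0
    # -2 removes every old dump.
    [ "$keep" -eq -2 ] && keep=0
    total=$(ls -1 "$savedir" | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # Date-string names sort chronologically, so the first entries are oldest.
    ls -1 "$savedir" | sort | head -n "$excess"
}

# Usage:
#   dumps_to_remove /var/log/dump 5
```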


KDUMP_FREE_DISK_SIZE

        This specifies the minimum free disk space in megabytes of the dump
        partition. If the free disk space is less than the sum of this value
        and the memory size, then the default dumper will not save the vmcore
        file in order to prevent disk corruption. Setting this option to "0"
        (zero) forces the dumper to dump without checking the size. The default
        value is "64".


KDUMP_VERBOSE

        Determines if kdump uses verbose output. This value is a bitmask:

          1: kdump command line is written to system log when executing
             /etc/init.d/kdump
          2: progress is written to stdout while dumping 
          4: kdump command line is written to standard output when executing
             /etc/init.d/kdump
          8: Debugging for kdump transfer script
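
Because the value is a bitmask, individual flags are combined with a bitwise
OR. In this sketch the lowercase variable names and the verbose_has function
are purely illustrative; only the numeric values come from the list above.

```shell
verbose_syslog=1     # command line to system log
verbose_progress=2   # progress on stdout while dumping
verbose_stdout=4     # command line to standard output
verbose_transfer=8   # debugging for the transfer script

# Succeed if flag $2 is set in the KDUMP_VERBOSE value $1.
verbose_has() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Usage: log to syslog and show progress (1 | 2 = 3).
#   KDUMP_VERBOSE=$((verbose_syslog | verbose_progress))
```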


Machine-specific Notes
======================

IA64:

On SGI SN2 machines, kdump doesn't work when the VGA console
is active. To disable the VGA console, execute the following commands
in the EFI shell:

         Shell> set NoVGA 1
         Shell> reset


Dump Triggering Methods
=======================

This section talks about the various ways, other than a Kernel Panic, in which
Kdump can be triggered.  These methods will enable the user to invoke Kdump in
cases where the system is experiencing a hard hang.

 1. AltSysRq C

    On i386 and x86-64 machines, Kdump can be triggered with the
    combination of the 'Alt','SysRq' and 'C' keyboard keys.  This method
    will work only on directly attached consoles, and not on remote
    consoles.  In cases where the machine is in a hung state with
    interrupts disabled, AltSysRq C cannot be used.  If any kind of
    terminal access is still possible, the same result may be achieved
    from the shell command line like so:

       # echo c > /proc/sysrq-trigger

    On PowerPC machines, AltSysRq C can also be used to initiate Kdump if a
    directly attached console is available.  In addition, Kdump can be
    triggered via the Hardware Management Console (HMC) using the 'Ctrl', 'O'
    and 'C' keyboard keys.  In order to use the SysRq method for dump
    triggering, /proc/sys/kernel/sysrq needs to be enabled, which can be
    done as follows:

       # echo 1 > /proc/sys/kernel/sysrq

 2. Kernel OOPs

    If you want to generate a dump every time the kernel oopses, set the
    'Panic On OOPs' option as follows:

       # echo 1 > /proc/sys/kernel/panic_on_oops


 3. NMI (non-maskable interrupt) button

    In cases where the system is in a hung state and is not accepting keyboard
    interrupts, using the NMI button to trigger Kdump can be very useful.  An
    NMI button is present on most newer x86 and x86_64 machines.  Refer to the
    user guide or manual to locate the button, although it is often not well
    documented.  In most cases it is hidden behind a small hole on the front
    or back panel of the machine.  You could use a toothpick or some other
    non-conducting probe to press the button.

    For example, on the IBM X series 366 machine, the NMI button is located
    behind a small hole on the bottom center of the rear panel.

    To enable dump triggering with the NMI button, set the
    'unknown_nmi_panic' option as follows:

       # echo 1 > /proc/sys/kernel/unknown_nmi_panic

    When enabling unknown_nmi_panic, be careful not to enable the NMI
    watchdog feature as well; otherwise the system will panic.

 4. NMI WATCHDOG

    The NMI watchdog is a feature available in the x86 and x86_64 kernels which
    uses NMIs to monitor whether a CPU has locked up.  On i386 machines, the NMI
    watchdog can be enabled by passing nmi_watchdog=1 on the kernel command
    line.  On x86_64 machines, it is enabled by default.  To verify whether
    your system has been configured with the NMI watchdog, look at the NMI
    entry in /proc/interrupts.  If the count is greater than zero, the NMI
    watchdog has been configured; otherwise it has not.

    Please refer to Documentation/nmi_watchdog.txt in the kernel source for a
    more detailed description.

    Once this feature has been enabled in the kernel, any lockup will result
    in an OOPs message being generated, followed by Kdump being triggered.
    This also requires 'Panic On OOPs' to be enabled as explained in method 2
    above.

    Please refrain from simultaneously enabling 'nmi_watchdog' and setting
    /proc/sys/kernel/unknown_nmi_panic, as this would result in a Kernel Panic
    from legitimate NMIs generated by the nmi_watchdog.


 5. PowerPC specific methods

    On IBM PowerPC machines, the following methods to issue a soft reset can be
    used to trigger Kdump.  On SLES10 systems, XMON (debugger) is turned off by
    default.  To enable XMON, boot the kernel with the 'xmon=on' option.  With
    XMON enabled, issuing a soft reset drops the user to the XMON prompt, where
    typing 'X' triggers Kdump.  If XMON is not enabled, a soft reset triggers
    Kdump directly.

     a) HMC

        The Hardware Management Console (HMC) available on Power4 and Power5
        machines allows partitions to be reset remotely.  This is especially
        useful in hang situations where the system is not accepting any
        keyboard input.

        Once you have HMC configured, the following steps will enable you to
        trigger Kdump via a soft reset:

        On Power4
          Using GUI
            * In the right pane, right click on the partition you wish to
              dump. 
            * Select "Operating System->Reset".
            * Select "Soft Reset".
            * Select "Yes".

          Using HMC Commandline
            # reset_partition -m <machine> -p <partition> -t soft

        On Power5
          Using GUI
            * In the right pane, right click on the partition you wish to
              dump. 
            * Select "Restart Partition".
            * Select "Dump".
            * Select "OK".

          Using HMC Commandline
            # chsysstate -m <managed system name> -n <lpar name> \
                -o dumprestart -r lpar

     b) Blade Management Console for Blade Center

        To initiate a dump operation, go to Power/Restart option under "Blade
        Tasks" in the Blade Management Console.  Select the corresponding blade
        for which you want to initiate the dump and then click "Restart blade
        with NMI".  This will issue a soft reset.

     c) Control Panel function for a standalone Power5 machine

        A standalone machine is one which does not have any LPARs configured
        and also does not have a HMC available.  In such cases the Control
        Panel, usually located on the front panel of the machine (please refer
        to the User guide of the specific model for details) can be used for
        dump triggering in case the system has a hard hang.

        The control panel provides many functions for System Management
        purposes; Function 22 is meant for invoking a Partition dump.  This
        function is available only in the Manual operating mode.

        To check if the system is operating in manual mode,

         * Select function 1 on the panel
         * Press enter
         * Read the Operating mode from the panel display
         * If it is not 'M', then use function 2 to set it (see below)

        To set manual mode:

         * Select function 2 on the panel
         * Press enter
         * The current OS IPL type is displayed with a pointer
         * Press enter to move to the Operating mode
         * Use increment, decrement buttons to change the mode to M
         * Press enter

        To trigger the dump:

         * Select function 22 on the panel
         * Press enter
         * Select function 22 on the panel
         * Press enter

        Invoking function 22 twice will issue a soft reset to the machine.
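
Several of the triggering methods above depend on kernel settings under
/proc/sys/kernel and on the NMI count in /proc/interrupts. The following
sketch shows one way to inspect them before relying on a method. Both function
names are made up for this example, and the file argument to nmi_count exists
only so the function can be tried on sample data.

```shell
# Succeed if the given /proc/sys file holds a non-zero value.
sysctl_enabled() {
    [ -r "$1" ] && [ "$(cat "$1")" != "0" ]
}

# Print the total NMI count over all CPUs (0 if there is no NMI line).
nmi_count() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) sum += $i
         }
         END { print sum + 0 }' "${1:-/proc/interrupts}"
}

# Usage:
#   sysctl_enabled /proc/sys/kernel/sysrq || echo "SysRq is disabled"
#   sysctl_enabled /proc/sys/kernel/panic_on_oops || echo "no panic on OOPs"
#   if sysctl_enabled /proc/sys/kernel/unknown_nmi_panic &&
#      [ "$(nmi_count)" -gt 0 ]; then
#       echo "unknown_nmi_panic is set while the NMI watchdog seems active"
#   fi
```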


Dump Analysis
=============

Dump analysis can be performed using GDB or the Crash utility. The Crash
utility is included in the crash RPM package. You must install a debug-info
kernel matching the version of the system kernel (of the system where the dump
was collected) on the system where the analysis is to be performed. The
debug-info kernel provides symbol and type information that Crash and GDB use.
You can find kernel debug information RPMs on the SUSE support Web site.
Alternately, you can build a debug-info kernel from source by enabling the
CONFIG_DEBUG_INFO kernel configuration option.

Even if you install kernel-debuginfo, you need to uncompress the kernel image
first. This depends on the architecture on which your system is running. If you
don't know, just run "uname -i" to get the architecture.

On i386, x86_64, s390 and s390x, you have to unpack the kernel image:

   $ gunzip /boot/vmlinux-<version>.gz

On IA64, the default kernel image is already a gzip'ed vmlinux image. Run

   $ zcat /boot/vmlinuz-<version> > /boot/vmlinux-<version>

On ppc64, you don't have to do anything, as the boot loader already loads
the vmlinux image.
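
If you prefer to keep the compressed image in place, you can write the
uncompressed copy to a separate file instead. This is a sketch only; the
function name is hypothetical and both paths are passed in explicitly.

```shell
# Write an uncompressed copy of a gzip'ed kernel image, leaving the
# compressed file untouched.
#   $1 = compressed image, $2 = target file
unpack_kernel() {
    compressed=$1 target=$2
    # Nothing to do if the target already exists.
    [ -f "$target" ] && return 0
    # Unpack to a temporary name first so a failed run leaves no partial file.
    gunzip -c "$compressed" > "$target.tmp" && mv "$target.tmp" "$target"
}

# Usage (replace <version> with your kernel version):
#   unpack_kernel /boot/vmlinux-<version>.gz /boot/vmlinux-<version>
```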

The symbol information in the debug-info kernel may differ from the running
kernel. Therefore, when running crash against a vmcore, you should specify
both the System.map file and the debug-info kernel.  For example, to run crash
against a vmcore, use the following command line:

    $ crash /boot/System.map-<version> /boot/vmlinux-<version> vmcore

Where:
  /boot/System.map-<version> -- The map file matching the kernel being
                                analyzed.
  /boot/vmlinux-<version>    -- The matching kernel.
  vmcore                     -- The crash dump.
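
A small wrapper can check that all three files exist before crash is started.
The sketch below only prints the command line; the function name and the
optional boot directory argument are assumptions made for this example.

```shell
# Print the crash command line for a kernel version and a dump file.
#   $1 = kernel version, $2 = vmcore file, $3 = boot directory (default /boot)
crash_command() {
    version=$1 vmcore=$2 bootdir=${3:-/boot}
    for f in "$bootdir/System.map-$version" "$bootdir/vmlinux-$version" \
             "$vmcore"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    echo "crash $bootdir/System.map-$version $bootdir/vmlinux-$version $vmcore"
}

# Usage:
#   crash_command "$(uname -r)" /var/log/dump/<date-string>/vmcore
```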


GDB Helper Script
=================

The gdb-kdump script is provided to simplify use of GDB on dump images. The
usage is "gdb-kdump [vmcore]".

The argument is the vmcore dump image to analyze. If you do not give an
argument, then the latest dump image will be taken. The script starts GDB with
the vmlinux of the currently running kernel. The script assumes that the
vmlinux file is at /boot/vmlinux-$kernel. If the script finds only a
gzip-compressed file, the file is automatically uncompressed.

Note that you also need to install kernel-versionnumber-debuginfo, which
provides the debug symbols. gdb-kdump also reads some useful macros for the
Kdump image,
originally provided in /usr/src/linux/Documentation/kdump, at startup. The
following macros then become available: bttnobp, btt, btpid, trapinfo, and
dmesg. See the help topic of each command in GDB for details.

