Collecting Linux kernel crash core dumps with kdump

kdump is a Linux kernel feature that automatically creates a core dump when the kernel crashes. The core dump contains the entire contents of system memory at the time of the crash and can be used to diagnose and troubleshoot kernel bugs. Most users do not need this feature.

Requirements

The two most recent core dumps are kept on the filesystem, and each can be as large as the system memory. To collect core dumps reliably, /var/crash should therefore have at least three times the total system memory available: room for the two retained dumps plus a new one being written. If you also choose to install the kernel debug packages when enabling kdump, as described in the next section, allow additional disk space for one debug package per kernel version you keep installed.
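The sizing rule above can be checked before enabling kdump. The following is a minimal sketch, assuming a POSIX shell with awk and df available; the function names are illustrative and are not part of the bonding package.

```shell
#!/bin/sh
# Sketch: check that a dump directory has room for three full-memory dumps.
# Function names are hypothetical; the 3x factor is the guidance given above.

# Print total system memory in KiB, read from a meminfo-style file.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print the available space in KiB on the filesystem holding the given path.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when the available space (KiB) is at least 3x the memory size (KiB).
has_room_for_dumps() {
    mem_kb=$1
    avail=$2
    [ "$avail" -ge $((mem_kb * 3)) ]
}

# Example usage (/var/crash must already exist):
#   if has_room_for_dumps "$(mem_total_kb)" "$(avail_kb /var/crash)"; then
#       echo "enough space for core dumps"
#   else
#       echo "not enough space for core dumps"
#   fi
```

The check is split into small functions so that the comparison can be exercised with fixed numbers, independently of the machine it runs on.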

Usage

Two new scripts have been introduced as part of the bonding package in 6.1 – /usr/share/bonding/enable-kdump and /usr/share/bonding/disable-kdump – which enable and disable kdump, respectively.

The enable-kdump script installs the required packages and configures the system so that kdump is enabled the next time the kernel boots. It modifies /etc/default/kdump-tools and /etc/default/grub and prompts you to confirm the changes to grub. It also prompts you about installing the kernel debug package, which must be present to troubleshoot a core dump. The debug package is relatively large: more than 3 GB when installed.

After enabling or disabling kdump, a reboot is required before any changes take effect.

If the kernel crashes, a core dump is written under the /var/crash directory. Afterwards, the machine should reboot.

On bootup, all but the two most recent dumps are removed from /var/crash.
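To illustrate the retention rule, the sketch below lists the dump directories that would fall outside the two most recent. It is not the script the package runs at boot. It assumes that dump directories carry fixed-width numeric timestamp names, so that sorting by name also sorts by age, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the dump directories that are NOT among the newest $keep.
# Assumes fixed-width numeric directory names (timestamps), so a plain
# lexical sort orders them oldest first. Nothing is deleted here.
dumps_to_prune() {
    dir=$1
    keep=${2:-2}
    for d in "$dir"/[0-9]*/; do
        # Skip the unexpanded pattern when there are no matches.
        [ -d "$d" ] && basename "$d"
    done | sort | awk -v keep="$keep" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Example usage:
#   dumps_to_prune /var/crash 2
```

Printing the candidates rather than removing them keeps the sketch safe to run on a live system.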

Verify that kexec is enabled and loaded

We can confirm that kdump is in use by reading /sys/kernel/kexec_crash_loaded; a value of 1 means the crash kernel is loaded:

# cat /sys/kernel/kexec_crash_loaded
1
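The same check can be wrapped in a small function for use in monitoring or pre-flight scripts. This is a minimal sketch; the function name is illustrative, and the optional path argument exists only so the logic can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: succeed only when the crash kernel is loaded.
# The flag file defaults to the sysfs path read above.
crash_kernel_loaded() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Example usage:
#   if crash_kernel_loaded; then
#       echo "kdump is loaded"
#   else
#       echo "kdump is NOT loaded; a crash would not produce a core dump"
#   fi
```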

Provoking a kernel crash

This step is not required to enable kdump. It is for testing only and should not be attempted while customers are using this node.

We can deliberately trigger a kernel crash. If kdump is enabled, it will create a core dump before the machine reboots.

Note

In the event of a kernel crash, induced or otherwise, every service on the machine stops working until the core dump has been created and the machine has rebooted.

To crash the kernel, execute the following:

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

If kdump is loaded, a core dump should be written to the /var/crash directory. We can verify this after the machine reboots:

# ls -lh /var/crash/
total 8.0K
drwxr-xr-x 2 root root 4.0K May 18 00:46 201705180045
-rw-r--r-- 1 root root  306 May 18 00:46 kexec_cmd