# Linux Crash Dumping

Official builds of Chrome support crash dumping and reporting using the Google
crash servers. This is a guide to how this works.

[TOC]

## Breakpad

Breakpad is an open source library which we use for crash reporting across all
three platforms (Linux, Mac and Windows). For Linux, a substantial amount of
work was required to support cross-process dumping. At the time of writing, this
code is forked from the upstream breakpad repo. While this situation
remains, the forked code lives in `breakpad/linux`. The upstream repo is
mirrored in `breakpad/src`.

The code currently supports i386 only. Getting x86-64 to work should require
only a minor amount of work.

### Minidumps

Breakpad deals in a file format called 'minidumps'. This is a Microsoft format
and thus is defined by in-memory structures which are dumped, raw, to disk. The
main header file for this file format is
`breakpad/src/google_breakpad/common/minidump_format.h`.

At the top level, the minidump file format is a list of key-value pairs. Many of
the keys are defined by the minidump format and contain cross-platform
representations of stacks, threads etc. For Linux we also define a number of
custom keys containing `/proc/cpuinfo`, `lsb-release` etc. These are defined in
`breakpad/linux/minidump_format_linux.h`.
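
To make the "list of key-value pairs" concrete, here is a minimal Python sketch that packs and reads a minidump-shaped stream directory. The field layout (a 32-byte header followed by 12-byte directory entries of stream type, size and file offset) follows our reading of `minidump_format.h` and should be checked against that header. The helper names and the stream-type numbers in the usage example are placeholders, not real Breakpad identifiers.

```python
import struct

# Header fields: signature, version, stream_count, stream_directory_rva,
# checksum, time_date_stamp, flags (layout assumed from minidump_format.h).
HEADER = struct.Struct("<IIIIIIQ")
# Directory entry fields: stream_type, data_size, rva (offset of the payload).
DIRECTORY_ENTRY = struct.Struct("<III")
SIGNATURE = 0x504D444D  # The bytes "MDMP" when stored little-endian.


def build_minidump(streams):
    """Pack {stream_type: payload_bytes} into a minimal minidump-shaped blob."""
    directory_rva = HEADER.size
    payload_rva = directory_rva + DIRECTORY_ENTRY.size * len(streams)
    directory = b""
    payloads = b""
    for stream_type, payload in streams.items():
        directory += DIRECTORY_ENTRY.pack(
            stream_type, len(payload), payload_rva + len(payloads))
        payloads += payload
    # The version, checksum, timestamp and flags are left as zero in this sketch.
    header = HEADER.pack(SIGNATURE, 0, len(streams), directory_rva, 0, 0, 0)
    return header + directory + payloads


def read_streams(blob):
    """Return {stream_type: payload_bytes} by walking the stream directory."""
    signature, _version, count, directory_rva, *_rest = HEADER.unpack_from(blob, 0)
    if signature != SIGNATURE:
        raise ValueError("not a minidump")
    streams = {}
    for index in range(count):
        offset = directory_rva + index * DIRECTORY_ENTRY.size
        stream_type, size, rva = DIRECTORY_ENTRY.unpack_from(blob, offset)
        streams[stream_type] = blob[rva:rva + size]
    return streams
```

For example, `read_streams(build_minidump({1: b"threads", 2: b"cpuinfo text"}))` returns the same mapping that was packed. A real dump uses the stream-type constants from the two header files named above.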

### Catching exceptions

Exceptional conditions (such as invalid memory references, floating point
exceptions, etc.) are signaled by synchronous signals to the thread which caused
them. Synchronous signals are always delivered to the thread which triggered
them, as opposed to asynchronous signals, which can be handled by any thread in
a thread-group which hasn't masked that signal.

All the signals that we wish to catch are synchronous except SIGABRT, and we can
always arrange to send SIGABRT to a specific thread. Thus, we find the crashing
thread by looking at the current thread in the signal handler.

The signal handlers run on a pre-allocated stack in case the crash was triggered
by a stack overflow.

Once we have started handling the signal, we have to assume that the address
space is compromised. In order not to fall prey to this and crash (again) in the
crash handler, we observe some rules:

1.  We don't enter the dynamic linker. This, observably, can trigger crashes in
    the crash handler. Unfortunately, entering the dynamic linker is very easy
    and can be triggered by calling a function from a shared library whose
    resolution hasn't been cached yet. Since we can't know which functions have
    been cached, we avoid calling any of these functions, with one exception:
    `memcpy`. Since the compiler can emit calls to `memcpy`, we can't really
    avoid it.
1.  We don't allocate memory via malloc as the heap may be corrupt. Instead we
    use a custom allocator (in `breakpad/linux/memory.h`) which gets clean pages
    directly from the kernel.

In order to avoid calling into libc we have a couple of header files which wrap
the system calls (`linux_syscall_support.h`) and reimplement a tiny subset of
libc (`linux_libc_support.h`).
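
The idea behind the custom allocator can be sketched as a bump allocator over anonymous `mmap` pages. The sketch below is in Python purely for illustration, and the Python runtime itself still uses the normal heap, so it shows only the allocation strategy and not the crash-safety property. The class name `PageAllocator` and the 8-byte alignment are assumptions of this sketch, not details taken from the Breakpad source.

```python
import mmap


class PageAllocator:
    """Bump allocator over anonymous pages; individual blocks are never freed."""

    def __init__(self):
        self._pages = []      # Keeps every mapping alive.
        self._current = None  # Mapping currently being carved up.
        self._offset = 0      # Next free byte within the current mapping.

    def alloc(self, size):
        """Return a writable, zero-filled view of at least `size` bytes."""
        size = (size + 7) & ~7  # Round up to keep 8-byte alignment.
        if self._current is None or self._offset + size > len(self._current):
            # Ask the kernel for fresh zeroed pages, rounded up to whole pages.
            wanted = max(size, mmap.PAGESIZE)
            length = -(-wanted // mmap.PAGESIZE) * mmap.PAGESIZE
            self._current = mmap.mmap(-1, length)
            self._pages.append(self._current)
            self._offset = 0
        start = self._offset
        self._offset += size
        return memoryview(self._current)[start:start + size]
```

Because nothing is ever freed individually, the allocator keeps no free lists or other bookkeeping that a corrupted heap could damage. All pages are simply released when the dumping process exits.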

### Self dumping

The simple case occurs when the browser process crashes. Here we catch the
signal and `clone` a new process to perform the dumping. We have to use a new
process because a process cannot ptrace itself.

The dumping process then ptrace attaches to all the threads in the crashed
process and writes out a minidump to `/tmp`. This is generic breakpad code.
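
The "inspect the crashed process from a separate process" pattern can be sketched roughly as follows. This is a simplified illustration only: it uses `fork` rather than `clone`, and it merely lists the parent's threads from `/proc/<pid>/task` instead of attaching to them with ptrace. The function names are hypothetical.

```python
import os


def list_thread_ids(pid):
    """Return the thread IDs the kernel lists under /proc/<pid>/task."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


def snapshot_threads_from_child():
    """Fork a helper that inspects this (the "crashed") process from outside."""
    crashed_pid = os.getpid()
    read_end, write_end = os.pipe()
    child = os.fork()
    if child == 0:
        # Helper process: enumerate the parent's threads and report back.
        status = 1
        try:
            os.close(read_end)
            tids = list_thread_ids(crashed_pid)
            os.write(write_end, ",".join(str(t) for t in tids).encode())
            status = 0
        finally:
            os._exit(status)
    # Original process: wait for the helper's report.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    os.waitpid(child, 0)
    return [int(t) for t in b"".join(chunks).decode().split(",") if t]
```

A real dumper would attach to each listed thread with ptrace and read its registers and stack at the point where this sketch only records the IDs.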

Then we reach the Chrome specific parts in `chrome/app/breakpad_linux.cc`. Here
we construct another temporary file and write a MIME wrapping of the crash dump
ready for uploading. We then fork off `wget` to upload the file. Based on Debian
popcon, `wget` is very commonly installed (much more so than `libcurl`) and
`wget` handles the HTTPS gubbins for us.
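
As a rough illustration of what a "MIME wrapping" of a dump looks like, the sketch below builds a `multipart/form-data` body in Python. The helper name `mime_wrap`, the form field names and the file-part name are assumptions made for this example. The actual fields written by `breakpad_linux.cc` should be taken from that file.

```python
import uuid


def mime_wrap(fields, dump_bytes, boundary=None):
    """Return (content_type, body) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # One text part per metadata field (for example product and version).
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"\r\n{value}\r\n".encode())
    # The raw minidump goes in as a binary file part (field name assumed).
    parts.append(
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="upload_file_minidump"; '
        'filename="dump"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + dump_bytes + b"\r\n")
    # The closing delimiter ends the body.
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

Once the body is written to a temporary file, an uploader such as `wget` can send it with its `--post-file` option and a matching `Content-Type` header passed via `--header`.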

### Renderer dumping

In the case of a crash in the renderer, we don't want the renderer handling the
crash dumping itself. In the future we will sandbox the renderer, and giving it
the authority to crash dump itself would be too much.

Thus, we split the crash dumping in two parts: the gathering of information,
which is done in process, and the external dumping, which is done out of process.
In the case above, the latter half was done in a `clone`d child. In this case,
the browser process handles it.

When renderers are forked off, they have a `UNIX DGRAM` socket in file
descriptor 4. The signal handler then calls into Chrome specific code
(`chrome/renderer/render_crash_handler_linux.cc`) when it would otherwise
`clone`. The Chrome specific code sends a datagram to the socket which contains:

*   Information which is only available to the signal handler (such as the
    `ucontext` structure).
*   A file descriptor to a pipe which it then blocks on reading from.
*   A `CREDENTIALS` structure giving its PID.

The kernel enforces that the renderer isn't lying in the `CREDENTIALS` structure
so it can't ask the browser to crash dump another process.
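
The shape of this exchange can be sketched in Python using the standard Linux socket options `SO_PASSCRED`, `SCM_CREDENTIALS` and `SCM_RIGHTS`. The function names and the message payload are hypothetical. The sketch assumes the renderer passes the write end of a pipe and keeps the read end to block on, and that the kernel supplies the sender's credentials because the receiving socket has `SO_PASSCRED` enabled.

```python
import os
import socket
import struct

UCRED = struct.Struct("iII")  # struct ucred: pid, uid, gid.
FD = struct.Struct("i")       # A single passed file descriptor.


def make_crash_channel():
    """Return (browser_end, renderer_end) of a UNIX datagram socket pair."""
    browser_end, renderer_end = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_DGRAM)
    # Ask the kernel to attach the sender's credentials to each message.
    browser_end.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
    return browser_end, renderer_end


def send_crash_notice(sock, context_bytes, wake_fd):
    """Renderer side: send handler-only data plus a pipe descriptor."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, FD.pack(wake_fd))]
    sock.sendmsg([context_bytes], ancillary)


def recv_crash_notice(sock):
    """Browser side: return (context_bytes, sender_pid, wake_fd)."""
    space = socket.CMSG_SPACE(UCRED.size) + socket.CMSG_SPACE(FD.size)
    data, ancdata, _flags, _addr = sock.recvmsg(4096, space)
    pid = wake_fd = None
    for level, kind, payload in ancdata:
        if level != socket.SOL_SOCKET:
            continue
        if kind == socket.SCM_CREDENTIALS:
            # Filled in by the kernel, so the sender cannot forge the PID.
            pid, _uid, _gid = UCRED.unpack(payload[:UCRED.size])
        elif kind == socket.SCM_RIGHTS:
            (wake_fd,) = FD.unpack(payload[:FD.size])
    return data, pid, wake_fd
```

After receiving the notice, the browser side would ptrace the process identified by `pid` and then write a single byte to `wake_fd` to release the blocked renderer.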

The browser then performs the ptrace and minidump writing which would otherwise
be performed in the `clone`d process and does the MIME wrapping and uploading as
normal.

Once the browser has finished getting information from the crashed renderer via
ptrace, it writes a byte to the file descriptor which was passed from the
renderer. The renderer then wakes up (because it was blocking on reading from
the other end) and rethrows the signal to itself. It then appears to crash
'normally' and other parts of the browser notice the abnormal termination and
display the sad tab.

## How to test Breakpad support in Chromium

*   Build Chromium as normal.
*   Run the browser with the environment variable
    [CHROME_HEADLESS=1](https://crbug.com/19663). This enables crash dumping but
    prevents crash dumps from being uploaded and deleted.

    ```shell
    env CHROME_HEADLESS=1 ./out/Debug/chrome-wrapper
    ```
*   Visit the special URL `chrome://crash` to trigger a crash in the renderer
    process.
*   A crash dump file should appear in the directory
    `~/.config/chromium/Crash Reports`.