Fix signal propagation logic for Linux/Android exception handler.

The current code is relying on info->si_pid to figure out whether
the exception handler was triggered by a signal coming from the kernel
(that will re-trigger until the cause that triggered the signal has
been cleared) or from user-space e.g., kill -SIGNAL pid, which will NOT
automatically re-trigger in the next signal handler in the chain.
While the intentions are good (manually re-triggering user-space
signals), the current implementation mistakenly looks at the si_pid
field in siginfo_t, assuming that it is coming from the kernel if
si_pid == 0.
This is wrong. siginfo_t, in fact, is a union and si_pid is meaningful
only for userspace signals. For signals originated by the kernel,
instead, si_pid overlaps with si_addr (the faulting address).
As a matter of facts, the current implementation is mistakenly
re-triggering the signal using tgkill for most of the kernel-space
signals (unless the fault address is exactly 0x0).
This is not completelly correct for the case of SIGSEGV/SIGBUS. The
next handler in the chain will stil see the signal, but the |siginfo|
and the |context| arguments of the handler will be meaningless
(retriggering a signal with tgkill doesn't preserve them).
Therefore, if the next handler in the chain expects those arguments
to be set, it will fail.
Concretelly, this is causing problems to WebView. In some rare
circumstances, the next handler in the chain is a user-space runtime
which does SIGSEGV handling to implement speculative null pointer
managed exceptions (see as an example
http://www.mono-project.com/docs/advanced/runtime/docs/exception-handling/)

The fix herein proposed consists in using the si_code (see SI_FROMUSER
macros) to determine whether a signal is coming form the kernel
(and therefore just re-establish the next signal handler) or from
userspace (and use the tgkill logic).

Repro case:
This issue is visible in Chrome for Android with this simple repro case:
- Add a non-null pointer dereference in the codebase:
  *((volatile int*)0xbeef) = 42
Without this change: the next handler (the libc trap) prints:
  F/libc  (  595): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x487
  where 0x487 is actually the PID of the process (which is wrong).
With this change: the next handler prints:
  F/libc  (  595): Fatal signal 11 (SIGSEGV), code 1, fault addr 0xbeef
  which is the correct answer.

BUG=chromium:481937
R=mark@chromium.org

Review URL: https://breakpad.appspot.com/6844002

git-svn-id: http://google-breakpad.googlecode.com/svn/trunk@1454 4c0a9323-5329-0410-9bdc-e9ce6186880e
This commit is contained in:
primiano@chromium.org 2015-04-30 09:12:54 +00:00
parent aa75fa5d4e
commit 69b745aa74

View file

@ -365,7 +365,8 @@ void ExceptionHandler::SignalHandler(int sig, siginfo_t* info, void* uc) {
pthread_mutex_unlock(&g_handler_stack_mutex_); pthread_mutex_unlock(&g_handler_stack_mutex_);
if (info->si_pid || sig == SIGABRT) { // info->si_code <= 0 iff SI_FROMUSER (SI_FROMKERNEL otherwise).
if (info->si_code <= 0 || sig == SIGABRT) {
// This signal was triggered by somebody sending us the signal with kill(). // This signal was triggered by somebody sending us the signal with kill().
// In order to retrigger it, we have to queue a new signal by calling // In order to retrigger it, we have to queue a new signal by calling
// kill() ourselves. The special case (si_pid == 0 && sig == SIGABRT) is // kill() ourselves. The special case (si_pid == 0 && sig == SIGABRT) is