Posted to issues@kudu.apache.org by "Todd Lipcon (Jira)" <ji...@apache.org> on 2019/12/19 22:20:00 UTC

[jira] [Commented] (KUDU-3030) Crash in tcmalloc stack unwinder

    [ https://issues.apache.org/jira/browse/KUDU-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000454#comment-17000454 ] 

Todd Lipcon commented on KUDU-3030:
-----------------------------------

I investigated this a bit. We do already install libunwind, and use that for things like '/stacks' and glog. However, it seems like tcmalloc has the following behavior:

- the autoconf script tries to check whether frame pointers are omitted by default (as they usually are on x86)
- it also checks if libunwind is present, and if so, compiles a libunwind-based stack walker
- at runtime, if both the fp-based and the libunwind-based stack walkers are compiled in, it will prefer libunwind on systems where it has detected that frame pointers are omitted.

So, in theory, since we install libunwind before building tcmalloc in thirdparty, we should be selecting libunwind by default. However, we set CXXFLAGS to '-fno-omit-frame-pointer' in our thirdparty build, and that flag also applies to the checks run by the {{configure}} script. So, when {{configure}} checked whether frame pointers are omitted by default, it concluded that they are _not_ omitted, and thus configured tcmalloc to prefer the fp-based unwinder.
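
To make the interaction concrete, here is a toy model of the selection logic as described above. It is not tcmalloc's actual code: the names (BuildProbe, ProbeOmitsFramePointers, DefaultUnwinder) are made up for illustration, and the real decision is spread across the configure script and the runtime.

```cpp
// Toy model only; all names are hypothetical and not taken from tcmalloc.
enum class Unwinder { kFramePointer, kLibunwind };

// What the configure step concluded about the build environment.
struct BuildProbe {
  bool have_libunwind;          // libunwind was found when configuring
  bool frame_pointers_omitted;  // result of the "omitted by default?" check
};

// The check compiles a test program with the ambient CXXFLAGS, so a forced
// -fno-omit-frame-pointer hides the compiler's real default.
constexpr bool ProbeOmitsFramePointers(bool compiler_default_omits,
                                       bool cxxflags_force_frame_pointers) {
  return compiler_default_omits && !cxxflags_force_frame_pointers;
}

// libunwind is preferred only if it is available and the probe decided
// that frame pointers are omitted.
constexpr Unwinder DefaultUnwinder(BuildProbe probe) {
  return (probe.have_libunwind && probe.frame_pointers_omitted)
             ? Unwinder::kLibunwind
             : Unwinder::kFramePointer;
}
```

In this model, DefaultUnwinder({true, ProbeOmitsFramePointers(true, false)}) picks libunwind, while passing true for the second argument (our thirdparty CXXFLAGS case) flips the result to the fp-based unwinder even though libunwind is installed.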

A couple ways we can fix this:
(1) stop compiling tcmalloc with -fno-omit-frame-pointer. This should get it to prefer libunwind.
(2) add some capability in tcmalloc's configuration to force it to use libunwind even when tcmalloc itself is built with frame pointers.
(3) at runtime, it seems like we could set TCMALLOC_STACKTRACE_METHOD=libunwind early at startup, and it would prefer libunwind.
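
A minimal sketch of option (3), assuming the variable only needs to be in the environment before tcmalloc picks its unwinder. The helper names are hypothetical, and I have not verified exactly when tcmalloc reads the variable, so calling this from main() may or may not be early enough.

```cpp
#include <stdlib.h>  // setenv, getenv (POSIX)
#include <string>

// Hypothetical helper: asks tcmalloc to use the libunwind-based unwinder by
// setting TCMALLOC_STACKTRACE_METHOD. With overwrite == false, a value that
// an operator already set in the environment is left alone.
bool PreferLibunwindStackTraces(bool overwrite = false) {
  return setenv("TCMALLOC_STACKTRACE_METHOD", "libunwind",
                overwrite ? 1 : 0) == 0;
}

// Returns the current value of the variable, or an empty string if unset.
std::string StackTraceMethodFromEnv() {
  const char* value = getenv("TCMALLOC_STACKTRACE_METHOD");
  return value != nullptr ? std::string(value) : std::string();
}
```

If setting it from inside the process turns out to be too late, the same effect could be had by exporting the variable in whatever script launches the daemon.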

If we find that the libunwind-based unwinder is too slow for the heap sampling use case, we could also try to patch tcmalloc's FP unwinder to be safer. One approach is to call write() on each address before reading it, since write() fails with EFAULT instead of crashing if the address is bad. Another approach would be to set a threadlocal while we're in the stack trace routine; if we catch a SEGV with this threadlocal set, we could ignore it and abort the stack tracing.
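
The write()-based probe could look roughly like the sketch below. This is not a patch against tcmalloc: AddressProbe is a hypothetical class, and it assumes Linux, where writing from an unmapped buffer into a pipe fails with EFAULT. A pipe is used rather than /dev/null because the kernel has to actually copy the bytes for the check to mean anything.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: checks whether [addr, addr + len) can be read without
// faulting, by asking the kernel to copy it into a pipe.
class AddressProbe {
 public:
  AddressProbe() {
    if (pipe(fds_) != 0) {
      fds_[0] = fds_[1] = -1;
      return;
    }
    // Non-blocking on both ends, so a full or empty pipe never hangs us.
    for (int fd : fds_) {
      int flags = fcntl(fd, F_GETFL);
      fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }
  }
  ~AddressProbe() {
    if (fds_[0] >= 0) {
      close(fds_[0]);
      close(fds_[1]);
    }
  }
  AddressProbe(const AddressProbe&) = delete;
  AddressProbe& operator=(const AddressProbe&) = delete;

  // Intended for small ranges, such as a saved frame pointer and return
  // address. Any failure (EFAULT or otherwise) is treated as "not readable".
  bool IsReadable(const void* addr, size_t len) const {
    if (fds_[1] < 0) return false;
    ssize_t n;
    do {
      n = write(fds_[1], addr, len);
    } while (n < 0 && errno == EINTR);
    if (n < 0) return false;
    // Drain what we just wrote so the pipe does not fill up over time.
    char buf[64];
    while (read(fds_[0], buf, sizeof(buf)) > 0) {
    }
    return static_cast<size_t>(n) == len;
  }

 private:
  int fds_[2];
};
```

An unwinder would call IsReadable(frame, 2 * sizeof(void*)) before dereferencing each frame pointer and stop walking on the first false. This costs a couple of syscalls per frame, and the memory could still be unmapped between the probe and the read, so it narrows the window rather than closing it.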


> Crash in tcmalloc stack unwinder
> --------------------------------
>
>                 Key: KUDU-3030
>                 URL: https://issues.apache.org/jira/browse/KUDU-3030
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.11.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> We recently saw a crash where the tcmalloc heap profiler was trying to unwind the stack and ended up accessing invalid memory. The issue here is that tcmalloc is relying on frame pointers for stack unwinding, but this particular stack trace was going through libstdc++, which was installed on the system and doesn't have frame pointers. "Usually" this works OK, but when we get unlucky, we can crash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)