Posted to dev@mesos.apache.org by "Niklas Quarfot Nielsen (JIRA)" <ji...@apache.org> on 2013/12/07 00:55:35 UTC
[jira] [Created] (MESOS-873) Crash in os::killtree on Mavericks
Niklas Quarfot Nielsen created MESOS-873:
--------------------------------------------
Summary: Crash in os::killtree on Mavericks
Key: MESOS-873
URL: https://issues.apache.org/jira/browse/MESOS-873
Project: Mesos
Issue Type: Bug
Components: libprocess
Environment: Mac OS X Mavericks
Reporter: Niklas Quarfot Nielsen
This is a crash we experienced on a Mavericks installation. We haven't been able to reproduce it on other machines since, but we did manage to capture core files from the crashes.
Here is the stack trace from the crashing thread:
thread #2: tid = 0x0001, 0x0000000106816de5 mesos-executor`os::process(int) + 4133, stop reason = signal SIGSTOP
frame #0: 0x0000000106816de5 mesos-executor`os::process(int) + 4133
frame #1: 0x000000010681734c mesos-executor`os::processes() + 316
frame #2: 0x0000000106817752 mesos-executor`os::killtree(int, int, bool, bool) + 66
frame #3: 0x0000000106819748 mesos-executor`mesos::internal::CommandExecutorProcess::shutdown(mesos::ExecutorDriver*) + 200
frame #4: 0x000000010798be70
frame #5: 0x000000010798be60
frame #6: 0x0000000106b21c20 libmesos-0.16.0.dylib`process::Event::~Event() + 32
frame #7: 0x90c307894810c083
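The reported stop reason is wrong: all threads in the core file are reported as stopped (signal SIGSTOP).
Here is a snippet of the disassembly of os::processes() (frame #1) around its call into the failing os::process(int); the arrow marks frame #1's return address: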
0x106817306: je 0x106817460 ; os::processes() + 592
0x10681730c: movq 16(%rsp), %rax
0x106817311: movq 296(%rsp), %rbx
0x106817319: leaq 16(%rax), %r14
0x10681731d: leaq 128(%rsp), %rax
0x106817325: addq $8, %r14
0x106817329: movq %rax, 24(%rsp)
0x10681732e: leaq 384(%rsp), %rbp
0x106817336: cmpq %rbx, %r14
0x106817339: je 0x106817530 ; os::processes() + 800
0x10681733f: movl 32(%rbx), %esi
0x106817342: movq 24(%rsp), %rdi
0x106817347: callq 0x10681d5a0 ; symbol stub for: os::process(int)
-> 0x10681734c: movl 128(%rsp), %esi
0x106817353: testl %esi, %esi
0x106817355: jne 0x1068173e0 ; os::processes() + 464
0x10681735b: movq 136(%rsp), %rsi
0x106817363: movq %rbp, %rdi
0x106817366: callq 0x10681d58e ; symbol stub for: os::Process::Process(os::Process const&)
0x10681736b: movl $112, %edi
0x106817370: callq 0x10681d9e4 ; symbol stub for: operator new(unsigned long)
While investigating the crash live in lldb, we concluded that using sysctl to get the argument count was probably the cause of the crash, but we still have no way to validate this.
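For anyone looking into the sysctl theory, the sketch below shows a defensively bounded way to read the argument count and arguments. It assumes (this is not stated in the report) that the call in question is the macOS `KERN_PROCARGS2` query, whose buffer begins with an `int` argc followed by the NUL-terminated executable path, NUL padding, argv, and then the environment. `parseProcArgs2` is a hypothetical helper, not stout's actual `os::process` code; it takes an already-filled buffer so it can be exercised without calling sysctl.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not stout's os::process implementation). Assumes a
// buffer laid out like the one macOS returns for {CTL_KERN, KERN_PROCARGS2, pid}:
//
//   int argc | exec path '\0' | '\0' padding | argv[0] '\0' ... argv[argc-1] '\0' | environment ...
//
// Every read is bounded by `size`, so a truncated buffer or an argc that
// disagrees with the data yields `false` instead of reading past the end.
bool parseProcArgs2(const char* buffer, size_t size, std::vector<std::string>* args)
{
  int argc = 0;
  if (buffer == nullptr || args == nullptr || size < sizeof(argc)) {
    return false;  // Too short to even hold argc.
  }

  // memcpy rather than a pointer cast, since the buffer may not be int-aligned.
  std::memcpy(&argc, buffer, sizeof(argc));
  if (argc < 0) {
    return false;
  }

  const char* cursor = buffer + sizeof(argc);
  const char* const end = buffer + size;
  if (cursor >= end) {
    return false;  // No room for the executable path.
  }

  // Skip the executable path up to its terminating NUL.
  const char* nul = static_cast<const char*>(
      std::memchr(cursor, '\0', static_cast<size_t>(end - cursor)));
  if (nul == nullptr) {
    return false;
  }

  // Skip the NUL padding between the path and argv[0]. Note that an empty
  // argv[0] cannot be told apart from padding in this format.
  cursor = nul;
  while (cursor < end && *cursor == '\0') {
    ++cursor;
  }

  std::vector<std::string> parsed;
  for (int i = 0; i < argc; ++i) {
    if (cursor >= end) {
      return false;  // argc promises more arguments than the buffer holds.
    }
    nul = static_cast<const char*>(
        std::memchr(cursor, '\0', static_cast<size_t>(end - cursor)));
    if (nul == nullptr) {
      return false;  // Argument is not NUL-terminated within `size`.
    }
    parsed.emplace_back(cursor, nul);
    cursor = nul + 1;
  }

  // Only publish results on success; the environment that follows is ignored.
  args->swap(parsed);
  return true;
}
```

The key design choice is passing the length that sysctl actually wrote (not the allocated size, which would typically come from `KERN_ARGMAX`) and never trusting argc on its own to decide how far to read.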
We can dig further into the core dump if anyone has suspicions about the cause of the failure or knows where to look next.
Also, since we haven't been able to reproduce the crash, we can probably mark this as Won't Fix if we don't hear of anyone else hitting the same problem.
--
This message was sent by Atlassian JIRA
(v6.1#6144)