Posted to dev@mesos.apache.org by Scott Smith <sc...@gmail.com> on 2012/05/08 08:40:17 UTC

segfault in libprocess (slave)

I've encountered another segfault in the slave.  This time, nothing
unusual was happening.  Single framework / single user.  Four slaves,
one master, framework run from master.

version:
svn Revision: 1334534 + proposed fix for MESOS-190:
https://reviews.apache.org/r/5057/diff/2/#index_header

log messages:
I0508 06:35:21.458798   828 slave.cpp:447] Got assigned task 8:864:0 for framework 201205080535222558218-5050-29475-0004
I0508 06:35:21.459225   829 slave.cpp:689] Got acknowledgement of status update for task 8:863:0 of framework 201205080535222558218-5050-29475-0004
F0508 06:35:21.459432   832 process.cpp:1772] Check failed: sockets.count(s) > 0

stack trace:
#0  0x00007f0aecdf0445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f0aecdf3bab in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f0aedd65dd9 in google::DumpStackTraceAndExit () at src/utilities.cc:145
#3  0x00007f0aedd5ed9d in google::LogMessage::Fail () at src/logging.cc:1256
#4  0x00007f0aedd6152f in google::LogMessage::SendToLog (this=0x7f0ae8a71c60)
    at src/logging.cc:1216
#5  0x00007f0aedd5e99b in google::LogMessage::Flush (this=0x7f0ae8a71c60)
    at src/logging.cc:1088
#6  0x00007f0aedd61dbd in google::LogMessageFatal::~LogMessageFatal (
    this=0x7f0ae8a71c60, __in_chrg=<optimized out>) at src/logging.cc:1777
#7  0x00007f0aedc93a55 in process::SocketManager::next(int) ()
   from /home/ubuntu/cr/lib/libmesos-0.9.0.so
#8  0x00007f0aedc8e119 in process::send_data(ev_loop*, ev_io*, int) ()
   from /home/ubuntu/cr/lib/libmesos-0.9.0.so
#9  0x00007f0aedd9e6ef in ev_invoke_pending (loop=0x7f0aee119240) at ev.c:1971
#10 0x00007f0aedda2a24 in ev_loop (loop=0x7f0aee119240, flags=<optimized out>)
    at ev.c:2333
#11 0x00007f0aedc8f30d in process::serve(void*) ()
   from /home/ubuntu/cr/lib/libmesos-0.9.0.so
#12 0x00007f0aed17ee9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f0aeceac4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

-- 
        Scott

Re: segfault in libprocess (slave)

Posted by Scott Smith <sc...@gmail.com>.
I've run today with a similar patch and it (along with the MESOS-190
fix) addresses my segfault issues.  Before I would get 5+ per day;
today has been core file free!

On Wed, May 9, 2012 at 2:47 PM, Benjamin Hindman <be...@eecs.berkeley.edu> wrote:
> I've committed a fix in r1336417. Please let me know if this fixes the
> problem or if more needs to be done. Thank you!



-- 
        Scott

Re: segfault in libprocess (slave)

Posted by Benjamin Hindman <be...@eecs.berkeley.edu>.
I've committed a fix in r1336417. Please let me know if this fixes the
problem or if more needs to be done. Thank you!


On Wed, May 9, 2012 at 1:46 PM, Benjamin Hindman <be...@eecs.berkeley.edu> wrote:

> Yes, this looks like it should be the case. :(
>
> I'll fix this bug ASAP. Thanks for reporting!

Re: segfault in libprocess (slave)

Posted by Benjamin Hindman <be...@eecs.berkeley.edu>.
Yes, this looks like it should be the case. :(

I'll fix this bug ASAP. Thanks for reporting!



On Wed, May 9, 2012 at 8:56 AM, Scott Smith <sc...@gmail.com> wrote:

> I've had numerous other segfaults in libprocess, mostly in
> std::map/rbtree code.  Is it possible that SocketManager::accepted is
> missing a synchronized(this) {} block?
>
> from process.cpp:
>
> Socket SocketManager::accepted(int s)
> {
>  return sockets[s] = Socket(s);
> }

Re: segfault in libprocess (slave)

Posted by Scott Smith <sc...@gmail.com>.
I've had numerous other segfaults in libprocess, mostly in
std::map/rbtree code.  Is it possible that SocketManager::accepted is
missing a synchronized(this) {} block?

from process.cpp:

Socket SocketManager::accepted(int s)
{
  return sockets[s] = Socket(s);
}
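
For illustration, here is a minimal sketch of the kind of locking the question above is asking about. It assumes a plain std::mutex in place of libprocess's synchronized(this) block, and it is not the actual process.cpp code or the fix that was later committed. The Socket struct, the closed(), known() and size() methods, and the concurrent_accepts() helper are all hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for libprocess's Socket; it only keeps the descriptor.
struct Socket {
  int fd = -1;
  Socket() = default;
  explicit Socket(int s) : fd(s) {}
};

// Hypothetical, simplified manager. Every access to the map takes the same
// mutex, so a thread inserting in accepted() cannot rebalance the tree while
// another thread is reading or erasing from it.
class SocketManager {
public:
  Socket accepted(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_[s] = Socket(s);
  }

  void closed(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(s);
  }

  bool known(int s) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.count(s) > 0;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.size();
  }

private:
  mutable std::mutex mutex_;
  std::map<int, Socket> sockets_;
};

// Inserts threads * per_thread distinct descriptors from several threads at
// once and returns how many entries the map ends up holding.
std::size_t concurrent_accepts(int threads, int per_thread) {
  assert(threads >= 0 && per_thread >= 0);
  SocketManager manager;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&manager, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        manager.accepted(t * per_thread + i);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return manager.size();
}
```

Without the lock in accepted(), concurrent inserts into a std::map are a data race, and crashes inside the red-black tree code are one plausible symptom of that.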




-- 
        Scott