Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/09/23 20:01:07 UTC

[jira] [Closed] (MESOS-457) Killing the slave while forked can cause the forked slave to deadlock.

     [ https://issues.apache.org/jira/browse/MESOS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler closed MESOS-457.
---------------------------------

    Resolution: Cannot Reproduce
    
> Killing the slave while forked can cause the forked slave to deadlock.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-457
>                 URL: https://issues.apache.org/jira/browse/MESOS-457
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>
> This is related to MESOS-393.
> It was discovered on a CentOS Linux machine in production.
> A kill was issued to the slave while it was forked to launch an executor; the child slave process then remained running, deadlocked at the following location:
> $ ps aux | grep mesos-slave
> bmahler  13626  0.0  0.0  61224   784 pts/1    S+   21:28   0:00 grep mesos-slave
> root     48629  0.0  2.1 1156480 535644 ?      S    Apr29   0:00 /usr/local/sbin/mesos-slave --port=5051
> $ gdb -p 48629
> (gdb) where
> #0  0x00007f7612e484c4 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f7612e43e1a in _L_lock_1034 () from /lib64/libpthread.so.0
> #2  0x00007f7612e43cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f7611d8a990 in std::locale::locale() () from /usr/lib64/libstdc++.so.6
> #4  0x00007f7613694a4d in basic_ostringstream (this=0x80) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_ios.h:446
> #5  process::UPID::operator std::string (this=0x80) at ../../../third_party/libprocess/src/pid.cpp:59
> #6  0x00007f761359febe in mesos::internal::slave::CgroupsIsolator::launchExecutor (this=0x7f7600005690, slaveId=..., frameworkId=..., frameworkInfo=..., executorInfo=..., uuid=..., directory=..., resources=...)
>     at ../../src/slave/cgroups_isolator.cpp:578
> #7  0x00007f76134a2582 in operator()<mesos::internal::slave::Isolator*> (__functor=<value optimized out>, __a1=0x7f7600005690)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:214
> #8  std::tr1::_Function_handler<void ()(mesos::internal::slave::Isolator*),std::tr1::_Bind<std::tr1::_Mem_fn<void (mesos::internal::slave::Isolator::*)(const mesos::SlaveID&, const mesos::FrameworkID&, const mesos::FrameworkInfo&, const mesos::ExecutorInfo&, const UUID&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const mesos::internal::Resources&)> ()(std::tr1::_Placeholder<1>, mesos::SlaveID, mesos::FrameworkID, mesos::FrameworkInfo, mesos::ExecutorInfo, UUID, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, mesos::internal::Resources)> >::_M_invoke(const std::tr1::_Any_data &, mesos::internal::slave::Isolator *) (__functor=<value optimized out>, __a1=0x7f7600005690) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
> #9  0x00007f76134ab4b4 in operator()<process::ProcessBase*> (__functor=<value optimized out>, __a1=0x7f76000058b0) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45
> #10 std::tr1::_Function_handler<void ()(process::ProcessBase*),std::tr1::_Bind<void (* ()(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<void ()(mesos::internal::slave::Isolator*)> >)> >::_M_invoke(const std::tr1::_Any_data &, process::ProcessBase *) (__functor=<value optimized out>,
>     __a1=0x7f76000058b0) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
> #11 0x00007f76136a799a in process::ProcessManager::resume (this=0x19cc6c0, process=0x7f76000058b0) at ../../../third_party/libprocess/src/process.cpp:2432
> #12 0x00007f76136a89af in process::schedule (arg=<value optimized out>) at ../../../third_party/libprocess/src/process.cpp:1167
> #13 0x00007f7612e4173d in start_thread () from /lib64/libpthread.so.0
> #14 0x00007f7611825f6d in clone () from /lib64/libc.so.6
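
Reading the backtrace above: the forked child is blocked in pthread_mutex_lock() inside std::locale::locale(), reached while process::UPID is converted to a std::string in CgroupsIsolator::launchExecutor. One plausible explanation (an assumption; the issue was closed as Cannot Reproduce) is that another thread held that lock when fork() was called. Only the forking thread exists in the child, so a lock held by any other thread is never released there. A minimal sketch of the usual way to avoid this class of problem, assuming a POSIX system: do all string formatting before fork() and restrict the child to async-signal-safe calls. The helper name forkAndReport and the message format are hypothetical and are not Mesos code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <sstream>
#include <string>

// Hypothetical helper (not from Mesos): formats a message BEFORE fork(),
// then forks and has the child send it over a pipe using only
// async-signal-safe calls (close, write, _exit).
// Returns what the parent read from the child, or "" on failure.
std::string forkAndReport(int id)
{
  // Anything that may take locks (iostreams, locale, heap allocation)
  // happens here, in the parent, before fork().
  std::ostringstream out;
  out << "executor(" << id << ")";
  const std::string message = out.str();

  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }

  if (pid == 0) {
    // Child: no allocation, no iostreams, no mutexes.
    close(fds[0]);
    ssize_t n = write(fds[1], message.data(), message.size());
    _exit(n == static_cast<ssize_t>(message.size()) ? 0 : 1);
  }

  // Parent: read the child's message until EOF, then reap the child.
  close(fds[1]);
  std::string result;
  char buffer[64];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0) {
    result.append(buffer, static_cast<size_t>(n));
  }
  close(fds[0]);

  int status = 0;
  waitpid(pid, &status, 0);
  return result;
}
```

Calling forkAndReport(7) from a main() would format the string in the parent and have the child only copy bytes into the pipe. The point of the design is that the child never constructs a std::ostringstream, so it cannot block on a locale lock inherited in a locked state.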
