Posted to issues@mesos.apache.org by "Marco Massenzio (JIRA)" <ji...@apache.org> on 2015/04/30 02:42:06 UTC
[jira] [Updated] (MESOS-2451) mesos c++ zookeeper code hangs from
api operation from within watcher of CHANGE event
[ https://issues.apache.org/jira/browse/MESOS-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Massenzio updated MESOS-2451:
-----------------------------------
Assignee: (was: Benjamin Hindman)
> mesos c++ zookeeper code hangs from api operation from within watcher of CHANGE event
> -------------------------------------------------------------------------------------
>
> Key: MESOS-2451
> URL: https://issues.apache.org/jira/browse/MESOS-2451
> Project: Mesos
> Issue Type: Bug
> Components: c++ api
> Affects Versions: 0.22.0
> Environment: red hat linux 6.5
> Reporter: craig bordelon
> Attachments: Makefile, bug.cpp, bug0.cpp, log.h
>
>
> We've observed that the mesos 0.22.0-rc1 c++ zookeeper code appears to hang (two threads stuck in indefinite pthread condition waits) on a test case that, as best we can tell, is a mesos issue and not an issue with the underlying apache zookeeper C binding.
> (That is, we tried the same type of case using the apache zookeeper C binding directly and saw no issues.)
> This happens with a properly running zookeeper (standalone is sufficient).
> Here's how we hung it:
> We issue a mesos zk set via
> int ZooKeeper::set ( const std::string & path,
> const std::string & data,
> int version
> )
> Then, inside a Watcher, we handle the CHANGED event by issuing a mesos zk get on
> the same path via
> int ZooKeeper::get ( const std::string & path,
> bool watch,
> std::string * result,
> Stat * stat
> )
> We end up with two threads in the process, both in pthread_cond_wait:
> #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
> at ../../../3rdparty/libprocess/src/gate.hpp:82
> #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> at ../../../3rdparty/libprocess/src/process.cpp:2476
> #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> at ../../../3rdparty/libprocess/src/process.cpp:2958
> #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
> at ../../../3rdparty/libprocess/src/latch.cpp:49
> #5 0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040,
> duration=...)
> at ../../3rdparty/libprocess/include/process/future.hpp:1156
> #6 0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
> at ../../3rdparty/libprocess/include/process/future.hpp:1167
> #7 0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
> ...
> and
> #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
> at ../../../3rdparty/libprocess/src/gate.hpp:82
> #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> at ../../../3rdparty/libprocess/src/process.cpp:2476
> #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> at ../../../3rdparty/libprocess/src/process.cpp:2958
> #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00,
> duration=...)
> at ../../../3rdparty/libprocess/src/latch.cpp:49
> #5 0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0,
> duration=...)
> at ../../3rdparty/libprocess/include/process/future.hpp:1156
> #6 0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
> at ../../3rdparty/libprocess/include/process/future.hpp:1167
> #7 0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo",
> watch=false,
> ....
> We of course have a separate "enhancement" suggestion that the mesos C++ zookeeper api use timed waits and not block indefinitely for responses.
> But in this case we think the mesos code itself is blocking on itself and not handling the responses.
> craig
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)