Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/03/26 18:41:15 UTC

[jira] [Commented] (MESOS-300) Libprocess throws exception in SocketManager::next()

    [ https://issues.apache.org/jira/browse/MESOS-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614344#comment-13614344 ] 

Benjamin Mahler commented on MESOS-300:
---------------------------------------

Just saw this occur on a slave as well.

F0326 17:26:20.864554 52175 process.cpp:1950] Check failed: outgoing.count(s) > 0
*** Check failure stack trace: ***
    @     0x7f9b7e324f9d  google::LogMessage::Fail()
    @     0x7f9b7e32ac07  google::LogMessage::SendToLog()
    @     0x7f9b7e32684c  google::LogMessage::Flush()
    @     0x7f9b7e326ab6  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9b7e1e156c  process::SocketManager::next()
    @     0x7f9b7e1e24d8  process::send_data()
    @     0x7f9b7e369ee3  ev_invoke_pending
    @     0x7f9b7e36f318  ev_loop
    @     0x7f9b7e1decc0  process::serve()
    @     0x7f9b7d8fb73d  start_thread
    @     0x7f9b7c2dff6d  clone
                
> Libprocess throws exception in SocketManager::next()
> ----------------------------------------------------
>
>                 Key: MESOS-300
>                 URL: https://issues.apache.org/jira/browse/MESOS-300
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Benjamin Hindman
>
> Came across this while I was debugging an issue at Twitter.
> I1025 18:34:52.799145 56374 dominant_share_allocator.cpp:417] Performed allocation for 1004 slaves in 337.449 milliseconds
> F1025 18:34:53.633313 56380 process.cpp:1827] Check failed: outgoing.count(s) > 0 
> *** Check failure stack trace: ***
>     @     0x7f68b604f03d  google::LogMessage::Fail()
>     @     0x7f68b6054ca7  google::LogMessage::SendToLog()
>     @     0x7f68b60508ec  google::LogMessage::Flush()
>     @     0x7f68b6050b56  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f68b5f3679c  process::SocketManager::next()
>     @     0x7f68b5f37704  process::send_data()
>     @     0x7f68b60940e3  ev_invoke_pending
>     @     0x7f68b6099518  ev_loop
>     @     0x7f68b5f3332a  process::serve()
>     @     0x7f68b531e73d  start_thread
>     @     0x7f68b4908f6d  clone
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8080/
> Use Ctrl-C to quit.
> Grokking the code, there is a huge comment right above where this check happens, stating that we cannot/shouldn't be doing this check.
> Encoder* SocketManager::next(int s)
> {
>   HttpProxy* proxy = NULL; // Non-null if needs to be terminated.
>   synchronized (this) {
>     // We cannot assume 'sockets.count(s) > 0' here because it's
>     // possible that 's' has been removed with a call to
>     // SocketManager::close. For example, it could be the case that a
>     // socket has gone to CLOSE_WAIT and the call to 'recv' in
>     // recv_data returned 0 causing SocketManager::close to get
>     // invoked. Later a call to 'send' or 'sendfile' (e.g., in
>     // send_data or send_file) can "succeed" (because the socket is
>     // not "closed" yet because there are still some Socket
>     // references, namely the reference being used in send_data or
>     // send_file!). However, when SocketManager::next is actually
>     // invoked we find out that there is no more data and thus stop
>     // sending.
>     // TODO(benh): Should we actually finish sending the data!?
>     if (sockets.count(s) > 0) {
>       CHECK(outgoing.count(s) > 0);
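
To make the failing state concrete, here is a minimal sketch of the two containers involved. It is a toy model, not libprocess code: only the names 'sockets' and 'outgoing' come from the snippet above, while ToySocketManager, open, enqueue, and the bool-returning next are assumptions made for illustration. It shows one possible defensive shape, where a socket that is still known but has no outgoing queue yields "nothing to send" instead of aborting. Whether that is the right fix for MESOS-300 is not established by the traces above.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>

// Toy model, not libprocess code. Only the names 'sockets' and 'outgoing'
// come from the snippet above; everything else is assumed for illustration.
class ToySocketManager {
public:
  // Register socket 's' without creating an outgoing queue for it.
  void open(int s) { sockets.insert(s); }

  // Queue 'data' for sending on 's' (creates the queue if needed).
  void enqueue(int s, const std::string& data) { outgoing[s].push(data); }

  // Forget the socket and any data still queued for it.
  void close(int s) {
    sockets.erase(s);
    outgoing.erase(s);
  }

  // Defensive variant of next(): where the original has
  // CHECK(outgoing.count(s) > 0), this returns false instead of aborting.
  // On success the next queued item is copied into '*out'.
  bool next(int s, std::string* out) {
    if (sockets.count(s) == 0) {
      return false;  // 's' was closed; the case the comment describes.
    }
    std::map<int, std::queue<std::string> >::iterator it = outgoing.find(s);
    if (it == outgoing.end() || it->second.empty()) {
      return false;  // Socket known but nothing queued; the CHECK case.
    }
    *out = it->second.front();
    it->second.pop();
    return true;
  }

private:
  std::set<int> sockets;
  std::map<int, std::queue<std::string> > outgoing;
};

// Usage: a socket that is open but has no outgoing queue no longer aborts.
void toy_example() {
  ToySocketManager manager;
  manager.open(7);
  std::string data;
  assert(!manager.next(7, &data));
}
```

In this model the state that trips the CHECK is simply "s is in sockets but not in outgoing". The sketch does not explain how the real SocketManager reaches that state; it only shows that the lookup can be written so that reaching it is not fatal.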
