Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2018/05/08 16:37:00 UTC

[jira] [Comment Edited] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections

    [ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467645#comment-16467645 ] 

Alan Conway edited comment on PROTON-1842 at 5/8/18 4:36 PM:
-------------------------------------------------------------

The threaderciser is showing races in connection close; I'm not sure if they are the same issue we are looking at here. Attached output race.vg and race.tsan from helgrind and the thread sanitizer. Valgrind detects a *lot* more races, probably because it slows things down so much, but the tsan stack traces are consistent with valgrind.

This looks consistent with your theory, in particular a mutex being destroyed concurrently with being unlocked during shutdown. One thread locks, sees that everything is ready to finalize, and destroys the connection state while a second thread is blocked on the mutex. The second thread is released when the first thread unlocks before pthread_mutex_destroy, but explodes when it tries to unlock after the destroy.
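
For illustration only, here is a minimal C sketch of one common way to avoid this kind of race: count the threads that may still touch the state, decrement the count under the lock, and let only the thread that drops the last reference destroy the mutex, after it has unlocked. The names conn_state_t, conn_state_new, conn_state_release and release_from_threads are hypothetical; this is not the epoll proactor code and I am not claiming it is the right fix for this issue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection state, NOT Proton's real structure. */
typedef struct conn_state {
    pthread_mutex_t lock;
    int refs; /* number of threads that may still lock this state */
} conn_state_t;

/* Allocate state that 'refs' threads will each release exactly once. */
conn_state_t *conn_state_new(int refs) {
    conn_state_t *cs = malloc(sizeof(*cs));
    if (!cs) return NULL;
    if (pthread_mutex_init(&cs->lock, NULL) != 0) {
        free(cs);
        return NULL;
    }
    cs->refs = refs;
    return cs;
}

/* Drop one reference. Returns true if this call destroyed the state.
 * Only the last holder destroys, so no other thread can still be
 * blocked on the mutex or about to unlock it. */
bool conn_state_release(conn_state_t *cs) {
    pthread_mutex_lock(&cs->lock);
    assert(cs->refs > 0);
    bool last = (--cs->refs == 0);
    pthread_mutex_unlock(&cs->lock);
    if (last) {
        pthread_mutex_destroy(&cs->lock);
        free(cs);
    }
    return last;
}

static void *release_thread(void *arg) {
    return conn_state_release(arg) ? (void *)1 : NULL;
}

/* Start n threads that each drop one reference and return how many
 * of them ended up destroying the state. */
int release_from_threads(int n) {
    conn_state_t *cs = conn_state_new(n);
    pthread_t *threads = malloc((size_t)n * sizeof(*threads));
    int frees = 0;
    assert(cs && threads);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&threads[i], NULL, release_thread, cs);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++) {
        void *result = NULL;
        pthread_join(threads[i], &result);
        if (result) frees++;
    }
    free(threads);
    return frees;
}
```

Calling release_from_threads(8) should report that exactly one of the eight threads destroyed the state. Running a test like that under helgrind or tsan is one way to check that the destroy no longer overlaps an unlock.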


was (Author: aconway):
The threaderciser is showing races in connection close; I'm not sure if they are the same issue we are looking at here. Attached output race.vg and race.tsan from helgrind and the thread sanitizer. Valgrind detects a *lot* more races, probably because it slows things down so much, but the tsan stack traces are consistent with valgrind.

> [c] Dispatch/Proton crashes when opening/closing connections
> ------------------------------------------------------------
>
>                 Key: PROTON-1842
>                 URL: https://issues.apache.org/jira/browse/PROTON-1842
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.22.0
>            Reporter: Chuck Rolke
>            Priority: Major
>         Attachments: helloworld.cpp, race.tsan, race.vg
>
>
> Using proton cpp example code that is modified to open and close connections by the thousands in the main loop and having the event loop short circuit any messaging with:
> {{  void on_connection_open(proton::connection& c) {}}
> {{      c.close();}}
> {{  }}}
> and then directing this client example to a dispatch router 1.1.0. Eventually (after 100,000 to 1,000,000 connection open/closes) the router crashes with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:466: wake_pop_front: Assertion `p->wakes_in_progress' failed.}}
> and with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014: proactor_do_epoll: Assertion `ee->type == PCONNECTION_TIMER' failed.}}
> This issue seems to happen only with qpid-dispatch accepting the open/close event stream. Proton cpp example _server_direct_ and c example _direct_ work properly with the same open/close event stream, running into the tens of millions of connections.
> A core dump backtrace with the PCONNECTION_TIMER failure reads as:
> {{(gdb) bt}}
> {{#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51}}
> {{#1  0x00007f795c712c41 in __GI_abort () at abort.c:79}}
> {{#2  0x00007f795c709f7a in __assert_fail_base (fmt=0x7f795c85a260 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", }}
> {{    file=file@entry=0x7f795d72de98 "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=line@entry=2014, }}
> {{    function=function@entry=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") at assert.c:92}}
> {{#3  0x00007f795c709ff2 in __GI___assert_fail (assertion=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", file=0x7f795d72de98 "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=2014, }}
> {{    function=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") at assert.c:101}}
> {{#4  0x00007f795d72d29f in proactor_do_epoll (p=0x26b7310, can_block=true) at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014}}
> {{#5  0x00007f795d72d30e in pn_proactor_wait (p=0x26b7310) at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2030}}
> {{#6  0x00007f795dbe89ad in thread_run (arg=0x26be750) at /home/chug/git/qpid-dispatch/src/server.c:946}}
> {{#7  0x00007f795d50e50b in start_thread (arg=0x7f794f486700) at pthread_create.c:465}}
> {{#8  0x00007f795c7d216f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95}}
>  


