Posted to dev@qpid.apache.org by "Ken Giusti (Jira)" <ji...@apache.org> on 2021/11/17 14:34:00 UTC

[jira] [Commented] (PROTON-2466) raw connection posts wake events after disconnect event is handled

    [ https://issues.apache.org/jira/browse/PROTON-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445250#comment-17445250 ] 

Ken Giusti commented on PROTON-2466:
------------------------------------

This is a difficult issue to reproduce.  In my experience it can take a few hours and the resulting log files are huge.

To reproduce:
 # check out head of the qdrouter 1.18.x branch
 # back out the pointer clear patch that prevents the crash from occurring:
 ## commit 6734891419fcafdbc87d40eca269d07821c1b813 DISPATCH-2286: reset the raw conn context when handling disconnect
 # run two routers using the attached configurations (qdrouterd-A.conf and qdrouterd-B.conf):
 ## rm -f qdrouterd-A-log.txt ; qdrouterd -c qdrouterd-A.conf & rm -f qdrouterd-B-log.txt ; qdrouterd -c qdrouterd-B.conf &
 # install iperf3
 # spawn an iperf3 server for the router to connect to:
 ## iperf3 -s -p 8080 &
 # run iperf3 clients to generate traffic in a loop:
 ## while iperf3 -c 127.0.0.1 -p 8000 -t 5 -P 8; do echo "OK"; sleep 2; done
 # wait for crash

> raw connection posts wake events after disconnect event is handled
> ------------------------------------------------------------------
>
>                 Key: PROTON-2466
>                 URL: https://issues.apache.org/jira/browse/PROTON-2466
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.36.0
>            Reporter: Ken Giusti
>            Priority: Major
>         Attachments: qdrouterd-A.conf, qdrouterd-B.conf
>
>
> While running tcp stress tests against qdrouterd a crash occurred.  The crash was due to a stale pointer dereference.
> The qdrouterd code has been patched to clear the pointer and check for null in the affected code path.  However...
> ... the access occurred while processing a PN_RAW_CONNECTION_WAKE event that arrived on a raw connection *after* a PN_RAW_CONNECTION_DISCONNECTED event had already arrived on the same raw connection.
> IIUC the PN_RAW_CONNECTION_DISCONNECTED event is supposed to be the last event generated on a raw connection, and once that event has been handled the raw connection is released.   If that is correct then the arrival of the following WAKE event is a bug.
> Here is the log output from the router just prior to the crash (filtered on the affected connection):
> $ tail C140.txt                                                                                              
> 2021-11-16 17:11:10.925728 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector                                                      
> 2021-11-16 17:11:10.926990 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector                                                      
> 2021-11-16 17:11:10.927001 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ connector Event                                                
> 2021-11-16 17:11:10.927034 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ Read 0 bytes. Total read 0 bytes                               
> 2021-11-16 17:11:10.927596 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36929573 bytes
> 2021-11-16 17:11:10.928207 -0500 TCP_ADAPTOR (debug) [C140][L322] PN_RAW_CONNECTION_CLOSED_READ connector                                         
> 2021-11-16 17:11:10.928591 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_CLOSED_WRITE connector                                              
> 2021-11-16 17:11:10.929160 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36962341 bytes
> *2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector* 
> *2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector*
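
Because the reproduction logs are huge, the ordering violation shown in the last two log lines above can be searched for mechanically. The following is only a rough sketch, not part of the router or Proton: the function name and the regular expression are hypothetical, and it assumes the log line format shown above and that a connection id such as [C140] is never reused within one log.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line format (taken from the log excerpt above), e.g.:
#   ... TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector
# An optional link tag such as [L322] may follow the connection id.
EVENT_RE = re.compile(r"\[(C\d+)\](?:\[L\d+\])?\s+(PN_RAW_CONNECTION_[A-Z_]+)")


def find_events_after_disconnect(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (connection id, event name, 1-based line number) for every raw
    connection event logged after DISCONNECTED on the same connection."""
    disconnected_at: Dict[str, int] = {}
    violations: List[Tuple[str, str, int]] = []
    for lineno, line in enumerate(lines, start=1):
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a raw connection event line
        conn, event = match.group(1), match.group(2)
        if conn in disconnected_at:
            # Any event after DISCONNECTED is unexpected if DISCONNECTED
            # is really the last event on a raw connection.
            violations.append((conn, event, lineno))
        elif event == "PN_RAW_CONNECTION_DISCONNECTED":
            disconnected_at[conn] = lineno
    return violations


sample = [
    "2021-11-16 17:11:10.928591 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_CLOSED_WRITE connector",
    "2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector",
    "2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector",
]
print(find_events_after_disconnect(sample))
# prints [('C140', 'PN_RAW_CONNECTION_WAKE', 3)]
```

The function accepts any iterable of lines, so an open file object can be passed directly and a multi-gigabyte log is streamed rather than loaded into memory.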


