Posted to dev@qpid.apache.org by "Ken Giusti (Jira)" <ji...@apache.org> on 2021/11/17 14:34:00 UTC
[jira] [Commented] (PROTON-2466) raw connection posts wake events after disconnect event is handled
[ https://issues.apache.org/jira/browse/PROTON-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445250#comment-17445250 ]
Ken Giusti commented on PROTON-2466:
------------------------------------
This is a difficult issue to reproduce. In my experience it can take a few hours and the resulting log files are huge.
To reproduce:
# check out head of the qdrouter 1.18.x branch
# back out the pointer clear patch that prevents the crash from occurring:
## commit 6734891419fcafdbc87d40eca269d07821c1b813 DISPATCH-2286: reset the raw conn context when handling disconnect
# run two routers using the attached configurations (qdrouterd-A.conf and qdrouterd-B.conf):
## rm -f qdrouterd-A-log.txt ; qdrouterd -c qdrouterd-A.conf & rm -f qdrouterd-B-log.txt ; qdrouterd -c qdrouterd-B.conf &
# Install iperf3
# spawn an iperf3 server for the router to connect to:
## iperf3 -s -p 8080 &
# run iperf3 clients to generate traffic in a loop:
## while iperf3 -c 127.0.0.1 -p 8000 -t 5 -P 8; do echo "OK"; sleep 2; done
# wait for crash
> raw connection posts wake events after disconnect event is handled
> ------------------------------------------------------------------
>
> Key: PROTON-2466
> URL: https://issues.apache.org/jira/browse/PROTON-2466
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-c
> Affects Versions: proton-c-0.36.0
> Reporter: Ken Giusti
> Priority: Major
> Attachments: qdrouterd-A.conf, qdrouterd-B.conf
>
>
> While running tcp stress tests against qdrouterd a crash occurred. The crash was due to a stale pointer dereference.
> qdrouterd code has been patched to properly clear the pointer and check for null in the affected code path. However...
> ... the access occurred while processing a PN_RAW_CONNECTION_WAKE event that arrived on a raw connection *after* a PN_RAW_CONNECTION_DISCONNECTED event previously arrived on the raw connection.
> IIUC the PN_RAW_CONNECTION_DISCONNECTED event is supposed to be the last event generated on a raw connection, and once that event has been handled the raw connection is released. If that is correct then the arrival of the following WAKE event is a bug.
> Here is the log output from the router just prior to the crash (filtered on the affected connection):
> $ tail C140.txt
> 2021-11-16 17:11:10.925728 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector
> 2021-11-16 17:11:10.926990 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector
> 2021-11-16 17:11:10.927001 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ connector Event
> 2021-11-16 17:11:10.927034 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ Read 0 bytes. Total read 0 bytes
> 2021-11-16 17:11:10.927596 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36929573 bytes
> 2021-11-16 17:11:10.928207 -0500 TCP_ADAPTOR (debug) [C140][L322] PN_RAW_CONNECTION_CLOSED_READ connector
> 2021-11-16 17:11:10.928591 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_CLOSED_WRITE connector
> 2021-11-16 17:11:10.929160 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36962341 bytes
> *2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector*
> *2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector*
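
Since the router logs are huge, scanning them by hand for this ordering problem is impractical. Below is a minimal sketch of a log scanner that flags any raw connection event logged after that connection's PN_RAW_CONNECTION_DISCONNECTED. It is not part of qdrouterd or Proton. The function name events_after_disconnect and the regular expression are hypothetical, and the log line shape is assumed from the excerpt quoted above. It also assumes that a connection id such as C140 is never reused within one log file.

```python
import re

# Assumed log line shape, taken from the excerpt in the issue:
#   <date> <time> <tz> TCP_ADAPTOR (<level>) [C<id>][optional tags] PN_RAW_CONNECTION_<NAME> ...
EVENT_RE = re.compile(r"\[(C\d+)\](?:\[[^\]]*\])*\s+(PN_RAW_CONNECTION_\w+)")


def events_after_disconnect(lines):
    """Return (conn_id, event_name, line) for every raw connection event
    that is logged after the same connection's DISCONNECTED event."""
    disconnected = set()  # connection ids that have already logged DISCONNECTED
    late = []
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # not a raw connection event line
        conn, event = match.groups()
        if conn in disconnected:
            late.append((conn, event, line.rstrip("\n")))
        elif event == "PN_RAW_CONNECTION_DISCONNECTED":
            disconnected.add(conn)
    return late


if __name__ == "__main__":
    # Last three lines of the excerpt from the issue, without the quote markers.
    sample = [
        "2021-11-16 17:11:10.929160 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector",
        "2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector",
        "2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector",
    ]
    for conn, event, _ in events_after_disconnect(sample):
        print(f"{conn}: {event} logged after DISCONNECTED")
```

To check a full run, the same function can be given an open log file, for example events_after_disconnect(open("qdrouterd-A-log.txt")). This only finds anything if the TCP_ADAPTOR debug messages are enabled in the router configuration.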