Posted to dev@qpid.apache.org by "Ken Giusti (Jira)" <ji...@apache.org> on 2019/09/20 21:00:00 UTC

[jira] [Commented] (DISPATCH-1426) Repetitive receiver fail over causes memory leak

    [ https://issues.apache.org/jira/browse/DISPATCH-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934751#comment-16934751 ] 

Ken Giusti commented on DISPATCH-1426:
--------------------------------------

I believe I found the source of the leak. 

qdr_delivery_t's are being leaked because the reference count is not properly decremented by the container under certain conditions.  When a qdr_delivery_t is associated with a proton pn_delivery_t the container links the proton and qdr deliveries together and increments the qdr_delivery_t's reference count (see router_node.c:qdr_node_(connect/disconnect)_deliveries() and qdr_node_reap_abandoned_deliveries).   Note that the qd_link_t keeps a reference to the qdr_delivery_t in a list (which is why the qdr_delivery_t's reference count must be incremented).

Now look at container.c:qd_link_free() - this routine simply drops any references it holds to qdr_delivery_t's, resulting in a leak.

While decref'ing the qdr_delivery_t's in qd_link_free() will avoid the leak, I think that the real bug is a code path that ends up invoking qd_link_free() with delivery references still held.
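
To make the reference-counting argument concrete, here is a minimal sketch of the pattern. It is a toy model only: every toy_* name is hypothetical and none of this is the router's actual code. It assumes a link that keeps a list of deliveries and takes one reference on each, a "reap" step that releases those references, and a "free" step that discards the list without releasing them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT the real qpid-dispatch types. */
typedef struct {
    int ref_count;
} toy_delivery_t;

#define TOY_MAX_HELD 8

typedef struct {
    toy_delivery_t *held[TOY_MAX_HELD]; /* deliveries this link holds a reference to */
    int             n_held;
} toy_link_t;

static int toy_live_deliveries = 0;     /* deliveries allocated but not yet freed */

static toy_delivery_t *toy_delivery_new(void)
{
    toy_delivery_t *d = calloc(1, sizeof(*d));
    assert(d != NULL);
    d->ref_count = 1;                   /* the creator's reference */
    toy_live_deliveries++;
    return d;
}

static void toy_delivery_decref(toy_delivery_t *d)
{
    assert(d->ref_count > 0);
    if (--d->ref_count == 0) {
        free(d);
        toy_live_deliveries--;
    }
}

/* The link records the delivery in its list and takes its own reference. */
static void toy_link_hold(toy_link_t *link, toy_delivery_t *d)
{
    assert(link->n_held < TOY_MAX_HELD);
    d->ref_count++;
    link->held[link->n_held++] = d;
}

/* Models the "reap" step: release every reference the link holds. */
static void toy_link_reap(toy_link_t *link)
{
    for (int i = 0; i < link->n_held; i++)
        toy_delivery_decref(link->held[i]);
    link->n_held = 0;
}

/* Models a free that drops the list without releasing its references. */
static void toy_link_free(toy_link_t *link)
{
    free(link);
}

/* Returns how many deliveries are still allocated after the link is freed. */
int toy_leaked_after_free(int n_deliveries, int reap_first)
{
    toy_live_deliveries = 0;
    toy_link_t *link = calloc(1, sizeof(*link));
    assert(link != NULL);
    for (int i = 0; i < n_deliveries; i++) {
        toy_delivery_t *d = toy_delivery_new();
        toy_link_hold(link, d);
        toy_delivery_decref(d);         /* creator lets go; only the link's ref remains */
    }
    if (reap_first)
        toy_link_reap(link);
    toy_link_free(link);
    return toy_live_deliveries;
}
```

In this model, calling toy_leaked_after_free(n, 1) leaves nothing allocated, while toy_leaked_after_free(n, 0) leaves all n deliveries stranded with a reference count of one and no owner left to release them.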

If you search the codebase for qd_link_free() calls you will see that all but one of them are preceded by a call to qdr_node_reap_abandoned_deliveries() - which releases the references.

There is only one call to qd_link_free() that does NOT call the reap code - container.c:close_links().   This is the code path I can trigger to cause the leak.

Here's the sequence of events that triggers the leak:

* An RX link grants a lot of credit and consumes only a handful of messages before CLEANLY detaching the link and closing the connection
* The first event arriving at the container is PN_LINK_REMOTE_CLOSE
* This causes the container to call AMQP_link_detach_handler
* AMQP_link_detach_handler clears the qd_link_t's context
* AMQP_link_detach_handler does NOT call the reap code because dt is QD_CLOSED (it was clean) and qdr_link_t's context is referencing the qd_link_t
* The qd_link_t is NOT FREED
* The next event to arrive is PN_CONNECTION_REMOTE_CLOSE
* This causes container.c:close_links() to be invoked
* Since the qd_link_t's context has been cleared by AMQP_link_detach_handler, close_links() simply frees the qd_link_t (resulting in the leak)
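
The sequence above can be sketched as a small state model. This is an illustration under stated assumptions, not the router's code: the model_* names are hypothetical, only the clean-detach path is modelled, and the reap_in_close_links flag is just one conceivable change used to show the difference, not necessarily the fix that belongs in the router.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the event sequence; NOT the real router structures. */
typedef struct {
    void *context;    /* cleared by the detach handler */
    int   held_refs;  /* delivery references the link still holds */
    bool  freed;
} model_link_t;

static int model_leaked_refs = 0;

/* Models the reap step: releases the references held by the link. */
static void model_reap(model_link_t *link)
{
    link->held_refs = 0;
}

/* Models the free step: any reference still held at this point is lost. */
static void model_link_free(model_link_t *link)
{
    model_leaked_refs += link->held_refs;
    link->held_refs = 0;
    link->freed = true;
}

/* Event 1: link remote close, clean detach. Context is cleared, no reap, no free. */
static void model_on_link_remote_close(model_link_t *link)
{
    link->context = NULL;
}

/* Event 2: connection remote close. A link whose context was already cleared
 * is simply freed. The other branch is outside the scope of this sketch. */
static void model_on_connection_remote_close(model_link_t *link,
                                             bool reap_in_close_links)
{
    if (link->context == NULL) {
        if (reap_in_close_links)   /* assumption: one possible remedy */
            model_reap(link);
        model_link_free(link);
    }
}

/* Runs both events in order and returns the number of leaked references. */
int model_clean_close(int held_refs, bool reap_in_close_links)
{
    int dummy_context = 0;
    model_link_t link = { &dummy_context, held_refs, false };

    model_leaked_refs = 0;
    model_on_link_remote_close(&link);
    assert(!link.freed);           /* still allocated after the first event */
    model_on_connection_remote_close(&link, reap_in_close_links);
    return model_leaked_refs;
}
```

With reap_in_close_links set to false the model loses every reference the link held when the connection closed; with it set to true nothing is lost.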

This can be reproduced quite easily using the test-sender and test-receiver C clients in the test subdirectory:

Run qdrouterd, then:

$ ./tests/test-receiver -c 13 & sleep 1; ./tests/test-sender -c 100

Do that a few times - it should trigger the leak.




> Repetitive receiver fail over causes memory leak
> ------------------------------------------------
>
>                 Key: DISPATCH-1426
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1426
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.8.0, 1.9.0
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Major
>
> I'm able to cause a slow memory leak by running, for an extended time, a new test I'm creating to reproduce [DISPATCH-1406|https://github.com/apache/qpid-dispatch/pull/564]. Output of qdstat -m after the test run:
> {{$ qdstat -m -b 127.0.0.1:22381
> Types
>   type                        size   batch  thread-max  total  in-threads  rebal-in  rebal-out
>   ==============================================================================================
>   qd_bitmask_t                24     64     128         576    448         1         3
>   qd_buffer_t                 536    64     128         2,240  2,240       8         8
>   qd_composed_field_t         64     64     128         256    256         0         0
>   qd_composite_t              112    64     128         320    320         0         0
>   qd_connection_t             2,360  16     32          48     48          0         0
>   qd_connector_t              504    64     128         64     64          0         0
>   qd_hash_handle_t            16     64     128         128    128         0         0
>   qd_hash_item_t              32     64     128         128    128         0         0
>   qd_iterator_t               160    64     128         4,224  4,224       4         4
>   qd_link_ref_t               24     64     128         320    192         0         2
>   qd_link_t                   104    64     128         192    192         0         0
>   qd_listener_t               440    64     128         64     64          0         0
>   qd_log_entry_t              2,112  16     32          1,056  1,056       0         0
>   qd_management_context_t     56     64     128         128    128         0         0
>   qd_message_content_t        1,080  64     128         576    576         0         0
>   qd_message_t                160    64     128         768    704         2         3
>   qd_node_t                   56     64     128         64     64          0         0
>   qd_parse_node_t             104    64     128         128    128         0         0
>   qd_parsed_field_t           88     64     128         1,664  1,664       0         0
>   qd_parsed_turbo_t           64     64     128         256    256         0         0
>   qd_pn_free_link_session_t   32     64     128         64     64          0         0
>   qd_timer_t                  56     64     128         128    128         0         0
>   qdr_action_t                160    64     128         320    192         24        26
>   qdr_address_config_t        72     64     128         64     64          0         0
>   qdr_address_t               376    64     128         64     64          0         0
>   qdr_connection_info_t       88     64     128         192    192         0         0
>   qdr_connection_t            568    64     128         192    192         0         0
>   qdr_connection_work_t       48     64     128         64     64          0         0
>   qdr_delivery_cleanup_t      32     64     128         384    384         1         1
>   qdr_delivery_ref_t          24     64     128         256    192         2         3
>   qdr_delivery_t              272    64     128         640    512         1         3
>   qdr_error_t                 24     64     128         256    256         0         0
>   qdr_field_t                 40     64     128         768    768         0         0
>   qdr_forward_deliver_info_t  32     64     128         64     64          0         0
>   qdr_general_work_t          128    64     128         384    384         2         2
>   qdr_link_ref_t              24     64     128         448    448         0         0
>   qdr_link_t                  512    64     128         256    256         0         0
>   qdr_link_work_t             48     64     128         448    448         1         1
>   qdr_node_t                  64     64     128         64     64          0         0
>   qdr_query_t                 344    64     128         128    128         0         0
>   qdr_terminus_t              64     64     128         256    256         0         0
>   qdtm_router_t               16     64     128         64     64          0         0
> }}


