Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2018/05/24 14:32:00 UTC
[jira] [Commented] (PROTON-1515) Python sender client doesn't check actual link state and continues to send messages even if link is down

    [ https://issues.apache.org/jira/browse/PROTON-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489109#comment-16489109 ] 

Alan Conway commented on PROTON-1515:
-------------------------------------

In the event of a connection failure, there is always scope for messages to be "in doubt" - the client believes they were sent but they never arrived. That's unavoidable because of the unpredictable delays in any network.
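One way to picture the "in doubt" set, as a library-free sketch and not Proton's actual bookkeeping: the sender remembers every message it has written but not yet had acknowledged, and when the connection fails, whatever is left in that set is in doubt. The class and method names here are invented for illustration.

```python
class InDoubtTracker:
    """Toy model of sender-side bookkeeping (hypothetical, not a Proton API)."""

    def __init__(self):
        self._unacked = {}   # delivery tag -> message body, kept in send order
        self._next_tag = 0

    def sent(self, body):
        """Record a message handed to the network and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = body
        return tag

    def acknowledged(self, tag):
        """The peer confirmed receipt, so the message is no longer in doubt."""
        self._unacked.pop(tag, None)

    def connection_lost(self):
        """Return the messages whose fate is unknown, oldest first."""
        return list(self._unacked.values())


tracker = InDoubtTracker()
tags = [tracker.sent(f"message {i}") for i in range(10)]
for tag in tags[:6]:          # assume only the first six were confirmed
    tracker.acknowledged(tag)
in_doubt = tracker.connection_lost()
print(in_doubt)
```

The four unconfirmed messages may or may not have reached the broker; the client cannot tell which, so an application has to decide whether to resend them (risking duplicates) or drop them (risking loss).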

Of course Proton should notice failures as quickly as possible; if there is an unreasonable delay, that should be addressed. Sending 10 messages and killing a broker doesn't really seem to demonstrate an unreasonable delay. Sending a message after the broker is killed might, depending on the timing. Can you confirm any of the following:
 * Run the client with PN_TRACE_FRM=1, wait to see "EOS" on the client side, then send a message.
 * Run the client with heartbeat enabled, wait for > 3x the heartbeat interval, then send a message.

The common practice of testing failover by killing a broker process is unrealistic. When a process is killed, the kernel still sends TCP FIN packets on its open connections, giving quick notice of the failure. In a real host crash or network partition, outgoing client TCP packets simply vanish. The client-side timeout to assume the connection has failed is based on very pessimistic assumptions about worst-case latencies in arbitrary TCP networks (multiple minutes; TCP was designed in the 70s :). That's why AMQP includes configurable heartbeats to narrow the uncertainty based on more realistic knowledge of actual expected response times. The tradeoff is: longer heartbeats = more "in doubt" messages; shorter heartbeats = greater chance of incorrectly assuming your peer has failed.
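The tradeoff can be put into rough numbers. The following is a back-of-the-envelope sketch, not a description of Proton's internals: it assumes a failure is detected after about 3x the heartbeat interval (mirroring the "> 3x heartbeat" rule of thumb above), a steady send rate, and a known worst-case stall for a healthy peer. All function names and figures are hypothetical.

```python
import math


def detection_window(heartbeat_s, multiplier=3.0):
    """Assumed worst-case seconds before a silent peer is declared failed.

    multiplier=3 is an assumption taken from the rule of thumb in the text,
    not a value read from Proton.
    """
    if heartbeat_s <= 0:
        raise ValueError("heartbeat must be positive")
    return heartbeat_s * multiplier


def max_in_doubt(send_rate_per_s, heartbeat_s, multiplier=3.0):
    """Upper bound on messages sent before the failure is noticed."""
    return math.ceil(send_rate_per_s * detection_window(heartbeat_s, multiplier))


def false_alarm(heartbeat_s, worst_healthy_stall_s, multiplier=3.0):
    """True if a stall that a healthy peer can produce fills the whole window."""
    return worst_healthy_stall_s >= detection_window(heartbeat_s, multiplier)


# Made-up workload: 100 messages/s, healthy peer may stall for up to 4 s.
for heartbeat in (1, 5, 30):
    print(
        heartbeat,
        max_in_doubt(100, heartbeat),
        false_alarm(heartbeat, worst_healthy_stall_s=4),
    )
```

With these made-up numbers, a 1-second heartbeat bounds the in-doubt set at 300 messages but would misread a 4-second stall as a failure, while a 30-second heartbeat never raises a false alarm but leaves up to 9000 messages in doubt.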

> Python sender client doesn't check actual link state and continues to send messages even if link is down
> --------------------------------------------------------------------------------------------------------
>
>                 Key: PROTON-1515
>                 URL: https://issues.apache.org/jira/browse/PROTON-1515
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: python-binding
>         Environment: RHEL7.3
> Jboss AMQ 7
> python-qpid-proton.x86_64-0.14.0-1.el7
>            Reporter: Dmitrii Puzikov
>            Assignee: Justin Ross
>            Priority: Major
>         Attachments: sender.log
>
>
> Steps to reproduce:
> 1. Start broker
> 2. Create queue
> 3. Start sending e.g. 10 messages with python sender
> 4. Kill broker
> 5. Notice that the client continues to send messages and raises an exception only after all 10 messages were sent.
> Actual behavior: Python sender client ignores the link failure until all messages were sent and only then raises an exception or begins re-connection attempts.
> Expected behavior: Client should stop sending messages and raise an exception, or begin re-connection attempts if the reconnect option is set.
> Please, see sender.log. Global handler was added for event logging purposes. It just prints event/handler name.


