You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Timothy Bish (JIRA)" <ji...@apache.org> on 2018/03/16 17:52:01 UTC

[jira] [Updated] (QPIDJMS-365) [Failover] failover.reconnectDelay not applied between connection attempts if peer Closes gracefully

     [ https://issues.apache.org/jira/browse/QPIDJMS-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Bish updated QPIDJMS-365:
---------------------------------
    Fix Version/s: 0.31.0

> [Failover] failover.reconnectDelay not applied between connection attempts if peer Closes gracefully
> ----------------------------------------------------------------------------------------------------
>
>                 Key: QPIDJMS-365
>                 URL: https://issues.apache.org/jira/browse/QPIDJMS-365
>             Project: Qpid JMS
>          Issue Type: Bug
>          Components: qpid-jms-client
>            Reporter: Keith Wall
>            Assignee: Timothy Bish
>            Priority: Major
>             Fix For: 0.31.0
>
>
> When using Broker-J's High Availability feature, the client's failover abilities are used to allow the client to discover which Broker in the HA group has the master role.  The client tries each Broker on the failover list until until one successfully responds to the {{Open}} indicating that it is current master.
>  
> When a node is not in the master role, it gracefully closes the AMQP connection by sending a {{Close}} performative.  During election periods, it is normal for all nodes in the HA group to respond with the {{Close}} until the election concludes.
> {noformat}
> Close{error=Error{condition=amqp:not-found, description='Virtual host 'localhost' is not active', info=null}}
> {noformat} 
> The QpidJMS Client failover options includes a {{failover.reconnectDelay}} which "Controls the delay between successive reconnection attempts".   However it appears that the reconnection delay is only applied between attempts when a connection fails owing to a 'transport' level failure (connection refused etc).  If the connection fails at the AMQP level, the delay is not applied.
> This is impactful to the HA use-case for Broker-J.   It means that during periods of reelection, the client, tightly spins in the reconnection loop, excessively consuming system resources.  It is also necessary to ensure that {{failover.maxReconnectAttempts}} is set sufficiently large to allow for an election period to conclude successfully.  Whilst the user could use unlimited reconnection attempts, this is unpleasant as it means the system won't fail in the case where the election does not conclude within a reasonable time period.
> Extract of TRACE level logging from {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when a AMQP connection is closed gracefully ({{Close}} performative):
> {noformat}
> 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider               - Failover: the provider reports failure: Connection closed by external action [condition = amqp:connection:forced]
> 2018-03-13 11:04:54,951 [lization thread] - DEBUG FailoverProvider               - handling Provider failure: Connection closed by external action [condition = amqp:connection:forced]
> 2018-03-13 11:04:54,951 [lization thread] - TRACE FailoverProvider               - stack
> java.io.IOException: Connection closed by external action [condition = amqp:connection:forced]
> 	at org.apache.qpid.jms.util.IOExceptionSupport.create(IOExceptionSupport.java:45)
> 	at org.apache.qpid.jms.provider.amqp.AmqpProvider.fireProviderException(AmqpProvider.java:1086)
> 	at org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.closeResource(AmqpAbstractResource.java:182)
> 	at org.apache.qpid.jms.provider.amqp.AmqpAbstractResource.processRemoteClose(AmqpAbstractResource.java:262)
> 	at org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:949)
> 	at org.apache.qpid.jms.provider.amqp.AmqpProvider.access$1900(AmqpProvider.java:104)
> 	at org.apache.qpid.jms.provider.amqp.AmqpProvider$17.run(AmqpProvider.java:831)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.jms.JMSException: Connection closed by external action [condition = amqp:connection:forced]
> 	at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:164)
> 	at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToException(AmqpSupport.java:117)
> 	... 11 more
> 2018-03-13 11:04:54,954 [ connect thread] - DEBUG FailoverProvider               - Connection attempt:[1] to: amqp://localhost:5672 in-progress
> 2018-03-13 11:04:55,003 [lization thread] - DEBUG FailoverProvider               - Signalling connection recovery: AmqpProvider: localhost:5672
> 2018-03-13 11:04:55,007 [lization thread] - DEBUG FailoverProvider               - handling Provider failure: Virtual host 'localhost' is not active [condition = amqp:not-found]
> {noformat}
> Extract of TRACE level logging from {{org.apache.qpid.jms.provider.failover.FailoverProvider}} for the case when a AMQP connection is closed at the transport level (Connection Refused):
> {noformat}
> 2018-03-13 11:03:47,069 [lization thread] - TRACE FailoverProvider               - stack
> java.io.IOException: Transport connection remotely closed.
> 	at org.apache.qpid.jms.provider.amqp.AmqpProvider$20.run(AmqpProvider.java:901)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-03-13 11:03:47,089 [ connect thread] - DEBUG FailoverProvider               - Connection attempt:[1] to: amqp://localhost:5672 in-progress
> 2018-03-13 11:03:47,095 [ connect thread] - INFO  FailoverProvider               - Connection attempt:[1] to: amqp://localhost:5672 failed
> 2018-03-13 11:03:47,095 [ connect thread] - TRACE FailoverProvider               - Next reconnect attempt will be in 10000 milliseconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org