Posted to users@qpid.apache.org by bryand <br...@bldixon.net> on 2018/02/08 13:04:28 UTC

qpid-broker-j-7.0.0 failover is slow

I have an HA setup for our qpid-broker-j-7.0.0 Development environment. The
setup is this:
- 2 Windows 2012R2 servers
- on one Windows server I have 1 qpid-broker-j-7.0.0 instance
- on the other Windows server I have 2 qpid-broker-j-7.0.0 instances
(listening on different ports)
- all qpid-broker-j-7.0.0 instances are using jdk-8u162-windows-x64

To test failover, I simply stop the ACTIVE Virtual Host Node. Failover is
always solid (it always completes successfully). However, it now consistently
takes around 1 minute and 45 seconds to fail over to another Virtual Host
Node; that is how long it takes for another Node to become ACTIVE and actually
start processing client requests. When I first started testing this, it was
taking around 45 seconds.

I do have 33 durable queues defined (about half are dead letter queues).  
We will actually have more queues in our Test and Production environments.

Is that amount of time normal? It seems pretty long. During that time, a
client trying to publish a message is blocked until failover completes, so if
the client request comes from an end user's action in an application's user
interface, the end user is stuck waiting for almost 2 minutes. We're trying
to move away from ActiveMQ to this Apache message broker, and ActiveMQ
failover (we are using Master/Slave with a SAN for shared storage) is much,
much quicker than broker-j.

Also, I've noticed that on the Windows server where I have the 2 broker-j
instances running, the 1st instance never becomes the ACTIVE Node in the
group unless I give it a higher priority than the other Nodes or start it
before the other 2 broker instances.



--
Sent from: http://qpid.2158936.n2.nabble.com/Apache-Qpid-users-f2158936.html

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: qpid-broker-j-7.0.0 failover is slow

Posted by Keith W <ke...@gmail.com>.

Bryan

Thanks for reporting the problem. This is a defect [1] in Broker-J 7.0.0
and 7.0.1. The defect causes startup to be delayed as you have
observed (5 seconds (cumulative) per queue or exchange using an
alternate binding). Neither the choice of synchronous or asynchronous
recoverer nor the presence of consumers has any bearing.

The fix should be straightforward. I anticipate putting it out soon,
as part of a 7.0.2 release.

Keith

[1] https://issues.apache.org/jira/browse/QPID-8106
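
To put Keith's figure in context for this setup, the arithmetic can be written out as a tiny Java helper. The object count is an assumption, not something stated in the thread: the original post says about half of the 33 durable queues are DLQs, which suggests roughly 16 ordinary queues routing to a DLQ through an alternate binding. Exchanges with alternate bindings, if any, would add to the count. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope estimate of the extra startup delay described for the
 * Broker-J 7.0.0/7.0.1 defect: about 5 seconds, cumulatively, for each queue
 * or exchange that uses an alternate binding. Hypothetical helper only.
 */
public class AlternateBindingDelay {
    // Per-object delay reported in the thread, in seconds.
    static final int DELAY_PER_OBJECT_SECONDS = 5;

    /** Extra delay in seconds for the given number of objects with alternate bindings. */
    static int extraDelaySeconds(int objectsWithAlternateBinding) {
        if (objectsWithAlternateBinding < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        return objectsWithAlternateBinding * DELAY_PER_OBJECT_SECONDS;
    }

    public static void main(String[] args) {
        // Assumption: roughly 16 of the 33 durable queues route to a DLQ
        // through an alternate binding (about half are DLQs themselves).
        int assumedCount = 16;
        int extra = extraDelaySeconds(assumedCount);
        System.out.println(assumedCount + " objects x " + DELAY_PER_OBJECT_SECONDS
                + " s = " + extra + " s of extra startup delay");
    }
}
```

Under that assumption the defect alone would add about 80 seconds, which would account for most of the gap between the roughly 11-second startup measured after the DLQs were deleted and the roughly 1 minute 45 seconds seen with them.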

In reply to bryand <br...@bldixon.net>, 20 February 2018 at 14:50.

Re: qpid-broker-j-7.0.0 failover is slow

Posted by bryand <br...@bldixon.net>.

Answers to your questions:

Could you please clarify whether you have seen "Gave up waiting" 
warnings with synchronous or asynchronous recoverer?  
- Definitely when using synchronous recovery, and I'm fairly sure I also saw
them after I made the context change to use asynchronous recovery. I've been
making several changes to get everything cleaned up, so I can't say for sure
about asynchronous.

Do you have full broker log with the warnings?  Can you share it? 
qpid.log <http://qpid.2158936.n2.nabble.com/file/t396334/qpid.log>  

Did you connect consumers to DLQs whilst the VH messages were 
asynchronously recovered? 
- I never had any consumers connected to any of the DLQs

Re: qpid-broker-j-7.0.0 failover is slow

Posted by Oleksandr Rudyy <or...@gmail.com>.

Hi Bryan,

Could you please clarify whether you have seen "Gave up waiting"
warnings with synchronous or asynchronous recoverer?
Do you have full broker log with the warnings? Can you share it?
Did you connect consumers to DLQs whilst the VH messages were
asynchronously recovered?

Kind Regards,
Alex

In reply to bryand <br...@bldixon.net>, 19 February 2018 at 14:27.

Re: qpid-broker-j-7.0.0 failover is slow

Posted by bryand <br...@bldixon.net>.

I was using DLQs (Alternate Binding) for most of my queues, and I think
that is why the Virtual Host Node startup time was so slow (1 minute 45
seconds). After I deleted all those DLQs, the startup time dropped to 11
seconds, a huge difference, especially in a failover situation.

Here are some log messages I was seeing in the Broker-j log file regarding
the DLQs (note that all my DLQs were empty):

2018-02-12 10:42:24,719 WARN  [VirtualHostNode-spgqpiddev3-Config] (o.a.q.s.m.AbstractConfiguredObject) - Gave up waiting for Queue 'app_attach_workqueue_DLQ' to attain state. Check object's state via Management.

So I'm wondering if there is some type of issue with broker-j regarding
DLQs/Alternate Bindings.

Re: qpid-broker-j-7.0.0 failover is slow

Posted by Oleksandr Rudyy <or...@gmail.com>.

Hi Bryan,

I suspect that virtual host recovery is what is causing the
delay. By default, the messages on the Virtual Host queues are
recovered one by one, in sequence, as part of VH activation. The more
messages you have, the slower recovery will be. The VH becomes ready
to accept connections only when all messages and VH configured objects
are recovered. You can try switching to asynchronous message store
recovery [1] by setting the Virtual Host context variable
'use_async_message_store_recovery' to 'true'. You can set it on the
Master VH (via the Web Management Console or REST API), and the change
will be replicated to the Replicas. With asynchronous message store
recovery, the messages on different queues are recovered in parallel
and the VH does not wait for recovery to complete. The VH becomes
ready immediately after recovery of its children and accepts
connections before message recovery is complete. Producers should be
able to publish even when queue recovery has not finished yet, though
synchronous publishing and transaction commits will be delayed until
queue recovery is complete.

Kind Regards,
Alex

[1] http://qpid.apache.org/releases/qpid-broker-j-7.0.0/book/Java-Broker-Runtime-Background-Recovery.html
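
For the REST API route mentioned above, a minimal sketch follows, written for Java 8 to match the JDK used in this thread. Several details are assumptions rather than facts from the thread: the endpoint layout /api/latest/virtualhost/<node>/<virtualhost>, updating by POSTing the changed attributes to that URI, HTTP Basic authentication being enabled on the management port, and the class name EnableAsyncRecovery. Check them against the Broker-J REST API documentation for your version. Writing the context attribute may also replace the Virtual Host's existing context map, so read it first with a GET if other context variables are set.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: set a Virtual Host context variable via the (assumed) Broker-J REST layout. */
public class EnableAsyncRecovery {
    static final String CONTEXT_VAR = "use_async_message_store_recovery";

    static String encode(String segment) {
        try {
            // URLEncoder does form encoding, so turn '+' back into a path-safe %20.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Assumed URI layout: <baseUrl>/api/latest/virtualhost/<node>/<virtualhost>
    static String virtualHostUrl(String baseUrl, String node, String host) {
        return baseUrl + "/api/latest/virtualhost/" + encode(node) + "/" + encode(host);
    }

    // Minimal JSON string quoting (backslash and double quote only).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String contextPayload(String name, String value) {
        return "{\"context\":{" + jsonString(name) + ":" + jsonString(value) + "}}";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 5) {
            System.err.println("usage: EnableAsyncRecovery <baseUrl> <node> <virtualhost> <user> <password>");
            return;
        }
        URL url = new URL(virtualHostUrl(args[0], args[1], args[2]));
        byte[] body = contextPayload(CONTEXT_VAR, "true").getBytes(StandardCharsets.UTF_8);
        String auth = Base64.getEncoder().encodeToString(
                (args[3] + ":" + args[4]).getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST"); // assumption: POST of attributes updates the object
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", "Basic " + auth);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Run it against the Master, for example `java EnableAsyncRecovery http://broker-host:8080 <node> <virtualhost> <user> <password>` (broker-host is a placeholder); per the note above, the change should then be replicated to the Replicas.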

In reply to bryand <br...@bldixon.net>, 8 February 2018 at 13:04.