Posted to users@activemq.apache.org by Tim Robbins <ti...@outlook.com> on 2015/02/20 02:14:40 UTC

High CPU load with network connector, failover transport

Hi,

We’ve noticed a regression in ActiveMQ 5.10.1 vs. 5.10.0 with a configuration similar to the following:

Broker 1:
networkConnector with static:(failover:(tcp://broker2)?randomize=false&maxReconnectAttempts=0)

Broker 2:
networkConnector with static:(failover:(tcp://broker1)?randomize=false&maxReconnectAttempts=0)
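
For anyone reproducing this with embedded brokers, the two connector URIs above can be composed programmatically. This is only a minimal sketch: the BridgeUri class and its staticFailover method are hypothetical names used for illustration, not part of ActiveMQ, and the host names are the placeholders from the configuration above.

```java
// Hypothetical helper (not an ActiveMQ class): builds the network connector
// URI used in the configuration above for a given remote broker host.
final class BridgeUri {
    private BridgeUri() {
    }

    static String staticFailover(String remoteHost) {
        // randomize=false and maxReconnectAttempts=0 are copied from the
        // configuration in this report; nothing else is added.
        return "static:(failover:(tcp://" + remoteHost
                + ")?randomize=false&maxReconnectAttempts=0)";
    }

    public static void main(String[] args) {
        // Broker 1 points at broker2, and Broker 2 points at broker1.
        System.out.println(staticFailover("broker2"));
        System.out.println(staticFailover("broker1"));
    }
}
```

With an embedded broker, a string like this would presumably be handed to BrokerService.addNetworkConnector(String); in an XML configuration it would go in the uri attribute of the networkConnector element.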

When one of the brokers is restarted, the other broker uses ~400% CPU. The cause is the FailoverTransport reconnectTask spinning, and nothing is stopping the task.

Reverting the fix made for AMQ-5315 reintroduces the NullPointerException, but failover is then handled properly without spinning:
https://git1-us-west.apache.org/repos/asf/activemq/repo?p=activemq.git;a=commitdiff;h=c391321d1b5b59542d847717654b0d4dba54cf2f

It works after reverting that change because the NullPointerException is caught, which leads to serviceLocalException() and from there to ServiceSupport.dispose(getControllingService()). With the fix made in AMQ-5315, the dispose() call is never made.

I think, rather than reverting the AMQ-5315 commit, it would be fine to just call dispose() before fireBridgeFailed() in the case where we can’t retrieve the broker info.
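
To make the proposed ordering concrete, here is a toy model of the failure path. It is not ActiveMQ source: BridgeModel, ToyTransport, runReconnectTask and onBrokerInfoUnavailable are hypothetical stand-ins, and the reconnect loop is capped so the example terminates. It only illustrates the idea that a reconnect task keeps iterating unless something disposes the transport first.

```java
// Toy model only: every class and method here is a hypothetical stand-in,
// not the real ActiveMQ bridge or FailoverTransport code.
final class BridgeModel {
    private BridgeModel() {
    }

    /** Stand-in for a transport whose reconnect task runs until disposed. */
    static final class ToyTransport {
        private volatile boolean disposed;

        void dispose() {
            disposed = true;
        }

        /**
         * Runs the reconnect loop for at most maxIterations passes and
         * returns how many passes actually ran. The cap exists only so
         * the example terminates; the reported problem has no such cap.
         */
        int runReconnectTask(int maxIterations) {
            int iterations = 0;
            while (!disposed && iterations < maxIterations) {
                iterations++; // each pass models one immediate, failed retry
            }
            return iterations;
        }
    }

    /** Models the path taken when the remote broker info is unavailable. */
    static void onBrokerInfoUnavailable(ToyTransport transport,
                                        boolean disposeFirst,
                                        Runnable fireBridgeFailed) {
        if (disposeFirst) {
            // Proposed ordering: dispose() before fireBridgeFailed().
            transport.dispose();
        }
        fireBridgeFailed.run();
    }
}
```

In this model, calling onBrokerInfoUnavailable with disposeFirst set to false leaves the loop running up to its cap, while setting it to true stops the loop before its first pass.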

This seems like a fairly serious problem. As far as I’m aware this is a common use case: for example, anyone using the masterslave transport, or the failover transport with the maxReconnectAttempts=0 setting required for bridges, would be exposed to it.

Regards,

Tim

Re: High CPU load with network connector, failover transport

Posted by goggles123 <lo...@gmail.com>.

Created https://issues.apache.org/jira/browse/AMQ-5605




Re: High CPU load with network connector, failover transport

Posted by Tim Bain <tb...@alumni.duke.edu>.

Please submit a bug in JIRA for it, ideally with a unit test that shows the
problem (though I'm not quite sure how you'd write a unit test to confirm
that a particular thread isn't spinning a core, so that might be wishful
thinking).
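
One possible approach to such a test, offered as a sketch rather than a proven recipe, is to sample a thread's CPU time through the JDK's ThreadMXBean and compare it with a budget. The SpinCheck class and cpuNanosDuring method are hypothetical names; the window length and any pass/fail threshold would have to be chosen by the test author, and CPU time measurement is not supported on every JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical test helper: measures how much CPU time a thread consumes
// over a fixed observation window.
final class SpinCheck {
    private SpinCheck() {
    }

    /**
     * Returns the CPU nanoseconds consumed by the given thread during the
     * window, or -1 if the measurement is unavailable (unsupported JVM,
     * thread not alive, or the calling thread was interrupted).
     */
    static long cpuNanosDuring(Thread thread, long windowMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1;
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long before = mx.getThreadCpuTime(thread.getId());
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return -1;
        }
        long after = mx.getThreadCpuTime(thread.getId());
        if (before < 0 || after < 0) {
            return -1; // thread was not alive for one of the samples
        }
        return after - before;
    }
}
```

A test could locate the reconnect thread (for example by name, which is an assumption about how the thread is labelled), restart the remote broker, and assert that the value returned for a window of a second or so stays well below the window length.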

Re: High CPU load with network connector, failover transport

Posted by Tim Robbins <ti...@outlook.com>.

By the way, I've noticed Lars has run into the same issue and posted via Nabble but it hasn't turned up on the mailing list yet:

http://activemq.2283324.n4.nabble.com/Using-a-NetworkConnector-results-in-high-CPU-load-td4691627.html

