Posted to dev@activemq.apache.org by "Howard Orner (JIRA)" <ji...@apache.org> on 2008/05/02 19:57:43 UTC

[jira] Created: (AMQ-1709) Network of Brokers Memory Leak Due to Race Condition

Network of Brokers Memory Leak Due to Race Condition
----------------------------------------------------

                 Key: AMQ-1709
                 URL: https://issues.apache.org/activemq/browse/AMQ-1709
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker, Transport
    Affects Versions: 5.0.0, 4.1.2
            Reporter: Howard Orner


When you have a network-of-brokers configuration with at least 3 brokers, such as:

<broker brokerName="A" persistent="false" ...
...
<transportConnector name="AListener" uri="tcp://localhost:61610"/>
...
<networkConnector name="BConnector" uri="static:(tcp://localhost:61620)"/>
<networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>

with the other brokers having similar configurations.
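For completeness, broker B would follow the same pattern (a hypothetical sketch extrapolated from the snippet above and the connector ports it names; everything not shown in the original snippet is an assumption):

```xml
<broker brokerName="B" persistent="false" ...>
  ...
  <transportConnector name="BListener" uri="tcp://localhost:61620"/>
  ...
  <networkConnector name="AConnector" uri="static:(tcp://localhost:61610)"/>
  <networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>
</broker>
```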
Then, if you have subscribers trying to connect to all of the brokers, you can hit a race condition at startup where the transports accept connections from subscribers before the network connectors are initialized.  In BrokerService.startAllConnectors(), the transports are started first, then the NetworkConnectors.  As part of starting the network connectors, their constructors take a collection obtained by calling getBroker().getDurableDestinations().  Normally this list would be empty.  However, if clients connect before this call is made, an entry is returned for each topic subscribed to.  Then, instead of creating standard TopicSubscriptions for the network connector, DurableTopicSubscriptions are created.

I'm not sure whether that by itself should be a problem, but it is, because SimpleDispatchPolicy, in the process of iterating through the DurableTopicSubscriptions, causes messages to be queued up for prefetch without clearing all of the references (for each pass it looks like three references are registered and only two are cleared).  This becomes a memory leak.  In the logs you see a message saying the PrefetchLimit was reached, then you start seeing logs about memory usage increasing until it reaches 100%, and then everything stops.
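The ordering problem described above can be reduced to a deterministic, self-contained sketch.  This is not the real BrokerService code; every name below is invented for illustration, and the client connection is made sequentially (rather than from another thread) so the race outcome is reproducible:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the startup ordering bug: transports begin accepting
// clients before the network connectors read the durable-destination list.
class StartupRaceSketch {
    final List<String> durableDestinations = new ArrayList<>();
    boolean transportsStarted = false;

    // Step 1 (buggy order): transports start accepting clients first.
    void startTransports() { transportsStarted = true; }

    // A fast client that connects early registers a durable topic entry.
    void clientSubscribes(String topic) {
        if (transportsStarted) {
            durableDestinations.add(topic);
        }
    }

    // Step 2: connectors are constructed from a *live* read of the list,
    // standing in for getBroker().getDurableDestinations().
    List<String> startNetworkConnectors() {
        return new ArrayList<>(durableDestinations);
    }

    public static void main(String[] args) {
        StartupRaceSketch broker = new StartupRaceSketch();
        broker.startTransports();           // transports first...
        broker.clientSubscribes("PRICES");  // ...a fast client wins the race
        // Non-empty here means the connector would build
        // DurableTopicSubscriptions instead of plain TopicSubscriptions.
        System.out.println(broker.startNetworkConnectors());
    }
}
```

In the real broker the client arrives from another thread, so whether the list is empty depends on timing, which is why repeated restarts are needed to reproduce the bug.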

To reproduce this, create a network-of-brokers configuration of at least 3 brokers -- the more you have, the more likely you are to hit this without a lot of tries, so I suggest a bunch.  Start all brokers.  Establish a publisher on broker A using failover://(tcp://localhost:61610), then establish a bunch of subscribers on all the brokers using similar URIs, e.g., failover://(tcp://localhost:61610), failover://(tcp://localhost:61620).  The more you have on broker 'A' the better, since you are trying to reproduce the race condition.  You want the others up so that the other brokers expect messages to be passed to them.  Once everybody is up and happy, kill broker A and restart it.  If you do that enough times, you will hit the race condition and the memory leak will start.  You can also put a breakpoint in BrokerService.startAllConnectors() after the transports are started but before the network connectors are started.  That gives clients time to connect to the transport threads before you tell the VM to continue.

I found it an easy fix to store the durable destination list in a local variable before starting the transports and to pass that to the network connectors instead of making separate calls.  I'm not sure whether there are 'normal' ways for that list to be anything other than empty.  If not, you could just pass an empty set to the network connectors, but I suspect there are legitimate configurations that may need this list to be requested.  If so, this memory leak would likely occur in those cases, too.
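The suggested fix can be sketched with the same toy model (again, all names are invented, not the real BrokerService API): snapshot the durable-destination list before the transports start accepting clients, and hand that snapshot to the connectors instead of a live read:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed fix: the durable-destination list is
// captured into a local snapshot before transports start, so a
// late-connecting client cannot change what the connectors see.
class SnapshotFixSketch {
    final List<String> durableDestinations = new ArrayList<>();
    boolean transportsStarted = false;
    private List<String> snapshot;

    // Take the snapshot first, while no client can have connected yet.
    void snapshotDurableDestinations() {
        snapshot = new ArrayList<>(durableDestinations);
    }

    void startTransports() { transportsStarted = true; }

    void clientSubscribes(String topic) {
        if (transportsStarted) {
            durableDestinations.add(topic);
        }
    }

    // Connectors receive the pre-start snapshot, not a live read, so the
    // race on startup ordering no longer matters.
    List<String> startNetworkConnectors() {
        return snapshot;
    }

    public static void main(String[] args) {
        SnapshotFixSketch broker = new SnapshotFixSketch();
        broker.snapshotDurableDestinations(); // snapshot BEFORE transports
        broker.startTransports();
        broker.clientSubscribes("PRICES");    // too late to affect connectors
        System.out.println(broker.startNetworkConnectors());
    }
}
```

The design point is simply that the value is read at a moment when its contents are known, rather than at a moment when clients may already be connecting.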

I ran into this in 4.1.2.  I haven't tested 5.0, since our attempts to switch to 5.0 failed due to the number of bugs in 5.0 (already reported by others).  Looking at the 5.0.0 source, the race condition is still there in BrokerService.startAllConnectors(), so I suspect the memory leak is there as well.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (AMQ-1709) Network of Brokers Memory Leak Due to Race Condition

Posted by "Rob Davies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/activemq/browse/AMQ-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Davies resolved AMQ-1709.
-----------------------------

    Fix Version/s: 5.2.0
       Resolution: Fixed

Fixed by SVN revision 656601



[jira] Assigned: (AMQ-1709) Network of Brokers Memory Leak Due to Race Condition

Posted by "Rob Davies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/activemq/browse/AMQ-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Davies reassigned AMQ-1709:
-------------------------------

    Assignee: Rob Davies
