Posted to users@activemq.apache.org by "yang.yang.zz" <ya...@outlook.com> on 2016/01/10 10:46:55 UTC

Message stuck after failover [5.11.1, 5.13.0]

Hi:

I've run into an issue where messages get stuck after a failover. The
workaround seems to be restarting the consumer-side broker. Here's my
network:

A <--- broker1 ------ duplex network ------- broker*2a* ---> B1

Then a failover happens: broker2a and B1 are killed (process killed*), and
the backup nodes broker2b and B2 come up.

A <--- broker1 ------ duplex network ------- broker*2b* ---> B2

After the failover, A can still send messages to B2. But all the messages
from B2 get stuck on broker2b until I restart broker1.

I spent hours and tried everything I could find on the internet, including
disabling audit and replayWhenNoConsumers, but nothing helped.


* It seems this issue doesn't happen when broker2a and B1 are shut down
properly. But what I'm trying to handle is a network failure or a server
power-off, so I have to kill the process.




--
View this message in context: http://activemq.2283324.n4.nabble.com/Message-stuck-after-failover-5-11-1-5-13-0-tp4705730.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
*Configuration for node B*


<beans
  xmlns="http://www.springframework.org/schema/beans"
  xmlns:amq="http://activemq.apache.org/schema/core"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
                      http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                      http://activemq.apache.org/schema/core
                      http://activemq.apache.org/schema/core/activemq-core.xsd">

    
    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="locations">
            <value>file:${activemq.base}/conf/credentials.properties</value>
        </property>      
    </bean>

    
    <broker xmlns="http://activemq.apache.org/schema/core"
            brokerName="da_broker"
            dataDirectory="${activemq.base}/data"
            useJmx="true">
 
        
              
        <destinationPolicy>
            <policyMap>
              <policyEntries>
                    <policyEntry queue=">" producerFlowControl="true"
                                 memoryLimit="200mb">
                        <pendingQueuePolicy>
                            <vmQueueCursor/>
                        </pendingQueuePolicy>
                    </policyEntry>
                    <policyEntry topic=">" producerFlowControl="false">
                        <subscriptionRecoveryPolicy>
                            <lastImageSubscriptionRecoveryPolicy/>
                        </subscriptionRecoveryPolicy>
                    </policyEntry>

              </policyEntries>
            </policyMap>
        </destinationPolicy> 
 
        
        
        <managementContext>
            <managementContext createConnector="true" connectorPort="11099"/>
        </managementContext>

        
        <persistenceAdapter>
            <kahaDB directory="${activemq.base}/data/kahadb"
                    ignoreMissingJournalfiles="true"
                    checkForCorruptJournalFiles="true"
                    checksumJournalFiles="true" />
        </persistenceAdapter>

        
          
             
        <systemUsage>
            <systemUsage sendFailIfNoSpace="false">
                <memoryUsage>
                    <memoryUsage percentOfJvmHeap="70" />
                </memoryUsage>
            </systemUsage>
        </systemUsage>
		  
        
        <transportConnectors>
            <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
        </transportConnectors>
        
        <plugins>
        	<statisticsBrokerPlugin/>
        </plugins>

    </broker>

    
    <import resource="jetty.xml"/>
    
</beans>





Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
I really hope this is a misconfiguration. I have started working on a
mechanism to restart broker1 after a failover. However, since the failover
happens inside broker1, it seems I can't catch the failover event on node A,
which makes it very difficult to detect when broker1 needs to be
restarted :(




Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by Tim Bain <tb...@alumni.duke.edu>.
If there's no shared store, you can failover between the two for new
messages, but any messages in the persistence store of the broker will be
unavailable while it's down and only be dispatched once it comes back up.
Which sounds like exactly what you're seeing.

If you want redundancy, make them a master/slave pair.
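
As a rough sketch of what that could look like with a shared file system
master/slave pair: both broker2a and broker2b point their kahaDB at the same
directory on shared storage, so only the broker holding the lock is active
and the other takes over the same message store when it dies. The path
/mnt/shared/activemq/kahadb is a made-up placeholder, not something from the
posted configs, and the shared filesystem has to support reliable file
locking.

```xml
<!-- Same fragment in the activemq.xml of BOTH broker2a and broker2b. -->
<!-- Assumption: /mnt/shared/activemq/kahadb is a hypothetical shared mount. -->
<persistenceAdapter>
    <kahaDB directory="/mnt/shared/activemq/kahadb"/>
</persistenceAdapter>
```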

Tim
On Jan 11, 2016 10:10 AM, "yang.yang.zz" <ya...@outlook.com> wrote:

> Thanks for the reply, Tim! I've posted the two config files above. For your
> questions:
>
>
> /Are broker2a and broker2b configured as a master/slave pair with a shared
> persistence store, or simply as two standalone brokers?  If there is a
> shared persistence store, which type is it? /
>
> There's no special setting on broker2a and broker2b. They just use
> transportConnectors, which allow broker1's network connector to connect to
> them. There's no shared persistence store. They are two standalone brokers
> that serve as failover backups for each other.

Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
Thanks for the reply, Tim! I've posted the two config files above. For your
questions:


/Are broker2a and broker2b configured as a master/slave pair with a shared 
persistence store, or simply as two standalone brokers?  If there is a 
shared persistence store, which type is it? /

There's no special setting on broker2a and broker2b. They just use
transportConnectors, which allow broker1's network connector to connect to
them. There's no shared persistence store. They are two standalone brokers
that serve as failover backups for each other.




Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
Configuration for node *A*



<beans
  xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
                      http://www.springframework.org/schema/beans/spring-beans.xsd
                      http://activemq.apache.org/schema/core
                      http://activemq.apache.org/schema/core/activemq-core.xsd">

    
    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="locations">
            <value>file:${activemq.conf}/credentials.properties</value>
        </property>
    </bean>

    
    
    
    
    <broker xmlns="http://activemq.apache.org/schema/core"
            brokerName="dc_broker_1d54f2fd-87f0-4b07-85bf-16475ce2cf5c"
            dataDirectory="${activemq.data}"
            useShutdownHook="false"
            useJmx="true">
        
        
        <destinationPolicy>
            <policyMap>

                <policyEntries>
                    <policyEntry queue=">" producerFlowControl="true"
                                 cursorMemoryHighWaterMark="30" memoryLimit="100M"/>
                    <policyEntry topic=">" producerFlowControl="false">
                        <subscriptionRecoveryPolicy>
                            <lastImageSubscriptionRecoveryPolicy/>
                        </subscriptionRecoveryPolicy>
                    </policyEntry>
                </policyEntries>
            </policyMap>

        </destinationPolicy>


        
        <managementContext>
            <managementContext createConnector="true" connectorPort="11099"/>
        </managementContext>

		
        <networkConnectors>
            
            				  
            <networkConnector name="da_manager"
                              uri="static:(failover:(tcp://yanya04-vertica1:61616,tcp://uspmscale1-da:61616)?randomize=false)"
                              duplex="true" suppressDuplicateTopicSubscriptions="false"/>
        </networkConnectors> 

        
        <persistenceAdapter>
            <kahaDB directory="${activemq.data}/kahadb"/>
        </persistenceAdapter>
        
        
        
        
        <systemUsage>
            <systemUsage sendFailIfNoSpace="true">
                <memoryUsage>
					
                 	<memoryUsage percentOfJvmHeap="70" />
                </memoryUsage>
                <tempUsage>
                	
                	
                    <tempUsage limit="200 gb"/>
                </tempUsage>                
            </systemUsage>
        </systemUsage>

        
        <transportConnectors>
            
            <transportConnector name="openwire"
                                uri="tcp://0.0.0.0:61616?maximumConnections=100&amp;wireFormat.maxFrameSize=104857600"/>
        </transportConnectors>

        <plugins>
        	<statisticsBrokerPlugin/>
        </plugins>

        
        <shutdownHooks>
            <bean xmlns="http://www.springframework.org/schema/beans"
                  class="org.apache.activemq.hooks.SpringContextHook" />
        </shutdownHooks>
    </broker>
	
    
    <import resource="jetty.xml"/>

</beans>





Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by Tim Bain <tb...@alumni.duke.edu>.
Are broker2a and broker2b configured as a master/slave pair with a shared
persistence store, or simply as two standalone brokers?  If there is a
shared persistence store, which type is it?

Please post your full config for each broker if possible, or at a minimum
the networkConnectors for each broker.  Any networkConnector that uses the
failover transport needs to wrap it in a static transport and prevent the
failover transport from reconnecting; let's confirm that you're doing that.

Your expectation that failover will be transparent and not require a
restart is correct, so you've probably just got something misconfigured.

Tim
On Jan 10, 2016 9:42 AM, "yang.yang.zz" <ya...@outlook.com> wrote:

> I was trying to imagine what's happening behind the scenes....
>
> *before failover*
>
>                        B1 (up)
>                      /
> A --- failover ?
>                      \
>                        B2 (down)
>
> *after failover*
>
>                        B1 (down)
>                      /
> A --- failover ?
>                      \
>                        B2 (up)
>
>
>
> After the failover, A can still send messages to B2. So up to this point,
> the failover succeeded. But B2, as a backup broker and producer, can't
> send messages to A. All the messages from B2 are stuck at B2's local broker.
>
> However, restarting the broker at A solves it. So it seems that restarting
> A's broker re-registers the consumer listeners at A, so that it consumes
> the messages queued at B? Shouldn't the failover do the re-registration
> automatically? Does A's broker have to be restarted after the failover?

Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
I was trying to imagine what's happening behind the scenes....

*before failover*

                       B1 (up)
                     / 
A --- failover ?  
                     \
                       B2 (down)

*after failover*

                       B1 (down)
                     / 
A --- failover ?  
                     \
                       B2 (up)



After the failover, A can still send messages to B2. So up to this point,
the failover succeeded. But B2, as a backup broker and producer, can't
send messages to A. All the messages from B2 are stuck at B2's local broker.

However, restarting the broker at A solves it. So it seems that restarting
A's broker re-registers the consumer listeners at A, so that it consumes
the messages queued at B? Shouldn't the failover do the re-registration
automatically? Does A's broker have to be restarted after the failover?





Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by Tim Bain <tb...@alumni.duke.edu>.
That defect isn't a bug, it's user error (the submitter should have set
maxReconnectAttempts=0 with the failover transport in his
networkConnector).  And you're right, that's the same problem you're
having.  Fixing that will address the connectivity issues after a restart;
it won't address the unavailability of any messages on the broker when it
goes down, but my other message addresses that problem.
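
A sketch of that change, assuming the da_manager connector from the node A
config posted in this thread and leaving every other attribute as it was.
Only maxReconnectAttempts=0 is new; note that the & separating the two
failover options has to be written as &amp; inside an XML attribute.

```xml
<!-- Node A networkConnector with failover reconnects disabled. -->
<!-- Host names and other attributes are copied from the posted config. -->
<networkConnector name="da_manager"
    uri="static:(failover:(tcp://yanya04-vertica1:61616,tcp://uspmscale1-da:61616)?randomize=false&amp;maxReconnectAttempts=0)"
    duplex="true" suppressDuplicateTopicSubscriptions="false"/>
```

The idea, as far as I understand it, is that the failover transport then
reports the dropped connection instead of silently reconnecting, so the
static transport can tear down and re-establish the network bridge.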

Tim
On Jan 11, 2016 10:10 AM, "yang.yang.zz" <ya...@outlook.com> wrote:

> I found an open defect for this:
>
> https://issues.apache.org/jira/browse/AMQ-5531

Re: Message stuck after failover [5.11.1, 5.13.0]

Posted by "yang.yang.zz" <ya...@outlook.com>.
I found an open defect for this:

https://issues.apache.org/jira/browse/AMQ-5531



