Posted to users@activemq.apache.org by mdasari <md...@gmail.com> on 2009/04/02 18:47:39 UTC

FailoverTransport stops working after a while

Hi,

We are using AMQ 5.1.0 on some of our servers. We noticed that, on a few
servers, the AMQ failover transport stops working after a while, so messages
are no longer delivered (from a producer AMQ server box to a central
consumer AMQ server box, through Camel).

--------------------------------------------------------------
The following is the data from our log files:
--------------------------------------------------------------
INFO   | jvm 1    | 2009/03/16 21:25:42 | DEBUG FailoverTransport             
- Connection established
INFO   | jvm 1    | 2009/03/16 21:25:42 | INFO  FailoverTransport             
- Successfully connected to tcp://10.87.129.196:61616
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2            
- Executing callback on JMS Session: ActiveMQSession
{id=ID:LOCALMQ-3675-1236961500048-2:218:1,started=false}
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsProducer                   
- Endpoint[centralMQ:topic:Topic1] sending JMS message: ActiveMQTextMessage
{...}
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG JmsConfiguration$2            
- Sending created message: ActiveMQTextMessage {...}
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG ActiveMQSession               
- ID:LOCALMQ-3675-1236961500048-2:218:1 sending message: ActiveMQTextMessage
{...}
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG FailoverTransport             
- Stopped.
INFO   | jvm 1    | 2009/03/16 21:25:43 | DEBUG TcpTransport                  
- Stopping transport tcp:///10.87.129.196:61616
INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter         
- Checkpoint started.
INFO   | jvm 1    | 2009/03/16 21:26:00 | DEBUG AMQPersistenceAdapter         
- Checkpoint done.
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQMessageConsumer       
- ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message: MessageDispatch
{...}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG EndpointMessageListener       
- Endpoint[localMQ:topic:Topic1?clientId=...&subscriptionDurable=true]
receiving JMS message: ActiveMQTextMessage {...}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Waking up reconnect task
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Started.
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Waking up reconnect task
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Attempting connect to: tcp://10.87.129.196:61616
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator          
- Sending: WireFormatInfo { version=3, properties={CacheSize=1024,
CacheEnabled=true, SizePrefixDisabled=false,
MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
MaxInactivityDuration=30000, TightEncodingEnabled=true,
StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator          
- Received WireFormat: WireFormatInfo { version=3,
properties={CacheSize=1024, CacheEnabled=true, SizePrefixDisabled=false,
MaxInactivityDurationInitalDelay=10000, TcpNoDelayEnabled=true,
MaxInactivityDuration=30000, TightEncodingEnabled=true,
StackTraceEnabled=true}, magic=[A,c,t,i,v,e,M,Q]}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator          
- tcp:///10.87.129.196:61616 before negotiation: OpenWireFormat{version=3,
cacheEnabled=false, stackTraceEnabled=false, tightEncodingEnabled=false,
sizePrefixDisabled=false}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG WireFormatNegotiator          
- tcp:///10.87.129.196:61616 after negotiation: OpenWireFormat{version=3,
cacheEnabled=true, stackTraceEnabled=true, tightEncodingEnabled=true,
sizePrefixDisabled=false}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Connection established
INFO   | jvm 1    | 2009/03/16 21:26:13 | INFO  FailoverTransport             
- Successfully connected to tcp://10.87.129.196:61616
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2            
- Executing callback on JMS Session: ActiveMQSession
{id=ID:LOCALMQ-3675-1236961500048-2:219:1,started=false}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsProducer                   
- Endpoint[centralMQ:topic:Topic1] sending JMS message: ActiveMQTextMessage
{...}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG JmsConfiguration$2            
- Sending created message: ActiveMQTextMessage {...}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG ActiveMQSession               
- ID:LOCALMQ-3675-1236961500048-2:219:1 sending message: ActiveMQTextMessage
{...}
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG FailoverTransport             
- Stopped.
INFO   | jvm 1    | 2009/03/16 21:26:13 | DEBUG TcpTransport                  
- Stopping transport tcp:///10.87.129.196:61616
INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG ActiveMQMessageConsumer       
- ID:LOCALMQ-3675-1236961500048-2:0:1:1 received message: MessageDispatch
{...}
INFO   | jvm 1    | 2009/03/16 21:26:14 | DEBUG EndpointMessageListener       
- Endpoint[localmq:topic:Topic1?clientId=...&subscriptionDurable=true]
receiving JMS message: ActiveMQTextMessage {...}
INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport             
- Waiting 10 ms before attempting connection. 
INFO   | jvm 1    | 2009/03/16 21:26:15 | Exception in thread "ActiveMQ
Failover Worker: 1889455" java.lang.NullPointerException
INFO   | jvm 1    | 2009/03/16 21:26:15 | 	at
org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:124)
INFO   | jvm 1    | 2009/03/16 21:26:15 | 	at
org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)
INFO   | jvm 1    | 2009/03/16 21:26:15 | 	at
org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)
INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport             
- Waking up reconnect task
INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport             
- Started.
INFO   | jvm 1    | 2009/03/16 21:26:15 | DEBUG FailoverTransport             
- Waking up reconnect task
INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter         
- Checkpoint started.
INFO   | jvm 1    | 2009/03/16 21:27:00 | DEBUG AMQPersistenceAdapter         
- Checkpoint done.
INFO   | jvm 1    | 2009/03/16 21:28:00 | DEBUG AMQPersistenceAdapter         
- Checkpoint started.
---------------------------------------------


Basically, it was able to deliver one message (and a few more before that),
but for another message sent very shortly afterwards it ran into a
NullPointerException. After that, it stops functioning entirely.

I took a brief look at the FailoverTransport.java code. I'm not an expert on
the AMQ code, but I suspect that the reconnectTask member variable of
FailoverTransport is being used by the task-runner thread before it has been
completely initialized (basically a race condition without proper
synchronization).
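
To illustrate the kind of race suspected above, here is a minimal Java sketch.
It is not the ActiveMQ code: SketchRunner and ReconnectRaceSketch are
hypothetical names, and the runner is made to finish its worker inside its own
constructor purely so that the unlucky interleaving happens every time. It
shows how a task that dereferences its owner's field can hit a
NullPointerException when the worker runs before the field has been assigned,
and how starting the worker only after the assignment avoids it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a dedicated task runner; NOT ActiveMQ's real class.
final class SketchRunner {
    private final Thread worker;

    SketchRunner(Runnable task, boolean runInsideConstructor) {
        worker = new Thread(task, "sketch-worker");
        if (runInsideConstructor) {
            // Forces the unlucky interleaving: the worker runs to completion
            // before this constructor returns, i.e. before the caller can
            // store the new runner in a field.
            startAndJoin();
        }
    }

    // Starts the worker if it has not run yet, then waits for it to finish.
    void start() {
        if (worker.getState() == Thread.State.NEW) {
            startAndJoin();
        }
    }

    void wakeup() {
        // No-op here; only the dereference in the caller matters.
    }

    private void startAndJoin() {
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

final class ReconnectRaceSketch {
    private SketchRunner reconnectTask; // null until the constructor call below returns
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    ReconnectRaceSketch(boolean runInsideConstructor) {
        reconnectTask = new SketchRunner(() -> {
            try {
                reconnectTask.wakeup(); // the worker thread reads the field here
            } catch (Throwable t) {
                failure.set(t);
            }
        }, runInsideConstructor);
        // Safe ordering: the field is assigned before the worker is started.
        reconnectTask.start();
    }

    // Returns whatever the worker thread threw, or null if it ran cleanly.
    Throwable failure() {
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("worker started in constructor:   " + new ReconnectRaceSketch(true).failure());
        System.out.println("worker started after assignment: " + new ReconnectRaceSketch(false).failure());
    }
}
```

In the first case the worker sees reconnectTask still null and records a
NullPointerException; in the second the field is already set when the worker
starts, so nothing is recorded. Whether FailoverTransport 5.1.0 fails in
exactly this way is only the suspicion stated above, not something this sketch
proves.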

I can provide more details on our network topology if required. I
searched around but didn't find any related issues or bugs. Does anyone know
whether this is a known issue and, if so, in which version it will be addressed?
If not, I'll open a JIRA.

Appreciate your help.

cheers
- mdasari


-- 
View this message in context: http://www.nabble.com/FailoverTransport-stops-working-after-a-while-tp22851122p22851122.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: FailoverTransport stops working after a while

Posted by Norbert Pfistner <np...@picturesafe.de>.
Hi Dejan,

we use JDBC-based persistent messages. But we'll give 5.3-SNAPSHOT a try
anyway.

Thanks for your hint.

Greetings,
Norbert

Dejan Bosanac wrote:
> Hi Norbert,
> 
> this sounds like a different problem than this one. Take a look at
> http://issues.apache.org/activemq/browse/AMQ-2149, which is being worked on,
> and give 5.3-SNAPSHOT a try.
> 
> Cheers
> --
> Dejan Bosanac
> 
> Open Source Integration - http://fusesource.com/
> ActiveMQ in Action - http://www.manning.com/snyder/
> Blog - http://www.nighttale.net
> 
> 
> On Mon, Apr 6, 2009 at 9:06 AM, Norbert Pfistner <
> norbert.pfistner@picturesafe.de> wrote:
> 
>> Hello Murty,
>>
>> We also experience the same problems when using failover: sometimes clients
>> stop working after a slave has become the master and a bunch of messages
>> have been processed with this new master.
>> And yes, we use 5.1. We also did some testing with 5.2, unfortunately with
>> the same result. So it looks like 5.2 suffers from the same bug.
>> We currently do not use failover in our production environment because
>> this feature is unreliable.
>>
>> It would be good to have this bug fixed.
>>
>> Greetings,
>> Norbert
>>
>>
>> Murty Dasari wrote:
>>
>>> Thanks Dejan for the reply.
>>>
>>> I've not tried 5.2 yet; I wanted to get confirmation of the
>>> issue before pushing the new version to our servers (that is a somewhat
>>> lengthy process). I looked at the 5.2 source code and I suspect the
>>> problem is still there.
>>>
>>> I'm surprised that others are not running into any issues with it;
>>> maybe there is something wrong with my topology and setup. Does the
>>> following setup look right?
>>>
>>> 1. We have a bunch of applications posting messages to a local
>>> (localhost) AMQ. (We have several boxes like this.)
>>> 2. We set up a Camel route to deliver the messages to a central AMQ host
>>> with a durable subscription. (There is only one box like this.)
>>>
>>> ----------------------------------------------------------------
>>> <camelContext>
>>>     <route>
>>>         <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
>>>         <to uri="CENTRALMQ:topic:Topic1"/>
>>>     </route>
>>>     <!-- ... a few other routes ... -->
>>> </camelContext>
>>>
>>> <bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>>     <property name="connectionFactory">
>>>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>>             <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false"/>
>>>         </bean>
>>>     </property>
>>> </bean>
>>>
>>> <bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
>>>     <property name="connectionFactory">
>>>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>>>             <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100"/>
>>>         </bean>
>>>     </property>
>>> </bean>
>>> -----------------------------------------
>>>
>>> The main difference from other configs I have seen is that we are using
>>> failover with two endpoints that are the same. With this model we were
>>> able to achieve retries between LOCALMQ and CENTRALMQ if there were any
>>> connection problems. We need retries but not really failover (i.e.,
>>> sending to a secondary if the primary is down), as messages would still
>>> be in LOCALMQ if there were connectivity problems.
>>>
>>> Is there any other way to achieve retries without using the failover
>>> transport?
>>>
>>> thanks for your time.
>>>
>>> cheers
>>> - mdasari
>>>
>>> On Fri, Apr 3, 2009 at 12:36 AM, Dejan Bosanac <de...@nighttale.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> did you try version 5.2.0? Some of those issues have probably already
>>>> been addressed.
>>>>
>>>> Cheers
>>>> --
>>>> Dejan Bosanac
>>>>
>>>> Open Source Integration - http://fusesource.com/
>>>> ActiveMQ in Action - http://www.manning.com/snyder/
>>>> Blog - http://www.nighttale.net

-- 

Dipl.-Ing. Norbert Pfistner
Software Development

picturesafe GmbH
Simon-von-Utrecht-Straße 31-37
D-20359 Hamburg
http://www.picturesafe.de

fon: +49 40 374127 901
fax: +49 40 374127 999
npfistner@picturesafe.de

Registered office: Hannover
Managing Director: Herbert Wirth
Commercial register: Amtsgericht Hannover, HR B 53 366

Re: FailoverTransport stops working after a while

Posted by Dejan Bosanac <de...@nighttale.net>.
Hi Murty,

You don't need to provide your broker URI twice in the failover URI. If you
provide only one URI, it will still try to reconnect after connection failures.
Can you try it like that and let us know your results?
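
As a rough illustration of the suggestion, the following Java sketch builds a
failover URI in which each broker appears only once. The helper
FailoverUriSketch.failoverUri is hypothetical (it is not part of ActiveMQ); the
broker address, the failover://(...) spelling and the initialReconnectDelay
option are taken from the configuration quoted below.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FailoverUriSketch {
    // Builds "failover://(uri1,...,uriN)?options", dropping repeated broker
    // URIs while keeping their original order.
    // Hypothetical helper for illustration only, not an ActiveMQ API.
    static String failoverUri(List<String> brokerUris, String options) {
        Set<String> unique = new LinkedHashSet<>(brokerUris);
        String uri = "failover://(" + String.join(",", unique) + ")";
        return (options == null || options.isEmpty()) ? uri : uri + "?" + options;
    }

    public static void main(String[] args) {
        List<String> brokers = Arrays.asList(
                "tcp://10.87.129.196:61616",
                "tcp://10.87.129.196:61616");
        // prints failover://(tcp://10.87.129.196:61616)?initialReconnectDelay=100
        System.out.println(failoverUri(brokers, "initialReconnectDelay=100"));
    }
}
```

The resulting string is what would go into the brokerURL property of the
CENTRALMQ connection factory.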

Cheers
--
Dejan Bosanac

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net


On Fri, Apr 3, 2009 at 6:19 PM, Murty Dasari <md...@gmail.com> wrote:

> Thanks Dejan for the reply.
>
> I've not tried 5.2 yet; I wanted to get confirmation of the
> issue before pushing the new version to our servers (that is a somewhat
> lengthy process). I looked at the 5.2 source code and I suspect the problem
> is still there.
>
> I'm surprised that others are not running into any issues with it;
> maybe there is something wrong with my topology and setup. Does the
> following setup look right?
>
> 1. We have a bunch of applications posting messages to a local
> (localhost) AMQ. (We have several boxes like this.)
> 2. We set up a Camel route to deliver the messages to a central AMQ host
> with a durable subscription. (There is only one box like this.)
>
> ----------------------------------------------------------------
> <camelContext>
>     <route>
>         <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
>         <to uri="CENTRALMQ:topic:Topic1"/>
>     </route>
>     <!-- ... a few other routes ... -->
> </camelContext>
>
> <bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
>     <property name="connectionFactory">
>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>             <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false"/>
>         </bean>
>     </property>
> </bean>
>
> <bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
>     <property name="connectionFactory">
>         <bean class="org.apache.activemq.ActiveMQConnectionFactory">
>             <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100"/>
>         </bean>
>     </property>
> </bean>
> -----------------------------------------
>
> The main difference from other configs I have seen is that we are using
> failover with two endpoints that are the same. With this model we were
> able to achieve retries between LOCALMQ and CENTRALMQ if there were any
> connection problems. We need retries but not really failover (i.e.,
> sending to a secondary if the primary is down), as messages would still
> be in LOCALMQ if there were connectivity problems.
>
> Is there any other way to achieve retries without using the failover
> transport?
>
> thanks for your time.
>
> cheers
> - mdasari

Re: FailoverTransport stops working after a while

Posted by Dejan Bosanac <de...@nighttale.net>.
Hi Norbert,

this sounds like a different problem than this one. Take a look at
http://issues.apache.org/activemq/browse/AMQ-2149, which is being worked on,
and give 5.3-SNAPSHOT a try.

Cheers
--
Dejan Bosanac

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net


On Mon, Apr 6, 2009 at 9:06 AM, Norbert Pfistner <
norbert.pfistner@picturesafe.de> wrote:

> Hello Murty,
>
> We are also seeing the same problem when using failover: sometimes clients
> stop working after a slave has become the master and they have processed a
> batch of messages with this new master.
> And yes, we use 5.1. We also did some testing with 5.2, unfortunately with
> the same result, so it looks like 5.2 suffers from the same bug.
> We currently do not use failover in our production environment because the
> feature is unreliable.
>
> It would be good to have this bug fixed.
>
> Greetings,
> Norbert

Re: FailoverTransport stops working after a while

Posted by Norbert Pfistner <no...@picturesafe.de>.
Hello Murty,

We are also seeing the same problem when using failover: sometimes
clients stop working after a slave has become the master and they have
processed a batch of messages with this new master.
And yes, we use 5.1. We also did some testing with 5.2, unfortunately
with the same result, so it looks like 5.2 suffers from the same bug.
We currently do not use failover in our production environment because
the feature is unreliable.

It would be good to have this bug fixed.

Greetings,
Norbert



Re: FailoverTransport stops working after a while

Posted by Murty Dasari <md...@gmail.com>.
Thanks, Dejan, for the reply.

I have not tried 5.2 yet; I wanted to get confirmation of the issue before
pushing the new version to our servers (that is a somewhat lengthy process).
I looked at the 5.2 source code and I suspect the problem is still there.

I'm surprised that others are not running into any issues with it; maybe
there is something wrong with my topology and setup. Does the following
setup look right?

1. We have a bunch of applications posting messages to a local
(localhost) AMQ. (We have several boxes like this.)
2. We set up a Camel route to deliver the messages to a central AMQ host
with a durable subscription. (There is only one box like this.)

----------------------------------------------------------------
<camelContext>
    <route>
        <from uri="LOCALMQ:topic:Topic1?clientId=prod1-Topic1&amp;durableSubscriptionName=prod1-Topic1&amp;subscriptionDurable=true"/>
        <to uri="CENTRALMQ:topic:Topic1"/>
    </route>
    <!-- ...... Few other routes -->
</camelContext>

<bean id="LOCALMQ" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory">
        <bean class="org.apache.activemq.ActiveMQConnectionFactory">
            <property name="brokerURL" value="vm://LOCALMQ?broker.persistent=false" />
        </bean>
    </property>
</bean>
<bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory">
        <bean class="org.apache.activemq.ActiveMQConnectionFactory">
            <property name="brokerURL" value="failover://(tcp://10.87.129.196:61616,tcp://10.87.129.196:61616)?initialReconnectDelay=100" />
        </bean>
    </property>
</bean>
-----------------------------------------

The main difference from other configurations I have seen is that we use
failover with two endpoints that are the same. With this model we were able
to get retries between LOCALMQ and CENTRALMQ whenever there were connection
problems. We need retries, not real failover (i.e., sending to a secondary if
the primary is down), because the messages would still be in LOCALMQ during a
connectivity problem.

Is there any other way to achieve retries without using the failover
transport?
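
For comparison, here is a sketch of the CENTRALMQ bean with a single URI in
the failover list instead of the same URI twice. It assumes that the failover
transport keeps retrying a lone endpoint on its own, and that the
maxReconnectDelay and useExponentialBackOff options (neither is part of the
configuration above, and the values are only illustrative) are available in
the ActiveMQ version in use. Both assumptions should be checked against the
Failover Transport reference. The fragment is untested and says nothing about
whether the NullPointerException goes away.

```xml
<bean id="CENTRALMQ" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory">
        <bean class="org.apache.activemq.ActiveMQConnectionFactory">
            <!-- Assumption: one broker URI is enough for reconnect/retry behaviour. -->
            <!-- Assumption: maxReconnectDelay and useExponentialBackOff are illustrative additions. -->
            <property name="brokerURL"
                      value="failover:(tcp://10.87.129.196:61616)?initialReconnectDelay=100&amp;maxReconnectDelay=30000&amp;useExponentialBackOff=true" />
        </bean>
    </property>
</bean>
```

Note that this still goes through the failover transport; it only removes the
duplicated endpoint from the list.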

Thanks for your time.

cheers
- mdasari

On Fri, Apr 3, 2009 at 12:36 AM, Dejan Bosanac <de...@nighttale.net> wrote:

> Hi,
>
> did you try 5.2.0 version? Probably some of those issues are already
> addressed.
>
> Cheers
> --
> Dejan Bosanac
>
> Open Source Integration - http://fusesource.com/
> ActiveMQ in Action - http://www.manning.com/snyder/
> Blog - http://www.nighttale.net
> > Sent from the ActiveMQ - User mailing list archive at Nabble.com.
> >
> >
>

Re: FailoverTransport stops working after a while

Posted by Dejan Bosanac <de...@nighttale.net>.
Hi,

Did you try version 5.2.0? Some of these issues have probably already been
addressed there.

Cheers
--
Dejan Bosanac

Open Source Integration - http://fusesource.com/
ActiveMQ in Action - http://www.manning.com/snyder/
Blog - http://www.nighttale.net

