You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by Francois Guillemette <fr...@trilliantnetworks.com> on 2008/04/01 21:47:22 UTC

Messages hang in Queue even when consumers are present after failover

Hi, I'm using build version 643461 of activemq (windows XP, Java 6, MySQL).
(svn link https://svn.apache.org/repos/asf/activemq/trunk)

Sometime, a (or some) message(s) hang in the queue while no consumer eat it.
It happen after a failover. 

Scenario:
2 brokers (jdbc master/slave), 2 consumers, 2 producers

Producers :
  ant producer
-Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)"
-Ddurable=true -Dmax=500
Consumer 1: 
  ant consumer
-Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
-DclientId=c1
Consumer 2:
  ant consumer
-Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
-DclientId=c2

1 - Start the two brokers (one will be master, the other will be slave)
2 - Start the producers, consumers
3 - Wait a little,
4 - Kill the master -> slave become master
5 - Producers continue producing, consumers continue consuming
6 - After all producers finish their task, the consumer will finish
consuming, and sometimes there still messages left in the queue (in the
database, and using JMX to see the state of the queue).
7 - Restart a new broker, kill the master
8 - The messages will be consumed

Is there something missing in my test?

I have seen older post that seem related to this problem, but I didn't have
found any bug in Jira for this problem, is there an entry in Jira?

Thanks
-- 
View this message in context: http://www.nabble.com/Messages-hang-in-Queue-even-when-consumers-are-present-after-failover-tp16422835s2354p16422835.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Messages hang in Queue even when consumers are present after failover

Posted by Francois Guillemette <fr...@trilliantinc.com>.
More info for this problem:

Producer output

     [java] Sending message: A: Message: 67 sent at: Thu Apr 03 17:34:22 EDT
20...
     [java] Sending message: A: Message: 68 sent at: Thu Apr 03 17:34:23 EDT
20...
     [java] Sending message: A: Message: 69 sent at: Thu Apr 03 17:34:23 EDT
20...
     [java] 17:34:40 WARN  Transport failed, attempting to automatically
reconnect due to: org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too long: localhost/127.0.0.1:61618
     [java] 17:34:40 INFO  Successfully reconnected to tcp://localhost:61619
     [java] Sending message: A: Message: 70 sent at: Thu Apr 03 17:34:40 EDT
20...
     [java] Sending message: A: Message: 71 sent at: Thu Apr 03 17:34:40 EDT
20...
     [java] Sending message: A: Message: 72 sent at: Thu Apr 03 17:34:40 EDT
20...

Consumer output

     [java] Received: A: Message: 67 sent at: Thu Apr 03 17:34:22 EDT 20...
     [java] Received: A: Message: 68 sent at: Thu Apr 03 17:34:23 EDT 20...
     [java] Received: A: Message: 69 sent at: Thu Apr 03 17:34:23 EDT 20...
     [java] 17:34:35 WARN  Transport failed, attempting to automatically
reconnect due to: org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too long: localhost/127.0.0.1:61618
     [java] 17:34:35 INFO  Successfully reconnected to tcp://localhost:61619
     [java] Received: A: Message: 34 sent at: Thu Apr 03 17:34:10 EDT 20...
     [java] Received: A: Message: 71 sent at: Thu Apr 03 17:34:40 EDT 20...
     [java] Received: A: Message: 72 sent at: Thu Apr 03 17:34:40 EDT 20...
     [java] Received: A: Message: 73 sent at: Thu Apr 03 17:34:41 EDT 20...
     [java] Received: A: Message: 74 sent at: Thu Apr 03 17:34:41 EDT 20...

Message 70 was not consumed and it is left in the database (I can also see
it in the JMX console). Also in this log, why message 34 consumed out of
order, it was put in the queue before message 69?

Starting a new consumer will not consume message 70, it look like the
lastMessageId in JDBCMessageStore has skip for some reason the id of message
70.  The only way to consume this message was to do a failover again.

Is anyone have seen this behavior?


Francois Guillemette wrote:
> 
> Hi, I'm using build version 643461 of activemq (windows XP, Java 6,
> MySQL). (svn link https://svn.apache.org/repos/asf/activemq/trunk)
> 
> Sometime, a (or some) message(s) hang in the queue while no consumer eat
> it. It happen after a failover. 
> 
> Scenario:
> 2 brokers (jdbc master/slave), 2 consumers, 2 producers
> 
> Producers :
>   ant producer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)"
> -Ddurable=true -Dmax=500
> Consumer 1: 
>   ant consumer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
> -DclientId=c1
> Consumer 2:
>   ant consumer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
> -DclientId=c2
> 
> 1 - Start the two brokers (one will be master, the other will be slave)
> 2 - Start the producers, consumers
> 3 - Wait a little,
> 4 - Kill the master -> slave become master
> 5 - Producers continue producing, consumers continue consuming
> 6 - After all producers finish their task, the consumer will finish
> consuming, and sometimes there still messages left in the queue (in the
> database, and using JMX to see the state of the queue).
> 7 - Restart a new broker, kill the master
> 8 - The messages will be consumed
> 
> Is there something missing in my test?
> 
> I have seen older post that seem related to this problem, but I didn't
> have found any bug in Jira for this problem, is there an entry in Jira?
> 
> Thanks
> 

-- 
View this message in context: http://www.nabble.com/Messages-hang-in-Queue-even-when-consumers-are-present-after-failover-tp16422835s2354p16491412.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Messages hang in Queue even when consumers are present after failover

Posted by Francois Guillemette <fr...@trilliantinc.com>.
More info for this problem:

Producer output

     [java] Sending message: A: Message: 67 sent at: Thu Apr 03 17:34:22 EDT
20...
     [java] Sending message: A: Message: 68 sent at: Thu Apr 03 17:34:23 EDT
20...
     [java] Sending message: A: Message: 69 sent at: Thu Apr 03 17:34:23 EDT
20...
     [java] 17:34:40 WARN  Transport failed, attempting to automatically
reconnect due to: org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too long: localhost/127.0.0.1:61618
     [java] 17:34:40 INFO  Successfully reconnected to tcp://localhost:61619
     [java] Sending message: A: Message: 70 sent at: Thu Apr 03 17:34:40 EDT
20...
     [java] Sending message: A: Message: 71 sent at: Thu Apr 03 17:34:40 EDT
20...
     [java] Sending message: A: Message: 72 sent at: Thu Apr 03 17:34:40 EDT
20...

Consumer output

     [java] Received: A: Message: 67 sent at: Thu Apr 03 17:34:22 EDT 20...
     [java] Received: A: Message: 68 sent at: Thu Apr 03 17:34:23 EDT 20...
     [java] Received: A: Message: 69 sent at: Thu Apr 03 17:34:23 EDT 20...
     [java] 17:34:35 WARN  Transport failed, attempting to automatically
reconnect due to: org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too long: localhost/127.0.0.1:61618
     [java] 17:34:35 INFO  Successfully reconnected to tcp://localhost:61619
     [java] Received: A: Message: 34 sent at: Thu Apr 03 17:34:10 EDT 20...
     [java] Received: A: Message: 71 sent at: Thu Apr 03 17:34:40 EDT 20...
     [java] Received: A: Message: 72 sent at: Thu Apr 03 17:34:40 EDT 20...
     [java] Received: A: Message: 73 sent at: Thu Apr 03 17:34:41 EDT 20...
     [java] Received: A: Message: 74 sent at: Thu Apr 03 17:34:41 EDT 20...

Message 70 was not consumed and it is left in the database (I can also see
it in the JMX console). Also in this log, why message 34 consumed out of
order, it was put in the queue before message 69?

Starting a new consumer will not consume message 70, it look like the
lastMessageId in JDBCMessageStore has skip for some reason the id of message
70.  The only way to consume this message was to do a failover again.

Is anyone have seen this behavior?


Francois Guillemette wrote:
> 
> Hi, I'm using build version 643461 of activemq (windows XP, Java 6,
> MySQL). (svn link https://svn.apache.org/repos/asf/activemq/trunk)
> 
> Sometime, a (or some) message(s) hang in the queue while no consumer eat
> it. It happen after a failover. 
> 
> Scenario:
> 2 brokers (jdbc master/slave), 2 consumers, 2 producers
> 
> Producers :
>   ant producer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)"
> -Ddurable=true -Dmax=500
> Consumer 1: 
>   ant consumer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
> -DclientId=c1
> Consumer 2:
>   ant consumer
> -Durl="failover:(tcp://localhost:61618,tcp://localhost:61619)" -Dmax=10000
> -DclientId=c2
> 
> 1 - Start the two brokers (one will be master, the other will be slave)
> 2 - Start the producers, consumers
> 3 - Wait a little,
> 4 - Kill the master -> slave become master
> 5 - Producers continue producing, consumers continue consuming
> 6 - After all producers finish their task, the consumer will finish
> consuming, and sometimes there still messages left in the queue (in the
> database, and using JMX to see the state of the queue).
> 7 - Restart a new broker, kill the master
> 8 - The messages will be consumed
> 
> Is there something missing in my test?
> 
> I have seen older post that seem related to this problem, but I didn't
> have found any bug in Jira for this problem, is there an entry in Jira?
> 
> Thanks
> 

-- 
View this message in context: http://www.nabble.com/Messages-hang-in-Queue-even-when-consumers-are-present-after-failover-tp16422835s2354p16491412.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Messages hang in Queue even when consumers are present after failover

Posted by Francois Guillemette <fr...@trilliantinc.com>.
Okay, I think I found the problem.

There is a race condition between the time the message is set with the
broker sequence number (RegionBroker.java in send method), and the time it
is actually put in the database (DefaultJDBCAdapter.java in doAddMessage
method). 

I have seen that sometimes message with higher sequence number are put in
database before a lower sequence number.  For example: 386 is put in the
database before 385. If it is happening when JDBCMessageStore is recovering
the next message (lastMessageId is 384), then 386 will be fetched and the
lastMessageId will change to be 386. 385 is then put in the db but never
retrieved (stopping and restarting the broker will allow to retrieve the
message because at start the lastMessageId is -1).

I have synchronized the code inside the RegionBroker.send, and I don't have
gaps anymore. This is a workaround for us since we don't process a lot of
message. But maybe a more elegant solution is to set the brokerSequenceId in
doAddMessage of JDBCAdapter (I may be wrong, I didn't check if the
brokerSequenceId is used elsewhere).

Is anyone can confirm that this is bug? Thanks

-- 
View this message in context: http://www.nabble.com/Messages-hang-in-Queue-even-when-consumers-are-present-after-failover-tp16422835s2354p16587198.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.