Posted to users@activemq.apache.org by Danielius Jurna <da...@elitnet.lt> on 2006/11/07 08:46:05 UTC

Bug in failover transport

I think there's a bug in the failover transport (in version 4.0.2).
After the failover transport reconnects, if there are prefetched messages on the
client, the client sends invalid ack messages to the server. After that the client
stops receiving messages and must be restarted.
This happens only when there are more messages in the queue than the queue
prefetch size.

On the client we receive lots of warnings like this:

2006-11-07 09:34:13,614 WARN  [AcitveMQ Connection Worker: tcp://localhost:61616] org.apache.activemq.ActiveMQConnection - Async exception with no exception listener: javax.jms.JMSException: Invalid acknowledgment: MessageAck {commandId = 200, responseRequired = false, ackType = 2, consumerId = ID:dj-34091-1162884805650-2:0:2:1, firstMessageId = null, lastMessageId = ID:dj-34091-1162884805650-2:0:1:1:91, destination = queue://Test, transactionId = null, messageCount = 1}
javax.jms.JMSException: Invalid acknowledgment: MessageAck {commandId = 200, responseRequired = false, ackType = 2, consumerId = ID:dj-34091-1162884805650-2:0:2:1, firstMessageId = null, lastMessageId = ID:dj-34091-1162884805650-2:0:1:1:91, destination = queue://Test, transactionId = null, messageCount = 1}
	at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:185)
	at org.apache.activemq.broker.region.AbstractRegion.acknowledge(AbstractRegion.java:234)
	at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBroker.java:366)
	at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionBroker.java:177)
	at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:66)
	at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:66)
	at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBrokerFilter.java:79)
	at org.apache.activemq.broker.AbstractConnection.processMessageAck(AbstractConnection.java:441)
	at org.apache.activemq.command.MessageAck.visit(MessageAck.java:179)
	at org.apache.activemq.broker.AbstractConnection.service(AbstractConnection.java:237)
	at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:61)
	at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:92)
	at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:67)
	at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:124)
	at org.apache.activemq.transport.InactivityMonitor.onCommand(InactivityMonitor.java:123)
	at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:88)
	at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:137)
	at java.lang.Thread.run(Thread.java:595)

And on the server a lot of these entries:

INFO  PrefetchSubscription           - Could not correlate acknowledgment with dispatched message: MessageAck {commandId = 200, responseRequired = false, ackType = 2, consumerId = ID:dj-34091-1162884805650-2:0:2:1, firstMessageId = null, lastMessageId = ID:dj-34091-1162884805650-2:0:1:1:91, destination = queue://Test, transactionId = null, messageCount = 1}

There's a JIRA issue with a test case attached to it:
https://issues.apache.org/activemq/browse/AMQ-1027

Any suggestions or workarounds? Setting the prefetch size to 1 doesn't help.
-- 
View this message in context: http://www.nabble.com/Bug-in-failover-transport-tf2587218.html#a7213887
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Bug in failover transport

Posted by kieran1 <KM...@herzumsoftware.com>.
My testing also indicates that if you shut down that consumer before it has
processed the prefetched messages, those messages would be lost -- they
would not be consumed by a new consumer at all.
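A minimal sketch of the behaviour kieran expected, assuming the broker tracks which messages it has dispatched to a consumer but not yet seen acknowledged: when that consumer closes, those messages go back to the head of the queue so a new consumer receives them. `RedeliveringQueue` is a hypothetical in-memory model, not ActiveMQ's broker code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory model, not ActiveMQ's broker code.
class RedeliveringQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> dispatched = new ArrayList<>(); // sent to the consumer, not yet acked

    void send(String msg) {
        pending.addLast(msg);
    }

    // Fill the consumer's prefetch window up to 'prefetch' outstanding messages.
    List<String> dispatch(int prefetch) {
        List<String> batch = new ArrayList<>();
        while (dispatched.size() < prefetch && !pending.isEmpty()) {
            String msg = pending.pollFirst();
            dispatched.add(msg);
            batch.add(msg);
        }
        return batch;
    }

    void ack(String msg) {
        dispatched.remove(msg);
    }

    // Consumer closed before processing its prefetch: return the unacked
    // messages to the head of the queue, keeping their original order.
    void closeConsumer() {
        for (int i = dispatched.size() - 1; i >= 0; i--) {
            pending.addFirst(dispatched.get(i));
        }
        dispatched.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    String head() {
        return pending.peekFirst();
    }
}

class RedeliveryDemo {
    public static void main(String[] args) {
        RedeliveringQueue q = new RedeliveringQueue();
        for (int i = 1; i <= 5; i++) {
            q.send("m" + i);
        }
        System.out.println("prefetched: " + q.dispatch(3)); // [m1, m2, m3]
        q.ack("m1");
        q.closeConsumer();
        System.out.println("pending after close: " + q.pendingCount() + ", head = " + q.head());
    }
}
```

In this model m2 and m3, which were prefetched but never acknowledged, are the first messages the next consumer would receive, instead of being lost.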


Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
Right, I hadn't noticed that before. That explains the strange behaviour of our
system during failover tests (messages arriving out of order).


kieran1 wrote:
> 
> In my testing of this scenario, it appears that at step 9 -- 11 messages
> are sent to the consumer -- there is a problem: the first of those 11
> messages is the new message sent in step 8.  
> 
> That new message should not be consumed until all of the old messages have
> been consumed, right?
> 
> I have been testing this with the same problem on both 4.0.1 and 4.0.2.
> 
> Kieran
> 
> 



Re: Bug in failover transport

Posted by kieran1 <KM...@herzumsoftware.com>.
In my testing of this scenario, it appears that at step 9 -- 11 messages are
sent to the consumer -- there is a problem: the first of those 11 messages
is the new message sent in step 8.  

That new message should not be consumed until all of the old messages have
been consumed, right?
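One way to check this in a failover test is to have the producer stamp each message with an increasing sequence number and flag any message that arrives at or below the highest number already seen. `OrderChecker` and the sequence-number convention are hypothetical test helpers, not part of ActiveMQ; in a real JMS test the number could travel in an int message property chosen by the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper, not part of ActiveMQ. Assumes the producer stamps
// every message with a strictly increasing sequence number.
class OrderChecker {
    private long highestSeen = -1;
    private final List<String> violations = new ArrayList<>();

    void onMessage(long seq) {
        // Anything at or below the highest number seen so far is either
        // out of order or a duplicate delivery.
        if (seq <= highestSeen) {
            violations.add("seq " + seq + " arrived after " + highestSeen);
        } else {
            highestSeen = seq;
        }
    }

    List<String> violations() {
        return violations;
    }
}

class OrderCheckDemo {
    public static void main(String[] args) {
        OrderChecker checker = new OrderChecker();
        // Replay of the reported pattern: 0..89 consumed before failover,
        // then the new message (100) arrives ahead of the stuck backlog 90..99.
        for (int seq = 0; seq < 90; seq++) checker.onMessage(seq);
        checker.onMessage(100);
        for (int seq = 90; seq < 100; seq++) checker.onMessage(seq);
        System.out.println(checker.violations().size() + " violations");
    }
}
```

Because duplicates also fall at or below the highest number seen, the same check catches the double delivery discussed later in the thread.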

I have been testing this with the same problem on both 4.0.1 and 4.0.2.

Kieran





Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
Thanks for your explanations. They shed a little light on this problem.

After some more investigation, it seems that there is only a minor issue
regarding failover.
If there are more messages in the queue than the queue prefetch size, the messages
that were not yet delivered to the client before failover are still kept in
the queue. They are only delivered when a new message is sent to the queue.
This is a little complicated, so here is the scenario:
1. The queue prefetch size is set to 90.
2. 100 messages are sent to the queue.
3. 90 messages are prefetched to the client.
4. The connection is recovered.
5. Lots of invalid ack warnings (this is correct behaviour).
6. The client sends valid ack messages.
7. There are still 10 messages in the queue which somehow got stuck on the
broker. The client doesn't receive anything (incorrect behaviour).
8. One message is sent to the queue.
9. 11 messages are delivered to the client (correct behaviour).

So the only problem is that some messages wait until somebody
publishes something to the queue. For me it's only a minor issue, because
our queues are quite busy :-)
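The stall in steps 7 to 9 can be reproduced with a toy model, assuming the broker only attempts a dispatch when a new message is sent and not when a replayed subscription appears. `PrefetchQueue` and its `dispatchOnSubscribe` switch are hypothetical; the model leaves out acks and what happens to the 90 already-prefetched messages, and it does not try to reproduce the ordering kieran reports. It only tracks the backlog.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of one queue with a single consumer; not ActiveMQ code.
class PrefetchQueue {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final List<Integer> delivered = new ArrayList<>();
    private final int prefetch;
    private final boolean dispatchOnSubscribe;
    private boolean subscribed;
    private int inFlight; // dispatched to the current subscription

    PrefetchQueue(int prefetch, boolean dispatchOnSubscribe) {
        this.prefetch = prefetch;
        this.dispatchOnSubscribe = dispatchOnSubscribe;
    }

    void send(int msg) {
        pending.addLast(msg);
        dispatch(); // a newly sent message always triggers a dispatch attempt
    }

    // Called for the first subscription and again when failover replays it.
    // The replayed subscription starts with an empty prefetch window.
    void subscribe() {
        subscribed = true;
        inFlight = 0;
        if (dispatchOnSubscribe) {
            dispatch(); // also look at the backlog when a consumer (re)appears
        }
    }

    private void dispatch() {
        while (subscribed && inFlight < prefetch && !pending.isEmpty()) {
            delivered.add(pending.pollFirst());
            inFlight++;
        }
    }

    int pendingCount() { return pending.size(); }

    List<Integer> delivered() { return delivered; }
}

class StallDemo {
    static PrefetchQueue run(boolean dispatchOnSubscribe) {
        PrefetchQueue q = new PrefetchQueue(90, dispatchOnSubscribe);
        q.subscribe();
        for (int i = 1; i <= 100; i++) q.send(i); // 90 prefetched, 10 left over
        q.subscribe();                            // failover replays the subscription
        return q;
    }

    public static void main(String[] args) {
        System.out.println("send-only dispatch, stuck: " + run(false).pendingCount());
        System.out.println("dispatch on subscribe, stuck: " + run(true).pendingCount());
    }
}
```

With send-only dispatch the 10 leftover messages sit on the broker until the next `send`, which then releases 11 messages at once, matching steps 7 to 9; with the switch on, the replayed subscription drains the backlog immediately.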





Re: Bug in failover transport

Posted by Hiram Chirino <hi...@hiramchirino.com>.
This is a tricky bug, but it might be normal. It gets a little
complicated, but let me try to explain what I think is happening:

 (1) Say the broker sends messages A, B, C, D to the client.
 (2) The client acks messages A and B.
 (3) A connection failure occurs while the ack for B is being delivered to the broker.
 (4) The connection is 'recovered':
       (4.1) If the client's previous connection is still connected to
the broker (the client reconnects quicker than the broker can detect the
failure), then the broker forcibly disconnects the previous
connection. This causes all previous subscriptions to be destroyed.
       (4.2) The client replays all connection state, including open
subscriptions, to the broker.
       (4.3) The client resends the ack that was in flight when the
failure occurred.
       (4.4) Since the broker has not sent the new subscription any
messages yet, the ack does not match anything sent to the client and we get
the "Invalid acknowledgment" message.
       (4.5) The broker re-sends messages B, C, D to the client.
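Steps (4.1) to (4.5) can be sketched with a toy subscription that only accepts acks for messages it has itself dispatched. This loosely mirrors the check behind the "Invalid acknowledgment" error thrown from `PrefetchSubscription.acknowledge` in the stack trace from the first message, but `Subscription` here is a hypothetical class, not ActiveMQ's.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of one broker-side subscription; not ActiveMQ's class.
class Subscription {
    private final Set<String> dispatched = new LinkedHashSet<>();

    void dispatch(String messageId) {
        dispatched.add(messageId);
    }

    // An ack is only valid if it matches a message this subscription dispatched.
    void acknowledge(String messageId) {
        if (!dispatched.remove(messageId)) {
            throw new IllegalStateException("Invalid acknowledgment: " + messageId);
        }
    }
}

class AckReplayDemo {
    public static void main(String[] args) {
        Subscription old = new Subscription();
        for (String id : new String[] {"A", "B", "C", "D"}) old.dispatch(id); // (1)
        old.acknowledge("A");                                               // (2)
        // (3)/(4.1): the ack for B is lost and the old subscription is destroyed.
        Subscription replayed = new Subscription();                         // (4.2)
        try {
            replayed.acknowledge("B");                                      // (4.3)
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());                             // (4.4)
        }
        for (String id : new String[] {"B", "C", "D"}) replayed.dispatch(id); // (4.5)
        replayed.acknowledge("B"); // B arrives a second time; this ack now matches
        System.out.println("second ack for B accepted");
    }
}
```

The replayed ack fails only because it reaches the new subscription before the re-dispatch in (4.5); once B has been dispatched again, the same ack is accepted.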

Now, in the scenario above, the worst case is that B is delivered twice.
But since 4.5 and 4.4 occur concurrently, there could be some
subtle bugs in the code that need a closer look.
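Since the worst case is a duplicate delivery, a consumer can protect itself by remembering recently processed message IDs and skipping repeats. This is a generic idempotent-consumer sketch, not an ActiveMQ feature; `DuplicateFilter` and its capacity bound are assumptions, and in JMS the ID would typically come from `Message.getJMSMessageID()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical consumer-side guard, not part of ActiveMQ: remembers the most
// recent message IDs and reports any it has already processed.
class DuplicateFilter {
    private final Map<String, Boolean> seen;

    DuplicateFilter(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently seen ID
        // once more than 'capacity' IDs are stored.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true the first time an ID is seen, false for a redelivery.
    boolean firstDelivery(String messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }
}

class DedupDemo {
    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter(10_000);
        // Messages handed to the application: A and B before the failure,
        // then B, C, D re-sent after it.
        for (String id : new String[] {"A", "B", "B", "C", "D"}) {
            if (filter.firstDelivery(id)) {
                System.out.println("process " + id);
            } else {
                System.out.println("skip duplicate " + id);
            }
        }
    }
}
```

Bounding the set keeps memory constant, at the cost that a duplicate arriving after more than `capacity` newer IDs would slip through; the bound should be sized to the prefetch window and redelivery delay.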

Anyway, I hope this helps shed some light on the issue.




On 11/10/06, Danielius Jurna <da...@elitnet.lt> wrote:
>
> No. I don't have network of brokers. And error log in the broker shows, that
> client sends ack to correct broker, which cannot correlate ACK. It's very
> easy to reproduce this bug. I've made a test case, which you can find in the
> jira issue mentioned in the first message.
>
>
> yaussy wrote:
> >
> > If you have a network of brokers, does this problem happen if the consumer
> > fails over to another, running, broker?
> >
>
> --
> View this message in context: http://www.nabble.com/Bug-in-failover-transport-tf2587218.html#a7271859
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Hiram

Blog: http://hiramchirino.com

Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
No, I don't have a network of brokers. The error log on the broker shows that
the client sends the ack to the correct broker, which cannot correlate it. This
bug is very easy to reproduce; I've made a test case, which you can find in the
JIRA issue mentioned in the first message.


yaussy wrote:
> 
> If you have a network of brokers, does this problem happen if the consumer
> fails over to another, running, broker?
> 



Re: Bug in failover transport

Posted by yaussy <ya...@cboe.com>.
If you have a network of brokers, does this problem happen if the consumer
fails over to another, running, broker?


Danielius Jurna wrote:
> 
> Also in 4.1 branch
> 



Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
It also happens in the 4.1 branch.


Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
Yes, this bug still exists in the 4.0 branch. I haven't checked 4.1 yet.




Re: Bug in failover transport

Posted by James Strachan <ja...@gmail.com>.
Could you double-check whether this is still the case with 4.1-SNAPSHOT or the 4.0
branch? A number of bugs have been fixed recently in this area...

http://issues.apache.org/activemq/browse/AMQ-1034
http://issues.apache.org/activemq/browse/AMQ-1031
http://issues.apache.org/activemq/browse/AMQ-1032
http://issues.apache.org/activemq/browse/AMQ-1026

On 11/9/06, Danielius Jurna <da...@elitnet.lt> wrote:
>
> Has anybody looked at this problem?
>
> From my point of vew, failover transport can only work on sending messages
> to the broker, but it cannot work while receiving messages. For example:
> 1. Client prefetches 1000 messages.
> 2. Broker goes down and restarts.
> 3. Failover reconnects to the broker (client still has messages prefetched).
> 4. Client sends ACK message to the broker.
> 5. Broker cannot correlate ACK, because it doesn't know anything about
> prefetched messages on the client.
>
> Please can someone comment on that? Is it really unusable, or I don't
> understand something? The problem is that after such scenario, client and
> broker cannot fully recover (those prefetched messages are still in message
> store, but are not received by the client).
>
> --
> View this message in context: http://www.nabble.com/Bug-in-failover-transport-tf2587218.html#a7258236
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
>


-- 

James
-------
http://radio.weblogs.com/0112098/

Re: Bug in failover transport

Posted by Danielius Jurna <da...@elitnet.lt>.
Has anybody looked at this problem?

From my point of view, the failover transport only works for sending messages
to the broker; it does not work while receiving messages. For example:
1. The client prefetches 1000 messages.
2. The broker goes down and restarts.
3. The failover transport reconnects to the broker (the client still has messages prefetched).
4. The client sends an ACK message to the broker.
5. The broker cannot correlate the ACK, because it doesn't know anything about
the messages prefetched on the client.

Can someone please comment on that? Is it really unusable, or am I misunderstanding
something? The problem is that after such a scenario, the client and
broker cannot fully recover (those prefetched messages are still in the message
store, but are never received by the client).
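A minimal sketch of one client-side mitigation, assuming the client is notified when the failover transport reconnects: drop whatever is still in the prefetch buffer without acknowledging it, so the restarted broker redelivers those messages from its store instead of receiving acks it cannot correlate. `PrefetchBuffer` and `onReconnect()` are hypothetical names, not ActiveMQ's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical client-side model, not ActiveMQ's implementation.
class PrefetchBuffer {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final List<String> acksSent = new ArrayList<>();

    // The broker pushes a message into the client's prefetch buffer.
    void onDispatch(String messageId) {
        buffer.addLast(messageId);
    }

    // Reconnect notification: everything still buffered was dispatched on the
    // old connection, so discard it unacknowledged and let the broker redeliver.
    int onReconnect() {
        int dropped = buffer.size();
        buffer.clear();
        return dropped;
    }

    // The application takes the next message; an ack is sent only for messages
    // received on the current connection. Returns null if nothing is buffered.
    String receiveAndAck() {
        String id = buffer.pollFirst();
        if (id != null) {
            acksSent.add(id);
        }
        return id;
    }

    List<String> acksSent() {
        return acksSent;
    }
}

class ReconnectDemo {
    public static void main(String[] args) {
        PrefetchBuffer buf = new PrefetchBuffer();
        for (int i = 1; i <= 3; i++) buf.onDispatch("m" + i);
        buf.receiveAndAck(); // m1 processed and acked before the broker restart
        System.out.println("dropped on reconnect: " + buf.onReconnect());
        buf.onDispatch("m2"); // restarted broker redelivers from its store
        buf.receiveAndAck();
        System.out.println("acks sent: " + buf.acksSent());
    }
}
```

The trade-off is that any message the application was already processing when the connection dropped may still arrive twice, so this pairs naturally with consumer-side duplicate detection.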



Re: Bug in failover transport

Posted by kieran1 <KM...@herzumsoftware.com>.
It looks like this is fixed in 4.1.0.
