You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@activemq.apache.org by Pete Schwamb <pe...@meteostar.com> on 2008/03/11 18:08:51 UTC

5.0 (and later) Queue stops dispatching

I've been seeing some freezes on Queue dispatching, particularly with 
the latest 5.0 release and more recent snapshots.  I've tried very hard 
to reproduce reliably it in a small test case, but it seems very timing 
dependent.  I was able to reproduce at least one variant of it fairly 
reliably.  I am using the default AMQMessageStore setup.  Also, I'm 
using fuse-5.0.0.9, because the SNAPSHOT builds are failing even more 
spectacularly for me at the moment, though from what I've seen in SVN, I 
believe this is still an issue in the trunk.

There are two non-durable subscribers on the queue via stomp, and they 
consume more slowly than the producer, which publishes in bursts.  After 
the first burst of 30 - 50k messages, I stop the producer and let the 
consumers catch up.  Then I publish another burst of messages.  This is 
usually where the freeze happens.

First, I usually get a message like the following:

ERROR RecoveryListenerAdapter        - Message id 
ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered from 
the data store - already dispatched

Then the queue stops dispatching.
Here's what I first saw in the debugger, after the "already dispatched" 
message appears:

a) on the Queue, messages.hasNext() returns false, so the doPageIn() 
method never pages anything in.
b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() -> 
doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize, this) ->
this.store.recoverNextMessages(this.maxBatchSize, this)
c) KahaReferenceStore recoverNextMessages gets null back from 
messageContainer.getNext(entry), because entry.nextItem = -1

However, the message store usually has many thousands of messages still 
in it, as evidenced by the 'size' attribute on DiskIndexLinkedList.  So 
this is the first hint that the LinkedList is corrupt.  I started 
looking more closely at DiskIndexLinkedList, and noticed the following 
incorrect (I think) behavior:

In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is 
"result = last".  On some occasions result.nextItem is -1, and 
last.nextItem != -1.  Shouldn't last.nextItem always be -1?  I'm 
wondering if the opposite was intended: to update "last".

So I changed the following:

Index: 
src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java
===================================================================
--- 
src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java  
(revision 635580)
+++ 
src/main/java/org/apache/activemq/kaha/impl/index/DiskIndexLinkedList.java  
(working copy)
@@ -271,7 +271,7 @@
               }
               // essential last get's updated consistently
               if (result != null && last != null && last.equals(result)) {
-                       result = last;
+                       last = result;
               }
               return result;
       }


And indeed, I no longer get the "already dispatched" message, and queues 
continue dispatching after many cycles of the producer flowing 10s of 
thousands of messages through.

Hopefully this sheds some light on stability issues others may be 
having.  I'm not sure I've fixed the problem 100%.  Is anyone else 
seeing this?

-Pete

Re: 5.0 (and later) Queue stops dispatching

Posted by Rob Davies <ra...@gmail.com>.

On 11 Mar 2008, at 17:08, Pete Schwamb wrote:

> I've been seeing some freezes on Queue dispatching, particularly  
> with the latest 5.0 release and more recent snapshots.  I've tried  
> very hard to reproduce reliably it in a small test case, but it  
> seems very timing dependent.  I was able to reproduce at least one  
> variant of it fairly reliably.  I am using the default  
> AMQMessageStore setup.  Also, I'm using fuse-5.0.0.9, because the  
> SNAPSHOT builds are failing even more spectacularly for me at the  
> moment, though from what I've seen in SVN, I believe this is still  
> an issue in the trunk.
>
> There are two non-durable subscribers on the queue via stomp, and  
> they consume more slowly than the producer, which publishes in  
> bursts.  After the first burst of 30 - 50k messages, I stop the  
> producer and let the consumers catch up.  Then I publish another  
> burst of messages.  This is usually where the freeze happens.
>
> First, I usually get a message like the following:
>
> ERROR RecoveryListenerAdapter        - Message id  
> ID:sand-52497-1205185863002-2:1:2:1:42787 could not be recovered  
> from the data store - already dispatched
>
> Then the queue stops dispatching.
> Here's what I first saw in the debugger, after the "already  
> dispatched" message appears:
>
> a) on the Queue, messages.hasNext() returns false, so the doPageIn()  
> method never pages anything in.
> b) messages.hasNext() -> currentCursor.hasNext() -> fillBatch() ->  
> doFillBatch() -> this.store.recoverNextMessages(this.maxBatchSize,  
> this) ->
> this.store.recoverNextMessages(this.maxBatchSize, this)
> c) KahaReferenceStore recoverNextMessages gets null back from  
> messageContainer.getNext(entry), because entry.nextItem = -1
>
> However, the message store usually has many thousands of messages  
> still in it, as evidenced by the 'size' attribute on  
> DiskIndexLinkedList.  So this is the first hint that the LinkedList  
> is corrupt.  I started looking more closely at DiskIndexLinkedList,  
> and noticed the following incorrect (I think) behavior:
>
> In DiskIndexLinkedList.getNextEntry(IndexItem current), line 274 is  
> "result = last".  On some occasions result.nextItem is -1, and  
> last.nextItem != -1.  Shouldn't last.nextItem always be -1?  I'm  
> wondering if the opposite was intended: to update "last".
>
> So I changed the following:
>
> Index: src/main/java/org/apache/activemq/kaha/impl/index/ 
> DiskIndexLinkedList.java
> ===================================================================
> --- src/main/java/org/apache/activemq/kaha/impl/index/ 
> DiskIndexLinkedList.java  (revision 635580)
> +++ src/main/java/org/apache/activemq/kaha/impl/index/ 
> DiskIndexLinkedList.java  (working copy)
> @@ -271,7 +271,7 @@
>              }
>              // essential last get's updated consistently
>              if (result != null && last != null &&  
> last.equals(result)) {
> -                       result = last;
> +                       last = result;
>              }
>              return result;
>      }
>
>
> And indeed, I no longer get the "already dispatched" message, and  
> queues continue dispatching after many cycles of the producer  
> flowing 10s of thousands of messages through.
>
> Hopefully this sheds some light on stability issues others may be  
> having.  I'm not sure I've fixed the problem 100%.  Is anyone else  
> seeing this?
>
> -Pete
>
>
Beautiful!!! Thx Pete - love it when other folks fix my bugs ;)


cheers,

Rob

http://open.iona.com/ -Enterprise Open Integration
http://rajdavies.blogspot.com/