Posted to users@activemq.apache.org by Raphael Seebacher <rs...@open.ch> on 2014/02/21 17:35:49 UTC

Apache Apollo 1.6 corrupt STOMP messages in store

Hi

We use Apache Apollo 1.6 and STOMP for messaging in our production system. So far only one application uses this messaging system, but we plan to migrate more critical applications to this setup.

Setup / Configuration:
Approximately 2500 clients are permanently connected to the broker. Every 30 minutes these clients send a STOMP message to a persistent queue on the broker, whose default virtual host is configured to use a leveldb store. There are two consumers subscribed to the queue, processing these messages. They have subscribed with the following headers:
credit:1,0
ack:client-individual
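
For illustration, a SUBSCRIBE frame carrying these headers could be assembled as in the following Python sketch. The destination "/queue/example" and the subscription id "0" are hypothetical placeholders, not our real values, and the helper name is made up for this example. As far as I understand it, "credit" is an Apollo-specific header of the form count[,size].

```python
def build_subscribe_frame(destination: str, sub_id: str) -> bytes:
    """Build a STOMP SUBSCRIBE frame with the headers listed above.

    `destination` and `sub_id` are placeholders supplied by the caller.
    """
    headers = [
        ("destination", destination),
        ("id", sub_id),
        ("credit", "1,0"),              # Apollo-specific flow-control window
        ("ack", "client-individual"),   # each message is acknowledged on its own
    ]
    lines = ["SUBSCRIBE"] + [f"{name}:{value}" for name, value in headers]
    # A blank line ends the header block; a NUL octet ends the frame.
    return ("\n".join(lines) + "\n\n").encode("utf-8") + b"\x00"


frame = build_subscribe_frame("/queue/example", "0")
```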

Observed behaviour:
Every once in a while, for no obvious reason, a message's body gets truncated somewhere within the broker, and the truncated message is then stored in the leveldb store. When this corrupted message reaches the head of the queue, the broker sends it to one of the attached consumers, as one would expect. Since the body has been truncated, the content-length header no longer matches the body's length, and the consumer tries to read more octets than the broker sends. Meanwhile, the web interface of the Apollo broker shows one message in "Transfer" that is "Waiting On" the "consumer". This is a deadlock: the broker waits for the consumer to acknowledge the message while the consumer waits for the missing octets to arrive, so no further messages are consumed or processed.
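
The mismatch can be made concrete with a small check on the consumer side. The following Python sketch is not our actual consumer code; the function name is hypothetical, and it assumes LF line endings and at most one frame in the buffer. It reports how many octets a reader that honours content-length would still be waiting for.

```python
from typing import Optional


def missing_body_octets(raw: bytes) -> Optional[int]:
    """Return the number of octets still missing from a buffered frame.

    Returns None if the header block is incomplete or the frame carries
    no content-length header, and 0 if the frame is complete.
    """
    head, sep, rest = raw.partition(b"\n\n")
    if not sep:
        return None  # blank line ending the header block not seen yet
    declared = None
    for line in head.split(b"\n")[1:]:  # skip the command line
        name, _, value = line.partition(b":")
        if name == b"content-length":
            declared = int(value)
            break  # use the first occurrence of the header
    if declared is None:
        return None
    # A complete frame holds `declared` body octets plus the closing NUL.
    needed = declared + 1
    return max(0, needed - len(rest))


complete = b"MESSAGE\ncontent-length:5\n\nhello\x00"
truncated = b"MESSAGE\ncontent-length:5\n\nhel\x00"
print(missing_body_octets(complete), missing_body_octets(truncated))
```

A consumer could combine such a check with a read timeout, so that a frame which stays incomplete is logged and the connection is dropped instead of blocking forever.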

Debugging:
- As you may have gathered, I was not able to figure out where exactly the message gets corrupted.
- I was not able to trick the broker into accepting a corrupted STOMP frame and thus suspect that the corruption happens within the broker itself.
- Unfortunately, I was not able to reproduce this behaviour in our testing environment either. (Note that I tried with only 250 simultaneously connected clients, each sending 10 messages.)
- No warnings or errors were found in the log files.

Any insights or pointers on how to further debug/analyse this problem are greatly appreciated.


Thanks a lot for your help!
-Raphi


-- 
raphael seebacher
security engineer

open systems ag
raeffelstrasse 29
ch-8045 zurich
t: +41 58 100 10 10
f: +41 58 100 10 11

rse@open.ch

http://www.open.ch