Posted to users@activemq.apache.org by nlif <na...@dbnet.co.il> on 2006/07/02 14:39:07 UTC

Duplicating Large Messages


Hello all,

We need to be able to handle very large messages (200 MB and more), which
should be sent to multiple consumers (several hundred of them). We
would like to know what the memory footprint in the broker is when the
same message is placed in several queues. Is the message duplicated in
memory?

Let me elaborate on the reasons for this question:

At first, we considered using a Topic, so that there is only a single
copy of each message, regardless of how many consumers should get it.
However, we need to be able to deliver to consumers who are not connected at
the time of sending. We know we can use a durable subscription, but we assume
this results in a larger memory footprint, as the broker keeps messages for
some time.

Another problem with the Topic approach is that it requires us to include a
recipient list in the message (as JMS message properties), and to provide the
consumers with a JMS selector, so that consumers will only consume messages
that are sent to them. However, this is a potential security problem, and we
prefer using a queue per consumer, and a recipient-list based router, which
will duplicate the original message and place a copy of it for each
consumer, in that consumer's queue.

The above led us to the conclusion that we need a Queue per consumer, and to the
question: does ActiveMQ duplicate a message when placing it in several
Queues, or is it internally a single copy?

Another question - is the ActiveMQ feature of large input and output streams
relevant for our problem? 

We would appreciate any input - this is a major show-stopper for us.
Thanks.
-- 
View this message in context: http://www.nabble.com/Duplicating-Large-Messages-tf1880194.html#a5139730
Sent from the ActiveMQ - User forum at Nabble.com.


Re: Duplicating Large Messages

Posted by James Strachan <ja...@gmail.com>.
On 7/6/06, Sanjiv Jivan <sa...@gmail.com> wrote:
> How does sending a large message via ActiveMQ compare to a solution where
> the client just saves the data contents to a database and only sends some
> sort of locator ID to the recipients? The recipients, which are aware of the
> database, can then retrieve the contents by the locator ID. Is this
> effectively the same as the ActiveMQ broker storing the message and clients
> retrieving it? Or is the database approach more efficient with respect to
> memory usage, etc.?

JMS is optimised for streaming of data round a network & for dealing
with one-to-many streams such as topics. The database approach is
certainly possible but it introduces unnecessary blocking
request-response and database locking whereas the JMS approach is
non-blocking, typically asynchronous and one-way.


> In general, is sending such large messages over a (JMS) message bus
> a legitimate use case?

Sure

> Can you describe some use cases where people are doing so
> in production?

Folks often want to send large batch files around in JMS. Though it's
usually better to split the batch file into individual rows and send
those instead, when dealing with legacy systems sending large
files is often useful.
-- 

James
-------
http://radio.weblogs.com/0112098/

Re: Duplicating Large Messages

Posted by nlif <na...@dbnet.co.il>.

Are you asking James or me?

Anyway - of course we would have used the approach you described if we
could, but our collaborating applications cannot share a database. They only
communicate via JMS messages. However, I admit that at first we didn't
realize we'd be required to handle such large files. Most of the messages
we send are commands and events, which are much more suitable for messaging,
and the requirement for sending large files emerged later. In fact, had
James and his streams not come to our rescue :-), we would have had no
choice but to reconsider our architecture.



Re: Duplicating Large Messages

Posted by Sanjiv Jivan <sa...@gmail.com>.
How does sending a large message via ActiveMQ compare to a solution where
the client just saves the data contents to a database and only sends some
sort of locator ID to the recipients? The recipients, which are aware of the
database, can then retrieve the contents by the locator ID. Is this
effectively the same as the ActiveMQ broker storing the message and clients
retrieving it? Or is the database approach more efficient with respect to
memory usage, etc.?

In general, is sending such large messages over a (JMS) message bus
a legitimate use case? Can you describe some use cases where people are doing so
in production? When dealing with such large amounts of data, I normally
think ETL.

Sanjiv

On 7/6/06, nlif <na...@dbnet.co.il> wrote:
>
>
> Thanks a lot for all your answers, James. It seems all our requirements
> can
> be met using ActiveMQ.
>
> As for feedback on the site: I have been keeping an eye on ActiveMQ for
> about 6 months now, and there is a definite improvement in the amount and
> quality of the on-line documentation. Furthermore, I am constantly
> impressed
> by the activeMQ team's positive and helpful responses when posting
> questions/problems in the user forum. Just out of curiosity, is there any
> intention of publishing a book? "ActiveMQ in Action" has a nice sound to
> it,
> wouldn't you say :-)
>
> Naaman

Re: Duplicating Large Messages

Posted by James Strachan <ja...@gmail.com>.
On 7/6/06, nlif <na...@dbnet.co.il> wrote:
>
> Thanks a lot for all your answers, James. It seems all our requirements can
> be met using ActiveMQ.
>
> As for feedback on the site: I have been keeping an eye on ActiveMQ for
> about 6 months now, and there is a definite improvement in the amount and
> quality of the on-line documentation. Furthermore, I am constantly impressed
> by the activeMQ team's positive and helpful responses when posting
> questions/problems in the user forum.

Thanks for the great feedback.

> Just out of curiosity, is there any
> intention of publishing a book? "ActiveMQ in Action" has a nice sound to it,
> wouldn't you say :-)

A book would be a great idea! Now if we could just find some spare time...

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: Duplicating Large Messages

Posted by "Christopher G. Stach II" <cg...@ldsys.net>.
nlif wrote:
> questions/problems in the user forum. Just out of curiosity, is there any
> intention of publishing a book? "ActiveMQ in Action" has a nice sound to it,
> wouldn't you say :-)
> 
> Naaman

Sounds retarded to me. :)  Like every other book on OSS projects, by the
time it gets to print, it's completely obsolete.  Besides, who uses
paper anymore?

-- 
Christopher G. Stach II

Re: Duplicating Large Messages

Posted by nlif <na...@dbnet.co.il>.
Thanks a lot for all your answers, James. It seems all our requirements can
be met using ActiveMQ.

As for feedback on the site: I have been keeping an eye on ActiveMQ for
about 6 months now, and there is a definite improvement in the amount and
quality of the on-line documentation. Furthermore, I am constantly impressed
by the ActiveMQ team's positive and helpful responses when posting
questions/problems in the user forum. Just out of curiosity, is there any
intention of publishing a book? "ActiveMQ in Action" has a nice sound to it,
wouldn't you say :-)

Naaman


Re: Duplicating Large Messages

Posted by James Strachan <ja...@gmail.com>.
On 7/4/06, nlif <na...@dbnet.co.il> wrote:
>
> Thank you. This was very helpful. I think we now have answers to almost all
> of our concerns. Just one more question, if I may :-)
>
> I described my motivation as the need to split a single, but very big,
> message, into smaller chunks, just for optimized transfer. However, the need
> is two-fold: In some cases, I do need to send large binary files, for which
> streams are a great solution. But in some (other) cases, I need to send
> messages, which, while not necessarily very large, are comprised of several
> messages. For example: a data file (JMS BytesMessage), along with a command
> object (ObjectMessage or TextMessage if it's an XML) that instructs what to
> do with the data. So my "message" is in fact two JMS messages, that are only
> meaningful together.
>
> In this case, I may also require something along the lines of an aggregator,
> but I'm not sure the streams are my best choice. While browsing the ActiveMQ
> documentation, which has lately become my favorite pastime :-),

:)

BTW we're always on the lookout for feedback on how we can improve the
website and make it easier to navigate.

http://incubator.apache.org/activemq/how-does-the-website-work.html


>  I came
> across the exclusive-consumer and message-groups features. Am I correct in
> considering this as a solution for the second scenario I described? (That is
> - streams for sending very large files, and message-groups for sending
> several separate messages, that belong together logically).

Yes, absolutely right. If you make up some message group header and
add it to your related messages then they are guaranteed to be
processed by the same consumer (unless the consumer dies). You could
enforce a number of messages being processed atomically using this
feature together with JMS transactions etc.

So just set the JMSXGroupID header to something unique for the related
messages and you should be fine:
http://incubator.apache.org/activemq/message-groups.html
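
A rough way to picture the group behaviour described above is sketched here in plain Java. It is only a simulation under assumptions, not ActiveMQ's implementation: the class name GroupAffinity, the round-robin choice of owner, and the consumer and group names are all made up for illustration. On the JMS side the group is normally set on each related message with message.setStringProperty("JMSXGroupID", groupId).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of message-group affinity: every message of a group goes to one consumer. */
final class GroupAffinity {
    private final List<String> consumers;
    private final Map<String, String> owners = new HashMap<>();
    private int next = 0;

    GroupAffinity(List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        this.consumers = List.copyOf(consumers);
    }

    /** Returns the consumer that owns the group, picking one round-robin the first time. */
    String route(String groupId) {
        return owners.computeIfAbsent(groupId, id -> consumers.get(next++ % consumers.size()));
    }

    public static void main(String[] args) {
        // Hypothetical consumer names; a data file and its command share one group ID.
        GroupAffinity broker = new GroupAffinity(List.of("consumer-A", "consumer-B"));
        System.out.println("data    -> " + broker.route("transfer-1"));
        System.out.println("command -> " + broker.route("transfer-1"));
        System.out.println("other   -> " + broker.route("transfer-2"));
    }
}
```

The point of the sketch is only that the owner is chosen once per group and then reused, which is why a data message and its command message end up at the same consumer.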


-- 

James
-------
http://radio.weblogs.com/0112098/

Re: Duplicating Large Messages

Posted by nlif <na...@dbnet.co.il>.
Thank you. This was very helpful. I think we now have answers to almost all
of our concerns. Just one more question, if I may :-) 

I described my motivation as the need to split a single, but very big,
message, into smaller chunks, just for optimized transfer. However, the need
is two-fold: In some cases, I do need to send large binary files, for which
streams are a great solution. But in some (other) cases, I need to send
messages, which, while not necessarily very large, are comprised of several
messages. For example: a data file (JMS BytesMessage), along with a command
object (ObjectMessage or TextMessage if it's an XML) that instructs what to
do with the data. So my "message" is in fact two JMS messages, that are only
meaningful together. 

In this case, I may also require something along the lines of an aggregator,
but I'm not sure the streams are my best choice. While browsing the ActiveMQ
documentation, which has lately become my favorite pastime :-), I came
across the exclusive-consumer and message-groups features. Am I correct in
considering this as a solution for the second scenario I described? (That is
- streams for sending very large files, and message-groups for sending
several separate messages, that belong together logically).


Thanks again for your patience and insights.
Naaman


Re: Duplicating Large Messages

Posted by James Strachan <ja...@gmail.com>.
On 7/4/06, nlif <na...@dbnet.co.il> wrote:
>
> Thanks James. This is very valuable input.
>
> Consumer prefetch - this is a great tip. I was unaware of this ActiveMQ
> feature. However, at this point, consumer memory is NOT my problem, since
> each consumer will receive one copy of the large message, and they should be
> able to handle it.

The chances are a consumer could receive many messages (unless you
only send a message to one consumer then wait for a message response
before ever sending again).

> My problem is at the producer side, because it has to
> duplicate the messages.

FWIW using composite destinations you can send a single message to
multiple destinations.

http://incubator.apache.org/activemq/composite-destinations.html

This avoids the duplication on the producer but just pushes the
problem back onto the broker.
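
As a small illustration of the composite destination idea, the helper below builds the comma-separated name that the linked page describes. It is a sketch with assumptions: the class name CompositeDestinationName and the queue names CONSUMER.1 and so on are hypothetical, and the resulting string would still have to be handed to the client library (for example as the name of an ActiveMQQueue) to have any effect.

```java
import java.util.List;

/** Builds a composite destination name: individual destination names joined by commas. */
final class CompositeDestinationName {

    static String of(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("at least one destination is required");
        }
        for (String name : names) {
            // A comma inside a name would silently create an extra destination.
            if (name.isEmpty() || name.contains(",")) {
                throw new IllegalArgumentException("invalid destination name: '" + name + "'");
            }
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // Hypothetical per-consumer queue names for the recipients of one message.
        System.out.println(of(List.of("CONSUMER.1", "CONSUMER.3", "CONSUMER.5")));
    }
}
```

The producer then performs a single send, and the fan-out into one copy per queue happens on the broker.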


> Queue vs Topic - From your reply I understand that while using multiple
> queues will indeed result in message duplication - using a topic will not.
> And if using a durable topic will also take care of a scenario in which not
> all of the consumers are reachable at the time of sending, while not costing
> me in memory - then this may be a viable solution.  (you say that durable
> topics use the disk, right?)

Yes - they use a RAM cache which can be evicted and reloaded from disk.


> The reason I think I have to use selectors when using a topic is that I
> sometimes want to send something to some, but not all, of my consumers, and
> the recipients are determined dynamically. For example, I may need to send
> message A to consumers 1,3,5 and then message B to consumers 1,3,4. The only
> way I can think of is using a single Topic, adding a recipient list to the
> message header, and adding selectors to the consumers. Is there any other
> way? I would be happy to know.

That sounds fine to me. You'd get the benefits of durable topics but
the effect of kinda logical queues per consumer.
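
One possible encoding of the recipient list and the matching selector is sketched below. Everything named here is an assumption for illustration: the property name "recipients", the comma-delimited encoding, and the class RecipientSelector are not from ActiveMQ. The selector string uses the LIKE operator from the JMS selector syntax; the matches method is only a local stand-in for what the broker would evaluate.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Encodes a recipient list as one string property and builds a per-consumer selector. */
final class RecipientSelector {
    static final String PROPERTY = "recipients"; // hypothetical property name

    /** Encodes IDs with leading and trailing commas, e.g. [1, 3, 5] becomes ",1,3,5,". */
    static String encode(List<Integer> consumerIds) {
        return consumerIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",", ",", ","));
    }

    /** Selector a consumer would pass when creating its durable subscription. */
    static String selectorFor(int consumerId) {
        return PROPERTY + " LIKE '%," + consumerId + ",%'";
    }

    /** Local equivalent of the LIKE test above, for trying the encoding without a broker. */
    static boolean matches(String encoded, int consumerId) {
        return encoded.contains("," + consumerId + ",");
    }

    public static void main(String[] args) {
        String messageA = encode(List.of(1, 3, 5));
        System.out.println(PROPERTY + " = " + messageA);
        System.out.println("consumer 3 selector: " + selectorFor(3));
        System.out.println("consumer 3 matches: " + matches(messageA, 3));
        System.out.println("consumer 4 matches: " + matches(messageA, 4));
    }
}
```

The surrounding commas keep consumer 1 from matching a message addressed to consumer 13. Note that this does nothing for the security concern raised earlier in the thread, since each consumer supplies its own selector.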


> ActiveMQ JMS Streams - Thanks for this tip! Does this mean that even when
> sending a huge file, at any given moment only a small portion of it is in
> memory? If so - this can be very helpful to us. Great feature!

Yes! Basically any massive file is split up into chunks (by default of
64K) so that any client (producer/consumer) or broker only has to keep
a few messages in RAM at any point in time so arbitrarily large files
can be exchanged using small amounts of RAM.
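
The chunking idea can be sketched in plain Java as follows. This is not the JMS Streams implementation, only an illustration of the principle under assumptions: the class name Chunker and its methods are made up, and the 64 KB chunk size is taken from the default mentioned above. The sink callback stands in for "send this chunk as one message", so only one chunk needs to be held at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Splits a stream into fixed-size chunks and reassembles them in order. */
final class Chunker {
    static final int CHUNK_SIZE = 64 * 1024; // default chunk size quoted in the thread

    /** Reads the stream chunk by chunk, handing each chunk to the sink; returns the chunk count. */
    static int forEachChunk(InputStream in, int chunkSize, Consumer<byte[]> sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int count = 0;
        int filled;
        // readNBytes returns 0 at end of stream, which ends the loop.
        while ((filled = in.readNBytes(buffer, 0, chunkSize)) > 0) {
            sink.accept(Arrays.copyOf(buffer, filled));
            count++;
        }
        return count;
    }

    /** Convenience for in-memory data; collects the chunks so they can be inspected. */
    static List<byte[]> splitBytes(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        try {
            forEachChunk(new ByteArrayInputStream(data), chunkSize, chunks::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    /** Consumer side: concatenates chunks in arrival order. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[200_000];
        List<byte[]> chunks = splitBytes(file, CHUNK_SIZE);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(join(chunks), file));
    }
}
```

At this chunk size a 200 MB file (200 * 1024 * 1024 bytes) works out to 3200 chunks, each small enough that neither client nor broker needs a large heap.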


> In fact, one of the things I considered for handling very large messages,
> was splitting them to smaller ones, sending them separately, and then using
> an Aggregator on the consumer side (I read about this in Gregor Hohpe's
> excellent "Enterprise Integration Patterns").

That's pretty much how JMS Streams work :)

> However, it seems to me that
> when running in a cluster - an aggregator cannot work, since parts of the
> message can end up in different machines on the cluster. Is there any way to
> split a message to smaller chunks and then ensure they all get to the same
> place, even in a cluster?

Yes - JMS Streams :)

-- 

James
-------
http://radio.weblogs.com/0112098/

Re: Duplicating Large Messages

Posted by nlif <na...@dbnet.co.il>.
Thanks James. This is very valuable input.

Consumer prefetch - this is a great tip. I was unaware of this ActiveMQ
feature. However, at this point, consumer memory is NOT my problem, since
each consumer will receive one copy of the large message, and they should be
able to handle it. My problem is at the producer side, because it has to
duplicate the messages.

Queue vs Topic - From your reply I understand that while using multiple
queues will indeed result in message duplication - using a topic will not.
And if using a durable topic will also take care of a scenario in which not
all of the consumers are reachable at the time of sending, while not costing
me in memory - then this may be a viable solution.  (you say that durable
topics use the disk, right?)

The reason I think I have to use selectors when using a topic is that I
sometimes want to send something to some, but not all, of my consumers, and
the recipients are determined dynamically. For example, I may need to send
message A to consumers 1,3,5 and then message B to consumers 1,3,4. The only
way I can think of is using a single Topic, adding a recipient list to the
message header, and adding selectors to the consumers. Is there any other
way? I would be happy to know.

ActiveMQ JMS Streams - Thanks for this tip! Does this mean that even when
sending a huge file, at any given moment only a small portion of it is in
memory? If so - this can be very helpful to us. Great feature!

In fact, one of the things I considered for handling very large messages,
was splitting them to smaller ones, sending them separately, and then using
an Aggregator on the consumer side (I read about this in Gregor Hohpe's
excellent "Enterprise Integration Patterns"). However, it seems to me that
when running in a cluster - an aggregator cannot work, since parts of the
message can end up in different machines on the cluster. Is there any way to
split a message to smaller chunks and then ensure they all get to the same
place, even in a cluster?

Thanks a lot for your time and insights.
Naaman


Re: Duplicating Large Messages

Posted by James Strachan <ja...@gmail.com>.
On 7/2/06, nlif <na...@dbnet.co.il> wrote:
> Hello all,
>
> We need to be able to handle very large messages (200 MB and more), which
> should be sent to multiple consumers (several hundreds of consumers). We
> would like to know what is the memory foot-print in the broker, when the
> same message is placed in several queues. Is the message duplicated in
> memory?

Typically yes. Consumers also typically buffer up messages,
so you might want to reduce prefetch sizes to something really
small (say 1)...

http://incubator.apache.org/activemq/what-is-the-prefetch-limit-for.html
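
For illustration, one way to ask for a small prefetch is through an option on the broker URI. The helper below only builds that URI string. Treat it as a sketch: the class name PrefetchUri and the host and port are hypothetical, and the option name jms.prefetchPolicy.queuePrefetch should be checked against the prefetch page above for the ActiveMQ version in use.

```java
/** Appends a queue prefetch option to a broker URI. */
final class PrefetchUri {
    // Option name as used on ActiveMQ connection URIs; verify for your version.
    static final String OPTION = "jms.prefetchPolicy.queuePrefetch";

    static String withQueuePrefetch(String brokerUri, int prefetch) {
        if (prefetch < 0) {
            throw new IllegalArgumentException("prefetch must not be negative");
        }
        // Use '&' when the URI already carries a query string.
        String separator = brokerUri.contains("?") ? "&" : "?";
        return brokerUri + separator + OPTION + "=" + prefetch;
    }

    public static void main(String[] args) {
        // Hypothetical broker address.
        System.out.println(withQueuePrefetch("tcp://localhost:61616", 1));
    }
}
```

With a prefetch of 1 a consumer buffers at most one undelivered message, which matters when each message is hundreds of megabytes.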


> Let me elaborate on the reasons for this question:
>
> At first, we have considered using a Topic, so that there is only a single
> copy of each message, regardless of how many consumers should get it.
> However, we need to be able to deliver to consumers who are not connected at
> the time of sending. We know we can use durable subscription, but we assume
> this results in a larger memory foot-print, as the broker keeps messages for
> some time.

The broker keeps messages on disk for a longer time but generally
doesn't have to keep them all in RAM


> Another problem with the Topic approach, is that this requires we include a
> recipient-list in the message (as JMS message properties), and provide the
> consumers with a JMS selector, so that consumers will only consume messages
> that are sent to them.

Don't you just send to the topic and let consumers decide what topics
to consume? Or is it that there is no easy way to introduce a logical
topic, so you want to put the consumer IDs on the message?


> However, this is a potential security problem, and we
> prefer using a queue per consumer, and a recipient-list based router, which
> will duplicate the original message and place a copy of it for each
> consumer, in that consumer's queue.

OK; it's a fair bit slower to use many messages in different queues
than one message in a topic (as durable topics are specifically
optimised to do one-to-many delivery) but it should work. Though for
200 MB messages you'll generally need a monster heap.
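
To put a rough number on the heap remark, here is a back-of-the-envelope calculation. It is a worst-case sketch under assumptions: "several hundreds of consumers" is taken as 300, every queue is assumed to hold its own full copy in RAM at the same moment, and the class name FootprintEstimate is made up. A real broker may spool persistent messages to disk, so actual usage can be lower.

```java
/** Worst-case memory estimate for one payload copied into many queues. */
final class FootprintEstimate {

    /** Bytes held if every queue keeps its own copy of the payload. */
    static long copiedBytes(long payloadBytes, int queueCount) {
        return Math.multiplyExact(payloadBytes, (long) queueCount);
    }

    /** Bytes held if a single copy is shared by all subscribers, as with a durable topic. */
    static long sharedBytes(long payloadBytes) {
        return payloadBytes;
    }

    public static void main(String[] args) {
        long payload = 200L * 1024 * 1024; // 200 MB, the size quoted in the thread
        int consumers = 300;               // assumption: "several hundreds" taken as 300
        System.out.println("one copy per queue: " + copiedBytes(payload, consumers) + " bytes");
        System.out.println("one shared copy:    " + sharedBytes(payload) + " bytes");
    }
}
```

Under those assumptions the per-queue copies add up to 62,914,560,000 bytes (roughly 58.6 GiB), against 209,715,200 bytes for a single shared copy.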


> The above led us to the conclusion we need a Queue per consumer, and to the
> question - does ActiveMQ duplicate a message when placing it in several
> Queues, or is it internally a single copy?

It's a copy. For durable topics it's shared.


> Another question - is the ActiveMQ feature of large input and output streams
> relevant for our problem?

Yes, definitely


> We would appreciate any input - this is a major show-stopper for us.

Given the requirements I'd recommend using the input/output streams
with multiple queues and using durable queues.
http://incubator.apache.org/activemq/jms-streams.html

which allows massive 'messages' to be sent to consumers without
requiring monster heaps. Since things are persistent we can evict
things from RAM to avoid having huge heaps.

If you are worried about performance & don't need the atomic fail-fast
guarantee of persistent messaging you could enable async sends
http://incubator.apache.org/activemq/async-sends.html

-- 

James
-------
http://radio.weblogs.com/0112098/