Posted to users@activemq.apache.org by Stefano Bagnara <ap...@bago.org> on 2008/05/08 10:56:06 UTC
SMTP Server (Apache James) spooling hints
Hi all,
I'm an Apache JAMES committer and I'm "almost" new to ActiveMQ.
I'm starting analysis on how to replace our default spool with ActiveMQ
and I hope you can give me some hints :-)
It would be better to use ActiveMQ via JMS (more flexibility) but if
there is any better solution to our problems by using specific ActiveMQ
APIs then why not!!
Our scenario is an SMTP Server so we have something like this:
1) SMTP Server receives messages and puts them in the spool. The spool
has to be persistent because once the message has been posted via SMTP
we cannot lose it. Most of the time the message will be consumed very
fast, so in the past I looked at using Kaha directly for this, but maybe
the 5.0 AMQ Message Store already handles this in a performant way?
2) Our current spooling has this architecture:
we have a single "spool" that contains messages with a "state". We read
a random message from the spool, look at its state and then start the
processing depending on the state itself. At the end of the processing
we can alter the state and leave the message in the spool, or we can
remove it from the spool. In the processing we could even push more
messages into the spool (e.g. to split the message into 2 different
paths). ATM there is no transaction management.
The processing from a state to another (or to delete) is a sequence of
micro-processings (named matchers/mailets in james), so the actual
status depends also on what matchers/mailets have been processed so far,
but we currently keep this in memory and never store this. So if
something goes wrong (given that we don't have transactions) we simply
start from the beginning of that "state processor" (I'd like to improve
this issue, too, with the new ActiveMQ based spool).
Sometimes the message is simply moved from one state to another a few
times and then it is removed from the spool for one of 2 reasons:
a) it has been moved to the "outgoing spool" (the spool for the messages
to be sent to other smtp servers)
b) it has been posted to a user inbox.
Other times the message is altered in its content.
So you see in James we currently have a single "message store" and we
can "lock on a message" (so no other thread will take it) "retrieve it",
"update and unlock it" (alter its state or state+content) or "remove
it". How would you manage this with ActiveMQ?
3) Outgoing spool:
The outgoing spool in JAMES is a spool like the main spool, with the
difference that a message delivery could fail and there is a retry
schedule. So we try to send a message, on failure we try again 10
minutes later, then 30 minutes later, then 2 hours later (it is
configurable) and so on. ATM we store the "next-attempt-date" and then
each "deliverer" simply takes the message with the earliest
next-attempt-date and if it is due for delivery it starts its work,
otherwise it simply waits the needed time (one deliverer is notified
when a *new* message enters this spool: they all "wait" on the spool and
the spool notifies one of them at each store).
The most common case is:
a) the message we received at #1 entered the spool #2 and is processed
very fast and it ends in the outgoing spool #3 where it is delivered on
the first attempt. In this case it would be cool if the message was in
memory and simply written once for safety because the processing should
be fast and it would be slow to read it again from the disk.
b) we fail our first attempt, then it does not make sense to keep it in
memory because we know we won't need it in the next X minutes/hours.
Any suggestions on how to do this with ActiveMQ?
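For reference, the next-attempt-date scheme described in point 3 can be
sketched in plain Java with java.util.concurrent.DelayQueue. This is only
an illustrative in-memory model, not James or ActiveMQ code: the
OutgoingSpool and Pending names are hypothetical and nothing here is
persistent.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical outgoing spool ordered by next-attempt-date. */
final class OutgoingSpool {
    /** A mail id plus the time (epoch millis) of its next delivery attempt. */
    record Pending(String mailId, long nextAttemptMillis) implements Delayed {
        @Override
        public long getDelay(TimeUnit unit) {
            long remaining = nextAttemptMillis - System.currentTimeMillis();
            return unit.convert(remaining, TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final DelayQueue<Pending> queue = new DelayQueue<>();

    /** Stores a mail whose next attempt is due after the given delay. */
    void schedule(String mailId, long delayMillis) {
        queue.put(new Pending(mailId, System.currentTimeMillis() + delayMillis));
    }

    /** Blocks a deliverer until the earliest mail is due, then returns its id. */
    String takeDue() throws InterruptedException {
        return queue.take().mailId();
    }

    /** Returns the id of a mail that is already due, or null if none is. */
    String pollDue() {
        Pending pending = queue.poll();
        return pending == null ? null : pending.mailId();
    }
}
```

A deliverer thread would loop on takeDue(), and on a failed delivery call
schedule() again with the next delay from the retry schedule (10 minutes,
30 minutes, 2 hours, ...).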
As a last point we have to take care of 2 different use-cases:
I) most traffic is made of fast-moving small messages but
II) many messages are 1-10MB in size, and a few messages could be even
100MB or more: how should we handle these messages in ActiveMQ given
that we can't keep them in memory but simply want to stream them in
and out of the server?
I understand this is a lot of questions, but I would really appreciate
any hint, even partial. I'm collecting ideas :-)
Stefano
PS: we are also evaluating using JCR for inboxes in case you were
wondering, but this is another story, for another list ;-)
Re: SMTP Server (Apache James) spooling hints
Posted by Stefano Bagnara <ap...@bago.org>.
James Strachan wrote:
> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>> James Strachan wrote:
>>> Another option is to use durable topics where a message is written
>>> once and all durable topic subscribers just get a kinda pointer to it.
>> I'm not sure I understand how this would work :-(
>
> So imagine you've 5 mailets that need to process a message. You can
> write the message to 5 queues; or write the message to a single topic
> and have 5 'durable topic subscribers' for each maillet. That way the
> message is written once and each durable topic subscriber basically
> keeps a pointer to the message.
I think I'm lost :-(
A standard scenario is that I have 3 processors:
"root" => this is where new mail (e.g: incoming from smtp) enters.
"filter" => where we decide if it is spam, local, or remote
"outgoing" => where we deliver it.
A processor in James language is a sequence of "matcher/mailets".
Currently a random mail is taken from the spool, then we look at the
current state (root, filter, outgoing) and run the processing for that
state. The processing works like this: run the first matcher, if it
matches run its mailet, if they didn't change the status then move to
the second matcher, and so on. By the end of the processor the status
has to have been changed somewhere (setting the status to "ghost" means
drop the mail). So changing the status is like moving to another queue.
At each status change we update the status on the queue (or the whole
message if it changed) and "unlock" it for another thread to take care
of processing it through the new processor later.
(there is also a detail that a matcher can partially match so a copy of
the message is created for the 2 paths to be followed, but I'm ignoring
this at this level).
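The processor loop described above can be sketched as follows. This is a
minimal model under stated assumptions, not James code: ProcessorSketch,
Mail and Step are hypothetical names, matchers are reduced to predicates
and mailets to functions, and partial matches (the message copy for 2
paths) are ignored. Throwing when no mailet changes the state is a choice
of this sketch.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical model of a processor as a sequence of matcher/mailet pairs. */
final class ProcessorSketch {
    record Mail(String state, String body) {}

    /** The matcher decides whether the mailet runs; the mailet may change the state. */
    record Step(Predicate<Mail> matcher, UnaryOperator<Mail> mailet) {}

    /**
     * Runs the steps in order and returns as soon as a mailet has changed
     * the state, which is the point where the mail moves to another
     * processor (or is dropped when the new state is "ghost").
     */
    static Mail run(List<Step> steps, Mail mail) {
        String initialState = mail.state();
        for (Step step : steps) {
            if (step.matcher().test(mail)) {
                mail = step.mailet().apply(mail);
                if (!mail.state().equals(initialState)) {
                    return mail;
                }
            }
        }
        throw new IllegalStateException(
                "processor '" + initialState + "' ended without a state change");
    }
}
```

With a queue per state, the returned state would simply name the queue
the mail is sent to next.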
The 1:1 mapping would be to have 3 queues and do everything else like we
do now. Another, "more granular" approach would be to define a queue per
status+"matcher/mailet position" so that we have root-1, root-2,
filter-1, filter-2, filter-3, filter-n, outgoing-1 as separate queues.
This would give us persistence of the status in a more granular way, but
maybe this is not needed.
You say that I can use the "durable topics" but I don't get how.
The smtpserver receives a new message and publishes it to the topic: who
is subscribed to this topic? If I subscribe each of my "processors"
(root, filter, outgoing), how do they know that only root has to check
it, and that only IF root moves it to the filter/outgoing status will the
others have to take care of it? Maybe I misunderstood and you use a
combination of durable topics for some data and queues for some other
data, but I'm lost on this....
I would be tempted to use JCR to store the full mimemessage as soon as I
receive the message and then simply put the "envelope" in the messaging
system with a reference to the JCR, so that the message to be moved from
(persistent) queue to queue will be very small. But this way I'll pay
the JCR storage cost every time, even for simple messages I simply have
to relay (which with the AMQ Message Store I would simply write to the
datalog), and I guess that writing to JCR costs MUCH MORE than writing
to the ActiveMQ datalog: is this a correct guess?
Thank you,
Stefano
Re: SMTP Server (Apache James) spooling hints
Posted by James Strachan <ja...@gmail.com>.
2008/5/9 Stefano Bagnara <ap...@bago.org>:
> James Strachan wrote:
>>
>> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>>>
>>> What does it happen under the hood when I use so many queues? Is the
>>> message fully written to disk each time I move it from a queue to another
>>> or does it simply update a reference when it belongs to the same store?
>>
>> Yeah, currently we do that.
>
> It was an "or" question, but I guess from the following sentence that you
> mean that you write the full message for each queue "move", right?
Yeah
>> Another option is to use durable topics where a message is written
>> once and all durable topic subscribers just get a kinda pointer to it.
>
> I'm not sure I understand how this would work :-(
So imagine you've 5 mailets that need to process a message. You can
write the message to 5 queues; or write the message to a single topic
and have 5 'durable topic subscribers', one for each mailet. That way
the message is written once and each durable topic subscriber basically
keeps a pointer to the message.
> I liked the multiple queue solution: is there any way to limit the "writes"
> on disk with some persistent+non-persistent + longtransactions strategy?
In ActiveMQ things are either persistent, where they are written to
disk ASAP (though it's up to the producer to decide if it wants to
block for it to be written completely to disk - the default - or if it
is happy to get on with something else while the write occurs), or
they are non-persistent.
See http://activemq.apache.org/what-is-the-difference-between-persistent-and-non-persistent-delivery.html
With non-persistent we now support spooling to disk if you are running
out of RAM as another hybrid option.
The main QoS to decide really is: if you kill & restart a broker, are
you happy to lose stuff?
> The fact is that my of my "most common scenario" is a input mail being
> processed through many states wihtout being altered and after 5-6 state
> changes (processor changes/queue changes) each one having 3-5
> matchers/mailets it is delivered remotely or stored locally.
> I could always store the payload to JCR so to not rewrite it multiple times,
> but I fear that even for the simple JMS message writing it once for queue
> (or even worse, once for each mailet) would be a performance issue (current
> james run an UPDATE spool set state = #newstate# where ID = #id# for status
> change and does not track persistently the "substatust" of the specific
> mailet being processed, because all the mailets in a given processor are
> processed at once for a given message).
>
>> [...]
>> https://issues.apache.org/jira/browse/INFRA-1607
>>
>> feel free to vote for it :)
>
> Done!
> I also checked on confluence administration side to see if something was
> wrong with the snippet plugin but it seems to be ok, so we'll have to wait
> for the infra team.
Yeah :(
>> As an aside - for a while I've been pondering about adding a maillet
>> support into Camel for easy Camel <-> JAMES integration.
>>
>> Something wacky to think about - which might be a bit too much Camel
>> internals for now but bear with me..
>> [... a lot of interesting technical stuff...]
>
> ATM it is very hard for me to follow you on this. I think I will have to
> read this again once I'll be more familiar with camel/activemq :-)
I thought so - never mind; if you ever get hooked on Camel come back
and read it again later and it might make a bit more sense, hopefully
:)
> But be sure that I bookmarked it and I want to try the road are trying to
> show me!!
:)
--
James
-------
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com
Re: SMTP Server (Apache James) spooling hints
Posted by Stefano Bagnara <ap...@bago.org>.
James Strachan wrote:
> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>> What does it happen under the hood when I use so many queues? Is the
>> message fully written to disk each time I move it from a queue to another or
>> does it simply update a reference when it belongs to the same store?
>
> Yeah, currently we do that.
It was an "or" question, but I guess from the following sentence that
you mean that you write the full message for each queue "move", right?
> Another option is to use durable topics where a message is written
> once and all durable topic subscribers just get a kinda pointer to it.
I'm not sure I understand how this would work :-(
I liked the multiple queue solution: is there any way to limit the
"writes" on disk with some persistent+non-persistent + longtransactions
strategy?
The fact is that my "most common scenario" is an input mail being
processed through many states without being altered and after 5-6 state
changes (processor changes/queue changes), each one having 3-5
matchers/mailets, it is delivered remotely or stored locally.
I could always store the payload to JCR so as not to rewrite it multiple
times, but I fear that even for the simple JMS message writing it once
per queue (or even worse, once for each mailet) would be a performance
issue (current james runs an UPDATE spool set state = #newstate# where
ID = #id# for a status change and does not persistently track the
"substatus" of the specific mailet being processed, because all the
mailets in a given processor are processed at once for a given message).
> [...]
> https://issues.apache.org/jira/browse/INFRA-1607
>
> feel free to vote for it :)
Done!
I also checked on confluence administration side to see if something was
wrong with the snippet plugin but it seems to be ok, so we'll have to
wait for the infra team.
> As an aside - for a while I've been pondering about adding a maillet
> support into Camel for easy Camel <-> JAMES integration.
>
> Something wacky to think about - which might be a bit too much Camel
> internals for now but bear with me..
> [... a lot of interesting technical stuff...]
ATM it is very hard for me to follow you on this. I think I will have to
read this again once I'm more familiar with camel/activemq :-)
But be sure that I bookmarked it and I want to try the road you are
trying to show me!!
Thank you,
Stefano
Re: SMTP Server (Apache James) spooling hints
Posted by James Strachan <ja...@gmail.com>.
>> I looked at the website and found an error in this page:
>> http://activemq.apache.org/camel/spring-xml-extensions.html
>> "An error occurred: Connection refused. The system administrator has been
>> notified."
> Unfortunately its due to the recent svn issues we've had at Apache.
> Snippets that were working totally fine in loads of confluence wikis
> are now totally borked :(
> https://issues.apache.org/jira/browse/INFRA-1607
It's fixed!
Here's those pages working...
http://cwiki.apache.org/CAMEL/spring.html
http://cwiki.apache.org/CAMEL/spring-xml-extensions.html
--
James
-------
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com
Re: SMTP Server (Apache James) spooling hints
Posted by James Strachan <ja...@gmail.com>.
2008/5/9 Stefano Bagnara <ap...@bago.org>:
> James Strachan wrote:
>
> > 2008/5/8 Stefano Bagnara <ap...@bago.org>:
> > > I'm starting analysis on how to replace our default spool with ActiveMQ and
> > > [...] in James we currently have a single "message store" and we can
> > > "lock on a message" (so no other thread will take it) "retrieve it", "update
> > > and unlock it" (alter its state or state+content) or "remove it". How would
> > > you manage this with ActiveMQ?
> >
> > With ActiveMQ you'd use a queue per state/maillet, remove it from the
> > queue, do something with it then put it on some other queue(s) (either
> > changed or the same message). The simple JMS/MOM model of sending to a
> > queue or consuming from a queue turns out to be very fast; allowing a
> > highly SEDA based asynchronous model to go really fast since there's
> > no locking or leasing required - and messages can flow very
> > asynchronously to boost throughput.
> >
>
> What does it happen under the hood when I use so many queues? Is the
> message fully written to disk each time I move it from a queue to another or
> does it simply update a reference when it belongs to the same store?
Yeah, currently we do that.
Another option is to use durable topics where a message is written
once and all durable topic subscribers just get a kinda pointer to it.
> I looked at the website and found an error in this page:
> http://activemq.apache.org/camel/spring-xml-extensions.html
> "An error occurred: Connection refused. The system administrator has been
> notified."
> I looked at the CWIKI sources
> (http://cwiki.removeme_apache.org/confluence/display/CAMEL/Spring+XML+Extensions)
> and I see this:
>
> {snippet:id=e3|lang=xml|url=activemq/camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/builder/spring_route_builder_test.xml}
> Not sure but maybe you have to add svn.apache.org/repos/asf/ in front of
> it?
Unfortunately it's due to the recent svn issues we've had at Apache.
Snippets that were working totally fine in loads of confluence wikis
are now totally borked :(
https://issues.apache.org/jira/browse/INFRA-1607
feel free to vote for it :)
> > > I understand this is a lot of questions, but I would really appreciate any
> > > hint, even partial. I'm collecting ideas :-)
> >
> > :)
>
> Thank you! Your answers are even more than what I expected! You're
> suggestion seems to be very very useful and I think you saved me weeks of
> thoughts!
You're most welcome! :)
As an aside - for a while I've been pondering adding mailet
support to Camel for easy Camel <-> JAMES integration.
Something wacky to think about - which might be a bit too much Camel
internals for now but bear with me..
Camel has a really neat extensible type conversion library...
http://activemq.apache.org/camel/type-converter.html
so that you can grab a message body or header as any type you like; be
it a stream, string, byte[], Document, TrAX Source or whatever. Very
handy for wiring things together!
When you invoke beans in a route like this...
from("activemq:SomeQueue").bean(SomeBean.class);
we use the bean integration to figure out how to invoke the bean
method from a message...
http://activemq.apache.org/camel/bean-integration.html
One of the little known things is that to invoke a bean, Camel first
tries to coerce the bean into a Processor and if it can it uses that
http://activemq.apache.org/camel/processor.html
An example of this is the ActiveMQ component for Camel which allows
you to invoke any JMS MessageListener within any Camel route -
irrespective of what message is being used...
http://activemq.apache.org/camel/activemq.html
This is implemented by writing a Camel Type Converter that can turn
any MessageListener instance into a Camel Processor - see the
toProcessor() method
https://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/main/java/org/apache/activemq/camel/converter/ActiveMQMessageConverter.java
So we could have awesome JAMES integration in Camel by doing the same
thing: creating converters between Camel's Message / Exchange types
and JAMES/JavaMail's APIs for messages, or creating a Processor
from a Mailet so that we can invoke a Mailet within any Camel route
- whether the message is coming from JMS, file system, database or
JavaMail/JAMES etc.
> I'll start with your hints and I'll come back with more questions as soon
> as I'll have rode the camel! ;-)
Great! :)
--
James
-------
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com
Re: SMTP Server (Apache James) spooling hints
Posted by Stefano Bagnara <ap...@bago.org>.
James Strachan wrote:
> 2008/5/8 Stefano Bagnara <ap...@bago.org>:
>> I'm starting analysis on how to replace our default spool with ActiveMQ and
>> [...] in James we currently have a single "message store" and we can
>> "lock on a message" (so no other thread will take it) "retrieve it", "update
>> and unlock it" (alter its state or state+content) or "remove it". How would
>> you manage this with ActiveMQ?
>
> With ActiveMQ you'd use a queue per state/maillet, remove it from the
> queue, do something with it then put it on some other queue(s) (either
> changed or the same message). The simple JMS/MOM model of sending to a
> queue or consuming from a queue turns out to be very fast; allowing a
> highly SEDA based asynchronous model to go really fast since there's
> no locking or leasing required - and messages can flow very
> asynchronously to boost throughput.
What happens under the hood when I use so many queues? Is the
message fully written to disk each time I move it from one queue to
another or does it simply update a reference when it belongs to the same
store?
> If you do find you wanna grab - edit - put back type thing alot you
> could look at using JavaSpaces (or Entity Bean :). But I think for
> JAMES then messaging could work well as it sounds to me (as a newbie
> JAMES person) like what you're doing processing mail is kinda a pipes
> and filters type model...
> http://activemq.apache.org/camel/pipes-and-filters.html
>
> which maps very well to messaging and queues.
Cool! This is very interesting and I never read about it. So I'm going
to study it and to play with it a bit.
> For more background see :
> http://activemq.apache.org/camel/enterprise-integration-patterns.html
>
>> 3) Outgoing spool:
>> The outgoing spool in JAMES is a spool like the main spool, with the
>> difference that a message delivery could fail and there is a retry schedule.
>> [...]
>> Any suggestions on how to do this with ActiveMQ?
>
> It sounds like you could use the delayer pattern...
> http://activemq.apache.org/camel/delayer.html
>
> Then have separate queues for '30 mins later', '1 hour later', '2 hours later'.
> [...]
I looked at the website and found an error in this page:
http://activemq.apache.org/camel/spring-xml-extensions.html
"An error occurred: Connection refused. The system administrator has
been notified."
I looked at the CWIKI sources
(http://cwiki.removeme_apache.org/confluence/display/CAMEL/Spring+XML+Extensions)
and I see this:
{snippet:id=e3|lang=xml|url=activemq/camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/builder/spring_route_builder_test.xml}
Not sure but maybe you have to add svn.apache.org/repos/asf/ in front of it?
>> I understand this is a lot of questions, but I would really appreciate any
>> hint, even partial. I'm collecting ideas :-)
>
> :)
Thank you! Your answers are even more than what I expected! Your
suggestions seem to be very very useful and I think you saved me weeks
of thoughts!
I'll start with your hints and I'll come back with more questions as
soon as I have ridden the camel! ;-)
>> PS: we are also evaluating using JCR for inboxes if you was wondering, but
>> this is another story, for another list ;-)
>
> You could store the mail in JCR and use messaging for the process
> flow. e.g. the JMS messages could just contain a reference (URL?) to
> the message payload.
>
> How often is the payload of the message mutated as it goes through
> maillets? If it remains kinda static and its more the headers, states
> & mailets that change mostly, it could be worth putting the payload in
> some file system / REST resource / JCR and just referring to the
> payload for large messages (say over 1-10MB)?
This really depends on custom configurations. We provide many mailets
that will alter the payload and many that simply run checks and route
the message. I guess an estimate for a generic use case could be this:
- 100% of messages we spool will have some of their headers changed.
- 30% of messages will have their body changed a couple of times.
Very much appreciated, thank you again,
Stefano
Re: SMTP Server (Apache James) spooling hints
Posted by James Strachan <ja...@gmail.com>.
2008/5/8 Stefano Bagnara <ap...@bago.org>:
> Hi all,
>
> I'm an Apache JAMES committer and I'm "almost" new to ActiveMQ.
Welcome :)
> I'm starting analysis on how to replace our default spool with ActiveMQ and
> I hope you can give me some hints :-)
> It would be better to use ActiveMQ via JMS (more flexibility) but if there
> is any better solution to our problems by using specific ActiveMQ APIs then
> why not!!
I'd be tempted to use the JMS API as (i) you can then switch JMS
providers if you ever need to and (ii) lots of the internal APIs to
things like data stores & transaction logs and the like do change over
time. Though maybe Camel is even easier (more on this later...)
> Our scenario is an SMTP Server so we have something like this:
>
> 1) SMTP Server receives messages and put them to the spool. The spool have
> to be persistent because once the message has been posted via SMTP we cannot
> loose it. Most time the message will be consumed very fast, so in past I
> looked at using Kaha directly for this, but maybe the 5.0 AMQ Message Store
> already handle this one in a performant way?
Yeah - I'd use the default persistence engine in ActiveMQ 5.x, the AMQ
Store which is very fast...
http://activemq.apache.org/amq-message-store.html
basically just use the out-of-the-box config :)
> 2) Our current spooling have this architecture:
> we have a single "spool" that contains messages with a "state". We read a
> random message from the spool, look at its state and then start the
> processing depending on the state itself at the end of the processing we can
> alter the state and leave the message in the spool, or we can remove it from
> the spool. In the processing we could even push more messages into the spool
> (e.g: to split the message to 2 different paths). ATM the re is no
> transaction management.
> The processing from a state to another (or to delete) is a sequence of
> micro-processings (named matchers/mailets in james), so the actual status
> depends also on what matchers/mailets have been processed so far, but we
> currently keep this in memory and never store this. So if something goes
> wrong (given that we don't have transactions) we simply start from the
> beginning of that "state processor" (I'd like to improve this issue, too,
> with the new ActiveMQ based spool).
Using transactions is a good idea; then you can atomically process a
number of messages and they are either processed or not in an ACID
way. To improve performance you might wanna use batches; say
processing 1000 messages in a single transaction; which means that
most of the operations are all asynchronous & fast other than the
transaction commit which does a sync-to-disk.
http://activemq.apache.org/should-i-use-transactions.html
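To make the batching arithmetic concrete, here is a small stand-in for a
transacted session in plain Java. It is not the JMS API: BatchingSession
is a hypothetical class that only models "buffer the sends, make them
durable on commit", so that the number of commits (the expensive
sync-to-disk steps) can be counted. With JMS the equivalent would be a
transacted Session and its commit() and rollback() methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transacted session that commits in batches. */
final class BatchingSession {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private final int batchSize;
    private int commits;

    BatchingSession(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Buffers a message and commits automatically once the batch is full. */
    void send(String message) {
        pending.add(message);
        if (pending.size() >= batchSize) {
            commit();
        }
    }

    /** Makes all buffered messages durable in one step; a no-op when empty. */
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        committed.addAll(pending);
        pending.clear();
        commits++;
    }

    /** Discards everything sent since the last commit. */
    void rollback() {
        pending.clear();
    }

    int commits() {
        return commits;
    }

    List<String> committed() {
        return List.copyOf(committed);
    }
}
```

With a batch size of 1000, sending 2500 messages costs 2 automatic commits
plus 1 final commit for the remaining 500, instead of 2500 commits.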
> Some times the message is simply moved from one state to another a few
> times and then it is removed from the spool because of 2 causes:
> a) it has been moved to the "outgoing spool" (the spool for the messages to
> be sent to other smtp servers)
> b) it has been posted to an user inbox.
> Other times the message is altered in its content.
> So you see in James we currently have a single "message store" and we can
> "lock on a message" (so no other thread will take it) "retrieve it", "update
> and unlock it" (alter its state or state+content) or "remove it". How would
> you manage this with ActiveMQ?
With ActiveMQ you'd use a queue per state/mailet: take the message off
the queue, do something with it, then put it on some other queue(s)
(either changed or the same message). The simple JMS/MOM model of sending to a
queue or consuming from a queue turns out to be very fast; allowing a
highly SEDA based asynchronous model to go really fast since there's
no locking or leasing required - and messages can flow very
asynchronously to boost throughput.
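As an illustration of the queue-per-state idea, here is an in-memory
sketch in plain Java. It uses java.util.concurrent queues as a stand-in
for broker queues, so it shows only the flow (consume from one queue,
route, send to the next) and none of the persistence. QueuePerState,
Mail and the router function are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for "one queue per state". */
final class QueuePerState {
    record Mail(String id, String body) {}

    private final Map<String, BlockingQueue<Mail>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<Mail> queue(String state) {
        return queues.computeIfAbsent(state, s -> new LinkedBlockingQueue<>());
    }

    /** Puts a mail on the queue that belongs to the given state. */
    void send(String state, Mail mail) {
        queue(state).add(mail);
    }

    /**
     * Consumes one mail from the queue of 'state', asks the router for the
     * next state and sends the mail there. A null next state drops the mail.
     * Returns false when the queue was empty.
     */
    boolean step(String state, Function<Mail, String> router) {
        Mail mail = queue(state).poll();
        if (mail == null) {
            return false;
        }
        String next = router.apply(mail);
        if (next != null) {
            send(next, mail);
        }
        return true;
    }

    int depth(String state) {
        return queue(state).size();
    }
}
```

No per-message lock is needed here because a mail taken off a queue is
visible to exactly one consumer until it is sent on.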
If you do find you want to do the grab - edit - put back type of thing a
lot you could look at using JavaSpaces (or Entity Bean :). But I think
for JAMES messaging could work well, as it sounds to me (as a newbie
JAMES person) like what you're doing when processing mail is kinda a
pipes and filters type model...
http://activemq.apache.org/camel/pipes-and-filters.html
which maps very well to messaging and queues.
For more background see :
http://activemq.apache.org/camel/enterprise-integration-patterns.html
btw you could maybe use Camel to describe how mail is routed from
JAMES to different mailets & queues? Then you wouldn't have to worry
about learning the JMS API (and we could switch to different spool
implementations later on if need be). It'd also then make it easier to
decide when to use queues. e.g. you might have 5 mailets; you could
put each one of them on a queue; or rather than 5 writes to a queue
you could invoke all 5 mailets in one go (in the same transaction) -
or something in between.
> 3) Outgoing spool:
> The outgoing spool in JAMES is a spool like the main spool, with the
> difference that a message delivery could fail and there is a retry schedule.
> So we try to send a message, on failure we try again 10 minutes later, then
> 30 minutes later, then 2 hours later (it is configurable) and so on. ATM we
> store the "next-attempt-date" and then each "deliverer" simply take the
> message with the minor next-attempt-date and if it is due for delivery it
> starts its work, otherwise it will simply wait the needed time (one
> deliverer is noticed when a *new* message enter this spool / They all "wait"
> on the spool and the spool is noticed one at each store).
> The most common case is:
> a) the message we received at #1 entered the spool #2 and is processed very
> fast and it ends in the outgoing spool #3 where it is delivered on the first
> attempt. In this case it would be cool if the message was in memory and
> simply written once for safety because the processing should be fast and it
> would be slow to read it again from the disk.
> b) we fail our first attempt, then it does not make sense to keep it in
> memory because we know we won't need it in the next X minutes/hours.
> Any suggestions on how to do this with ActiveMQ?
It sounds like you could use the delayer pattern...
http://activemq.apache.org/camel/delayer.html
Then have separate queues for '30 mins later', '1 hour later', '2 hours later'.
If delivery fails you send it to the next queue where messages are
attempted to be delivered in order; but just X mins from the time they
are added to the queue.
Something kinda like this in pseudo camel code...
from("activemq:output.dispatch.attempt.1").bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.2").delay(thirtyMins).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.3").delay(oneHour).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.4").delay(twoHours).bean(MyDispatchThingy.class);
Then we'd just need to use the try/catch mechanism or a custom ErrorHandler
http://activemq.apache.org/camel/error-handler.html
so that if MyDispatchThingy fails to dispatch the message we dispatch
it to the next queue in the list (or delete it if we're on attempt 4
etc).
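The "send it to the next queue in the list, or give up" decision can be
sketched in plain Java, independent of Camel. RetryLadder and Attempt are
hypothetical names; the queue names follow the output.dispatch.attempt.N
pattern and the delays are the ones from the pseudo code above. How the
error handler actually forwards the message is left out.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

/** Hypothetical retry ladder: which queue comes after a failed attempt. */
final class RetryLadder {
    /** A queue name and the delay applied before dispatching from it. */
    record Attempt(String queue, Duration delay) {}

    private static final List<Attempt> LADDER = List.of(
            new Attempt("output.dispatch.attempt.1", Duration.ZERO),
            new Attempt("output.dispatch.attempt.2", Duration.ofMinutes(30)),
            new Attempt("output.dispatch.attempt.3", Duration.ofHours(1)),
            new Attempt("output.dispatch.attempt.4", Duration.ofHours(2)));

    /**
     * Returns the attempt to use after a failure on 'failedQueue', or an
     * empty Optional when that was the last attempt and the mail should be
     * given up on.
     */
    static Optional<Attempt> next(String failedQueue) {
        for (int i = 0; i < LADDER.size(); i++) {
            if (LADDER.get(i).queue().equals(failedQueue)) {
                return i + 1 < LADDER.size()
                        ? Optional.of(LADDER.get(i + 1))
                        : Optional.empty();
            }
        }
        throw new IllegalArgumentException("unknown queue: " + failedQueue);
    }
}
```

An error handler would call RetryLadder.next(currentQueue) and either
forward the message to the returned queue or drop/bounce it when the
result is empty.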
> As a last point we have to take care of 2 different use-cases:
> I) most traffic is done by fastmoving small messages but
The nice thing about the above is that you can then control
concurrency on each one of the attempt queues. So you could have, say,
1000 threads doing attempt1, and 10 threads doing attempt2 and just
one thread doing attempt 3 or 4 etc.
> II) many messages are 1-10MB in size, and a few message could be even 100MB
> or even more: how should we handle this messages in ActiveMQ given that we
> can't take them in memory but we simply want to stream then in and out from
> the server?
JMS/MOM is designed for relatively modest messages as JMS clients and
brokers try and keep messages around in RAM for maximum caching,
performance and throughput.
So you might wanna implement some kinda mechanism where messages over
a certain size; say over 10MB use BlobMessages - that is to say out of
band payloads...
http://activemq.apache.org/blob-messages.html
so you use JMS/ActiveMQ for the high performance reliable load
balancing across a cluster of boxes; but keep the message payloads on
some file system/JCR etc. Or maybe you try a middle ground where you
keep the message headers in the JMS message but leave the body as a
separate out of band entity; so you could use smart JMS routing using
message headers.
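A rough sketch of the "out of band above a size threshold" idea in plain
Java follows. It is not the ActiveMQ BlobMessage API: PayloadEnvelope is
a hypothetical class, the spool directory stands in for whatever file
system/JCR store is used, and it assumes the body size is known up front
(e.g. counted while receiving). Large bodies are streamed to a file and
never held in memory as a whole.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical envelope: small bodies inline, large bodies in a file. */
final class PayloadEnvelope {
    private final byte[] inlineBody; // non-null for small mails
    private final Path bodyFile;     // non-null for large mails

    private PayloadEnvelope(byte[] inlineBody, Path bodyFile) {
        this.inlineBody = inlineBody;
        this.bodyFile = bodyFile;
    }

    /** Inlines bodies up to 'threshold' bytes; streams larger ones to 'spoolDir'. */
    static PayloadEnvelope of(InputStream body, long size, long threshold, Path spoolDir)
            throws IOException {
        if (size <= threshold) {
            return new PayloadEnvelope(body.readAllBytes(), null);
        }
        Path file = Files.createTempFile(spoolDir, "mail-", ".body");
        // The temp file already exists, so the copy has to replace it.
        Files.copy(body, file, StandardCopyOption.REPLACE_EXISTING);
        return new PayloadEnvelope(null, file);
    }

    boolean isOutOfBand() {
        return bodyFile != null;
    }

    /** Opens the body for streaming, wherever it is stored. */
    InputStream open() throws IOException {
        return isOutOfBand()
                ? Files.newInputStream(bodyFile)
                : new ByteArrayInputStream(inlineBody);
    }

    /** Deletes the spooled file, if any, once the mail has left the server. */
    void discard() throws IOException {
        if (isOutOfBand()) {
            Files.deleteIfExists(bodyFile);
        }
    }
}
```

Only the small envelope (headers plus either the inline body or the file
reference) would then travel through the queues.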
> I understand this is a lot of questions, but I would really appreciate any
> hint, even partial. I'm collecting ideas :-)
:)
> Stefano
>
> PS: we are also evaluating using JCR for inboxes if you was wondering, but
> this is another story, for another list ;-)
You could store the mail in JCR and use messaging for the process
flow. e.g. the JMS messages could just contain a reference (URL?) to
the message payload.
How often is the payload of the message mutated as it goes through
mailets? If it remains kinda static and it's more the headers, states
& mailets that change mostly, it could be worth putting the payload in
some file system / REST resource / JCR and just referring to the
payload for large messages (say over 1-10MB)?
If a message has to go through, say, 5 different steps that you might
wanna load balance and cluster using different queues, it'd be painful
to read/write a 100MB email body for each of the 5 steps if the payload
never changes through the 5 steps.
--
James
-------
http://macstrac.blogspot.com/
Open Source Integration
http://open.iona.com