Posted to users@activemq.apache.org by Stefano Bagnara <ap...@bago.org> on 2008/05/08 10:56:06 UTC

SMTP Server (Apache James) spooling hints

Hi all,

I'm an Apache JAMES committer and I'm "almost" new to ActiveMQ.

I'm starting analysis on how to replace our default spool with ActiveMQ 
and I hope you can give me some hints :-)
It would be better to use ActiveMQ via JMS (more flexibility) but if 
there is any better solution to our problems by using specific ActiveMQ 
APIs then why not!!

Our scenario is an SMTP Server so we have something like this:

1) SMTP Server receives messages and puts them in the spool. The spool 
has to be persistent because once the message has been posted via SMTP 
we cannot lose it. Most of the time the message will be consumed very 
fast, so in the past I looked at using Kaha directly for this, but maybe 
the 5.0 AMQ Message Store already handles this in a performant way?

2) Our current spooling has this architecture:
we have a single "spool" that contains messages with a "state". We read 
a random message from the spool, look at its state and then start the 
processing depending on that state. At the end of the processing we 
can alter the state and leave the message in the spool, or we can remove 
it from the spool. During the processing we could even push more messages 
into the spool (e.g. to split the message into 2 different paths). ATM 
there is no transaction management.
The processing from one state to another (or to deletion) is a sequence of 
micro-processings (named matchers/mailets in James), so the actual 
status also depends on which matchers/mailets have been processed so far, 
but we currently keep this in memory and never store it. So if 
something goes wrong (given that we don't have transactions) we simply 
start again from the beginning of that "state processor" (I'd like to 
improve this, too, with the new ActiveMQ based spool).
Sometimes the message is simply moved from one state to another a few 
times and then it is removed from the spool for one of 2 reasons:
a) it has been moved to the "outgoing spool" (the spool for the messages 
to be sent to other SMTP servers)
b) it has been posted to a user inbox.
Other times the message is altered in its content.
So you see in James we currently have a single "message store" and we 
can "lock on a message" (so no other thread will take it) "retrieve it", 
"update and unlock it" (alter its state or state+content) or "remove 
it". How would you manage this with ActiveMQ?

3) Outgoing spool:
The outgoing spool in JAMES is a spool like the main spool, with the 
difference that a message delivery could fail and there is a retry 
schedule. So we try to send a message, on failure we try again 10 
minutes later, then 30 minutes later, then 2 hours later (it is 
configurable) and so on. ATM we store the "next-attempt-date" and then 
each "deliverer" simply takes the message with the earliest 
next-attempt-date and if it is due for delivery it starts its work, 
otherwise it simply waits the needed time (one deliverer is notified 
when a *new* message enters this spool: they all "wait" on the spool and 
the spool notifies one of them at each store).
The most common case is:
a) the message we received at #1 entered the spool #2 and is processed 
very fast and it ends in the outgoing spool #3 where it is delivered on 
the first attempt. In this case it would be cool if the message was in 
memory and simply written once for safety because the processing should 
be fast and it would be slow to read it again from the disk.
b) we fail our first attempt, then it does not make sense to keep it in 
memory because we know we won't need it in the next X minutes/hours.
Any suggestions on how to do this with ActiveMQ?
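
To make the current outgoing-spool scheme concrete, here is a minimal plain-Java sketch of "take the message with the earliest next-attempt-date, and only if it is due". It uses only the JDK. The class and method names (OutgoingSpoolSketch, takeIfDue, millisUntilNext) are made up for illustration and are not JAMES code; a real spool would also be persistent and thread-safe.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Toy outgoing spool ordered by next-attempt-date (hypothetical names, not JAMES code). */
final class OutgoingSpoolSketch {
    static final class Entry {
        final String mailId;
        final long nextAttemptMillis;

        Entry(String mailId, long nextAttemptMillis) {
            this.mailId = mailId;
            this.nextAttemptMillis = nextAttemptMillis;
        }
    }

    // The head of the queue is always the entry with the earliest next-attempt-date.
    private final PriorityQueue<Entry> byDueTime =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.nextAttemptMillis));

    void store(String mailId, long nextAttemptMillis) {
        byDueTime.add(new Entry(mailId, nextAttemptMillis));
    }

    /** Removes and returns the earliest entry if it is due at nowMillis, otherwise returns null. */
    Entry takeIfDue(long nowMillis) {
        Entry head = byDueTime.peek();
        if (head == null || head.nextAttemptMillis > nowMillis) {
            return null;
        }
        return byDueTime.poll();
    }

    /** Time a deliverer would have to wait for the earliest entry; -1 if the spool is empty. */
    long millisUntilNext(long nowMillis) {
        Entry head = byDueTime.peek();
        if (head == null) {
            return -1L;
        }
        return Math.max(0L, head.nextAttemptMillis - nowMillis);
    }
}
```

A deliverer would call takeIfDue(now) and, on null, sleep for millisUntilNext(now) or until a new store wakes it up.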

As a last point we have to take care of 2 different use-cases:
I) most traffic consists of fast-moving small messages but
II) many messages are 1-10MB in size, and a few messages could be even 
100MB or more: how should we handle these messages in ActiveMQ given 
that we can't keep them in memory but simply want to stream them in 
and out of the server?

I understand this is a lot of questions, but I would really appreciate 
any hint, even partial. I'm collecting ideas :-)

Stefano

PS: we are also evaluating using JCR for inboxes if you were wondering, 
but this is another story, for another list ;-)

Re: SMTP Server (Apache James) spooling hints

Posted by Stefano Bagnara <ap...@bago.org>.

James Strachan wrote:
> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>> James Strachan wrote:
>>> Another option is to use durable topics where a message is written
>>> once and all durable topic subscribers just get a kinda pointer to it.
>> I'm not sure I understand how this would work :-(
> 
> So imagine you've 5 mailets that need to process a message. You can
> write the message to 5 queues; or write the message to a single topic
> and have 5 'durable topic subscribers' for each maillet. That way the
> message is written once and each durable topic subscriber basically
> keeps a pointer to the message.

I think I'm lost :-(

A standard scenario is that I have 3 processors:
"root" => this is where new mail (e.g: incoming from smtp) enters.
"filter" => where we decide if it is spam, local, or remote
"outgoing" => where we deliver it.

A processor in James language is a sequence of "matcher/mailets".

Currently a random mail is taken from the spool, then we look at the 
current state (root, filter, outgoing) and run the processing for that 
state. The processing works like this: run the first matcher, if it 
matches run its mailet, if they didn't change the status then move to 
the second matcher, and so on. By the end of the processor the status 
has to have been changed somewhere (setting the status to "ghost" means 
drop the mail). So changing the status is like moving to another queue. 
At each status change we update the status on the queue (or the whole 
message if it changed) and "unlock" it so that another thread can take 
care of processing it through the new processor later.

(there is also a detail that a matcher can partially match so a copy of 
the message is created for the 2 paths to be followed, but I'm ignoring 
this at this level).

The 1:1 mapping would be to have 3 queues and do everything else like we 
do now. Another, "more granular" approach would be to have a queue per 
status + "matcher/mailet position", so that we have root-1, root-2, 
filter-1, filter-2, filter-3, filter-n, outgoing-1 as separate queues. 
This would give us persistence of the status in a more granular way, but 
maybe this is not needed.

You say that I can use "durable topics" but I don't get how.
The smtpserver receives a new message and publishes it to the topic: who 
is subscribed to this topic? If I subscribe each of my "processors" 
(root, filter, outgoing), how do they know that only root has to check 
it, and that the others have to take care of it only IF root moves it to 
the filter/outgoing status? Maybe I misunderstood and you use a 
combination of durable topics for some data and queues for some other 
data, but I'm lost on this....

I would be tempted to use the JCR to store the full 
MIME message as soon as I receive the message and then simply put the 
"envelope" in the messaging system with a reference to the JCR, so that 
the message to be moved from (persistent) queue to queue will be very 
small. But this way I'll pay for the JCR storage every time, even for 
simple messages I only have to relay (which in the case of the AMQ 
Message Store I would simply write to the data log), and I guess that 
writing to JCR costs MUCH MORE than writing to the ActiveMQ data log. Is 
this a correct guess?

Thank you,
Stefano

Re: SMTP Server (Apache James) spooling hints

Posted by James Strachan <ja...@gmail.com>.

2008/5/9 Stefano Bagnara <ap...@bago.org>:
> James Strachan wrote:
>>
>> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>>>
>>>  What does it happen under the hood when I use so many queues? Is the
>>> message fully written to disk each time I move it from a queue to another
>>> or
>>> does it simply update a reference when it belongs to the same store?
>>
>> Yeah, currently we do that.
>
> It was an "or" question, but I guess from the following sentence that you
> mean that you write the full message for each queue "move", right?

Yeah

>> Another option is to use durable topics where a message is written
>> once and all durable topic subscribers just get a kinda pointer to it.
>
> I'm not sure I understand how this would work :-(

So imagine you've 5 mailets that need to process a message. You can
write the message to 5 queues; or write the message to a single topic
and have 5 'durable topic subscribers', one for each mailet. That way the
message is written once and each durable topic subscriber basically
keeps a pointer to the message.
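
A minimal plain-Java sketch of the "written once, one pointer per subscriber" idea, using only the JDK (no JMS or ActiveMQ classes). The class WriteOnceTopic and its methods are hypothetical names for illustration; the sketch only models the behaviour described above and says nothing about how the ActiveMQ store is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: one append-only log of bodies, one read position per named subscriber. */
final class WriteOnceTopic {
    private final List<String> log = new ArrayList<>();           // each body is stored exactly once
    private final Map<String, Integer> cursors = new HashMap<>(); // subscriber -> next index to read

    /** Registers a durable subscriber starting at the current end of the log. */
    void subscribe(String name) {
        cursors.putIfAbsent(name, log.size());
    }

    /** Stores the body once, however many subscribers exist. */
    void publish(String body) {
        log.add(body);
    }

    /** Returns the next unread body for this subscriber, or null if it has caught up. */
    String poll(String name) {
        Integer pos = cursors.get(name);
        if (pos == null) {
            throw new IllegalArgumentException("unknown subscriber: " + name);
        }
        if (pos >= log.size()) {
            return null;
        }
        cursors.put(name, pos + 1); // only the pointer moves; the body is never copied
        return log.get(pos);
    }

    int storedCopies() {
        return log.size();
    }
}
```

Note that in this model every subscriber sees every message, which is the topic semantics: it fits "all 5 mailets process each mail", not "exactly one processor handles the mail at a time".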


> I liked the multiple queue solution: is there any way to limit the "writes"
> on disk with some persistent+non-persistent + longtransactions strategy?

In ActiveMQ things are either persistent, where they are written to
disk ASAP (though it's up to the producer to decide if it wants to
block until the message is written completely to disk - the default - or
if it is happy to get on with something else while the write occurs), or
they are non-persistent.

See http://activemq.apache.org/what-is-the-difference-between-persistent-and-non-persistent-delivery.html

With non-persistent we now support spooling to disk if you are running
out of RAM as another hybrid option.

The main QoS to decide really is, if you kill & restart a broker are
you happy to lose stuff?


> The fact is that my of my "most common scenario" is a input mail being
> processed through many states wihtout being altered and after 5-6 state
> changes (processor changes/queue changes) each one having 3-5
> matchers/mailets it is delivered remotely or stored locally.
> I could always store the payload to JCR so to not rewrite it multiple times,
> but I fear that even for the simple JMS message writing it once for queue
> (or even worse, once for each mailet) would be a performance issue (current
> james run an UPDATE spool set state = #newstate# where ID = #id# for status
> change and does not track persistently the "substatust" of the specific
> mailet being processed, because all the mailets in a given processor are
> processed at once for a given message).
>
>> [...]
>> https://issues.apache.org/jira/browse/INFRA-1607
>>
>> feel free to vote for it :)
>
> Done!
> I also checked on confluence administration side to see if something was
> wrong with the snippet plugin but it seems to be ok, so we'll have to wait
> for the infra team.

Yeah :(


>> As an aside - for a while I've been pondering about adding a maillet
>> support into Camel for easy Camel <-> JAMES integration.
>>
>> Something wacky to think about - which might be a bit too much Camel
>> internals for now but bear with me..
>> [... a lot of interesting technical stuff...]
>
> ATM it is very hard for me to follow you on this. I think I will have to
> read this again once I'll be more familiar with camel/activemq :-)

I thought so - never mind; if you ever get hooked on Camel come back
and read it again later and it might make a bit more sense, hopefully
:)


> But be sure that I bookmarked it and I want to try the road are trying to
> show me!!

:)
-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: SMTP Server (Apache James) spooling hints

Posted by Stefano Bagnara <ap...@bago.org>.

James Strachan wrote:
> 2008/5/9 Stefano Bagnara <ap...@bago.org>:
>>  What does it happen under the hood when I use so many queues? Is the
>> message fully written to disk each time I move it from a queue to another or
>> does it simply update a reference when it belongs to the same store?
> 
> Yeah, currently we do that.

It was an "or" question, but I guess from the following sentence that 
you mean that you write the full message for each queue "move", right?

> Another option is to use durable topics where a message is written
> once and all durable topic subscribers just get a kinda pointer to it.

I'm not sure I understand how this would work :-(
I liked the multiple queue solution: is there any way to limit the 
"writes" on disk with some persistent+non-persistent + longtransactions 
strategy?

The fact is that my "most common scenario" is an input mail being 
processed through many states without being altered, and after 5-6 state 
changes (processor changes/queue changes), each one having 3-5 
matchers/mailets, it is delivered remotely or stored locally.
I could always store the payload in JCR so as not to rewrite it multiple 
times, but I fear that even for the simple JMS message, writing it once 
per queue (or even worse, once per mailet) would be a performance 
issue (current James runs an UPDATE spool set state = #newstate# where ID 
= #id# for a status change and does not persistently track the 
"substatus" of the specific mailet being processed, because all the 
mailets in a given processor are processed at once for a given message).

> [...]
> https://issues.apache.org/jira/browse/INFRA-1607
> 
> feel free to vote for it :)

Done!
I also checked on confluence administration side to see if something was 
wrong with the snippet plugin but it seems to be ok, so we'll have to 
wait for the infra team.

> As an aside - for a while I've been pondering about adding a maillet
> support into Camel for easy Camel <-> JAMES integration.
> 
> Something wacky to think about - which might be a bit too much Camel
> internals for now but bear with me..
> [... a lot of interesting technical stuff...]

ATM it is very hard for me to follow you on this. I think I will have to 
read this again once I'm more familiar with camel/activemq :-)
But be sure that I bookmarked it and I want to try the road you are 
trying to show me!!

Thank you,
Stefano

Re: SMTP Server (Apache James) spooling hints

Posted by James Strachan <ja...@gmail.com>.

>>  I looked at the website and found an error in this page:
>>  http://activemq.apache.org/camel/spring-xml-extensions.html
>>  "An error occurred: Connection refused. The system administrator has been
>> notified."

> Unfortunately its due to the recent svn issues we've had at Apache.
> Snippets that were working totally fine in loads of confluence wikis
> are now totally borked :(
> https://issues.apache.org/jira/browse/INFRA-1607

It's fixed!

Here's those pages working...
http://cwiki.apache.org/CAMEL/spring.html
http://cwiki.apache.org/CAMEL/spring-xml-extensions.html

-- 
James

Re: SMTP Server (Apache James) spooling hints

Posted by James Strachan <ja...@gmail.com>.

2008/5/9 Stefano Bagnara <ap...@bago.org>:
> James Strachan wrote:
> > 2008/5/8 Stefano Bagnara <ap...@bago.org>:
> > >  I'm starting analysis on how to replace our default spool with ActiveMQ and
> > >  [...] in James we currently have a single "message store" and we can
> > > "lock on a message" (so no other thread will take it) "retrieve it", "update
> > > and unlock it" (alter its state or state+content) or "remove it". How would
> > > you manage this with ActiveMQ?
> >
> > With ActiveMQ you'd use a queue per state/maillet, remove it from the
> > queue, do something with it then put it on some other queue(s) (either
> > changed or the same message). The simple JMS/MOM model of sending to a
> > queue or consuming from a queue turns out to be very fast; allowing a
> > highly SEDA based asynchronous model to go really fast since there's
> > no locking or leasing required - and messages can flow very
> > asynchronously to boost throughput.
> >
>
>  What does it happen under the hood when I use so many queues? Is the
> message fully written to disk each time I move it from a queue to another or
> does it simply update a reference when it belongs to the same store?

Yeah, currently we do that.

Another option is to use durable topics where a message is written
once and all durable topic subscribers just get a kinda pointer to it.



>  I looked at the website and found an error in this page:
>  http://activemq.apache.org/camel/spring-xml-extensions.html
>  "An error occurred: Connection refused. The system administrator has been
> notified."

>  I looked at the CWIKI sources
> (http://cwiki.removeme_apache.org/confluence/display/CAMEL/Spring+XML+Extensions)
> and I see this:
>
> {snippet:id=e3|lang=xml|url=activemq/camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/builder/spring_route_builder_test.xml}
>  Not sure but maybe you have to add svn.apache.org/repos/asf/ in front of
> it?

Unfortunately it's due to the recent svn issues we've had at Apache.
Snippets that were working totally fine in loads of confluence wikis
are now totally borked :(
https://issues.apache.org/jira/browse/INFRA-1607

feel free to vote for it :)


> > >  I understand this is a lot of questions, but I would really appreciate any
> > > hint, even partial. I'm collecting ideas :-)
> >
> > :)
>
>  Thank you! Your answers are even more than what I expected! You're
> suggestion seems to be very very useful and I think you saved me weeks of
> thoughts!

You're most welcome! :)

As an aside - for a while I've been pondering adding mailet
support to Camel for easy Camel <-> JAMES integration.

Something wacky to think about - which might be a bit too much Camel
internals for now but bear with me..

Camel has a really neat extensible type conversion library...
http://activemq.apache.org/camel/type-converter.html

so that you can grab a message body or header as any type you like; be
it a stream, string, byte[], Document, TrAX Source or whatever. Very
handy for wiring things together!

When you invoke beans in a route like this...
from("activemq:SomeQueue").bean(SomeBean.class)

we use the bean integration to figure out how to invoke the bean
method from a message...
http://activemq.apache.org/camel/bean-integration.html

One of the little known things is that to invoke a bean, Camel first
tries to coerce the bean into a Processor and if it can it uses that
http://activemq.apache.org/camel/processor.html

An example of this is the ActiveMQ component for Camel which allows
you to invoke any JMS MessageListener within any Camel route -
irrespective of what message is being used...
http://activemq.apache.org/camel/activemq.html

This is implemented by writing a Camel Type Converter that can turn
any MessageListener instance into a Camel Processor - see the
toProcessor() method
https://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/main/java/org/apache/activemq/camel/converter/ActiveMQMessageConverter.java

So we could have awesome JAMES integration in Camel by doing the same
thing; creating converters between Camel's Message / Exchange types
and JAMES/JavaMail's APIs for messages, or for creating a Processor
from a Mailet so that we can invoke a Mailet within any Camel route
- whether the message is coming from JMS, file system, database or
JavaMail/JAMES etc.
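
The converter idea boils down to an adapter, which can be sketched in plain Java. Everything below is a hypothetical stand-in: SketchMailet, SketchProcessor, SketchMail and SketchExchange are simplified local types invented for this example, not the real org.apache.mailet.Mailet or org.apache.camel.Processor interfaces, and MailetConverter is not an existing Camel class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a JAMES mailet: works on a mail with headers and a body. */
interface SketchMailet {
    void service(SketchMail mail);
}

/** Hypothetical stand-in for a Camel processor: works on a generic exchange. */
interface SketchProcessor {
    void process(SketchExchange exchange);
}

final class SketchMail {
    final Map<String, String> headers = new HashMap<>();
    String body;
}

final class SketchExchange {
    final Map<String, Object> headers = new HashMap<>();
    Object body;
}

final class MailetConverter {
    /** Wraps a mailet so that it can be invoked wherever a processor is expected. */
    static SketchProcessor toProcessor(SketchMailet mailet) {
        return exchange -> {
            // Convert the exchange into the mail type the mailet understands.
            SketchMail mail = new SketchMail();
            exchange.headers.forEach((k, v) -> mail.headers.put(k, String.valueOf(v)));
            mail.body = exchange.body == null ? null : exchange.body.toString();

            mailet.service(mail);

            // Copy any changes made by the mailet back onto the exchange.
            exchange.headers.clear();
            exchange.headers.putAll(mail.headers);
            exchange.body = mail.body;
        };
    }
}
```

With an adapter like this, the routing side only ever sees processors, regardless of where the message originally came from.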



>  I'll start with your hints and I'll come back with more questions as soon
> as I'll have rode the camel! ;-)

Great! :)

-- 
James

Re: SMTP Server (Apache James) spooling hints

Posted by Stefano Bagnara <ap...@bago.org>.

James Strachan wrote:
> 2008/5/8 Stefano Bagnara <ap...@bago.org>:
>>  I'm starting analysis on how to replace our default spool with ActiveMQ and
>>  [...] in James we currently have a single "message store" and we can
>> "lock on a message" (so no other thread will take it) "retrieve it", "update
>> and unlock it" (alter its state or state+content) or "remove it". How would
>> you manage this with ActiveMQ?
> 
> With ActiveMQ you'd use a queue per state/maillet, remove it from the
> queue, do something with it then put it on some other queue(s) (either
> changed or the same message). The simple JMS/MOM model of sending to a
> queue or consuming from a queue turns out to be very fast; allowing a
> highly SEDA based asynchronous model to go really fast since there's
> no locking or leasing required - and messages can flow very
> asynchronously to boost throughput.

What happens under the hood when I use so many queues? Is the 
message fully written to disk each time I move it from one queue to 
another, or does it simply update a reference when it belongs to the 
same store?

> If you do find you wanna grab - edit - put back type thing alot you
> could look at using JavaSpaces (or Entity Bean :). But I think for
> JAMES then messaging could work well as it sounds to me (as a newbie
> JAMES person) like what you're doing processing mail is kinda a pipes
> and filters type model...
> http://activemq.apache.org/camel/pipes-and-filters.html
> 
> which maps very well to messaging and queues.

Cool! This is very interesting and I had never read about it. So I'm 
going to study it and play with it a bit.

> For more background see :
> http://activemq.apache.org/camel/enterprise-integration-patterns.html
> 
>>  3) Outgoing spool:
>>  The outgoing spool in JAMES is a spool like the main spool, with the
>> difference that a message delivery could fail and there is a retry schedule.
>> [...]
>>  Any suggestions on how to do this with ActiveMQ?
> 
> It sounds like you could use the delayer pattern...
> http://activemq.apache.org/camel/delayer.html
> 
> Then have separate queues for '30 mins later', '1 hour later', '2 hours later'.
> [...]

I looked at the website and found an error in this page:
http://activemq.apache.org/camel/spring-xml-extensions.html
"An error occurred: Connection refused. The system administrator has 
been notified."
I looked at the CWIKI sources 
(http://cwiki.removeme_apache.org/confluence/display/CAMEL/Spring+XML+Extensions) 
and I see this:
{snippet:id=e3|lang=xml|url=activemq/camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/builder/spring_route_builder_test.xml}
Not sure but maybe you have to add svn.apache.org/repos/asf/ in front of it?

>>  I understand this is a lot of questions, but I would really appreciate any
>> hint, even partial. I'm collecting ideas :-)
> 
> :)

Thank you! Your answers are even more than what I expected! Your 
suggestions seem to be very very useful and I think you saved me weeks 
of thought!
I'll start with your hints and I'll come back with more questions as 
soon as I have ridden the camel! ;-)

>>  PS: we are also evaluating using JCR for inboxes if you was wondering, but
>> this is another story, for another list ;-)
> 
> You could store the mail in JCR and use messaging for the process
> flow. e.g. the JMS messages could just contain a reference (URL?) to
> the message payload.
> 
> How often is the payload of the message mutated as it goes through
> maillets? If it remains kinda static and its more the headers, states
> & mailets that change mostly, it could be worth putting the payload in
> some file system / REST resource / JCR and just referring to the
> payload for large messages (say over 1-10MB)?

This really depends on custom configurations. We provide many mailets 
that alter the payload and many that simply run checks and route 
the message. I guess an estimate for a generic use case could be this:
- 100% of messages we spool will have some of their headers changed.
-  30% of messages will have their body changed a couple of times.

Very much appreciated, thank you again,
Stefano

Re: SMTP Server (Apache James) spooling hints

Posted by James Strachan <ja...@gmail.com>.

2008/5/8 Stefano Bagnara <ap...@bago.org>:
> Hi all,
>
>  I'm an Apache JAMES committer and I'm "almost" new to ActiveMQ.

Welcome :)

>  I'm starting analysis on how to replace our default spool with ActiveMQ and
> I hope you can give me some hints :-)
>  It would be better to use ActiveMQ via JMS (more flexibility) but if there
> is any better solution to our problems by using specific ActiveMQ APIs then
> why not!!

I'd be tempted to use the JMS API as (i) you can switch JMS providers
if you ever need to and (ii) lots of the internal APIs to things like
data stores & transaction logs and the like do change over time.
Though maybe Camel is even easier (more on this later...)


>  Our scenario is an SMTP Server so we have something like this:
>
>  1) SMTP Server receives messages and put them to the spool. The spool have
> to be persistent because once the message has been posted via SMTP we cannot
> loose it. Most time the message will be consumed very fast, so in past I
> looked at using Kaha directly for this, but maybe the 5.0  AMQ Message Store
> already handle this one in a performant way?

Yeah - I'd use the default persistence engine in ActiveMQ 5.x, the AMQ
Store which is very fast...
http://activemq.apache.org/amq-message-store.html

basically just use the out-of-the-box config :)


>  2) Our current spooling have this architecture:
>  we have a single "spool" that contains messages with a "state". We read a
> random message from the spool, look at its state and then start the
> processing depending on the state itself at the end of the processing we can
> alter the state and leave the message in the spool, or we can remove it from
> the spool. In the processing we could even push more messages into the spool
> (e.g: to split the message to 2 different paths). ATM the re is no
> transaction management.
>  The processing from a state to another (or to delete) is a sequence of
> micro-processings (named matchers/mailets in james), so the actual status
> depends also on what matchers/mailets have been processed so far, but we
> currently keep this in memory and never store this. So if something goes
> wrong (given that we don't have transactions) we simply start from the
> beginning of that "state processor" (I'd like to improve this issue, too,
> with the new ActiveMQ based spool).

Using transactions is a good idea; then you can atomically process a
number of messages and they are either processed or not in an ACID
way. To improve performance you might wanna use batches; say
processing 1000 messages in a single transaction; which means that
most of the operations are all asynchronous & fast other than the
transaction commit which does a sync-to-disk.
http://activemq.apache.org/should-i-use-transactions.html
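
The "all or nothing per batch" behaviour can be sketched in plain Java without any JMS classes. This is only a toy model under stated assumptions: BatchedMover and processBatch are hypothetical names, the two Deque objects stand in for queues, and "commit" is just the single point where results become visible. With a real broker you would rely on a transacted session instead of hand-written rollback.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;

final class BatchedMover {
    /**
     * Moves up to batchSize messages from 'in' to 'out', applying 'step' to each.
     * Either the whole batch is moved, or (if any step throws) nothing changes.
     * Returns the number of messages committed.
     */
    static int processBatch(Deque<String> in, Deque<String> out,
                            UnaryOperator<String> step, int batchSize) {
        List<String> taken = new ArrayList<>();
        List<String> produced = new ArrayList<>();
        try {
            while (taken.size() < batchSize && !in.isEmpty()) {
                String msg = in.pollFirst();
                taken.add(msg);
                produced.add(step.apply(msg)); // may throw
            }
        } catch (RuntimeException e) {
            // "Rollback": put the consumed messages back in their original order.
            for (int i = taken.size() - 1; i >= 0; i--) {
                in.addFirst(taken.get(i));
            }
            return 0;
        }
        // "Commit": the single point where the results become visible.
        out.addAll(produced);
        return produced.size();
    }
}
```

The larger the batch, the fewer commit points there are, at the price of redoing more work when one message in the batch fails.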


>  Some times the message is simply moved from one state to another a few
> times and then it is removed from the spool because of 2 causes:
>  a) it has been moved to the "outgoing spool" (the spool for the messages to
> be sent to other smtp servers)
>  b) it has been posted to an user inbox.
>  Other times the message is altered in its content.
>  So you see in James we currently have a single "message store" and we can
> "lock on a message" (so no other thread will take it) "retrieve it", "update
> and unlock it" (alter its state or state+content) or "remove it". How would
> you manage this with ActiveMQ?

With ActiveMQ you'd use a queue per state/mailet, remove the message
from the queue, do something with it, then put it on some other queue(s)
(either changed or the same message). The simple JMS/MOM model of
sending to a queue or consuming from a queue turns out to be very fast;
allowing a highly SEDA based asynchronous model to go really fast since
there's no locking or leasing required - and messages can flow very
asynchronously to boost throughput.
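
A minimal plain-Java sketch of the queue-per-state model, using in-memory JDK queues in place of broker queues. StateQueues and its methods are hypothetical names; the state names root, filter, outgoing and ghost are borrowed from how JAMES processors are described elsewhere in this thread. The point it illustrates is that taking a mail off a queue already removes it, so no separate lock/unlock step is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** Toy spool: one in-memory queue per state; a step consumes a mail and names the next state. */
final class StateQueues {
    static final String DONE = "ghost"; // in JAMES the "ghost" state means "drop the mail"

    private final Map<String, BlockingQueue<String>> queues = new LinkedHashMap<>();
    private final Map<String, Function<String, String>> steps = new LinkedHashMap<>();

    /** Registers a state together with the step that decides the next state for a mail. */
    void addState(String state, Function<String, String> nextStateFor) {
        queues.put(state, new LinkedBlockingQueue<>());
        steps.put(state, nextStateFor);
    }

    void send(String state, String mail) {
        queues.get(state).add(mail);
    }

    /** Takes one mail off the state's queue and forwards it; returns false if the queue was empty. */
    boolean processOne(String state) {
        String mail = queues.get(state).poll(); // consuming removes it: no lock needed
        if (mail == null) {
            return false;
        }
        String next = steps.get(state).apply(mail);
        if (!DONE.equals(next)) {
            send(next, mail);
        }
        return true;
    }

    int depth(String state) {
        return queues.get(state).size();
    }
}
```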

If you do find you wanna do a grab - edit - put back type thing a lot
you could look at using JavaSpaces (or Entity Beans :). But I think for
JAMES messaging could work well, as it sounds to me (as a newbie
JAMES person) like what you're doing when processing mail is kinda a
pipes and filters type model...
http://activemq.apache.org/camel/pipes-and-filters.html

which maps very well to messaging and queues.

For more background see :
http://activemq.apache.org/camel/enterprise-integration-patterns.html

btw you could maybe use Camel to describe how mail is routed from
JAMES to different mailets & queues? Then you wouldn't have to worry
about learning the JMS API (and we could switch to different spool
implementations later on if need be). It'd also then make it easier to
decide when to use queues. e.g. you might have 5 mailets; you could
put each one of them on a queue; or rather than 5 writes to a queue
you could invoke all 5 mailets in one go (in the same transaction) -
or something in between.


>  3) Outgoing spool:
>  The outgoing spool in JAMES is a spool like the main spool, with the
> difference that a message delivery could fail and there is a retry schedule.
> So we try to send a message, on failure we try again 10 minutes later, then
> 30 minutes later, then 2 hours later (it is configurable) and so on. ATM we
> store the "next-attempt-date" and then each "deliverer" simply take the
> message with the minor next-attempt-date and if it is due for delivery it
> starts its work, otherwise it will simply wait the needed time (one
> deliverer is noticed when a *new* message enter this spool / They all "wait"
> on the spool and the spool is noticed one at each store).
>  The most common case is:
>  a) the message we received at #1 entered the spool #2 and is processed very
> fast and it ends in the outgoing spool #3 where it is delivered on the first
> attempt. In this case it would be cool if the message was in memory and
> simply written once for safety because the processing should be fast and it
> would be slow to read it again from the disk.
>  b) we fail our first attempt, then it does not make sense to keep it in
> memory because we know we won't need it in the next X minutes/hours.
>  Any suggestions on how to do this with ActiveMQ?

It sounds like you could use the delayer pattern...
http://activemq.apache.org/camel/delayer.html


Then have separate queues for '30 mins later', '1 hour later', '2 hours later'.

If delivery fails you send it to the next queue where messages are
attempted to be delivered in order; but just X mins from the time they
are added to the queue.

Something kinda like this in pseudo camel code...

from("activemq:output.dispatch.attempt.1").bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.2").delay(thirtyMins).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.3").delay(oneHour).bean(MyDispatchThingy.class);
from("activemq:output.dispatch.attempt.4").delay(twoHours).bean(MyDispatchThingy.class);

Then we'd just need to use the try/catch mechanism or a custom ErrorHandler
http://activemq.apache.org/camel/error-handler.html

so that if MyDispatchThingy fails to dispatch the message we dispatch
it to the next queue in the list (or delete it if we're on attempt 4
etc).
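
The "move to the next queue on failure" rule can be written down as a small piece of plain Java, independent of Camel. RetryLadder and its methods are hypothetical names; the delays simply mirror the pseudo-routes above (attempt 1 immediate, then 30 minutes, 1 hour, 2 hours), and a real setup would make them configurable.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy retry ladder: attempt N has a fixed delay; a failure moves the mail to attempt N+1. */
final class RetryLadder {
    // Attempt 1 is immediate, then 30 min, 1 h and 2 h, as in the pseudo-routes.
    private static final List<Duration> DELAYS = Arrays.asList(
            Duration.ZERO, Duration.ofMinutes(30), Duration.ofHours(1), Duration.ofHours(2));

    /** Queue name for a 1-based attempt number. */
    static String queueFor(int attempt) {
        return "output.dispatch.attempt." + attempt;
    }

    /** How long a mail waits on the given attempt queue before delivery is tried. */
    static Duration delayFor(int attempt) {
        return DELAYS.get(attempt - 1);
    }

    /** Where a mail goes after failing the given attempt; empty means give up and delete it. */
    static Optional<String> onFailure(int attempt) {
        return attempt < DELAYS.size()
                ? Optional.of(queueFor(attempt + 1))
                : Optional.empty();
    }
}
```

An error handler would call onFailure(n) when the dispatcher throws on attempt n and either forward the mail to the returned queue or drop it.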



>  As a last point we have to take care of 2 different use-cases:
>  I) most traffic is done by fastmoving small messages but


The nice thing about the above is that you can then control
concurrency on each one of the attempt queues. So you could have, say,
1000 threads doing attempt1, and 10 threads doing attempt2 and just
one thread doing attempt 3 or 4 etc.


>  II) many messages are 1-10MB in size, and a few message could be even 100MB
> or even more: how should we handle this messages in ActiveMQ given that we
> can't take them in memory but we simply want to stream then in and out from
> the server?

JMS/MOM is designed for relatively modest messages as JMS clients and
brokers try and keep messages around in RAM for maximum caching,
performance and throughput.

So you might wanna implement some kinda mechanism where messages over
a certain size; say over 10MB use BlobMessages - that is to say out of
band payloads...
http://activemq.apache.org/blob-messages.html

so you use JMS/ActiveMQ for the high performance reliable load
balancing across a cluster of boxes; but keep the message payloads on
some file system/JCR etc. Or maybe you try a middle ground where you
keep the message headers in the JMS message but leave the body as a
separate out of band entity; so you could use smart JMS routing using
message headers.
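
A minimal plain-Java sketch of the size-threshold idea: small bodies travel inside the envelope, large ones are stored elsewhere and only a reference is carried. SpoolEnvelope is a hypothetical class, and the Map passed as 'store' is just a stand-in for a file system, JCR or blob server; this does not use the ActiveMQ BlobMessage API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy envelope: small bodies travel in-band, large ones are stored elsewhere and referenced. */
final class SpoolEnvelope {
    final Map<String, String> headers = new HashMap<>(); // always in-band, usable for routing
    final byte[] inlineBody; // set for small mails, otherwise null
    final String bodyRef;    // set for large mails, otherwise null

    private SpoolEnvelope(byte[] inlineBody, String bodyRef) {
        this.inlineBody = inlineBody;
        this.bodyRef = bodyRef;
    }

    /**
     * Builds an envelope. Bodies above thresholdBytes are put into 'store' under the
     * given id and only the id is carried; smaller bodies are carried as they are.
     */
    static SpoolEnvelope of(String id, byte[] body, int thresholdBytes, Map<String, byte[]> store) {
        if (body.length > thresholdBytes) {
            store.put(id, body);
            return new SpoolEnvelope(null, id);
        }
        return new SpoolEnvelope(body, null);
    }

    boolean isOutOfBand() {
        return bodyRef != null;
    }
}
```

Because the headers stay in the envelope either way, routing decisions never need to touch the stored body.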


>  I understand this is a lot of questions, but I would really appreciate any
> hint, even partial. I'm collecting ideas :-)

:)

>  Stefano
>
>  PS: we are also evaluating using JCR for inboxes if you was wondering, but
> this is another story, for another list ;-)

You could store the mail in JCR and use messaging for the process
flow. e.g. the JMS messages could just contain a reference (URL?) to
the message payload.

How often is the payload of the message mutated as it goes through
mailets? If it remains kinda static and it's more the headers, states
& mailets that change mostly, it could be worth putting the payload in
some file system / REST resource / JCR and just referring to the
payload for large messages (say over 1-10MB)?

If a message has to go through, say, 5 different steps that you might
wanna load balance and cluster using different queues; it'd be painful
to read/write a 100MB email body for each of the 5 steps if the payload
never changes through the 5 steps.

-- 
James