Posted to server-user@james.apache.org by Tellier Benoit <bt...@apache.org> on 2015/11/28 19:09:05 UTC

James new event system

Hi,

I just wanted to present my work on the James event system.

## What is the James event system?

The mailbox event system conveys notifications about changes to the
state of mailboxes and messages. You can register a listener with it
in order to be notified.
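
As a rough illustration of listener registration, here is a minimal in-memory sketch in Java. The names `MailboxEvent`, `SketchListener` and `SketchEventBus` are hypothetical and chosen for this example only; they are not the actual James interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event: which mailbox changed, and what kind of change it was.
final class MailboxEvent {
    final String mailboxPath;
    final String kind;

    MailboxEvent(String mailboxPath, String kind) {
        this.mailboxPath = mailboxPath;
        this.kind = kind;
    }
}

// Hypothetical listener contract (not the real James interface).
interface SketchListener {
    void onEvent(MailboxEvent event);
}

// In-memory dispatcher: every registered listener is notified of every event.
final class SketchEventBus {
    private final List<SketchListener> listeners = new CopyOnWriteArrayList<>();

    void register(SketchListener listener) {
        listeners.add(listener);
    }

    void dispatch(MailboxEvent event) {
        for (SketchListener listener : listeners) {
            listener.onEvent(event);
        }
    }
}

final class ListenerRegistrationDemo {
    public static void main(String[] args) {
        SketchEventBus bus = new SketchEventBus();
        List<String> seen = new ArrayList<>();
        // Register a listener that records what it was told about.
        bus.register(event -> seen.add(event.kind + "@" + event.mailboxPath));
        bus.dispatch(new MailboxEvent("user/INBOX", "ADDED"));
        System.out.println(seen);
    }
}
```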

## What is it used for?

It is used for:

 - IMAP IDLE: allows a client to subscribe to a specific mailbox and get
notified about changes without having to poll the mailbox.

 - Quota system: updates to stored quotas are made outside the
MailboxManager, as they may involve large quota calculations.

 - Indexing of messages for the Search feature (ElasticSearch and Lucene
implementations).

 - IMAP Sequence Number handling.

 - Cache invalidation (caching project, not yet exposed to configuration).

 - Many others.

## Why do we need it to be distributed?

I want to see this feature distributed because I personally really love the
IDLE feature. I want my Thunderbird to be able to use it in a
distributed environment.

I also think people might be interested in running several James servers in
parallel with any kind of architecture (quotas, message search indexes).

## What are the different configuration options?

I revisited the event system.

The first thing is to explicitly specify a listener's distributed status. It
can be one of:

 - Registered per mailbox
 - The listener only needs to be notified about all local events
 - The listener needs to be notified about all events in your James cluster.
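
To make the three statuses concrete, here is a small Java sketch of how a delivery layer might decide whether a listener should receive an event. The enum constants and the `shouldDeliver` helper are hypothetical names invented for this illustration, and the decision rule is my reading of the list above rather than the actual James code.

```java
// Hypothetical names for the three listener statuses listed above.
enum ListenerScope {
    MAILBOX,        // registered on a specific mailbox
    LOCAL_EVENTS,   // wants every event emitted by this server
    CLUSTER_EVENTS  // wants every event emitted anywhere in the cluster
}

final class ScopeFilter {
    // Decides whether a listener with the given scope should see an event.
    static boolean shouldDeliver(ListenerScope scope,
                                 boolean emittedLocally,
                                 boolean registeredOnMailbox) {
        switch (scope) {
            case MAILBOX:
                return registeredOnMailbox;
            case LOCAL_EVENTS:
                return emittedLocally;
            case CLUSTER_EVENTS:
                return true;
            default:
                throw new IllegalArgumentException("unknown scope " + scope);
        }
    }

    public static void main(String[] args) {
        // A remote event reaches a cluster-wide listener but not a local-only one.
        System.out.println(shouldDeliver(ListenerScope.CLUSTER_EVENTS, false, false));
        System.out.println(shouldDeliver(ListenerScope.LOCAL_EVENTS, false, false));
    }
}
```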

Then, we keep the default in-memory implementation (slightly reworked
using Guava), and I added two other architectures for the event system.

#### Registration based event system

With this implementation, events are exchanged over the network, and a
James server is only notified about the events it explicitly
registered for. Because of that:

 - This approach is designed for architectures with a large number of
James servers.
 - It does not support event listeners that need to be notified of all
events in the cluster.

Each server listens on a message queue, and a registration mechanism is
used to identify which servers the events must be sent to. Of course,
this involves event serialization / deserialization.

Today:
 - Kafka is used for the messaging
 - Cassandra is used for registration management

This solution was presented at the Paris Cassandra Meet-up.
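
The routing idea can be sketched in Java as follows. This is only an illustration under stated assumptions: `RegistrationStore` and `ServerQueues` are hypothetical in-memory stand-ins for the registration store and the message queues, they do not use the Cassandra or Kafka client libraries, and events are represented as already-serialized strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the registration store: mailbox -> interested servers.
final class RegistrationStore {
    private final Map<String, Set<String>> serversByMailbox = new HashMap<>();

    void register(String mailboxPath, String serverId) {
        serversByMailbox.computeIfAbsent(mailboxPath, k -> new HashSet<>()).add(serverId);
    }

    Set<String> serversFor(String mailboxPath) {
        return new HashSet<>(serversByMailbox.getOrDefault(mailboxPath, new HashSet<>()));
    }
}

// In-memory stand-in for the per-server message queues.
final class ServerQueues {
    private final Map<String, List<String>> queueByServer = new HashMap<>();

    void send(String serverId, String serializedEvent) {
        queueByServer.computeIfAbsent(serverId, k -> new ArrayList<>()).add(serializedEvent);
    }

    List<String> received(String serverId) {
        return queueByServer.getOrDefault(serverId, new ArrayList<>());
    }
}

// Sends an event only to the servers registered for the affected mailbox.
final class RegistrationBasedPublisher {
    private final RegistrationStore registrations;
    private final ServerQueues queues;

    RegistrationBasedPublisher(RegistrationStore registrations, ServerQueues queues) {
        this.registrations = registrations;
        this.queues = queues;
    }

    void publish(String mailboxPath, String serializedEvent) {
        for (String serverId : registrations.serversFor(mailboxPath)) {
            queues.send(serverId, serializedEvent);
        }
    }

    public static void main(String[] args) {
        RegistrationStore store = new RegistrationStore();
        ServerQueues queues = new ServerQueues();
        store.register("user/INBOX", "james-1");
        new RegistrationBasedPublisher(store, queues).publish("user/INBOX", "{\"kind\":\"ADDED\"}");
        System.out.println(queues.received("james-1").size());
        System.out.println(queues.received("james-2").size());
    }
}
```

A server that never registered for the mailbox receives nothing, which is why this design cannot serve listeners that need every event in the cluster.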

#### Broadcast event system

With this implementation, you want to have several James servers working
together, but you rely on Mailbox Listeners that need to be notified
about every event in your data center.

These listeners could be:

 - Lucene document indexing
 - In-memory quotas
 - In-memory cache

The idea here is to naively broadcast the events to all your James servers. They
are notified about every event (so scalability will be limited).

You also have to be aware that events can be duplicated or not emitted at all
(James server crash, network partitions), so local data might be
inconsistent. That seems acceptable for quota calculation, for instance.

## What do I need to know as an administrator?

Distributed use of Message Sequence Numbers (which demands a high degree of
coordination) is risky. The inconsistency window between servers may be
large, and the correspondence between UIDs and message sequence numbers is
not eventually consistent. This topic is under discussion on the dev
mailing list.

I fixed an issue I spotted a while ago: a faulty mailbox listener
might stop the event delivery chain and make the IMAP service
unavailable. I added a commit so that errors raised inside mailbox
listeners are no longer propagated.
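
The general technique of isolating listener failures can be sketched in Java like this. It is a simplified illustration, not the actual commit: `FallibleListener` and `IsolatingDispatcher` are hypothetical names, and failures are simply collected in a list where a real server would log them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener contract; an implementation may throw at runtime.
interface FallibleListener {
    void onEvent(String event);
}

final class IsolatingDispatcher {
    private final List<FallibleListener> listeners = new ArrayList<>();
    private final List<String> failures = new ArrayList<>();

    void register(FallibleListener listener) {
        listeners.add(listener);
    }

    void dispatch(String event) {
        for (FallibleListener listener : listeners) {
            try {
                listener.onEvent(event);
            } catch (RuntimeException e) {
                // Record the failure and carry on, so that one faulty
                // listener cannot break delivery to the remaining ones.
                failures.add(String.valueOf(e.getMessage()));
            }
        }
    }

    List<String> failures() {
        return failures;
    }

    public static void main(String[] args) {
        IsolatingDispatcher dispatcher = new IsolatingDispatcher();
        List<String> delivered = new ArrayList<>();
        dispatcher.register(event -> { throw new IllegalStateException("boom"); });
        dispatcher.register(delivered::add);
        dispatcher.dispatch("ADDED");
        System.out.println(delivered.size() + " delivered, " + dispatcher.failures().size() + " failed");
    }
}
```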

I want to finish this section by talking about event serialization. You
can choose either:

 - JSON
 - MessagePack

The first one is faster to compute but larger, so it lets you trade
compute power against network bandwidth.

## Event delivery modes

As you might have noticed, Mailbox Listeners can take a long time to
execute, and some of them can safely be executed
asynchronously (IDLE, indexing and even quotas).

I added an Event Delivery abstraction. Thanks to this, you can configure
your James to:

 - Synchronously deliver events (today's behavior)
 - Asynchronously deliver events (the call returns before the events have been
delivered; Mailbox Listeners are notified in parallel in a thread pool)
 - Mixed mode: every Mailbox Listener indicates whether it should be
executed synchronously or asynchronously.

The asynchronous option can be considered risky. The mixed one is
safe, and significantly reduces latency if you rely on document indexing.
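
The three modes can be sketched in Java as below. This is a minimal illustration, not the James Event Delivery code: `DeliveryMode`, `ModeAwareListener`, `RecordingListener` and `EventDeliverySketch` are hypothetical names, and the thread pool is a plain `ExecutorService` supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

enum DeliveryMode { SYNCHRONOUS, ASYNCHRONOUS, MIXED }

// Hypothetical listener contract; isSynchronous() is only consulted in MIXED mode.
interface ModeAwareListener {
    void onEvent(String event);
    boolean isSynchronous();
}

// Test helper that remembers which thread executed it.
final class RecordingListener implements ModeAwareListener {
    private final boolean synchronous;
    volatile String ranOnThread;

    RecordingListener(boolean synchronous) {
        this.synchronous = synchronous;
    }

    @Override
    public void onEvent(String event) {
        ranOnThread = Thread.currentThread().getName();
    }

    @Override
    public boolean isSynchronous() {
        return synchronous;
    }
}

final class EventDeliverySketch {
    private final DeliveryMode mode;
    private final ExecutorService pool;

    EventDeliverySketch(DeliveryMode mode, ExecutorService pool) {
        this.mode = mode;
        this.pool = pool;
    }

    // Runs listeners inline or on the pool, depending on the mode.
    // Returns the futures of the work handed to the pool so callers may wait on it.
    List<CompletableFuture<Void>> deliver(List<? extends ModeAwareListener> listeners, String event) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (ModeAwareListener listener : listeners) {
            boolean inline = mode == DeliveryMode.SYNCHRONOUS
                    || (mode == DeliveryMode.MIXED && listener.isSynchronous());
            if (inline) {
                listener.onEvent(event);
            } else {
                pending.add(CompletableFuture.runAsync(() -> listener.onEvent(event), pool));
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        RecordingListener sequenceNumbers = new RecordingListener(true);
        RecordingListener indexer = new RecordingListener(false);
        EventDeliverySketch delivery = new EventDeliverySketch(DeliveryMode.MIXED, pool);
        delivery.deliver(Arrays.asList(sequenceNumbers, indexer), "ADDED")
                .forEach(CompletableFuture::join);
        System.out.println(sequenceNumbers.ranOnThread + " / " + indexer.ranOnThread);
        pool.shutdown();
    }
}
```

In mixed mode the caller only waits for the listeners that declare themselves synchronous, which is where the latency gain for slow listeners such as indexing would come from.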

## Re-indexers

I also added the ability to re-index documents in a Message Search
index using the CLI:

 - per mailbox: the event system is used to track changes made to the
given mailbox, which significantly reduces the window for concurrent changes.
 - all your James mailboxes: the event system is used to keep track
of deleted mailboxes.

## My future work on the event system

Finish the work on MAILBOX-257: one should be able to recalculate quotas.

Unfortunately, it is not yet planned in my todo list...

Benoit



---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: James new event system

Posted by Benoit Tellier <be...@minet.net>.
Hi Robert,

While traveling back from holidays, I took some time to write a quick
blog post on the topic:

http://blog.btellier.com/article/72

If you are interested...

And if you have any questions, don't hesitate to ask me directly.

Benoit





On 02/01/2016 04:06, Robert Munn wrote:
> Benoit,
> 
> This is very interesting work, thank you for contributing. I am interested in learning more about the event system.
> 
> Robert
> 


Re: James new event system

Posted by Robert Munn <ro...@gmail.com>.
Benoit,

This is very interesting work, thank you for contributing. I am interested in learning more about the event system.

Robert

