Posted to server-dev@james.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/05/14 22:20:32 UTC

Questions on the Mail and MailRepository interfaces

Hi,

I'm working on the JCR mail repository for James, and I'd like to
better understand the Mail and MailRepository interfaces.

a) Should implementations of the MailRepository interface be thread-safe?

b) Should the Avalon lifecycle interfaces be used when implementing
MailRepository?

c) What parts of a mail message need to be stored when
MailRepository.store(Mail) is called? For example should the mail name
(Mail.getName()) or attributes (Mail.hasAttributes(), etc.) be stored?

d) Should mailRepository.store(mailRepository.retrieve(...)) update
the existing message or create a new one?

e) Should mailRepository.retrieve(...).setMessage(...) update the mail
repository?

f) Should mailRepository.retrieve(...).getMessage().setFrom(...)
update the mail repository?

g) Should mail.setRecipients(...) affect mail.getMessage().getAllRecipients()?

h) Should mail.getMessage().addRecipients(...) affect mail.getRecipients()?

I'll probably come up with more questions, but I guess these are a
good starting point. :-)

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting wrote:
> Hi,
> 
> On 5/14/07, Jukka Zitting <ju...@gmail.com> wrote:
>> I'll probably come up with more questions, [...]
> 
> Locking:
> 
> * Should a message lock be visible to other processes that access the
> same repository?

Visible?
Well, it depends on what you mean by visible.

SpoolRepository.accept() takes care of returning non-locked objects.
MailRepository.remove() tries to lock the message and fails if someone
else has already locked it.

> * Should a message lock automatically prevent other clients from
> modifying the node or just from acquiring the lock?

IIRC MailRepository does not properly support this scenario. If the
store is a MailRepository then, in real cases, it never happens that
multiple threads concurrently access the same object to change the same
message. If it is a SpoolRepository you are supposed to obtain messages
via the accept() method, and in that case you hold an exclusive lock on
the message.

Everything in those interfaces could be improved with much stricter
contracts.

> * Who can release a message lock? I notice that the Lock utility class
> always binds a lock to the current thread.

The thread that owns the lock can unlock it. Of course, if we plan to
move to SEDA this will need to be changed, or it will break the locking
in our spool repositories.
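The locking rules described in the two answers above can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions: ThreadLockedSpool and its lock(), accept(), remove() and unlock() methods are hypothetical names, not the James SpoolRepository or Lock API. It shows accept() skipping messages that are already held, remove() failing when another thread holds the lock, and unlock() succeeding only for the owning thread, which is exactly the property a SEDA-style handoff between threads would break.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory spool illustrating thread-bound message locks.
// Names are illustrative only; this is not the James API.
class ThreadLockedSpool {
    private final Map<String, String> messages = new LinkedHashMap<>();
    private final Map<String, Thread> owners = new LinkedHashMap<>();

    synchronized void store(String key, String body) {
        messages.put(key, body);
    }

    // Acquires the lock for the calling thread; re-entrant for the owner.
    private boolean lock(String key) {
        Thread owner = owners.get(key);
        if (owner == null) {
            owners.put(key, Thread.currentThread());
            return true;
        }
        return owner == Thread.currentThread();
    }

    // Like accept(): hand out (and lock) the first message nobody holds.
    synchronized String accept() {
        for (String key : messages.keySet()) {
            if (!owners.containsKey(key) && lock(key)) {
                return key;
            }
        }
        return null;
    }

    // Like remove(): fails if some other thread already holds the lock.
    synchronized boolean remove(String key) {
        if (!lock(key)) {
            return false;
        }
        messages.remove(key);
        owners.remove(key);
        return true;
    }

    // Only the owning thread may release the lock.
    synchronized boolean unlock(String key) {
        if (owners.get(key) != Thread.currentThread()) {
            return false;
        }
        owners.remove(key);
        return true;
    }
}
```

Because ownership is recorded as a Thread, a message accepted on one thread and finished on another can never be unlocked by the second thread.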

> * What happens if a client fails to unlock a message?

I think a failure to unlock is not handled at all (is it even possible?).

> Mail lifecycle:
> 
> * Is there a defined lifecycle for Mail instances? I'm thinking about
> associating a JCR session with a Mail instance to allow on-demand
> loading and incremental updates of mail messages, but to properly
> release the session I'd need some way to know when the Mail instance
> is no longer in use.

We always call ContainerUtil.dispose(mailobject): if the Mail
implementation implements Disposable, this calls mailobject.dispose().

In our default Mail implementation (MailImpl) we also take care to call
ContainerUtil.dispose(message). If the message is a
MimeMessageCopyOnWriteProxy this updates the reference counters, and once
nothing references the underlying MimeMessageWrapper any more, the
MimeMessageWrapper is disposed as well.
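A minimal sketch of the disposal chain described above, using hypothetical stand-ins (Disposable, DisposeUtil, SharedSource, RefCountedProxy) rather than the real Avalon ContainerUtil or James classes. A JCR-backed Mail could, for example, keep its session in something like SharedSource and release it only when the last copy referring to it has been disposed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Avalon Disposable interface.
interface Disposable {
    void dispose();
}

// Hypothetical stand-in for ContainerUtil.dispose(): only objects that
// opt in by implementing Disposable are disposed.
final class DisposeUtil {
    private DisposeUtil() {}

    static void dispose(Object o) {
        if (o instanceof Disposable) {
            ((Disposable) o).dispose();
        }
    }
}

// Shared, expensive resource (e.g. a session or a message wrapper).
class SharedSource implements Disposable {
    boolean disposed;

    public void dispose() {
        disposed = true;
    }
}

// Each proxy holds one reference; releasing the last one disposes the source.
class RefCountedProxy implements Disposable {
    private final SharedSource source;
    private final AtomicInteger refs;
    private boolean released;

    RefCountedProxy(SharedSource source) {
        this(source, new AtomicInteger(1));
    }

    private RefCountedProxy(SharedSource source, AtomicInteger refs) {
        this.source = source;
        this.refs = refs;
    }

    // A copy shares the same source and bumps the shared reference count.
    RefCountedProxy copy() {
        refs.incrementAndGet();
        return new RefCountedProxy(source, refs);
    }

    public synchronized void dispose() {
        if (released) {
            return; // disposing the same proxy twice must not double-decrement
        }
        released = true;
        if (refs.decrementAndGet() == 0) {
            DisposeUtil.dispose(source);
        }
    }
}
```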

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/14/07, Jukka Zitting <ju...@gmail.com> wrote:
> I'll probably come up with more questions, [...]

Locking:

* Should a message lock be visible to other processes that access the
same repository?

* Should a message lock automatically prevent other clients from
modifying the node or just from acquiring the lock?

* Who can release a message lock? I notice that the Lock utility class
always binds a lock to the current thread.

* What happens if a client fails to unlock a message?

Mail lifecycle:

* Is there a defined lifecycle for Mail instances? I'm thinking about
associating a JCR session with a Mail instance to allow on-demand
loading and incremental updates of mail messages, but to properly
release the session I'd need some way to know when the Mail instance
is no longer in use.

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin wrote:
> On 5/15/07, Danny Angus <da...@apache.org> wrote:
>> On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
>>
>> > My suggestion is to place the core non-avalon code in the main class and
>> > then write an avalonized wrapper to that class (or an extended version
>> > implementing avalon interfaces).
>>
>> +1 I would be stronger and say that that is a pattern which we would
>> like to apply to all of James, separate POJO's with single
>> responsibilities from wrappers which impose or implement lifecycle.
> 
> i'd like to go even further than that :-)
> 
> IMHO each JAMES subproject should be a pure component containing POJOs
> suitable for reuse with no coupling on JAMES server. a library module
> in the server should adapt the component for use in JAMES. a
> deployment module should tool the component for deployment by
> providing configuration and other services.

Can you define "JAMES subprojects"?

Are you referring to JAMES Server modules or to actual JAMES project
products (like jspf/mime4j)?

In the latter case we already do what you suggest.

If instead you refer to JAMES Server modules then I agree that
theoretically it should be done that way; I'll reserve more complete
comments for concrete proposed solutions, or better still, code patches.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Danny Angus wrote:
> On 5/15/07, robert burrell donkin <ro...@gmail.com> wrote:
> 
>> IMHO each JAMES subproject should be a pure component containing POJOs
>> suitable for reuse with no coupling on JAMES server. a library module
>> in the server should adapt the component for use in JAMES. a
>> deployment module should tool the component for deployment by
>> providing configuration and other services.
> 
> +1 This has been my "vision" for years, I just didn't want to be
> provocative (again!).
> 
> d,

I think there is nothing provocative there: it is probably the only
thing we all agreed upon multiple times ;-) (well, sometimes we even all
agreed on something and then someone complained against the one trying
to do what we agreed upon <--- THIS is provocative! :-P )

The main problem is that people who haven't got their hands into the
code consider this a "minor task", and every time I have to point them
to some old thread where I analyzed what we have to do in practice and
what some of the unresolved problems are (generating serviceable child
objects and so on).

IMO many JAMES Server components are obsolete and should be rewritten
from scratch. The JCR+JMS approach for repositories is certainly a good
step. There is a lot still to do.

As another example, my "controversial" (never applied) Fetchmail
refactoring also went in this direction: I extracted 3 top-level
objects that are easier to refactor into non-avalonized components in
a later step.

Good luck to everyone trying the POJO road. If *he* asks for help I'll
be really glad to collaborate.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/15/07, robert burrell donkin <ro...@gmail.com> wrote:

> IMHO each JAMES subproject should be a pure component containing POJOs
> suitable for reuse with no coupling on JAMES server. a library module
> in the server should adapt the component for use in JAMES. a
> deployment module should tool the component for deployment by
> providing configuration and other services.

+1 This has been my "vision" for years, I just didn't want to be
provocative (again!).

d,



Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/15/07, Danny Angus <da...@apache.org> wrote:
> On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
>
> > My suggestion is to place the core non-avalon code in the main class and
> > then write an avalonized wrapper to that class (or an extended version
> > implementing avalon interfaces).
>
> +1 I would be stronger and say that that is a pattern which we would
> like to apply to all of James, separate POJO's with single
> responsibilities from wrappers which impose or implement lifecycle.

i'd like to go even further than that :-)

IMHO each JAMES subproject should be a pure component containing POJOs
suitable for reuse with no coupling on JAMES server. a library module
in the server should adapt the component for use in JAMES. a
deployment module should tool the component for deployment by
providing configuration and other services.

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:

> My suggestion is to place the core non-avalon code in the main class and
> then write an avalonized wrapper to that class (or an extended version
> implementing avalon interfaces).

+1 I would be stronger and say that that is a pattern which we would
like to apply to all of James, separate POJO's with single
responsibilities from wrappers which impose or implement lifecycle.
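The POJO-plus-wrapper split proposed here might look like this. It is a sketch only: Configurable and Initializable are hypothetical interfaces standing in for the Avalon lifecycle ones, and FileRepositoryCore and AvalonStyleRepository are illustrative names. The core receives everything it needs through its constructor; the wrapper is the only class that knows about lifecycle calls.

```java
import java.util.Map;

// Hypothetical lifecycle interfaces standing in for the Avalon ones.
interface Configurable {
    void configure(Map<String, String> config);
}

interface Initializable {
    void initialize();
}

// Core logic: a plain object with one responsibility and no container coupling.
class FileRepositoryCore {
    private final String root;

    FileRepositoryCore(String root) {
        if (root == null || root.isEmpty()) {
            throw new IllegalArgumentException("root directory is required");
        }
        this.root = root;
    }

    String pathFor(String key) {
        return root + "/" + key;
    }
}

// Container-facing wrapper: maps lifecycle calls onto construction of the core.
class AvalonStyleRepository implements Configurable, Initializable {
    private String root;
    private FileRepositoryCore core;

    public void configure(Map<String, String> config) {
        root = config.get("root");
    }

    public void initialize() {
        core = new FileRepositoryCore(root);
    }

    String pathFor(String key) {
        if (core == null) {
            throw new IllegalStateException("initialize() has not been called");
        }
        return core.pathFor(key);
    }
}
```

The core can then be unit-tested or reused outside JAMES simply by calling its constructor.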

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/15/07, Danny Angus <da...@apache.org> wrote:
> On 5/15/07, Jukka Zitting <ju...@gmail.com> wrote:
> > Is the repository implementation required to keep the original key
> > (from mail.getName()) when storing a new message, or can it replace
> > it with an internal identifier?
>
> Good question. In practice all our current implementations do.
> The benefit is that it is possible to trace a message name in the
> logs, however there is nothing (AFAIK) in the functionality which
> depends upon the name staying the same. Each cycle of activity on a
> Mail object begins with getting the name (or list of names) from the
> repository, and any Store should (but may not) mark the end of a
> cycle.
>
> I guess you have to weigh up the cost of developing the code to keep
> the name against the risk of name changes breaking something.

JCR assigns an internally generated UUID to all "referenceable" nodes.
This would be a perfect message key for a JCR based mail repository,
since it is guaranteed to be unique within a workspace, it enables
very efficient message lookups, and it allows all sorts of other nice
things like hard references, etc.
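A sketch of a repository that hands out its own keys instead of reusing mail.getName(), in the spirit of using the JCR node UUID as the message key. Here java.util.UUID.randomUUID() stands in for the identifier a JCR repository would assign, and UuidKeyedRepository is a hypothetical class; callers must keep the key returned by store().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical repository that ignores the caller's mail name and returns
// its own identifier, as a JCR-backed store might do with node UUIDs.
class UuidKeyedRepository {
    private final Map<String, byte[]> messages = new HashMap<>();

    // Stores a copy of the raw message and returns the repository-assigned key.
    synchronized String store(byte[] rawMessage) {
        String key = UUID.randomUUID().toString();
        messages.put(key, rawMessage.clone());
        return key;
    }

    // Returns a copy of the stored bytes, or null for an unknown key.
    synchronized byte[] retrieve(String key) {
        byte[] data = messages.get(key);
        return data == null ? null : data.clone();
    }
}
```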

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/15/07, Jukka Zitting <ju...@gmail.com> wrote:

> OK. I guess the attributes can be any serializable objects, so the
> implementation should use standard Java serialization in case it
> doesn't know the type of the attribute.

I *think* that we originally intended attributes to be Strings, but
if they aren't strongly typed then I guess Serializable is correct.
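If attribute values are only known to be Serializable, a repository can fall back on standard Java serialization as suggested above. A minimal sketch, assuming the attributes are available as a Map<String, Serializable>; AttributeCodec is a hypothetical helper, not part of the Mailet API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists an attribute map with standard Java
// serialization when the repository does not know the value types.
final class AttributeCodec {
    private AttributeCodec() {}

    static byte[] encode(Map<String, ? extends Serializable> attributes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Copy into a HashMap so the stored form does not depend on the caller's Map class.
            out.writeObject(new HashMap<String, Serializable>(attributes));
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<String, Serializable> decode(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<String, Serializable>) in.readObject();
        }
    }
}
```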

> Is the repository implementation required to keep the original key
> (from mail.getName()) when storing a new message, or can it replace
> it with an internal identifier?

Good question. In practice all our current implementations do.
The benefit is that it is possible to trace a message name in the
logs, however there is nothing (AFAIK) in the functionality which
depends upon the name staying the same. Each cycle of activity on a
Mail object begins with getting the name (or list of names) from the
repository, and any Store should (but may not) mark the end of a
cycle.

I guess you have to weigh up the cost of developing the code to keep
the name against the risk of name changes breaking something.

I would suggest that if this is a significant factor then we could
look in more detail at James and try to design in this variability.

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/16/07, robert burrell donkin <ro...@gmail.com> wrote:

> (i came up against more difficulties when trying to implement
> OpenPGP/MIME but i can't recall them in detail right now)

It's a royal pain in the ass to figure out. Was that one?

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Norman Maurer <no...@apache.org>.
robert burrell donkin wrote:
> On 5/16/07, Norman Maurer <no...@apache.org> wrote:
>> robert burrell donkin wrote:
>> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
>> >> robert burrell donkin wrote:
>> >> > On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
>> >
>> > <snip>
>> >
>> >> > 2 MimeMessage does not play well with nio
>> >>
>> >> I agree, but: does mime4j help with this? I spent very little time on
>> >> it, and many months ago, but if I remember correctly mime4j also gave
>> >> me problems when I approached SEDA.
>> >
>> > not sure whether mime4j helps directly
>> >
>> > a richer writing API would be useful for server side code. with a
>> > better variety of operations (for example, writing out all headers or
>> > writing out a particular mime part) and a better variety of outputs
>> > (for example, buffers or channels as well as streams).
>>
>> Writing all headers is already possible, and writing body parts too. I
>> introduced these methods some months ago. But I think mime4j is not
>> really nio friendly yet :-/
>
> but this could be added (unlike javamail)
>
> - robert

For sure.. Help is welcome ;-)

bye
Norman





Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin wrote:
>> > a richer writing API would be useful for server side code. with a
>> > better variety of operations (for example, writing out all headers or
>> > writing out a particular mime part) and a better variety of outputs
>> > (for example, buffers or channels as well as streams).
>>
>> Writing all headers is already possible, and writing body parts too. I
>> introduced these methods some months ago. But I think mime4j is not
>> really nio friendly yet :-/
> 
> but this could be added (unlike javamail)
> 
> - robert

True.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Norman Maurer <no...@apache.org> wrote:
> robert burrell donkin wrote:
> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> >> robert burrell donkin wrote:
> >> > On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> >
> > <snip>
> >
> >> > 2 MimeMessage does not play well with nio
> >>
> >> I agree, but: does mime4j help with this? I spent very little time on
> >> it, and many months ago, but if I remember correctly mime4j also gave
> >> me problems when I approached SEDA.
> >
> > not sure whether mime4j helps directly
> >
> > a richer writing API would be useful for server side code. with a
> > better variety of operations (for example, writing out all headers or
> > writing out a particular mime part) and a better variety of outputs
> > (for example, buffers or channels as well as streams).
>
> Writing all headers is already possible, and writing body parts too. I
> introduced these methods some months ago. But I think mime4j is not
> really nio friendly yet :-/

but this could be added (unlike javamail)

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Norman Maurer <no...@apache.org>.
robert burrell donkin wrote:
> On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
>> robert burrell donkin wrote:
>> > On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
>
> <snip>
>
>> > 2 MimeMessage does not play well with nio
>>
>> I agree, but: does mime4j help with this? I spent very little time on
>> it, and many months ago, but if I remember correctly mime4j also gave
>> me problems when I approached SEDA.
>
> not sure whether mime4j helps directly
>
> a richer writing API would be useful for server side code. with a
> better variety of operations (for example, writing out all headers or
> writing out a particular mime part) and a better variety of outputs
> (for example, buffers or channels as well as streams).

Writing all headers is already possible, and writing body parts too. I
introduced these methods some months ago. But I think mime4j is not
really nio friendly yet :-/


>
>> > 3 the automatic type conversions are unhelpful
>>
>> This is true for the mail server, not true (IMHO) for the mailet
>> specification: mailet authors probably have an easier life with
>> automatically converted MIME objects.
>
> depends on the mailet. being given the choice of raw or
> processed data would be great.

+1

>
>> > 4 automatic encoding conversions are unhelpful
>>
>> Same considerations as for 3.
>
> +1
>
> - robert

+1

Norman





Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> robert burrell donkin wrote:
> > On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:

<snip>

> > 2 MimeMessage does not play well with nio
>
> I agree, but: does mime4j help with this? I spent very little time on
> it, and many months ago, but if I remember correctly mime4j also gave
> me problems when I approached SEDA.

not sure whether mime4j helps directly

a richer writing API would be useful for server side code. with a
better variety of operations (for example, writing out all headers or
writing out a particular mime part) and a better variety of outputs
(for example, buffers or channels as well as streams).
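A sketch of the kind of channel-oriented writing operation described here, assuming the raw header lines are already available as strings. HeaderChannelWriter is a hypothetical helper; it writes the header block through a java.nio WritableByteChannel rather than an OutputStream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical writer: emits raw header lines to any WritableByteChannel
// (socket, file, or in-memory) instead of a java.io stream.
final class HeaderChannelWriter {
    private HeaderChannelWriter() {}

    // Writes each header line plus CRLF, then the blank line ending the header block.
    static void writeHeaders(List<String> rawHeaderLines, WritableByteChannel channel) throws IOException {
        StringBuilder block = new StringBuilder();
        for (String line : rawHeaderLines) {
            block.append(line).append("\r\n");
        }
        block.append("\r\n");
        // ISO-8859-1 maps chars 0-255 back to single bytes, so 8-bit header bytes survive.
        ByteBuffer buffer = ByteBuffer.wrap(block.toString().getBytes(StandardCharsets.ISO_8859_1));
        while (buffer.hasRemaining()) {
            channel.write(buffer); // a channel may accept only part of the buffer per call
        }
    }
}
```

A non-blocking server would register for OP_WRITE instead of looping until the buffer drains; the loop here keeps the sketch short.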

> > 3 the automatic type conversions are unhelpful
>
> This is true for the mail server, not true (IMHO) for the mailet
> specification: mailet authors probably have an easier life with
> automatically converted MIME objects.

depends on the mailet. being given the choice of raw or
processed data would be great.

> > 4 automatic encoding conversions are unhelpful
>
> Same considerations as for 3.

+1

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin wrote:
> On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
>> robert burrell donkin wrote:
>> > i've come to the conclusion that MimeMessage really isn't suitable for
>> > use in a protocol like IMAP. IMHO a different server side mail
>> > abstraction is needed.
>>
>> I don't necessarily disagree with this, but I never found a real technical
>> motivation for this. Can you list yours?
> 
> here are some off the top of my head:
> 
> 1 a number of methods in MimeMessage may return bad data values rather
> than throwing exceptions

I agree.

> 2 MimeMessage does not play well with nio

I agree, but: does mime4j help with this? I spent very little time on
it, and many months ago, but if I remember correctly mime4j also gave
me problems when I approached SEDA.

> 3 the automatic type conversions are unhelpful

This is true for the mail server, not true (IMHO) for the mailet
specification: mailet authors probably have an easier life with
automatically converted MIME objects.

> 4 automatic encoding conversions are unhelpful

Same considerations as for 3.

> 5 MimeMessage has a tendency to throw unhelpful runtime exceptions

I never found (or don't remember) this specific issue with MimeMessage,
but I admit this happens often in javamail classes.

> (i came up against more difficulties when trying to implement
> OpenPGP/MIME but i can't recall them in detail right now)
> 
> - robert

Thank you, really.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> robert burrell donkin wrote:
> > i've come to the conclusion that MimeMessage really isn't suitable for
> > use in a protocol like IMAP. IMHO a different server side mail
> > abstraction is needed.
>
> I don't necessarily disagree with this, but I never found a real technical
> motivation for this. Can you list yours?

here are some off the top of my head:

1 a number of methods in MimeMessage may return bad data values rather
than throwing exceptions
2 MimeMessage does not play well with nio
3 the automatic type conversions are unhelpful
4 automatic encoding conversions are unhelpful
5 MimeMessage has a tendency to throw unhelpful runtime exceptions

(i came up against more difficulties when trying to implement
OpenPGP/MIME but i can't recall them in detail right now)

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin wrote:
> i've come to the conclusion that MimeMessage really isn't suitable for
> use in a protocol like IMAP. IMHO a different server side mail
> abstraction is needed.

I don't necessarily disagree with this, but I never found a real technical
motivation for this. Can you list yours?

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> Stefano Bagnara wrote:
> > Jukka Zitting wrote:
> >> I'd be interested in pursuing an approach where the full message
> >> never exists outside the repository, i.e. the message is streamed
> >> directly into the repository during an SMTP DATA command (or the
> >> equivalent in other protocols) and only the message key returned by
> >> the repository is used to keep track of the message.
> >
> > MimeMessageSource is there for this precise task.
> >
> > Look at implementations of that interface.
> >
> > You create a MimeMessageWrapper (extends MimeMessage) initializing it
> > with a MimeMessageSource (you will have to create a JCRMimeMessageSource).
> >
> > Stefano
>
> I probably should add that MimeMessageCopyOnWriteProxy and
> MimeMessageWrapper are there to optimize the usage of MimeMessages in
> the JAMES Server processing. The copy-on-write proxy simply shares a
> MimeMessageSource among multiple copies until one of them is changed,
> and the wrapper is able to avoid loading the body if it is not needed.

i've come to the conclusion that MimeMessage really isn't suitable for
use in a protocol like IMAP. IMHO a different server side mail
abstraction is needed.

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Stefano Bagnara wrote:
> Jukka Zitting wrote:
>> I'd be interested in pursuing an approach where the full message
>> never exists outside the repository, i.e. the message is streamed
>> directly into the repository during an SMTP DATA command (or the
>> equivalent in other protocols) and only the message key returned by
>> the repository is used to keep track of the message.
> 
> MimeMessageSource is there for this precise task.
> 
> Look at implementations of that interface.
> 
> You create a MimeMessageWrapper (extends MimeMessage) initializing it
> with a MimeMessageSource (you will have to create a JCRMimeMessageSource).
> 
> Stefano

I probably should add that MimeMessageCopyOnWriteProxy and
MimeMessageWrapper are there to optimize the usage of MimeMessages in
the JAMES Server processing. The copy-on-write proxy simply shares a
MimeMessageSource among multiple copies until one of them is changed,
and the wrapper is able to avoid loading the body if it is not needed.
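The copy-on-write idea described above can be sketched with plain byte arrays. CowMessage is a hypothetical class, not MimeMessageCopyOnWriteProxy: copies share one array until one of them is modified, and only the modified copy pays for duplicating the data. The sketch is not thread-safe.

```java
import java.util.Arrays;

// Hypothetical copy-on-write holder: copies share one array until a write.
class CowMessage {
    private byte[] content;     // may be shared with other copies
    private boolean exclusive;  // true once this copy owns a private array

    CowMessage(byte[] content) {
        this.content = content;
        this.exclusive = false; // never modify the caller's array in place
    }

    // Cheap copy: both objects point at the same array; nothing is duplicated.
    CowMessage copy() {
        exclusive = false;      // our array is now shared again
        return new CowMessage(content);
    }

    byte get(int index) {
        return content[index];
    }

    // The first write duplicates the shared data; later writes reuse the private copy.
    void set(int index, byte value) {
        if (!exclusive) {
            content = Arrays.copyOf(content, content.length);
            exclusive = true;
        }
        content[index] = value;
    }

    boolean sharesDataWith(CowMessage other) {
        return content == other.content;
    }
}
```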

Maybe reading the AvalonMailRepository source code will help you
understand how this is managed for file-based mail repositories.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting wrote:
> I'd be interested in pursuing an approach where the full message
> never exists outside the repository, i.e. the message is streamed
> directly into the repository during an SMTP DATA command (or the
> equivalent in other protocols) and only the message key returned by
> the repository is used to keep track of the message.

MimeMessageSource is there for this precise task.

Look at implementations of that interface.

You create a MimeMessageWrapper (extends MimeMessage) initializing it
with a MimeMessageSource (you will have to create a JCRMimeMessageSource).
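A sketch of a repository-backed message source, loosely following the role described here for MimeMessageSource: it knows how to reopen the message bytes on demand but holds no content itself. MessageSource and KeyedMessageSource are hypothetical names and the real James MimeMessageSource methods may differ; a JCR version would read the message node's binary property instead of the in-memory map used here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical source abstraction: an identifier plus a way to reopen the bytes.
abstract class MessageSource {
    abstract String getSourceId();

    abstract InputStream getInputStream() throws IOException;
}

// Hypothetical repository-backed source: only the key is kept in memory.
class KeyedMessageSource extends MessageSource {
    private final Map<String, byte[]> backingStore;
    private final String key;

    KeyedMessageSource(Map<String, byte[]> backingStore, String key) {
        this.backingStore = backingStore;
        this.key = key;
    }

    String getSourceId() {
        return "repo:" + key;
    }

    // Opens a fresh stream on every call, so the message can be re-read lazily.
    InputStream getInputStream() throws IOException {
        byte[] data = backingStore.get(key);
        if (data == null) {
            throw new IOException("no message stored under " + key);
        }
        return new ByteArrayInputStream(data);
    }
}
```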

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/15/07, robert burrell donkin <ro...@gmail.com> wrote:

> > I'd be interested in pursuing an approach where the full message
> > never exists outside the repository, i.e. the message is streamed
> > directly into the repository during an SMTP DATA command (or the
> > equivalent in other protocols) and only the message key returned by
> > the repository is used to keep track of the message.
>
> +1
>

+1 again. If this approach means that the message name is set once
(and generated by Jackrabbit rather than James) then I think we can be
comfortable that nothing will break.



Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> robert burrell donkin wrote:
> >> > The MOST IMPORTANT thing of all is that if I store a message and
> >> > later retrieve it, every single space, every single header,
> >> > everything is exactly as I wrote it. Even if it was malformed.
> >>
> >> Is this a hard requirement? If yes, then I could just model the entire
> >> mime message as a normal nt:resource node, in which case the JCR
> >> repository would act just like an advanced file system with
> >> transactions and some search features.
> >
> > IMAP is *VERY* sensitive about malformed messages: a MIME message
> > *MUST* be well formed. it's all too easy to crash modern IMAP clients
> > with malformed emails.
>
> I know this, but we have to make sure we know what the original mail
> looked like, because we need this for SMTP relaying.
> Then we may want to do any operation for IMAP, but this must not be a
> limitation of the JCR repository (IMO).

exploding the representation seems like the way to go

the initial (top level) representation would include very basic audit
data plus the raw mail

later processes could then explode the representation: parsing the raw
data and adding new nodes containing the results. this process could
be controlled with a fine granularity: some processes may just explode
raw headers from the raw blob.
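One explode step from this proposal might look like the following sketch, which leaves the raw bytes untouched and derives unfolded header lines plus the offset where the body starts. HeaderExploder is a hypothetical helper; it assumes CRLF line endings and does no MIME parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical "explode" step over a raw message blob.
final class HeaderExploder {
    private HeaderExploder() {}

    // Unfolded header lines and the byte offset where the body begins.
    static final class Result {
        final List<String> headers;
        final int bodyOffset; // -1 when no blank line ended the header block

        Result(List<String> headers, int bodyOffset) {
            this.headers = headers;
            this.bodyOffset = bodyOffset;
        }
    }

    static Result explode(byte[] raw) {
        // ISO-8859-1 maps each byte to one char, so char offsets equal byte offsets.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        List<String> headers = new ArrayList<>();
        int pos = 0;
        while (true) {
            int eol = text.indexOf("\r\n", pos);
            if (eol < 0) {
                return new Result(headers, -1); // incomplete header block
            }
            String line = text.substring(pos, eol);
            pos = eol + 2;
            if (line.isEmpty()) {
                return new Result(headers, pos); // blank line separates headers from body
            }
            boolean folded = line.startsWith(" ") || line.startsWith("\t");
            if (folded && !headers.isEmpty()) {
                // Unfold: drop the CRLF, keep the leading whitespace.
                int last = headers.size() - 1;
                headers.set(last, headers.get(last) + line);
            } else {
                headers.add(line);
            }
        }
    }
}
```

The result could be stored as extra nodes next to the raw blob, so relaying keeps using the untouched original.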

> > one approach would be to take advantage of the typing available in
> > JCRs to help the server understand mail. malformed MIME could
> > gracefully degrade to RFC822 and malformed RFC822 to a general mail
> > type.
>
> I think we should also be sure we don't lose time parsing a message if
> we don't need to parse it. If I use JAMES as a relay-only mail server I
> don't want to waste resources by parsing every message.

but IMAP will be too slow if it has to parse the message every time

again, i think exploding will satisfy both needs
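The graceful-degradation typing quoted above (MIME, then RFC822, then a general mail type) could start from a classifier like this one. MailType and MailTypeClassifier are hypothetical, and the checks are deliberately shallow heuristics over unfolded header lines, not full RFC 822 or MIME validation.

```java
import java.util.List;

// Hypothetical mail types, from richest to most opaque.
enum MailType { MIME, RFC822, GENERAL }

final class MailTypeClassifier {
    private MailTypeClassifier() {}

    // headerLines are unfolded header lines taken from the raw message.
    static MailType classify(List<String> headerLines) {
        if (headerLines.isEmpty()) {
            return MailType.GENERAL;
        }
        boolean hasMimeVersion = false;
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                return MailType.GENERAL; // not a "name: value" field at all
            }
            String name = line.substring(0, colon).trim();
            if (name.isEmpty() || name.chars().anyMatch(Character::isWhitespace)) {
                return MailType.GENERAL; // field names may not contain whitespace
            }
            if (name.equalsIgnoreCase("MIME-Version")) {
                hasMimeVersion = true;
            }
        }
        return hasMimeVersion ? MailType.MIME : MailType.RFC822;
    }
}
```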

<snip>

> >> > To achieve performance we'll probably have to avoid parsing the MIME
> >> > structure at all: we don't need this for most SMTP/POP3 operations. Some
> >> > IMAP operations need this, but this should probably be done on demand and
> >> > not when writing the message to the repository.
> >>
> >> One possible approach, at the expense of storing potentially redundant
> >> duplicate data, is that the original message source is stored as a
> >> verbatim binary stream and the message content is automatically
> >> "exploded" the first time a client actually needs to parse the
> >> message.
> >
> > i really like this idea :-)
> >
> > with SEDA there is no need to wait for the first client: exploding
> > would be just another task to execute
> >
> > IMAP mail is written rarely and read regularly. unless MIME messages are
> > parsed and stored as separate parts, performance will be very poor in
> > normal operation.
>
> Once the message is in an IMAP folder imho there is no problem in
> translating it from a raw message to a structured/parsed one.

with a JCR there should be less need for actual message transfer. once
the message has been entered into permanent storage, there is usually
no need for the node to be routinely copied.

the extra data can be exploded on demand

> The important fact is that in SMTP relay we need to store messages and
> to retrieve them as simple streams: a 1GB message should be streamed
> for reading and writing without using more than 1MB of memory and without having
> to parse it.

+1

but i think that exploding will work for both cases

> > but again, the key is to be able to access the original, raw data when
> > needed
>
> Is this also important when the message is already in an IMAP repository
> and we don't need to relay it to another SMTP server? Or would a
> reconstructed message suffice in this case?

it's important in the case of malformed mail

one of the major issues i have with IMAP-fetchmail ATM is that if the
messages are malformed then the storage really mangles them and tends
to crash clients. i don't want to lose mail. i also want to be able to
debug cases where the parsing fails.

i think that exploding should satisfy both use cases

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin ha scritto:
>> > The MOST IMPORTANT thing of all is that if I store a message and later
>> > retrieve it, every single space, every single header, everything is
>> > exactly as I wrote it. Even if it was malformed.
>>
>> Is this a hard requirement? If yes, then I could just model the entire
>> mime message as a normal nt:resource node, in which case the JCR
>> repository would act just like an advanced file system with
>> transactions and some search features.
> 
> IMAP is *VERY* sensitive about malformed messages: a MIME message
> *MUST* be well formed. it's all too easy to crash modern IMAP clients
> with malformed emails.

I know this, but we have to make sure we know what the original mail
looked like, because we need this for SMTP relaying.
Then we may want to do any operation for IMAP, but this must not become a
limitation of the JCR repository (IMO).

> one approach would be to take advantage of the typing available in
> JCRs to help the server understand mail. malformed MIME could
> gracefully degrade to RFC822 and malformed RFC822 to a general mail
> type.

I think we should also be sure we don't lose time parsing a message if
we don't need to parse it. If I use JAMES as a relay-only mail server I
don't want to waste resources by parsing every message.

>> Personally I don't see the exact storage requirement as essential, as
>> the mail specs explicitly allow all sorts of intermediate nodes to
>> perform various types of reformattings on messages while in transit.
>> Things should be fine as long as the original intended content is
>> preserved.
> 
> it's important to be able to get the raw as well as the processed

+1

> it's good being able to have smooth access to a parsed set of
> addresses but the raw header also needs to be preserved

+1

>> > To achieve performance we'll probably have to avoid parsing the MIME
>> > structure at all: we don't need this for most SMTP/POP3 operations. Some
>> > IMAP operations need this, but this should probably be done on demand and
>> > not when writing the message to the repository.
>>
>> One possible approach, at the expense of storing potentially redundant
>> duplicate data, is that the original message source is stored as a
>> verbatim binary stream and the message content is automatically
>> "exploded" the first time a client actually needs to parse the
>> message.
> 
> i really like this idea :-)
> 
> with SEDA there is no need to wait for the first client: exploding
> would be just another task to execute
> 
> IMAP mail is written rarely and read regularly. unless MIME messages are
> parsed and stored as separate parts, performance will be very poor in
> normal operation.

Once the message is in an IMAP folder imho there is no problem in
translating it from a raw message to a structured/parsed one.

The important fact is that in SMTP relay we need to store messages and
to retrieve them as simple streams: a 1GB message should be streamed
for reading and writing without using more than 1MB of memory and without having
to parse it.
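This is the constraint any JCR-backed store would have to meet for relaying. A minimal sketch, assuming plain java.io streams at both ends (a JCR binary property or a spool file would supply them in practice): the copy reuses one fixed buffer, so memory use stays constant whatever the message size. The class name and the 64 KiB buffer size are illustrative choices; on Java 9+ `InputStream.transferTo(OutputStream)` does the same with an internal buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a message stream without ever holding it fully in memory. */
final class BoundedCopy {
    // Assumed size, well under the 1MB budget discussed above.
    static final int BUFFER_SIZE = 64 * 1024;

    private BoundedCopy() {}

    /** Copies in to out in fixed-size chunks; memory use does not depend on message size. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

No parsing happens anywhere in the copy: the bytes go out exactly as they came in.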

> but again, the key is to be able to access the original, raw data when
> needed

Is this also important when the message is already in an IMAP repository
and we don't need to relay it to another SMTP server? Or would a
reconstructed message suffice in this case?

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:
> As I suggested in another email I think the best solution is to have
> support for both types of objects in the repository and leave it to the
> application to convert from raw to parsed and back if needed.

I would rather make this decision right when the message enters the
system to avoid extra complexity and unnecessary processing steps.

We should probably keep the API generic enough that you can use both
types of mail repositories interoperably if you want, but it would be
up to the administrator to select whether the deployed server is a
relay or an originator (in which case there is no original source),
gateway (for example when modifying messages using mailets), or
delivery (IMAP/webmail, perhaps even POP) system, and to choose the
appropriate repository implementation for the intended purpose.
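In its simplest form, the administrator's choice could be a fixed mapping from deployment role to repository flavour. A sketch with hypothetical names; the mapping itself (raw storage for relays, parsed storage for the other roles) is an assumption drawn from the relay-versus-processing split discussed in this thread, not a decided design.

```java
import java.util.EnumMap;
import java.util.Map;

/** The deployment roles listed in the paragraph above. */
enum DeploymentRole { RELAY, ORIGINATOR, GATEWAY, DELIVERY }

/** Hypothetical names for the two repository flavours discussed in the thread. */
enum StorageMode { RAW_STREAM, PARSED_TREE }

final class RepositoryChooser {
    private static final Map<DeploymentRole, StorageMode> DEFAULTS =
            new EnumMap<>(DeploymentRole.class);

    static {
        DEFAULTS.put(DeploymentRole.RELAY, StorageMode.RAW_STREAM);       // pass original bytes on untouched
        DEFAULTS.put(DeploymentRole.ORIGINATOR, StorageMode.PARSED_TREE); // no original source exists
        DEFAULTS.put(DeploymentRole.GATEWAY, StorageMode.PARSED_TREE);    // mailets modify messages
        DEFAULTS.put(DeploymentRole.DELIVERY, StorageMode.PARSED_TREE);   // IMAP/webmail need structure
    }

    private RepositoryChooser() {}

    static StorageMode choose(DeploymentRole role) {
        return DEFAULTS.get(role);
    }
}
```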

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Danny Angus ha scritto:
> On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:
> 
>> [...] scenarios.
>> Otherwise it will fit a very limited usecase and we'll have to skip the
>> "let's use JCR as THE interface" proposed by Noel.
> 
> I'm not convinced about that: we have two interfaces because we have
> two distinct responsibilities and two different use cases (spool and
> mail)
> 
> I think that we really do need to find out what the performance is
> actually like before we start trying to "fix" imaginary performance
> issues for uncommon deployment scenarios.

Well, I can only tell you what happened to me.
We also have Postage if you want to stress-test some scenarios.

Again, I don't share your definition of "uncommon deployment scenarios",
but I think this is not important.

IMHO the important thing is to have new stuff: if it is architecturally
wrong or limited to low-traffic servers, we can still start a different
backend (or even a different server altogether) when those who have the
problem also have the time to work on it :-)

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:

> [...] scenarios.
> Otherwise it will fit a very limited usecase and we'll have to skip the
> "let's use JCR as THE interface" proposed by Noel.


I'm not convinced about that: we have two interfaces because we have
two distinct responsibilities and two different use cases (spool and
mail)

I think that we really do need to find out what the performance is
actually like before we start trying to "fix" imaginary performance
issues for uncommon deployment scenarios.

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Danny Angus ha scritto:
> On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:
> 
>> My JAMES Server deployment with the highest throughput never parses
>> MIME messages.
>>
>> I don't use any mailet using properties from the MimeMessage object, and
>> SMTP+POP3+RemoteDelivery simply works without parsing.
>>
>> Maybe I could not deliver 1 million mails per day if I had mime parsing
>> somewhere...
> 
> Seems to me we need to get some benchmarks on the hit parsing adds,
> I'm not convinced it needs to be so significant.
> 
> d.

I worked on lazy-loading (and no-loading at all) optimizations for
MimeMessageWrapper (and its CoW) mainly because I was not able to reach
that throughput before.

In the specific case of JAMES Server 2.x, with JavaMail parsing messages
in memory, I also often hit OutOfMemory errors before that change.

In the case of outgoing mail (mail generated by the server, submitted
via SMTP from authorized networks/users, forwarded because we are a
secondary host or because we simply relay for some domain) even the
"simple storage" is performance-critical at high throughput: that's
why I also investigated the use of ActiveMQ's Kaha Persistence engine
for faster (lightweight) operations in the case of many small, short-lived
messages.

Btw I don't want to take this discussion too far. I just wanted
to tell Jukka that if the JCR is able to support simple streams with low
overhead, with good use of streams (never loading full contents into
memory) and maybe good support for thousands of slow connections in a
SEDA architecture, imho it will be much more usable in many scenarios.
Otherwise it will fit a very limited usecase and we'll have to skip the
"let's use JCR as THE interface" proposed by Noel.

Either way, I'm happy to see some new code and new people working on it.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:

> My JAMES Server deployment with the highest throughput never parses
> MIME messages.
>
> I don't use any mailet using properties from the MimeMessage object, and
> SMTP+POP3+RemoteDelivery simply works without parsing.
>
> Maybe I could not deliver 1 million mails per day if I had mime parsing
> somewhere...

Seems to me we need to get some benchmarks on the hit parsing adds,
I'm not convinced it needs to be so significant.

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Danny Angus ha scritto:
> I think I liked the idea that we could create the stream by taking
> streams to each part and successively attaching them to the output
> stream. I don't see any benefit from keeping the raw input once we've
> managed to get it into the repository.
> 
> I can see no case where having the raw stream or the raw bytes gives
> us a tangible benefit, apart from one where we choose not to use any
> of JAMES's functionality beyond SMTP in and remote delivery. How
> likely is this?
> 
> d.

My JAMES Server deployment with the highest throughput never parses
MIME messages.

I don't use any mailet using properties from the MimeMessage object, and
SMTP+POP3+RemoteDelivery simply works without parsing.

Maybe I could not deliver 1 million mails per day if I had mime parsing
somewhere...

Apart from my specific use case (which I think is not so uncommon) I think we
should at least support SMTP relaying in an RFC-compliant way. I linked
a lot of paragraphs with "MUST NOT" about parsing the content in a
previous message...

Once you decide to alter the message using a mailet it is perfectly ok
to save a structured version, as the message will not be like the
original anymore and it does not make sense to keep the "original".

As I suggested in another email I think the best solution is to have
support for both types of objects in the repository and leave it to the
application to convert from raw to parsed and back if needed.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
I think I liked the idea that we could create the stream by taking
streams to each part and successively attaching them to the output
stream. I don't see any benefit from keeping the raw input once we've
managed to get it into the repository.

I can see no case where having the raw stream or the raw bytes gives
us a tangible benefit, apart from one where we choose not to use any
of JAMES's functionality beyond SMTP in and remote delivery. How
likely is this?

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Maybe we can do it this way:

We can have processors that work on raw streams and processors that
work on structured mail.

The input is a raw stream, and we keep the stream until some processor
requires parsed data access. When this happens we convert the raw message
to a parsed message.

If a processor requires a stream from a parsed message then the stream is
regenerated from the structure.

Maybe we simply have to support both raw-stream and parsed-message
objects in the JCR and then have tools to convert from one to another on
top of this JCR.

If the application wants to keep both representations it could store 2
different objects and manually manage the relationship (and their
synchronization).
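The proposal above can be sketched as a holder that keeps exactly one representation at a time: raw-stream processors read bytes, structured processors read headers, and conversion happens only when the kind of access changes. Hypothetical names; header parsing is deliberately naive.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical holder for one mail, kept in a single representation at a time:
 * raw bytes until a structured processor asks for headers, parsed afterwards.
 * Simplified: folded and repeated headers (e.g. several Received lines) are not handled.
 */
final class ConvertibleMessage {
    private byte[] raw;                  // current raw form, or null if it must be regenerated
    private Map<String, String> headers; // parsed form, or null while still raw
    private String body = "";

    ConvertibleMessage(byte[] raw) {
        this.raw = raw.clone();
    }

    boolean isParsed() {
        return headers != null;
    }

    /** Used by raw-stream processors; regenerates the stream from the structure if needed. */
    byte[] rawBytes() {
        if (raw == null) {
            StringBuilder sb = new StringBuilder();
            headers.forEach((k, v) -> sb.append(k).append(": ").append(v).append("\r\n"));
            sb.append("\r\n").append(body);
            raw = sb.toString().getBytes(StandardCharsets.ISO_8859_1);
        }
        return raw.clone();
    }

    /** Used by structured processors; parses on first access. */
    Map<String, String> headers() {
        if (headers == null) {
            String text = new String(raw, StandardCharsets.ISO_8859_1);
            int split = text.indexOf("\r\n\r\n");
            String head = split < 0 ? text : text.substring(0, split);
            body = split < 0 ? "" : text.substring(split + 4);
            headers = new LinkedHashMap<>();
            for (String line : head.split("\r\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon), line.substring(colon + 1).trim());
                }
            }
        }
        // The caller may modify the returned map, so any cached raw form is now stale.
        raw = null;
        return headers;
    }
}
```

With one representation at a time there is nothing to keep in sync; the cost is that once a structured processor has asked for headers, the original bytes are regenerated rather than preserved.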

WDYT?
Stefano

Jukka Zitting ha scritto:
> Hi,
> 
> On 5/17/07, robert burrell donkin <ro...@gmail.com> wrote:
>> On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
>> > One possible approach, at the expense of storing potentially redundant
>> > duplicate data, is that the original message source is stored as a
>> > verbatim binary stream and the message content is automatically
>> > "exploded" the first time a client actually needs to parse the
>> > message.
>>
>> i really like this idea :-)
> 
> There's one major caveat with this approach: redundant information and
> the performance cost of maintaining that.
> 
> Maintaining updates in the raw message stream is in some (many?) cases
> much more expensive than in a fully parsed representation. Consider
> for example a mailet that wants to modify a subject line or add a
> footer to all messages. Such operations would require that we update
> the original message content as well as the individual header property
> or body part in question. Updating the raw message source can in such
> cases easily take an order of magnitude more time than updating the
> parsed representation.
> 
> Note that I believe that it is possible to parse an incoming message
> into a JCR node tree and recreate it back into a byte stream in the
> same O(n) time and O(1) memory as is required to stream the raw
> message source to a traditional spool file.
> 
> Perhaps we should have two modes for the JCR mail repository
> implementation: one for pure relaying and one for more complex
> processing. The former satisfies the relaying requirements of the SMTP
> spec, while the latter is optimized for message transformations and
> complex access patterns like in IMAP or webmail clients.
> 
> BR,
> 
> Jukka Zitting





Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin ha scritto:
> i didn't mean that intermediary spooling would be the only way but an
> architecture that could support it would be worthwhile. being able to
> use an intermediary spool file enables some designs which would not be
> otherwise possible. for example, splitting the processing between two
> instances. this would allow the email parsing and processing to be
> done as non-root.

FWIW, in the current trunk JAMES Server supports commons-daemon for dropping
privileges after ports have been bound.

Stefano




Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/17/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 5/17/07, robert burrell donkin <ro...@gmail.com> wrote:
> > On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> > > One possible approach, at the expense of storing potentially redundant
> > > duplicate data, is that the original message source is stored as a
> > > verbatim binary stream and the message content is automatically
> > > "exploded" the first time a client actually needs to parse the
> > > message.
> >
> > i really like this idea :-)
>
> There's one major caveat with this approach: redundant information and
> the performance cost of maintaining that.

yes

> Maintaining updates in the raw message stream is in some (many?) cases
> much more expensive than in a fully parsed representation. Consider
> for example a mailet that wants to modify a subject line or add a
> footer to all messages. Such operations would require that we update
> the original message content as well as the individual header property
> or body part in question. Updating the raw message source can in such
> cases easily take an order of magnitude more time than updating the
> parsed representation.

this cost is only required if we choose to update the original

> Note that I believe that it is possible to parse an incoming message
> into a JCR node tree and recreate it back into a byte stream in the
> same O(n) time and O(1) memory as is required to stream the raw
> message source to a traditional spool file.

i suspect that nio -> file will be quicker but let's save this
argument and let the numbers decide

i didn't mean that intermediary spooling would be the only way but an
architecture that could support it would be worthwhile. being able to
use an intermediary spool file enables some designs which would not be
otherwise possible. for example, splitting the processing between two
instances. this would allow the email parsing and processing to be
done as non-root.

> Perhaps we should have two modes for the JCR mail repository
> implementation: one for pure relaying and one for more complex
> processing. The former satisfies the relaying requirements of the SMTP
> spec, while the latter is optimized for message transformations and
> complex access patterns like in IMAP or webmail clients.

i don't think that two modes are necessary and it would be good if
this could be avoided. there is a danger that JAMES is drifting
towards becoming just a collection of unrelated protocol implementations
unless the data set is held together.

there is a case for retaining the original raw contents of a mail even
for rich patterns. this would allow better auditing and error
recovery.

exploding would work well when coupled with a changed flag. the
original message would be retained unaltered whatever the processing.
on demand, the original could be parsed and stored in a rich
representation. if the original cannot be parsed then the mail would
be marked.

if the mail is altered then a flag would be marked and the mail would
be reconstructed on demand from the rich representation. if the
message has not been transformed then the raw original can be used
directly.
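The changed-flag scheme could be sketched like this: the original bytes are never modified, the exploded form is built on demand, a parse failure marks the mail instead of losing it, and the stored bytes are served verbatim unless something has changed. Hypothetical names; the parse checks are crude stand-ins for a real RFC822/MIME parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stored mail following the "changed flag" idea; the original is never altered. */
final class FlaggedMail {
    private final byte[] original;
    private Map<String, String> rich;   // exploded form, built on demand
    private String body = "";
    private boolean changed;            // set by any modification
    private boolean unparseable;        // set if exploding fails; the original is still kept

    FlaggedMail(byte[] original) {
        this.original = original.clone();
    }

    /** Explodes the original once; crude checks stand in for real parsing. */
    private void explode() {
        if (rich != null || unparseable) {
            return;
        }
        String text = new String(original, StandardCharsets.ISO_8859_1);
        int split = text.indexOf("\r\n\r\n");
        if (split < 0) {
            unparseable = true;
            return;
        }
        Map<String, String> parsed = new LinkedHashMap<>();
        String last = null;
        for (String line : text.substring(0, split).split("\r\n")) {
            if (last != null && (line.startsWith(" ") || line.startsWith("\t"))) {
                parsed.put(last, parsed.get(last) + " " + line.trim()); // folded header
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                unparseable = true;
                return;
            }
            last = line.substring(0, colon);
            parsed.put(last, line.substring(colon + 1).trim());
        }
        rich = parsed;
        body = text.substring(split + 4);
    }

    boolean isUnparseable() { explode(); return unparseable; }

    boolean isChanged() { return changed; }

    String header(String name) { explode(); return rich == null ? null : rich.get(name); }

    void setHeader(String name, String value) {
        explode();
        if (rich == null) {
            throw new IllegalStateException("original could not be parsed");
        }
        rich.put(name, value);
        changed = true;
    }

    /** Unchanged mail is served verbatim; changed mail is rebuilt from the exploded form. */
    byte[] content() {
        if (!changed) {
            return original.clone();
        }
        StringBuilder sb = new StringBuilder();
        rich.forEach((k, v) -> sb.append(k).append(": ").append(v).append("\r\n"));
        return sb.append("\r\n").append(body).toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    byte[] original() { return original.clone(); }
}
```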

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/17/07, robert burrell donkin <ro...@gmail.com> wrote:
> On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> > One possible approach, at the expense of storing potentially redundant
> > duplicate data, is that the original message source is stored as a
> > verbatim binary stream and the message content is automatically
> > "exploded" the first time a client actually needs to parse the
> > message.
>
> i really like this idea :-)

There's one major caveat with this approach: redundant information and
the performance cost of maintaining that.

Maintaining updates in the raw message stream is in some (many?) cases
much more expensive than in a fully parsed representation. Consider
for example a mailet that wants to modify a subject line or add a
footer to all messages. Such operations would require that we update
the original message content as well as the individual header property
or body part in question. Updating the raw message source can in such
cases easily take an order of magnitude more time than updating the
parsed representation.
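To make the cost argument concrete, here is a minimal sketch with hypothetical names: changing the Subject in the raw form means producing a new copy of the whole message, body included, while in a parsed form it is a single property update. Folded Subject lines are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical comparison of one header change in raw versus parsed storage. */
final class SubjectUpdateCost {
    private SubjectUpdateCost() {}

    /** Raw form: the whole message, body included, is rewritten to change one header. */
    static byte[] rewriteSubjectRaw(byte[] raw, String newSubject) {
        String text = new String(raw, StandardCharsets.ISO_8859_1); // 1 byte == 1 char
        int headerEnd = text.indexOf("\r\n\r\n");
        if (headerEnd < 0) {
            headerEnd = text.length(); // no body: treat everything as headers
        }
        String[] lines = text.substring(0, headerEnd).split("\r\n", -1);
        StringBuilder head = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.regionMatches(true, 0, "Subject:", 0, 8)) {
                line = "Subject: " + newSubject; // folded Subject lines are not handled
            }
            if (i > 0) {
                head.append("\r\n");
            }
            head.append(line);
        }
        byte[] headBytes = head.toString().getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length + newSubject.length());
        out.write(headBytes, 0, headBytes.length);
        // The unchanged body is still copied byte for byte: O(n) work for a one-line edit.
        out.write(raw, headerEnd, raw.length - headerEnd);
        return out.toByteArray();
    }

    /** Parsed form: one property update, independent of message size. */
    static void setSubjectParsed(Map<String, String> headers, String newSubject) {
        headers.put("Subject", newSubject);
    }
}
```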

Note that I believe that it is possible to parse an incoming message
into a JCR node tree and recreate it back into a byte stream in the
same O(n) time and O(1) memory as is required to stream the raw
message source to a traditional spool file.

Perhaps we should have two modes for the JCR mail repository
implementation: one for pure relaying and one for more complex
processing. The former satisfies the relaying requirements of the SMTP
spec, while the latter is optimized for message transformations and
complex access patterns like in IMAP or webmail clients.

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> > Jukka Zitting ha scritto:
> > > A message consists of an envelope and the contained message. In JCR
> > > this is represented as the james:mail subclass of the standard nt:file
> > > node type (see http://wiki.apache.org/jackrabbit/nt%3afile):
> > >
> > >    [james:mail] > nt:file
> > >    - james:state (STRING)
> > >    - james:error (STRING)
> > >    - james:sender (STRING)
> > >    - james:recipients (STRING) multiple
> > >    - james:remotehost (STRING)
> > >    - james:remoteaddr (STRING)
> > >    - jamesattr:* (UNDEFINED)
> >
> > If we move to MessageRepository (JCR based) + EnvelopeRepository (JMS
> > based) model then we don't need the state, error, sender, recipients,
> > remotehost, remoteaddr, attributes stuff in the message repository.
>
> OK. Currently I'm just trying to store everything specified by the
> Mail interface, but modifying the content model won't be a problem. In
> fact I placed the envelope information on the nt:file parent node on
> purpose to avoid having them mixed with the message stuff in the
> content node.
>
> > Instead we may need some IMAP stuff in the MessageRepository (for the
> > IMAP stuff you may be interested in this document written by Joachim
> > months ago: http://www.joachim-draeger.de/JamesImap/drafts.html )
>
> I'll give it a look...
>
> > > [..]
> > > Normal mail messages are represented as a tree of MIME entities or
> > > parts. Each entity is individually referenceable (for easy linking and
> > > quick access) and contains the associated mail headers as string
> > > attributes:
> > > [...]
> > > I'm still undecided on how deep I should go in pre-parsing the message
> > > contents. For example should I parse Date headers and store them as
> > > JCR DATE properties to enable efficient date-based queries? Another
> > > complex question is how to best handle encryption and digital
> > > signature mechanisms like S/MIME...
> >
> > I'm not sure at all that the backend should be aware of the
> > content/structure of the message.
>
> I guess that depends on the requirements. If you're only interested in
> having a dumb message store that just passes messages back and forth
> as-is, then not parsing them is a good idea. But if you want to be
> able to efficiently search, manage, and manipulate the messages inside
> the repository, then understanding the content structure makes very
> much sense. A good requirement that I'm trying to achieve is the IMAP
> feature of selectively downloading parts of a multipart message. I
> wouldn't want to have to parse the entire multipart message over and
> over again to serve such client requests.
>
> More generally, I guess the question is whether you see the James mail
> repository as just a transient space where the message resides for a
> while until it is either forwarded via SMTP or retrieved over POP.
> What I'm trying (at least for now) to achieve is a more persistent
> mail storage that is actually used as the *endpoint* of the email
> delivery and accessed in-place through interfaces like IMAP or a
> webmail client. Perhaps there's some reasonable common ground?
>
> > The MOST IMPORTANT thing of all is that if I store a message and later
> > retrieve it, every single space, every single header, everything is
> > exactly as I wrote it. Even if it was malformed.
>
> Is this a hard requirement? If yes, then I could just model the entire
> mime message as a normal nt:resource node, in which case the JCR
> repository would act just like an advanced file system with
> transactions and some search features.

IMAP is *VERY* sensitive about malformed messages: a MIME message
*MUST* be well formed. it's all too easy to crash modern IMAP clients
with malformed emails.

one approach would be to take advantage of the typing available in
JCRs to help the server understand mail. malformed MIME could
gracefully degrade to RFC822 and malformed RFC822 to a general mail
type.
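The graceful degradation described above could be a classification step run when mail is exploded. A minimal sketch, assuming three hypothetical node kinds and deliberately crude well-formedness checks (header syntax, MIME-Version, and a boundary parameter for multipart types); a real implementation would rely on a proper MIME parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical node kinds, from most to least structured. */
enum MailKind { MIME, RFC822, GENERIC }

final class MailClassifier {
    private MailClassifier() {}

    /** Crude checks only; not a full RFC822 / MIME validator. */
    static MailKind classify(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("\r\n\r\n");
        if (end <= 0) {
            return MailKind.GENERIC; // no header block at all
        }
        boolean mimeVersion = false;
        String contentType = null;
        String last = null;
        for (String line : text.substring(0, end).split("\r\n")) {
            if (line.startsWith(" ") || line.startsWith("\t")) {
                if (last == null) {
                    return MailKind.GENERIC; // continuation line with nothing to continue
                }
                if (last.equals("content-type")) {
                    contentType += " " + line.trim().toLowerCase(Locale.ROOT);
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                return MailKind.GENERIC; // not RFC822 header syntax
            }
            last = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            if (last.equals("mime-version")) {
                mimeVersion = true;
            } else if (last.equals("content-type")) {
                contentType = value;
            }
        }
        if (!mimeVersion) {
            return MailKind.RFC822;
        }
        // A multipart type without a boundary cannot be split into parts: fall back to RFC822.
        if (contentType != null && contentType.startsWith("multipart/")
                && !contentType.contains("boundary=")) {
            return MailKind.RFC822;
        }
        return MailKind.MIME;
    }
}
```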

> Personally I don't see the exact storage requirement as essential, as
> the mail specs explicitly allow all sorts of intermediate nodes to
> perform various types of reformattings on messages while in transit.
> Things should be fine as long as the original intended content is
> preserved.

it's important to be able to get the raw as well as the processed

it's good being able to have smooth access to a parsed set of
addresses but the raw header also needs to be preserved

> > To achieve performance we'll probably have to avoid parsing the mime
> > structure at all: we don't need this for most SMTP/POP3 operations. Some
> > IMAP operations need this, but this should probably be done on demand and
> > not when writing the message to the repository.
>
> One possible approach, at the expense of storing potentially redundant
> duplicate data, is that the original message source is stored as a
> verbatim binary stream and the message content is automatically
> "exploded" the first time a client actually needs to parse the
> message.

i really like this idea :-)

with SEDA there is no need to wait for the first client: exploding
would be just another task to execute

IMAP mail is written rarely and read regularly. unless MIME messages are
parsed and stored as separate parts, performance will be very poor in
normal operation.
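Storing parts separately pays off for IMAP partial fetches, where a client asks for a single part by section number. A minimal sketch, assuming a hypothetical in-memory tree standing in for per-part JCR nodes, that resolves section numbers such as "1" or "2.2" (the BODY[section] part specifiers of RFC 3501) by walking the tree instead of re-parsing the message; encapsulated message/rfc822 parts are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exploded MIME part; in the JCR design each part would be its own node. */
final class StoredPart {
    final String contentType;
    final String content;                        // leaf content; empty for multiparts
    final List<StoredPart> children = new ArrayList<>();

    StoredPart(String contentType, String content) {
        this.contentType = contentType;
        this.content = content;
    }

    StoredPart add(StoredPart child) {
        children.add(child);
        return this;
    }

    /** Resolves an IMAP-style section number by walking the stored tree. */
    StoredPart section(String spec) {
        String[] tokens = spec.split("\\.");
        if (children.isEmpty()) {
            // A non-multipart message has only a part 1.
            if (tokens.length == 1 && tokens[0].equals("1")) {
                return this;
            }
            throw new IllegalArgumentException("no part " + spec);
        }
        StoredPart current = this;
        for (String token : tokens) {
            int index = Integer.parseInt(token);
            if (index < 1 || index > current.children.size()) {
                throw new IllegalArgumentException("no part " + spec);
            }
            current = current.children.get(index - 1);
        }
        return current;
    }
}
```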

but again, the key is to be able to access the original, raw data when needed

- robert



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/16/07, robert burrell donkin <ro...@gmail.com> wrote:

> hold on a minute - i think we might be being just a little too hasty!
>
> noel, peter, danny and myself were talking through some advanced SEDA
> based architectures that will cluster well. jukka's exploding mail
> idea fits very well into this.

+1.


> designing a complete mail (not just email) storage solution around
> relaying IMHO doesn't really make much sense. however, it is a very
> useful extreme border case.

+1

> there are various ways that various types of mail can enter the
> system. the consensus was that it's best for the least possible amount
> of processing to be done initially (perhaps even just writing a
> temporary file and then writing the data to the permanent store using
> another thread)

+1
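The intake described above (write a temporary file, then store permanently on another thread) could look like this minimal sketch. Hypothetical names throughout; a plain directory stands in for the permanent (JCR) store and a file move stands in for the repository write.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical intake stage: do the minimum on the protocol thread, defer the rest. */
final class SpoolingIntake {
    private final Path tempDir;
    private final Path permanentDir; // stands in for the permanent store (e.g. a JCR repository)
    private final ExecutorService storer;

    SpoolingIntake(Path tempDir, Path permanentDir, ExecutorService storer) {
        this.tempDir = tempDir;
        this.permanentDir = permanentDir;
        this.storer = storer;
    }

    /** Protocol-thread work: stream the data to a temporary file, nothing else. */
    CompletableFuture<Path> accept(String mailName, InputStream data) throws IOException {
        Path temp = Files.createTempFile(tempDir, "incoming-", ".eml");
        Files.copy(data, temp, StandardCopyOption.REPLACE_EXISTING);
        // The write to permanent storage runs later on the storer's thread.
        return CompletableFuture.supplyAsync(() -> store(mailName, temp), storer);
    }

    // Assumes mailName is a safe, unique file name.
    private Path store(String mailName, Path temp) {
        try {
            return Files.move(temp, permanentDir.resolve(mailName + ".eml"),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```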

> later stages of processing could then explode the mail on demand.

I think I'd rather consider that the mail is exploded into JCR the
first time it is actually needed, and then never again. No copies, only
references.
However, you need to be able to clone a message when the route splits,
because one route could include transformations which won't apply to
the other route.
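"No copies, only references" combined with cloning at a route split amounts to copy-on-write. A minimal sketch with hypothetical names: a clone copies only the small header map and shares the body, and a route that changes the body (for example by adding a footer) gets its own private copy at that point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical exploded mail: the body is shared between routes until one of them changes it. */
final class RoutedMail {
    private byte[] body;                        // never modified in place, so clones may share it
    private final Map<String, String> headers;  // small, copied per route

    private RoutedMail(byte[] body, Map<String, String> headers) {
        this.body = body;
        this.headers = headers;
    }

    static RoutedMail of(Map<String, String> headers, byte[] body) {
        return new RoutedMail(body.clone(), new LinkedHashMap<>(headers));
    }

    /** Called where the route splits: copies the headers, references the same body. */
    RoutedMail cloneForRoute() {
        return new RoutedMail(body, new LinkedHashMap<>(headers));
    }

    void setHeader(String name, String value) {
        headers.put(name, value);
    }

    String header(String name) {
        return headers.get(name);
    }

    /** Body transformation: this route gets a private copy, other routes keep the old body. */
    void appendFooter(String footer) {
        byte[] extra = footer.getBytes(StandardCharsets.ISO_8859_1);
        byte[] copy = Arrays.copyOf(body, body.length + extra.length);
        System.arraycopy(extra, 0, copy, body.length, extra.length);
        body = copy;
    }

    boolean sharesBodyWith(RoutedMail other) {
        return body == other.body;
    }

    int bodyLength() {
        return body.length;
    }
}
```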

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/17/07, robert burrell donkin <ro...@gmail.com> wrote:

> perhaps danny would like to explain...

Oh right, well it's about blending channels of communication.
Email is only one form of inbound/outbound communication between a
person and an organisation. An efficient customer-focused organisation
will want to manage all communications from the point of view of the
customer relationship and not the technology.

By working with JCR we have an opportunity to take the content of an
email and expose it as a "communication" in a way that can be blended
with telephony and scanned letters as well as SMS and IM by products
which are capable of performing that blending.

By keeping one copy of the communication (which is what robert meant
by "mail") and using meta-data to "route" the email we can send that
communication into skills based work queues, and customer accounts etc
by reference to its location in the repository and not "by value" by
creating copies in other storage systems.

d.


Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/17/07, Stefano Bagnara <ap...@bago.org> wrote:
> Hi Robert, I noticed you often put emphasis on "mail" vs "email".
> Can you tell us what exactly you define as mail and email? (I want to be
> sure I'm not misunderstanding you)

perhaps danny would like to explain...

- robert


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Hi Robert, I noticed you often put emphasis on "mail" vs "email".
Can you tell us what exactly you define as mail and email? (I want to be
sure I'm not misunderstanding you)

Stefano

robert burrell donkin ha scritto:
>> > the first stage storage should just store the raw mail and basic
>> > meta-data about that mail. this should include some audit trail
>> > information about the origin of the mail. (note mail here, not
>> > email.)
>>
>> +1, so we need at least to be able to process raw mail as the first
>> step, like I was telling to Jukka.
> 
> i think so
> 
> we looked at adding more SEDA stages to the initial input pipeline
> 
> here's an example main line:
> 
> 1 protocol handling and temporary mail aggregation
> 2 permanent storage of raw data plus some protocol (not mail) meta-data
> 3 spool processing by main line spool mailets
> 
> probably good to do 2 using mailets. this would allow unclustered
> relays which do not wish to store their mail permanently to operate in
> processing phase 2




Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> robert burrell donkin ha scritto:
> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> >> Jukka Zitting wrote:
> >> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> >> >> I think SMTP relaying is a common use case for JAMES Server and one of
> >> >> the main goals so we should keep this in high consideration.
> >> >
> >> > ACK. I'll go for a single binary stream per message for now. We can
> >> > revisit that decision later if we want to look at better supporting
> >> > IMAP and webmail scenarios.
> >>
> >> ok!
> >
> > hold on a minute - i think we might be being just a little too hasty!
> >
> > noel, peter, danny and myself were talking through some advanced SEDA
> > based architectures that will cluster well. jukka's exploding mail
> > idea fits very well into this.
>
> IMHO the basic version MUST support a simple unparsed stream. Everything
> else should be done as an evolution.
>
> Btw, who is Peter?

royal

> > designing a complete mail (not just email) storage solution around
> > relaying IMHO doesn't really make much sense. however, it is a very
> > useful extreme border case.
>
> "border case"? I think many billions of mails are relayed every day on
> the internet ;-)
> Maybe this is not the best use case for JAMES Server, but this also
> happened because we don't correctly support large message
> streaming/relaying and large volumes of messages.

border case in the sense of most extreme

one test for a good design is being able to handle border cases smoothly

> > there are various ways that various types of mail can enter the
> > system. the consensus was that it's best for the least possible amount
> > of processing to be done initially (perhaps even just writing a
> > temporary file and then writing the data to the permanent store using
> > another thread)
> >
> > the first stage storage should just store the raw mail and basic
> > meta-data about that mail. this should include some audit trail
> > information about the origin of the mail. (note mail here, not
> > email.)
>
> +1, so we need at least to be able to process raw mail as the first
> step, like I was telling to Jukka.

i think so

we looked at adding more SEDA stages to the initial input pipeline

here's an example main line:

1 protocol handling and temporary mail aggregation
2 permanent storage of raw data plus some protocol (not mail) meta-data
3 spool processing by main line spool mailets

probably good to do 2 using mailets. this would allow unclustered
relays which do not wish to store their mail permanently to operate in
processing phase 2
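A minimal sketch of the three-stage main line described above, assuming plain threads joined by java.util.concurrent queues. In the design being discussed, stage 2 would likely be a mailet. The class and field names here are hypothetical. The point is that each stage hands off through a queue, and only stage 2 touches the raw bytes. Stage 3 works from keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical SEDA main line: queues decouple the stages so each can be scaled separately. */
class SedaPipeline {
    /** Raw bytes plus protocol (not mail) meta-data such as the remote address. */
    static final class Received {
        final String remoteAddr;
        final byte[] raw;
        Received(String remoteAddr, byte[] raw) { this.remoteAddr = remoteAddr; this.raw = raw; }
    }

    private static final Received SHUTDOWN = new Received("", new byte[0]);
    private static final long NO_MORE = -1L;

    final BlockingQueue<Received> toStorage = new LinkedBlockingQueue<>();
    final BlockingQueue<Long> toSpool = new LinkedBlockingQueue<>();
    final Map<Long, Received> permanentStore = new ConcurrentHashMap<>();
    final List<Long> processed = new ArrayList<>(); // written only by the stage-3 thread
    private final AtomicLong ids = new AtomicLong();

    /** Stage 1 (protocol handling) ends here: enqueue the aggregated message and return at once. */
    void accept(String remoteAddr, byte[] raw) throws InterruptedException {
        toStorage.put(new Received(remoteAddr, raw));
    }

    /** Stage 2: persist the raw data, then hand only the key downstream. */
    Thread storageStage() {
        return new Thread(() -> {
            try {
                Received r;
                while ((r = toStorage.take()) != SHUTDOWN) {
                    long id = ids.incrementAndGet();
                    permanentStore.put(id, r);
                    toSpool.put(id);
                }
                toSpool.put(NO_MORE); // propagate shutdown to stage 3
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Stage 3: spool processing works from keys, never from copies of the data. */
    Thread spoolStage() {
        return new Thread(() -> {
            try {
                long id;
                while ((id = toSpool.take()) != NO_MORE) {
                    processed.add(id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    void shutdown() throws InterruptedException {
        toStorage.put(SHUTDOWN);
    }
}
```

A relay that does not want permanent storage could swap the stage-2 body for one that only assigns keys. That is roughly the flexibility robert suggests doing stage 2 with mailets would give.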

- robert


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
robert burrell donkin ha scritto:
> On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
>> Jukka Zitting wrote:
>> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
>> >> I think SMTP relaying is a common use case for JAMES Server and one of
>> >> the main goals so we should keep this in high consideration.
>> >
>> > ACK. I'll go for a single binary stream per message for now. We can
>> > revisit that decision later if we want to look at better supporting
>> > IMAP and webmail scenarios.
>>
>> ok!
> 
> hold on a minute - i think we might be being just a little too hasty!
> 
> noel, peter, danny and myself were talking through some advanced SEDA
> based architectures that will cluster well. jukka's exploding mail
> idea fits very well into this.

IMHO the basic version MUST support a simple unparsed stream. Everything
else should be done as an evolution.

Btw, who is Peter?

> designing a complete mail (not just email) storage solution around
> relaying IMHO doesn't really make much sense. however, it is a very
> useful extreme border case.

"border case"? I think many billions of mails are relayed every day on
the internet ;-)
Maybe this is not the best use case for JAMES Server, but this also
happened because we don't correctly support large message
streaming/relaying and large volumes of messages.

> there are various ways that various types of mail can enter the
> system. the consensus was that it's best for the least possible amount
> of processing to be done initially (perhaps even just writing a
> temporary file and then writing the data to the permanent store using
> another thread)
> 
> the first stage storage should just store the raw mail and basic
> meta-data about that mail. this should include some audit trail
> information about the origin of the mail. (note mail here, not
> email.)

+1, so we need at least to be able to process raw mail as the first
step, like I was telling to Jukka.

> later stages of processing could then explode the mail on demand. this
> would mean parsing the data and attaching the contents. the original
> data would still be available.
> 
> - robert

Maybe we don't even need to keep the duplicate copy, but I don't care
too much about this part. I will concentrate on requirements for
smtp/pop3 and for smtp filtering and smtp relay. I know you are there
for the imap checklist!

Stefano



Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> Jukka Zitting wrote:
> > On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> >> I think SMTP relaying is a common use case for JAMES Server and one of
> >> the main goals so we should keep this in high consideration.
> >
> > ACK. I'll go for a single binary stream per message for now. We can
> > revisit that decision later if we want to look at better supporting
> > IMAP and webmail scenarios.
>
> ok!

hold on a minute - i think we might be being just a little too hasty!

noel, peter, danny and myself were talking through some advanced SEDA
based architectures that will cluster well. jukka's exploding mail
idea fits very well into this.

> >> > Personally I don't see the exact storage requirement as essential, as
> >> > the mail specs explicitly allow all sorts of intermediate nodes to
> >> > perform various types of reformattings on messages while in transit.
> >> > Things should be fine as long as the original intended content is
> >> > preserved.
> >>
> >> Are you sure? I don't agree on this.
> >
> > See the notes on email gateways in the SMTP specs.
>
> Please note that a gateway is a very special use case: the smtp spec
> defines a gateway as an SMTP server that is a bridge between 2 different
> transport layers.
>
> SMTP relaying was the scenario I referred to and it is not related to
> gateways. Here are some interesting quotes from rfc2821:
>
>    --
>    In general, a relay SMTP SHOULD assume that
>    the message content it has received is valid and, assuming that the
>    envelope permits doing so, relay it without inspecting that content.
>
>    --
>    As discussed in section 2.4.1, a relay SMTP has no need to inspect or
>    act upon the headers or body of the message data and MUST NOT do so
>    except to add its own "Received:" header (section 4.4) and,
>    optionally, to attempt to detect looping in the mail system (see
>    section 6.2).
>
>    --
>    The following changes to a message being processed MAY be applied
>    when necessary by an originating SMTP server, or one used as the
>    target of SMTP as an initial posting protocol:
>    -  Addition of a message-id field when none appears
>    -  Addition of a date, time or time zone when none appears
>    -  Correction of addresses to proper FQDN format
>    The less information the server has about the client, the less likely
>    these changes are to be correct and the more caution and conservatism
>    should be applied when considering whether or not to perform fixes
>    and how.  These changes MUST NOT be applied by an SMTP server that
>    provides an intermediate relay function.
>
>    --
>    An Internet mail program MUST NOT change a Received: line that was
>    previously added to the message header.  SMTP servers MUST prepend
>    Received lines to messages; they MUST NOT change the order of
>    existing lines or insert Received lines in any other location.

designing a complete mail (not just email) storage solution around
relaying IMHO doesn't really make much sense. however, it is a very
useful extreme border case.

there are various ways that various types of mail can enter the
system. the consensus was that it's best for the least possible amount
of processing to be done initially (perhaps even just writing a
temporary file and then writing the data to the permanent store using
another thread)

the first stage storage should just store the raw mail and basic
meta-data about that mail. this should include some audit trail
information about the origin of the mail. (note mail here, not
email.)

later stages of processing could then explode the mail on demand. this
would mean parsing the data and attaching the contents. the original
data would still be available.
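A minimal sketch of "explode on demand, original still available", assuming a hypothetical LazyMail class. It keeps the raw bytes untouched and parses the header block only on first access. Header unfolding here is approximate (folded whitespace is collapsed to one space), and MIME body parsing is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: raw bytes stored verbatim; headers are "exploded" only when first needed. */
class LazyMail {
    private final byte[] raw;                   // original data, never modified
    private Map<String, List<String>> headers;  // null until first requested

    LazyMail(byte[] raw) { this.raw = raw.clone(); }

    byte[] raw() { return raw.clone(); }

    boolean isExploded() { return headers != null; }

    Map<String, List<String>> headers() {
        if (headers == null) headers = parseHeaders();
        return headers;
    }

    private Map<String, List<String>> parseHeaders() {
        // ISO-8859-1 maps every byte to one char, so malformed 8-bit headers do not throw.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        Map<String, List<String>> result = new LinkedHashMap<>();
        String currentName = null;
        for (String line : text.split("\r?\n", -1)) {
            if (line.isEmpty()) break; // blank line ends the header block
            if ((line.startsWith(" ") || line.startsWith("\t")) && currentName != null) {
                // continuation of a folded header: approximate unfolding
                List<String> values = result.get(currentName);
                int last = values.size() - 1;
                values.set(last, values.get(last) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // tolerate malformed lines instead of failing
            currentName = line.substring(0, colon).trim().toLowerCase();
            result.computeIfAbsent(currentName, k -> new ArrayList<>())
                  .add(line.substring(colon + 1).trim());
        }
        return result;
    }
}
```

Because raw() always returns the stored bytes, a relay path can ignore the parsed view entirely.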

- robert


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting wrote:
> On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
>> I think SMTP relaying is a common use case for JAMES Server and one of
>> the main goals so we should keep this in high consideration.
> 
> ACK. I'll go for a single binary stream per message for now. We can
> revisit that decision later if we want to look at better supporting
> IMAP and webmail scenarios.

ok!

>> > Personally I don't see the exact storage requirement as essential, as
>> > the mail specs explicitly allow all sorts of intermediate nodes to
>> > perform various types of reformattings on messages while in transit.
>> > Things should be fine as long as the original intended content is
>> > preserved.
>>
>> Are you sure? I don't agree on this.
> 
> See the notes on email gateways in the SMTP specs.

Please note that a gateway is a very special use case: the smtp spec
defines a gateway as an SMTP server that is a bridge between 2 different
transport layers.

SMTP relaying was the scenario I referred to and it is not related to
gateways. Here are some interesting quotes from rfc2821:

   --
   In general, a relay SMTP SHOULD assume that
   the message content it has received is valid and, assuming that the
   envelope permits doing so, relay it without inspecting that content.

   --
   As discussed in section 2.4.1, a relay SMTP has no need to inspect or
   act upon the headers or body of the message data and MUST NOT do so
   except to add its own "Received:" header (section 4.4) and,
   optionally, to attempt to detect looping in the mail system (see
   section 6.2).

   --
   The following changes to a message being processed MAY be applied
   when necessary by an originating SMTP server, or one used as the
   target of SMTP as an initial posting protocol:
   -  Addition of a message-id field when none appears
   -  Addition of a date, time or time zone when none appears
   -  Correction of addresses to proper FQDN format
   The less information the server has about the client, the less likely
   these changes are to be correct and the more caution and conservatism
   should be applied when considering whether or not to perform fixes
   and how.  These changes MUST NOT be applied by an SMTP server that
   provides an intermediate relay function.

   --
   An Internet mail program MUST NOT change a Received: line that was
   previously added to the message header.  SMTP servers MUST prepend
   Received lines to messages; they MUST NOT change the order of
   existing lines or insert Received lines in any other location.

Stefano



Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> Jukka Zitting ha scritto:
> >> The MOST IMPORTANT thing of all is that if I store a message and I later
> >> retrieve it, every single space, every single header, everything is
> >> exactly as I wrote it. Even if it was malformed.
> >
> > Is this a hard requirement? If yes, then I could just model the entire
> > mime message as a normal nt:resource node, in which case the JCR
> > repository would act just like an advanced file system with
> > transactions and some search features.
>
> As JAMES Server can be used as a relay server, the SMTP specification
> tells us not to alter the messages (even if they are malformed). The only
> thing we're allowed to do is *prepending* a "Received" header, and
> converting from 8bit to 7bit if we support 8BITMIME.
>
> I think SMTP relaying is a common use case for JAMES Server and one of
> the main goals so we should keep this in high consideration.

ACK. I'll go for a single binary stream per message for now. We can
revisit that decision later if we want to look at better supporting
IMAP and webmail scenarios.

> > Personally I don't see the exact storage requirement as essential, as
> > the mail specs explicitly allow all sorts of intermediate nodes to
> > perform various types of reformattings on messages while in transit.
> > Things should be fine as long as the original intended content is
> > preserved.
>
> Are you sure? I don't agree on this.

See the notes on email gateways in the SMTP specs.

> Instead you break some signature mechanisms if you do that. My interpretation
> of the SMTP spec is not that we can do what we want.

The possibility of such intermediate transformations is the very reason
why, for example, S/MIME defines canonicalization rules to avoid
problems with a potentially mangled message format. I'm not too
familiar with how well this works in practice, i.e. whether the only safe
alternative really is to preserve the exact message source.

BR,

Jukka Zitting


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting ha scritto:
>> The MOST IMPORTANT thing of all is that if I store a message and I later
>> retrieve it, every single space, every single header, everything is
>> exactly as I wrote it. Even if it was malformed.
> 
> Is this a hard requirement? If yes, then I could just model the entire
> mime message as a normal nt:resource node, in which case the JCR
> repository would act just like an advanced file system with
> transactions and some search features.

As JAMES Server can be used as a relay server, the SMTP specification
tells us not to alter the messages (even if they are malformed). The only
thing we're allowed to do is *prepending* a "Received" header, and
converting from 8bit to 7bit if we support 8BITMIME.
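A minimal sketch of relaying under that constraint, assuming a hypothetical RelayOutput helper. It emits the new Received: line followed by the stored bytes exactly as received, and it never rewrites the stored copy. The caller is assumed to pass a complete, already folded header field without the trailing CRLF. The 8bit-to-7bit conversion is not covered.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prepend a trace header without touching the stored message. */
class RelayOutput {
    static InputStream withReceivedPrepended(String receivedField, InputStream storedMessage) {
        // Received: is the only addition a relay may make; everything after it is verbatim.
        byte[] header = (receivedField + "\r\n").getBytes(StandardCharsets.US_ASCII);
        return new SequenceInputStream(new ByteArrayInputStream(header), storedMessage);
    }
}
```

Because the result is a stream, this also fits the "single binary stream per message" storage choice. Odd spacing or malformed headers in the stored data pass through unchanged.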

I think SMTP relaying is a common use case for JAMES Server and one of
the main goals so we should keep this in high consideration.

> Personally I don't see the exact storage requirement as essential, as
> the mail specs explicitly allow all sorts of intermediate nodes to
> perform various types of reformattings on messages while in transit.
> Things should be fine as long as the original intended content is
> preserved.

Are you sure? I don't agree on this. Instead you break some signature
mechanisms if you do that. My interpretation of the SMTP spec is not that
we can do what we want. Maybe other JAMES committers can share their
experience and knowledge on this.

>> To achieve performance we'll probably have to avoid parsing the mime
>> structure at all: we don't need this for most SMTP/POP3 operations. Some
>> IMAP operations need this, but this should probably be done on demand and
>> not when writing the message to the repository.
> 
> One possible approach, at the expense of storing potentially redundant
> duplicate data, is that the original message source is stored as a
> verbatim binary stream and the message content is automatically
> "exploded" when the first client actually needs to parse the
> message.

This is what we do now (MimeMessageWrapper does lazy parsing of
headers/body separately), but we never store the "parsed" version then.

Stefano



Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/16/07, Stefano Bagnara <ap...@bago.org> wrote:
> Jukka Zitting ha scritto:
> > A message consists of an envelope and the contained message. In JCR
> > this is represented as the james:mail subclass of the standard nt:file
> > node type (see http://wiki.apache.org/jackrabbit/nt%3afile):
> >
> >    [james:mail] > nt:file
> >    - james:state (STRING)
> >    - james:error (STRING)
> >    - james:sender (STRING)
> >    - james:recipients (STRING) multiple
> >    - james:remotehost (STRING)
> >    - james:remoteaddr (STRING)
> >    - jamesattr:* (UNDEFINED)
>
> If we move to MessageRepository (JCR based) + EnvelopeRepository (JMS
> based) model then we don't need the state, error, sender, recipients,
> remotehost, remoteaddr, attributes stuff in the message repository.

OK. Currently I'm just trying to store everything specified by the
Mail interface, but modifying the content model won't be a problem. In
fact I placed the envelope information on the nt:file parent node on
purpose to avoid having them mixed with the message stuff in the
content node.

> Instead we may need some IMAP stuff in the MessageRepository (for the
> IMAP stuff you may be interested in this document written by Joachim
> months ago: http://www.joachim-draeger.de/JamesImap/drafts.html )

I'll give it a look...

> > [..]
> > Normal mail messages are represented as a tree of MIME entities or
> > parts. Each entity is individually referenceable (for easy linking and
> > quick access) and contains the associated mail headers as string
> > attributes:
> > [...]
> > I'm still undecided on how deep I should go in pre-parsing the message
> > contents. For example should I parse Date headers and store them as
> > JCR DATE properties to enable efficient date-based queries? Another
> > complex question is how to best handle encryption and digital
> > signature mechanisms like S/MIME...
>
> I'm not sure at all that the backend should be aware of the
> content/structure of the message.

I guess that depends on the requirements. If you're only interested in
having a dumb message store that just passes messages back and forth
as-is, then not parsing them is a good idea. But if you want to be
able to efficiently search, manage, and manipulate the messages inside
the repository, then understanding the content structure makes a lot of
sense. A good requirement that I'm trying to achieve is the IMAP
feature of selectively downloading parts of a multipart message. I
wouldn't want to have to parse the entire multipart message over and
over again to serve such client requests.

More generally, I guess the question is whether you see the James mail
repository as just a transient space where the message resides for a
while until it is either forwarded via SMTP or retrieved over POP.
What I'm trying (at least for now) to achieve is a more persistent
mail storage that is actually used as the *endpoint* of the email
delivery and accessed in-place through interfaces like IMAP or a
webmail client. Perhaps there's some reasonable common ground?

> The MOST IMPORTANT thing of all is that if I store a message and I later
> retrieve it, every single space, every single header, everything is
> exactly as I wrote it. Even if it was malformed.

Is this a hard requirement? If yes, then I could just model the entire
mime message as a normal nt:resource node, in which case the JCR
repository would act just like an advanced file system with
transactions and some search features.

Personally I don't see the exact storage requirement as essential, as
the mail specs explicitly allow all sorts of intermediate nodes to
perform various types of reformattings on messages while in transit.
Things should be fine as long as the original intended content is
preserved.

> To achieve performance we'll probably have to avoid parsing the mime
> structure at all: we don't need this for most SMTP/POP3 operations. Some
> IMAP operations need this, but this should probably be done on demand and
> not when writing the message to the repository.

One possible approach, at the expense of storing potentially redundant
duplicate data, is that the original message source is stored as a
verbatim binary stream and the message content is automatically
"exploded" when the first client actually needs to parse the
message.

BR,

Jukka Zitting


Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting ha scritto:
> A message consists of an envelope and the contained message. In JCR
> this is represented as the james:mail subclass of the standard nt:file
> node type (see http://wiki.apache.org/jackrabbit/nt%3afile):
> 
>    [james:mail] > nt:file
>    - james:state (STRING)
>    - james:error (STRING)
>    - james:sender (STRING)
>    - james:recipients (STRING) multiple
>    - james:remotehost (STRING)
>    - james:remoteaddr (STRING)
>    - jamesattr:* (UNDEFINED)

If we move to MessageRepository (JCR based) + EnvelopeRepository (JMS
based) model then we don't need the state, error, sender, recipients,
remotehost, remoteaddr, attributes stuff in the message repository.
Instead we may need some IMAP stuff in the MessageRepository (for the
IMAP stuff you may be interested in this document written by Joachim
months ago: http://www.joachim-draeger.de/JamesImap/drafts.html )
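A minimal sketch of the MessageRepository + EnvelopeRepository split, assuming hypothetical interfaces with in-memory implementations in place of JCR and JMS. The envelope carries the sender, recipients and state, plus only a key into the message repository. The message repository holds nothing but message content.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical: message content only (a JCR-based store could sit behind this). */
interface MessageRepository {
    String store(byte[] rawMessage);
    byte[] retrieve(String messageKey);
}

/** Hypothetical envelope: the Mail data minus the MimeMessage, plus a reference to it. */
final class Envelope {
    final String sender;
    final List<String> recipients;
    final String state;      // processor name, as with Mail.getState()
    final String messageKey; // reference into a MessageRepository, not a copy

    Envelope(String sender, List<String> recipients, String state, String messageKey) {
        this.sender = sender;
        this.recipients = List.copyOf(recipients);
        this.state = state;
        this.messageKey = messageKey;
    }
}

/** Hypothetical replacement for SpoolRepository (a JMS queue could sit behind this). */
interface EnvelopeRepository {
    void enqueue(Envelope envelope);
    Envelope poll(); // null when nothing is waiting
}

class InMemoryMessageRepository implements MessageRepository {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();
    private final AtomicLong seq = new AtomicLong();

    public String store(byte[] rawMessage) {
        String key = "msg-" + seq.incrementAndGet();
        data.put(key, rawMessage.clone());
        return key;
    }

    public byte[] retrieve(String messageKey) {
        byte[] b = data.get(messageKey);
        return b == null ? null : b.clone();
    }
}

class InMemoryEnvelopeRepository implements EnvelopeRepository {
    private final ConcurrentLinkedQueue<Envelope> queue = new ConcurrentLinkedQueue<>();

    public void enqueue(Envelope envelope) { queue.add(envelope); }

    public Envelope poll() { return queue.poll(); }
}
```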

> [..]
> Normal mail messages are represented as a tree of MIME entities or
> parts. Each entity is individually referenceable (for easy linking and
> quick access) and contains the associated mail headers as string
> attributes:
> [...]
> I'm still undecided on how deep I should go in pre-parsing the message
> contents. For example should I parse Date headers and store them as
> JCR DATE properties to enable efficient date-based queries? Another
> complex question is how to best handle encryption and digital
> signature mechanisms like S/MIME...

I'm not sure at all that the backend should be aware of the
content/structure of the message.

The MOST IMPORTANT thing of all is that if I store a message and I later
retrieve it, every single space, every single header, everything is
exactly as I wrote it. Even if it was malformed.

To achieve performance we'll probably have to avoid parsing the mime
structure at all: we don't need this for most SMTP/POP3 operations. Some
IMAP operations need this, but this should probably be done on demand and
not when writing the message to the repository.

Stefano



Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/15/07, robert burrell donkin <ro...@gmail.com> wrote:
> how do you plan to model MIME messages?

I'm still looking at various alternative models, but the basic idea is
that the message is stored in as parsed a format as possible. My current
thinking is:

A message consists of an envelope and the contained message. In JCR
this is represented as the james:mail subclass of the standard nt:file
node type (see http://wiki.apache.org/jackrabbit/nt%3afile):

    [james:mail] > nt:file
    - james:state (STRING)
    - james:error (STRING)
    - james:sender (STRING)
    - james:recipients (STRING) multiple
    - james:remotehost (STRING)
    - james:remoteaddr (STRING)
    - jamesattr:* (UNDEFINED)

See http://jackrabbit.apache.org/doc/nodetype/cnd.html for background
on the node type notation I'm using. The nt:file type defines a single
child node, called "jcr:content" that can contain anything, not just
the traditional binary stream. James would typically use one of the
following mime node types for the jcr:content node, but a nice feature
of this extensible content model is that we're not restricted to just
mail messages and could in fact use the same spool management code to
handle any kinds of message payloads like files, serialized java
objects, or even entire subtrees of content.

Normal mail messages are represented as a tree of MIME entities or
parts. Each entity is individually referenceable (for easy linking and
quick access) and contains the associated mail headers as string
attributes:

    [mail:part] > mix:referenceable
    - mail:* (STRING) multiple

Non-multipart entities are stored as mail:bodypart nodes that extend
the standard nt:resource node type (see
http://wiki.apache.org/jackrabbit/nt%3aresource) that defines
properties for the binary content and the associated content type:

    [mail:bodypart] > mail:part, nt:resource

Multipart entities are stored as mail:multipart nodes that extend the
standard nt:folder node type (see
http://wiki.apache.org/jackrabbit/nt%3afolder):

    [mail:multipart] > mail:part, nt:folder

Using the nt:folder type allows multipart messages to be easily
managed using standard JCR tools like the WebDAV interface included in
Jackrabbit.
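A minimal sketch of how that part tree could serve an IMAP-style partial fetch without re-parsing the message. It assumes hypothetical plain Java classes that mirror mail:part, mail:bodypart and mail:multipart, rather than real JCR nodes. It also assumes IMAP-like section numbering: children are numbered from 1 and nested with dots, and a non-multipart entity's only section is 1. Encapsulated message/rfc822 parts are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mirror of mail:part: every entity carries its own (multi-valued) headers. */
abstract class Part {
    final Map<String, List<String>> headers = new LinkedHashMap<>();
}

/** Mirrors mail:bodypart (> nt:resource): a leaf with content type and binary data. */
final class BodyPart extends Part {
    final String mimeType;
    final byte[] data;
    BodyPart(String mimeType, byte[] data) { this.mimeType = mimeType; this.data = data; }
}

/** Mirrors mail:multipart (> nt:folder): ordered child entities. */
final class MultiPart extends Part {
    final List<Part> children = new ArrayList<>();
    MultiPart add(Part p) { children.add(p); return this; }
}

final class Sections {
    /** Resolves a section path such as "2.1" by walking the stored tree; null if absent. */
    static Part resolve(Part root, String section) {
        Part current = root;
        for (String step : section.split("\\.")) {
            int index = Integer.parseInt(step);
            if (current instanceof MultiPart) {
                MultiPart mp = (MultiPart) current;
                if (index < 1 || index > mp.children.size()) return null;
                current = mp.children.get(index - 1);
            } else if (index != 1) {
                return null; // a leaf only has section 1 (itself)
            }
        }
        return current;
    }
}
```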

I'm still undecided on how deep I should go in pre-parsing the message
contents. For example should I parse Date headers and store them as
JCR DATE properties to enable efficient date-based queries? Another
complex question is how to best handle encryption and digital
signature mechanisms like S/MIME...
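If Date headers were pre-parsed, a sketch could lean on java.time's built-in RFC 1123 formatter, which accepts the common "Mon, 14 May 2007 22:20:32 +0200" shape. The DateHeaders helper is hypothetical. It strips a trailing comment such as "(CEST)", which the formatter does not accept, and it leaves unparseable dates only in the raw header. For a JCR DATE property, the result could then be converted with GregorianCalendar.from(parsed.toZonedDateTime()).

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper: best-effort parse of a Date header value. */
class DateHeaders {
    static Optional<OffsetDateTime> parse(String headerValue) {
        // Drop a trailing comment such as "(CEST)"; RFC_1123_DATE_TIME rejects it.
        String value = headerValue.replaceAll("\\s*\\([^)]*\\)\\s*$", "").trim();
        try {
            return Optional.of(OffsetDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // malformed dates stay searchable only as raw text
        }
    }
}
```

Returning an empty Optional rather than throwing matches the earlier point that malformed messages must still be stored.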

BR,

Jukka Zitting


Re: Questions on the Mail and MailRepository interfaces

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/15/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> > You probably noticed that we currently have SpoolRepository and
> > MailRepository and that SpoolRepository is a MailRepository with some
> > added methods (every MailRepository could expose SpoolRepository services
> > via a simple wrapper).
> >
> > We have discussed many times changing this so as to have a
> > MessageRepository that should be similar to the current MailRepository
> > but only include mail.message (MimeMessage) information and an
> > EnvelopeRepository (to replace SpoolRepository) that should only store
> > the envelope (everything else you find in the Mail object but the
> > MimeMessage) and reference a Message from a MessageRepository.
> > [...]
> > some related link, if you are interested:
>
> Thanks, very interesting...
>
> I'd be interested in pursuing an approach where the full message
> never exists outside the repository, i.e. the message is streamed
> directly into the repository during an SMTP DATA command (or the
> equivalent in other protocols) and only the message key returned by
> the repository is used to keep track of the message.

+1

how do you plan to model MIME messages?

- robert


Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> You probably noticed that we currently have SpoolRepository and
> MailRepository and that SpoolRepository is a MailRepository with some
> added methods (every MailRepository could expose SpoolRepository services
> via a simple wrapper).
>
> We have discussed many times changing this so as to have a
> MessageRepository that should be similar to the current MailRepository
> but only include mail.message (MimeMessage) information and an
> EnvelopeRepository (to replace SpoolRepository) that should only store
> the envelope (everything else you find in the Mail object but the
> MimeMessage) and reference a Message from a MessageRepository.
> [...]
> some related link, if you are interested:

Thanks, very interesting...

I'd be interested in pursuing an approach where the full message
never exists outside the repository, i.e. the message is streamed
directly into the repository during an SMTP DATA command (or the
equivalent in other protocols) and only the message key returned by
the repository is used to keep track of the message.
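
For illustration only, a minimal Java sketch of that idea, assuming a
hypothetical StreamingMessageRepository interface (not part of the James
or JCR APIs): store() consumes the incoming DATA stream and returns a
repository-generated key, which is all the rest of the server would hold
on to. An in-memory map stands in for the JCR workspace.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface (not James API): callers keep only the returned key.
interface StreamingMessageRepository {
    String store(InputStream data);   // consumes the stream, returns a repository-generated key
    byte[] retrieve(String key);      // returns null for an unknown key
}

// In-memory stand-in for a JCR-backed implementation; the map plays the role of storage.
class InMemoryStreamingRepository implements StreamingMessageRepository {
    private final Map<String, byte[]> contents = new ConcurrentHashMap<>();

    @Override
    public String store(InputStream data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = data.read(buffer)) != -1) {
                sink.write(buffer, 0, n); // a real backend could write straight to storage here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String key = UUID.randomUUID().toString(); // key chosen by the repository, not the caller
        contents.put(key, sink.toByteArray());
        return key;
    }

    @Override
    public byte[] retrieve(String key) {
        byte[] stored = contents.get(key);
        return stored == null ? null : stored.clone(); // defensive copy
    }
}
```

The buffer only exists because the map is the storage in this stand-in;
a real backend could hand the stream to the repository without holding
the whole message in memory.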

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by Danny Angus <da...@apache.org>.
On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> (Maybe
> JCR better fits the MessageRepository and JMS better fits the
> EnvelopeRepository for their respective persistence natures)

Well done, Stefano: in one line you've neatly summed up a week of
arguing at ApacheCon!

d.



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting ha scritto:
> OK. I guess the attributes can be any serializable objects, so the
> implementation should use standard Java serialization in case it
> doesn't know the type of the attribute.

Yes, Java serialization is the way to go.
"public Serializable setAttribute(String key, Serializable object)"

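As a rough sketch of what that implies for a repository, a hypothetical
AttributeCodec helper (not James code) can round-trip any Serializable
attribute value with the standard ObjectOutputStream/ObjectInputStream
classes, without knowing its concrete type:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: persists Mail attribute values of unknown type as bytes.
final class AttributeCodec {
    private AttributeCodec() {}

    // Serialize the attribute value with standard Java serialization.
    static byte[] encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // closing the ObjectOutputStream flushed everything
    }

    // Rebuild the value; its class must be on the classpath of the reader.
    static Serializable decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Serializable) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("attribute class not on classpath", e);
        }
    }
}
```

Decoding needs the attribute's class on the classpath, so a repository
should be prepared for ClassNotFoundException when reading attributes
written by other components.
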
> Is the repository implementation required to keep the original key
> (from mail.getName()) when storing a new message, or can it replace
> it with an internal identifier?

It is not documented, so to be safe you should keep the name.

On the other hand, I'm almost sure (99%) that our current codebase never
keeps "references" to the name.

The main problem with "repository generated keys" is that the current
interface does not give you a way to know what key has been assigned to
a stored message.

Just make sure that if I call store() for a mail with an existing key,
you update the same mail and do not create a new one.

>> Maybe some of these things should be changed; I simply replied with what
>> the current code does and expects.
> 
> ACK. Thanks a lot for the help!

You probably noticed that we currently have SpoolRepository and
MailRepository, and that SpoolRepository is a MailRepository with some
added methods (every MailRepository could expose SpoolRepository services
via a simple wrapper).
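
To make the "simple wrapper" idea concrete, here is a sketch under
stated assumptions: LockableStore is a hypothetical stand-in for a
MailRepository with per-key locks, and SpoolView adds a non-blocking
tryAccept() that returns the key of a mail it managed to lock, or null.
It does not model whether or how the real SpoolRepository.accept() waits.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal repository: stores mails by key and supports per-key locks.
class LockableStore {
    private final Map<String, String> mails = new ConcurrentHashMap<>();
    private final Set<String> locked = ConcurrentHashMap.newKeySet();

    void store(String key, String content) { mails.put(key, content); }
    Collection<String> keys() { return new ArrayList<>(mails.keySet()); }

    // Atomic: only one caller can add a given key to the locked set.
    boolean lock(String key) { return mails.containsKey(key) && locked.add(key); }
    void unlock(String key) { locked.remove(key); }
}

// Wrapper adding spool-style "give me an unlocked mail" behaviour on top of the store.
class SpoolView {
    private final LockableStore store;

    SpoolView(LockableStore store) { this.store = store; }

    // Non-blocking: returns a key now locked by this caller, or null if none is free.
    String tryAccept() {
        for (String key : store.keys()) {
            if (store.lock(key)) {
                return key;
            }
        }
        return null;
    }

    void release(String key) { store.unlock(key); }
}
```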

We have discussed many times changing this so as to have a
MessageRepository that should be similar to the current MailRepository
but only include the mail.message (MimeMessage) information, and an
EnvelopeRepository (to replace SpoolRepository) that should only store
the envelope (everything else you find in the Mail object but the
MimeMessage) and reference a Message from a MessageRepository. (Maybe
JCR better fits the MessageRepository and JMS better fits the
EnvelopeRepository for their respective persistence natures)
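
A sketch of how the proposed split might look in Java, with all names
hypothetical (none of these interfaces exist in James 2.3): the envelope
keeps a subset of the Mail envelope fields (name, sender, recipients,
state) plus the key of a message stored in a separate MessageRepository.
In-memory maps stand in for JCR and JMS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical: stores only MimeMessage content.
interface MessageRepository {
    String storeMessage(byte[] mimeContent); // returns the key of the stored message
    byte[] retrieveMessage(String messageKey);
}

// Envelope data from Mail (a subset) plus a reference to the stored message.
final class Envelope {
    final String name;
    final String sender;
    final List<String> recipients;
    final String state;
    final String messageKey;

    Envelope(String name, String sender, List<String> recipients, String state, String messageKey) {
        this.name = name;
        this.sender = sender;
        this.recipients = Collections.unmodifiableList(new ArrayList<>(recipients));
        this.state = state;
        this.messageKey = messageKey;
    }
}

// Hypothetical replacement for SpoolRepository: stores envelopes only.
interface EnvelopeRepository {
    void storeEnvelope(Envelope envelope); // keyed by envelope.name
    Envelope retrieveEnvelope(String name);
}

class InMemoryMessageRepository implements MessageRepository {
    private final Map<String, byte[]> messages = new ConcurrentHashMap<>();

    @Override
    public String storeMessage(byte[] mimeContent) {
        String key = UUID.randomUUID().toString();
        messages.put(key, mimeContent.clone());
        return key;
    }

    @Override
    public byte[] retrieveMessage(String messageKey) {
        byte[] stored = messages.get(messageKey);
        return stored == null ? null : stored.clone();
    }
}

class InMemoryEnvelopeRepository implements EnvelopeRepository {
    private final Map<String, Envelope> envelopes = new ConcurrentHashMap<>();

    @Override
    public void storeEnvelope(Envelope envelope) { envelopes.put(envelope.name, envelope); }

    @Override
    public Envelope retrieveEnvelope(String name) { return envelopes.get(name); }
}
```

One consequence of this shape is that several envelopes can reference
the same stored message body.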

We never did that because this would break backward compatibility and
current trunk is still storage+config.xml compatible with 2.3.0/2.3.1
releases.

IMHO backward compatibility is not a big issue: we should just take care
to tag the last trunk version before "breaking" it.

Thank you too,
Stefano

Some related links, if you are interested:
http://issues.apache.org/jira/browse/JAMES-521
http://www.mail-archive.com/server-dev@james.apache.org/msg04986.html
http://www.mail-archive.com/server-dev@james.apache.org/msg07251.html
http://www.mail-archive.com/server-dev@james.apache.org/msg07261.html
http://www.mail-archive.com/server-dev@james.apache.org/msg07283.html




Re: Questions on the Mail and MailRepository interfaces

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/15/07, Stefano Bagnara <ap...@bago.org> wrote:
> Jukka Zitting ha scritto:
> > c) What parts of a mail message need to be stored when
> > MailRepository.store(Mail) is called? For example should the mail name
> > (Mail.getName()) or attributes (Mail.hasAttributes(), etc.) be stored?
>
> The name is the key used to retrieve it later.
> Attributes must be persisted. Every property you see in the Mail object
> is persisted (name, sender, recipients, remoteaddr, remotehost,
> errormessage, message, state, lastupdated).

OK. I guess the attributes can be any serializable objects, so the
implementation should use standard Java serialization in case it
doesn't know the type of the attribute.

> > d) Should mailRepository.store(mailRepository.retrieve(...)) update
> > the existing message or create a new one?
>
> Update the existing message.
> Store using an existing key is an update. Store using a new key is an
> insert.

Is the repository implementation required to keep the original key
(from mail.getName()) when storing a new message, or can it replace
it with an internal identifier?

> Maybe some of these things should be changed; I simply replied with what
> the current code does and expects.

ACK. Thanks a lot for the help!

BR,

Jukka Zitting



Re: Questions on the Mail and MailRepository interfaces

Posted by Stefano Bagnara <ap...@bago.org>.
Jukka Zitting ha scritto:
> Hi,
> 
> I'm working on the JCR mail repository for James, and I'd like to
> better understand the Mail and MailRepository interfaces.
> 
> a) Should implementations of the MailRepository class be thread-safe?

YES

> b) Should the Avalon lifecycle interfaces be used when implementing
> MailRepository?

If you need a lifecycle, yes.
If you don't have resources/dependencies/configuration/initialization,
you can skip this.
My suggestion is to place the core non-Avalon code in the main class and
then write an Avalon-enabled wrapper for that class (or an extended
version implementing the Avalon interfaces).
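
A minimal sketch of that split, assuming simplified lifecycle
interfaces: SimpleConfigurable, SimpleInitializable and SimpleDisposable
are hypothetical stand-ins, not the real Avalon framework interfaces,
and JcrMailStore is an illustrative name for the container-free core
class.

```java
import java.util.Map;

// Plain core class: no container dependencies, constructed with everything it needs.
class JcrMailStore {
    private final String workspace;
    private boolean open = true;

    JcrMailStore(String workspace) { this.workspace = workspace; }

    String workspace() { return workspace; }
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Hypothetical stand-ins for container lifecycle callbacks (not the Avalon API).
interface SimpleConfigurable { void configure(Map<String, String> config); }
interface SimpleInitializable { void initialize(); }
interface SimpleDisposable { void dispose(); }

// Thin wrapper that adapts the core class to the container lifecycle.
class ManagedJcrMailStore implements SimpleConfigurable, SimpleInitializable, SimpleDisposable {
    private String workspace;
    private JcrMailStore delegate;

    @Override
    public void configure(Map<String, String> config) {
        workspace = config.getOrDefault("workspace", "default");
    }

    @Override
    public void initialize() {
        if (workspace == null) {
            throw new IllegalStateException("configure() must be called first");
        }
        delegate = new JcrMailStore(workspace);
    }

    @Override
    public void dispose() {
        if (delegate != null) {
            delegate.close();
        }
    }

    JcrMailStore delegate() { return delegate; }
}
```

The core class can then be constructed directly in unit tests, while
only the thin wrapper depends on the container.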

> c) What parts of a mail message need to be stored when
> MailRepository.store(Mail) is called? For example should the mail name
> (Mail.getName()) or attributes (Mail.hasAttributes(), etc.) be stored?

The name is the key used to retrieve it later.
Attributes must be persisted. Every property you see in the Mail object
is persisted (name, sender, recipients, remoteaddr, remotehost,
errormessage, message, state, lastupdated).

> d) Should mailRepository.store(mailRepository.retrieve(...)) update
> the existing message or create a new one?

Update the existing message.
Store using an existing key is an update. Store using a new key is an
insert.

> e) Should mailRepository.retrieve(...).setMessage(...) update the mail
> repository?

No. You have to call store() if you want to update the repository.

> f) Should mailRepository.retrieve(...).getMessage().setFrom(...)
> update the mail repository?

same as "e)"
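
A sketch of the semantics described for a), d), e) and f), using a
hypothetical SimpleMail stand-in for org.apache.mailet.Mail (envelope
fields only). The repository is a thread-safe map keyed by mail name,
store() saves a snapshot, and retrieve() hands back a detached copy, so
edits are only persisted by an explicit store(). Returning copies is
one way to get these semantics, assumed here; it is not necessarily how
the existing implementations do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for org.apache.mailet.Mail.
class SimpleMail {
    private final String name;
    private String state;
    private final List<String> recipients;

    SimpleMail(String name, String state, List<String> recipients) {
        this.name = name;
        this.state = state;
        this.recipients = new ArrayList<>(recipients);
    }

    SimpleMail copy() { return new SimpleMail(name, state, recipients); }
    String getName() { return name; }
    String getState() { return state; }
    void setState(String state) { this.state = state; }
    List<String> getRecipients() { return recipients; }
}

// Thread-safe (question a) repository keyed by mail name.
class InMemoryMailRepository {
    private final Map<String, SimpleMail> mails = new ConcurrentHashMap<>();

    // Existing name: update; new name: insert (question d). Stores a snapshot.
    void store(SimpleMail mail) { mails.put(mail.getName(), mail.copy()); }

    // Detached copy: changes are invisible until store() is called (questions e and f).
    SimpleMail retrieve(String name) {
        SimpleMail stored = mails.get(name);
        return stored == null ? null : stored.copy();
    }

    int size() { return mails.size(); }
}
```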

> g) Should mail.setRecipients(...) affect
> mail.getMessage().getAllRecipients()?

No. Mail recipients are the envelope recipients and are not related to
the MimeMessage header recipients. A repository implementation should
completely ignore the "message" content and behaviour.

> h) Should mail.getMessage().addRecipients(...) affect mail.getRecipients()?

same as "g)".
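
A small sketch of why the two sets are independent, assuming a
hypothetical StoredMail class that keeps the envelope recipients as a
list and the MimeMessage as opaque bytes the repository never parses;
changing the envelope leaves the message headers untouched:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical snapshot of what a repository persists for one mail.
final class StoredMail {
    final List<String> envelopeRecipients; // SMTP RCPT TO addresses, i.e. Mail.getRecipients()
    final byte[] rawMessage;               // MimeMessage bytes, stored opaquely and never parsed

    StoredMail(List<String> envelopeRecipients, byte[] rawMessage) {
        this.envelopeRecipients = Collections.unmodifiableList(new ArrayList<>(envelopeRecipients));
        this.rawMessage = rawMessage.clone();
    }

    // New envelope, same message bytes (and therefore the same To/Cc headers).
    StoredMail withRecipients(List<String> newRecipients) {
        return new StoredMail(newRecipients, rawMessage);
    }
}
```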

> I'll probably come up with more questions, but I guess these are a
> good starting point. :-)

You're welcome :-)

> BR,
> 
> Jukka Zitting

Maybe some of these things should be changed; I simply replied with what
the current code does and expects.

Stefano

