Posted to server-dev@james.apache.org by Jean Helou <je...@gmail.com> on 2022/06/10 08:22:17 UTC

MailstoreApi (was Re: ElasticSearch upgrade to 8.2)

I fork the thread to respond on the MailRepository part :D


> > fun quiz: can you tell, looking only at the documentation and code
> > comments, the difference between MailRepositoryStore,
> > MailRepositoryUrlStore and MailRepository, all of which are in
> > mailrepository-api )
> Game accepted.
>   - A mail repository is storage for emails, with their processing context
> (long term storage, which differs from a mail queue, which is a flow).
>   - Mail repositories are identified by their URL
>   - A mail repository can be created through the mail repository store
> by supplying a URL
>   - MailRepositoryUrlStore is an implementation detail of
> MailRepositoryStore, and brings persistence to mail repositories (those
> created through webadmin, configuration changes, etc.)
>

Almost :D
MailRepositoryStore only has 2 implementations: an in-memory one and a
Spring-based one.

If I understand it correctly, the MailRepositoryStore is actually a
computing cache.
It is roughly equivalent to a Map<MailRepositoryUrl, MailRepository> with a
generic factory method that creates the MailRepository when it is not already
in the cache.
The factory method relies on a statically injected config that maps
"protocols" to the fully qualified class name of the corresponding
implementation. When the "store" resolves a MailRepository through its
MailRepositoryUrl, it looks up the class name from the protocol part of the
URL, then delegates to Spring, Guice or a static map to actually get the
corresponding implementation.
The names "Store" and "InMemory" got me mightily confused when I tried to
sort this out to inject the blob mail repository store.
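
To make the "computing cache" reading concrete, here is a minimal Java sketch
of the idea. RepoUrl, Repo and ComputingRepoStore are hypothetical stand-ins,
not the James classes, and the per-protocol factories are an assumption that
replaces the class name lookup Spring or Guice would perform.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for MailRepositoryUrl: a protocol plus a repository path.
final class RepoUrl {
    final String protocol;
    final String path;

    private RepoUrl(String protocol, String path) {
        this.protocol = protocol;
        this.path = path;
    }

    // Splits "cassandra://var/mail/error" into "cassandra" and "var/mail/error".
    static RepoUrl parse(String url) {
        int separator = url.indexOf("://");
        if (separator <= 0) {
            throw new IllegalArgumentException("Missing protocol in " + url);
        }
        return new RepoUrl(url.substring(0, separator), url.substring(separator + 3));
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof RepoUrl
            && ((RepoUrl) other).protocol.equals(protocol)
            && ((RepoUrl) other).path.equals(path);
    }

    @Override
    public int hashCode() {
        return 31 * protocol.hashCode() + path.hashCode();
    }
}

// Hypothetical stand-in for MailRepository.
interface Repo {
    RepoUrl url();
}

// The "store": a cache keyed by URL that builds missing entries on demand.
final class ComputingRepoStore {
    private final Map<String, Function<RepoUrl, Repo>> factoriesByProtocol;
    private final Map<RepoUrl, Repo> cache = new ConcurrentHashMap<>();

    ComputingRepoStore(Map<String, Function<RepoUrl, Repo>> factoriesByProtocol) {
        this.factoriesByProtocol = factoriesByProtocol;
    }

    // Returns the cached repository, creating it from the protocol's factory if absent.
    Repo select(RepoUrl url) {
        return cache.computeIfAbsent(url, key -> {
            Function<RepoUrl, Repo> factory = factoriesByProtocol.get(key.protocol);
            if (factory == null) {
                throw new IllegalArgumentException("Unsupported protocol: " + key.protocol);
            }
            return factory.apply(key);
        });
    }
}
```

Selecting the same URL twice returns the same instance, which is the only
"storing" this kind of store does.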

MailRepositoryUrls start with a protocol such as cassandra://, blob:// or
file://, followed by a repository "id".

The MailRepositoryUrlStore has 3 implementations: Cassandra, JPA and
in-memory. I am not yet clear on what having a persistent store brings over
using the in-memory store.

> Now to have a MailRepositoryStore not based on Cassandra, the memory
> implementation is good enough if manual creation of mail repositories is
> forbidden (aka through webadmin) and if configuration is homogeneous in
> the James cluster.
>

Even if you were to create a mail repository manually, I don't understand
how anything would be stored in it if it is not mentioned in the server's
configuration (mailet container's config most likely).
Even if there is a way to dynamically make James store mails in a mail
repository that is not mentioned in the configuration, the in-memory
implementation will still register it when it is used. I guess that only
leaves discoverability of existing MailRepositoryUrls across restarts when
a URL is not used much. That leaves me wondering what the actual use case
is...


> Ideally MailRepositoryUrlStore should not have been in the API.
>

Interesting, according to git history it was introduced by
https://issues.apache.org/jira/browse/JAMES-2418 but that only says that
the last point I mention above (the discoverability part) is needed,
not why it is needed :D

cheers
jean


Cheers

Re: MailstoreApi (was Re: ElasticSearch upgrade to 8.2)

Posted by Jean Helou <je...@gmail.com>.
Thanks a lot for the additional context, Benoit.

I have added that as a comment on the issue to make it more readily
discoverable. When we reach the cleanup phase, I'll try to add some
documentation to the interfaces in the code.

Cheers
jean


Re: MailstoreApi (was Re: ElasticSearch upgrade to 8.2)

Posted by Jean Helou <je...@gmail.com>.
On Fri, Jun 10, 2022 at 11:28 AM Benoit TELLIER <bt...@apache.org> wrote:

> By the way, https://github.com/apache/james-project/pull/1046
>
> There is zero isolation between blob based mail repositories.
>
> Mails from /var/mail/error are mixed in with those from (e.g.)
> /var/mail/spam, which prevents any treatment based on the mail repository
> the email was stored in.
>

I have commented on the PR: I feel the lack of isolation comes from an
invalid setup in the blob test. If you use the same approach as the one
used for the Cassandra implementation, you won't have any issue.


> I think a fix would be to infer the bucket used for storage, based on
> the mail repository URL that could be injected upon creation by the
> guice stack (just like Cassandra mail repository).
>
> Would you agree to fix this?
>

This is something we became aware of while implementing the Guice module.
We are going to address it during our next session; see the attached picture
of our WIP marker :D

Regards,
Jean


> Regards,
>
> Benoit
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
> For additional commands, e-mail: server-dev-help@james.apache.org
>
>

Re: MailstoreApi (was Re: ElasticSearch upgrade to 8.2)

Posted by Benoit TELLIER <bt...@apache.org>.
By the way, https://github.com/apache/james-project/pull/1046

There is zero isolation between blob-based mail repositories.

Mails from /var/mail/error are mixed in with those from (e.g.)
/var/mail/spam, which prevents any treatment based on the mail repository
the email was stored in.

I think a fix would be to infer the bucket used for storage from
the mail repository URL, which could be injected upon creation by the
Guice stack (just like for the Cassandra mail repository).
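
As an illustration of inferring a bucket from the repository URL, here is a
minimal Java sketch. BucketNames and its normalization rule are hypothetical,
not what the PR does; the rule (lowercase, runs of other characters replaced
by a hyphen) is an assumption, and it ignores length limits and collisions
such as var/mail-error versus var/mail/error.

```java
import java.util.Locale;

final class BucketNames {
    // Hypothetical helper: derives one bucket name per mail repository URL,
    // e.g. "blob://var/mail/error" gives "var-mail-error".
    static String fromRepositoryUrl(String url) {
        int separator = url.indexOf("://");
        // Keep only the path part; accept bare paths such as "/var/mail/error" too.
        String path = separator < 0 ? url : url.substring(separator + 3);
        String normalized = path.toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", "-")   // collapse separators and other characters
            .replaceAll("^-+|-+$", "");      // trim leading and trailing hyphens
        if (normalized.isEmpty()) {
            throw new IllegalArgumentException("Cannot derive a bucket name from " + url);
        }
        return normalized;
    }
}
```

With such a mapping, /var/mail/error and /var/mail/spam end up in different
buckets, so each repository can be listed or purged on its own.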

Would you agree to fix this?

Regards,

Benoit

On 10/06/2022 15:53, Benoit TELLIER wrote:
> Hello Jean,
>
> Answers inlined.


Re: MailstoreApi (was Re: ElasticSearch upgrade to 8.2)

Posted by Benoit TELLIER <bt...@apache.org>.
Hello Jean,

Answers inlined.

On 10/06/2022 15:22, Jean Helou wrote:
> I fork the thread to respond on the MailRepository part :D
>
>
> > > fun quiz: can you tell, looking only at the documentation and code
> > > comments, the difference between MailRepositoryStore,
> > > MailRepositoryUrlStore and MailRepository, all of which are in
> > > mailrepository-api )
> > Game accepted.
> >   - A mail repository is storage for emails, with their processing context
> > (long term storage, which differs from a mail queue, which is a flow).
> >   - Mail repositories are identified by their URL
> >   - A mail repository can be created through the mail repository store
> > by supplying a URL
> >   - MailRepositoryUrlStore is an implementation detail of
> > MailRepositoryStore, and brings persistence to mail repositories (those
> > created through webadmin, configuration changes, etc.)
>
>
> Almost :D
> MailRepositoryStore only has 2 implementations: an in-memory one and a
> Spring-based one.
>
> If I understand it correctly, the MailRepositoryStore is actually a
> computing cache.
> It is roughly equivalent to a Map<MailRepositoryUrl, MailRepository> with a
> generic factory method that creates the MailRepository when it is not
> already in the cache.
Yes.
> The factory method relies on a statically injected config that maps
> "protocols" to the fully qualified class name of the corresponding
> implementation. When the "store" resolves a MailRepository through its
> MailRepositoryUrl, it looks up the class name from the protocol part of the
> URL, then delegates to Spring, Guice or a static map to actually get the
> corresponding implementation.
> The names "Store" and "InMemory" got me mightily confused when I tried to
> sort this out to inject the blob mail repository store.
>
> MailRepositoryUrls start with a protocol such as cassandra://, blob:// or
> file://, followed by a repository "id".
>
> The MailRepositoryUrlStore has 3 implementations: Cassandra, JPA and
> in-memory. I am not yet clear on what having a persistent store brings
> over using the in-memory store.
>
> > Now to have a MailRepositoryStore not based on Cassandra, the memory
> > implementation is good enough if manual creation of mail repositories is
> > forbidden (aka through webadmin) and if configuration is homogeneous in
> > the James cluster.
>
>
> Even if you were to create a mail repository manually, I don't 
> understand how anything would be stored in it if it is not mentioned 
> in the server's configuration (mailet container's config most likely).
One case is "it was mentioned in the server configuration, and no longer 
is".

Without such persistence you could not, for instance, reprocess mail 
repositories that you had been using.

--------------

Another case is the "parametric mail repository", for instance
cassandra://var/mail/customera.com/rejected

One such example is Data Leak Prevention, cf.
https://github.com/apache/james-project/tree/master/server/mailet/mailets/src/main/java/org/apache/james/transport/matchers/dlp

And its friend
https://github.com/apache/james-project/blob/master/server/mailet/mailets/src/main/java/org/apache/james/transport/mailets/ToSenderDomainRepository.java

I might want to access a mail repository that exists and contains stuff, but
is not provisioned locally because the James server I am using has not
yet rejected an email for this domain since it was started.
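
A minimal Java sketch of that situation, under stated assumptions:
InMemoryUrlRegistry and RejectedMailRouting are hypothetical names, and the
URL layout simply copies the cassandra://var/mail/customera.com/rejected
example above. It is not the ToSenderDomainRepository code.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-node registry: it only knows the URLs this node used since startup.
final class InMemoryUrlRegistry {
    private final Set<String> urls = new TreeSet<>();

    void register(String url) {
        urls.add(url);
    }

    Set<String> list() {
        return new TreeSet<>(urls);
    }
}

final class RejectedMailRouting {
    private final InMemoryUrlRegistry registry;

    RejectedMailRouting(InMemoryUrlRegistry registry) {
        this.registry = registry;
    }

    // The repository URL is computed from the mail, not read from configuration.
    static String repositoryUrlFor(String senderDomain) {
        return "cassandra://var/mail/" + senderDomain.toLowerCase(Locale.ROOT) + "/rejected";
    }

    // Called when a mail from the given domain is rejected; returns the URL used.
    String reject(String senderDomain) {
        String url = repositoryUrlFor(senderDomain);
        registry.register(url);
        return url;
    }
}
```

A registry created after a restart starts empty, so the repository for
customera.com stays invisible on that node until another mail from that
domain is rejected, even though it still holds mails in the backing storage.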

--------------

Another thing is the difference between the mail repository URL and path
(which I am not a fan of).

The idea was not to leak the underlying storage structure through webadmin:

URL: cassandra://var/mail/error
PATH: var/mail/error

You then need to translate between the path and the URL, which is
not trivial when several underlying storage technologies are in use (JDBC +
file for example).
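
The asymmetry can be sketched in a few lines of Java. UrlPathTranslation is a
hypothetical helper working on plain strings, not the James API, and the list
of protocols is assumed to come from configuration.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlPathTranslation {
    // URL to path is simple: drop the protocol prefix.
    static String toPath(String url) {
        int separator = url.indexOf("://");
        if (separator < 0) {
            throw new IllegalArgumentException("Not a repository URL: " + url);
        }
        return url.substring(separator + 3);
    }

    // Path to URL is ambiguous: every protocol in use yields a candidate.
    static List<String> candidateUrls(String path, List<String> protocols) {
        List<String> candidates = new ArrayList<>();
        for (String protocol : protocols) {
            candidates.add(protocol + "://" + path);
        }
        return candidates;
    }
}
```

With both jdbc and file in use, the path var/mail/error maps to two candidate
URLs, and the caller has to decide whether to query one of them or both.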

> Even if there is a way to dynamically make James store mails in a mail
> repository that is not mentioned in the configuration, the in-memory
> implementation will still register it when it is used. I guess that
> only leaves discoverability of existing MailRepositoryUrls across
> restarts when a URL is not used much. That leaves me wondering what
> the actual use case is...
Well, the one time I had to deal with mail repositories with a customer,
listing them was handy.

That being said, I also share the feeling that "listing URLs in use"
through MailRepositoryUrlStore might be overkill.

Instead, we could rely on each MailRepository implementation to list the
URLs it actually contains, and thus drop MailRepositoryUrlStore
altogether, or at least make it an implementation detail.

We would get:
   - A MailRepositoryUrlSupplier interface with an implementation for each
MailRepository implementation.
   - Implementations can base decisions on their underlying storage, thus
removing the need for additional metadata.
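
A rough Java sketch of what this could look like. The interface name comes
from the proposal above, but its signature, the plain String URLs and the
MailRepositoryUrlListing aggregator are assumptions made for illustration;
nothing like this exists in James yet.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical shape for the proposed interface: one implementation per
// storage technology, each listing URLs from its own underlying storage.
interface MailRepositoryUrlSupplier {
    Stream<String> listUrls();
}

// Hypothetical aggregator that webadmin could use to list every URL in use.
final class MailRepositoryUrlListing {
    private final List<MailRepositoryUrlSupplier> suppliers;

    MailRepositoryUrlListing(List<MailRepositoryUrlSupplier> suppliers) {
        this.suppliers = suppliers;
    }

    // Merges the URLs reported by every supplier, deduplicated and sorted.
    List<String> listAll() {
        return suppliers.stream()
            .flatMap(MailRepositoryUrlSupplier::listUrls)
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }
}
```

Each supplier would read from the storage it already owns (a table, a
directory, a bucket listing), so no separate URL table has to be kept in sync.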

I would support such a refactoring. One less Cassandra table makes me 
happy ;-)

> > Ideally MailRepositoryUrlStore should not have been in the API.
>
>
> Interesting, according to git history it was introduced by
> https://issues.apache.org/jira/browse/JAMES-2418 but that only says
> that the last point I mention above (the discoverability part) is
> needed, not why it is needed :D
I hope I did get better at writing issues since then :-P
>
> cheers
> jean
>
>
> Cheers