Posted to server-dev@james.apache.org by Benoit TELLIER <bt...@apache.org> on 2022/05/09 10:47:14 UTC

"Branding" of pulsar-cassandra-smtp-relay

Hello all,

Matthieu and Jean have recently invested work in integrating Apache 
James and Pulsar. The resulting artifact is pulsar-cassandra-smtp-relay.

The name makes it hard to grasp what goals are pursued, and we also lack 
a short statement/README about this new product.

Here is a short explanation of the goal of this application that I can propose:

```
This artifact leverages Apache Pulsar to deliver a mailet container that 
scales your mail processing.
It can be used both for inbound and outbound behavior customization.
Thanks to Apache Pulsar, get your hands on a fully featured distributed 
mail queue to efficiently manage your email delivery.
It targets minimal dependencies: solely Apache Pulsar and an object store.
```

If we reach a consensus I would be happy to write its README, including 
sample configuration and start instructions.

I understand from recent tickets (JAMES-3761, JAMES-3762 and JAMES-3763) 
that the intent is to not have Cassandra as a dependency. Some features 
are optional and can easily be dropped (recipient rewriting); however, 
some others would eventually need an alternative implementation (I am 
thinking of domains and users). What is the plan regarding this?

I am uneasy with the artifact name. Generally we tend to be too 
technical in artifact names and base them on backing technologies 
rather than focusing on the intent. Also, 'relay' is one of the many 
possible features, but there are others. As such I think the 
'smtp-relay' part of the name is too restrictive.

Alternative concepts for names:
  - Replace smtp-relay with smtp only to make it more generic
  - Replace smtp-relay with mail-processing, which I think might be the 
main target of such a server.
  - Replace pulsar-cassandra with a scaling or distributed prefix that 
would emphasize the intent rather than the technology
  - If we are to stick with technology names maybe just keep 'pulsar' as 
Cassandra usage looks accidental and contributors

Which would lead to (scalar combination of the above):
  - pulsar-smtp / distributed-smtp
  - pulsar-mail-processing / distributed-mail-processing / 
scaling-mail-processing

Thoughts?

Best regards,

Benoit TELLIER


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: "Branding" of pulsar-cassandra-smtp-relay

Posted by Jean Helou <je...@gmail.com>.
Hello :)

> Did you have a look at configuration-based server-data implementations?

We considered it for a while, maybe we should revisit it.

> They could fit your goals while minimizing storage costs.

Indeed, but XML is scary ;)

Jean


> I'm thinking of XMLDomainList and XMLRecipientRewriteTable. We would likely
> miss a corresponding XML user repository, but this should not be hard to
> implement if needed.
>

> For the artifact name, +1 for scaling-pulsar-smtp.
>
> Best regards,
>
> Benoit
>

Re: "Branding" of pulsar-cassandra-smtp-relay

Posted by Benoit TELLIER <bt...@apache.org>.
Hello Jean,

Did you have a look at configuration-based server-data implementations?

They could fit your goals while minimizing storage costs.

I'm thinking of XMLDomainList and XMLRecipientRewriteTable. We would likely 
miss a corresponding XML user repository, but this should not be hard to 
implement if needed.
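
To illustrate the configuration-based idea, here is a rough sketch of a 
read-only domain list whose content comes entirely from static 
configuration, so it needs no database at all. This is only an 
illustration under stated assumptions: it does not use the actual James 
interfaces, and the class name `ConfiguredDomainList`, its constructor and 
its String-based method signatures are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for a configuration-backed domain list.
// It is NOT the James API: names and signatures are assumptions for illustration.
final class ConfiguredDomainList {
    private final Set<String> domains;
    private final String defaultDomain;

    // The domains would typically be read once from a configuration file at startup.
    ConfiguredDomainList(List<String> configuredDomains, String defaultDomain) {
        Objects.requireNonNull(configuredDomains, "configuredDomains");
        Objects.requireNonNull(defaultDomain, "defaultDomain");
        Set<String> normalized = new LinkedHashSet<>();
        for (String domain : configuredDomains) {
            normalized.add(normalize(domain));
        }
        this.defaultDomain = normalize(defaultDomain);
        // The default domain is always part of the served domains.
        normalized.add(this.defaultDomain);
        this.domains = Collections.unmodifiableSet(normalized);
    }

    // Domain names are compared case-insensitively, ignoring surrounding whitespace.
    private static String normalize(String domain) {
        return domain.trim().toLowerCase(Locale.ROOT);
    }

    Set<String> getDomains() {
        return domains;
    }

    boolean containsDomain(String domain) {
        return domains.contains(normalize(domain));
    }

    String getDefaultDomain() {
        return defaultDomain;
    }

    // Static configuration cannot be changed at runtime, so writes are rejected.
    void addDomain(String domain) {
        throw new UnsupportedOperationException("Domains are defined by configuration");
    }

    void removeDomain(String domain) {
        throw new UnsupportedOperationException("Domains are defined by configuration");
    }
}
```

The trade-off is that adding or removing a domain requires a 
configuration change and a restart, which seems acceptable for a small 
deployment.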

For the artifact name, +1 for scaling-pulsar-smtp.

Best regards,

Benoit



Re: "Branding" of pulsar-cassandra-smtp-relay

Posted by Jean Helou <je...@gmail.com>.
Hello Benoit !

Matthieu and I are trying to create a server instance to host our family's
email :)
Being the geeks that we are, we are not satisfied with a single-node JPA
deployment and want to preserve the scalability and resilience properties
of distributed James while minimizing the run cost of a small-scale
deployment, so as to keep our hobby in a reasonable[1] price range.

Since a mail server is such a complex piece of software, we chose to
concentrate our efforts on a smaller scope first: SMTP processing only.
Our initial deployment will thus be SMTP only, then we intend to look into
building an IMAP/JMAP instance, a bit like what's described in
"distributed james - specialized instances" [2]. At least that's what's
guiding our forays into the James codebase.

We looked at various options and found a Pulsar-as-a-service offering
that's fairly cheap over at clever-cloud; they also have a reasonably cheap
S3 clone. This is what led to the initial Pulsar development, as the cost
of other MQ SaaS offerings was much higher.
That leaves the primary datastore: we initially targeted Cassandra because
we knew that worked, but that made the run cost explode.

At the moment we intend to start with a hosted PostgreSQL instance
instead, if only to finally be able to see the Pulsar code run without
ruining ourselves :)

With this clarified, I'll try to answer your mail:

> The name makes it hard to grasp what goals are pursued, and we also lack
> a short statement/README about this new product.
>

The goals did change a couple of times, but it was always meant to be a
"simpler", SMTP-specialized instance compared to what could be achieved
with the main distributed server app.

> Here is a short explanation of the goal of this application that I can propose:
>
> ```
> This artifact leverages Apache Pulsar to deliver a mailet container that
> scales your mail processing.
> It can be used both for inbound and outbound behavior customization.
> Thanks to Apache Pulsar, get your hands on a fully featured distributed
> mail queue to efficiently manage your email delivery.
> It targets minimal dependencies: solely Apache Pulsar and an object store.
> ```
>
> If we reach a consensus I would be happy to write its README, including
> sample configuration and start instructions.
>

Minimal dependencies to this level would be great but we looked into it and
we don't expect to be able to minimize them that much, in the short/medium
term.


> I understand from recent tickets (JAMES-3761, JAMES-3762 and JAMES-3763)
> that the intent is to not have Cassandra as a dependency. Some features
> are optional and can easily be dropped (recipient rewriting); however,
> some others would eventually need an alternative implementation (I am
> thinking of domains and users). What is the plan regarding this?
>

Offering an alternative to Cassandra, which is an expensive fit for smaller
scale deployments, would indeed require an alternative implementation of:
- DomainList (5 methods)
- RecipientRewriteTable with its regex mapping (26 methods)
- UserRepository (15 methods)

I am not sure we can completely drop RecipientRewriteTable. I think we
looked into it with Matthieu and it wasn't as optional as it first looked.
I think maybe it had something to do with error handling which made it a
mandatory dependency for the mailet container.
Maybe we will try dropping it again in our next attempt at starting the
app...

If we can't drop it, it means 46 behaviours to implement. That's quite a
lot for the time we can allocate to our evening fun-first pair programming
sessions :)
For the time being, we intend to run with JPA instead so we can get our
instances up and running, start accumulating feedback from a real world
deployment and work on documenting the configuration and run sides.

I realize we could build our assembly privately or in a different
repository, but we also need a demo place to plug in the implementations we
consider to be the most valuable for the community: the Pulsar mailqueue
and the blob store mailrepository.
I do feel that these two are components that most James deployments would
benefit from.

> Regarding the artifact name I am uneasy with it. Generally we tend to be
> too technical on the artifact name and base it on backing technologies
> rather than focussing on the intent. Also, 'relay' is one of the many
> possible features, but there are others. As such I think the
> 'smtp-relay' part of the name is too restrictive.
>

I wholeheartedly agree, I am quite ashamed of being the author of the
current naming.

> Alternative concepts for names:
>   - Replace smtp-relay with smtp only to make it more generic
>

+1

>   - Replace smtp-relay with mail-processing, which I think might be the
> main target of such a server.
>

`mail-processing` feels too generic to me: while it might be a good fit, I
wouldn't be able to tell what such an app does. I think smtp is a better
name, as that is the only mail protocol spoken by the app.

>   - Replace pulsar-cassandra with a scaling or distributed prefix that
> would emphasize the intent rather than the technology
>

Within the context of what I explained above, I am not sure which, if any,
is appropriate.
Maybe scaling is less constraining than distributed. If we end up depending
upon JPA we probably won't be able to say "distributed" :)

>   - If we are to stick with technology names maybe just keep 'pulsar' as
> Cassandra usage looks accidental and contributors
>

Cassandra usage is indeed accidental: the initial app cloned the
distributed app, simplified it and swapped RabbitMQ for Pulsar.

> Which would lead to (scalar combination of the above):
>   - pulsar-smtp / distributed-smtp
>   - pulsar-mail-processing / distributed-mail-processing /
> scaling-mail-processing
>

So maybe scaling-pulsar-smtp?

Jean

[1] For some definition of "reasonable" that we carefully avoid
investigating too closely; ignorance is bliss
[2]
https://github.com/apache/james-project/blob/master/server/apps/distributed-app/docs/modules/ROOT/pages/architecture/specialized-instances.adoc#distributed-james-server--specialized-instances
I couldn't navigate to this page from the James website. I remember having
done so accidentally in the past, but I couldn't find the path this time
around. Also https://james.apache.org/server/objectives.html has a dead
link to Distributed Email server
<https://james.staged.apache.org/james-distributed-app/3.7.0/index.html>