
Posted to server-dev@james.apache.org by "btellier@linagora.com (OpenPaaS)" <bt...@linagora.com> on 2020/12/04 07:22:00 UTC

About our usage of LWT in Cassandra related code

Hi,

I'm currently trying to increase the overall efficiency of the
Distributed James server.

As such, I'm poking around for improvement areas and found a huge topic
around LWT.

My conclusions so far are that we should keep LWT and the SERIAL
consistency level out of the most common use cases.

I know that this is a massive change with regard to the way the project
has been working with Cassandra over the past few years. In the medium
term, I would definitely like to reach LWT-free reads on the Cassandra
Mailbox, to scale the deployments I am responsible for as part of my
Linagora job (my long-term goal being to decrease the total cost of
ownership of a "Distributed James" based solution). While I am not
opposed to diverging from the Apache James project on this point if
needed, I do believe an efficient distributed server (with the
consequences it implies in terms of eventual consistency) might be a
strong asset for the Apache project as well, and I would prefer to see
this work land in the James project.

I've been ambitious in writing the ADR, especially in the complementary
work section. Let's see what common ground we find on that! (The ML
version below serves as a public, immutable reference of my thinking!)

Cheers,

Benoit

-------------------------------------------------------------------

## Context

As any kind of server, James needs to provide some level of consistency.

Strong consistency can be achieved with Cassandra by relying on
Lightweight Transactions (LWT). This enables optimistic transactions on
a single partition key.

Under the hood, Cassandra relies on the Paxos algorithm to achieve
consensus across replicas, allowing us to achieve linearizable
consistency at the entry level. To do so, Cassandra tracks consensus in
the `system.paxos` table. This `system.paxos` table needs to be checked
upon reads as well, in order to ensure the latest state of the ongoing
consensus is known. This can be achieved by using the SERIAL consistency
level.
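The optimistic, single-partition semantics described above can be sketched in plain Java. This is a minimal in-memory model, not James code. The names `LwtAclStore` and `VersionedRow` and the `version` column are hypothetical. A `ConcurrentHashMap` compare-and-set stands in for the Paxos round that Cassandra runs across replicas for a conditional statement such as `UPDATE ... IF version = ?`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical row of an "acl" table: a version number plus user -> rights.
final class VersionedRow {
    final long version;
    final Map<String, String> rights;

    VersionedRow(long version, Map<String, String> rights) {
        this.version = version;
        this.rights = Map.copyOf(rights);
    }
}

// In-memory stand-in for a table whose writes are guarded by LWT.
final class LwtAclStore {
    private static final VersionedRow EMPTY = new VersionedRow(0, Map.of());
    private final ConcurrentHashMap<String, VersionedRow> rows = new ConcurrentHashMap<>();

    // Stands for a read at SERIAL consistency: it sees the latest committed row.
    VersionedRow read(String mailboxId) {
        return rows.getOrDefault(mailboxId, EMPTY);
    }

    // Stands for "INSERT ... IF NOT EXISTS" (first write) or
    // "UPDATE ... IF version = ?" (later writes): only one concurrent writer wins.
    boolean compareAndSet(String mailboxId, VersionedRow expected, VersionedRow updated) {
        if (expected == EMPTY) {
            return rows.putIfAbsent(mailboxId, updated) == null;
        }
        // VersionedRow does not override equals, so this compares identity:
        // it succeeds only if the stored row is exactly the one we read.
        return rows.replace(mailboxId, expected, updated);
    }

    // Optimistic transaction on a single partition: read, modify, conditionally
    // write, and retry from a fresh read if another writer committed in between.
    void update(String mailboxId, UnaryOperator<Map<String, String>> change) {
        while (true) {
            VersionedRow current = read(mailboxId);
            Map<String, String> next = change.apply(new HashMap<>(current.rights));
            VersionedRow updated = new VersionedRow(current.version + 1, next);
            if (compareAndSet(mailboxId, current, updated)) {
                return;
            }
        }
    }
}
```

In Cassandra, the conditional write itself is what requires Paxos. Reading at SERIAL consistency is what forces `system.paxos` to be consulted on the read path as well.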

Experiments on a Distributed James cluster (4 James nodes with 4 CPUs
and 8 GB of RAM each, and a 3-node Cassandra cluster with 32 GB of RAM,
8 CPUs, and SSD disks) demonstrated that the `system.paxos` table was by
far the most read and compacted table (ratio 5).

The table triggering the most reads to the `system.paxos` table was the
`acl` table. Deactivating LWT on this table alone (lightweight
transactions and SERIAL consistency level) yielded an instant 80%
throughput improvement, reduced latencies, and softer degradation when
the load breaking point is exceeded.

## Decision

Rely on `event sourcing` to maintain a projection of ACLs that does not
rely on LWT or the SERIAL consistency level.

Event sourcing is thus responsible for handling concurrency and race
conditions, as well as for governing denormalization for ACLs. It can
be used as a source of truth to rebuild ACL projections.

Note that the ACL projection tables can end up out of sync with the
aggregate, but we still have an unquestionable source of truth handled
via event sourcing.
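A minimal sketch of this decision follows. It assumes a simplified model in which each hypothetical `AclUpdated` event carries the full ACL after the change. The real James events and event store differ. Only the event store append is guarded by an optimistic check. The projection is read without any transaction, and it can be rebuilt from the events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: the full ACL of a mailbox after a change (a simplification).
final class AclUpdated {
    final String mailboxId;
    final Map<String, String> acl;

    AclUpdated(String mailboxId, Map<String, String> acl) {
        this.mailboxId = mailboxId;
        this.acl = Map.copyOf(acl);
    }
}

// Append-only source of truth. The append is the only step that needs a
// conditional write (LWT in a Cassandra-backed event store).
final class AclEventStore {
    private final Map<String, List<AclUpdated>> history = new HashMap<>();

    // Succeeds only if no event was appended since the caller loaded the history.
    synchronized boolean append(String mailboxId, int expectedSize, AclUpdated event) {
        List<AclUpdated> events = history.computeIfAbsent(mailboxId, id -> new ArrayList<>());
        if (events.size() != expectedSize) {
            return false;
        }
        events.add(event);
        return true;
    }

    synchronized List<AclUpdated> load(String mailboxId) {
        return new ArrayList<>(history.getOrDefault(mailboxId, List.of()));
    }
}

// Denormalized read model: plain reads, no LWT or SERIAL consistency.
final class AclProjection {
    final Map<String, Map<String, String>> byMailbox = new HashMap<>();

    void apply(AclUpdated event) {
        byMailbox.put(event.mailboxId, event.acl);
    }

    Map<String, String> read(String mailboxId) {
        return byMailbox.getOrDefault(mailboxId, Map.of());
    }

    // Corrective task: replay the source of truth to fix a drifted projection.
    void rebuild(String mailboxId, List<AclUpdated> events) {
        byMailbox.remove(mailboxId);
        events.forEach(this::apply);
    }
}

final class AclService {
    private final AclEventStore store;
    private final AclProjection projection;

    AclService(AclEventStore store, AclProjection projection) {
        this.store = store;
        this.projection = projection;
    }

    // Write path: decide on the aggregate, append, then update the projection.
    // Returns false on a concurrent write, in which case the caller retries.
    boolean grant(String mailboxId, String user, String rights) {
        List<AclUpdated> events = store.load(mailboxId);
        Map<String, String> current = events.isEmpty() ? Map.of() : events.get(events.size() - 1).acl;
        Map<String, String> next = new HashMap<>(current);
        next.put(user, rights);
        AclUpdated event = new AclUpdated(mailboxId, next);
        if (!store.append(mailboxId, events.size(), event)) {
            return false;
        }
        projection.apply(event);
        return true;
    }
}
```

If the process dies between `append` and `projection.apply`, the projection drifts. That is exactly the case the corrective task (`rebuild`) is meant to fix.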

## Consequences

We expect better load handling, better response times, and cheaper
operating costs for Distributed James, without compromising the data
safety of ACL operations.

As ACL updates are rare operations, we do not expect a significant
degradation of write performance from relying on `eventSourcing`.

We need to implement a corrective task to fix the ACL denormalization
projections. Applicative read repairs could be implemented as well,
offering both diagnostics and on-the-fly corrections without admin
actions (they should, however, be triggered with a low probability, as
loading an event sourcing aggregate is not cheap).
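The low-probability applicative read repair mentioned above could look like the following sketch. `ReadRepairingAclReader` and its constructor parameters are hypothetical. The expensive `loadAggregate` function stands for replaying the event sourcing aggregate. The random source is injectable so the behaviour can be tested deterministically.

```java
import java.util.Map;
import java.util.Random;
import java.util.function.DoubleSupplier;
import java.util.function.Function;

// Hypothetical ACL reader performing probabilistic applicative read repair.
final class ReadRepairingAclReader {
    private final Map<String, Map<String, String>> projection;
    private final Function<String, Map<String, String>> loadAggregate; // expensive replay
    private final double repairProbability;
    private final DoubleSupplier random;
    int repairs = 0; // doubles as a diagnostic counter

    ReadRepairingAclReader(Map<String, Map<String, String>> projection,
                           Function<String, Map<String, String>> loadAggregate,
                           double repairProbability,
                           DoubleSupplier random) {
        this.projection = projection;
        this.loadAggregate = loadAggregate;
        this.repairProbability = repairProbability;
        this.random = random;
    }

    // Convenience constructor using a real random source.
    ReadRepairingAclReader(Map<String, Map<String, String>> projection,
                           Function<String, Map<String, String>> loadAggregate,
                           double repairProbability) {
        this(projection, loadAggregate, repairProbability, new Random()::nextDouble);
    }

    Map<String, String> read(String mailboxId) {
        Map<String, String> projected = projection.getOrDefault(mailboxId, Map.of());
        // Only a small fraction of reads pays for loading the aggregate.
        if (random.getAsDouble() < repairProbability) {
            Map<String, String> truth = loadAggregate.apply(mailboxId);
            if (!truth.equals(projected)) {
                projection.put(mailboxId, truth); // on-the-fly correction, no admin action
                repairs++;
                return truth;
            }
        }
        return projected;
    }
}
```

The probability is a cost knob. Every repair check replays the aggregate, so a value close to zero keeps the read path cheap while still converging drifted entries over time.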

## Complementary work

There are several other places where we rely on lightweight
transactions in the Cassandra code base and that we might want to
challenge:

- `users`: we rely on LWT for throwing "AlreadyExist" exceptions. LWT
  are likely unnecessary, as the webadmin presentation layer offers an
  idempotent API (and silences the AlreadyExist exceptions). Only the
  CLI (soon to be deprecated for Guice products) makes this
  distinction. Discussions have started on the topic and a proof of
  concept is available.
- `domains`: we rely on LWT for throwing "AlreadyExist" exceptions. LWT
  are likely unnecessary, as the webadmin presentation layer offers an
  idempotent API (and silences the AlreadyExist exceptions). Only the
  CLI (soon to be deprecated for Guice products) makes this
  distinction. Discussions have started on the topic and a proof of
  concept is available.
- `mailboxes`: we rely on LWT to enforce name uniqueness. We hit the
  same pitfalls as for ACLs, as this is a very frequently read table
  (however, as the mailboxes of a given user are grouped together,
  primary key reads are more limited, hence this is less critical).
  Similar results could be expected. Discussions on this topic have not
  started yet. Further performance impact studies need to be conducted.
- `messages`: flag updates are so far transactional. However, by
  relying better on the table structure used to store flags, we could
  rely on Cassandra to solve data races for us. Note also that the IMAP
  CONDSTORE extension is not implemented, and might be a non-viable
  option performance-wise. We might choose to favor performance over
  transactionality on this topic. Discussions on this topic have not
  started yet.
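For `users` and `domains`, the trade-off is between a conditional insert that can report "already exists" and a plain upsert that is idempotent. The hypothetical `UsersRepositorySketch` below contrasts the two contracts in memory. In Cassandra, the first would map to `INSERT ... IF NOT EXISTS` (an LWT). The second would map to a plain `INSERT`, which silently overwrites an existing row.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical repository contrasting the two write contracts.
final class UsersRepositorySketch {
    private final Set<String> users = ConcurrentHashMap.newKeySet();

    // Conditional insert (LWT-style "INSERT ... IF NOT EXISTS"): the store reports
    // whether the row already existed, so an AlreadyExist-style error can be raised.
    void addUserStrict(String username) {
        if (!users.add(username)) {
            throw new IllegalStateException(username + " already exists");
        }
    }

    // Plain upsert: re-adding the same user is a no-op, so the call is idempotent
    // and needs no consensus round, but "created" and "already existed" look alike.
    void addUserIdempotent(String username) {
        users.add(username);
    }

    boolean contains(String username) {
        return users.contains(username);
    }
}
```

An idempotent API that swallows the "already exists" case gains nothing from the strict variant. Only a caller that must distinguish the two outcomes, such as the CLI, loses information when switching to the upsert.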
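For `messages`, one way to let Cassandra resolve races is to store each flag as its own cell and issue blind writes. Cassandra reconciles each cell independently by write timestamp (last write wins). The hypothetical `FlagCells` class models that reconciliation in memory. It is a simplification, not the current James flag schema. For instance, Cassandra breaks timestamp ties differently. The contrast is with a read-modify-write of the whole flag set, where two clients starting from the same snapshot overwrite each other unless a transaction serializes them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-flag storage: one cell per flag, reconciled by write timestamp.
final class FlagCells {
    private static final class Cell {
        final boolean value;
        final long timestamp;

        Cell(boolean value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    // Blind write: no read-before-write and no LWT. Writes may arrive in any order;
    // each cell keeps the value with the highest timestamp (ties keep the existing value here).
    void write(String flag, boolean value, long timestamp) {
        Cell existing = cells.get(flag);
        if (existing == null || timestamp > existing.timestamp) {
            cells.put(flag, new Cell(value, timestamp));
        }
    }

    boolean isSet(String flag) {
        Cell cell = cells.get(flag);
        return cell != null && cell.value;
    }
}
```

Because writes to different flags touch different cells, concurrent clients setting `\Seen` and `\Flagged` no longer conflict. With distinct timestamps, the final state does not depend on arrival order.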

LWT are required for `eventSourcing`. As event sourcing is limited to
low-usage use cases, the performance degradation is not an issue.

LWT usage is required to generate `UIDs`. As append message operations
tend to be limited compared to message update operations, this is likely
less critical. UID generation could be handled via alternative systems;
past implementations relied on ZooKeeper.
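UID generation needs a per-mailbox monotonic counter. The sketch below shows the compare-and-set retry loop that an LWT-based allocator performs. `UidProvider` is hypothetical. An `AtomicLong` stands in for a row updated with a conditional statement such as `UPDATE ... SET lastUid = ? WHERE mailboxId = ? IF lastUid = ?`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-mailbox UID allocator. Each AtomicLong stands for a row
// holding the last allocated UID, updated only through a conditional write.
final class UidProvider {
    private final ConcurrentHashMap<String, AtomicLong> lastUids = new ConcurrentHashMap<>();

    long nextUid(String mailboxId) {
        AtomicLong row = lastUids.computeIfAbsent(mailboxId, id -> new AtomicLong(0));
        while (true) {
            long current = row.get();                    // read the last allocated UID
            long candidate = current + 1;                // propose the next one
            if (row.compareAndSet(current, candidate)) { // the conditional (LWT) write
                return candidate;                        // strictly increasing, never reused
            }
            // Another appender won the race: re-read and retry.
        }
    }
}
```

Each allocation costs at least one conditional write, and more under contention. This is why the cost is only considered acceptable because appends are less frequent than message updates.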

If IMAP CONDSTORE is not implemented, generating IMAP `MODSEQ` likely
no longer makes sense. As such, the fate of `MODSEQ` is linked to
decisions on the `messages` topic.

Similarly, LWT are used to try to keep the count of emails in
MailRepository synchronized. Such a usage is not performance-critical
for an MDA (Mail Delivery Agent) use case, but might have a bigger
impact for an MTA (Mail Transfer Agent). Neither discussion nor work has
started on the topic.

Other usages of LWT include Sieve script management, initialization of
the RabbitMQMailQueue browse start, and other low-impact use cases.

## References

* [Original pull request exploring the topic](https://github.com/apache/james-project/pull/255): `JAMES-3435 Cassandra: No longer rely on LWT for domain and users`
* [JIRA ticket](https://issues.apache.org/jira/browse/JAMES-3435)
* [Pull request abandoning LWT on reads for mailbox ACL](https://github.com/linagora/james-project/pull/4103)
* [ADR-42 Applicative read repairs](https://github.com/apache/james-project/blob/master/src/adr/0042-applicative-read-repairs.md)
* [ADR-21 ACL inconsistencies](https://github.com/apache/james-project/blob/master/src/adr/0021-cassandra-acl-inconsistency.md)
* [Buggy IMAP CONDSTORE](https://issues.apache.org/jira/browse/JAMES-2055)
* [Link to the Mailing list thread discussing this ADR](LINK TO BE INCLUDED)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: About our usage of LWT in Cassandra related code

Posted by Benoit Tellier <bt...@linagora.com>.

This ADR can also be reviewed on GitHub:
https://github.com/apache/james-project/pull/271

On 04/12/2020 at 14:22, btellier@linagora.com (OpenPaaS) wrote:


Re: About our usage of LWT in Cassandra related code

Posted by Matthieu Baechler <ma...@apache.org>.

On Tue, 2020-12-08 at 10:12 +0700, Tellier Benoit wrote:
> Hello Matthieu,
> 
> Sadly, I'm unable to see what you wrote in the email you sent, due to
> the absence of quoting.
> 
> Can you review your email client settings, in order to get a readable
> output we can start discussing?
> 
> This time, I made the effort, but I would greatly appreciate a better
> display.
> 

I don't know what happened; I have been using the same mailer for years
and never had this issue before.

This morning, replying to the original mail with the same mailer and
the same settings did quote things.

I guess it's a bug.


> Best regards,
> 
> Benoit
> 
> On 07/12/2020 at 14:47, Matthieu Baechler wrote:
> > Hi Benoit,
> > 
> > On Fri, 2020-12-04 at 14:22 +0700, btellier@linagora.com (OpenPaaS) wrote:
> > > Hi,
> > >
> > > I'm currently trying to increase the overall efficiency of the
> > > Distributed James server.
> > >
> > > As such, I'm poking around for improvement areas and found a huge
> > > topic around LWT.
> > >
> > > My conclusions so far are that we should keep LWT and the SERIAL
> > > consistency level out of the most common use cases.
> > >
> > > I know that this is a massive change with regard to the way the
> > > project has been working with Cassandra over the past few years. In
> > > the medium term, I would definitely like to reach LWT-free reads on
> > > the Cassandra Mailbox, to scale the deployments I am responsible for
> > > as part of my Linagora job (my long-term goal being to decrease the
> > > total cost of ownership of a "Distributed James" based solution).
> > > While I am not opposed to diverging from the Apache James project on
> > > this point if needed, I do believe an efficient distributed server
> > > (with the consequences it implies in terms of eventual consistency)
> > > might be a strong asset for the Apache project as well, and I would
> > > prefer to see this work land in the James project.
> > >
> > > I've been ambitious in writing the ADR, especially in the
> > > complementary work section. Let's see what common ground we find on
> > > that! (The ML version below serves as a public, immutable reference
> > > of my thinking!)
> >
> > I doubt we can model IMAP without serializability somewhere but let's
> > read your proposal as I have LWT as much as you do.
> 
> s/have/hate/ ?

Yes, typo

> 
> > 
> > 
> > > -------------------------------------------------------------------
> > >
> > > ## Context
> > >
> > > As any kind of server, James needs to provide some level of
> > > consistency.
> > >
> > > Strong consistency can be achieved with Cassandra by relying on
> > > Lightweight Transactions (LWT). This enables optimistic transactions
> > > on a single partition key.
> > >
> > > Under the hood, Cassandra relies on the Paxos algorithm to achieve
> > > consensus across replicas, allowing us to achieve linearizable
> > > consistency at the entry level. To do so, Cassandra tracks consensus
> > > in the `system.paxos` table. This `system.paxos` table needs to be
> > > checked upon reads as well, in order to ensure the latest state of
> > > the ongoing consensus is known. This can be achieved by using the
> > > SERIAL consistency level.
> > >
> > > Experiments on a Distributed James cluster (4 James nodes with 4 CPUs
> > > and 8 GB of RAM each, and a 3-node Cassandra cluster with 32 GB of
> > > RAM, 8 CPUs, and SSD disks) demonstrated that the `system.paxos`
> > > table was by far the most read and compacted table (ratio 5).
> > >
> > > The table triggering the most reads to the `system.paxos` table was
> > > the `acl` table. Deactivating LWT on this table alone (lightweight
> > > transactions and SERIAL consistency level) yielded an instant 80%
> > > throughput improvement, reduced latencies, and softer degradation
> > > when the load breaking point is exceeded.
> >
> > Do you mean that Cassandra is the bottleneck in this setup?
> > What is the effect of having more Cassandra nodes?
> 
> Yes, it is.
> 
> Adding more Cassandra nodes means more costs.

You didn't answer the question I asked, did you?

> Our ownership cost is so far 5€/user/year, which is around 25 times
> more than our competitors'. The goal is to lower such costs, in order
> to have a viable commercial solution built on top of James.

Do you have any source regarding competitor costs?

BTW, I don't disagree that we could make better use of resources.


> 
> > 
> > > ## Decision
> > >
> > > Rely on `event sourcing` to maintain a projection of ACLs that does
> > > not rely on LWT or the SERIAL consistency level.
> > >
> > > Event sourcing is thus responsible for handling concurrency and race
> > > conditions, as well as for governing denormalization for ACLs. It
> > > can be used as a source of truth to rebuild ACL projections.
> > >
> > > Note that the ACL projection tables can end up out of sync with the
> > > aggregate, but we still have an unquestionable source of truth
> > > handled via event sourcing.
> > >
> > > ## Consequences
> > >
> > > We expect better load handling, better response times, and cheaper
> > > operating costs for Distributed James, without compromising the
> > > data safety of ACL operations.
> > >
> > > As ACL updates are rare operations, we do not expect a significant
> > > degradation of write performance from relying on `eventSourcing`.
> > >
> > > We need to implement a corrective task to fix the ACL
> > > denormalization projections. Applicative read repairs could be
> > > implemented as well, offering both diagnostics and on-the-fly
> > > corrections without admin actions (they should, however, be
> > > triggered with a low probability, as loading an event sourcing
> > > aggregate is not cheap).
> >
> > What implementation are you using for Event Sourcing? AFAIK, James on
> > Cassandra uses LWT + batches for the Event Store.
> 
> I answered Raphael that we are moving transactionality out of the
> read path. As writes are rare, keeping some sort of transactionality,
> like event sourcing, on the write path is not an issue.

I don't see this question/answer in this mail thread.

> > 
> > > ## Complementary work
> > >
> > > There are several other places where we rely on lightweight
> > > transactions in the Cassandra code base and that we might want to
> > > challenge:
> > >
> > > - `users`: we rely on LWT for throwing "AlreadyExist" exceptions.
> > >   LWT are likely unnecessary, as the webadmin presentation layer
> > >   offers an idempotent API (and silences the AlreadyExist
> > >   exceptions). Only the CLI (soon to be deprecated for Guice
> > >   products) makes this distinction. Discussions have started on the
> > >   topic and a proof of concept is available.
> > > - `domains`: we rely on LWT for throwing "AlreadyExist" exceptions.
> > >   LWT are likely unnecessary, as the webadmin presentation layer
> > >   offers an idempotent API (and silences the AlreadyExist
> > >   exceptions). Only the CLI (soon to be deprecated for Guice
> > >   products) makes this distinction. Discussions have started on the
> > >   topic and a proof of concept is available.
> > > - `mailboxes`: we rely on LWT to enforce name uniqueness. We hit the
> > >   same pitfalls as for ACLs, as this is a very frequently read table
> > >   (however, as the mailboxes of a given user are grouped together,
> > >   primary key reads are more limited, hence this is less critical).
> > >   Similar results could be expected. Discussions on this topic have
> > >   not started yet. Further performance impact studies need to be
> > >   conducted.
> >
> > Well, lagging on ACLs is not really a problem, but for mailboxes,
> > don't you fear race conditions and thus mailbox name collisions?
> 
> As the event sourcing source of truth is queried upon writes,
> conflicts will be resolved?

You propose to use Event Sourcing to handle mailbox operations?

> 
> > 
> > > - `messages`: flag updates are so far transactional. However, by
> > >   relying better on the table structure used to store flags, we
> > >   could rely on Cassandra to solve data races for us. Note also that
> > >   the IMAP CONDSTORE extension is not implemented, and might be a
> > >   non-viable option performance-wise. We might choose to favor
> > >   performance over transactionality on this topic. Discussions on
> > >   this topic have not started yet.
> >
> > I think that modern IMAP extensions are important for the user
> > experience: they can make email handling faster by themselves. I would
> > not make a choice that prevents implementation of such extensions in
> > the future.
> 
> My opinion is that IMAP belongs to the past. It is an inefficient,
> complicated protocol, and our implementation of it is clearly not in
> good shape.
> 
> My strategy (at least for Linagora products) is to convert as many
> clients as possible to JMAP.
> 
> I understand this point is controversial.

Standard protocols are always a thing of the past, and that's why there
are actual widespread implementations of them.

How do you expect to make iOS support JMAP natively? macOS? Outlook?

People won't change their mailers just because IMAP is inefficient.
GMail is implementing IMAP and it's no coincidence.

I would not agree to let James deprecate IMAP usage for the time being.

> 
> > 
> > > LWT are required for `eventSourcing`. As event sourcing is limited
> > > to low-usage use cases, the performance degradation is not an issue.
> >
> > I think I understand but I ask anyway: the performance gain is not
> > really the removal of LWT but the CQRS nature of Event Sourcing:
> > you'll read from a view that doesn't use LWT.
> 
> Yes.
> 
> > Can't you achieve the same with a "simpler" CQRS architecture without
> > using Event Sourcing?
> 
> Define "simpler CQRS architecture"; I don't understand what you mean.

CQRS means having one write path and one or several read paths. It
allows different constraints for each path, like transactional ACL
writes and eventually consistent, fast, non-transactional ACL reads.

Event Sourcing is fine in this regard, but its purpose is to make
decisions, model around user intention, etc. It may not be exactly what
you want for ACLs.

You could, for example, keep the existing code and build a view with a
listener for the read path.
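This alternative, a CQRS split without event sourcing, can be sketched as follows. The write side keeps its transactional behaviour and notifies a listener, which maintains a read view queried without LWT. `TransactionalAclWriter` and `AclReadView` are hypothetical names. A `synchronized` method stands in for the existing LWT-backed update. The listener is invoked synchronously here for simplicity; with an asynchronous event bus, the view would only be eventually consistent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Write side: keeps its transactional behaviour and notifies listeners after committing.
final class TransactionalAclWriter {
    private final Map<String, Map<String, String>> table = new HashMap<>();
    private final List<BiConsumer<String, Map<String, String>>> listeners = new ArrayList<>();

    void register(BiConsumer<String, Map<String, String>> listener) {
        listeners.add(listener);
    }

    synchronized void setRights(String mailboxId, String user, String rights) {
        Map<String, String> acl = new HashMap<>(table.getOrDefault(mailboxId, Map.of()));
        acl.put(user, rights);
        Map<String, String> committed = Map.copyOf(acl);
        table.put(mailboxId, committed);
        // Only committed state is published to the read side.
        listeners.forEach(listener -> listener.accept(mailboxId, committed));
    }
}

// Read side: a denormalized view fed by the listener and read without LWT.
final class AclReadView {
    private final Map<String, Map<String, String>> view = new ConcurrentHashMap<>();

    void onAclChanged(String mailboxId, Map<String, String> acl) {
        view.put(mailboxId, acl);
    }

    Map<String, String> read(String mailboxId) {
        return view.getOrDefault(mailboxId, Map.of());
    }
}
```

Compared with the event sourcing sketch, nothing here can rebuild the view from history. Recovering from a missed notification would rely on re-reading the transactional table instead.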

> 
> Also, as explained to Jean, since ACLDiff is fired and used in the
> mailbox system, some sort of transactionality is enforced by the
> current code; that's expensive to change, and I don't intend to change
> it.

Good point.

> > 
> > 
> > > LWT usage is required to generate `UIDs`. As append message
> > > operations tend to be limited compared to message update
> > > operations, this is likely less critical. UID generation could be
> > > handled via alternative systems; past implementations relied on
> > > ZooKeeper.
> > >
> > > If IMAP CONDSTORE is not implemented, generating IMAP `MODSEQ`
> > > likely no longer makes sense. As such, the fate of `MODSEQ` is
> > > linked to decisions on the `messages` topic.
> >
> > Oh, here we are: we need yet another system. Note that I'm in favor of
> > it, but that's the reason why we use LWT in the first place: to avoid
> > this additional dependency. It's either LWT or some other
> > transactional system, as we can't find a way to work around the need
> > for a monotonic distributed counter (for example).
> >
> > You listed several problems and in my opinion each one may have a
> > different solution. What about debating each one separately?
> 
> Please.
> 
> > 
> > Could we start from here: what's the best solution to implement a
> > monotonic distributed counter?
> 
> Here are ideas off the top of my head:
> 
>  - 1. Not implementing it in the first place, because we don't need to.

Always the best option. I haven't found how so far.

>  - 2. ZooKeeper?
>  - 3. ?

Having James instances share some data among themselves instead of
relying on an external system (for example, using https://atomix.io/).

Cheers,

-- Matthieu Baechler

Re: About our usage of LWT in Cassandra related code

Posted by Tellier Benoit <bt...@apache.org>.

Hello Matthieu,

Sadly, I'm unable to see what you wrote in the email you sent, due to
the absence of quoting.

Can you review your email client settings, in order to get a readable
output we can start discussing?

This time, I made the effort, but I would greatly appreciate a better
display.

Best regards,

Benoit

On 07/12/2020 at 14:47, Matthieu Baechler wrote:
> Hi Benoit,
> 
> On Fri, 2020-12-04 at 14:22 +0700, btellier@linagora.com (OpenPaaS) wrote:
> > Hi,
> >
> > I'm currently trying to increase the overall efficiency of the
> > Distributed James server.
> >
> > As such, I'm poking around for improvement areas and found a huge
> > topic around LWT.
> >
> > My conclusions so far are that we should keep LWT and the SERIAL
> > consistency level out of the most common use cases.
> >
> > I know that this is a massive change with regard to the way the
> > project has been working with Cassandra over the past few years. In
> > the medium term, I would definitely like to reach LWT-free reads on
> > the Cassandra Mailbox, to scale the deployments I am responsible for
> > as part of my Linagora job (my long-term goal being to decrease the
> > total cost of ownership of a "Distributed James" based solution).
> > While I am not opposed to diverging from the Apache James project on
> > this point if needed, I do believe an efficient distributed server
> > (with the consequences it implies in terms of eventual consistency)
> > might be a strong asset for the Apache project as well, and I would
> > prefer to see this work land in the James project.
> >
> > I've been ambitious in writing the ADR, especially in the
> > complementary work section. Let's see what common ground we find on
> > that! (The ML version below serves as a public, immutable reference
> > of my thinking!)
> 
> I doubt we can model IMAP without serializability somewhere but let's
> read your proposal as I have LWT as much as you do.

s/have/hate/ ?

> 
> 
> > -------------------------------------------------------------------
> >
> > ## Context
> >
> > As any kind of server, James needs to provide some level of
> > consistency.
> >
> > Strong consistency can be achieved with Cassandra by relying on
> > Lightweight Transactions (LWT). This enables optimistic transactions
> > on a single partition key.
> >
> > Under the hood, Cassandra relies on the Paxos algorithm to achieve
> > consensus across replicas, allowing us to achieve linearizable
> > consistency at the entry level. To do so, Cassandra tracks consensus
> > in the `system.paxos` table. This `system.paxos` table needs to be
> > checked upon reads as well, in order to ensure the latest state of
> > the ongoing consensus is known. This can be achieved by using the
> > SERIAL consistency level.
> >
> > Experiments on a Distributed James cluster (4 James nodes with 4 CPUs
> > and 8 GB of RAM each, and a 3-node Cassandra cluster with 32 GB of
> > RAM, 8 CPUs, and SSD disks) demonstrated that the `system.paxos`
> > table was by far the most read and compacted table (ratio 5).
> >
> > The table triggering the most reads to the `system.paxos` table was
> > the `acl` table. Deactivating LWT on this table alone (lightweight
> > transactions and SERIAL consistency level) yielded an instant 80%
> > throughput improvement, reduced latencies, and softer degradation
> > when the load breaking point is exceeded.
> 
> Do you mean that Cassandra is the bottleneck in this setup?
> What is the effect of having more Cassandra nodes?

Yes, it is.

Adding more Cassandra nodes means more costs.

Our ownership cost is so far 5€/user/year, which is around 25 times
more than our competitors'. The goal is to lower such costs, in order to
have a viable commercial solution built on top of James.

> 
> ## Decision
> 
> Rely on `event sourcing` to maintain a projection of ACLs that do not
> rely on LWT or SERIAL consistency level.
> 
> Event sourcing is thus responsible of handling concurrency and race
> conditions as well as governing denormalization
> for ACLs. It can be used as a source of truth to re-build ACL
> projections.
> 
> Note that the ACL projection tables can end up being out of
> synchronization from the aggregate but we still have a
> non-questionable source of truth handled via event sourcing.
> 
> ## Consequences
> 
> We expect a better load handling, better response time, and cheaper
> operation costs for Distributed James while not
> compromising the data safety of ACL operations.
> 
> ACL updates being a rare operation, we do not expect significant
> degradation of write performance by relying on
> `eventSourcing`.
> 
> We need to implement a corrective task to fix the ACL denormalization
> projections. Applicative read repairs could be
> implemented as well, offering both diagnostic and on-the-fly corrections
> however be used as loading an event sourcing aggregate is not a cheap
> thing).
> 
> 
> What implementation are you using for Event Sourcing? AFAIK, James on
> Cassandra uses LWT + batchs for Event Store.

I answered Raphael that we are moving transactionality out of the read
path. As writes are rare, keeping some form of transactionality, such as
event sourcing, on the write path is not an issue.

> 
> ## Complementary work
> 
> There are several other places where we rely on Lightweight transaction
> in the Cassandra code base and
> that we might want to challenge:
> 
>  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silents the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.
> Discussions have started on the topic and a proof of
> concept is available.
>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silents the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.
> Discussions have started on the topic and a proof of
> concept is available.
>  - `mailboxes` relies on LWT to enforce name unicity. We hit the same
> pitfalls than for ACLs as this is a very often
>  read table (however mailboxes of a given user being grouped together,
> primary key read are more limited hence this is
>  less critical). Similar results could be expected. Discussions on this
> topic have not been started yet. Further
>  impact studies on performance needs to be conducted.
> 
> Well, lagging on ACL is not really a problem but for mailbox, don't you
> fear having race conditions and thus name collision on mailbox?

As the event sourcing source of truth is queried upon writes, conflicts
will be resolved there?
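A minimal sketch of how a name conflict could be resolved on the write path. It assumes a hypothetical per-user aggregate (`UserMailboxNames`) and handler (`CreateMailboxHandler`), and it is an in-memory illustration rather than the James event store. The decision is taken on a snapshot of the event-sourced state and appended with the version that snapshot was based on. If two creations of the same name race, one append fails, the handler reloads, and it then sees the name as taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical aggregate: the mailbox names created by one user, as an event log.
class UserMailboxNames {
    record Snapshot(int version, Set<String> names) {}

    private final List<String> createdNames = new ArrayList<>();

    // Consistent view of the state together with the version it corresponds to.
    synchronized Snapshot load() {
        return new Snapshot(createdNames.size(), Set.copyOf(createdNames));
    }

    // Compare-and-append: stands in for the LWT-guarded event store append.
    synchronized boolean append(int expectedVersion, String name) {
        if (createdNames.size() != expectedVersion) {
            return false;
        }
        createdNames.add(name);
        return true;
    }
}

class CreateMailboxHandler {
    enum Result { CREATED, NAME_TAKEN }

    private final UserMailboxNames aggregate;

    CreateMailboxHandler(UserMailboxNames aggregate) {
        this.aggregate = aggregate;
    }

    // Decide on a snapshot, append with the snapshot's version, retry on conflict.
    Result create(String name) {
        while (true) {
            UserMailboxNames.Snapshot snapshot = aggregate.load();
            if (snapshot.names().contains(name)) {
                return Result.NAME_TAKEN;
            }
            if (aggregate.append(snapshot.version(), name)) {
                return Result.CREATED;
            }
        }
    }
}
```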

> 
>  - `messages` as flags update is so far transactional. However, by
> better relying on the table structure used to store
> flags we could be relying on Cassandra to solve data race issues for
> us.
> Note also that IMAP CONDSTORE extension is not
> implemented, and might be a non-viable option performance-wise. We might
> choose to favor performance other
> transactionality on this topic. Discussions on this topic have not
> started yet.
> 
> I think that modern IMAP extensions are important for the user
> experience: they can make email handling faster by themselves. I would
> not make a choice that prevents implementation of such extensions in
> the futures.

My opinion is that IMAP belongs to the past. It is an inefficient,
complicated protocol, and our implementation of it is clearly not in good
shape.

My strategy (at least for Linagora products) is to convert as many
clients as possible to JMAP.

I understand this point is controversial.
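Regarding the flags point of the ADR quoted above (relying on the table structure so that Cassandra resolves the races), here is a simplified model. It assumes flags live in a set column where each element is its own cell, resolved independently by last-write-wins on its timestamp. `FlagCells` is a hypothetical name, and the tie-breaking rule is deliberately simplified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of a set<text> flags column: each flag is its own cell with a
// write timestamp, and each cell is resolved independently (last write wins).
// Ties keep the first write here, a simplification of Cassandra's own rules.
class FlagCells {
    private record Cell(boolean present, long timestamp) {}

    private final Map<String, Cell> cells = new HashMap<>();

    // In spirit: UPDATE ... SET flags = flags + {flag}
    void add(String flag, long timestamp) {
        write(flag, true, timestamp);
    }

    // In spirit: UPDATE ... SET flags = flags - {flag}
    void remove(String flag, long timestamp) {
        write(flag, false, timestamp);
    }

    private void write(String flag, boolean present, long timestamp) {
        Cell current = cells.get(flag);
        if (current == null || timestamp > current.timestamp()) {
            cells.put(flag, new Cell(present, timestamp));
        }
    }

    Set<String> read() {
        Set<String> flags = new TreeSet<>();
        cells.forEach((flag, cell) -> {
            if (cell.present()) {
                flags.add(flag);
            }
        });
        return flags;
    }
}
```

Two concurrent updates that touch different flags land in different cells, so both survive without any compare-and-set. If two clients instead each read the whole flag set and wrote it back, they would need a conditional update (an LWT) to avoid losing one of the changes.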

> 
> LWT are required for `eventSourcing`. As event sourcing usage is limited
> to low-usage use cases, the performance
> degradations are not an issue.
> 
> I think I understand but I ask anyway: the performance gain is not
> really the removal of LWT but the CQRS nature of Event Sourcing, you'll
> read in a view that doesn't use LWT. 

Yes.

> Can't you achieve the same with a
> "simpler" CQRS architecture without using Event Sourcing?

Define "simpler CQRS architecture"; I don't understand what you mean.

Also, as explained to Jean, because an ACLDiff is fired and used in the
mailbox system, the current code enforces some sort of transactionality.
That is expensive to change, and I don't intend to change it.
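To illustrate why the ACLDiff makes a blind overwrite awkward: computing a diff requires the previously stored ACL. Below is a minimal sketch with a hypothetical `AclDiffSketch` type. It does not reproduce James' own `ACLDiff` API, and it simplifies rights to plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical diff between two ACLs (user -> rights). It needs the previously
// stored ACL, which a blind overwrite (reset) does not read.
record AclDiffSketch(Map<String, String> added,
                     Map<String, String> removed,
                     Map<String, String> changed) {

    static AclDiffSketch compute(Map<String, String> oldAcl, Map<String, String> newAcl) {
        Map<String, String> added = new HashMap<>();
        Map<String, String> removed = new HashMap<>();
        Map<String, String> changed = new HashMap<>();
        Set<String> users = new HashSet<>(oldAcl.keySet());
        users.addAll(newAcl.keySet());
        for (String user : users) {
            String before = oldAcl.get(user);
            String after = newAcl.get(user);
            if (before == null) {
                added.put(user, after);      // new entry
            } else if (after == null) {
                removed.put(user, before);   // entry dropped
            } else if (!before.equals(after)) {
                changed.put(user, after);    // rights modified
            }
        }
        return new AclDiffSketch(added, removed, changed);
    }
}
```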

> 
> 
> LWT usage is required to generate `UIDs`. As append message operations
> tend to be limited compared to
> message update operations, this is likely less critical. UID generation
> could be handled via alternative systems,
> past implementations have been conducted on ZooKeeper.
> 
> If not implementing IMAP CONDSTORE, generation of IMAP `MODSEQ` likely
> no longer makes sense. As such the fate of
> `MODSEQ` is linked to decisions on the `message` topic.
> 
> 
> Oh, here we are: we need yet another system. Note that I'm in favor of
> it but that's the reason why we use LWT in the first place: avoid this
> additional dependency. It's rather LWT or any transactional system as
> we can't find a wait to workaround the need for monotonic distributed
> counter (for example).
> 
> You listed several problems and in my opinion each one may have a
> different solution. What about debating each one separately?

Please.

> 
> Could we start from here: what's the best solution to implement a
> monotonic distributed counter?

Here are ideas off the top of my head:

 - 1. Not implementing it in the first place, because we don't need to.
 - 2. ZooKeeper?
 - 3. ?
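For reference, the LWT-based approach under discussion boils down to a compare-and-set retry loop. The in-memory sketch below uses `AtomicLong.compareAndSet` to stand in for a conditional update such as `UPDATE ... SET next_uid = ? WHERE ... IF next_uid = ?`. The table and column names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a row holding the last allocated UID of a mailbox.
// compareAndSet plays the role of the conditional (LWT) update.
class UidCounterSketch {
    private final AtomicLong lastUid = new AtomicLong(0);

    // Read the current value, propose current + 1, retry if another writer won.
    long nextUid() {
        while (true) {
            long current = lastUid.get();
            long proposed = current + 1;
            if (lastUid.compareAndSet(current, proposed)) {
                return proposed;
            }
        }
    }
}
```

Each allocation costs at least one compare-and-set round, and contention adds retries. An external coordinator such as ZooKeeper would move that consensus step to another system rather than eliminate it.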

> 
> Cheers,
> 
> -- Matthieu Baechler
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
> For additional commands, e-mail: server-dev-help@james.apache.org
> 


Re: About our usage of LWT in Cassandra related code

Posted by Matthieu Baechler <ma...@apache.org>.

Hi Benoit,

On Fri, 2020-12-04 at 14:22 +0700, btellier@linagora.com (OpenPaaS) wrote:
> Hi,
> 
> I'm currently trying to increase overall efficiency of the Distributed
> James server.
> 
> As such, I'm pocking around for improvement areas and found a huge
> topic around LWT.
> 
> My conclusions so far are that we should keep LWT and SERIAL
> consistency level out of the most common use cases.
> 
> I know that this is a massive change in regard of the way the project
> had been working with Cassandra in the past few years. I would
> definitely, in the middle term, would like to reach LWT free reads on
> the Cassandra Mailbox to scale the deployments I am responsible of as
> part of my Linagora job (my long term goal being to decrease the total
> cost of ownership of a "Distributed James" based solution). While I am
> not opposed to diverge from the Apache James project on this point, if
> needed, I do believe an efficient distributed server (with the
> consequences it implies in term of eventual consistency) might be a
> strong asset for the Apache project as well, and would prefer to see
> this work lending on the James project.
> 
> I've been ambitious on the ADR writing, especially in the complementary
> work section. Let's see which consensual ground we find on that! (the
> ML version here below serving as a public, immutable reference of my
> thinking!)


I doubt we can model IMAP without serializability somewhere but let's
read your proposal as I have LWT as much as you are.


> -------------------------------------------------------------------
> 
> ## Context
> 
> As any kind of server James needs to provide some level of
> consistencies.
> 
> Strong consistency can be achieved with Cassandra by relying on
> LightWeight transactions. This enables
> optimistic transactions on a single partition key.
> 
> Under the hood, Cassandra relies on the PAXOS algorithm to achieve
> consensus across replica allowing us
> to achieve linearizable consistency at the entry level. To do so,
> Cassandra tracks consensus in a system.paxos
> table. This `system.paxos` table needs to be checked upon reads as well
> in order to ensure the latest state of the ongoing
> consensus is known. This can be achieved by using the SERIAL
> consistency level.
> 
> Experiments on a distributed James cluster (4 James nodes, having 4 CPU
> and 8 GB of RAM each, and a 3 node Cassandra
> cluster of 32 GB of RAM, 8 CPUs, and SSD disks) demonstrated that the
> system.paxos table was by far the most read
> and compacted table (ratio 5).
> The table triggering the most reads to the `system.paxos` table was the
> `acl` table. Deactivating LWT on this table alone
> (lightweight transactions & SERIAL consistency level) enabled an instant
> 80% throughput, latencies reductions
> as well as softer degradations when load breaking point is exceeded.


Do you mean that Cassandra is the bottleneck in this setup?
What is the effect of having more Cassandra nodes?

> ## Decision
> 
> Rely on `event sourcing` to maintain a projection of ACLs that do not
> rely on LWT or SERIAL consistency level.
> 
> Event sourcing is thus responsible of handling concurrency and race
> conditions as well as governing denormalization
> for ACLs. It can be used as a source of truth to re-build ACL
> projections.
> 
> Note that the ACL projection tables can end up being out of
> synchronization from the aggregate but we still have a
> non-questionable source of truth handled via event sourcing.
> 
> ## Consequences
> 
> We expect a better load handling, better response time, and cheaper
> operation costs for Distributed James while not
> compromising the data safety of ACL operations.
> 
> ACL updates being a rare operation, we do not expect significant
> degradation of write performance by relying on
> `eventSourcing`.
> 
> We need to implement a corrective task to fix the ACL denormalization
> projections. Applicative read repairs could be
> implemented as well, offering both diagnostic and on-the-fly corrections
> without admin actions (a low probability should
> however be used as loading an event sourcing aggregate is not a cheap
> thing).
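The "low probability" applicative read repair mentioned above could look like the following sketch. All names are hypothetical. `loadAggregate` stands for the expensive load of the event sourcing aggregate, and the projection is modelled as a plain map.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Random;
import java.util.function.Function;

// Hypothetical applicative read repair: reads are served by the projection; with a
// small probability the expensive aggregate is loaded and the projection fixed.
class ReadRepairingAclReader {
    private final Map<String, Map<String, String>> projection;          // cheap, LWT-free reads
    private final Function<String, Map<String, String>> loadAggregate;  // expensive source of truth
    private final double repairProbability;
    private final Random random;
    private int repairs = 0;

    ReadRepairingAclReader(Map<String, Map<String, String>> projection,
                           Function<String, Map<String, String>> loadAggregate,
                           double repairProbability,
                           Random random) {
        this.projection = projection;
        this.loadAggregate = loadAggregate;
        this.repairProbability = repairProbability;
        this.random = random;
    }

    Map<String, String> read(String mailboxId) {
        Map<String, String> projected = projection.getOrDefault(mailboxId, Map.of());
        if (random.nextDouble() >= repairProbability) {
            return projected; // common case: projection only
        }
        Map<String, String> truth = loadAggregate.apply(mailboxId);
        if (!Objects.equals(projected, truth)) {
            projection.put(mailboxId, truth); // on-the-fly correction
            repairs++;
        }
        return truth;
    }

    int repairs() {
        return repairs;
    }
}
```

The probability trades repair latency against the cost of loading aggregates. An offline corrective task would still be needed for entries that are rarely read.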


What implementation are you using for Event Sourcing? AFAIK, James on
Cassandra uses LWT + batches for the Event Store.

> ## Complementary work
> 
> There are several other places where we rely on Lightweight transaction
> in the Cassandra code base and
> that we might want to challenge:
> 
>  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silents the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.
> Discussions have started on the topic and a proof of
> concept is available.
>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silents the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.
> Discussions have started on the topic and a proof of
> concept is available.
>  - `mailboxes` relies on LWT to enforce name unicity. We hit the same
> pitfalls than for ACLs as this is a very often
>  read table (however mailboxes of a given user being grouped together,
> primary key read are more limited hence this is
>  less critical). Similar results could be expected. Discussions on this
> topic have not been started yet. Further
>  impact studies on performance needs to be conducted.

Well, lagging on ACL is not really a problem but for mailbox, don't you
fear having race conditions and thus name collision on mailbox?

>  - `messages` as flags update is so far transactional. However, by
> better relying on the table structure used to store
> flags we could be relying on Cassandra to solve data race issues for
> us.
> Note also that IMAP CONDSTORE extension is not
> implemented, and might be a non-viable option performance-wise. We might
> choose to favor performance other
> transactionality on this topic. Discussions on this topic have not
> started yet.

I think that modern IMAP extensions are important for the user
experience: they can make email handling faster by themselves. I would
not make a choice that prevents implementing such extensions in the
future.

> LWT are required for `eventSourcing`. As event sourcing usage is limited
> to low-usage use cases, the performance
> degradations are not an issue.

I think I understand but I ask anyway: the performance gain is not
really the removal of LWT but the CQRS nature of Event Sourcing, you'll
read in a view that doesn't use LWT. Can't you achieve the same with a
"simpler" CQRS architecture without using Event Sourcing?


> LWT usage is required to generate `UIDs`. As append message operations
> tend to be limited compared to
> message update operations, this is likely less critical. UID generation
> could be handled via alternative systems,
> past implementations have been conducted on ZooKeeper.
> 
> If not implementing IMAP CONDSTORE, generation of IMAP `MODSEQ` likely
> no longer makes sense. As such the fate of
> `MODSEQ` is linked to decisions on the `message` topic.


Oh, here we are: we need yet another system. Note that I'm in favor of
it, but that's the reason why we use LWT in the first place: to avoid this
additional dependency. It's rather LWT or any transactional system, as
we can't find a way to work around the need for a monotonic distributed
counter (for example).

You listed several problems and in my opinion each one may have a
different solution. What about debating each one separately?

Could we start from here: what's the best solution to implement a
monotonic distributed counter?

Cheers,

-- Matthieu Baechler



Re: About our usage of LWT in Cassandra related code

Posted by Tellier Benoit <bt...@apache.org>.

I'll answer other threads separately.

I created https://issues.apache.org/jira/browse/JAMES-3468 about this.

I asked Quan, our intern, to take a look.

Regards,

Benoit Tellier

On 09/12/2020 at 12:58, Jean Helou wrote:
>>> So from a user perspective adding a user would always succeed. But would
>> it
>>> succeed by doing nothing (the current behaviour in silencing the
>>> AlreadyExist exception) or would it succeed by effectively overwriting
>> the
>>> user (in a last write wins manner) ?
>>
>> Webadmin so far overwrite the user (and its password) in a last write
>> win manner.
>>
> 
> That sounds really scary
> 
> 
>>  - Either we need to distinguish "create" from "update" within the
>> webadmin API
>>
> 
> Well  that would definitely have my vote : as an admin operator I *never*
> want to accidentally overwrite an existing user when trying to create a new
> one (with the possible exception of retrying a create operation that just
> timeouted, in which case my first reflex would be to execute a read to try
> and make sure that the operation that just failed hasn't actually succeeded)


Re: About our usage of LWT in Cassandra related code

Posted by Tellier Benoit <bt...@apache.org>.

On 09/12/2020 at 12:58, Jean Helou wrote:

>  > Also for ACLs isn't eventual consistency acceptable ?
> 
> 
>> My take is that it is. My first shot was to do just that.
>>
>> Howver the current code enforces a "transaction like" behavior, with an
>> ACLDiff fired on the mailbox event system. Maintaining denormalization
>> was also an issue in the face of reset (requiring to know previously
>> stored data).
>>
>> A full rewritte of the context ACLs are run in would have been needed.
>> And could have been controversial.
>>
>>> using transactions to
>>> avoid non serial writes but accepting stale reads ?
>>
>> Could be too. This could be a quick-win to enhance existing situation,
>> while waiting for stronger decisions and
> 
> What you describe could be a very good short term solution for message
>> flags (I think we will need to challenge CONDSTORE/QRSYNC anyway!)
>>
> 
> Why do you say this would only be a good short term solution ?

On the key table, here is what `nodetool tablestats` reports on one of
the production instances I <3:

		Table: imapuidtable
		SSTable count: 3
		Space used (live): 472544882
		Space used (total): 472544882
		SSTable Compression Ratio: 0.40981487390641924
		Number of partitions (estimate): 6486085
		Local read count: 15793743
		Local read latency: 0.233 ms
		Local write count: 9171561
		Local write latency: 0.037 ms

What you see is that ~40% of the workload on key message metadata is
writes:

 - IMAP SELECT resets the RECENT flag
 - The SEEN flag is updated for most messages
 - People move / delete their messages
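Working the counters through: 9,171,561 writes out of 15,793,743 + 9,171,561 = 24,965,304 local operations is about 37% writes, or roughly 1.7 reads per write. That matches the rough ~40% figure. A trivial helper (hypothetical name) to redo the same arithmetic on other tables:

```java
// Share of writes among local operations, computed from the `nodetool tablestats`
// "Local read count" and "Local write count" counters.
final class WriteShare {
    private WriteShare() {}

    static double of(long localReadCount, long localWriteCount) {
        return (double) localWriteCount / (localReadCount + localWriteCount);
    }
}
```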

That's why I say turning off SERIAL reads on this table is a partial
solution that we can activate in the short term (via a configuration
option, to make everybody happy).

LWT on writes will stay a problem.
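A minimal sketch of the read-side distinction discussed in this thread. Reads that feed an update stay SERIAL; regular reads may be served at QUORUM, behind a configuration option. The enums and the option are hypothetical. A real implementation would map them onto the Cassandra driver's own consistency levels.

```java
// Hypothetical consistency levels and read purposes (not driver types).
enum ReadConsistency { SERIAL, QUORUM }

enum ReadPurpose { PART_OF_UPDATE, REGULAR }

class ReadConsistencyPolicy {
    private final boolean strongRegularReads; // hypothetical configuration option

    ReadConsistencyPolicy(boolean strongRegularReads) {
        this.strongRegularReads = strongRegularReads;
    }

    // Reads feeding a conditional write must see the latest Paxos state;
    // other reads may be stale unless the operator keeps strong reads on.
    ReadConsistency choose(ReadPurpose purpose) {
        if (purpose == ReadPurpose.PART_OF_UPDATE || strongRegularReads) {
            return ReadConsistency.SERIAL;
        }
        return ReadConsistency.QUORUM;
    }
}
```

Defaulting the option to strong reads would keep today's behavior for anyone who does not opt in.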

> My understanding is that currently
> - most reads and writes use LWT and that using LWT implies slow writes (to
> something akin to a lock I  suppose) for both read and write operations

Some do: those against the tables that act as a source of truth.

> - the amount of reads far outweighs the number of writes (at least for the
> listed use cases of ACLs, users and domains, it sounds like the UID/Flags
> stuff may not be so clear cut)

This is true for ACLs and mailboxes, but not for messages, hence the
different solution.

> 
> In such a context dropping read consistency while keeping write consistency
> sounds like it would already be a huge gain :) The assertions in a number
> of tests will likely have to be updated to rely on `Awaitility.await()` :)

+1, that would be an enhancement.

> 
> 
> On Wed, Dec 9, 2020 at 2:40 AM Tellier Benoit <bt...@apache.org> wrote:
> 
>> Sorry for repost,
>>
>> I sent that response before but it was lost.
>>
>> Maybe the unfamous text/html format issue.
>>
>> On 07/12/2020 at 04:33, Jean Helou wrote:
>>> Hello,
>>>
>>> I'm currently trying to increase overall efficiency of the Distributed
>>>> James server.
>>>>
>>>
>>> I have some concerns but i feel imposterish for posting them as they most
>>> likely come from my own lack of knowledge, i'll still try just in case
>> some
>>> of points are valid :)
>>
>> Thank you very much to dare to do so!
>>
>> You are likely not the only one to lack this knowledge, hopefully
>> discussions will clarify that.
>>
>> I also sometime have problems to express myself clearly, thus
>> explanation are normal.
>>
>>>
>>>  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
>>>> are likely unnecessary as the webadmin
>>>> presentation layer is offering an idempotent API (and silents the
>>>> AlreadyExist exceptions). Only the CLI
>>>
>>> (soon to be deprecated for Guice products) makes this distinction.
>>>
>>>
>>> So from a user perspective adding a user would always succeed. But would
>> it
>>> succeed by doing nothing (the current behaviour in silencing the
>>> AlreadyExist exception) or would it succeed by effectively overwriting
>> the
>>> user (in a last write wins manner) ?
>>
>> Webadmin so far overwrite the user (and its password) in a last write
>> win manner.
>>
>>> This is a completely different
>>> behaviour which is not necessarily desirable.
>>> this can be further divided into 2 different cases :
>>> - there are concurrent attempts to create the same user (in which case
>> the
>>> user data is very likely the same or very close, and has possibly never
>>> been exposed to a human) in which case the LWW behaviour may be
>> acceptable
>>> - A user has existed for a long time (definition of long to be defined
>> but
>>> I would say above a few seconds :) ) in which cas overwriting is most
>>> likely not acceptable
>>>
>>
>> My take is that we need to make a choice:
>>
>>  - Either we need to distinguish "create" from "update" within the
>> webadmin API
>>  - Or we relax the condition downstream.
>>
>> I would be in favor of handling conflict as part of the WebAdmin API.
>>
>> Moreover what you say regarding conflict is very interesting. It suggest
>> strong consistency might not be needed. A simple "read before your
>> write" would be enough to see if a long standing user would be updated,
>> while not dealing with conflict for newly created user.
>>
>>>
>>>>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
>>>> are likely unnecessary as the webadmin
>>>> presentation layer is offering an idempotent API (and silents the
>>>> AlreadyExist exceptions). Only the CLI
>>>> (soon to be deprecated for Guice products) makes this distinction.
>>>> Discussions have started on the topic and a proof of
>>>> concept is available.
>>>>
>>>
>>> same as above
>>>
>>
>>
>> No not really, as domains cary no other information (like user carry a
>> password).
>>
>> You don't have the risk to grant access to somebody else account by
>> mistake.
>>
>>> Why it would be ok to drop LWT for ACL updates only to replace it by
>>> eventsourcing when you write:
>>>> LWT are required for `eventSourcing`. As event sourcing usage is limited
>>> to low-usage use cases, the performance degradations are not an issue.
>>> Doesn't that mean that ACLs would still rely on LWT but within an
>>> additional layer ?
>>
>> Yes. Writes are resolved against event sourcing system using LWT.
>>
>> Reads are resolved against a projection, free of LWT, maintaines via
>> subscribers to the event sourcing system.
>>
>>> Also for ACLs isn't eventual consistency acceptable ?
>>
>> My take is that it is. My first shot was to do just that.
>>
>> Howver the current code enforces a "transaction like" behavior, with an
>> ACLDiff fired on the mailbox event system. Maintaining denormalization
>> was also an issue in the face of reset (requiring to know previously
>> stored data).
>>
>> A full rewritte of the context ACLs are run in would have been needed.
>> And could have been controversial.
>>
>>> using transactions to
>>> avoid non serial writes but accepting stale reads ?
>>
>> Could be too. This could be a quick-win to enhance existing situation,
>> while waiting for stronger decisions and
>>
>> Here we might need a distinction between reads made as part of the
>> update process (that are required to be up to date, so need to be
>> SERIAL!) and regular reads that are acceptable to be stale (and can be
>> made using QUORUM).
>>
>> What you describe could be a very good short term solution for message
>> flags (I think we will need to challenge CONDSTORE/QRSYNC anyway!)
>>
>>>
>>> That's the limit of my understanding : all the flags/UID/IMAP concerns
>> are
>>> beyond my current knowledge but I'll enjoy reading the comments :)
>>>
>>> jean
>>>
>>
>> Cheers,
>>
>> Benoit
>>
>>
>>
> 


Re: About our usage of LWT in Cassandra related code

Posted by Jean Helou <je...@gmail.com>.

> > So from a user perspective adding a user would always succeed. But would
> it
> > succeed by doing nothing (the current behaviour in silencing the
> > AlreadyExist exception) or would it succeed by effectively overwriting
> the
> > user (in a last write wins manner) ?
>
> Webadmin so far overwrite the user (and its password) in a last write
> win manner.
>

That sounds really scary


>  - Either we need to distinguish "create" from "update" within the
> webadmin API
>

Well, that would definitely have my vote: as an admin operator I *never*
want to accidentally overwrite an existing user when trying to create a new
one (with the possible exception of retrying a create operation that just
timed out, in which case my first reflex would be to execute a read to
make sure that the operation that just failed hasn't actually succeeded).
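A minimal in-memory sketch of such a create/update split, with hypothetical names. `create` never overwrites an existing user, and a retried create that carries identical data is reported as harmless. In Cassandra, the `putIfAbsent` step would correspond to an `INSERT ... IF NOT EXISTS`, i.e. an LWT, but only on the rare write path. Passwords are kept in clear text here purely to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user administration separating "create" from "update".
class UserAdminSketch {
    enum CreateResult { CREATED, ALREADY_EXISTS_SAME_PASSWORD, CONFLICT }

    private final Map<String, String> passwordByUser = new HashMap<>(); // clear text: sketch only

    synchronized CreateResult create(String username, String password) {
        String existing = passwordByUser.putIfAbsent(username, password);
        if (existing == null) {
            return CreateResult.CREATED;
        }
        // A retried create (e.g. after a timeout) carrying the same data is harmless;
        // anything else is a genuine conflict and nothing is overwritten.
        return existing.equals(password)
                ? CreateResult.ALREADY_EXISTS_SAME_PASSWORD
                : CreateResult.CONFLICT;
    }

    // Explicit update: only succeeds for an existing user.
    synchronized boolean updatePassword(String username, String newPassword) {
        return passwordByUser.replace(username, newPassword) != null;
    }
}
```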



> Moreover what you say regarding conflict is very interesting. It suggest
> strong consistency might not be needed. A simple "read before your
> write" would be enough to see if a long standing user would be updated,
> while not dealing with conflict for newly created user.
>

Rereading that paragraph, I realize I am biased towards the human time
scale. That's an option if the API/system is only operated by a human,
because computers and networks are relatively fast on the human time scale.
If the API is used for automation that's not so true anymore, as consensus
propagation is slow on the computer time scale.

With that said, user-driven write operations on users should be relatively
rare: creation, some updates and eventually deletion should probably trade
off speed for consistency.
If there are write operations triggered by internal events, these may not
require full consistency.

To illustrate a bit with actual use cases:

As an admin of the server, I trigger a user creation. Since I'm a sloppy
admin, I already communicated the credentials and email address to 3rd
parties, which may start trying to use them while the creation is in
progress.
*Before* the create command completes, I wouldn't expect logins to succeed
or sent emails to be delivered, though it wouldn't be an issue if they were.
*After* the command completes, I expect all logins and all email delivery
to succeed.
This implies full consistency on the write operation, but inconsistent
reads are fine.

As a user, I change my password. I know the new password since I provided
it to the update command.
*Before* the command completes, I would expect logins with the new
credentials to fail and logins with the old credentials to succeed.
*After* the command completes, I expect all logins with the new credentials
to succeed and all logins with the old credentials to fail.
This implies full consistency on the corresponding reads and writes (for
password updates).

You can probably infer that the delete expectations are the reverse of the
creation :) As far as I can tell, a user only bears a username and a
password. Here I assume that usernames are immutable; making usernames
mutable would require the same consistency constraints as updating the
password.


 > Also for ACLs isn't eventual consistency acceptable ?


> My take is that it is. My first shot was to do just that.
>
> Howver the current code enforces a "transaction like" behavior, with an
> ACLDiff fired on the mailbox event system. Maintaining denormalization
> was also an issue in the face of reset (requiring to know previously
> stored data).
>
> A full rewritte of the context ACLs are run in would have been needed.
> And could have been controversial.
>
> > using transactions to
> > avoid non serial writes but accepting stale reads ?
>
> Could be too. This could be a quick-win to enhance existing situation,
> while waiting for stronger decisions and

> What you describe could be a very good short term solution for message
> flags (I think we will need to challenge CONDSTORE/QRSYNC anyway!)
>

Why do you say this would only be a good short term solution?
My understanding is that currently
- most reads and writes use LWT, and using LWT implies slow access (to
something akin to a lock, I suppose) for both read and write operations
- the amount of reads far outweighs the number of writes (at least for the
listed use cases of ACLs, users and domains, it sounds like the UID/Flags
stuff may not be so clear cut)

In such a context dropping read consistency while keeping write consistency
sounds like it would already be a huge gain :) The assertions in a number
of tests will likely have to be updated to rely on `Awaitility.await()` :)
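Such test updates poll until the projection catches up. Awaitility provides this through `await().atMost(...).until(...)`. The stdlib-only helper below (hypothetical name `Eventually`) just illustrates the underlying idea and is not a replacement for the library.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Minimal polling helper in the spirit of Awaitility: retry a condition until it
// holds or a deadline passes, failing the test on timeout.
final class Eventually {
    private Eventually() {}

    static void awaitUntil(BooleanSupplier condition, Duration atMost, Duration pollInterval) {
        long deadline = System.nanoTime() + atMost.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) { // overflow-safe comparison
                throw new AssertionError("Condition not met within " + atMost);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting", e);
            }
        }
    }
}
```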


On Wed, Dec 9, 2020 at 2:40 AM Tellier Benoit <bt...@apache.org> wrote:

> Sorry for repost,
>
> I sent that response before but it was lost.
>
> Maybe the unfamous text/html format issue.
>
> On 07/12/2020 at 04:33, Jean Helou wrote:
> > Hello,
> >
> > I'm currently trying to increase overall efficiency of the Distributed
> >> James server.
> >>
> >
> > I have some concerns but i feel imposterish for posting them as they most
> > likely come from my own lack of knowledge, i'll still try just in case
> some
> > of points are valid :)
>
> Thank you very much to dare to do so!
>
> You are likely not the only one to lack this knowledge, hopefully
> discussions will clarify that.
>
> I also sometime have problems to express myself clearly, thus
> explanation are normal.
>
> >
> >  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> >> are likely unnecessary as the webadmin
> >> presentation layer is offering an idempotent API (and silents the
> >> AlreadyExist exceptions). Only the CLI
> >
> > (soon to be deprecated for Guice products) makes this distinction.
> >
> >
> > So from a user perspective adding a user would always succeed. But would
> it
> > succeed by doing nothing (the current behaviour in silencing the
> > AlreadyExist exception) or would it succeed by effectively overwriting
> the
> > user (in a last write wins manner) ?
>
> Webadmin so far overwrite the user (and its password) in a last write
> win manner.
>
> > This is a completely different
> > behaviour which is not necessarily desirable.
> > this can be further divided into 2 different cases :
> > - there are concurrent attempts to create the same user (in which case
> the
> > user data is very likely the same or very close, and has possibly never
> > been exposed to a human) in which case the LWW behaviour may be
> acceptable
> > - A user has existed for a long time (definition of long to be defined
> but
> > I would say above a few seconds :) ) in which cas overwriting is most
> > likely not acceptable
> >
>
> My take is that we need to make a choice:
>
>  - Either we need to distinguish "create" from "update" within the
> webadmin API
>  - Or we relax the condition downstream.
>
> I would be in favor of handling conflict as part of the WebAdmin API.
>
> Moreover what you say regarding conflict is very interesting. It suggest
> strong consistency might not be needed. A simple "read before your
> write" would be enough to see if a long standing user would be updated,
> while not dealing with conflict for newly created user.
>
> >
> >>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> >> are likely unnecessary as the webadmin
> >> presentation layer is offering an idempotent API (and silents the
> >> AlreadyExist exceptions). Only the CLI
> >> (soon to be deprecated for Guice products) makes this distinction.
> >> Discussions have started on the topic and a proof of
> >> concept is available.
> >>
> >
> > same as above
> >
>
>
> No not really, as domains cary no other information (like user carry a
> password).
>
> You don't have the risk to grant access to somebody else account by
> mistake.
>
> > Why it would be ok to drop LWT for ACL updates only to replace it by
> > eventsourcing when you write:
> >> LWT are required for `eventSourcing`. As event sourcing usage is limited
> > to low-usage use cases, the performance degradations are not an issue.
> > Doesn't that mean that ACLs would still rely on LWT but within an
> > additional layer ?
>
> Yes. Writes are resolved against event sourcing system using LWT.
>
> Reads are resolved against a projection, free of LWT, maintaines via
> subscribers to the event sourcing system.
>
> > Also for ACLs isn't eventual consistency acceptable ?
>
> My take is that it is. My first shot was to do just that.
>
> However, the current code enforces a "transaction-like" behavior, with an
> ACLDiff fired on the mailbox event system. Maintaining denormalization
> was also an issue in the face of a reset (which requires knowing the
> previously stored data).
>
> A full rewrite of the context ACLs run in would have been needed,
> and could have been controversial.
>
> > Using transactions to
> > avoid non-serial writes but accepting stale reads?
>
> That could work too. This could be a quick win to improve the existing
> situation, while waiting for stronger decisions and
>
> Here we might need a distinction between reads made as part of the
> update process (which are required to be up to date, so need to be
> SERIAL!) and regular reads, for which staleness is acceptable (and which
> can be made using QUORUM).
>
> What you describe could be a very good short-term solution for message
> flags (I think we will need to challenge CONDSTORE/QRESYNC anyway!)
>
> >
> > That's the limit of my understanding: all the flags/UID/IMAP concerns are
> > beyond my current knowledge but I'll enjoy reading the comments :)
> >
> > jean
> >
>
> Cheers,
>
> Benoit
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
> For additional commands, e-mail: server-dev-help@james.apache.org
>
>

Re: About our usage of LWT in Cassandra related code

Posted by Tellier Benoit <bt...@apache.org>.

Sorry for the repost.

I sent this response before, but it was lost.

Maybe the infamous text/html format issue.

On 07/12/2020 at 04:33, Jean Helou wrote:
> Hello,
> 
>> I'm currently trying to increase overall efficiency of the Distributed
>> James server.
>>
> 
> I have some concerns, but I feel like an impostor posting them as they most
> likely come from my own lack of knowledge; I'll still try just in case some
> of the points are valid :)

Thank you very much for daring to do so!

You are likely not the only one lacking this knowledge; hopefully the
discussion will clarify things.

I also sometimes have trouble expressing myself clearly, so asking for
explanations is normal.

> 
>>  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
>> are likely unnecessary as the webadmin
>> presentation layer is offering an idempotent API (and silences the
>> AlreadyExist exceptions). Only the CLI
>> (soon to be deprecated for Guice products) makes this distinction.
> 
> So from a user perspective, adding a user would always succeed. But would it
> succeed by doing nothing (the current behaviour, silencing the
> AlreadyExist exception) or would it succeed by effectively overwriting the
> user (in a last-write-wins manner)?

So far, WebAdmin overwrites the user (and its password) in a
last-write-wins manner.

> This is a completely different
> behaviour which is not necessarily desirable.
> This can be further divided into 2 different cases:
> - there are concurrent attempts to create the same user (in which case the
> user data is very likely the same or very close, and has possibly never
> been exposed to a human) in which case the LWW behaviour may be acceptable
> - A user has existed for a long time (definition of long to be defined but
> I would say above a few seconds :) ) in which case overwriting is most
> likely not acceptable
> 

My take is that we need to make a choice:

 - Either we need to distinguish "create" from "update" within the
webadmin API
 - Or we relax the condition downstream.

I would be in favor of handling conflicts as part of the WebAdmin API.

Moreover, what you say regarding conflicts is very interesting. It suggests
strong consistency might not be needed. A simple "read before you
write" would be enough to detect that a long-standing user is about to be
overwritten, while not dealing with conflicts for newly created users.

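The create/update distinction and the "read before you write" check discussed above can be sketched as follows. This is a minimal in-memory Python model, not James code: `UsersStore`, its methods and `UserAlreadyExists` are hypothetical names, and the lock only stands in for the Paxos round of an `INSERT ... IF NOT EXISTS` lightweight transaction. The read-before-write check protects a long-standing user from being overwritten, but it is not atomic: two concurrent creates of the same new user can both pass the check, and the last write wins.

```python
import threading


class UserAlreadyExists(Exception):
    """Raised when a create targets an existing user."""


class UsersStore:
    """Toy users table (hypothetical names, not the James repository)."""

    def __init__(self):
        self._users = {}
        # Only used by create_lwt, where it stands in for a Paxos round.
        self._lock = threading.Lock()

    def create_lwt(self, name, password):
        # Atomic check-and-insert, like INSERT ... IF NOT EXISTS.
        with self._lock:
            if name in self._users:
                raise UserAlreadyExists(name)
            self._users[name] = password

    def create_read_before_write(self, name, password):
        # Plain read, then plain write. A concurrent create can run
        # between the two steps; in that case the last write wins.
        if name in self._users:
            raise UserAlreadyExists(name)
        self._users[name] = password

    def update(self, name, password):
        # Overwriting is an explicit, separate intent.
        if name not in self._users:
            raise KeyError(name)
        self._users[name] = password

    def password_of(self, name):
        return self._users[name]


store = UsersStore()
store.create_read_before_write("bob@domain.tld", "secret")
try:
    store.create_read_before_write("bob@domain.tld", "other")
except UserAlreadyExists:
    pass  # the long-standing user keeps its password
store.update("bob@domain.tld", "new-secret")
```

Keeping `update` as a separate, explicit operation is what would let WebAdmin report a conflict on create instead of silently overwriting an existing password.
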
> 
>>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
>> are likely unnecessary as the webadmin
>> presentation layer is offering an idempotent API (and silences the
>> AlreadyExist exceptions). Only the CLI
>> (soon to be deprecated for Guice products) makes this distinction.
>> Discussions have started on the topic and a proof of
>> concept is available.
>>
> 
> same as above
>


No, not really, as domains carry no other information (whereas users carry a
password).

There is no risk of granting access to somebody else's account by mistake.

> Why would it be OK to drop LWT for ACL updates, only to replace it with
> event sourcing, when you write:
>> LWT are required for `eventSourcing`. As event sourcing usage is limited
>> to low-usage use cases, the performance degradations are not an issue.
> Doesn't that mean that ACLs would still rely on LWT, but within an
> additional layer?

Yes. Writes are resolved against event sourcing system using LWT.

Reads are resolved against a projection, free of LWT, maintained via
subscribers to the event sourcing system.

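A minimal sketch of that split, assuming a toy in-memory event store (Python, with hypothetical names such as `EventStore`, `AclProjection` and `AclUpdated`; this is not the James event sourcing API). The conditional `append` plays the role of the LWT-backed write: a writer holding a stale version is rejected instead of overwriting. Reads only touch the projection. Subscribers are called synchronously here for simplicity, whereas in a distributed deployment the projection would be updated asynchronously and could lag behind.

```python
from dataclasses import dataclass


class ConcurrentModification(Exception):
    """Raised when the expected version no longer matches (lost the race)."""


@dataclass(frozen=True)
class AclUpdated:
    mailbox: str
    user: str
    rights: frozenset


class EventStore:
    """Toy append-only store keyed by aggregate id (hypothetical API)."""

    def __init__(self):
        self._histories = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def version(self, aggregate_id):
        return len(self._histories.get(aggregate_id, []))

    def append(self, aggregate_id, expected_version, event):
        # Conditional write: stands in for an LWT such as
        # INSERT ... IF NOT EXISTS on (aggregate_id, event_id).
        history = self._histories.setdefault(aggregate_id, [])
        if len(history) != expected_version:
            raise ConcurrentModification(aggregate_id)
        history.append(event)
        # Projections are fed from the store once the write succeeded.
        for callback in self._subscribers:
            callback(event)


class AclProjection:
    """Read model queried with plain reads, free of LWT."""

    def __init__(self):
        self._acls = {}

    def on_event(self, event):
        self._acls.setdefault(event.mailbox, {})[event.user] = event.rights

    def rights(self, mailbox, user):
        return self._acls.get(mailbox, {}).get(user, frozenset())


store = EventStore()
projection = AclProjection()
store.subscribe(projection.on_event)

version = store.version("bob/INBOX")
store.append("bob/INBOX", version, AclUpdated("bob/INBOX", "alice", frozenset("lr")))
try:
    # Same stale version: rejected rather than silently overwriting.
    store.append("bob/INBOX", version, AclUpdated("bob/INBOX", "alice", frozenset("lrw")))
except ConcurrentModification:
    pass
```
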
> Also, for ACLs, isn't eventual consistency acceptable?

My take is that it is. My first shot was to do just that.

However, the current code enforces a "transaction-like" behavior, with an
ACLDiff fired on the mailbox event system. Maintaining denormalization
was also an issue in the face of a reset (which requires knowing the
previously stored data).

A full rewrite of the context ACLs run in would have been needed,
and could have been controversial.

> Using transactions to
> avoid non-serial writes but accepting stale reads?

That could work too. This could be a quick win to improve the existing
situation, while waiting for stronger decisions and

Here we might need a distinction between reads made as part of the
update process (which are required to be up to date, so need to be
SERIAL!) and regular reads, for which staleness is acceptable (and which
can be made using QUORUM).

What you describe could be a very good short-term solution for message
flags (I think we will need to challenge CONDSTORE/QRESYNC anyway!)

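The distinction between update-path reads and regular reads can be sketched for message flags as a compare-and-set retry loop. This is an illustrative Python model under stated assumptions: `FlagsTable`, `ReadPurpose`, `consistency_for` and `add_flag` are hypothetical, and the consistency levels are only recorded, not enforced, since there are no replicas. In a real deployment the level would be set per statement through the Cassandra driver, and `compare_and_set` would be a conditional `UPDATE ... IF` lightweight transaction.

```python
from enum import Enum


class ReadPurpose(Enum):
    UPDATE_PATH = "update"  # the value feeds a conditional write
    DISPLAY = "display"     # the value is only returned to the client


def consistency_for(purpose):
    # Hypothetical policy: only reads feeding a compare-and-set need SERIAL.
    return "SERIAL" if purpose is ReadPurpose.UPDATE_PATH else "QUORUM"


class FlagsTable:
    """Toy flags table; consistency levels are logged, not enforced."""

    def __init__(self):
        self._flags = {}
        self.reads = []  # (message_id, consistency level) for each read

    def read(self, message_id, purpose):
        self.reads.append((message_id, consistency_for(purpose)))
        return self._flags.get(message_id, frozenset())

    def compare_and_set(self, message_id, expected, new):
        # Stands in for UPDATE ... IF flags = expected (an LWT).
        if self._flags.get(message_id, frozenset()) != expected:
            return False
        self._flags[message_id] = new
        return True


def add_flag(table, message_id, flag, max_attempts=5):
    # Optimistic loop: SERIAL read, then conditional write, retry on conflict.
    for _ in range(max_attempts):
        current = table.read(message_id, ReadPurpose.UPDATE_PATH)
        if table.compare_and_set(message_id, current, current | {flag}):
            return True
    return False


table = FlagsTable()
add_flag(table, "msg-1", "\\Seen")
visible = table.read("msg-1", ReadPurpose.DISPLAY)  # QUORUM: may be stale in a real cluster
```

Only the read that feeds the conditional write pays the SERIAL (Paxos) cost; a stale DISPLAY read is tolerated because the next update re-reads with SERIAL before writing.
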
> 
> That's the limit of my understanding: all the flags/UID/IMAP concerns are
> beyond my current knowledge but I'll enjoy reading the comments :)
> 
> jean
> 

Cheers,

Benoit


Re: About our usage of LWT in Cassandra related code

Posted by Matthieu Baechler <ma...@apache.org>.

On Sun, 2020-12-06 at 22:33 +0100, Jean Helou wrote:
> Hello,
> 
> > I'm currently trying to increase overall efficiency of the
> > Distributed James server.
> > 
> 
> I have some concerns, but I feel like an impostor posting them as they
> most likely come from my own lack of knowledge; I'll still try just in
> case some of the points are valid :)
> 
> >  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> > are likely unnecessary as the webadmin
> > presentation layer is offering an idempotent API (and silences the
> > AlreadyExist exceptions). Only the CLI
> > (soon to be deprecated for Guice products) makes this distinction.
> 
> So from a user perspective, adding a user would always succeed. But
> would it succeed by doing nothing (the current behaviour, silencing the
> AlreadyExist exception) or would it succeed by effectively
> overwriting the user (in a last-write-wins manner)? This is a
> completely different behaviour which is not necessarily desirable.
> This can be further divided into 2 different cases:
> - there are concurrent attempts to create the same user (in which
> case the user data is very likely the same or very close, and has
> possibly never been exposed to a human) in which case the LWW
> behaviour may be acceptable
> - A user has existed for a long time (definition of long to be
> defined but I would say above a few seconds :) ) in which case
> overwriting is most likely not acceptable
> 

Fully agree: being idempotent for a command is not the same thing as
"having unpredictable things happening without complaining".

I almost never want to overwrite a user without explicitly asking for
it: as a user, creating a resource is not the same intention as
modifying it.

> 
> >  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions.
> > LWT are likely unnecessary as the webadmin
> > presentation layer is offering an idempotent API (and silences the
> > AlreadyExist exceptions). Only the CLI
> > (soon to be deprecated for Guice products) makes this distinction.
> > Discussions have started on the topic and a proof of
> > concept is available.
> > 
> 
> same as above
> 
> Why would it be OK to drop LWT for ACL updates, only to replace it with
> event sourcing, when you write:
> > LWT are required for `eventSourcing`. As event sourcing usage is
> > limited to low-usage use cases, the performance degradations are not
> > an issue.
> Doesn't that mean that ACLs would still rely on LWT, but within an
> additional layer?

Yes, it's the proposed solution AFAIU.

> Also, for ACLs, isn't eventual consistency acceptable? Using
> transactions to avoid non-serial writes but accepting stale reads?

I would say ACLs could take effect in an eventually consistent way.

> That's the limit of my understanding: all the flags/UID/IMAP
> concerns are beyond my current knowledge but I'll enjoy reading the
> comments :)

Cheers,

-- Matthieu Baechler



Re: About our usage of LWT in Cassandra related code

Posted by Jean Helou <je...@gmail.com>.

Hello,

> I'm currently trying to increase overall efficiency of the Distributed
> James server.
>

I have some concerns, but I feel like an impostor posting them as they most
likely come from my own lack of knowledge; I'll still try just in case some
of the points are valid :)

>  - `users` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silences the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.

So from a user perspective, adding a user would always succeed. But would it
succeed by doing nothing (the current behaviour, silencing the
AlreadyExist exception) or would it succeed by effectively overwriting the
user (in a last-write-wins manner)? This is a completely different
behaviour which is not necessarily desirable.
This can be further divided into 2 different cases:
- there are concurrent attempts to create the same user (in which case the
user data is very likely the same or very close, and has possibly never
been exposed to a human) in which case the LWW behaviour may be acceptable
- A user has existed for a long time (definition of long to be defined but
I would say above a few seconds :) ) in which case overwriting is most
likely not acceptable


>  - `domains` we rely on LWT for throwing "AlreadyExist" exceptions. LWT
> are likely unnecessary as the webadmin
> presentation layer is offering an idempotent API (and silents the
> AlreadyExist exceptions). Only the CLI
> (soon to be deprecated for Guice products) makes this distinction.
> Discussions have started on the topic and a proof of
> concept is available.
>

same as above

Why would it be OK to drop LWT for ACL updates, only to replace it with
event sourcing, when you write:
> LWT are required for `eventSourcing`. As event sourcing usage is limited
> to low-usage use cases, the performance degradations are not an issue.
Doesn't that mean that ACLs would still rely on LWT, but within an
additional layer?
Also, for ACLs, isn't eventual consistency acceptable? Using transactions to
avoid non-serial writes but accepting stale reads?

That's the limit of my understanding: all the flags/UID/IMAP concerns are
beyond my current knowledge but I'll enjoy reading the comments :)

jean