Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2021/03/29 04:21:00 UTC

[jira] [Commented] (JAMES-3435) Relaxing LWT usage: domain, users

    [ https://issues.apache.org/jira/browse/JAMES-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310397#comment-17310397 ] 

Benoit Tellier commented on JAMES-3435:
---------------------------------------

I had an exchange on this topic with Ilja Weis.

He pointed me to the following links:

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.11.10

{code:java}
    - This release fix a correctness issue with SERIAL reads, and LWT writes that do not apply.
      Unfortunately, this fix has a performance impact on read performance at the SERIAL or
      LOCAL_SERIAL consistency levels. For heavy users of such SERIAL reads, the performance
      impact may be noticeable and may also result in an increased of timeouts. For that
      reason, a opt-in system property has been added to disable the fix:
        -Dcassandra.unsafe.disable-serial-reads-linearizability=true
      Use this flag at your own risk as it revert SERIAL reads to the incorrect behavior of
      previous versions. See CASSANDRA-12126 for details.
{code}

More details are provided here: https://issues.apache.org/jira/browse/CASSANDRA-12126

In short, the Paxos setup behind Lightweight Transactions (LWT) requires committing an empty update on each SERIAL read, to make sure no in-flight update is missed (in some complex distributed-failure edge cases). This results in a performance hit: some users report SERIAL read timeouts under high SMTP load.

Please note that James relies on linearizability, and failing to provide these guarantees will lead to message loss: the UID is a per-mailbox monotonic integer, so weaker consistency might lead to message overwrites.
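The following is a minimal in-memory sketch of the overwrite argument. It assumes that messages are stored in a map keyed by UID and that the per-mailbox "last UID" is a single cell. `UidCellSketch` and `UidOverwriteDemo` are hypothetical names. `AtomicLong.compareAndSet` only stands in for an LWT conditional update such as `UPDATE ... IF uid = ?`; this is not how James or the Cassandra driver allocate UIDs.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory model of a per-mailbox "last UID" cell; not James code.
final class UidCellSketch {
    private final AtomicLong lastUid = new AtomicLong(0);

    long read() {
        return lastUid.get();
    }

    // Blind write: no condition, so two writers that read the same value allocate the same UID.
    long blindAllocate(long observed) {
        long next = observed + 1;
        lastUid.set(next);
        return next;
    }

    // Conditional write, standing in for an LWT like "UPDATE ... SET uid = :next IF uid = :observed".
    // Returns empty when the condition fails, i.e. another writer moved the cell first.
    OptionalLong conditionalAllocate(long observed) {
        long next = observed + 1;
        return lastUid.compareAndSet(observed, next) ? OptionalLong.of(next) : OptionalLong.empty();
    }

    // Retry loop: re-read and try again until the conditional write applies.
    long allocate() {
        while (true) {
            OptionalLong uid = conditionalAllocate(read());
            if (uid.isPresent()) {
                return uid.getAsLong();
            }
        }
    }
}

final class UidOverwriteDemo {
    // Two clients interleave: both read, then both write. Messages are stored by UID.
    static Map<Long, String> storeWithBlindWrites() {
        UidCellSketch cell = new UidCellSketch();
        Map<Long, String> messages = new HashMap<>();
        long seenByA = cell.read();
        long seenByB = cell.read();
        messages.put(cell.blindAllocate(seenByA), "message A");
        messages.put(cell.blindAllocate(seenByB), "message B"); // same UID: replaces message A
        return messages;
    }

    static Map<Long, String> storeWithConditionalWrites() {
        UidCellSketch cell = new UidCellSketch();
        Map<Long, String> messages = new HashMap<>();
        long seenByA = cell.read();
        long seenByB = cell.read();
        messages.put(cell.conditionalAllocate(seenByA).getAsLong(), "message A");
        // B's condition fails because A already moved the cell, so B retries with a fresh read.
        long uidB = cell.conditionalAllocate(seenByB).orElseGet(cell::allocate);
        messages.put(uidB, "message B");
        return messages;
    }
}
{code}

With blind writes, both clients compute UID 1 and message B silently replaces message A. With the conditional write, B's first attempt is rejected and its retry yields UID 2.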

I am convinced that we do not need SERIAL reads for regular reads; we only need them as part of write transactions. We can thus significantly reduce the SERIAL read workload.
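A minimal sketch of that split follows. It assumes a single boolean switch for plain reads, and it assumes QUORUM as the "regular" level. `ReadConsistencyLevel` and `ReadConsistencyPolicySketch` are hypothetical types, not James or DataStax driver classes.

{code:java}
// Hypothetical model of the proposal; these are not the James or DataStax driver types.
enum ReadConsistencyLevel {
    SERIAL, // Paxos read: also checks system.paxos for in-flight updates (slower)
    QUORUM  // assumed "regular" level for this sketch
}

final class ReadConsistencyPolicySketch {
    private final boolean strongConsistencyForPlainReads;

    ReadConsistencyPolicySketch(boolean strongConsistencyForPlainReads) {
        this.strongConsistencyForPlainReads = strongConsistencyForPlainReads;
    }

    // A read that feeds a conditional write must see in-flight Paxos state, so it is always SERIAL.
    // A plain read only pays that cost when strong consistency is configured.
    ReadConsistencyLevel forRead(boolean partOfWriteTransaction) {
        return partOfWriteTransaction || strongConsistencyForPlainReads
            ? ReadConsistencyLevel.SERIAL
            : ReadConsistencyLevel.QUORUM;
    }
}
{code}

With the switch turned off, only reads made on behalf of a write transaction go through Paxos, which is where the reduction in SERIAL workload would come from.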

Then, I wonder whether we should give people more choice. For instance, regarding flag updates, I would personally consider lost updates acceptable: they would only inconvenience the end user, who might have to mark a mail as read again. I would be glad to experiment with a flag update that does not depend on LWT. (Experiments suggest that MODSEQ allocation acts as a sequencer that limits concurrency on flag updates. I was surprised that 20 updates conducted on 4 threads led to inconsistent results only 25% of the time...)
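The flag lost update can be modeled the same way. This is a hypothetical in-memory sketch (`FlagsCellSketch` and `FlagUpdateDemo` are illustrative names). The conditional write plays the role of an LWT `IF flags = ?` clause. The sketch does not model the MODSEQ sequencing effect observed in the experiment.

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a message's flags cell; not James code.
final class FlagsCellSketch {
    private Set<String> flags = Collections.emptySet();

    synchronized Set<String> read() {
        return flags;
    }

    // Blind write: overwrites whatever is stored, even if it changed since our read.
    synchronized void blindWrite(Set<String> newFlags) {
        flags = Collections.unmodifiableSet(new HashSet<>(newFlags));
    }

    // Conditional write, standing in for "UPDATE ... SET flags = :new IF flags = :expected".
    synchronized boolean conditionalWrite(Set<String> expected, Set<String> newFlags) {
        if (!flags.equals(expected)) {
            return false;
        }
        flags = Collections.unmodifiableSet(new HashSet<>(newFlags));
        return true;
    }

    static Set<String> plus(Set<String> base, String flag) {
        Set<String> result = new HashSet<>(base);
        result.add(flag);
        return result;
    }
}

final class FlagUpdateDemo {
    // Two clients read the same flags, then each adds its own flag (read-before-write).
    static Set<String> withBlindWrites() {
        FlagsCellSketch cell = new FlagsCellSketch();
        Set<String> seenByA = cell.read();
        Set<String> seenByB = cell.read();
        cell.blindWrite(FlagsCellSketch.plus(seenByA, "\\Seen"));
        cell.blindWrite(FlagsCellSketch.plus(seenByB, "\\Flagged")); // \Seen is lost here
        return cell.read();
    }

    static Set<String> withConditionalWrites() {
        FlagsCellSketch cell = new FlagsCellSketch();
        Set<String> seenByA = cell.read();
        Set<String> seenByB = cell.read();
        cell.conditionalWrite(seenByA, FlagsCellSketch.plus(seenByA, "\\Seen"));
        // B's condition fails, so B re-reads and retries on top of A's update.
        while (!cell.conditionalWrite(seenByB, FlagsCellSketch.plus(seenByB, "\\Flagged"))) {
            seenByB = cell.read();
        }
        return cell.read();
    }
}
{code}

Blind writes end with only `\Flagged`, so the user's `\Seen` is lost. The conditional variant keeps both flags, at the cost of a retry (and, on Cassandra, the Paxos round-trips).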

I would advocate finer-grained control of LWT usage through configuration properties, namely:

{code:java}
| mailbox.read.strong.consistency
| Optional. Boolean, defaults to true. Disabling should be considered experimental.
If enabled, a strong (SERIAL) consistency level is used for read transactions on mailboxes. Not doing so might result
in stale reads, as the system.paxos table will not be checked for the latest updates. Better performance is expected
by turning it off. Note that reads performed as part of write transactions are always performed with strong
consistency.

| message.read.strong.consistency
| Optional. Boolean, defaults to true. Disabling should be considered experimental.
If enabled, a strong (SERIAL) consistency level is used for read transactions on messages. Not doing so might result
in stale reads, as the system.paxos table will not be checked for the latest updates. Better performance is expected
by turning it off. Note that reads performed as part of write transactions are always performed with strong
consistency.

| message.write.strong.consistency.unsafe
| Optional. Boolean, defaults to true. Disabling should be considered experimental and unsafe.
If disabled, Lightweight Transactions will no longer be used for message operations (table `imapUidTable`).
As message flag updates so far rely on a read-before-write model, this exposes you to data races, potentially
leading to lost updates. Better performance is expected by turning it off. Reads performed as part of write
transactions are then also performed with a relaxed consistency.
{code}
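Here is a minimal sketch of how these three properties could be read, assuming they come from a `java.util.Properties` source. `LwtConsistencySettingsSketch` is a hypothetical class, not James configuration code. Only the property names and the `true` defaults come from the proposal.

{code:java}
import java.util.Locale;
import java.util.Properties;

// Hypothetical holder for the three proposed flags; not James configuration code.
final class LwtConsistencySettingsSketch {
    final boolean mailboxReadStrongConsistency;
    final boolean messageReadStrongConsistency;
    final boolean messageWriteStrongConsistency;

    private LwtConsistencySettingsSketch(boolean mailboxRead, boolean messageRead, boolean messageWrite) {
        this.mailboxReadStrongConsistency = mailboxRead;
        this.messageReadStrongConsistency = messageRead;
        this.messageWriteStrongConsistency = messageWrite;
    }

    // Every flag is optional and defaults to true, i.e. the current fully LWT-backed behavior.
    static LwtConsistencySettingsSketch from(Properties properties) {
        return new LwtConsistencySettingsSketch(
            flag(properties, "mailbox.read.strong.consistency"),
            flag(properties, "message.read.strong.consistency"),
            flag(properties, "message.write.strong.consistency.unsafe"));
    }

    private static boolean flag(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null) {
            return true; // not set: keep strong consistency
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                // Fail fast rather than silently disabling a safety-relevant setting.
                throw new IllegalArgumentException(key + " must be true or false, got: " + value);
        }
    }
}
{code}

Rejecting anything other than `true`/`false` is a deliberate choice. `Boolean.parseBoolean` returns `false` for any value other than a case-insensitive `true`, so a typo such as `ture` would otherwise silently disable strong consistency.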

I volunteer to contribute this in a timely manner.

Also, alternative technologies might need to be explored to generate monotonic counters: UID and MODSEQ (see RFC-3501) are not included in the above proposal, as correctness is required for them.

 - Discussing this very topic with @mbaechler: https://atomix.io/ might be a nice candidate. It offers out-of-the-box support for atomic counters, but work would be needed on the cluster membership side; a good option might be standalone Atomix agents (https://atomix.io/docs/latest/user-manual/deployment/kubernetes/...). I still have open questions regarding persistence, and I started a thread on their Gitter.
 - Historically, the project did have code handling UID and MODSEQ generation through ZooKeeper, but it was unmaintained and has been removed.
 - Could Consul be a candidate? https://www.consul.io/api-docs/kv & https://www.consul.io/api/features/consistency (`consistent` mode)
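Whatever the backend, the primitive that UID/MODSEQ generation needs is a check-and-set on a versioned value. Consul's KV API documents a `cas` parameter on writes. With `cas=0`, the key is only written if it does not exist yet; otherwise it is only written if the index matches the key's `ModifyIndex`. The sketch below models that contract in memory, without HTTP. `VersionedKvSketch` and `CasCounterSketch` are hypothetical names, and the sketch says nothing about how Atomix, ZooKeeper or Consul behave under failures or persist data.

{code:java}
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a KV store with Consul-style check-and-set semantics (hypothetical, no HTTP).
final class VersionedKvSketch {
    private static final class Entry {
        final long value;
        final long modifyIndex;

        Entry(long value, long modifyIndex) {
            this.value = value;
            this.modifyIndex = modifyIndex;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long lastIndex = 0;

    // Returns {value, modifyIndex}, or null if the key does not exist.
    synchronized long[] get(String key) {
        Entry e = entries.get(key);
        return e == null ? null : new long[] {e.value, e.modifyIndex};
    }

    // cas == 0: only create if absent; cas != 0: only write if it equals the key's current modifyIndex.
    synchronized boolean put(String key, long value, long cas) {
        Entry current = entries.get(key);
        boolean applies = (cas == 0) ? current == null : current != null && current.modifyIndex == cas;
        if (applies) {
            entries.put(key, new Entry(value, ++lastIndex));
        }
        return applies;
    }
}

// Hypothetical monotonic counter (e.g. a per-mailbox UID) built on the check-and-set primitive.
final class CasCounterSketch {
    private final VersionedKvSketch kv;
    private final String key;

    CasCounterSketch(VersionedKvSketch kv, String key) {
        this.kv = kv;
        this.key = key;
    }

    long next() {
        while (true) {
            long[] current = kv.get(key);
            long value = current == null ? 1 : current[0] + 1;
            long cas = current == null ? 0 : current[1];
            if (kv.put(key, value, cas)) {
                return value;
            }
            // Lost the race: someone else moved the counter, so re-read and retry.
        }
    }
}
{code}

A writer holding a stale index is rejected and has to re-read. This is what prevents two concurrent deliveries from receiving the same UID.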

I think contributions should be welcomed on the monotonic integer topic, to provide alternatives to Cassandra LWT.

Once this is deployed, we should notice a sharp drop in LWT usage, less activity on the system.paxos table, and lower CPU usage on the Cassandra cluster.


> Relaxing LWT usage: domain, users
> ---------------------------------
>
>                 Key: JAMES-3435
>                 URL: https://issues.apache.org/jira/browse/JAMES-3435
>             Project: James Server
>          Issue Type: Improvement
>          Components: cassandra
>    Affects Versions: master
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: master
>
>
> https://www.mail-archive.com/server-dev@james.apache.org/msg68713.html
> {code:java}
> Cassandra is an eventually consistent datastore that can be used in a
> consistent fashion. To do so, we rely on a mechanism called "LightWeight
> Transactions (LWT)". Lightweight transactions rely on the PAXOS
> distributed consensus algorithm to enforce a condition upon data
> mutation. A table, system.paxos, is used to track the state of
> transactions. Furthermore, upon writes, several round-trips (two) are
> needed to ensure data integrity across replicas (minimum round trips to
> achieve consensus) and the system.paxos table is read / written to in
> addition to the applicative table.
> All of this causes LWT to be significantly slower than their lower
> consistency counterparts. On some Linagora owned production instances,
> regular reads take 2ms while reads on tables relying on LWT take 6ms.
> Similar figures are found for writes. We also noticed some high
> compaction throughput on the paxos table, leading to many background
> writes.
> Given the massive impact of LWT usage on performance, and given the lack
> of debate about LWT adoption, I would like to challenge their usage again...
> Here are the places we rely on LWT for the Distributed Server:
>  - IMAP UID generation (monotonic integer) - strong consistency is
> strictly required to not lose data, as overwriting a UID means
> overwriting a message.
>  - IMAP ModSeq generation (monotonic integer) - strong consistency is
> required, as modseq overwrites can lead to some data not being well
> synchronised.
>  - Domain and users - we rely on LWT to return an error when deleting a
> user that does not exist, or creating an already existing user. It sounds
> unnecessary.
>  - Message flags rely on LWT to ensure updates are not overwritten. As
> flags are frequently read metadata, the impact is high, for limited criticality for
> the end user. After all, no data is lost, only a user action like
> marking a message as Seen, an action that he can very well perform again.
>  - Mailbox path registration, ACL - required to prevent data races
> My proposal would be:
>  - Keep using LWT for UID and modseq generation, as well as Mailbox path
> registration.
>  - Make the use of LWT for message flags update configurable - as an
> admin I can choose to disable it.
>  - I am also fine with completely removing LWT usage for message flags
> update.
>  - No longer use LWT on domain or users. Instead use idempotent create /
> delete. The contract test will thus need to be relaxed.
>  - On the long term, relying on a CRDT to represent ACLs at the
> Cassandra level, instead of serialized JSON, would enable us to get rid of
> LWT usage on the ACL table.
> {code}
> Let's start by relaxing LWT usage for users & domains.
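The relaxation proposed for domains and users boils down to a change of repository contract. The following is a hypothetical in-memory sketch; `DomainListSketch` is not the James domain list API. The strict variant reports duplicates. On Cassandra, that check needs an LWT `INSERT ... IF NOT EXISTS` and its `[applied]` result, mirrored here by the boolean returned by `Set.add`. The idempotent variants succeed regardless and can thus be plain writes.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory illustration of the two contracts; not the James domain list API.
final class DomainListSketch {
    private final Set<String> domains = ConcurrentHashMap.newKeySet();

    // Strict contract: fails on duplicates. On Cassandra this needs an LWT to be race-free;
    // here the boolean returned by Set.add plays the role of the [applied] column.
    void addStrict(String domain) {
        if (!domains.add(domain)) {
            throw new IllegalStateException(domain + " already exists");
        }
    }

    // Relaxed contract: create is an idempotent upsert, so a plain write is enough.
    void addIdempotent(String domain) {
        domains.add(domain);
    }

    // Relaxed contract: deleting a missing domain is a no-op rather than an error.
    void removeIdempotent(String domain) {
        domains.remove(domain);
    }

    boolean contains(String domain) {
        return domains.contains(domain);
    }
}
{code}

Under the idempotent contract, adding an existing domain twice or removing a missing one is not an error. This is why the contract tests need to be relaxed.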


