Posted to server-dev@james.apache.org by "René Cordier (Jira)" <se...@james.apache.org> on 2021/05/17 10:08:00 UTC

[jira] [Created] (JAMES-3586) CL one option for the Cassandra blob store

René Cordier created JAMES-3586:
-----------------------------------

             Summary: CL one option for the Cassandra blob store
                 Key: JAMES-3586
                 URL: https://issues.apache.org/jira/browse/JAMES-3586
             Project: James Server
          Issue Type: Improvement
            Reporter: René Cordier
             Fix For: 3.7.0


h2. Context

Some users store all message content in Cassandra and thus store a huge amount of data.

We would like to reduce the performance cost of reading this large amount of data.

Blobs being immutable, we have the guarantee that:
 * If we read something, the value is up to date
 * If a read finds nothing, the content had not been replicated yet. A second read with a higher consistency level will read the data (and the read repair piggybacked on that higher consistency level will heal the data)

Cassandra being very efficient at replicating things (think hinted handoff, direct asynchronous replication), we can expect that data is correctly duplicated before reads are attempted.
h2. Decision

Via a configuration option, allow optimizing blob access.
 * If enabled, perform a first read at CL ONE and, if needed, fall back to a second read at the regular CL
 * If disabled, only a read at the regular CL will be attempted

A metric should be implemented to track the CL ONE hit rate, so that the effectiveness of this solution can be reviewed.
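
The read path described above could be sketched as follows. This is only an illustration, not the James implementation: {{OptimisticBlobReader}}, its nested {{ConsistencyLevel}} enum and the {{readAt}} function are hypothetical names standing in for the real Cassandra access code, and the hit rate is kept in plain counters rather than in the James metric API.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;

final class OptimisticBlobReader {
    /** Hypothetical stand-in for the driver's consistency levels. */
    enum ConsistencyLevel { ONE, QUORUM }

    // Hypothetical read function: (blobId, consistency level) -> blob bytes if found.
    private final BiFunction<String, ConsistencyLevel, Optional<byte[]>> readAt;
    private final boolean optimisticEnabled;
    private final ConsistencyLevel regularLevel;
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    OptimisticBlobReader(BiFunction<String, ConsistencyLevel, Optional<byte[]>> readAt,
                         boolean optimisticEnabled,
                         ConsistencyLevel regularLevel) {
        this.readAt = readAt;
        this.optimisticEnabled = optimisticEnabled;
        this.regularLevel = regularLevel;
    }

    Optional<byte[]> read(String blobId) {
        if (!optimisticEnabled) {
            // Option disabled: a single read at the regular consistency level.
            return readAt.apply(blobId, regularLevel);
        }
        // Option enabled: cheap first attempt at CL ONE.
        Optional<byte[]> fast = readAt.apply(blobId, ConsistencyLevel.ONE);
        if (fast.isPresent()) {
            hits.incrementAndGet();
            return fast;
        }
        // Nothing found at CL ONE: retry at the regular consistency level.
        misses.incrementAndGet();
        return readAt.apply(blobId, regularLevel);
    }

    /** Fraction of optimistic reads answered at CL ONE; 0.0 before any read. */
    double hitRate() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

A low hit rate would suggest that replication is lagging and that the option costs more than it saves, since every miss pays for two reads.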
h2. Consequences

In a multi-DC setup with RF=3 and 2 DCs, this implies reducing read IO by a factor of 4 across the cluster, greatly lowering the read pressure on the Cassandra BlobStore.
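
One way to arrive at the factor of 4 is sketched below. It assumes that the regular CL is QUORUM evaluated over all replicas of both datacenters, which this issue does not state explicitly; {{ReadFanOut}} is a hypothetical helper used only for the arithmetic.

```java
final class ReadFanOut {
    /** Replicas a QUORUM read must contact: floor(totalReplicas / 2) + 1. */
    static int quorum(int replicationFactorPerDc, int datacenters) {
        int totalReplicas = replicationFactorPerDc * datacenters;
        return totalReplicas / 2 + 1;
    }

    /** Ratio between replicas contacted at QUORUM and the single replica contacted at CL ONE. */
    static int reductionFactor(int replicationFactorPerDc, int datacenters) {
        int replicasAtClOne = 1;
        return quorum(replicationFactorPerDc, datacenters) / replicasAtClOne;
    }
}
```

With RF=3 in each of 2 DCs there are 6 replicas, a QUORUM read contacts 4 of them, and a CL ONE read contacts 1. The figure only holds for reads that succeed at CL ONE.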
h2. Work to be conducted

Add a configuration option in cassandra.properties:
{code:bash}
# Experimental configuration option. Defaults to false.
# Enabling it results in reading strictly immutable (not deleted, not updated) data at CL ONE. If the data is missing,
# we can be sure that it had not been replicated yet, and a second read is performed with a higher consistency level.
# This option still offers the same level of consistency (thanks to strict immutability) but might result in higher
# resource usage if replication misbehaves.
# Metrics can be used to measure the efficiency of this.
optimistic.consistency.level.enabled=false
{code}
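
As an illustration of how the flag could be read with its default of false, here is a minimal sketch. It uses plain {{java.util.Properties}} to stay self-contained, which is an assumption and not necessarily how James loads cassandra.properties; {{OptimisticConsistencyConfig}} is a hypothetical class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class OptimisticConsistencyConfig {
    static final String KEY = "optimistic.consistency.level.enabled";

    /** Returns the flag value, or false when the property is absent. */
    static boolean isEnabled(Properties properties) {
        return Boolean.parseBoolean(properties.getProperty(KEY, "false"));
    }

    /** Parses properties-file content held in a string (lines starting with '#' are comments). */
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }
}
```

Any value other than "true" (case-insensitive) is treated as disabled, which keeps the experimental behaviour strictly opt-in.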
Also apply it to the messagev3 and attachmentv2 tables so that they benefit from it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
