Posted to server-dev@james.apache.org by "René Cordier (Jira)" <se...@james.apache.org> on 2021/05/17 11:00:00 UTC

[jira] [Updated] (JAMES-3586) CL one option for the Cassandra blob store

     [ https://issues.apache.org/jira/browse/JAMES-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

René Cordier updated JAMES-3586:
--------------------------------
    Description: 
h2. Context

Some users store all message content in Cassandra and thus store a huge amount of data.

We would like to reduce the performance cost of reading this large amount of data.

Since blobs are immutable, we have the guarantee that:
 * If we read something, the value is up to date
 * If we fail to read something, the content has not been replicated yet. A second read with a higher consistency level will read the data (and the consistency mechanisms piggybacked on that higher consistency level, such as read repair, will heal the data)

Since Cassandra replicates data very efficiently (think hinted handoff, direct asynchronous replication), we can expect data to be correctly replicated before reads are attempted.
h2. Decision

Via a configuration option, allow optimizing blob access.
 * If enabled, perform a first read at CL ONE and, if needed, fall back to a second read at the regular CL
 * If disabled, only a read at the regular CL will be attempted

A metric should be implemented to track the CL ONE hit rate, so that the effectiveness of this solution can be reviewed.
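The read path and the hit-rate metric described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the James implementation: the class {{OptimisticBlobReader}}, the {{Level}} enum and the {{store}} function are hypothetical names, and the blob store is simulated by a plain function instead of a Cassandra session.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;

// Hypothetical sketch: names and types are illustrative, not taken from James.
final class OptimisticBlobReader {
    // ONE is the optimistic level, REGULAR stands for whatever CL is configured today.
    enum Level { ONE, REGULAR }

    private final boolean optimisticEnabled;
    // Simulated blob store: (blobId, level) -> content if visible at that level.
    private final BiFunction<String, Level, Optional<String>> store;
    private final AtomicLong clOneHits = new AtomicLong();
    private final AtomicLong clOneMisses = new AtomicLong();

    OptimisticBlobReader(boolean optimisticEnabled,
                         BiFunction<String, Level, Optional<String>> store) {
        this.optimisticEnabled = optimisticEnabled;
        this.store = store;
    }

    Optional<String> read(String blobId) {
        if (!optimisticEnabled) {
            // Option disabled: a single read at the regular CL.
            return store.apply(blobId, Level.REGULAR);
        }
        Optional<String> first = store.apply(blobId, Level.ONE);
        if (first.isPresent()) {
            // Blobs are immutable, so a value found at CL ONE is up to date.
            clOneHits.incrementAndGet();
            return first;
        }
        // Missing at CL ONE: fall back to a second read at the regular CL.
        clOneMisses.incrementAndGet();
        return store.apply(blobId, Level.REGULAR);
    }

    // Fraction of optimistic reads that were served at CL ONE.
    double clOneHitRate() {
        long hits = clOneHits.get();
        long total = hits + clOneMisses.get();
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // The blob "late" is simulated as not yet visible at CL ONE.
        BiFunction<String, Level, Optional<String>> store = (id, level) ->
            id.equals("late") && level == Level.ONE
                ? Optional.<String>empty()
                : Optional.of("content-of-" + id);
        OptimisticBlobReader reader = new OptimisticBlobReader(true, store);
        System.out.println(reader.read("blob-1").orElse("<missing>"));
        System.out.println(reader.read("late").orElse("<missing>"));
        System.out.println("CL ONE hit rate: " + reader.clOneHitRate());
    }
}
```

A low hit rate in such a metric would suggest that the extra CL ONE reads cost more than they save.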
h2. Consequences

In a multi-DC setup (RF=3, DC=2), this implies a factor 4 reduction in IO across the cluster, greatly lowering the read pressure on the Cassandra BlobStore.
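One possible reading of the factor 4 is sketched below. It assumes that the regular CL is QUORUM computed over the replicas of both DCs, which the ticket does not state, so treat it as an illustration only. The class {{ReadFanout}} and its methods are hypothetical names.

```java
// Hypothetical sketch: assumes the regular CL is QUORUM across all DCs.
final class ReadFanout {
    // Cassandra's quorum is (sum of replication factors / 2) + 1, rounded down.
    static int quorum(int totalReplicas) {
        return totalReplicas / 2 + 1;
    }

    // Ratio between replicas involved in a QUORUM read and in a CL ONE read.
    static int reductionFactor(int replicationFactorPerDc, int dataCenters) {
        int totalReplicas = replicationFactorPerDc * dataCenters;
        int replicasAtOne = 1;
        return quorum(totalReplicas) / replicasAtOne;
    }

    public static void main(String[] args) {
        // RF=3 in each of 2 DCs: 6 replicas in total.
        System.out.println("Replicas involved at QUORUM: " + quorum(3 * 2));
        System.out.println("Reduction factor vs CL ONE: " + reductionFactor(3, 2));
    }
}
```

With a DC-local regular CL the ratio would be smaller, so the actual gain depends on the configured consistency level.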
h2. Work to be conducted

Add a configuration option in cassandra.properties:
{code:bash}
# Experimental configuration option. Defaults to false.
# Enabling it results in reading strictly immutable (not deleted, not updated) data at CL ONE. If the data is missing,
# we can be sure that it has not been replicated yet, and a second read is performed with a higher consistency level.
# This option still offers the same level of consistency (thanks to strict immutability) but might result in higher resource usage if replication misbehaves.
# Metrics can be used to measure the efficiency of this.
optimistic.consistency.level.enabled=false
{code}
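Reading this option with a default of false could look like the sketch below. It uses {{java.util.Properties}} as a stand-in for whatever configuration mechanism James actually uses, and the class {{OptimisticConsistencyConfig}} is a hypothetical name; only the property key comes from this ticket.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for the real configuration loader.
final class OptimisticConsistencyConfig {
    // Property key as proposed in this ticket.
    static final String KEY = "optimistic.consistency.level.enabled";

    // Parses properties-file content held in a string.
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // The option defaults to false when the key is absent.
    static boolean isEnabled(Properties properties) {
        return Boolean.parseBoolean(properties.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties absent = parse("# no option set\n");
        Properties enabled = parse(KEY + "=true\n");
        System.out.println("absent  -> " + isEnabled(absent));
        System.out.println("enabled -> " + isEnabled(enabled));
    }
}
```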
 

  was:
h2. Context

Some users store all message content in Cassandra and thus store a huge amount of data.

We would like to reduce the performance cost of reading this large amount of data.

Since blobs are immutable, we have the guarantee that:
 * If we read something, the value is up to date
 * If we fail to read something, the content has not been replicated yet. A second read with a higher consistency level will read the data (and the consistency mechanisms piggybacked on that higher consistency level, such as read repair, will heal the data)

Since Cassandra replicates data very efficiently (think hinted handoff, direct asynchronous replication), we can expect data to be correctly replicated before reads are attempted.
h2. Decision

Via a configuration option, allow optimizing blob access.
 * If enabled, perform a first read at CL ONE and, if needed, fall back to a second read at the regular CL
 * If disabled, only a read at the regular CL will be attempted

A metric should be implemented to track the CL ONE hit rate, so that the effectiveness of this solution can be reviewed.
h2. Consequences

In a multi-DC setup (RF=3, DC=2), this implies a factor 4 reduction in IO across the cluster, greatly lowering the read pressure on the Cassandra BlobStore.
h2. Work to be conducted

Add a configuration option in cassandra.properties:
{code:bash}
# Experimental configuration option. Defaults to false.
# Enabling it results in reading strictly immutable (not deleted, not updated) data at CL ONE. If the data is missing,
# we can be sure that it has not been replicated yet, and a second read is performed with a higher consistency level.
# This option still offers the same level of consistency (thanks to strict immutability) but might result in higher resource usage if replication misbehaves.
# Metrics can be used to measure the efficiency of this.
optimistic.consistency.level.enabled=false
{code}
Also apply it to the messagev3 and attachmentv2 tables so that they benefit from it.


> CL one option for the Cassandra blob store
> ------------------------------------------
>
>                 Key: JAMES-3586
>                 URL: https://issues.apache.org/jira/browse/JAMES-3586
>             Project: James Server
>          Issue Type: Improvement
>            Reporter: René Cordier
>            Priority: Major
>             Fix For: 3.7.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h


