Posted to commits@cassandra.apache.org by "Joey Lynch (Jira)" <ji...@apache.org> on 2020/04/19 21:00:05 UTC

[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

    [ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087187#comment-17087187 ] 

Joey Lynch commented on CASSANDRA-15379:
----------------------------------------

Alright, I finally fixed our internal trunk build so we can do performance validations again. I ran the following performance benchmark and the results are essentially identical for the default configuration (so this tests _just_ the addition of the NoopCompressor at the megamorphic call sites).
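For anyone following along, the concern being tested here is purely about dispatch cost: the worry is that adding another concrete compressor implementation could make the already-megamorphic compressor call sites more expensive to go through, even though the new class does no real work. A minimal standalone sketch of what an identity (no-op) compressor amounts to; this uses a simplified illustrative interface of my own, not Cassandra's actual ICompressor, and the names are mine:

{noformat}
import java.nio.ByteBuffer;

// Illustrative interface only -- the real patch implements Cassandra's
// ICompressor; this simplified shape is just to show the idea.
interface ChunkCompressor
{
    void compress(ByteBuffer input, ByteBuffer output);
    void uncompress(ByteBuffer input, ByteBuffer output);
}

// Identity codec: "compressing" is a plain copy, so a flush pays no real
// compression CPU cost while still going through the same call sites.
final class NoopChunkCompressor implements ChunkCompressor
{
    @Override
    public void compress(ByteBuffer input, ByteBuffer output)
    {
        output.put(input);   // copy the chunk through unchanged
        output.flip();       // prepare the output buffer for reading
    }

    @Override
    public void uncompress(ByteBuffer input, ByteBuffer output)
    {
        output.put(input);   // the reverse direction is also a plain copy
        output.flip();
    }

    public static void main(String[] args)
    {
        ByteBuffer in = ByteBuffer.wrap("hello sstable chunk".getBytes());
        ByteBuffer out = ByteBuffer.allocate(in.remaining());
        new NoopChunkCompressor().compress(in, out);
        System.out.println(new String(out.array()));   // prints the input unchanged
    }
}
{noformat}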

*Experimental Setup:*

A baseline and candidate cluster of EC2 machines running the following:
 * C* cluster: 3x3 (us-east-1 and eu-west-1) i3.2xlarge
 * Load cluster: 3 m5.2xlarge nodes running ndbench in us-east-1, generating a consistent load against the cluster
 * Baseline C* version: Latest trunk (b05fe7ab)
 * Candidate C* version: The proposed patch applied to the same version of trunk
 * Relevant system configuration: Ubuntu xenial running Linux 4.15, with the kyber I/O scheduler (vs noop), 32 KiB readahead (vs the 128 KiB default), and the tc-fq network qdisc (vs pfifo_fast)

In all cases load is applied and then we wait for metrics to settle, especially things like pending compactions, read/write latencies, p99 latencies, etc ...

*Defaults Benchmark:*
 * Load pattern: 1.2k wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
 * Data sizing: 2 rows of 10 columns, total size per partition of about 10 KiB of random data. ~100 GiB per node data size (replicated 6 ways)
 * Compaction settings: LCS with size=256MiB, fanout=20
 * Compression: LZ4 with 16 KiB block size

*Defaults Benchmark Results:*

We do not have data to support the hypothesis that the megamorphic call sites have become more expensive due to the addition of the NoopCompressor.

1. No significant change at the coordinator level (least relevant metric): [^15379_coordinator_defaults.png]
2. No significant change at the replica level (most relevant metric): [^15379_replica_defaults.png]
3. No significant change at the system resource level (second most relevant metrics): [^15379_system_defaults.png]

Our external flamegraph exports appear to be broken, but I looked at the flamegraphs directly and they also show no noticeable difference (I'll work with our performance team to fix the exports so I can share that data here).

*Next steps for me:*
 * Squash, rebase, and re-run unit and dtests with latest trunk in preparation for commit
 * Run a benchmark of {{ZstdCompressor}} with and without the patch; we expect to see reduced CPU usage during flushes. I will likely have to reduce the read/write throughput, since compactions consume a large share of on-CPU time with this configuration. A rough codec-level sketch of why we expect that follows this list.
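To make that Zstd expectation concrete before running the real thing: the sketch below is not the Cassandra flush path, just a back-of-the-envelope, single-threaded comparison of the two codecs on one 16 KiB chunk of semi-compressible data. It assumes lz4-java and zstd-jni on the classpath, and the iteration count, seed, and Zstd level 3 are illustrative choices of mine:

{noformat}
import java.util.Random;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import com.github.luben.zstd.Zstd;

public class ChunkCompressionComparison
{
    private static final int CHUNK_BYTES = 16 * 1024;   // matches the 16 KiB block size used above
    private static final int ITERATIONS = 50_000;

    public static void main(String[] args)
    {
        // Semi-compressible payload: a small alphabet gives both codecs real work to do
        // (purely random bytes would be incompressible and unrepresentative).
        byte[] chunk = new byte[CHUNK_BYTES];
        Random random = new Random(42);
        for (int i = 0; i < chunk.length; i++)
            chunk[i] = (byte) ('a' + random.nextInt(8));

        LZ4Compressor lz4 = LZ4Factory.fastestInstance().fastCompressor();

        long lz4Start = System.nanoTime();
        int lz4Size = 0;
        for (int i = 0; i < ITERATIONS; i++)
            lz4Size = lz4.compress(chunk).length;
        long lz4Nanos = System.nanoTime() - lz4Start;

        long zstdStart = System.nanoTime();
        int zstdSize = 0;
        for (int i = 0; i < ITERATIONS; i++)
            zstdSize = Zstd.compress(chunk, 3).length;   // level 3 chosen for illustration
        long zstdNanos = System.nanoTime() - zstdStart;

        System.out.printf("LZ4:  %d compressed bytes/chunk, %.1f us/chunk%n",
                          lz4Size, lz4Nanos / 1000.0 / ITERATIONS);
        System.out.printf("Zstd: %d compressed bytes/chunk, %.1f us/chunk%n",
                          zstdSize, zstdNanos / 1000.0 / ITERATIONS);
    }
}
{noformat}

A crude loop like this (no JMH, no warmup isolation) only indicates relative per-chunk cost and compression ratio; the cluster benchmark above is what actually answers whether that cost shows up in flush times and replica latencies.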

> Make it possible to flush with a different compression strategy than we compact with
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15379
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Config, Local/Memtable
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>         Attachments: 15379_coordinator_defaults.png, 15379_replica_defaults.png, 15379_system_defaults.png
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on some of our most dense clusters and have been observing close to a 50% reduction in footprint with Zstd on some of our workloads! Unfortunately, we have been running into an issue where the flush might take so long (Zstd is slower to compress than LZ4) that we can actually block the next flush and cause instability.
> Internally we are working around this with a very simple patch which flushes SSTables with the default compression strategy (LZ4) regardless of the table params. This is a simple solution, but I think the ideal solution might be for the flush compression strategy to be configurable separately from the table compression strategy (while defaulting to the same thing). Instead of adding yet another compression option to the yaml (like hints and commitlog) I was thinking of just adding it to the table parameters and then adding a {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently supported defaults are:
> # * compression       : How are SSTables compressed in general (flush, compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path forward to providing user-specified defaults for table creation (so e.g. if a particular user wanted to use a different default chunk_length_in_kb they could do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org