Posted to commits@cassandra.apache.org by "Jeremy Hanna (Jira)" <ji...@apache.org> on 2020/12/08 00:43:00 UTC

[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml

     [ https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-16315:
-------------------------------------
    Description: 
Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this advice.  While it's true that one can increase {{concurrent_compactors}} to improve compaction throughput on machines with more CPU cores, the context switching, random IO, and GC pressure associated with bringing compaction data into the heap work against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of {{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using SSD based storage, you can increase the number of {{concurrent_compactors}}.  However, be aware that using too many concurrent compactors can have detrimental effects such as GC pressure, more context switching among compactors and real-time operations, and more random IO pulling data for different compactions.  It's best to test and measure with your workload and hardware.
{quote}
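For concreteness, option 1 might look something like the following in cassandra.yaml.  This is only a sketch; the surrounding comment text and the commented-out default value shown are illustrative, not the exact shipped text:

{code}
# Number of simultaneous compactions to allow.
# When using SSD based storage, you can increase the number of
# concurrent_compactors.  However, be aware that using too many
# concurrent compactors can have detrimental effects such as GC
# pressure, more context switching among compactors and real-time
# operations, and more random IO pulling data for different
# compactions.  It's best to test and measure with your workload
# and hardware.
#concurrent_compactors: 1
{code}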

2. Do some significant testing of compaction efficiency and read/write latency/throughput targets to see where the tipping point is, keeping memory, heap size, and configuration constant to keep the testing simple.
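For option 2, one possible way to sweep the setting at runtime while driving a fixed load (assuming the {{nodetool setconcurrentcompactors}} command and {{cassandra-stress}} are available in the version under test; the thread count, duration, and log file names below are arbitrary choices for illustration):

{code}
# Sweep concurrent_compactors while recording client latency and
# compaction progress at each step.
for n in 1 2 4 8 16; do
    nodetool setconcurrentcompactors $n
    cassandra-stress mixed duration=10m -rate threads=200 \
        -log file=compactors_$n.log
    nodetool compactionstats
done
{code}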

  was:
Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this advice.  While it's true that one can increase {{concurrent_compactors}} to improve compaction throughput on machines with more CPU cores, the context switching, random IO, and GC pressure associated with bringing compaction data into the heap work against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of {{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using fast SSD, you can increase the number of {{concurrent_compactors}}.  However, be aware that using too many concurrent compactors can have detrimental effects such as GC pressure, more context switching among compactors and real-time operations, and more random IO pulling data for different compactions.  It's best to test and measure with your workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write latency/throughput targets to see where the tipping point is, keeping memory, heap size, and configuration constant to keep the testing simple.


> Remove bad advice on concurrent compactors from cassandra.yaml
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-16315
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Jeremy Hanna
>            Priority: Normal
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org