Posted to user@cassandra.apache.org by Ken Hancock <ke...@schange.com> on 2016/01/04 21:59:53 UTC

compaction_throughput_mb_per_sec

I was surprised the other day to discover that this was a cluster-wide
setting. Why does that make sense?

In a heterogeneous Cassandra deployment, say I have some old servers
running spinning disks and I'm bringing on newer nodes that use SSDs. I
want different compaction throttling on different nodes to minimize the
impact on read latency.

I can already balance data ownership through either token allocation or
vnode counts.

Also, as I increase my node count, I technically also have to increase my
compaction_throughput which would require a rolling restart across the
cluster.

Re: compaction_throughput_mb_per_sec

Posted by Nate McCall <na...@thelastpickle.com>.
>
>> Also, as I increase my node count, I technically also have to increase my
>> compaction_throughput which would require a rolling restart across the
>> cluster.
>>
>>
> You can set compaction throughput on each node dynamically via nodetool
> setcompactionthroughput.
>
>
>
Also, the IOPS generated by your workload, and how efficiently the JVM
handles them, are what should drive compaction throughput settings. Raw
node count is orthogonal.

Re: compaction_throughput_mb_per_sec

Posted by Nate McCall <na...@thelastpickle.com>.
>
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput which would require a rolling restart across the
> cluster.
>
>
You can set compaction throughput on each node dynamically via nodetool
setcompactionthroughput.
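For a heterogeneous cluster, that per-node override is easy to script; a
minimal sketch (the host addresses and MB/s values below are hypothetical,
and the commands are echoed rather than executed so you can review them
before piping to sh):

```shell
#!/bin/sh
# Build the nodetool command for one host; pipe the script's output to sh
# to actually apply the settings.
throttle_cmd() {
    host="$1"; mbps="$2"
    echo "nodetool -h $host setcompactionthroughput $mbps"
}

# Spinning-disk nodes keep the conservative default
for h in 10.0.0.1 10.0.0.2; do
    throttle_cmd "$h" 16
done

# SSD-backed nodes get a higher ceiling
for h in 10.0.0.3 10.0.0.4; do
    throttle_cmd "$h" 64
done
```

Note that this is a runtime-only change: on restart a node falls back to
the value in its cassandra.yaml, so any per-node override needs to be
reapplied (or baked into each node's yaml).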


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: compaction_throughput_mb_per_sec

Posted by Jack Krupansky <ja...@gmail.com>.
I forwarded a comment to the docs team.

It appears that they picked the language up from the cassandra.yaml file
itself. Looking at the use of "system" in that file, it seems that it
usually means the node, i.e. the box running the node.

-- Jack Krupansky

On Tue, Jan 5, 2016 at 9:50 AM, Ken Hancock <ke...@schange.com> wrote:

> As to why I think it's cluster-wide, here's what the documentation says:
>
>
> https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html
> compaction_throughput_mb_per_sec
> <https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
> (Default: 16) Throttles compaction to the specified total throughput
> across the entire system. The faster you insert data, the faster you need
> to compact in order to keep the SSTable count down. The recommended value
> is 16 to 32 times the rate of write throughput (in MBs/second). Setting the
> value to 0 disables compaction throttling.
>
> Perhaps "across the entire system" means "across all keyspaces for this
> Cassandra node"?
>
> Compare the above documentation with the subsequent one which specifically
> calls out "a node":
>
> concurrent_compactors
> <https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__concurrent_compactors>
> (Default: 1 per CPU core) Sets the number of concurrent compaction
> processes allowed to run simultaneously on a node, not including validation
> compactions for anti-entropy repair. Simultaneous compactions help preserve
> read performance in a mixed read-write workload by mitigating the tendency
> of small SSTables to accumulate during a single long-running compaction. If
> compactions run too slowly or too fast, change
> compaction_throughput_mb_per_sec
> <https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
> first.
>
> I always thought it was per-node, and I'm guessing this is a
> documentation-clarity issue.
>
> On Mon, Jan 4, 2016 at 5:06 PM, Jeff Jirsa <je...@crowdstrike.com>
> wrote:
>
>> Why do you think it’s cluster wide? That param is per-node, and you can
>> change it at runtime with nodetool (or via the JMX interface using jconsole
>> to ip:7199 )
>>
>>
>>
>> From: Ken Hancock
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, January 4, 2016 at 12:59 PM
>> To: "user@cassandra.apache.org"
>> Subject: compaction_throughput_mb_per_sec
>>
>> I was surprised the other day to discover that this was a cluster-wide
>> setting.   Why does that make sense?
>>
>> In a heterogeneous cassandra deployment, say I have some old servers
>> running spinning disks and I'm bringing on more nodes that perhaps utilize
>> SSD.  I want to have different compaction throttling  on different nodes to
>> minimize read impact times.
>>
>> I can already balance data ownership through either token allocation or
>> vnode counts.
>>
>> Also, as I increase my node count, I technically also have to increase my
>> compaction_throughput which would require a rolling restart across the
>> cluster.
>>
>>
>>
>
>
>

Re: compaction_throughput_mb_per_sec

Posted by Ken Hancock <ke...@schange.com>.
Will do. I searched the doc for additional uses of the term "system":

commitlog_segment_size_in_mb refers to "every table in the system"
concurrent_writes talks about CPU cores "in your system"

That's it for "system", other than compaction_throughput_mb_per_sec,
which refers to "across the entire system".

"node" is the predominant term in the yaml configuration, though I can
certainly see potential confusion with vnodes.



On Tue, Jan 5, 2016 at 2:26 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Jan 5, 2016 at 6:50 AM, Ken Hancock <ke...@schange.com>
> wrote:
>
>> As to why I think it's cluster-wide, here's what the documentation says:
>>
>
> Do you see "system" used in place of "cluster" anywhere else in the docs?
>
> I think you are correct that the docs should standardize on "system"
> instead of "node", because node to me includes vnodes. "system" or "host"
> is what I think of as "the entire cassandra process".
>
> If I were you, I'd email docs AT datastaxdotcom with your feedback. :D
>
> =Rob
>
>


-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC
Office: +1 (978) 889-3329 | Google Talk: ken.hancock@schange.com
Skype: hancockks | Yahoo IM: hancockks
LinkedIn: <http://www.linkedin.com/in/kenhancock>


Re: compaction_throughput_mb_per_sec

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Jan 5, 2016 at 6:50 AM, Ken Hancock <ke...@schange.com> wrote:

> As to why I think it's cluster-wide, here's what the documentation says:
>

Do you see "system" used in place of "cluster" anywhere else in the docs?

I think you are correct that the docs should standardize on "system"
instead of "node", because node to me includes vnodes. "system" or "host"
is what I think of as "the entire cassandra process".

If I were you, I'd email docs AT datastaxdotcom with your feedback. :D

=Rob

Re: compaction_throughput_mb_per_sec

Posted by Ken Hancock <ke...@schange.com>.
As to why I think it's cluster-wide, here's what the documentation says:

https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
(Default: 16) Throttles compaction to the specified total throughput
across the entire system. The faster you insert data, the faster you need
to compact in order to keep the SSTable count down. The recommended value
is 16 to 32 times the rate of write throughput (in MBs/second). Setting the
value to 0 disables compaction throttling.

Perhaps "across the entire system" means "across all keyspaces for this
Cassandra node"?

Compare the above documentation with the subsequent one which specifically
calls out "a node":

concurrent_compactors
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__concurrent_compactors>
(Default: 1 per CPU core) Sets the number of concurrent compaction
processes allowed to run simultaneously on a node, not including validation
compactions for anti-entropy repair. Simultaneous compactions help preserve
read performance in a mixed read-write workload by mitigating the tendency
of small SSTables to accumulate during a single long-running compaction. If
compactions run too slowly or too fast, change
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
first.

I always thought it was per-node, and I'm guessing this is a
documentation-clarity issue.
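For what it's worth, the 16-32x sizing rule quoted above is easy to
sanity-check. A sketch, assuming a hypothetical sustained write rate of
2 MB/s per node:

```shell
#!/bin/sh
# Apply the docs' rule of thumb: throttle compaction at 16-32x the
# sustained write rate. The write rate below is a made-up example.
WRITE_MB_PER_SEC=2
LOW=$((WRITE_MB_PER_SEC * 16))      # lower bound of the recommendation
HIGH=$((WRITE_MB_PER_SEC * 32))     # upper bound
echo "compaction_throughput_mb_per_sec: $LOW-$HIGH MB/s"
```

So a node absorbing 2 MB/s of writes would be throttled somewhere in the
32-64 MB/s range by that rule.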

On Mon, Jan 4, 2016 at 5:06 PM, Jeff Jirsa <je...@crowdstrike.com>
wrote:

> Why do you think it’s cluster wide? That param is per-node, and you can
> change it at runtime with nodetool (or via the JMX interface using jconsole
> to ip:7199 )
>
>
>
> From: Ken Hancock
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, January 4, 2016 at 12:59 PM
> To: "user@cassandra.apache.org"
> Subject: compaction_throughput_mb_per_sec
>
> I was surprised the other day to discover that this was a cluster-wide
> setting.   Why does that make sense?
>
> In a heterogeneous cassandra deployment, say I have some old servers
> running spinning disks and I'm bringing on more nodes that perhaps utilize
> SSD.  I want to have different compaction throttling  on different nodes to
> minimize read impact times.
>
> I can already balance data ownership through either token allocation or
> vnode counts.
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput which would require a rolling restart across the
> cluster.
>
>
>

Re: compaction_throughput_mb_per_sec

Posted by Jeff Jirsa <je...@crowdstrike.com>.
Why do you think it’s cluster wide? That param is per-node, and you can change it at runtime with nodetool (or via the JMX interface using jconsole to ip:7199 )



From:  Ken Hancock
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, January 4, 2016 at 12:59 PM
To:  "user@cassandra.apache.org"
Subject:  compaction_throughput_mb_per_sec

I was surprised the other day to discover that this was a cluster-wide setting.   Why does that make sense?

In a heterogeneous cassandra deployment, say I have some old servers running spinning disks and I'm bringing on more nodes that perhaps utilize SSD.  I want to have different compaction throttling  on different nodes to minimize read impact times.

I can already balance data ownership through either token allocation or vnode counts. 

Also, as I increase my node count, I technically also have to increase my compaction_throughput which would require a rolling restart across the cluster.




Re: compaction_throughput_mb_per_sec

Posted by Carl Yeksigian <ca...@yeksigian.com>.
This is set in the cassandra.yaml on each node independently; it doesn't
have to be the same cluster-wide.
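For reference, the knob as it appears in each node's cassandra.yaml (the
default shown is the one the docs cite; a yaml change takes effect on
restart, whereas a nodetool override applies immediately but does not
persist across restarts):

```yaml
# Per-node compaction throttle in MB/s; 0 disables throttling.
# Edit independently on each node to suit its disks.
compaction_throughput_mb_per_sec: 16
```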

On Mon, Jan 4, 2016 at 3:59 PM, Ken Hancock <ke...@schange.com> wrote:

> I was surprised the other day to discover that this was a cluster-wide
> setting.   Why does that make sense?
>
> In a heterogeneous cassandra deployment, say I have some old servers
> running spinning disks and I'm bringing on more nodes that perhaps utilize
> SSD.  I want to have different compaction throttling  on different nodes to
> minimize read impact times.
>
> I can already balance data ownership through either token allocation or
> vnode counts.
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput which would require a rolling restart across the
> cluster.
>
>
>