Posted to user@cassandra.apache.org by Anishek Agarwal <an...@gmail.com> on 2015/04/21 13:04:00 UTC

LCS Strategy, compaction pending tasks keep increasing

Hello,

I am inserting about 100 million entries via the DataStax Java driver into a
Cassandra cluster of 3 nodes.

The table structure is as follows:

create keyspace test with replication = {'class':
'NetworkTopologyStrategy', 'DC' : 3};

CREATE TABLE test_bits(id bigint primary key , some_bits text) with
gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
and compression={'sstable_compression' : ''};

I have 75 threads inserting data into the above table, with each thread
having non-overlapping keys.

I see that the number of pending tasks via "nodetool compactionstats" keeps
increasing, and "nodetool cfstats test.test_bits" shows the SSTable levels
as [154/4, 8, 0, 0, 0, 0, 0, 0, 0].
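For reference, this is the cfstats line in question; an entry like "154/4"
means 154 SSTables sitting in level 0 against a soft cap of 4, i.e. L0 is
far behind. The output looks roughly like:

nodetool cfstats test.test_bits
...
SSTables in each level: [154/4, 8, 0, 0, 0, 0, 0, 0, 0]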

Why is compaction not kicking in?

thanks
anishek

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
I am on version 2.0.14. I will update once I get the stats up for the
writes again.


On Tue, Apr 21, 2015 at 4:46 PM, Carlos Rolo <ro...@pythian.com> wrote:

> Are you on version 2.1.x?
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <an...@gmail.com>
> wrote:
>
>> the "some_bits" column has about 14-15 bytes of data per key.
>>
>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <an...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am inserting about 100 million entries via datastax-java driver to a
>>> cassandra cluster of 3 nodes.
>>>
>>> Table structure is as
>>>
>>> create keyspace test with replication = {'class':
>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>
>>> CREATE TABLE test_bits(id bigint primary key , some_bits text) with
>>> gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
>>> and compression={'sstable_compression' : ''};
>>>
>>> have 75 threads that are inserting data into the above table with each
>>> thread having non over lapping keys.
>>>
>>> I see that the number of pending tasks via "nodetool compactionstats"
>>> keeps increasing and looks like from "nodetool cfstats test.test_bits" has
>>> SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>
>>> Why is compaction not kicking in ?
>>>
>>> thanks
>>> anishek
>>>
>>
>>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Carlos Rolo <ro...@pythian.com>.
Are you on version 2.1.x?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <an...@gmail.com> wrote:

> the "some_bits" column has about 14-15 bytes of data per key.
>
> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <an...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am inserting about 100 million entries via datastax-java driver to a
>> cassandra cluster of 3 nodes.
>>
>> Table structure is as
>>
>> create keyspace test with replication = {'class':
>> 'NetworkTopologyStrategy', 'DC' : 3};
>>
>> CREATE TABLE test_bits(id bigint primary key , some_bits text) with
>> gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'}
>> and compression={'sstable_compression' : ''};
>>
>> have 75 threads that are inserting data into the above table with each
>> thread having non over lapping keys.
>>
>> I see that the number of pending tasks via "nodetool compactionstats"
>> keeps increasing and looks like from "nodetool cfstats test.test_bits" has
>> SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>
>> Why is compaction not kicking in ?
>>
>> thanks
>> anishek
>>
>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Brice Dutheil <br...@gmail.com>.
Reads are mostly limited by IO, so I’d set concurrent_reads to something
related to your disks; we have set it to 64 (but then we have SSDs).
Writes are mostly limited by CPU, so the number of cores matters; we set
concurrent_writes to 48 and 128 (depending on the CPU on the nodes).

Careful with LCS: it is not recommended for write-heavy workloads. LCS is
good for optimizing reads, in that it avoids having to read many SSTables.
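Written out as cassandra.yaml entries, that is roughly (setting names as
spelled in 2.0.x; the values are ours, not general recommendations):

concurrent_reads: 64    # IO-bound, so size this to the disks (64 assumes SSDs)
concurrent_writes: 128  # CPU-bound, so size this to the cores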

-- Brice

On Wed, Apr 22, 2015 at 6:53 AM, Anishek Agarwal <an...@gmail.com> wrote:

> Thanks Brice for the input,
>
> I am confused as to how to calculate the value of concurrent_read,
> following is what i found recommended on sites and in configuration docs.
>
> concurrent_read : some places its 16 X number of drives or 4 X number of
> cores
> which of the above should i pick ?  i have 40 core cpu with 3 disks(non
> ssd) one used for commitlog and other two for data directories, I am having
> 3 nodes in my cluster.
>
> I think there are tools out there that allow the max write speed to disk,
> i am going to run them too to find out the write throughput i can get to
> see that i am not trying to overachieve something, currently we are stuck
> at 35MBps
>
> @Sebastian
> the concurrent_compactors is at default value of 32 for us and i think
> that should be fine.
> Since we had lot of cores i thought it would be better to use multithreaded_compaction
> but i think i will try one set with it turned off again.
>
> Question is still,
>
> how do i find what write load should i aim for per node such that it is
> able to compact data while inserting, is it just try and error ? or there
> is a certain QPS i can target for per node ?
>
> Our business case is
> -- new client comes and create a new keyspace for him, initially there
> will be lots of new keys ( i think size tired might work better here)
> -- as time progresses we are going to update the existing keys very
> frequently ( i think LCS will work better here -- we are going with this
> strategy for long term benefit)
>
>
>
>
> On Wed, Apr 22, 2015 at 4:17 AM, Brice Dutheil <br...@gmail.com>
> wrote:
>
>> Yes I was referring referring to multithreaded_compaction, but just
>> because we didn’t get bitten by this setting just doesn’t mean it’s right,
>> and the jira is a clear indication of that ;)
>>
>> @Anishek that reminds me of these settings to look at as well:
>>
>>    - concurrent_write and concurrent_read both need to be adapted to
>>    your actual hardware though.
>>
>>  Cassandra is, more often than not, disk constrained though this can
>> change for some workloads with SSD’s.
>>
>> Yes that is typically the case, SSDs are more and more commons but so are
>> multi-core CPUs and the trend to multiple cores is not going to stop ; just
>> look at the next Intel *flagship* : Knights Landing
>> <http://www.anandtech.com/show/8217/intels-knights-landing-coprocessor-detailed>
>> => *72 cores*.
>>
>> Nowadays it is not rare to have boxes with multicore CPU, either way if
>> they are not used because of some IO bottleneck there’s no reason to be
>> licensed for that, and if IO is not an issue the CPUs are most probably
>> next in line. While node is much more about a combination of that plus much
>> more added value like the linear scaling of Cassandra. And I’m not even
>> listing the other nifty integration that DSE ships in.
>>
>> But on this matter I believe we shouldn’t hijack the original thread
>> purpose.
>>
>> — Brice
>>
>> On Wed, Apr 22, 2015 at 12:13 AM, Sebastian Estevez <
>> sebastian.estevez@datastax.com> wrote:
>>
>> I want to draw a distinction between a) multithreaded compaction (the
>>> jira I just pointed to) and b) concurrent_compactors. I'm not clear on
>>> which one you are recommending at this stage.
>>>
>>> a) Multithreaded compaction is what I warned against in my last note. b)
>>> Concurrent compactors is the number of separate compaction tasks (on
>>> different tables) that can run simultaneously. You can crank this up
>>> without much risk though the old default of num cores was too aggressive
>>> (CASSANDRA-7139). 2 seems to be the sweet-spot.
>>>
>>> Cassandra is, more often than not, disk constrained though this can
>>> change for some workloads with SSD's.
>>>
>>>
>>> All the best,
>>>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>>
>>> On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil <br...@gmail.com>
>>> wrote:
>>>
>>>> Oh, thank you Sebastian for this input and the ticket reference !
>>>> We did notice an increase in CPU usage, but kept the concurrent
>>>> compaction low enough for our usage, by default it takes the number of
>>>> cores. We did use a number up to 30% of our available cores. But under
>>>> heavy load clearly CPU is the bottleneck and we have 2 CPU with 8 hyper
>>>> threaded cores per node.
>>>>
>>>> In a related topic : I’m a bit concerned by datastax communication,
>>>> usually people talk about IO as being the weak spot, but in our case it’s
>>>> more about CPU. Fortunately the Moore law doesn’t really apply anymore
>>>> vertically, now we have have multi core processors *and* the trend is
>>>> going that way. Yet Datastax terms feels a bit *antiquated* and maybe
>>>> a bit too much Oracle-y : http://www.datastax.com/enterprise-terms
>>>> Node licensing is more appropriate for this century.
>>>>
>>>>
>>>> -- Brice
>>>>
>>>> On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
>>>> sebastian.estevez@datastax.com> wrote:
>>>>
>>>>> Do not enable multithreaded compaction. Overhead usually outweighs any
>>>>> benefit. It's removed in 2.1 because it harms more than helps:
>>>>>
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6142
>>>>>
>>>>> All the best,
>>>>>
>>>>>
>>>>> Sebastián Estévez
>>>>>
>>>>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>>>>
>>>>> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <
>>>>> brice.dutheil@gmail.com> wrote:
>>>>>
>>>>>> I’m not sure I get everything about storm stuff, but my understanding
>>>>>> of LCS is that compaction count may increase the more one update data
>>>>>> (that’s why I was wondering about duplicate primary keys).
>>>>>>
>>>>>> Another option is that the code is sending too much write request/s
>>>>>> to the cassandra cluster. I don’t know haw many nodes you have, but the
>>>>>> less node there is the more compactions.
>>>>>> Also I’d look at the CPU / load, maybe the config is too
>>>>>> *restrictive*, look at the following properties in the cassandra.yaml
>>>>>>
>>>>>>    - compaction_throughput_mb_per_sec, by default the value is 16,
>>>>>>    you may want to increase it but be careful on mechanical drives, if already
>>>>>>    in SSD IO is rarely the issue, we have 64 (with SSDs)
>>>>>>    - multithreaded_compaction by default it is false, we enabled it.
>>>>>>
>>>>>> Compaction thread are niced, so it shouldn’t be much an issue for
>>>>>> serving production r/w requests. But you never know, always keep an eye on
>>>>>> IO and CPU.
>>>>>>
>>>>>> — Brice
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <an...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> sorry i take that back we will modify different keys across threads
>>>>>>> not the same key, our storm topology is going to use field grouping to get
>>>>>>> updates for same keys to same set of bolts.
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <an...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Bruice : I dont think so as i am giving each thread a specific key
>>>>>>>> range with no overlaps this does not seem to be the case now. However we
>>>>>>>> will have to test where we have to modify the same key across threads -- do
>>>>>>>> u think that will cause a problem ? As far as i have read LCS is
>>>>>>>> recommended for such cases. should i just switch back to
>>>>>>>> SizeTiredCompactionStrategy.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <
>>>>>>>> brice.dutheil@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Could it that the app is inserting _duplicate_ keys ?
>>>>>>>>>
>>>>>>>>> -- Brice
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <
>>>>>>>>> krummas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> nope, but you can correlate I guess, tools/bin/sstablemetadata
>>>>>>>>>> gives you sstable level information
>>>>>>>>>>
>>>>>>>>>> and, it is also likely that since you get so many L0 sstables,
>>>>>>>>>> you will be doing size tiered compaction in L0 for a while.
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <
>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> @Marcus I did look and that is where i got the above but it
>>>>>>>>>>> doesnt show any detail about moving from L0 -L1 any specific arguments i
>>>>>>>>>>> should try with ?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <
>>>>>>>>>>> krummas@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> you need to look at nodetool compactionstats - there is
>>>>>>>>>>>> probably a big L0 -> L1 compaction going on that blocks other compactions
>>>>>>>>>>>> from starting
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <
>>>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am inserting about 100 million entries via datastax-java
>>>>>>>>>>>>>> driver to a cassandra cluster of 3 nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Table structure is as
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> CREATE TABLE test_bits(id bigint primary key , some_bits
>>>>>>>>>>>>>> text) with gc_grace_seconds=0 and compaction = {'class':
>>>>>>>>>>>>>> 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''};
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> have 75 threads that are inserting data into the above table
>>>>>>>>>>>>>> with each thread having non over lapping keys.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see that the number of pending tasks via "nodetool
>>>>>>>>>>>>>> compactionstats" keeps increasing and looks like from "nodetool cfstats
>>>>>>>>>>>>>> test.test_bits" has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why is compaction not kicking in ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>> anishek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
Thanks Brice for the input,

I am confused as to how to calculate the value of concurrent_reads; the
following is what I found recommended on sites and in the configuration docs.

concurrent_reads: some places say 16 x the number of drives, others 4 x the
number of cores.
Which of the two should I pick? I have a 40-core CPU with 3 disks (non-SSD),
one used for the commitlog and the other two for data directories, and I
have 3 nodes in my cluster.
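Plugging my hardware into those two rules of thumb gives very different
numbers (a quick worked comparison, nothing more):

16 x number of drives = 16 x 2 data disks = 32
4 x number of cores   = 4 x 40 cores      = 160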

I think there are tools out there that measure the maximum write speed of a
disk; I am going to run them too, to find out what write throughput I can
get and make sure I am not trying to overachieve something. Currently we
are stuck at 35 MBps.
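For example, a rough sequential-write check with dd (the target path is
only an illustration; remember to delete the test file afterwards):

dd if=/dev/zero of=/data1/dd-test bs=1M count=4096 conv=fdatasync
# conv=fdatasync forces a flush before dd reports its MB/s figure, so the
# result reflects the disk rather than the page cache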

@Sebastian
concurrent_compactors is at the default value of 32 for us, and I think that
should be fine.
Since we have a lot of cores I thought it would be better to use
multithreaded_compaction, but I will try one run with it turned off again.

The question is still: how do I find what write load I should aim for per
node such that it is able to compact data while inserting? Is it just trial
and error, or is there a certain QPS I can target per node?
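One way I could approximate this is with the stress tool that ships with
Cassandra, e.g. something like this 2.0-era invocation (host, key count
and thread count here are placeholders):

tools/bin/cassandra-stress -d 10.0.0.1 -o insert -n 10000000 -t 75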

Our business case is:
-- a new client comes and we create a new keyspace for him; initially there
will be lots of new keys (I think size-tiered might work better here)
-- as time progresses we are going to update the existing keys very
frequently (I think LCS will work better here -- we are going with this
strategy for its long-term benefit)
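If we do end up switching, the strategy can be changed per table later with
plain CQL, e.g. (a sketch against the table above):

ALTER TABLE test.test_bits
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'};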




On Wed, Apr 22, 2015 at 4:17 AM, Brice Dutheil <br...@gmail.com>
wrote:

> Yes I was referring referring to multithreaded_compaction, but just
> because we didn’t get bitten by this setting just doesn’t mean it’s right,
> and the jira is a clear indication of that ;)
>
> @Anishek that reminds me of these settings to look at as well:
>
>    - concurrent_write and concurrent_read both need to be adapted to your
>    actual hardware though.
>
>  Cassandra is, more often than not, disk constrained though this can
> change for some workloads with SSD’s.
>
> Yes that is typically the case, SSDs are more and more commons but so are
> multi-core CPUs and the trend to multiple cores is not going to stop ; just
> look at the next Intel *flagship* : Knights Landing
> <http://www.anandtech.com/show/8217/intels-knights-landing-coprocessor-detailed>
> => *72 cores*.
>
> Nowadays it is not rare to have boxes with multicore CPU, either way if
> they are not used because of some IO bottleneck there’s no reason to be
> licensed for that, and if IO is not an issue the CPUs are most probably
> next in line. While node is much more about a combination of that plus much
> more added value like the linear scaling of Cassandra. And I’m not even
> listing the other nifty integration that DSE ships in.
>
> But on this matter I believe we shouldn’t hijack the original thread
> purpose.
>
> — Brice
>
> On Wed, Apr 22, 2015 at 12:13 AM, Sebastian Estevez <
> sebastian.estevez@datastax.com> wrote:
>
> I want to draw a distinction between a) multithreaded compaction (the jira
>> I just pointed to) and b) concurrent_compactors. I'm not clear on which one
>> you are recommending at this stage.
>>
>> a) Multithreaded compaction is what I warned against in my last note. b)
>> Concurrent compactors is the number of separate compaction tasks (on
>> different tables) that can run simultaneously. You can crank this up
>> without much risk though the old default of num cores was too aggressive
>> (CASSANDRA-7139). 2 seems to be the sweet-spot.
>>
>> Cassandra is, more often than not, disk constrained though this can
>> change for some workloads with SSD's.
>>
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>
>> On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil <br...@gmail.com>
>> wrote:
>>
>>> Oh, thank you Sebastian for this input and the ticket reference !
>>> We did notice an increase in CPU usage, but kept the concurrent
>>> compaction low enough for our usage, by default it takes the number of
>>> cores. We did use a number up to 30% of our available cores. But under
>>> heavy load clearly CPU is the bottleneck and we have 2 CPU with 8 hyper
>>> threaded cores per node.
>>>
>>> In a related topic : I’m a bit concerned by datastax communication,
>>> usually people talk about IO as being the weak spot, but in our case it’s
>>> more about CPU. Fortunately the Moore law doesn’t really apply anymore
>>> vertically, now we have have multi core processors *and* the trend is
>>> going that way. Yet Datastax terms feels a bit *antiquated* and maybe a
>>> bit too much Oracle-y : http://www.datastax.com/enterprise-terms
>>> Node licensing is more appropriate for this century.
>>>
>>>
>>> -- Brice
>>>
>>> On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
>>> sebastian.estevez@datastax.com> wrote:
>>>
>>>> Do not enable multithreaded compaction. Overhead usually outweighs any
>>>> benefit. It's removed in 2.1 because it harms more than helps:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-6142
>>>>
>>>> All the best,
>>>>
>>>>
>>>> Sebastián Estévez
>>>>
>>>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>>>
>>>> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <brice.dutheil@gmail.com
>>>> > wrote:
>>>>
>>>>> I’m not sure I get everything about storm stuff, but my understanding
>>>>> of LCS is that compaction count may increase the more one update data
>>>>> (that’s why I was wondering about duplicate primary keys).
>>>>>
>>>>> Another option is that the code is sending too much write request/s to
>>>>> the cassandra cluster. I don’t know haw many nodes you have, but the less
>>>>> node there is the more compactions.
>>>>> Also I’d look at the CPU / load, maybe the config is too *restrictive*,
>>>>> look at the following properties in the cassandra.yaml
>>>>>
>>>>>    - compaction_throughput_mb_per_sec, by default the value is 16,
>>>>>    you may want to increase it but be careful on mechanical drives, if already
>>>>>    in SSD IO is rarely the issue, we have 64 (with SSDs)
>>>>>    - multithreaded_compaction by default it is false, we enabled it.
>>>>>
>>>>> Compaction thread are niced, so it shouldn’t be much an issue for
>>>>> serving production r/w requests. But you never know, always keep an eye on
>>>>> IO and CPU.
>>>>>
>>>>> — Brice
>>>>>
>>>>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <an...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> sorry i take that back we will modify different keys across threads
>>>>>> not the same key, our storm topology is going to use field grouping to get
>>>>>> updates for same keys to same set of bolts.
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <an...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Bruice : I dont think so as i am giving each thread a specific key
>>>>>>> range with no overlaps this does not seem to be the case now. However we
>>>>>>> will have to test where we have to modify the same key across threads -- do
>>>>>>> u think that will cause a problem ? As far as i have read LCS is
>>>>>>> recommended for such cases. should i just switch back to
>>>>>>> SizeTiredCompactionStrategy.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <
>>>>>>> brice.dutheil@gmail.com> wrote:
>>>>>>>
>>>>>>>> Could it that the app is inserting _duplicate_ keys ?
>>>>>>>>
>>>>>>>> -- Brice
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <krummas@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> nope, but you can correlate I guess, tools/bin/sstablemetadata
>>>>>>>>> gives you sstable level information
>>>>>>>>>
>>>>>>>>> and, it is also likely that since you get so many L0 sstables, you
>>>>>>>>> will be doing size tiered compaction in L0 for a while.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <
>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> @Marcus I did look and that is where i got the above but it
>>>>>>>>>> doesnt show any detail about moving from L0 -L1 any specific arguments i
>>>>>>>>>> should try with ?
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <
>>>>>>>>>> krummas@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> you need to look at nodetool compactionstats - there is probably
>>>>>>>>>>> a big L0 -> L1 compaction going on that blocks other compactions from
>>>>>>>>>>> starting
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <
>>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am inserting about 100 million entries via datastax-java
>>>>>>>>>>>>> driver to a cassandra cluster of 3 nodes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Table structure is as
>>>>>>>>>>>>>
>>>>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>>>>
>>>>>>>>>>>>> CREATE TABLE test_bits(id bigint primary key , some_bits text)
>>>>>>>>>>>>> with gc_grace_seconds=0 and compaction = {'class':
>>>>>>>>>>>>> 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''};
>>>>>>>>>>>>>
>>>>>>>>>>>>> have 75 threads that are inserting data into the above table
>>>>>>>>>>>>> with each thread having non over lapping keys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see that the number of pending tasks via "nodetool
>>>>>>>>>>>>> compactionstats" keeps increasing and looks like from "nodetool cfstats
>>>>>>>>>>>>> test.test_bits" has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why is compaction not kicking in ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> anishek
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Brice Dutheil <br...@gmail.com>.
Yes, I was referring to multithreaded_compaction, but just because we didn’t
get bitten by this setting doesn’t mean it’s right, and the jira is a clear
indication of that ;)

@Anishek that reminds me of these settings to look at as well:

   - concurrent_writes and concurrent_reads both need to be adapted to your
   actual hardware though.

 Cassandra is, more often than not, disk constrained though this can change
for some workloads with SSD’s.

Yes, that is typically the case. SSDs are more and more common, but so are
multi-core CPUs, and the trend towards multiple cores is not going to stop;
just look at the next Intel *flagship*: Knights Landing
<http://www.anandtech.com/show/8217/intels-knights-landing-coprocessor-detailed>
=> *72 cores*.

Nowadays it is not rare to have boxes with multi-core CPUs. Either way, if
the cores are not used because of some IO bottleneck, there’s no reason to
be licensed for them, and if IO is not an issue the CPUs are most probably
next in line. A node, though, is much more about a combination of that plus
added value like the linear scaling of Cassandra. And I’m not even listing
the other nifty integrations that DSE ships.

But on this matter I believe we shouldn’t hijack the original thread
purpose.

— Brice

On Wed, Apr 22, 2015 at 12:13 AM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

I want to draw a distinction between a) multithreaded compaction (the jira
> I just pointed to) and b) concurrent_compactors. I'm not clear on which one
> you are recommending at this stage.
>
> a) Multithreaded compaction is what I warned against in my last note. b)
> Concurrent compactors is the number of separate compaction tasks (on
> different tables) that can run simultaneously. You can crank this up
> without much risk though the old default of num cores was too aggressive
> (CASSANDRA-7139). 2 seems to be the sweet-spot.
>
> Cassandra is, more often than not, disk constrained though this can change
> for some workloads with SSD's.
>
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil <br...@gmail.com>
> wrote:
>
>> Oh, thank you Sebastian for this input and the ticket reference !
>> We did notice an increase in CPU usage, but kept the concurrent
>> compaction low enough for our usage, by default it takes the number of
>> cores. We did use a number up to 30% of our available cores. But under
>> heavy load clearly CPU is the bottleneck and we have 2 CPU with 8 hyper
>> threaded cores per node.
>>
>> In a related topic : I’m a bit concerned by datastax communication,
>> usually people talk about IO as being the weak spot, but in our case it’s
>> more about CPU. Fortunately the Moore law doesn’t really apply anymore
>> vertically, now we have have multi core processors *and* the trend is
>> going that way. Yet Datastax terms feels a bit *antiquated* and maybe a
>> bit too much Oracle-y : http://www.datastax.com/enterprise-terms
>> Node licensing is more appropriate for this century.
>>
>>
>> -- Brice
>>
>> On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
>> sebastian.estevez@datastax.com> wrote:
>>
>>> Do not enable multithreaded compaction. Overhead usually outweighs any
>>> benefit. It's removed in 2.1 because it harms more than helps:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6142
>>>
>>> All the best,
>>>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>>
>>> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <br...@gmail.com>
>>> wrote:
>>>
>>>> I’m not sure I get everything about storm stuff, but my understanding
>>>> of LCS is that compaction count may increase the more one update data
>>>> (that’s why I was wondering about duplicate primary keys).
>>>>
>>>> Another option is that the code is sending too much write request/s to
>>>> the cassandra cluster. I don’t know haw many nodes you have, but the less
>>>> node there is the more compactions.
>>>> Also I’d look at the CPU / load, maybe the config is too *restrictive*,
>>>> look at the following properties in the cassandra.yaml
>>>>
>>>>    - compaction_throughput_mb_per_sec, by default the value is 16, you
>>>>    may want to increase it but be careful on mechanical drives, if already in
>>>>    SSD IO is rarely the issue, we have 64 (with SSDs)
>>>>    - multithreaded_compaction by default it is false, we enabled it.
>>>>
>>>> Compaction thread are niced, so it shouldn’t be much an issue for
>>>> serving production r/w requests. But you never know, always keep an eye on
>>>> IO and CPU.
>>>>
>>>> — Brice
>>>>
>>>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <an...@gmail.com>
>>>> wrote:
>>>>
>>>> sorry i take that back we will modify different keys across threads not
>>>>> the same key, our storm topology is going to use field grouping to get
>>>>> updates for same keys to same set of bolts.
>>>>>
>>>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <an...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> @Bruice : I dont think so as i am giving each thread a specific key
>>>>>> range with no overlaps this does not seem to be the case now. However we
>>>>>> will have to test where we have to modify the same key across threads -- do
>>>>>> u think that will cause a problem ? As far as i have read LCS is
>>>>>> recommended for such cases. should i just switch back to
>>>>>> SizeTiredCompactionStrategy.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <
>>>>>> brice.dutheil@gmail.com> wrote:
>>>>>>
>>>>>>> Could it that the app is inserting _duplicate_ keys ?
>>>>>>>
>>>>>>> -- Brice
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> nope, but you can correlate I guess, tools/bin/sstablemetadata
>>>>>>>> gives you sstable level information
>>>>>>>>
>>>>>>>> and, it is also likely that since you get so many L0 sstables, you
>>>>>>>> will be doing size tiered compaction in L0 for a while.
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <anishek@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> @Marcus I did look and that is where i got the above but it doesnt
>>>>>>>>> show any detail about moving from L0 -L1 any specific arguments i should
>>>>>>>>> try with ?
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <
>>>>>>>>> krummas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> you need to look at nodetool compactionstats - there is probably
>>>>>>>>>> a big L0 -> L1 compaction going on that blocks other compactions from
>>>>>>>>>> starting
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <
>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am inserting about 100 million entries via datastax-java
>>>>>>>>>>>> driver to a cassandra cluster of 3 nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> Table structure is as
>>>>>>>>>>>>
>>>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>>>
>>>>>>>>>>>> CREATE TABLE test_bits(id bigint primary key , some_bits text)
>>>>>>>>>>>> with gc_grace_seconds=0 and compaction = {'class':
>>>>>>>>>>>> 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''};
>>>>>>>>>>>>
>>>>>>>>>>>> have 75 threads that are inserting data into the above table
>>>>>>>>>>>> with each thread having non over lapping keys.
>>>>>>>>>>>>
>>>>>>>>>>>> I see that the number of pending tasks via "nodetool
>>>>>>>>>>>> compactionstats" keeps increasing and looks like from "nodetool cfstats
>>>>>>>>>>>> test.test_bits" has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>>>>>>>>>>
>>>>>>>>>>>> Why is compaction not kicking in ?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks
>>>>>>>>>>>> anishek
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Sebastian Estevez <se...@datastax.com>.
I want to draw a distinction between a) multithreaded compaction (the jira
I just pointed to) and b) concurrent_compactors. I'm not clear on which one
you are recommending at this stage.

a) Multithreaded compaction is what I warned against in my last note. b)
Concurrent compactors is the number of separate compaction tasks (on
different tables) that can run simultaneously. You can crank this up
without much risk, though the old default of num cores was too aggressive
(CASSANDRA-7139). 2 seems to be the sweet spot.
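In cassandra.yaml terms that advice amounts to something like this (a
sketch, not a universal default):

concurrent_compactors: 2
# and leave multithreaded_compaction at its default of false
# (it is removed entirely in 2.1; see CASSANDRA-6142)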

Cassandra is, more often than not, disk constrained, though this can change
for some workloads with SSDs.


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil <br...@gmail.com>
wrote:

> Oh, thank you Sebastian for this input and the ticket reference !
> We did notice an increase in CPU usage, but kept the concurrent compaction
> low enough for our usage, by default it takes the number of cores. We did
> use a number up to 30% of our available cores. But under heavy load clearly
> CPU is the bottleneck and we have 2 CPU with 8 hyper threaded cores per
> node.
>
> In a related topic : I’m a bit concerned by datastax communication,
> usually people talk about IO as being the weak spot, but in our case it’s
> more about CPU. Fortunately the Moore law doesn’t really apply anymore
> vertically, now we have have multi core processors *and* the trend is
> going that way. Yet Datastax terms feels a bit *antiquated* and maybe a
> bit too much Oracle-y : http://www.datastax.com/enterprise-terms
> Node licensing is more appropriate for this century.
>
>
> -- Brice
>
> On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
> sebastian.estevez@datastax.com> wrote:
>
>> Do not enable multithreaded compaction. Overhead usually outweighs any
>> benefit. It's removed in 2.1 because it harms more than helps:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6142
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>
>> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <br...@gmail.com>
>> wrote:
>>
>>> I’m not sure I get everything about storm stuff, but my understanding of
>>> LCS is that compaction count may increase the more one update data (that’s
>>> why I was wondering about duplicate primary keys).
>>>
>>> Another option is that the code is sending too much write request/s to
>>> the cassandra cluster. I don’t know haw many nodes you have, but the less
>>> node there is the more compactions.
>>> Also I’d look at the CPU / load, maybe the config is too *restrictive*,
>>> look at the following properties in the cassandra.yaml
>>>
>>>    - compaction_throughput_mb_per_sec, by default the value is 16, you
>>>    may want to increase it but be careful on mechanical drives, if already in
>>>    SSD IO is rarely the issue, we have 64 (with SSDs)
>>>    - multithreaded_compaction by default it is false, we enabled it.
>>>
>>> Compaction thread are niced, so it shouldn’t be much an issue for
>>> serving production r/w requests. But you never know, always keep an eye on
>>> IO and CPU.
>>>
>>> — Brice
>>>
>>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <an...@gmail.com>
>>> wrote:
>>>
>>> sorry i take that back we will modify different keys across threads not
>>>> the same key, our storm topology is going to use field grouping to get
>>>> updates for same keys to same set of bolts.
>>>>
>>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <an...@gmail.com>
>>>> wrote:
>>>>
>>>>> @Bruice : I dont think so as i am giving each thread a specific key
>>>>> range with no overlaps this does not seem to be the case now. However we
>>>>> will have to test where we have to modify the same key across threads -- do
>>>>> u think that will cause a problem ? As far as i have read LCS is
>>>>> recommended for such cases. should i just switch back to
>>>>> SizeTiredCompactionStrategy.
>>>>>
>>>>>
>>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <
>>>>> brice.dutheil@gmail.com> wrote:
>>>>>
>>>>>> Could it that the app is inserting _duplicate_ keys ?
>>>>>>
>>>>>> -- Brice
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> nope, but you can correlate I guess, tools/bin/sstablemetadata gives
>>>>>>> you sstable level information
>>>>>>>
>>>>>>> and, it is also likely that since you get so many L0 sstables, you
>>>>>>> will be doing size tiered compaction in L0 for a while.
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <an...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Marcus I did look and that is where i got the above but it doesnt
>>>>>>>> show any detail about moving from L0 -L1 any specific arguments i should
>>>>>>>> try with ?
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <krummas@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> you need to look at nodetool compactionstats - there is probably a
>>>>>>>>> big L0 -> L1 compaction going on that blocks other compactions from starting
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <
>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am inserting about 100 million entries via datastax-java
>>>>>>>>>>> driver to a cassandra cluster of 3 nodes.
>>>>>>>>>>>
>>>>>>>>>>> Table structure is as
>>>>>>>>>>>
>>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE test_bits(id bigint primary key , some_bits text)
>>>>>>>>>>> with gc_grace_seconds=0 and compaction = {'class':
>>>>>>>>>>> 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''};
>>>>>>>>>>>
>>>>>>>>>>> have 75 threads that are inserting data into the above table
>>>>>>>>>>> with each thread having non over lapping keys.
>>>>>>>>>>>
>>>>>>>>>>> I see that the number of pending tasks via "nodetool
>>>>>>>>>>> compactionstats" keeps increasing and looks like from "nodetool cfstats
>>>>>>>>>>> test.test_bits" has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>>>>>>>>>
>>>>>>>>>>> Why is compaction not kicking in ?
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>> anishek
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Brice Dutheil <br...@gmail.com>.
Oh, thank you, Sebastian, for this input and the ticket reference!
We did notice an increase in CPU usage, but kept the concurrent compaction
low enough for our usage; by default it takes the number of cores. We used
a number up to 30% of our available cores. But under heavy load CPU is
clearly the bottleneck, and we have 2 CPUs with 8 hyper-threaded cores per
node.

On a related topic: I’m a bit concerned by DataStax communication. Usually
people talk about IO as being the weak spot, but in our case it’s more
about CPU. Fortunately Moore’s law doesn’t really apply vertically anymore;
now we have multi-core processors *and* the trend is going that way. Yet
the DataStax terms feel a bit *antiquated* and maybe a bit too Oracle-y:
http://www.datastax.com/enterprise-terms
Node licensing is more appropriate for this century.

-- Brice

On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

> Do not enable multithreaded compaction. Overhead usually outweighs any
> benefit. It's removed in 2.1 because it harms more than helps:
>
> https://issues.apache.org/jira/browse/CASSANDRA-6142
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <br...@gmail.com>
> wrote:
>
>> I’m not sure I get everything about storm stuff, but my understanding of
>> LCS is that compaction count may increase the more one update data (that’s
>> why I was wondering about duplicate primary keys).
>>
>> Another option is that the code is sending too much write request/s to
>> the cassandra cluster. I don’t know haw many nodes you have, but the less
>> node there is the more compactions.
>> Also I’d look at the CPU / load, maybe the config is too *restrictive*,
>> look at the following properties in the cassandra.yaml
>>
>>    - compaction_throughput_mb_per_sec, by default the value is 16, you
>>    may want to increase it but be careful on mechanical drives, if already in
>>    SSD IO is rarely the issue, we have 64 (with SSDs)
>>    - multithreaded_compaction by default it is false, we enabled it.
>>
>> Compaction thread are niced, so it shouldn’t be much an issue for
>> serving production r/w requests. But you never know, always keep an eye on
>> IO and CPU.
>>
>> — Brice
>>
>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <an...@gmail.com>
>> wrote:
>>
>> sorry i take that back we will modify different keys across threads not
>>> the same key, our storm topology is going to use field grouping to get
>>> updates for same keys to same set of bolts.
>>>
>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <an...@gmail.com>
>>> wrote:
>>>
>>>> @Bruice : I dont think so as i am giving each thread a specific key
>>>> range with no overlaps this does not seem to be the case now. However we
>>>> will have to test where we have to modify the same key across threads -- do
>>>> u think that will cause a problem ? As far as i have read LCS is
>>>> recommended for such cases. should i just switch back to
>>>> SizeTiredCompactionStrategy.
>>>>
>>>>
>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <brice.dutheil@gmail.com
>>>> > wrote:
>>>>
>>>>> Could it that the app is inserting _duplicate_ keys ?
>>>>>
>>>>> -- Brice
>>>>>
>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> nope, but you can correlate I guess, tools/bin/sstablemetadata gives
>>>>>> you sstable level information
>>>>>>
>>>>>> and, it is also likely that since you get so many L0 sstables, you
>>>>>> will be doing size tiered compaction in L0 for a while.
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <an...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Marcus I did look and that is where i got the above but it doesnt
>>>>>>> show any detail about moving from L0 -L1 any specific arguments i should
>>>>>>> try with ?
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> you need to look at nodetool compactionstats - there is probably a
>>>>>>>> big L0 -> L1 compaction going on that blocks other compactions from starting
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <anishek@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am inserting about 100 million entries via datastax-java driver
>>>>>>>>>> to a cassandra cluster of 3 nodes.
>>>>>>>>>>
>>>>>>>>>> Table structure is as
>>>>>>>>>>
>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>
>>>>>>>>>> CREATE TABLE test_bits(id bigint primary key , some_bits text)
>>>>>>>>>> with gc_grace_seconds=0 and compaction = {'class':
>>>>>>>>>> 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''};
>>>>>>>>>>
>>>>>>>>>> have 75 threads that are inserting data into the above table with
>>>>>>>>>> each thread having non over lapping keys.
>>>>>>>>>>
>>>>>>>>>> I see that the number of pending tasks via "nodetool
>>>>>>>>>> compactionstats" keeps increasing and looks like from "nodetool cfstats
>>>>>>>>>> test.test_bits" has SSTTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0],
>>>>>>>>>>
>>>>>>>>>> Why is compaction not kicking in ?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>> anishek
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>

Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Sebastian Estevez <se...@datastax.com>.
Do not enable multithreaded compaction. Overhead usually outweighs any
benefit. It's removed in 2.1 because it harms more than helps:

https://issues.apache.org/jira/browse/CASSANDRA-6142

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Brice Dutheil <br...@gmail.com>.
I'm not sure I get everything about the Storm side, but my understanding of
LCS is that the compaction count can grow the more you update data (that's
why I was wondering about duplicate primary keys).

Another possibility is that the code is sending too many write requests per
second to the Cassandra cluster. I don't know how many nodes you have, but
the fewer nodes there are, the more compaction work each node must do.
I'd also look at CPU / load; maybe the config is too *restrictive*. Look at
the following properties in cassandra.yaml (a sketch follows the list):

   - compaction_throughput_mb_per_sec: the default is 16 and you may want to
     increase it, but be careful on mechanical drives; if you are already on
     SSDs, IO is rarely the issue. We use 64 (with SSDs).
   - multithreaded_compaction: false by default; we enabled it.
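
A minimal cassandra.yaml excerpt with the values we run might look like this
(illustrative, assuming SSD-backed nodes; not a general recommendation):

compaction_throughput_mb_per_sec: 64
multithreaded_compaction: true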

Compaction threads are niced, so they shouldn't be much of an issue for
serving production r/w requests. But you never know; always keep an eye on
IO and CPU.

— Brice


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
Sorry, I take that back: we will modify different keys across threads, not
the same key. Our Storm topology is going to use field grouping to route
updates for the same keys to the same set of bolts.
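
In case it helps, a rough sketch of the wiring (IdSpout and
CassandraWriterBolt are placeholders for our actual components, not classes
from this thread):

import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
// Hypothetical spout emitting tuples that carry a field named "id".
builder.setSpout("ids", new IdSpout(), 4);
// fieldsGrouping routes every tuple with the same "id" value to the same
// writer task, so no two bolt instances ever update the same partition key.
builder.setBolt("writer", new CassandraWriterBolt(), 8)
       .fieldsGrouping("ids", new Fields("id"));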


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
@Brice: I don't think so, as I am giving each thread a specific key range
with no overlaps, so that does not seem to be the case now. However, we will
have to test the case where we modify the same key across threads -- do you
think that will cause a problem? As far as I have read, LCS is recommended
for such cases. Should I just switch back to SizeTieredCompactionStrategy?
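
For reference, I assume switching back would be something like the following
(table name taken from the schema earlier in the thread; I have not tried it
on this cluster yet):

ALTER TABLE test.test_bits
WITH compaction = {'class': 'SizeTieredCompactionStrategy'};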



Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Brice Dutheil <br...@gmail.com>.
Could it be that the app is inserting _duplicate_ keys?
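
Remember that in CQL an INSERT on an existing primary key is an upsert; every
re-insert leaves an older version of the row in some sstable for compaction
to reconcile later, e.g.:

INSERT INTO test_bits (id, some_bits) VALUES (42, 'first');
INSERT INTO test_bits (id, some_bits) VALUES (42, 'second'); -- same key, overwrites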

-- Brice


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Marcus Eriksson <kr...@gmail.com>.
Nope, but you can correlate them, I guess; tools/bin/sstablemetadata gives
you per-sstable level information.

And since you are getting so many L0 sstables, it is also likely that you
will be doing size-tiered compaction within L0 for a while.
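
Something along these lines should print the level of each sstable (the path
assumes the default data directory layout):

for f in /var/lib/cassandra/data/test/test_bits/*-Data.db; do
  echo "$f"; tools/bin/sstablemetadata "$f" | grep -i level
done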


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
@Marcus I did look, and that is where I got the numbers above, but it
doesn't show any detail about moving from L0 to L1. Any specific arguments I
should try with?


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Marcus Eriksson <kr...@gmail.com>.
You need to look at nodetool compactionstats; there is probably a big
L0 -> L1 compaction going on that blocks other compactions from starting.
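
That is, run:

nodetool compactionstats

and look for a long-running compaction row on test.test_bits with a large
total byte count; while that single L0 -> L1 pass is running, the pending
task count will keep climbing behind it.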


Re: LCS Strategy, compaction pending tasks keep increasing

Posted by Anishek Agarwal <an...@gmail.com>.
the "some_bits" column has about 14-15 bytes of data per key.
