Posted to user@cassandra.apache.org by Jean Carlo <je...@gmail.com> on 2016/01/29 17:26:17 UTC

Tuning chunk_length_kb in Cassandra 2.1.12

Hi guys

I want to set the chunk_length_kb parameter in order to improve the read
latency of my cassandra-stress test.

This is the table

CREATE TABLE "Keyspace1".standard1 (
    key blob PRIMARY KEY,
    "C0" blob,
    "C1" blob,
    "C2" blob,
    "C3" blob,
    "C4" blob
) WITH bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '160', 'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.SnappyCompressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

I have 6 columns of type blob. This table is filled by cassandra-stress.

admin@cqlsh:Keyspace1> select * from standard1 limit 2;

(one row shown, transposed for readability)

 key | 0x4b343050393536353531
 C0  | 0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f
 C1  | 0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d
 C2  | 0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c
 C3  | 0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2
 C4  | 0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e

I am seeing a read latency of ~500 microseconds, which I think is too slow
compared to the write latency of ~30 microseconds.

My first idea is to set chunk_length_kb to a value close to the size of the
rows in KB.

Am I going in the right direction? If so, how can I compute the size of a
row?

Another question: might the "Compacted partition" values reported by
nodetool cfstats give me a value close to the right chunk_length_kb?
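
To be concrete, this is the kind of check I have in mind (Keyspace1 and
standard1 being the stress schema above; I assume cfhistograms is the right
tool for partition size percentiles on 2.1):

nodetool cfstats Keyspace1.standard1 | grep "Compacted partition"   # min/mean/max partition bytes
nodetool cfhistograms Keyspace1 standard1                           # partition size percentiles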

Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

Re: Tuning chunk_length_kb in Cassandra 2.1.12

Posted by Ben Bromhead <be...@instaclustr.com>.
You will need to experiment with chunk_length based on your dataset. At the
end of the day it's about finding the sweet spot: chunk_length needs to be
big enough that you get a decent compression ratio (larger chunks increase
the likelihood of better compression, which means you read less from disk),
but small enough that you are not reading unrelated data from disk.
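
If you do experiment, changing it is a one-liner; 16 KB below is only an
illustrative starting point, not a recommendation. Note the compression map
is replaced wholesale, so sstable_compression has to be repeated:

ALTER TABLE "Keyspace1".standard1
    WITH compression = {'sstable_compression': 'SnappyCompressor',
                        'chunk_length_kb': '16'};

Only newly written SSTables pick up the new chunk length, so rewrite the
existing ones (e.g. nodetool upgradesstables -a Keyspace1 standard1) before
you benchmark.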

But... before you go down the chunk_length testing rabbit hole, make sure
you are using a sane read_ahead value on the block device your data
directory sits on. For example, if you are on AWS and using a RAID device
built with mdadm, the read_ahead value for the block device can be as high
as 128 KB by default. If you are on SSDs you can safely drop it to 8 or 16
(or even 0) and see a big uptick in read performance.
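
For example, with blockdev (assuming /dev/md0 is the device backing your
data directory; note that --getra/--setra work in 512-byte sectors, not KB):

sudo blockdev --getra /dev/md0       # current read_ahead, in 512-byte sectors
sudo blockdev --setra 16 /dev/md0    # 16 sectors = 8 KB

The value does not survive a reboot, so persist it in a udev rule or init
script once you have found a good setting.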

For lots of juicy low-level disk tuning details, see Al Tobey's guide:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

On Fri, 29 Jan 2016 at 08:26 Jean Carlo <je...@gmail.com> wrote:

> Hi guys
>
> I want to set the param chunk_length_kb in order to improve the read
> latency of my cassandra_stress's test.
>
> This is the table
>
> CREATE TABLE "Keyspace1".standard1 (
>     key blob PRIMARY KEY,
>     "C0" blob,
>     "C1" blob,
>     "C2" blob,
>     "C3" blob,
>     "C4" blob
> ) WITH bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'sstable_size_in_mb': '160', 'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> I have 6 columns of type blob. This table is filled by cassandra_stres
>
> admin@cqlsh:Keyspace1> select * from standard1 limit 2;
>
>  key                    |
> C0                                                                     |
> C1                                                                     |
> C2                                                                     |
> C3                                                                     | C4
>
> ------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------
>  0x4b343050393536353531 |
> 0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f |
> 0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d |
> 0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c |
> 0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2 |
> 0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e
>
> I am having a read latency of  ~500 microseconds, I think it takes to much
> time comparing to the write latency of ~30 microseconds.
>
> My first clue is to fix the  chunk_length_kb to a value close to the size
> of the rows in kb
>
> Am I in the right direction? If it is true, how can I compute the size of
> a row?
>
> Other question, the value of "Compacted partition" of the command nodetool
> cfstats migth give me a value close to the chunk_length_kb ?
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer