Posted to user@cassandra.apache.org by Ben Bromhead <be...@instaclustr.com> on 2016/02/17 18:52:17 UTC

Re: Tuning chunk_length_kb in cassandra 2.1.12

You will need to experiment with chunk_length based on your dataset. At the
end of the day it's about finding the sweet spot: chunk_length needs to be
big enough that you get a decent compression ratio (larger chunks increase
the likelihood of a better compression ratio, which means you read less
from disk), but small enough that you are not reading unrelated data from
disk.
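
For what it's worth, here is a minimal sketch of setting it on your table
(2.1 syntax; the 16kb value is purely illustrative, not a recommendation):

ALTER TABLE "Keyspace1".standard1
    WITH compression = {'sstable_compression':
    'org.apache.cassandra.io.compress.SnappyCompressor',
    'chunk_length_kb': '16'};

Bear in mind that only newly written SSTables pick up the new chunk
length; nodetool upgradesstables -a will rewrite the existing ones.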

But... before you go down the chunk_length testing rabbit hole, make sure
you are using a sane read_ahead value on the block device your data
directory sits on. For example, if you are on AWS and using a RAID device
built with mdadm, the read_ahead value for the block device can be as high
as 128kb by default. If you are on SSDs you can safely drop it to 8 or 16
(or even 0) and see a big uptick in read performance.
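
A quick sketch with blockdev (assuming the data directory lives on
/dev/md0; note that blockdev counts in 512-byte sectors, so 16 sectors is
8kb):

# show the current read_ahead for the device, in 512-byte sectors
sudo blockdev --getra /dev/md0

# drop it to 16 sectors (8kb), a sane starting point for SSDs
sudo blockdev --setra 16 /dev/md0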

For lots of juicy low level disk tuning and further details see Al Tobey's
guide https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
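
On your cfstats question: the "Compacted partition" lines are the right
place to look for partition sizes, and since cassandra-stress's standard1
schema has no clustering columns, each partition is a single row, so the
mean bytes figure is a fair proxy for row size. A quick sketch of pulling
those lines out:

nodetool cfstats Keyspace1.standard1 | grep "Compacted partition"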

On Fri, 29 Jan 2016 at 08:26 Jean Carlo <je...@gmail.com> wrote:

> Hi guys
>
> I want to set the chunk_length_kb parameter in order to improve the read
> latency of my cassandra-stress test.
>
> This is the table
>
> CREATE TABLE "Keyspace1".standard1 (
>     key blob PRIMARY KEY,
>     "C0" blob,
>     "C1" blob,
>     "C2" blob,
>     "C3" blob,
>     "C4" blob
> ) WITH bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'sstable_size_in_mb': '160', 'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> I have 6 columns of type blob. This table is filled by cassandra-stress.
>
> admin@cqlsh:Keyspace1> select * from standard1 limit 2;
>
>  key | 0x4b343050393536353531
>  C0  | 0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f
>  C1  | 0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d
>  C2  | 0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c
>  C3  | 0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2
>  C4  | 0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e
>
> I am seeing a read latency of ~500 microseconds, which I think is too
> long compared to the write latency of ~30 microseconds.
>
> My first idea is to set chunk_length_kb to a value close to the size of
> the rows in kb.
>
> Am I heading in the right direction? If so, how can I compute the size
> of a row?
>
> Another question: might the "Compacted partition" values reported by
> nodetool cfstats give me a value close to the chunk_length_kb?
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer