Posted to user@cassandra.apache.org by Fay Hou [Storage Service] <fa...@coupang.com> on 2017/07/10 22:09:05 UTC

index_interval

The defaults are:

    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128

"Cassandra maintains index offsets per partition to speed up the lookup
process in the case of key cache misses (see cassandra read path overview
<http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>).
By default it samples a subset of keys, somewhat similar to a skip list.
The sampling interval is configurable with min_index_interval and
max_index_interval CQL schema attributes (see describe table). For
relatively large blobs like HTML pages we seem to get better read latencies
by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
max. For large tables like parsoid HTML with ~500G load per node this
change adds a modest ~25mb off-heap memory."
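
For reference, the change described in the quote above could be applied with
a statement along these lines. This is only a sketch: the keyspace and table
names (my_keyspace.html_pages) are hypothetical, and the 64 / 512 values are
the ones from the quoted text, not a general recommendation.

```cql
-- Hypothetical keyspace and table names; values taken from the quoted text.
ALTER TABLE my_keyspace.html_pages
    WITH min_index_interval = 64
    AND max_index_interval = 512;
```

Running DESCRIBE TABLE on the same table afterwards should show the new
values in place of the defaults listed above.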

I wonder if anyone has experience tuning min_index_interval and
max_index_interval to improve read speed.

Thanks,
Fay

Re: index_interval

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
I would also optimize for your worst case, which is hitting zero caches.
If you're using the default settings when creating a table, you're going to
get compression settings that are terrible for reads.  If you've got memory
to spare, I suggest changing your chunk_length_in_kb to 4 and disabling
readahead on your drives entirely.  I've seen a 50-100x improvement in read
latency and throughput just by changing those settings.  I gave a talk
on this topic last week; the slides are here:
https://www.slideshare.net/JonHaddad/performance-tuning-86995333
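
As a rough sketch, the chunk length suggestion might look like the following
in CQL. The keyspace and table names are hypothetical, and LZ4Compressor is
assumed here only because it is the default compressor. The 'class' key is
the spelling used by Cassandra 3.0 and later, so older versions would need
their own option names.

```cql
-- Hypothetical keyspace and table names.
-- Assumes Cassandra 3.0+ option names and the default LZ4 compressor.
ALTER TABLE my_keyspace.my_table
    WITH compression = {
        'class': 'LZ4Compressor',
        'chunk_length_in_kb': 4
    };
```

Existing SSTables keep their old chunk length until they are rewritten (for
example by compaction), so the effect on reads is not immediate.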

Jon

On Wed, Jul 12, 2017 at 2:03 PM Jeff Jirsa <jj...@apache.org> wrote:

>
>
> On 2017-07-12 12:03 (-0700), Fay Hou [Storage Service] <fayhou@coupang.com> wrote:
> > First, a big thanks to Jeff, who has spent endless time helping this
> > mailing list. Agreed that we should tune the key cache. In my case, my
> > key cache hit rate is about 20%, mainly because we do random reads. We
> > are just going to leave the index_interval as is for now.
> >
>
> That's pretty painful. If you can up that a bit, it'll probably help you
> out. You can adjust the index intervals, too, but I'd significantly
> increase key cache size first if it were my cluster.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Re: index_interval

Posted by Jeff Jirsa <jj...@apache.org>.

On 2017-07-12 12:03 (-0700), Fay Hou [Storage Service] <fa...@coupang.com> wrote:
> First, a big thanks to Jeff, who has spent endless time helping this mailing list.
> Agreed that we should tune the key cache. In my case, my key cache hit rate
> is about 20%, mainly because we do random reads. We are just going to leave the
> index_interval as is for now.
> 

That's pretty painful. If you can up that a bit, it'll probably help you out. You can adjust the index intervals, too, but I'd significantly increase key cache size first if it were my cluster.


Re: index_interval

Posted by Fay Hou [Storage Service] <fa...@coupang.com>.
First, a big thanks to Jeff, who has spent endless time helping this mailing list.
Agreed that we should tune the key cache. In my case, my key cache hit rate
is about 20%, mainly because we do random reads. We are just going to leave the
index_interval as is for now.

On Mon, Jul 10, 2017 at 8:47 PM, Jeff Jirsa <jj...@apache.org> wrote:

>
>
> On 2017-07-10 15:09 (-0700), Fay Hou [Storage Service] <fayhou@coupang.com> wrote:
> > The defaults are:
> >
> >     AND max_index_interval = 2048
> >     AND memtable_flush_period_in_ms = 0
> >     AND min_index_interval = 128
> >
> > "Cassandra maintains index offsets per partition to speed up the lookup
> > process in the case of key cache misses (see cassandra read path overview
> > <http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>).
> > By default it samples a subset of keys, somewhat similar to a skip list.
> > The sampling interval is configurable with min_index_interval and
> > max_index_interval CQL schema attributes (see describe table). For
> > relatively large blobs like HTML pages we seem to get better read latencies
> > by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
> > max. For large tables like parsoid HTML with ~500G load per node this
> > change adds a modest ~25mb off-heap memory."
> >
> > I wonder if anyone has experience tuning min_index_interval and
> > max_index_interval to improve read speed.
>
> It's usually more efficient to try to tune the key cache, and hope you
> never have to hit the partition index at all. Do you have reason to believe
> you're spending an inordinate amount of IO scanning the partition index? Do
> you know what your key cache hit rate is?

Re: index_interval

Posted by Jeff Jirsa <jj...@apache.org>.

On 2017-07-10 15:09 (-0700), Fay Hou [Storage Service] <fa...@coupang.com> wrote:
> The defaults are:
>
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
> 
> "Cassandra maintains index offsets per partition to speed up the lookup
> process in the case of key cache misses (see cassandra read path overview
> <http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_reads_c.html>).
> By default it samples a subset of keys, somewhat similar to a skip list.
> The sampling interval is configurable with min_index_interval and
> max_index_interval CQL schema attributes (see describe table). For
> relatively large blobs like HTML pages we seem to get better read latencies
> by lowering the sampling interval from 128 min / 2048 max to 64 min / 512
> max. For large tables like parsoid HTML with ~500G load per node this
> change adds a modest ~25mb off-heap memory."
> 
> I wonder if anyone has experience tuning min_index_interval and
> max_index_interval to improve read speed.

It's usually more efficient to try to tune the key cache, and hope you never have to hit the partition index at all. Do you have reason to believe you're spending an inordinate amount of IO scanning the partition index? Do you know what your key cache hit rate is? 