Posted to user@cassandra.apache.org by Jeremy Hanna <je...@gmail.com> on 2011/07/03 23:29:49 UTC

secondary index performance

Does anyone know if secondary index performance should be in the 100-500 ms range? That's what we're seeing right now when doing lookups on a single value. We've increased keys_cached and rows_cached to 100% for that column family and assume that the secondary index gets the same attributes. I've also reduced read_repair_chance to 0.2 because the data doesn't get overwritten very frequently.
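Here is a back-of-the-envelope sketch of what lowering read_repair_chance buys. It assumes a simplified model of a read at consistency level ONE: one replica always serves the read, and with probability read_repair_chance the request also goes to every other replica so stale copies can be repaired. The function name and the replication factor used below are illustrative assumptions, not something stated in the thread.

```python
def expected_replicas_per_read(replication_factor: int, read_repair_chance: float) -> float:
    """Expected replicas contacted by one consistency-level-ONE read.

    Simplified model (an assumption, not Cassandra's exact read path): one
    replica always serves the read, and with probability read_repair_chance
    the remaining replication_factor - 1 replicas are also queried so that
    stale copies can be repaired.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    # One replica always answers; the other RF - 1 only on a repair.
    return 1 + read_repair_chance * (replication_factor - 1)
```

With an assumed replication factor of 3, a chance of 1.0 gives 3.0 replicas per read and 0.2 gives 1.4, so under this model the change cuts replica reads per request by a bit more than half. It does not make any single lookup cheaper, which may be why it doesn't move the 100-500 ms figure.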

Is the assumption that rows/keys cached is inherited correct?  Is there any way to see cfstats on secondary index sub-column families?

Thanks,

Jeremy

Re: secondary index performance

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Jul 3, 2011 at 5:12 PM, Jeremy Hanna <je...@gmail.com> wrote:
> Trying some other stuff with tools mentioned here: http://spyced.blogspot.com/2010/01/linux-performance-basics.html but not seeing anything particularly disk bound, though await (from iostat -x) seems high on one of the devices.

Are you then seeing that it's CPU bound?  (Assuming that you are
pushing enough requests to saturate it.)
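The point here is that a node only shows which resource it is bound on once it is saturated. One way to get there is to drive lookups concurrently and watch throughput and latency as concurrency rises. The sketch below is a generic Python load driver, not a Cassandra tool: run_load is a hypothetical helper, and lookup is a hypothetical zero-argument callable you would wrap around whatever client call performs the indexed query (for example the Thrift get_indexed_slices call).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load(lookup: Callable[[], object], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Call lookup() total_requests times from `concurrency` worker threads.

    Returns overall throughput (requests per second) plus p50/p99/max
    latency in milliseconds.
    """
    if total_requests < 2:
        raise ValueError("need at least 2 requests to compute percentiles")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        lookup()  # hypothetical client call, e.g. a wrapped indexed query
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall_seconds = time.perf_counter() - wall_start

    # 99 cut points; "inclusive" keeps them inside the observed range.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "throughput_rps": total_requests / wall_seconds,
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
        "max_ms": max(latencies),
    }
```

Raise concurrency until throughput_rps stops growing while p99_ms keeps climbing; at that point top/vmstat and iostat -x on the node should show whether CPU or disk is the one pegged.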

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: secondary index performance

Posted by aaron morton <aa...@thelastpickle.com>.
> Is the assumption that rows/keys cached is inherited correct?  Is there any way to see cfstats on secondary index sub-column families?

They are inherited, but AFAIK only at the time the secondary index is created. You would need to drop and re-create the secondary index for a change to those settings to take effect.
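That behaviour amounts to copy-on-create: the index column family takes a snapshot of the parent's cache settings when it is built and never consults the parent again. The toy model below illustrates the idea; the classes, the index naming scheme and the settings values are all hypothetical and are not Cassandra's internals.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CacheSettings:
    keys_cached: float
    rows_cached: float


class ColumnFamily:
    """Toy model of a column family whose secondary indexes copy its cache
    settings once, when each index is created (hypothetical, not Cassandra code)."""

    def __init__(self, name: str, cache: CacheSettings) -> None:
        self.name = name
        self.cache = cache
        self.indexes: Dict[str, "ColumnFamily"] = {}

    def create_index(self, column: str) -> "ColumnFamily":
        # Snapshot: the index CF gets the parent's settings as they are right now.
        index_cf = ColumnFamily(f"{self.name}.{column}_idx", self.cache)
        self.indexes[column] = index_cf
        return index_cf

    def drop_index(self, column: str) -> None:
        del self.indexes[column]

    def update_cache(self, cache: CacheSettings) -> None:
        # Only the parent changes; existing index CFs keep their old snapshot.
        self.cache = cache
```

Calling update_cache on the parent leaves an existing index's cache unchanged; only drop_index followed by create_index picks up the new values, which matches the drop-and-re-create advice.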

cfstats for secondary index CFs are available via JMX / JConsole.

Cheers	
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com



Re: secondary index performance

Posted by Jeremy Hanna <je...@gmail.com>.
On Jul 3, 2011, at 4:29 PM, Jeremy Hanna wrote:

> Is the assumption that rows/keys cached is inherited correct?  Is there any way to see cfstats on secondary index sub-column families?

The answer appears to be no and no.

I'm trying some other things with the tools mentioned here: http://spyced.blogspot.com/2010/01/linux-performance-basics.html but I'm not seeing anything particularly disk bound, though await (from iostat -x) seems high on one of the devices.
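To put a number on which device has the high await, the iostat -x report can be parsed rather than eyeballed. This sketch assumes the sysstat layout (a header line starting with "Device" or "Device:" and one row per device); parse_iostat_column is a hypothetical helper, and since newer sysstat releases split await into r_await and w_await, pass whichever column name your version prints.

```python
from typing import Dict


def parse_iostat_column(report: str, column: str = "await") -> Dict[str, float]:
    """Return {device: value} for one column of `iostat -x` output.

    Assumes the sysstat layout: a header line whose first field is "Device"
    or "Device:", followed by one whitespace-separated row per device and a
    blank line. When the report holds several samples, later rows overwrite
    earlier ones, so the last sample wins.
    """
    values: Dict[str, float] = {}
    col_index = None
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            col_index = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            if column not in fields:
                raise ValueError(f"column {column!r} not in header: {fields}")
            col_index = fields.index(column)
            continue
        if col_index is not None and len(fields) > col_index:
            # Some locales print decimal commas; normalise before converting.
            values[fields[0]] = float(fields[col_index].replace(",", "."))
    return values
```

Fed the output of iostat -x 5 2 (the first report covers the time since boot, the second the 5-second interval), it keeps the interval sample because later rows overwrite earlier ones; max(values, key=values.get) then names the slowest device.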

One of our guys tried pointing the lookups at our realtime nodes (instead of the analytic nodes) and said the performance was worse. Granted, our analytic nodes are m4.xl and our realtime nodes are currently large, but with no load on them I would still expect it to be quite fast.
 