Posted to commits@cassandra.apache.org by "Jean-Francois Im (JIRA)" <ji...@apache.org> on 2011/07/16 21:29:59 UTC

[jira] [Commented] (CASSANDRA-2904) get_range_slices with no columns could be made faster by scanning the index file

    [ https://issues.apache.org/jira/browse/CASSANDRA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066513#comment-13066513 ] 

Jean-Francois Im commented on CASSANDRA-2904:
---------------------------------------------

I forgot to mention that I am interested in writing a patch for this. I implemented something quick and dirty on my end to get an idea of the performance improvement, but it assumes that nothing else is going on at the same time (i.e. nobody else is writing, the consistency level is always ONE, no compaction is running, only one client is issuing this kind of query, etc.).

Writing something more general-purpose would be trickier, and I would probably need some pointers on a few things (mostly how to handle compaction, query cursors, and consistency levels other than ONE), but it sounds really fun. Is there any interest in this?

> get_range_slices with no columns could be made faster by scanning the index file
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2904
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2904
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Jean-Francois Im
>
> When scanning a column family using get_range_slices() and a predicate that contains no columns, the scan operates on the actual data, not the index file.
> Our use case for this is a column family with relatively wide rows (varying from 10 KB to over 100 KB of data per row) in which we need to iterate through all the keys to figure out which rows we are interested in; obviously, going through the index file rather than the data is faster in this case (on the order of minutes versus hours).
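
To illustrate why a key-only scan over the index can be so much cheaper, here is a minimal Python sketch. It is not Cassandra code and does not use the real SSTable format: the on-disk layout (length-prefixed keys and values in the data file, key plus 8-byte offset in the index file) and every function name below are assumptions made up for this example.

```python
import io
import struct


def write_toy_sstable(rows):
    """Build a toy data file and index file (as bytes) from (key, value) pairs.

    Hypothetical layout, NOT the real SSTable format:
      data entry  = 2-byte key length, key, 4-byte value length, value
      index entry = 2-byte key length, key, 8-byte offset into the data file
    """
    data, index = io.BytesIO(), io.BytesIO()
    for key, value in sorted(rows):
        offset = data.tell()
        data.write(struct.pack(">H", len(key)) + key)
        data.write(struct.pack(">I", len(value)) + value)
        index.write(struct.pack(">H", len(key)) + key + struct.pack(">Q", offset))
    return data.getvalue(), index.getvalue()


def keys_from_index(index_bytes):
    """Collect every key by reading only the (small) index file."""
    buf = io.BytesIO(index_bytes)
    keys = []
    while True:
        header = buf.read(2)
        if not header:
            break
        (key_len,) = struct.unpack(">H", header)
        keys.append(buf.read(key_len))
        buf.read(8)  # skip the data-file offset; it is not needed for a key scan
    return keys


def keys_from_data(data_bytes):
    """Collect every key by walking the data file and reading each row payload."""
    buf = io.BytesIO(data_bytes)
    keys = []
    while True:
        header = buf.read(2)
        if not header:
            break
        (key_len,) = struct.unpack(">H", header)
        keys.append(buf.read(key_len))
        (value_len,) = struct.unpack(">I", buf.read(4))
        buf.read(value_len)  # the wide row body is read even though it is discarded
    return keys


if __name__ == "__main__":
    rows = [(b"row-a", b"x" * 10_000), (b"row-b", b"y" * 100_000)]
    data_bytes, index_bytes = write_toy_sstable(rows)
    print(keys_from_index(index_bytes) == keys_from_data(data_bytes))
    print(len(index_bytes), "index bytes vs", len(data_bytes), "data bytes")
```

Both scans return the same keys, but the index scan touches only a few bytes per row, while the data scan has to pass over every row body. With rows in the 10 KB to 100 KB range, that size gap is where a difference like the one described above would come from. The sketch ignores everything the comment lists as hard: concurrent writes, compaction, multiple SSTables, and consistency levels other than ONE.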

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira