Posted to commits@cassandra.apache.org by "Goir Riog (JIRA)" <ji...@apache.org> on 2012/08/15 13:50:38 UTC

[jira] [Commented] (CASSANDRA-3777) get_range_slices() always returns list of KeySlice containing all available rows even if column size is empty

    [ https://issues.apache.org/jira/browse/CASSANDRA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434994#comment-13434994 ] 

Goir Riog commented on CASSANDRA-3777:
--------------------------------------

Hi,

What's the status on this one?
This "bug" still exists in all versions. Is there any reason it behaves as described above?
It's a huge network and processing overhead which could easily be avoided.

A short comment on this one would be nice.

Thanks
Goir
                
> get_range_slices() always returns list of KeySlice containing all available rows even if column size is empty
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3777
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3777
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>
> Hi,
> we are using Cassandra to store data in super column families, with a date as the super column name. We would like to iterate only over the keys containing data that matches a given slice range (e.g. a certain day). However, get_range_slices() always returns all rows, including those whose KeySlice has getColumnsSize() == 0.
> In combination with Hadoop we use the ColumnFamilyInputFormat, which currently only supports SliceRanges. In our setup we may have billions of rows within a column family. Even when we set a slice range, we still have to iterate over all row keys, which in my opinion doesn't make sense.
> Let's have a look at a very simple example:
>         Cassandra.Client client = ConfigHelper.createConnection("localhost", 9160, true);
>         client.set_keyspace("Foo");
>         SlicePredicate predicate = new SlicePredicate();
>         // Column slice: names between "I@1327273200" and "I@1327273200~" (inclusive)
>         SliceRange sliceRange = new SliceRange();
>         sliceRange.start = Util.bb("I@1327273200");
>         sliceRange.finish = Util.bb("I@1327273200~");
>         predicate.slice_range = sliceRange;
>
>         // Empty start and end keys: scan the whole key range of the column family
>         KeyRange keyRange = new KeyRange();
>         keyRange.start_key = Util.bb("");
>         keyRange.end_key = Util.bb("");
>         List<KeySlice> rows = client.get_range_slices(new ColumnParent("Bar"), predicate,
>                 keyRange, ConsistencyLevel.ONE);
>         
>         for (KeySlice slice : rows)
>         {
>             System.out.println("key: " + new String(slice.getKey()) + ", columns: " + slice.getColumnsSize());
>         }
> This is the output:
> key: I@1327359600@14@2074@478@32798@80445@2011@138@205@4320@0, columns: 0
> key: I@1327273200@12@1151@139@801@1728@2033@138@219@4476@0, columns: 1
> key: I@1327359600@14@2055@359@1032@2078@2011@138@205@4320@0, columns: 0
> key: I@1327359600@14@1151@139@801@1728@2011@138@205@4320@0, columns: 0
> key: I@1327273200@12@2074@478@32798@80445@2033@138@219@4476@0, columns: 1
> key: I@1327273200@12@2055@359@1032@2079@2033@138@219@4476@0, columns: 1
> Searching by slice range works fine, but rows with no columns in the given slice range are still part of the result list. We filter out such key slices by checking their column count, but it would make more sense to get only the keys we are looking for (which obviously have a column count > 0).
> ColumnFamilyRecordReader creates a sorted map for each entry in the result list, which means creating billions of maps and passing them to the mapper, only for them to be thrown away because they are empty.
> The question is: is there a way, using slice ranges, to get only those key slices that match the given slice range? Or is there a reason for the behaviour described above?
> Best Regards
> Bert Passek
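
To make the reported behaviour concrete, here is a small in-memory Java model of it. It is a sketch under stated assumptions, not Cassandra's implementation: RangeSliceModel, Row, rangeSlices(), nonEmpty() and the sample column names are hypothetical, rows are kept in key order for simplicity, and column names are compared as Java Strings, which matches byte order only for ASCII names like the ones in the report. Every row in the key range yields one entry and the column slice is applied to each row independently, so rows without matching columns come back empty and have to be filtered on the client, as the reporter does with getColumnsSize().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the behaviour described in CASSANDRA-3777.
// All names here are hypothetical; this is not Cassandra's Thrift API.
final class RangeSliceModel {

    // One result entry: a row key plus the columns that fell inside the
    // slice (plays the role of a Thrift KeySlice).
    static final class Row {
        final String key;
        final NavigableMap<String, String> columns;

        Row(String key, NavigableMap<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Every row in the table yields one entry; the column slice
    // [start, finish] (both inclusive) is applied per row, so rows without
    // matching columns come back with zero columns.
    static List<Row> rangeSlices(NavigableMap<String, NavigableMap<String, String>> table,
                                 String start, String finish) {
        List<Row> result = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, String>> e : table.entrySet()) {
            NavigableMap<String, String> slice =
                    new TreeMap<>(e.getValue().subMap(start, true, finish, true));
            result.add(new Row(e.getKey(), slice));
        }
        return result;
    }

    // Client-side workaround from the report: keep only entries with at
    // least one column (the equivalent of getColumnsSize() > 0).
    static List<Row> nonEmpty(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row r : rows) {
            if (!r.columns.isEmpty()) {
                kept.add(r);
            }
        }
        return kept;
    }

    // Made-up sample data: only row2 has a column inside the I@1327273200 slice.
    static NavigableMap<String, NavigableMap<String, String>> sampleTable() {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        table.put("row1", oneColumn("I@1327359600@a"));
        table.put("row2", oneColumn("I@1327273200@b"));
        table.put("row3", oneColumn("I@1327359600@c"));
        return table;
    }

    private static NavigableMap<String, String> oneColumn(String name) {
        NavigableMap<String, String> columns = new TreeMap<>();
        columns.put(name, "value");
        return columns;
    }

    public static void main(String[] args) {
        List<Row> rows = rangeSlices(sampleTable(), "I@1327273200", "I@1327273200~");
        // Prints:
        // key: row1, columns: 0
        // key: row2, columns: 1
        // key: row3, columns: 0
        // non-empty rows: 1
        for (Row r : rows) {
            System.out.println("key: " + r.key + ", columns: " + r.columns.size());
        }
        System.out.println("non-empty rows: " + nonEmpty(rows).size());
    }
}
```

Filtering like nonEmpty() only discards the empty entries after they have already crossed the network, which is the overhead the comment above complains about; avoiding that transfer would require the server not to send them.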

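On the Hadoop side, the cost described for ColumnFamilyRecordReader comes from building a sorted map for every result entry, empty or not. The following Java sketch shows one way a reader could skip empty rows lazily, before any per-row map is built. It is not ColumnFamilyRecordReader's actual code; RawRow, NonEmptyIterator and skipped() are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not ColumnFamilyRecordReader's code: skip rows with
// no columns before building the per-row SortedMap handed to the mapper.
final class SkipEmptyRows {

    // Hypothetical raw result entry: a row key and its (unsorted) columns.
    static final class RawRow {
        final String key;
        final Map<String, String> columns;

        RawRow(String key, Map<String, String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    // Lazily yields (key, sorted columns) pairs and never builds a map for
    // a row whose column count is 0.
    static final class NonEmptyIterator
            implements Iterator<Map.Entry<String, SortedMap<String, String>>> {
        private final Iterator<RawRow> source;
        private RawRow pending;   // next non-empty row, or null when exhausted
        private int skipped;      // number of empty rows dropped so far

        NonEmptyIterator(Iterator<RawRow> source) {
            this.source = source;
            advance();
        }

        // Move to the next row that has at least one column.
        private void advance() {
            pending = null;
            while (source.hasNext()) {
                RawRow candidate = source.next();
                if (!candidate.columns.isEmpty()) {
                    pending = candidate;
                    return;
                }
                skipped++;
            }
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public Map.Entry<String, SortedMap<String, String>> next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            // Only non-empty rows reach this point, so no empty maps are built.
            Map.Entry<String, SortedMap<String, String>> out =
                    new AbstractMap.SimpleImmutableEntry<String, SortedMap<String, String>>(
                            pending.key, new TreeMap<String, String>(pending.columns));
            advance();
            return out;
        }

        int skipped() {
            return skipped;
        }
    }
}
```

This saves the map allocations and mapper calls for empty rows, but the empty rows are still fetched from Cassandra, so it addresses only the processing half of the overhead.
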
--
This message is automatically generated by JIRA.