Posted to commits@cassandra.apache.org by "Jonathan Halliday (JIRA)" <ji...@apache.org> on 2014/04/09 14:44:15 UTC

[jira] [Created] (CASSANDRA-7016) can't map/reduce over subset of rows with cql

Jonathan Halliday created CASSANDRA-7016:
--------------------------------------------

             Summary: can't map/reduce over subset of rows with cql
                 Key: CASSANDRA-7016
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
             Project: Cassandra
          Issue Type: Bug
          Components: Core, Hadoop
            Reporter: Jonathan Halliday


select ... where token(k) < x and token(k) >= y and k in (a,b) allow filtering;

This fails on 2.0.6: can't restrict k by more than one relation.

In the context of map/reduce (hence the token range) I want to map over only a subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql is substantially cheaper than pulling all rows to the client and then discarding most of them.

Currently this is possible only if the Hadoop integration code is altered to apply the AND on the client side, i.e. to intersect the token range with the key set itself and then issue cql containing only the resulting filtered 'in' set. The problem is not Hadoop-specific, though, so IMO it should be solved in cql, not in the Hadoop integration code.
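A minimal sketch of that client-side workaround follows. For each split, it keeps only the wanted keys whose token falls in [y, x), mirroring "token(k) >= y and token(k) < x" from the failing query. It then builds a query that restricts k by 'in' alone, which 2.0.6 accepts. Assumptions: the token function is passed in, and the stand-in used in main (String.hashCode) is NOT Cassandra's Murmur3Partitioner hash. The table and column names (ks.tbl, k) are hypothetical. The range is assumed not to wrap around. Values are inlined as string literals only for readability; real code should bind them through a prepared statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

class TokenRangeInFilter {

    // Keep only keys whose token t satisfies lower <= t < upper.
    // Assumes a non-wrapping range (lower < upper).
    static List<String> keysInRange(List<String> keys, ToLongFunction<String> tokenOf,
                                    long lower, long upper) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            long t = tokenOf.applyAsLong(key);
            if (t >= lower && t < upper) {
                result.add(key);
            }
        }
        return result;
    }

    // Build a query that restricts the key column by IN only.
    // Callers must skip the split when keys is empty, since "IN ()" is not valid.
    // Single quotes are doubled, as CQL string literals require.
    static String buildInQuery(String table, String keyColumn, List<String> keys) {
        String values = keys.stream()
                .map(k -> "'" + k.replace("'", "''") + "'")
                .collect(Collectors.joining(", "));
        return "SELECT * FROM " + table + " WHERE " + keyColumn + " IN (" + values + ")";
    }

    public static void main(String[] args) {
        // Hypothetical stand-in token function, for illustration only.
        ToLongFunction<String> fakeToken = s -> (long) s.hashCode();
        List<String> wanted = List.of("a", "b", "c");
        // "a", "b", "c" hash to 97, 98, 99, so the split [97, 99) keeps "a" and "b".
        List<String> inSplit = keysInRange(wanted, fakeToken, 97L, 99L);
        if (!inSplit.isEmpty()) {
            System.out.println(buildInQuery("ks.tbl", "k", inSplit));
        }
    }
}
```

Running main prints SELECT * FROM ks.tbl WHERE k IN ('a', 'b'). The intersection is exactly the work the report argues cql should do server-side. Doing it in the client duplicates the partitioner's token logic in every integration layer that needs it.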

Most restrictions on cql syntax seem to exist to prevent unduly expensive queries. This one seems to do the opposite: it forces the more expensive approach of pulling every row in the token range to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)