Posted to commits@cassandra.apache.org by "Jon Haddad (JIRA)" <ji...@apache.org> on 2015/08/28 17:24:46 UTC

[jira] [Updated] (CASSANDRA-10221) arbitrary predicate pushdown on CL=ONE

     [ https://issues.apache.org/jira/browse/CASSANDRA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-10221:
-----------------------------------
    Description: 
For analytics workloads (in particular, I'm thinking of Spark) it would be nice if we could add any predicate to the WHERE clause. I added the CL=ONE requirement because it seems like this may be insane to do at any other consistency level.

Currently, if you want to filter on an arbitrary column of a table in the Spark connector, you have to pull the entire table into memory via what is effectively a distributed SELECT * over token ranges, typically at CL=ONE. It would be much nicer to avoid pulling the extra data into memory and simply skip (no-op) any row that doesn't satisfy the predicates.
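To make the cost difference concrete, here is a minimal in-memory simulation, not connector or Cassandra code. It assumes simplified integer tokens in place of the real Murmur3 range, and every name (`split_token_ranges`, `client_side_filter`, `pushed_down_filter`) is hypothetical. Both strategies return the same rows, but they differ in how many rows each "replica" ships back to the client.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
TokenRange = Tuple[int, int]


def split_token_ranges(min_token: int, max_token: int, n: int) -> List[TokenRange]:
    """Split (min_token, max_token] into n contiguous (start, end] ranges.

    Simplified integer tokens (an assumption); the last range absorbs any remainder.
    """
    step = (max_token - min_token) // n
    bounds = [min_token + i * step for i in range(n)] + [max_token]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_range(rows: List[Row], rng: TokenRange) -> List[Row]:
    """Rows a replica owns for one token range: start < token <= end."""
    start, end = rng
    return [r for r in rows if start < r["token"] <= end]


def client_side_filter(rows: List[Row], ranges: List[TokenRange],
                       pred: Predicate) -> Tuple[List[Row], int]:
    """Current behaviour: every row in each range is shipped, then filtered by the client."""
    shipped, result = 0, []
    for rng in ranges:
        batch = scan_range(rows, rng)  # the whole range crosses the wire
        shipped += len(batch)
        result.extend(r for r in batch if pred(r))
    return result, shipped


def pushed_down_filter(rows: List[Row], ranges: List[TokenRange],
                       pred: Predicate) -> Tuple[List[Row], int]:
    """Proposed behaviour: the replica skips non-matching rows before shipping."""
    shipped, result = 0, []
    for rng in ranges:
        batch = [r for r in scan_range(rows, rng) if pred(r)]  # filtered at the source
        shipped += len(batch)
        result.extend(batch)
    return result, shipped
```

With 100 rows where only every tenth has `status == "error"`, both functions return the same 10 rows, but the client-side version ships all 100 while the pushed-down version ships only 10.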

I think for sanity this should require the ALLOW FILTERING clause.
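As a rough sketch of what a pushed-down, per-token-range query might look like under this proposal, the hypothetical helper below appends arbitrary column predicates to a token-range scan and requires ALLOW FILTERING. The query shape is an assumption for illustration, not the connector's actual generated CQL. The consistency level (CL=ONE here) is not part of the CQL text; the client driver sets it per statement.

```python
from typing import List, Tuple

# Comparison operators this sketch accepts for pushed-down predicates (an assumption).
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def build_range_query(keyspace: str, table: str, partition_key: str,
                      predicates: List[Tuple[str, str]]) -> str:
    """Build a bind-marker CQL query for one token range plus arbitrary predicates.

    Hypothetical helper: `predicates` is a list of (column, operator) pairs whose
    values are bound later via `?` markers, alongside the range start and end.
    """
    where = [f"token({partition_key}) > ?", f"token({partition_key}) <= ?"]
    for column, op in predicates:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        where.append(f"{column} {op} ?")
    # Filtering on arbitrary (non-key, non-indexed) columns is gated behind ALLOW FILTERING.
    return f"SELECT * FROM {keyspace}.{table} WHERE {' AND '.join(where)} ALLOW FILTERING"
```

For example, `build_range_query("ks", "events", "id", [("status", "=")])` produces `SELECT * FROM ks.events WHERE token(id) > ? AND token(id) <= ? AND status = ? ALLOW FILTERING`.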

  was:For analytics workloads it would be nice if we could add any predicate.  I added the CL=ONE requirement since it seems like this may be insane to do with any other level of consistency.


> arbitrary predicate pushdown on CL=ONE
> --------------------------------------
>
>                 Key: CASSANDRA-10221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10221
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)