Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2016/05/02 12:12:13 UTC
[jira] [Commented] (CASSANDRA-11031) MultiTenant : support "ALLOW FILTERING" for First Partition Key

    [ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266326#comment-15266326 ] 

Sylvain Lebresne commented on CASSANDRA-11031:
----------------------------------------------

So you're right, this case wasn't handled in CASSANDRA-6377 and we can handle it.

However, it's worth noting that this will be pretty seriously inefficient. In particular, we'll have to read *all* partitions and cannot use the {{where tenant_id = 'datastax'}} restriction to speed up the query in any way (I suppose we could for an ordered partitioner, but we strongly discourage its use for many other reasons, so we're not going to optimize for that now).
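
To illustrate why a restriction on only part of the partition key cannot narrow the read, here is a minimal sketch of a hashing partitioner. Everything in it is an assumption made for illustration: {{token}} is a hypothetical helper, and MD5 is a stand-in, not the hash function Cassandra's default partitioner uses. The point is only that the token is computed from the whole composite key, so partitions sharing a {{tenant_id}} end up with unrelated tokens.

```python
import hashlib


def token(partition_key: tuple) -> int:
    """Hypothetical stand-in for a hashing partitioner.

    Assumption: MD5 is used purely for illustration; it is not the hash
    function of Cassandra's default partitioner.
    """
    h = hashlib.md5()
    for component in partition_key:
        data = component.encode("utf-8")
        # Length-prefix each component so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(2, "big"))
        h.update(data)
    # Keep 8 bytes and interpret them as a signed 64-bit token.
    return int.from_bytes(h.digest()[:8], "big", signed=True)


# Same tenant_id, different pk2: the token depends on both components,
# so knowing tenant_id alone says nothing about where the partitions live.
keys = [("datastax", f"pk{i}") for i in range(5)]
tokens = {key: token(key) for key in keys}

for key, tok in sorted(tokens.items(), key=lambda item: item[1]):
    print(tok, key)
```

With an order-preserving partitioner the keys for one tenant would sit next to each other, which is the exception mentioned in the parenthesis above.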

In particular, regarding:

bq. we can support allow filtering on Partition Key, as far as I know, Partition Key is in memory, so we can easily filter them, and then read required data from SSTable

I'm not entirely sure what you are referring to, but that's pretty much false: we don't keep all partition keys in memory. Maybe what you are referring to is that we could do the filtering early in the pipeline, eliminating keys that don't match the filter directly at the sstable index stage. And that's true, but it would add quite a bit of complexity (to push the filters through the sstable code) without making the query really efficient: we would still have to read every key. So I have fair doubts that the ratio of benefits to added complexity is good enough.

Anyway, I'd be fine supporting this through basic filtering (that is, pretty much querying everything and filtering in {{RowFilter}} as usual) since this is guarded by {{ALLOW FILTERING}}.
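
As a rough model of what "basic filtering" costs, the sketch below scans every row and applies the {{tenant_id}} predicate afterwards. It is not Cassandra code: {{Row}}, {{filter_by_tenant}} and the sample data are hypothetical, and the real {{RowFilter}} works on internal row structures rather than Python tuples. It only shows that the number of rows read is independent of how many match.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Row(NamedTuple):
    # Columns mirror the multi_tenant_table example from the ticket.
    tenant_id: str
    pk2: str
    c1: str
    c2: str
    v1: str
    v2: str


def filter_by_tenant(rows: Iterable[Row], tenant_id: str) -> Tuple[List[Row], int]:
    """Read everything, keep what matches; return (matches, rows_read)."""
    rows_read = 0
    matches = []
    for row in rows:
        rows_read += 1  # every row is read, matching or not
        if row.tenant_id == tenant_id:
            matches.append(row)
    return matches, rows_read


# Hypothetical sample data standing in for the whole table.
table = [
    Row("datastax", "a", "c1", "c2", "v1", "v2"),
    Row("apache", "b", "c1", "c2", "v1", "v2"),
    Row("datastax", "c", "c1", "c2", "v1", "v2"),
    Row("other", "d", "c1", "c2", "v1", "v2"),
]

matches, rows_read = filter_by_tenant(table, "datastax")
print(f"read {rows_read} rows, returned {len(matches)}")
```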

But regarding the patch you've attached, a few remarks:
* as this is a new feature and a pretty minor one imo for the reason discussed above, it should really only go to trunk at this point. This will change the patch substantially.
* even if we support filtering when part of the partition key is missing, I really don't see a reason to special-case it for only the first component. The code should be more generic than this.
* we'd obviously need the patch to have some testing coverage to consider it.

As a side note regarding our process, we don't assign specific "fix version" until commit so please leave "3.x" for now.


> MultiTenant : support "ALLOW FILTERING" for First Partition Key
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-11031
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>             Fix For: 3.x
>
>         Attachments: CASSANDRA-11031.patch
>
>
> Currently, ALLOW FILTERING only works for secondary index columns or clustering columns. And it's slow, because Cassandra reads all data from the SSTables on disk into memory in order to filter it.
> But we can support allow filtering on Partition Key, as far as I know, Partition Key is in memory, so we can easily filter them, and then read required data from SSTable.
> This will be similar to "Select * from table", which scans through the entire cluster.
> CREATE TABLE multi_tenant_table (
> 	tenant_id text,
> 	pk2 text,
> 	c1 text,
> 	c2 text,
> 	v1 text,
> 	v2 text,
> 	PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = 'datastax' allow filtering;


