You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Kant Kodali <ka...@peernova.com> on 2017/05/09 13:32:36 UTC

Cassandra 3.10 has partial partition key search but does it result in a table scan?

Hi All,

It looks like Cassandra 3.10 has partial partition key search but does it
result in a table scan? for example I can have the following

create table hello(
a text,
b int,
c text,
d text,
primary key((a,b), c)
);

Now I can do select * from hello where a='foo' allow filtering;// This
works in 3.10 but I wonder if this query results in table scan and if so is
there any way to limit such that I get max b?

Thanks!

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
If you have to allow filtering for the query to work it usually always
results in a table scan.

greetings Daniel

On Tue, 9 May 2017 at 15:33 Jon Haddad <jo...@gmail.com> wrote:

> I don’t see any way it wouldn’t.  Have you tried tracing it?
>
> > On May 9, 2017, at 8:32 AM, Kant Kodali <ka...@peernova.com> wrote:
> >
> > Hi All,
> >
> > It looks like Cassandra 3.10 has partial partition key search but does
> it result in a table scan? for example I can have the following
> >
> > create table hello(
> > a text,
> > b int,
> > c text,
> > d text,
> > primary key((a,b), c)
> > );
> >
> > Now I can do select * from hello where a='foo' allow filtering;// This
> works in 3.10 but I wonder if this query results in table scan and if so is
> there any way to limit such that I get max b?
> >
> > Thanks!
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Kant Kodali <ka...@peernova.com>.
Thanks a lot guys!

On Tue, May 9, 2017 at 7:32 AM, Alexander Dejanovski <alex@thelastpickle.com
> wrote:

> Hi Kant,
>
> Unless you provide the full partition key, I see no way for Cassandra to
> avoid doing a full table scan.
> In order to know on which specific nodes to search (and in which sstables
> ,etc...) it needs to have a token. The token is a hash of the whole
> partition key.
> For a specific value of column "a" and different values of column "b" you
> always end up with different tokens that have no guaranty to be stored on
> the same node.
> After that, bloom filters, partition indexes, etc... require the full
> token too, so a full scan is further necessary on each node to get the
> data.
>
> TL;DR : no way to avoid a full cluster scan unless you provide the full
> partition key in your where clause.
>
> Cheers,
>
> On Tue, May 9, 2017 at 4:24 PM Jon Haddad <jo...@gmail.com>
> wrote:
>
>> Nope, I didn’t comment on that query.   I specifically answered your
>> question about "select * from hello where a='foo' allow filtering;”
>>
>> The query you’ve listed here looks like it would also do a full table
>> scan (again, I don’t see how it would be avoided).
>>
>> I recommend firing up a 3 node cluster using CCM, creating a key space
>> with RF=1, and seeing what it does.
>>
>> On May 9, 2017, at 9:12 AM, Kant Kodali <ka...@peernova.com> wrote:
>>
>> Hi,
>>
>> Are you saying The following query select max(b) from hello where a='a1'
>> allow filtering; doesn't result in a table scan? I got the result for
>> this query and yes I just tried tracing it and looks like it is indeed
>> doing a table scan on ReadStage-2 although I am not sure if I am
>> interpreting it right? Finally is there anyway to prevent table scan while
>> providing the partial partition key and get the max b ?
>>
>> <Screen Shot 2017-05-09 at 7.07.46 AM.png>
>> ​
>>
>>
>> On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jo...@gmail.com>
>> wrote:
>>
>>> I don’t see any way it wouldn’t.  Have you tried tracing it?
>>>
>>> > On May 9, 2017, at 8:32 AM, Kant Kodali <ka...@peernova.com> wrote:
>>> >
>>> > Hi All,
>>> >
>>> > It looks like Cassandra 3.10 has partial partition key search but does
>>> it result in a table scan? for example I can have the following
>>> >
>>> > create table hello(
>>> > a text,
>>> > b int,
>>> > c text,
>>> > d text,
>>> > primary key((a,b), c)
>>> > );
>>> >
>>> > Now I can do select * from hello where a='foo' allow filtering;// This
>>> works in 3.10 but I wonder if this query results in table scan and if so is
>>> there any way to limit such that I get max b?
>>> >
>>> > Thanks!
>>>
>>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Hi Kant,

Unless you provide the full partition key, I see no way for Cassandra to
avoid doing a full table scan.
In order to know on which specific nodes to search (and in which sstables
,etc...) it needs to have a token. The token is a hash of the whole
partition key.
For a specific value of column "a" and different values of column "b" you
always end up with different tokens that have no guaranty to be stored on
the same node.
After that, bloom filters, partition indexes, etc... require the full token
too, so a full scan is further necessary on each node to get the data.

TL;DR : no way to avoid a full cluster scan unless you provide the full
partition key in your where clause.

Cheers,

On Tue, May 9, 2017 at 4:24 PM Jon Haddad <jo...@gmail.com> wrote:

> Nope, I didn’t comment on that query.   I specifically answered your
> question about "select * from hello where a='foo' allow filtering;”
>
> The query you’ve listed here looks like it would also do a full table scan
> (again, I don’t see how it would be avoided).
>
> I recommend firing up a 3 node cluster using CCM, creating a key space
> with RF=1, and seeing what it does.
>
> On May 9, 2017, at 9:12 AM, Kant Kodali <ka...@peernova.com> wrote:
>
> Hi,
>
> Are you saying The following query select max(b) from hello where a='a1'
> allow filtering; doesn't result in a table scan? I got the result for
> this query and yes I just tried tracing it and looks like it is indeed
> doing a table scan on ReadStage-2 although I am not sure if I am
> interpreting it right? Finally is there anyway to prevent table scan while
> providing the partial partition key and get the max b ?
>
> <Screen Shot 2017-05-09 at 7.07.46 AM.png>
> ​
>
>
> On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jo...@gmail.com>
> wrote:
>
>> I don’t see any way it wouldn’t.  Have you tried tracing it?
>>
>> > On May 9, 2017, at 8:32 AM, Kant Kodali <ka...@peernova.com> wrote:
>> >
>> > Hi All,
>> >
>> > It looks like Cassandra 3.10 has partial partition key search but does
>> it result in a table scan? for example I can have the following
>> >
>> > create table hello(
>> > a text,
>> > b int,
>> > c text,
>> > d text,
>> > primary key((a,b), c)
>> > );
>> >
>> > Now I can do select * from hello where a='foo' allow filtering;// This
>> works in 3.10 but I wonder if this query results in table scan and if so is
>> there any way to limit such that I get max b?
>> >
>> > Thanks!
>>
>>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Jon Haddad <jo...@gmail.com>.
Output from both queries, demonstrating full cluster scans:

https://gist.github.com/rustyrazorblade/c4947fc37da85bca50e08aa1ef3c7a06 <https://gist.github.com/rustyrazorblade/c4947fc37da85bca50e08aa1ef3c7a06>

Jon

> On May 9, 2017, at 9:24 AM, Jon Haddad <jo...@gmail.com> wrote:
> 
> Nope, I didn’t comment on that query.   I specifically answered your question about "select * from hello where a='foo' allow filtering;”
> 
> The query you’ve listed here looks like it would also do a full table scan (again, I don’t see how it would be avoided).
> 
> I recommend firing up a 3 node cluster using CCM, creating a key space with RF=1, and seeing what it does.  
> 
>> On May 9, 2017, at 9:12 AM, Kant Kodali <kant@peernova.com <ma...@peernova.com>> wrote:
>> 
>> Hi,
>> 
>> Are you saying The following query select max(b) from hello where a='a1' allow filtering; doesn't result in a table scan? I got the result for this query and yes I just tried tracing it and looks like it is indeed doing a table scan on ReadStage-2 although I am not sure if I am interpreting it right? Finally is there anyway to prevent table scan while providing the partial partition key and get the max b ?
>> 
>> <Screen Shot 2017-05-09 at 7.07.46 AM.png>
>> ​
>> 
>> 
>> On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jonathan.haddad@gmail.com <ma...@gmail.com>> wrote:
>> I don’t see any way it wouldn’t.  Have you tried tracing it?
>> 
>> > On May 9, 2017, at 8:32 AM, Kant Kodali <kant@peernova.com <ma...@peernova.com>> wrote:
>> >
>> > Hi All,
>> >
>> > It looks like Cassandra 3.10 has partial partition key search but does it result in a table scan? for example I can have the following
>> >
>> > create table hello(
>> > a text,
>> > b int,
>> > c text,
>> > d text,
>> > primary key((a,b), c)
>> > );
>> >
>> > Now I can do select * from hello where a='foo' allow filtering;// This works in 3.10 but I wonder if this query results in table scan and if so is there any way to limit such that I get max b?
>> >
>> > Thanks!
>> 
>> 
> 


Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Jon Haddad <jo...@gmail.com>.
Nope, I didn’t comment on that query.   I specifically answered your question about "select * from hello where a='foo' allow filtering;”

The query you’ve listed here looks like it would also do a full table scan (again, I don’t see how it would be avoided).

I recommend firing up a 3 node cluster using CCM, creating a key space with RF=1, and seeing what it does.  

> On May 9, 2017, at 9:12 AM, Kant Kodali <ka...@peernova.com> wrote:
> 
> Hi,
> 
> Are you saying The following query select max(b) from hello where a='a1' allow filtering; doesn't result in a table scan? I got the result for this query and yes I just tried tracing it and looks like it is indeed doing a table scan on ReadStage-2 although I am not sure if I am interpreting it right? Finally is there anyway to prevent table scan while providing the partial partition key and get the max b ?
> 
> <Screen Shot 2017-05-09 at 7.07.46 AM.png>
> ​
> 
> 
> On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jonathan.haddad@gmail.com <ma...@gmail.com>> wrote:
> I don’t see any way it wouldn’t.  Have you tried tracing it?
> 
> > On May 9, 2017, at 8:32 AM, Kant Kodali <kant@peernova.com <ma...@peernova.com>> wrote:
> >
> > Hi All,
> >
> > It looks like Cassandra 3.10 has partial partition key search but does it result in a table scan? for example I can have the following
> >
> > create table hello(
> > a text,
> > b int,
> > c text,
> > d text,
> > primary key((a,b), c)
> > );
> >
> > Now I can do select * from hello where a='foo' allow filtering;// This works in 3.10 but I wonder if this query results in table scan and if so is there any way to limit such that I get max b?
> >
> > Thanks!
> 
> 


Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Kant Kodali <ka...@peernova.com>.
Hi,

Are you saying The following query select max(b) from hello where a='a1'
allow filtering; doesn't result in a table scan? I got the result for this
query and yes I just tried tracing it and looks like it is indeed doing a
table scan on ReadStage-2 although I am not sure if I am interpreting it
right? Finally is there anyway to prevent table scan while providing the
partial partition key and get the max b ?


​


On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jo...@gmail.com>
wrote:

> I don’t see any way it wouldn’t.  Have you tried tracing it?
>
> > On May 9, 2017, at 8:32 AM, Kant Kodali <ka...@peernova.com> wrote:
> >
> > Hi All,
> >
> > It looks like Cassandra 3.10 has partial partition key search but does
> it result in a table scan? for example I can have the following
> >
> > create table hello(
> > a text,
> > b int,
> > c text,
> > d text,
> > primary key((a,b), c)
> > );
> >
> > Now I can do select * from hello where a='foo' allow filtering;// This
> works in 3.10 but I wonder if this query results in table scan and if so is
> there any way to limit such that I get max b?
> >
> > Thanks!
>
>

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

Posted by Jon Haddad <jo...@gmail.com>.
I don’t see any way it wouldn’t.  Have you tried tracing it?

> On May 9, 2017, at 8:32 AM, Kant Kodali <ka...@peernova.com> wrote:
> 
> Hi All,
> 
> It looks like Cassandra 3.10 has partial partition key search but does it result in a table scan? for example I can have the following
> 
> create table hello(
> a text,
> b int,
> c text,
> d text,
> primary key((a,b), c)
> );
> 
> Now I can do select * from hello where a='foo' allow filtering;// This works in 3.10 but I wonder if this query results in table scan and if so is there any way to limit such that I get max b?
> 
> Thanks!


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org