Posted to user@cassandra.apache.org by Raj Janakarajan <ra...@zephyrhealthinc.com> on 2014/05/19 18:50:19 UTC

Filtering on Collections

Hello all,

I am using Cassandra version 2.0.7 and am wondering whether collections are
efficient for filtering.  We are thinking of using a collection to maintain
a list on each customer row, but we have to be able to filter on the
collection values:

SELECT uuid FROM customer WHERE eligibility_state IN ('CA', 'NC');

eligibility_state is the collection.  The above query would be used
frequently.
Would you recommend collections for modeling from a performance perspective?

Raj
-- 

Data Architect ❘ Zephyr Health
589 Howard St. ❘ San Francisco, CA 94105
m: +1 9176477433 ❘ f: +1 415 520-9288
o: +1 415 529-7649 | s: raj.janakarajan

http://www.zephyrhealth.com

Re: Filtering on Collections

Posted by Raj Janakarajan <ra...@zephyrhealthinc.com>.
Thank you Patricia.  This is helpful.

Raj


On Mon, May 19, 2014 at 10:54 AM, Patricia Gorla
<patricia@thelastpickle.com> wrote:

> Raj,
>
> Secondary indexes across CQL3 collections were introduced into 2.1 beta1,
> so will be available in future versions. See
> https://issues.apache.org/jira/browse/CASSANDRA-4511
>
> If your main concern is performance then you should find another way to
> model the data: each collection is read entirely into memory to access a
> single item.




Re: Filtering on Collections

Posted by Patricia Gorla <pa...@thelastpickle.com>.
I'm not sure about that — allowing collections as a primary key would be a
much different implementation than setting up a secondary index.

The first component of the primary key in CQL3 is the partition key, which
determines which token the row is assigned to, so you would still need a
partition key. Also, I don't see a use case for collections as a primary
key that you couldn't achieve with a composite key.
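
As a rough illustration of the composite-key alternative mentioned above,
the sketch below stores one row per (customer, state) pair instead of a
collection. The table and column names are hypothetical (the thread never
shows the real schema), and the UUID is a placeholder value.

```cql
-- Hypothetical table: eligibility_state is a clustering column
-- rather than an element of a collection.
CREATE TABLE customer_eligibility (
    customer_id uuid,
    eligibility_state text,
    PRIMARY KEY (customer_id, eligibility_state)
);

-- customer_id is the partition key (it determines the token);
-- eligibility_state orders the rows inside that partition.
SELECT eligibility_state
FROM customer_eligibility
WHERE customer_id = 123e4567-e89b-12d3-a456-426655440000;

-- Check a single state for one customer without reading the other states.
SELECT eligibility_state
FROM customer_eligibility
WHERE customer_id = 123e4567-e89b-12d3-a456-426655440000
  AND eligibility_state = 'CA';
```

This layout answers "which states for this customer?" but not "which
customers for this state?"; the latter would need the state in the
partition key or an index.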


On Mon, May 19, 2014 at 1:35 PM, Eric Plowe <er...@gmail.com> wrote:

> Ah, that is interesting, Patricia. Since they can be a secondary index,
> it's not too far off for them being able to be a primary key, no?


-- 
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com <http://thelastpickle.com>

Re: Filtering on Collections

Posted by Eric Plowe <er...@gmail.com>.
Ah, that is interesting, Patricia. Since collections can have a secondary
index, it's not too far off for them to be usable as a primary key, no?


On Mon, May 19, 2014 at 1:54 PM, Patricia Gorla
<pa...@thelastpickle.com> wrote:

> Raj,
>
> Secondary indexes across CQL3 collections were introduced into 2.1 beta1,
> so will be available in future versions. See
> https://issues.apache.org/jira/browse/CASSANDRA-4511
>
> If your main concern is performance then you should find another way to
> model the data: each collection is read entirely into memory to access a
> single item.

Re: Filtering on Collections

Posted by Patricia Gorla <pa...@thelastpickle.com>.
Raj,

Secondary indexes on CQL3 collections were introduced in 2.1 beta1, so
they will be available in future versions. See
https://issues.apache.org/jira/browse/CASSANDRA-4511
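
For reference, a minimal sketch of what that feature looks like once on
2.1 or later. The schema is an assumption (the thread does not show the
actual table), and the index name is made up for the example.

```cql
-- Assumed schema: a set of state codes per customer row.
CREATE TABLE customer (
    customer_id uuid PRIMARY KEY,
    eligibility_state set<text>
);

-- Cassandra 2.1+: index the values stored in the collection.
CREATE INDEX customer_eligibility_state_idx
    ON customer (eligibility_state);

-- CONTAINS matches rows whose collection holds the given value.
SELECT customer_id FROM customer WHERE eligibility_state CONTAINS 'CA';
SELECT customer_id FROM customer WHERE eligibility_state CONTAINS 'NC';
```

As far as I know there is no IN-style form of CONTAINS, so the "CA or NC"
case would be one query per state with the results merged by the client.
Secondary index lookups also tend to fan out across nodes, so this may
still not suit a query that runs very frequently.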

If your main concern is performance then you should find another way to
model the data: each collection is read entirely into memory to access a
single item.
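
One possible way to model it differently on 2.0.x is a separate lookup
table keyed by state and maintained by the application. This is only a
sketch under assumed names (customer_by_eligibility_state, customer_id);
nothing here comes from the original schema.

```cql
-- Hypothetical lookup table: one row per (state, customer) pair.
CREATE TABLE customer_by_eligibility_state (
    eligibility_state text,
    customer_id uuid,
    PRIMARY KEY (eligibility_state, customer_id)
);

-- The application writes one row for each state a customer is eligible in
-- (placeholder UUID shown).
INSERT INTO customer_by_eligibility_state (eligibility_state, customer_id)
VALUES ('CA', 123e4567-e89b-12d3-a456-426655440000);

-- The partition key can be restricted with IN, so no collection or
-- secondary index is involved.
SELECT customer_id
FROM customer_by_eligibility_state
WHERE eligibility_state IN ('CA', 'NC');
```

The trade-off is that every eligibility change has to be written to this
table as well, and with only a few dozen distinct states each partition
can grow large, so the partition key may need an extra bucketing component
in practice.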



On Mon, May 19, 2014 at 11:03 AM, Eric Plowe <er...@gmail.com> wrote:

> Collection types cannot be used for filtering (as part of the where
> statement).
> They cannot be used as a primary key or part of a primary key.
> Secondary indexes are not supported as well.



Re: Filtering on Collections

Posted by Raj Janakarajan <ra...@zephyrhealthinc.com>.
Thanks Eric for the information.  It looks like it will be supported in
future versions.

Raj


On Mon, May 19, 2014 at 10:03 AM, Eric Plowe <er...@gmail.com> wrote:

> Collection types cannot be used for filtering (as part of the where
> statement).
> They cannot be used as a primary key or part of a primary key.
> Secondary indexes are not supported as well.



Re: Filtering on Collections

Posted by Eric Plowe <er...@gmail.com>.
In Cassandra 2.0.x, collection types cannot be used for filtering (as part
of the WHERE clause).
They cannot be used as a primary key or part of a primary key.
Secondary indexes on them are not supported either.


On Mon, May 19, 2014 at 12:50 PM, Raj Janakarajan
<ra...@zephyrhealthinc.com> wrote:

> Hello all,
>
> I am using Cassandra version 2.0.7.  I am wondering if "collections" is
> efficient for filtering.  We are thinking of using "collections" to
> maintain a list for a customer row but we have to be able to filter on the
> collection values.
>
> Select UUID from customer where eligibility_state IN (CA, NC)
>
> Eligibility_state being a collection.  The above query would be used
> frequently.
> Would you recommend collections for modeling from a performance
> perspective?
>
> Raj