Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/05/11 15:29:01 UTC

Re: Is multiget_slice performant when you're looking for lots of keys?

multiget performs in O(N) with the number of rows requested, and so does
range scanning.
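
For example, with a pycassa-style Python client (a rough sketch; the
keyspace, CF and key names below are placeholders, not from this thread):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    records = pycassa.ColumnFamily(pool, 'Records')

    # one multiget over N keys still costs roughly one row lookup per key,
    # so the work grows linearly with len(record_ids)
    record_ids = ['rec-%d' % i for i in range(1000)]
    rows = records.multiget(record_ids)   # {row_key: {column: value}}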

If you want to query millions of records of one type, I would create a
CF per type and use Hadoop to parallelize the computation.
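
A rough sketch of that per-type layout (again a pycassa-style client with
placeholder names); a Hadoop job reading the same CF through Cassandra's
ColumnFamilyInputFormat would do this scan in parallel, one mapper per
token range, instead of a single loop:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    type_x = pycassa.ColumnFamily(pool, 'Records_TypeX')  # one CF per type

    count = 0
    for row_key, columns in type_x.get_range():  # sequential range scan
        count += 1                               # stand-in for real work
    print(count)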

On Fri, May 7, 2010 at 6:16 PM, James <re...@gmail.com> wrote:
> Hi all,
> Apologies if I'm still stuck in RDBMS mentality - first project using
> Cassandra!
> I'll be using Cassandra to store quite a lot (10s of millions) of records,
> each of which has a type.
> I'll want to query the records to get all of a certain type; it's an
> analogous situation to the TaggedPosts schema from Arin's blog post
> (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> The thing is, each type (or tag) row key will be pointing at millions of
> records. I know I can use multiget_slice with all those record IDs as one
> request, but is this The Right Way of "filtering" a large column family by
> type?
> Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> Thanks!
> James



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Is multiget_slice performant when you're looking for lots of keys?

Posted by Schubert Zhang <zs...@gmail.com>.
> Is it a problem for me to have millions of columns in a supercolumn?

Yes, that will be a problem, because there is no index within a supercolumn
for its subcolumns: reading even one subcolumn means deserializing the whole
supercolumn.
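
For example (a rough sketch with a pycassa-style client; 'TypeIndex',
'locations' and 'london' are placeholder names), even asking for a small
slice forces the node to materialize the whole supercolumn:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    type_index = pycassa.ColumnFamily(pool, 'TypeIndex')  # super CF

    # only 100 subcolumns are requested, but because subcolumns carry no
    # index, the node deserializes the entire 'london' supercolumn
    # (millions of subcolumns) before it can answer
    some = type_index.get('locations', super_column='london',
                          column_count=100)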

On Tue, May 11, 2010 at 10:03 PM, David Boxenhorn <da...@lookin2.com> wrote:

> I have a similar issue, but I can't create a CF per type, because types are
> an open-ended set in my case (they are geographical locations). So I wanted
> to have a single CF, with a supercolumn for each type and the record keys as
> subcolumns within that supercolumn.
>
> Is it a problem for me to have millions of columns in a supercolumn?
>
>
> On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> multiget performs in O(N) with the number of rows requested, and so does
>> range scanning.
>>
>> If you want to query millions of records of one type, I would create a
>> CF per type and use Hadoop to parallelize the computation.
>>
>> On Fri, May 7, 2010 at 6:16 PM, James <re...@gmail.com> wrote:
>> > Hi all,
>> > Apologies if I'm still stuck in RDBMS mentality - first project using
>> > Cassandra!
>> > I'll be using Cassandra to store quite a lot (10s of millions) of
>> records,
>> > each of which has a type.
>> > I'll want to query the records to get all of a certain type; it's an
>> > analogous situation to the TaggedPosts schema from Arin's blog post
>> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
>> > The thing is, each type (or tag) row key will be pointing at millions of
>> > records. I know I can use multiget_slice with all those record IDs as
>> one
>> > request, but is this The Right Way of "filtering" a large column family
>> by
>> > type?
>> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
>> > Thanks!
>> > James
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>

Re: Is multiget_slice performant when you're looking for lots of keys?

Posted by David Boxenhorn <da...@lookin2.com>.
I have a similar issue, but I can't create a CF per type, because types are
an open-ended set in my case (they are geographical locations). So I wanted
to have a single CF, with a supercolumn for each type and the record keys as
subcolumns within that supercolumn.
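
Concretely, the layout I have in mind would look something like this (a
rough sketch with a pycassa-style client; the keyspace, CF, row key and
record keys are just placeholder names):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    type_index = pycassa.ColumnFamily(pool, 'TypeIndex')  # a super CF

    # one supercolumn per type (location), one subcolumn per record key
    type_index.insert('locations',
                      {'london': {'rec-123': '', 'rec-456': ''}})

    # later: pull back the record keys filed under one type
    keys_for_london = type_index.get('locations', super_column='london')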

Is it a problem for me to have millions of columns in a supercolumn?

On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> multiget performs in O(N) with the number of rows requested, and so does
> range scanning.
>
> If you want to query millions of records of one type, I would create a
> CF per type and use Hadoop to parallelize the computation.
>
> On Fri, May 7, 2010 at 6:16 PM, James <re...@gmail.com> wrote:
> > Hi all,
> > Apologies if I'm still stuck in RDBMS mentality - first project using
> > Cassandra!
> > I'll be using Cassandra to store quite a lot (10s of millions) of
> records,
> > each of which has a type.
> > I'll want to query the records to get all of a certain type; it's an
> > analogous situation to the TaggedPosts schema from Arin's blog post
> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> > The thing is, each type (or tag) row key will be pointing at millions of
> > records. I know I can use multiget_slice with all those record IDs as one
> > request, but is this The Right Way of "filtering" a large column family
> by
> > type?
> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> > Thanks!
> > James
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>