Posted to user@cassandra.apache.org by al...@ceid.upatras.gr on 2010/11/19 15:33:18 UTC

How can I get rows in groups?

Hello,

I would like one of the cluster's nodes to use get_range_slices() to
retrieve the values of a specific column for the entire keyspace. I
obviously don't want to do it for the whole keyspace at once, so I'd like
to do it in groups of n, which should be configurable.

I get the first n values using a KeyRange with the current node's local
token as start_token and end_token, which equals the whole keyspace.

After that, it makes sense to loop, each time using a new KeyRange whose
start_key is the largest key returned by the previous iteration. However,
I don't know what to use as end_key, and Cassandra complains that if one
of (start_key, end_key) is not null, the other must not be null either.
What can I do?

Can I use tokens? I read that a KeyRange with tokens is end-inclusive and
can wrap, so I could give the local node's token as the end_token every
time; when the traversal reaches that node again, it would know the whole
keyspace has been traversed. Or are tokens semantically different?

I am using Cassandra 0.7.0 beta1, and the OrderPreservingPartitioner.

Alexander Altanis






Re: How can I get rows in groups?

Posted by aaron morton <aa...@thelastpickle.com>.
If you are working inside the Cassandra code base, take a look at o.a.c.hadoop.ColumnFamilyRecordReader. It reads all the rows in a CF using tokens. I'm not sure that code cares too much about reading a row twice. AFAIK, using tokens is considered an internal feature.
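
To make the token idea concrete, here is a toy sketch of paging around a ring with a fixed end_token. It is not Cassandra code: the integer tokens, the in-memory ring, and the names token_range and scan_ring are all assumptions made for illustration, and it only assumes the (start, end] wrapping semantics described in the question.

```python
from typing import Iterator, List


def token_range(ring: List[int], start: int, end: int, count: int) -> List[int]:
    """Toy model (hypothetical): up to `count` tokens from (start, end],
    walking the ring clockwise. start >= end means the range wraps."""
    ordered = sorted(ring)
    # Ring order beginning just past `start`.
    walk = [t for t in ordered if t > start] + [t for t in ordered if t <= start]
    if start < end:
        walk = [t for t in walk if start < t <= end]
    else:
        walk = [t for t in walk if t > start or t <= end]
    return walk[:count]


def scan_ring(ring: List[int], local_token: int, page_size: int) -> Iterator[int]:
    """Visit every token once, page_size at a time, ending at local_token."""
    start = local_token
    while True:
        page = token_range(ring, start, local_token, page_size)
        yield from page
        # Stop on a short page, or when the page ends exactly on the end
        # token: (end, end] would otherwise mean "the whole ring" again.
        if len(page) < page_size or page[-1] == local_token:
            return
        start = page[-1]
```

The second stop condition is the easy one to miss: without it, a row sitting exactly on the end token restarts the scan and every row is read again.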

WRT the start key / end key issue, why not take a look at how the pycassa, phpcassa or hector libraries do it? 

Aaron




Re: How can I get rows in groups?

Posted by al...@ceid.upatras.gr.
I am not using any client; I am trying to extend Cassandra with a new API
call so that a _node_ will do that on behalf of clients. Thank you for the
answer, but it doesn't answer my question!

Alexander



Re: How can I get rows in groups?

Posted by Tyler Hobbs <ty...@riptano.com>.
Most of the high-level clients do this for you.

For example, pycassa and phpcassa both do this by returning an
iterator from get_range() and breaking it up behind the scenes.
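
A minimal sketch of that kind of paging, under stated assumptions: the get_range_slices function below is a hypothetical in-memory stand-in, not the Thrift call, and it assumes rows come back in key order (as with the OrderPreservingPartitioner), that both key bounds are inclusive, and that an empty string means "unbounded". Each later page starts at the last key already seen and drops that repeated first row.

```python
from bisect import bisect_left
from typing import Iterator, List, Tuple

Row = Tuple[str, str]  # (row key, column value)


def get_range_slices(rows: List[Row], start_key: str, end_key: str,
                     count: int) -> List[Row]:
    """Hypothetical stand-in: `rows` is sorted by key, both bounds are
    inclusive, and "" means unbounded on that side."""
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, start_key) if start_key else 0
    out: List[Row] = []
    for key, value in rows[lo:]:
        if (end_key and key > end_key) or len(out) == count:
            break
        out.append((key, value))
    return out


def iter_rows(rows: List[Row], page_size: int) -> Iterator[Row]:
    """Yield every row, fetching roughly page_size rows per request."""
    start_key = ""
    first_page = True
    while True:
        # Later pages ask for one extra row, because the first row
        # returned repeats the last row of the previous page.
        want = page_size if first_page else page_size + 1
        page = get_range_slices(rows, start_key, "", want)
        yield from (page if first_page else page[1:])
        if len(page) < want:
            return  # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False
```

For example, list(iter_rows(rows, 2)) over five sorted rows issues three requests and returns each row exactly once.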

Hector also has something similar, but I think it's in the examples
section.

What client are you using?

(By the way, beta1 is old and buggy! You should switch to beta3.)

- Tyler
