Posted to user@cassandra.apache.org by al...@ceid.upatras.gr on 2010/11/19 15:33:18 UTC
How can I get rows in groups?
Hello,
I would like one of the cluster's nodes to use get_range_slices() to
retrieve the values of a specific column for the entire keyspace. I
obviously don't want to do it for the whole keyspace at once, so I'd like
to do it in groups of n, which should be configurable.
I get the first n values using a KeyRange with the current node's local
token as both start_token and end_token, which covers the whole keyspace.
After that, it makes sense to loop, each time using a new KeyRange whose
start_key is the largest key returned by the previous iteration. However, I
don't know what to use as end_key, and Cassandra complains that if one of
(start_key, end_key) is set, the other must be set too. What can I do?
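A minimal sketch of the key-based loop, under two assumptions that the thread does not confirm: an empty end_key is treated as "no upper bound", and start_key is inclusive. Because of the inclusive start, each page after the first repeats the previous page's last key, so the loop asks for n + 1 rows and drops that repeated row. `range_slice` and `page_keys` are hypothetical names; `range_slice` is an in-memory stand-in for get_range_slices() over keys sorted the way the OrderPreservingPartitioner sorts them, not the real Thrift call.

```python
from bisect import bisect_left


def range_slice(sorted_keys, start_key, end_key, count):
    """Hypothetical stand-in for get_range_slices() under an order-preserving
    partitioner: start_key is inclusive, an empty end_key means no upper bound."""
    i = bisect_left(sorted_keys, start_key)
    result = []
    while i < len(sorted_keys) and len(result) < count:
        key = sorted_keys[i]
        if end_key and key > end_key:  # a non-empty end_key is inclusive
            break
        result.append(key)
        i += 1
    return result


def page_keys(sorted_keys, n):
    """Walk every key in pages of n, restarting each page at the previous last key."""
    pages = []
    start = ""  # empty start_key: begin at the first key
    first = True
    while True:
        ask = n if first else n + 1  # one extra row to make up for the repeated key
        got = range_slice(sorted_keys, start, "", ask)
        page = got if first else [k for k in got if k != start]
        if page:
            pages.append(page)
        if len(got) < ask:  # a short page means the end of the keyspace was reached
            break
        start = got[-1]
        first = False
    return pages


print(page_keys(["a", "b", "c", "d", "e", "f", "g"], 3))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

Stopping on a short page avoids one extra round trip only when the key count is not a multiple of n; otherwise the last request returns just the repeated key and the loop ends there.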
Can I use tokens? I read that a KeyRange with tokens is end-inclusive and
can wrap, so I could always pass the local node's token as end_token; when
the traversal reaches that node again, it will know the whole keyspace has
been traversed. Or do tokens have different semantics?
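A small model of the wrapping token range described above, assuming the (start, end] convention of Cassandra's Range class: start-exclusive, end-inclusive, and a range whose start equals its end covers the whole ring. Tokens are plain integers here for readability (under the OrderPreservingPartitioner the token is derived from the key itself). `in_range` and `page_tokens` are hypothetical helpers, not Cassandra methods. Since the start is exclusive, restarting at the last token returned never re-reads a row, but the loop must stop explicitly if a page ends exactly on the local token; otherwise the next range (t, t] would be the whole ring again.

```python
def in_range(token, start, end):
    """True if token lies in the wrapping range (start, end]."""
    if start < end:
        return start < token <= end
    # start >= end: the range wraps past the top of the ring,
    # so start == end covers every token
    return token > start or token <= end


def page_tokens(ring_tokens, local_token, n):
    """Visit every token once, n at a time, always using local_token as end."""
    start = local_token
    pages = []
    while True:
        # Walk clockwise from start: tokens above start first, then wrapped ones.
        ordered = sorted(ring_tokens, key=lambda t: (t <= start, t))
        page = [t for t in ordered if in_range(t, start, local_token)][:n]
        if page:
            pages.append(page)
        if len(page) < n or page[-1] == local_token:
            break  # short page, or the traversal is back at the local token
        start = page[-1]  # start is exclusive, so no token is returned twice
    return pages


print(page_tokens([10, 20, 30, 40, 50], 30, 2))  # [[40, 50], [10, 20], [30]]
```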
I am using Cassandra 0.7.0 beta1, and the OrderPreservingPartitioner.
Alexander Altanis
Re: How can I get rows in groups?
Posted by aaron morton <aa...@thelastpickle.com>.
If you are working inside the Cassandra code base, take a look at o.a.c.hadoop.ColumnFamilyRecordReader. It reads all the rows in a CF using tokens. I'm not sure that code cares too much about reading a row twice. AFAIK using tokens for this is considered an internal feature.
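A job that reads a whole CF by token generally starts by cutting the ring into contiguous ranges that together cover it exactly once. A minimal sketch of that idea, assuming each node owns (previous node's token, its own token] and using integer tokens for readability; `ring_ranges` and `contains` are hypothetical helpers, and this is not the code of ColumnFamilyRecordReader or of the input format that feeds it.

```python
def ring_ranges(node_tokens):
    """Split the ring into one (previous token, token] range per node."""
    tokens = sorted(node_tokens)
    # For i == 0, tokens[i - 1] is the last token, which yields the range
    # that wraps from the top of the ring back to the bottom.
    return [(tokens[i - 1], tokens[i]) for i in range(len(tokens))]


def contains(rng, token):
    """True if token lies in the wrapping range (start, end]."""
    start, end = rng
    if start < end:
        return start < token <= end
    return token > start or token <= end  # wraps; start == end is the full ring


ranges = ring_ranges([40, 10, 70])
print(ranges)  # [(70, 10), (10, 40), (40, 70)]
# Every token on this toy ring falls into exactly one range.
print(all(sum(contains(r, t) for r in ranges) == 1 for t in range(100)))  # True
```

Each range can then be read independently; with a single node the only range is (t, t], the whole ring.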
WRT the start key / end key issue, why not take a look at how the pycassa, phpcassa or hector libraries do it?
Aaron
On 22 Nov 2010, at 22:10, altanis@ceid.upatras.gr wrote:
> I am not using any client, I am trying to extend Cassandra with a new API
> call so that a _node_ will do that on behalf of clients. Thank you for the
> answer, but it doesn't answer my question!
>
> Alexander
Re: How can I get rows in groups?
Posted by al...@ceid.upatras.gr.
I am not using any client, I am trying to extend Cassandra with a new API
call so that a _node_ will do that on behalf of clients. Thank you for the
answer, but it doesn't answer my question!
Alexander
Re: How can I get rows in groups?
Posted by Tyler Hobbs <ty...@riptano.com>.
Most of the high level clients do this for you.
For example, pycassa and phpcassa both do this by returning an
iterator from get_range() and breaking it up behind the scenes.
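The client-side pattern described above can be sketched as a Python generator: the caller iterates over keys one at a time while the generator fetches them `buffer_size` at a time behind the scenes. This is a hedged illustration of the idea, not pycassa's actual code. `iter_range` is a hypothetical name, and `fetch(start_key, count)` is a hypothetical callable assumed to return up to `count` keys in order, starting at `start_key` inclusive, with an empty start meaning the beginning.

```python
from bisect import bisect_left


def iter_range(fetch, buffer_size=100):
    """Yield keys one at a time, fetching buffer_size keys per request."""
    # Every batch after the first repeats one key, so at least two are
    # needed per request to make progress (checked when iteration starts).
    if buffer_size < 2:
        raise ValueError("buffer_size must be at least 2")
    start = ""
    skip = None  # key already yielded at the end of the previous batch
    while True:
        batch = fetch(start, buffer_size)
        for key in batch:
            if key != skip:
                yield key
        if len(batch) < buffer_size:  # short batch: nothing left to fetch
            return
        skip = start = batch[-1]


KEYS = ["a", "b", "c", "d", "e", "f", "g"]  # made-up keys for the demo
calls = []  # records each start key passed to fetch


def fetch(start_key, count):
    """In-memory stand-in for a range request."""
    calls.append(start_key)
    i = bisect_left(KEYS, start_key)
    return KEYS[i:i + count]


print(list(iter_range(fetch, buffer_size=3)))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

Because it is a generator, nothing is fetched until the caller asks for the first key, and only one batch is held in memory at a time.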
Hector also has something similar, but I think it's in the examples
section.
What client are you using?
(By the way, beta1 is old and buggy! You should switch to beta3.)
- Tyler