Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/09/27 23:42:37 UTC

paging through an entire table in chunks?

I need a way to do a full table scan across all of our data.

Can’t I just use token() for this?

This way I could split up our entire keyspace into say 1024 chunks, and
then have one activemq task work with range 0, then range 1, etc… that way
I can easily just map() my whole table.

and since it’s token() I should (generally) read a contiguous range from a
given table.
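
The chunking idea above could be sketched roughly as follows. This is a minimal illustration, not a tested production recipe: it assumes the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), and the helper names `split_token_ring` and `chunk_query`, as well as the table and key names in the usage note, are hypothetical.

```python
# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(num_chunks):
    """Return num_chunks inclusive (start, end) token ranges covering the ring."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    for i in range(num_chunks):
        # Integer arithmetic keeps the ranges contiguous and non-overlapping.
        start = MIN_TOKEN + (total * i) // num_chunks
        end = MIN_TOKEN + (total * (i + 1)) // num_chunks - 1
        ranges.append((start, end))
    return ranges


def chunk_query(table, partition_key, start, end):
    """Build the CQL text that reads one chunk (both bounds inclusive)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {start} "
        f"AND token({partition_key}) <= {end}"
    )
```

Each worker task would then take one index into `split_token_ring(1024)` and run the matching `chunk_query("my_keyspace.my_table", "id", start, end)`. Note that a chunk is contiguous on the token ring, but depending on the cluster layout it may still span more than one node.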

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: paging through an entire table in chunks?

Posted by Brice Dutheil <br...@gmail.com>.
You could use the async feature
<http://www.datastax.com/documentation/developer/java-driver/1.0/java-driver/asynchronous_t.html>
of the Java driver. To manage the complexity of issuing several queries,
I used RxJava; it improves readability and handles asynchronicity in a
very elegant way (much more so than Futures). You may need to write some
code to bridge Rx and the Java driver, but it’s worth it.

— Brice

On Sun, Sep 28, 2014 at 12:57 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> Agreed… but I’d like to parallelize it… Eventually I’ll just have too much
> data to do it on one server… plus, I need suspend/resume and this way if
> I’m doing like 10MB at a time I’ll be able to suspend / resume as well as
> track progress.
>
> On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Use the java driver and paging feature:
>> http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)
>>
>> 1) Run your "SELECT * FROM" query without any WHERE clause
>> 2) Set fetchSize to a sensible value
>> 3) Execute the query and get an iterator from the ResultSet
>> 4) Iterate
>>
>>
>>
>> On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton <bu...@spinn3r.com>
>> wrote:
>>
>>> I need a way to do a full table scan across all of our data.
>>>
>>> Can’t I just use token() for this?
>>>
>>> This way I could split up our entire keyspace into say 1024 chunks, and
>>> then have one activemq task work with range 0, then range 1, etc… that way
>>> I can easily just map() my whole table.
>>>
>>> and since it’s token() I should (generally) read a contiguous range from
>>> a given table.

Re: paging through an entire table in chunks?

Posted by Kevin Burton <bu...@spinn3r.com>.
Agreed… but I’d like to parallelize it… Eventually I’ll just have too much
data to do it on one server… plus, I need suspend/resume and this way if
I’m doing like 10MB at a time I’ll be able to suspend / resume as well as
track progress.
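
One way the suspend/resume and progress tracking could work is to record which chunks have finished in a small checkpoint file. The sketch below is only an assumption about how that might look: the `ChunkProgress` class and the JSON file layout are hypothetical, and a real deployment with several workers would more likely keep this state in a shared store.

```python
import json
import os


class ChunkProgress:
    """Tracks completed chunk indices in a JSON file so a scan can resume."""

    def __init__(self, path, num_chunks):
        self.path = path
        self.num_chunks = num_chunks
        self.done = set()
        # Reload earlier progress if a checkpoint file already exists.
        if os.path.exists(path):
            with open(path) as f:
                self.done = set(json.load(f)["done"])

    def pending(self):
        """Chunk indices that still need to be processed, in order."""
        return [i for i in range(self.num_chunks) if i not in self.done]

    def mark_done(self, chunk_index):
        """Record a finished chunk and persist the checkpoint atomically."""
        self.done.add(chunk_index)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"done": sorted(self.done)}, f)
        os.replace(tmp, self.path)  # atomic rename over the old checkpoint

    def fraction_done(self):
        """Share of chunks completed so far, between 0.0 and 1.0."""
        return len(self.done) / self.num_chunks
```

After a restart, creating `ChunkProgress` with the same path picks up where the previous run stopped, and `pending()` lists only the chunks that still need a scan.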

On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan <do...@gmail.com> wrote:

> Use the java driver and paging feature:
> http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)
>
> 1) Run your "SELECT * FROM" query without any WHERE clause
> 2) Set fetchSize to a sensible value
> 3) Execute the query and get an iterator from the ResultSet
> 4) Iterate
>
>
>
> On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton <bu...@spinn3r.com> wrote:
>
>> I need a way to do a full table scan across all of our data.
>>
>> Can’t I just use token() for this?
>>
>> This way I could split up our entire keyspace into say 1024 chunks, and
>> then have one activemq task work with range 0, then range 1, etc… that way
>> I can easily just map() my whole table.
>>
>> and since it’s token() I should (generally) read a contiguous range from
>> a given table.



Re: paging through an entire table in chunks?

Posted by DuyHai Doan <do...@gmail.com>.
Use the java driver and paging feature:
http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)

1) Run your "SELECT * FROM" query without any WHERE clause
2) Set fetchSize to a sensible value
3) Execute the query and get an iterator from the ResultSet
4) Iterate



On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> I need a way to do a full table scan across all of our data.
>
> Can’t I just use token() for this?
>
> This way I could split up our entire keyspace into say 1024 chunks, and
> then have one activemq task work with range 0, then range 1, etc… that way
> I can easily just map() my whole table.
>
> and since it’s token() I should (generally) read a contiguous range from a
> given table.