Posted to user@cassandra.apache.org by Ken Matsumoto <ke...@nri.com> on 2010/07/29 03:51:24 UTC

any better way to retrieve data than using get_range_slices

Hi all,

Is there a better way to retrieve data from Cassandra than using 
get_range_slices?

I'm now porting some programs from MySQL to Cassandra. The programs run 
queries like the one below:
"select * from Table_A where date > 1/1/2008 and date < 12/31/2009 and 
locationID = 1"
A single query like this returns over 1M records.

In Cassandra, a single get_range_slices call can return only 600 rows on 
our hardware.
We have to call get_range_slices many times in sequence, so the total 
time grows linearly with the number of rows and becomes very long.
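
For readers unfamiliar with the iteration described above, here is a minimal sketch of the usual paging pattern, assuming the start key of a range query is inclusive. It does not talk to Cassandra: fetch_page is a hypothetical stand-in for one get_range_slices call, and ROWS, SORTED_KEYS, and iter_all_rows are illustrative names, not part of any client library. The page size of 600 is the figure quoted in this message.

```python
from bisect import bisect_left

# In-memory stand-in for a column family (illustrative data only).
# Keys are kept sorted so that "the next page" is well defined.
ROWS = {f"row{i:05d}": {"locationID": i % 3} for i in range(2500)}
SORTED_KEYS = sorted(ROWS)


def fetch_page(start_key, count):
    """Hypothetical stand-in for a single get_range_slices call.

    Returns up to `count` (key, columns) pairs whose key is >= start_key.
    """
    begin = bisect_left(SORTED_KEYS, start_key)
    return [(k, ROWS[k]) for k in SORTED_KEYS[begin:begin + count]]


def iter_all_rows(page_size=600):
    """Yield every row by chaining pages, one call after another."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = fetch_page(start_key, page_size)
        if not first_page:
            # The start key is inclusive, so the first row of every later
            # page is the last row already yielded. Drop it.
            page = page[1:]
        if not page:
            return
        yield from page
        start_key = page[-1][0]
        first_page = False


# Client-side filtering, analogous to "locationID = 1" in the SQL query.
matches = [key for key, cols in iter_all_rows() if cols["locationID"] == 1]
```

Each page depends on the last key of the previous one, so the calls cannot overlap; that is why the total time grows linearly.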

Is Cassandra unsuitable for this kind of usage?

Best regards,

Ken.

-- 
Ken Matsumoto
VP / Research & Development
Nomura Research Institute America, Inc.
NRI Pacific
1400 Fashion Island Blvd., Suite 1010
San Mateo, CA 94404, U.S.A.

PLEASE READ：This e-mail is confidential and intended for the named 
recipient only. If you are not an intended recipient, please notify the 
sender and delete this e-mail.

Re: any better way to retrieve data than using get_range_slices

Posted by Ken Matsumoto <ke...@nri.com>.

Thank you, Aaron.

Yes, we're now thinking that Hadoop would be one of our options, too.
So far, it doesn't matter whether we use "SQL" or not, as long as Cassandra
can process millions of rows at a time in a practical amount of time.

Given that, for what kinds of access patterns is Cassandra more powerful 
than MySQL?

Best,

Ken.

On 2010/07/28 19:20, Aaron Morton wrote:
> If you want to process millions of rows at a time, take a look at the
> Hadoop and Pig integration. Try the Cloudera distribution of Hadoop,
> CDH3; it includes Pig.
>
> Pig is a "SQL"-like language for large-scale data analysis that
> compiles down to Java MapReduce jobs that run on Hadoop.
> http://hadoop.apache.org/pig/
>
> There are examples in the contrib directory in the source and some
> information in the wiki.
>
> I'd be interested to know how you get on, as hopefully I'll get to play
> with it soon.
> Aaron
>



Re: any better way to retrieve data than using get_range_slices

Posted by Aaron Morton <aa...@thelastpickle.com>.

If you want to process millions of rows at a time, take a look at the Hadoop and Pig integration. Try the Cloudera distribution of Hadoop, CDH3; it includes Pig.

Pig is a "SQL"-like language for large-scale data analysis that compiles down to Java MapReduce jobs that run on Hadoop.
http://hadoop.apache.org/pig/

There are examples in the contrib directory in the source and some information in the wiki.

I'd be interested to know how you get on, as hopefully I'll get to play with it soon.
Aaron
