Posted to user@cassandra.apache.org by Wei Zhu <wz...@yahoo.com> on 2012/12/10 22:07:47 UTC

multiget_slice SlicePredicate

I know it's probably not a good idea to use multiget, but for my use case it's the only choice.

I have a question regarding the SlicePredicate argument of multiget_slice.

The SlicePredicate takes a slice_range, which takes a start, an end (finish) and a range (count). I suppose start and end apply to each individual row. How about range: is it a cumulative column count across all the rows, or does it apply to each individual row?
If I set range to 100, is it 100 columns per row, or 100 in total?

Thanks for your reply,
-Wei


multiget_slice
	* map<string,list<ColumnOrSuperColumn>> multiget_slice(list<binary> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)

Re: multiget_slice SlicePredicate

Posted by Jason Wee <pe...@gmail.com>.

If you have something like 10k rows and get 100 columns per row, this is
going to choke the cluster... been there. If you really still have to use
multiget_slice, try slicing your data into smaller pieces before calling
multiget_slice, and check whether your cluster's pending read requests
increase. Try to slow down the client sending requests to the cluster if
the pending count is going up. :)
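
A minimal sketch of the batching and throttling idea above, in Python. Everything here is illustrative: `fetch` stands in for whatever client call performs the multiget for one batch of keys, and `pending` is a hypothetical probe returning the number of pending read requests (for example, a value taken from `nodetool tpstats`). Neither is part of Hector or the Thrift API.

```python
import time


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


def multiget_in_batches(fetch, keys, batch_size=50, pending=None,
                        max_pending=10, backoff_s=0.5):
    """Call `fetch` once per batch of keys and merge the results.

    `fetch(batch)` must return a dict mapping key -> columns.
    `pending()` is an optional, hypothetical probe for pending reads;
    while it reports more than `max_pending`, the client waits.
    """
    results = {}
    for batch in chunked(list(keys), batch_size):
        # Slow the client down while the cluster reports a backlog.
        while pending is not None and pending() > max_pending:
            time.sleep(backoff_s)
        results.update(fetch(batch))
    return results


def fake_fetch(batch):
    """Stand-in for a real multiget call; returns empty column lists."""
    return {key: [] for key in batch}


all_keys = ["row%d" % i for i in range(120)]
merged = multiget_in_batches(fake_fetch, all_keys, batch_size=50)
print(len(merged))
```

The batch size and the pending threshold are guesses to be tuned against the real cluster. Note that the wait loop has no upper bound, so a production version would probably want a timeout.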


On Tue, Dec 11, 2012 at 6:15 AM, Wei Zhu <wz...@yahoo.com> wrote:

> Well, I'm not sure how parallel multiget is. Someone said the requests
> are sent to the different nodes in parallel, and on each node they are
> executed sequentially. I haven't looked into the source code yet.
> Does anyone know for sure?
>
> I am using Hector; I just copied the Thrift definition from the
> Cassandra site for reference.
>
> You are right, the count is for each individual row.
>
> Thanks.
> -Wei

Re: multiget_slice SlicePredicate

Posted by aaron morton <aa...@thelastpickle.com>.

I tend to caution against making very large batch mutations or multigets, by which I mean hundreds of rows at a time.

Each row request becomes a task, and together they can temporarily fill the mutation or read thread pool. That means overall *client* request throughput drops while a big request is chewed through.

This is more of an issue with smaller clusters. As Dean says, the client request is performed in parallel on multiple machines.
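
A toy model of the thread-pool effect described above, in Python. It assumes a single node, a FIFO queue, a fixed number of read threads, and tasks that all take the same time. None of that is taken from Cassandra's source, and the pool size of 32 is only an assumed value. The function `rounds_before_start` is hypothetical.

```python
def rounds_before_start(tasks_ahead, pool_size):
    """Service rounds a newly queued task waits before it starts.

    Assumes a FIFO queue, `pool_size` worker threads, and tasks that
    each take exactly one round. This is a simplification, not a model
    of Cassandra's actual read stage.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return tasks_ahead // pool_size


pool_size = 32  # assumed number of read threads on one node
for rows_in_multiget in (10, 100, 1000):
    # Worst case: every row of the multiget lands on the same node,
    # and another client's single-row read arrives just behind it.
    waited = rounds_before_start(rows_in_multiget, pool_size)
    print(rows_in_multiget, waited)
```

In this model a 10-row multiget does not delay the next request at all, while a 1000-row multiget on one node makes it wait 31 rounds. With more nodes the rows spread out and the per-node backlog shrinks, which matches the remark about smaller clusters.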

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/12/2012, at 3:03 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

> Each node is doing its thing in parallel. They deliberately do NOT coordinate, since they do not need to, so each one does its scan individually on the rows it has.
> 
> If all rows "happen" to be on the same server, sure, some may be done sequentially, depending on the number of rows vs. the thread pool size.
> 
> As far as a single row is concerned, I know mutations to a single row are serialised, as Aaron has said as much, but you are talking about multiple rows here.
> 
> Later,
> Dean

Re: multiget_slice SlicePredicate

Posted by "Hiller, Dean" <De...@nrel.gov>.

Each node is doing its thing in parallel. They deliberately do NOT coordinate, since they do not need to, so each one does its scan individually on the rows it has.

If all rows "happen" to be on the same server, sure, some may be done sequentially, depending on the number of rows vs. the thread pool size.
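
To make the "parallel across nodes, possibly sequential within a node" picture concrete, here is a small Python sketch that groups the keys of one multiget by owning node. The placement rule (MD5 of the key, modulo the node count) is a deliberate simplification and not how Cassandra's partitioner and token ring assign rows. The node names and the helpers `owner` and `group_by_node` are hypothetical.

```python
import hashlib
from collections import defaultdict


def owner(key, nodes):
    """Toy placement: hash the key and pick a node by modulo.

    Real clusters map tokens to ring ranges and replicas; this only
    illustrates that different keys end up on different machines.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]


def group_by_node(keys, nodes):
    """Return a dict mapping node -> list of keys it would serve."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner(key, nodes)].append(key)
    return dict(groups)


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
keys = ["user:%d" % i for i in range(12)]
for node, owned in sorted(group_by_node(keys, nodes).items()):
    # Each group can be read independently of the others; within a
    # group, concurrency is limited by that node's read thread pool.
    print(node, len(owned), owned)
```

If every key happens to hash to the same node, the whole multiget becomes one node's problem, which is the case where reads may end up queued behind each other.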

As far as a single row is concerned, I know mutations to a single row are serialised, as Aaron has said as much, but you are talking about multiple rows here.

Later,
Dean

From: Wei Zhu <wz...@yahoo.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Wei Zhu <wz...@yahoo.com>
Date: Monday, December 10, 2012 3:15 PM
To: Cassandr usergroup <us...@cassandra.apache.org>
Subject: Re: multiget_slice SlicePredicate

Well, I'm not sure how parallel multiget is. Someone said the requests are sent to the different nodes in parallel, and on each node they are executed sequentially. I haven't looked into the source code yet. Does anyone know for sure?

I am using Hector; I just copied the Thrift definition from the Cassandra site for reference.

You are right, the count is for each individual row.

Thanks.
-Wei


Re: multiget_slice SlicePredicate

Posted by Wei Zhu <wz...@yahoo.com>.

Well, I'm not sure how parallel multiget is. Someone said the requests are sent to the different nodes in parallel, and on each node they are executed sequentially. I haven't looked into the source code yet. Does anyone know for sure?

I am using Hector; I just copied the Thrift definition from the Cassandra site for reference.

You are right, the count is for each individual row.

Thanks.
-Wei 


________________________________
From: "Hiller, Dean" <De...@nrel.gov>
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>; Wei Zhu <wz...@yahoo.com>
Sent: Monday, December 10, 2012 1:13 PM
Subject: Re: multiget_slice SlicePredicate
 
What's wrong with multiget? Parallel performance is great across multiple disks, so usually that is a good thing.

Also, something looks wrong: since you have list<binary> keys, I would expect the Map to be Map<binary, list<ColumnOrSuperColumn>>.

Are you sure you have that correct? If you set range to 100, it should be 100 columns for each row, but it never hurts to run the code and verify.

Later,
Dean
PlayOrm Developer



Re: multiget_slice SlicePredicate

Posted by "Hiller, Dean" <De...@nrel.gov>.

What's wrong with multiget? Parallel performance is great across multiple disks, so usually that is a good thing.

Also, something looks wrong: since you have list<binary> keys, I would expect the Map to be Map<binary, list<ColumnOrSuperColumn>>.

Are you sure you have that correct? If you set range to 100, it should be 100 columns for each row, but it never hurts to run the code and verify.
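
The per-row reading of the count can be illustrated with a small in-memory model in Python. This is not the Thrift or Hector client, only a sketch of the semantics described in this thread: the same start, finish and count are applied to each requested row separately. The function names and the sample data are made up for the example.

```python
from bisect import bisect_left, bisect_right


def slice_row(columns, start, finish, count):
    """Return up to `count` (name, value) pairs with start <= name <= finish.

    `columns` is a list of (name, value) pairs sorted by name. An empty
    string for `start` or `finish` means unbounded on that side.
    """
    names = [name for name, _ in columns]
    lo = bisect_left(names, start) if start else 0
    hi = bisect_right(names, finish) if finish else len(names)
    return columns[lo:hi][:count]


def multiget_slice(rows, keys, start="", finish="", count=100):
    """Toy model: apply the same slice, with the same count, to every row."""
    return {key: slice_row(rows.get(key, []), start, finish, count)
            for key in keys}


# Three rows with 250 columns each (hypothetical data).
rows = {
    key: [("c%04d" % i, i) for i in range(250)]
    for key in ("row1", "row2", "row3")
}
result = multiget_slice(rows, ["row1", "row2", "row3"], count=100)
per_row = {key: len(cols) for key, cols in result.items()}
total = sum(per_row.values())
print(per_row, total)
```

Under this model each of the three rows yields 100 columns, so the response holds 300 columns in total. Running the equivalent request against a real cluster is still the way to confirm the behaviour.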

Later,
Dean
PlayOrm Developer

