Posted to dev@hbase.apache.org by Himanshu Vashishtha <hv...@cs.ualberta.ca> on 2011/05/27 03:40:17 UTC

Coprocessor experiments

I did some experiments using coprocessors and compared the results with a
vanilla scan, and in one case with MapReduce. I wrote up a blog post about
these experiments, as it was getting a bit difficult for me to explain them
over email without figures etc. Please refer to
http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html

The results seem to suggest that coprocessor endpoints are a useful feature
when one needs to access a large number of rows (I can't quantify it as of
now) and generate sparse results. The main advantage is that the processing
is done in parallel (at region-level granularity), and it can be extended to
provide parallel scanner functionality.
Interestingly, the single-result coprocessor endpoint (aka the existing one)
fails when I increase the table size. I tried to do a row count on 100M
rows. I need to dig more into it, but I have mentioned my initial thoughts
in the blog.
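To illustrate the fan-out idea behind endpoints: the client sends a small
computation (here, a row count) to every region in parallel and merges the
per-region partial results, instead of streaming all rows back through a
single scanner. Below is a toy sketch of that pattern, not the actual HBase
endpoint API; the "regions" are just in-memory arrays and the thread pool
stands in for the per-region RPCs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRegionCount {

    // Count the rows of one region slice; stands in for the
    // server-side endpoint implementation running inside a region.
    static long countRegion(long[] regionRows) {
        long count = 0;
        for (long row : regionRows) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Fake table: 4 regions holding 25, 30, 20, and 25 rows.
        long[][] regions = {new long[25], new long[30], new long[20], new long[25]};

        // Fan the computation out to every "region" in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(regions.length);
        List<Future<Long>> partials = new ArrayList<>();
        for (long[] region : regions) {
            partials.add(pool.submit(() -> countRegion(region)));
        }

        // Client-side merge of the per-region partial counts.
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();
        }
        pool.shutdown();
        System.out.println(total); // 100
    }
}
```

The merge step is why endpoints fit sparse results best: each region ships
back one long, not its rows.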

I want to test them more rigorously and would really appreciate your
feedback on the experiments. I have been at it for a while now, so I need a
new pair of eyes to do some review.

Thanks a lot for your time.

Cheers,
Himanshu

Re: Coprocessor experiments

Posted by Lars George <la...@gmail.com>.
Awesome, Himanshu!

I was also trying to test using CPs to see where the sweet spot is between
the number of threads processing in parallel and overloading the servers,
since you potentially send a heavy, resource-bound task to already taxed
servers and therefore take a huge hit everywhere. I was thinking of running
YCSB in parallel with mainly reads and then comparing the impact of doing a
1) linear, 2) MR-based, and 3) CP-based full table scan.
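A toy harness for that comparison (an illustrative sketch only, nothing like
a real cluster benchmark): time a linear scan against a per-region parallel,
CP-style scan of the same in-memory data, and check that both strategies
return the same aggregate. The table size, region count, and the sum
aggregate are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class ScanComparison {

    public static void main(String[] args) throws Exception {
        long[] table = LongStream.rangeClosed(1, 1_000_000).toArray();
        int regionCount = 8;

        // 1) Linear scan: one client walks every row.
        long t0 = System.nanoTime();
        long linearSum = 0;
        for (long v : table) linearSum += v;
        long linearMs = (System.nanoTime() - t0) / 1_000_000;

        // 3) CP-style scan: each "region" aggregates its own slice in
        // parallel and the client merges the partial sums.
        t0 = System.nanoTime();
        ExecutorService pool = Executors.newFixedThreadPool(regionCount);
        List<Future<Long>> parts = new ArrayList<>();
        int sliceLen = table.length / regionCount;
        for (int r = 0; r < regionCount; r++) {
            final int start = r * sliceLen;
            final int end = (r == regionCount - 1) ? table.length : start + sliceLen;
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = start; i < end; i++) s += table[i];
                return s;
            }));
        }
        long parallelSum = 0;
        for (Future<Long> p : parts) parallelSum += p.get();
        long parallelMs = (System.nanoTime() - t0) / 1_000_000;
        pool.shutdown();

        // Timings go to stderr; they vary from run to run.
        System.err.println("linear: " + linearMs + " ms, parallel: " + parallelMs + " ms");
        System.out.println("results match: " + (linearSum == parallelSum));
    }
}
```

Measuring this against a loaded cluster (e.g. with YCSB reads running
alongside, as described above) is the interesting part that a local sketch
can't capture.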

Lars
