Posted to dev@hbase.apache.org by Boris Aleksandrovsky <ba...@gmail.com> on 2010/01/14 22:39:33 UTC

optimizing random reads

I have a moderately large HBase table of about 1M rows distributed across 4
region servers. I also need to retrieve 1,000 rows from that table (one can
assume the keys are randomly distributed) at the same time. Ideally I would
like a facility to batch-read all 1,000 rows at once, but I do not think
HBase has one. The only way I have found is to retrieve one row at a time,
sequentially, using the Get API. Is there a way to improve on this?

One way I can think of is to create a pool of HTable objects and issue
concurrent requests to HBase. This would probably help, but I do not expect
performance to increase X times, where X is the number of threads in my
application.
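The pool-of-workers idea can be sketched with a standard ExecutorService.
HTable instances are not thread-safe, which is why each worker needs its own;
in this sketch the fetchRow helper is a hypothetical stand-in for a per-thread
HTable.get(new Get(key)) call, so the code shows only the concurrency pattern,
not the real HBase API:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelGets {
    // Hypothetical stand-in for a per-thread HTable.get(new Get(key)).
    static String fetchRow(String key) {
        return "value-for-" + key;
    }

    // Submit one lookup per key to a fixed-size pool, then collect results.
    public static Map<String, String> fetchAll(List<String> keys, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String key : keys) {
                pending.put(key, pool.submit(() -> fetchRow(key)));
            }
            Map<String, String> rows = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                rows.put(e.getKey(), e.getValue().get()); // blocks until that row arrives
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = Arrays.asList("row1", "row2", "row3");
        System.out.println(fetchAll(keys, 2));
        // prints {row1=value-for-row1, row2=value-for-row2, row3=value-for-row3}
    }
}
```

As the original post suspects, the speedup is sublinear in the thread count:
the threads still contend for the same region servers and disks.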

Is there a clever way to batch requests, so that at least multiple row reads
can be issued to the same region server at the same time? Any other tricks
or suggestions would be appreciated.
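The client side of such batching would partition the requested keys by the
server hosting each key's region, so each server receives one batched request
instead of one RPC per row. A sketch of that grouping step, where locateServer
is a hypothetical stand-in for HBase's real region lookup:

```java
import java.util.*;

public class GroupByServer {
    // Hypothetical stand-in for HBase's region lookup (key -> hosting server).
    static String locateServer(String key) {
        int n = Math.abs(key.hashCode() % 4); // pretend there are 4 region servers
        return "regionserver-" + n;
    }

    // Partition the requested keys so each server receives a single batch.
    public static Map<String, List<String>> groupKeys(Collection<String> keys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String key : keys) {
            batches.computeIfAbsent(locateServer(key), s -> new ArrayList<>()).add(key);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        // One request per server instead of one per row:
        groupKeys(keys).forEach((server, batch) ->
                System.out.println(server + " -> " + batch));
    }
}
```

For 1,000 random keys on 4 region servers this collapses 1,000 RPCs into at
most 4, though each server still performs the same number of row lookups.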

-- 
Thanks,

Boris
http://twitter.com/baleksan
http://www.linkedin.com/in/baleksan

Re: optimizing random reads

Posted by Boris Aleksandrovsky <ba...@gmail.com>.
Thanks Joydeep and St.Ack, will follow your suggestions!

On Thu, Jan 14, 2010 at 2:46 PM, stack <st...@duboce.net> wrote:

> We need a multiget (HBASE-1845).
> St.Ack
>
>
>
> On Thu, Jan 14, 2010 at 1:39 PM, Boris Aleksandrovsky <baleksan@gmail.com> wrote:
>
> > I have a moderately large HBase table of about 1M rows distributed across 4
> > region servers. I also have a requirement to retrieve a 1000 rows from that
> > table (one can assume keys are randomly distributed) at the same time. I
> > would ideally like to have a facility to batch read of all 1000 rows at the
> > same time, but I do not think HBase has such a facility. The only way I was
> > able to find is to sequentially retrieve one row at a time using Get row
> > API. Is there a way to improve on it?
> >
> > One way I can think of is to create an HTable object pool and issue
> > concurrent requests to HBase. This would probably help, but I do not expect
> > for performance to increase X time, where X is the number of threads in my
> > application.
> >
> > Is there a clever way to batch requests, so at least you can issue multiple
> > row reads to the same region server at the same time? Any other tricks or
> > suggestions will be appreciated.
> >
> > --
> > Thanks,
> >
> > Boris
> > http://twitter.com/baleksan
> > http://www.linkedin.com/in/baleksan
> >
>



-- 
Thanks,

Boris
http://twitter.com/baleksan
http://www.linkedin.com/in/baleksan


Re: optimizing random reads

Posted by stack <st...@duboce.net>.
We need a multiget (HBASE-1845).
St.Ack



On Thu, Jan 14, 2010 at 1:39 PM, Boris Aleksandrovsky <ba...@gmail.com> wrote:

> I have a moderately large HBase table of about 1M rows distributed across 4
> region servers. I also have a requirement to retrieve a 1000 rows from that
> table (one can assume keys are randomly distributed) at the same time. I
> would ideally like to have a facility to batch read of all 1000 rows at the
> same time, but I do not think HBase has such a facility. The only way I was
> able to find is to sequentially retrieve one row at a time using Get row
> API. Is there a way to improve on it?
>
> One way I can think of is to create an HTable object pool and issue
> concurrent requests to HBase. This would probably help, but I do not expect
> for performance to increase X time, where X is the number of threads in my
> application.
>
> Is there a clever way to batch requests, so at least you can issue multiple
> row reads to the same region server at the same time? Any other tricks or
> suggestions will be appreciated.
>
> --
> Thanks,
>
> Boris
> http://twitter.com/baleksan
> http://www.linkedin.com/in/baleksan
>


Re: optimizing random reads

Posted by Joydeep Sarma <js...@gmail.com>.
Increasing the thread count would increase performance - subject, of course,
to the limitations of what the HBase end can do.

Sometime recently we ran a similar benchmark (4 RS, 16 disks), and random
iops over a very large dataset scaled from 85 op/s @ ~10ms up to a peak of
500 op/s @ 50ms. I don't remember the number of threads at the peak (about
30-50, I believe). For these sorts of benchmarks we have tried both
many-threads@single-machine and few-threads@multiple-machines, and there has
been some difference (in favor of the latter), but not a significant one,
particularly for random reads. So client-side multithreading issues are not
significant for this sort of test.

In our case the dataset was large enough that we were bound by the disk
reads (via DFS; iostat showed heavy disk traffic). Depending on your data
size and key locality, the speedup may be greater.

We weren't using LZO compression (that would most likely have produced
better numbers).
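As a rough sanity check, those two operating points can be related to client
concurrency via Little's law (in-flight requests = throughput x latency). The
figures below come from the benchmark numbers above; the actual thread count
at peak can be higher than the in-flight estimate, since some threads are busy
doing client-side work rather than waiting on an RPC:

```java
public class LittlesLaw {
    // Little's law: average in-flight requests L = throughput X times latency W.
    static double concurrency(double opsPerSec, double latencySec) {
        return opsPerSec * latencySec;
    }

    public static void main(String[] args) {
        // 85 op/s at ~10 ms: about one outstanding request, i.e. a serial client.
        System.out.println(concurrency(85, 0.010));   // approx. 0.85
        // 500 op/s at 50 ms: roughly 25 requests in flight across client threads.
        System.out.println(concurrency(500, 0.050));  // approx. 25
    }
}
```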

On Thu, Jan 14, 2010 at 1:39 PM, Boris Aleksandrovsky
<ba...@gmail.com> wrote:
> I have a moderately large HBase table of about 1M rows distributed across 4
> region servers. I also have a requirement to retrieve a 1000 rows from that
> table (one can assume keys are randomly distributed) at the same time. I
> would ideally like to have a facility to batch read of all 1000 rows at the
> same time, but I do not think HBase has such a facility. The only way I was
> able to find is to sequentially retrieve one row at a time using Get row
> API. Is there a way to improve on it?
>
> One way I can think of is to create an HTable object pool and issue
> concurrent requests to HBase. This would probably help, but I do not expect
> for performance to increase X time, where X is the number of threads in my
> application.
>
> Is there a clever way to batch requests, so at least you can issue multiple
> row reads to the same region server at the same time? Any other tricks or
> suggestions will be appreciated.
>
> --
> Thanks,
>
> Boris
> http://twitter.com/baleksan
> http://www.linkedin.com/in/baleksan
>