Posted to user@hbase.apache.org by Stack <st...@duboce.net> on 2010/01/25 19:29:30 UTC

Re: public numbers for IHBase? (was Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845)

No real numbers at the moment.  HBASE-2167 adds a
PerformanceEvaluation (PE) for IHBase (Indexed HBase).  PE is not really
the right use case for IHBase: its values are largish and random, so the
index needs a lot of RAM and writes are slowed.  Nonetheless, searching for
random values with the IHBase index can be more than two orders of
magnitude faster even in this hostile test.  For example, here are 20 scans for
20 random values on a single-node cluster with 1.5GB of memory allocated to the
RegionServer (RS) JVM:

Without an index: 732989ms at offset 0 for 1048576 rows
With an index: 2160ms at offset 0 for 1048576 rows
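Dividing the two timings gives the size of the gap.  This is a minimal sketch that assumes each reported figure is the total wall-clock time for all 20 scans; the class and method names are hypothetical.

```java
// Compares the two PerformanceEvaluation timings quoted in the message.
// Assumption: each figure is the total time for all 20 scans, not a per-scan time.
class IndexSpeedup {
    static final long WITHOUT_INDEX_MS = 732_989L;
    static final long WITH_INDEX_MS = 2_160L;
    static final int SCANS = 20;

    // Ratio of the unindexed time to the indexed time.
    static double speedup(long withoutMs, long withMs) {
        return (double) withoutMs / withMs;
    }

    public static void main(String[] args) {
        double ratio = speedup(WITHOUT_INDEX_MS, WITH_INDEX_MS);
        // log10 of the ratio is the number of orders of magnitude.
        System.out.printf("speedup: %.0fx (about 10^%.1f)%n", ratio, Math.log10(ratio));
        System.out.printf("per scan: %.1f ms without index, %.1f ms with index%n",
                (double) WITHOUT_INDEX_MS / SCANS, (double) WITH_INDEX_MS / SCANS);
    }
}
```

The ratio comes out at roughly 339, i.e. between 10^2 and 10^3.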

St.Ack

On Sun, Jan 24, 2010 at 1:17 AM, Andrew Purtell <ap...@apache.org> wrote:
> Stack, any way you might persuade the IHBase guys to post some numbers publicly?
> I'd like to know more.
>
>   - Andy
>
>
>
> ----- Original Message ----
>> From: Stack <st...@duboce.net>
>> Subject: Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845
> [...]
>> Let us know how IHBase (Indexed HBase) works out for you.  It's a RAM
>> hog, but the speed improvement in finding matching cells can be startling.

Re: public numbers for IHBase? (was Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845)

Posted by Dan Washusen <da...@reactive.org>.
Hi Sriram,
I can't really provide a recommended heap size at the moment.  For my tests
I'm using 5 nodes, each with 48GB of memory (the region servers get 8GB of
heap).  My table contains about 30 million rows with two columns that require
an index.  One column is a short integer and the other a byte array.  Both of
my indexed columns contain a lot of repetition (the byte[] can hold one of
thirty possible values, and the short ranges between 1 and 20), which means
my index memory footprint isn't very big.  The region server JVMs seem to
settle at around 6GB used (although I have increased the
hfile.block.cache.size property to 0.4).
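A rough heap budget for the setup described here follows.  It is a sketch under two assumptions: hfile.block.cache.size is the fraction of the maximum heap given to the block cache, and the cache is full when the JVM settles at about 6GB.  The split of the remainder is illustrative, not measured.

```java
// Rough heap budget: 8GB region server heap, hfile.block.cache.size = 0.4.
// Assumptions (illustrative only): the property is a fraction of max heap,
// and the block cache is full when heap usage settles at ~6GB.
class HeapBudget {
    // Heap reserved for the block cache, in GB.
    static double blockCacheGb(double maxHeapGb, double blockCacheFraction) {
        return maxHeapGb * blockCacheFraction;
    }

    public static void main(String[] args) {
        double maxHeapGb = 8.0;
        double observedUsedGb = 6.0;
        double cacheGb = blockCacheGb(maxHeapGb, 0.4);
        System.out.printf("block cache: %.1f GB%n", cacheGb);
        // Whatever is left of the observed usage (memstores, in-memory indexes, etc.).
        System.out.printf("other used heap: ~%.1f GB%n", observedUsedGb - cacheGb);
        System.out.printf("headroom: ~%.1f GB%n", maxHeapGb - observedUsedGb);
    }
}
```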

Just doing some quick sums, I don't think you are going to be able to use
IHBase in its current state with your current hardware.  Assuming your user
+ date key is something like "igorthebrave" + currentTimeMillis, then you are
looking at over 20GB (25 bytes * 1 billion rows) of memory for the keys alone.
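A quick check of the key-size arithmetic for the 1 billion rows in the question.  The key layout ("igorthebrave" followed by the millisecond timestamp as decimal text) is the hypothetical one from this reply, and 1264447770000L is an arbitrary January 2010 timestamp.  Only raw key bytes are counted, so per-object JVM overhead (array headers, references) would push real usage higher.

```java
import java.nio.charset.StandardCharsets;

// Raw key bytes only; JVM object overhead is deliberately ignored.
class KeyMemoryEstimate {
    // Hypothetical key: user name followed by the timestamp as decimal text.
    static byte[] rowKey(String user, long timestampMillis) {
        return (user + timestampMillis).getBytes(StandardCharsets.UTF_8);
    }

    static long rawKeyBytes(int keyLength, long rows) {
        return keyLength * rows; // int * long is evaluated as long
    }

    public static void main(String[] args) {
        byte[] key = rowKey("igorthebrave", 1_264_447_770_000L);
        long total = rawKeyBytes(key.length, 1_000_000_000L);
        System.out.println("key length: " + key.length + " bytes");
        System.out.printf("raw key bytes for 1 billion rows: %.1f GB (%.1f GiB)%n",
                total / 1e9, total / (double) (1L << 30));
    }
}
```

Twelve characters of user name plus a 13-digit timestamp gives 25 bytes, or 25 billion bytes for a billion keys.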

I'm sure that as the IHBase contrib matures it will get better at this kind of
use case (for example, with a disk-backed index), but at the moment you'll have
to either add considerably more resources to your region servers or try to
work with the row key alone...
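One possible reading of "work with the row key alone": if the user is the leading part of the row key, all of a user's rows are contiguous in sort order and can be found with a start/stop range instead of a secondary index.  In the HBase client this would typically be a Scan with a start row and a stop row.  The self-contained sketch below uses a TreeMap as a stand-in for the table's sorted rows.  The '|' delimiter and zero-padded 13-digit timestamp are hypothetical choices, and user names are assumed not to contain '|'.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// TreeMap stands in for a table's rows sorted by key (ASCII keys sort the same
// way as HBase's unsigned byte comparison). Key layout is hypothetical.
class RowKeyPrefixScan {
    // user + '|' + fixed-width timestamp, so one user's rows are contiguous.
    static String rowKey(String user, long timestampMillis) {
        return user + "|" + String.format("%013d", timestampMillis);
    }

    // All rows whose key starts with user + "|", in key order.
    static NavigableMap<String, String> rowsForUser(TreeMap<String, String> table, String user) {
        String start = user + "|";
        String stop = user + "}"; // '}' (125) sorts right after '|' (124) in ASCII
        return table.subMap(start, true, stop, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("igor", 1_264_447_770_000L), "a");
        table.put(rowKey("igorthebrave", 1_264_447_770_000L), "b");
        table.put(rowKey("igorthebrave", 1_264_447_780_000L), "c");
        // Only the two "igorthebrave" rows fall inside the range.
        System.out.println(rowsForUser(table, "igorthebrave").keySet());
    }
}
```

The delimiter matters: without it, a range scan for "igor" would also pick up "igorthebrave" rows.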

Cheers,
Dan


2010/1/26 Sriram Muthuswamy Chittathoor <sr...@ivycomptech.com>

> [...]

RE: public numbers for IHBase? (was Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845)

Posted by Sriram Muthuswamy Chittathoor <sr...@ivycomptech.com>.
What I am finding is that it really hogs memory: when I was trying to insert rows, the region server kept crashing on me.  For example, I was able to successfully create 2 million rows with around 2 GB, but after that it kept needing more memory.

Is this everyone's experience, or am I doing something wrong?  At most, I can go up to 2.7 GB of heap on my Linux box with a 32-bit JVM.  For 1 billion rows using IHBase, how much memory do I need?

Thanks



-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, January 26, 2010 12:00 AM
To: hbase-user@hadoop.apache.org
Subject: Re: public numbers for IHBase? (was Re: Support for MultiGet / SQL In clause -- error in patch HBASE-1845)

[...]

This email is sent for and on behalf of Ivy Comptech Private Limited. Ivy Comptech Private Limited is a limited liability company.  

This email and any attachments are confidential, and may be legally privileged and protected by copyright. If you are not the intended recipient dissemination or copying of this email is prohibited. If you have received this in error, please notify the sender by replying by email and then delete the email completely from your system. 
Any views or opinions are solely those of the sender.  This communication is not intended to form a binding contract on behalf of Ivy Comptech Private Limited unless expressly indicated to the contrary and properly authorised. Any actions taken on the basis of this email are at the recipient's own risk.

Registered office:
Ivy Comptech Private Limited, Cyber Spazio, Road No. 2, Banjara Hills, Hyderabad 500 033, Andhra Pradesh, India. Registered number: 37994. Registered in India. A list of members' names is available for inspection at the registered office.