Posted to user@hbase.apache.org by Billy Pearson <sa...@pearsonwholesale.com> on 2008/10/27 07:22:43 UTC

how many rows per GB memory

Are there any numbers I can use to estimate how many rows I can fit into a 
regionserver per GB of heap?

Say, something like this:
(x bytes avg per rowkey * max rows) / index interval = y GB Heap
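
A minimal Python sketch of that back-of-the-envelope formula. The function name and the sample inputs are made up for illustration, and the formula itself is only the estimate proposed in this question, not a confirmed description of how HBase accounts for index memory. It ignores per-entry JVM object overhead, so treat the result as a lower bound at best.

```python
GIB = 1024 ** 3


def index_heap_bytes(avg_rowkey_bytes, max_rows, index_interval):
    """Estimate index memory, assuming only every index_interval-th key is kept in heap."""
    if index_interval <= 0:
        raise ValueError("index_interval must be positive")
    return avg_rowkey_bytes * max_rows / index_interval


if __name__ == "__main__":
    # Hypothetical inputs: 16-byte row keys, one billion rows, index interval of 128.
    estimate = index_heap_bytes(avg_rowkey_bytes=16, max_rows=1_000_000_000, index_interval=128)
    print(f"{estimate / GIB:.3f} GB of heap for the index keys alone")
```

A smaller index interval keeps more keys in memory, so the estimate grows in inverse proportion to it.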


Billy





Re: how many rows per GB memory

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Sorry, it's late here. I'm looking for a way to work out how much heap 
memory each row uses.

Billy


"Billy Pearson" <sa...@pearsonwholesale.com> 
wrote in message news:ge3mng$sds$1@ger.gmane.org...
> Is there some numbers I can figure on per GB of heap that I can get in to 
> a regionserver?
>
> say something like this:
> (x bytes avg per rowkey * max rows) / index interval = y GB Heap
>
>
> Billy



Re: how many rows per GB memory

Posted by stack <st...@duboce.net>.
Billy Pearson wrote:
> Say I have 8 regions each with 4 families how much memory should it 
> take to load the index files and do read request on the regions 
> assuming no write will happen.
>
If there are no writes, the memcaches will be empty, so the only memory load 
should be the indexes on the store files.

> Currently I have compression turned on for the family and they hold 
> around 900M rows total but use 1.3GB of
> memory when the region server is started and done loading regions for 
> the table.
> Is there a way to lower that memory usage?
>
> If I add up all the index files that are with the data mapfiles they 
> total only 173MB
> So do we hold only index data for reads if so why is my region server 
> using 1.3GB vs something like 173MB on start up? 

I'm not sure why it's so much.  Do you have your heap size set to 2G?  Maybe 
the churn around startup made the heap grow out to 1.3G, and now that it has 
settled, it might run in less.  You could try lowering your heap size 
if it's above the 1G default.  Also, enable GC logging.  That'll give you a 
better clue as to how much memory is actually being used.  Add something 
like this to your hbase-env.sh:

export HBASE_OPTS="-server -Xloggc:/tmp/gc.log"

St.Ack

Re: how many rows per GB memory

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I understand the memory needed to hold all the memcaches for 
writes.

Say I have 8 regions, each with 4 families. How much memory should it take to 
load the index files and serve read requests on the regions, assuming no 
writes happen?

Currently I have compression turned on for the families, and they hold around 
900M rows in total, but the region server uses 1.3GB of memory once it has 
started and finished loading the regions for the table.
Is there a way to lower that memory usage?

If I add up all the index files that sit alongside the data mapfiles, they 
total only 173MB.
So do we hold only index data for reads? If so, why is my region server using 
1.3GB rather than something like 173MB on startup?

Billy




"Jim Kellerman (POWERSET)" 
<Ji...@microsoft.com> wrote in message 
news:EAFDEC03CDA5D644878904F4A8F0158A5FE9070954@NA-EXMSG-C103.redmond.corp.microsoft.com...
The 64MB cache flush size is configurable.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)






RE: how many rows per GB memory

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
The 64MB cache flush size is configurable.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Billy Pearson
> Sent: Monday, October 27, 2008 12:07 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: how many rows per GB memory
>
> Is the 64MB changeable or is that a hard limit in the code?
> Billy


Re: how many rows per GB memory

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Is the 64MB changeable or is that a hard limit in the code?
Billy

"Jim Kellerman (POWERSET)" 
<Ji...@microsoft.com> wrote in message 
news:EAFDEC03CDA5D644878904F4A8F0158A5FE9070743@NA-EXMSG-C103.redmond.corp.microsoft.com...
Each memcache will consume approximately
 ((x bytes per key (HStoreKey = row + column + timestamp)) +
  (avg value size))
 * (number of rows)
which is limited to 64MB per memcache.

The heap size required is determined by:
 (number of regions being hosted) * (number of families) * 64MB

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)






RE: how many rows per GB memory

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
Each memcache will consume approximately
 ((x bytes per key (HStoreKey = row + column + timestamp)) +
  (avg value size))
 * (number of rows)
which is limited to 64MB per memcache.

The heap size required is determined by:
 (number of regions being hosted) * (number of families) * 64MB
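
To make the arithmetic above concrete, here is a minimal Python sketch. The function names and the sample key and value sizes are assumptions for illustration only. The 64MB figure is the per-memcache flush limit mentioned above, and the 8 regions with 4 families each is the setup described elsewhere in this thread. The result is a worst case in which every memcache is full at once.

```python
MB = 1024 * 1024
FLUSH_SIZE = 64 * MB  # per-memcache limit quoted in the thread (configurable)


def memcache_bytes(key_bytes, avg_value_bytes, rows, flush_size=FLUSH_SIZE):
    """Approximate size of one memcache: (key + value) * rows, capped at the flush size."""
    return min((key_bytes + avg_value_bytes) * rows, flush_size)


def worst_case_memcache_heap(regions, families, flush_size=FLUSH_SIZE):
    """Heap needed if every memcache (one per family per region) is full."""
    return regions * families * flush_size


if __name__ == "__main__":
    # Hypothetical 40-byte keys and 100-byte values, 1000 rows buffered.
    print(memcache_bytes(40, 100, 1000), "bytes in one lightly loaded memcache")
    # 8 regions * 4 families * 64MB
    print(worst_case_memcache_heap(8, 4) // MB, "MB worst case for all memcaches")
```

With no writes the memcaches stay empty, so this bound says nothing about the memory used by store file indexes on a read-only server.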

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Billy Pearson
> Sent: Sunday, October 26, 2008 11:23 PM
> To: hbase-user@hadoop.apache.org
> Subject: how many rows per GB memory
>
> Is there some numbers I can figure on per GB of heap that I can get in to a
> regionserver?
>
> say something like this:
> (x bytes avg per rowkey * max rows) / index interval = y GB Heap
>
>
> Billy