Posted to user@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2013/01/17 22:15:25 UTC

Hbase heap size

Hi,

I was wondering how much memory folks typically give to HBase and how much
they leave for the file system cache on the region server. I am using HBase
0.94 and running only the region server and data node daemons. I have a
system with 15G of RAM.

Thanks

Re: Hbase heap size

Posted by lars hofhansl <la...@apache.org>.
That is true.
Mind telling us more about your setup? I think that would be interesting knowledge.

-- Lars



________________________________
 From: Adrien Mogenet <ad...@gmail.com>
To: user@hbase.apache.org 
Sent: Friday, January 18, 2013 12:28 PM
Subject: Re: Hbase heap size
 
On Fri, Jan 18, 2013 at 3:24 AM, lars hofhansl <la...@apache.org> wrote:

> - The largest useful region size is 20G (at least that is the current
> common tribal knowledge).
>

I'm using a much larger region size (~200 GB) and it's not a real problem if
you're controlling compactions; am I right? It allows a strong reduction
in the number of regions, and thus fewer memstores, smoother flushes, etc. Of
course, as usual, it might depend on your workload, but it can perfectly fit
some needs IMHO.

-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Hbase heap size

Posted by Varun Sharma <va...@pinterest.com>.
I meant controlling compaction activity by emitting fewer but larger
HFiles.

On Fri, Jan 18, 2013 at 12:28 PM, Adrien Mogenet
<ad...@gmail.com> wrote:

> On Fri, Jan 18, 2013 at 3:24 AM, lars hofhansl <la...@apache.org> wrote:
>
> > - The largest useful region size is 20G (at least that is the current
> > common tribal knowledge).
> >
>
> I'm using a much larger region size (~200 GB) and it's not a real problem if
> you're controlling compactions; am I right? It allows a strong reduction
> in the number of regions, and thus fewer memstores, smoother flushes, etc. Of
> course, as usual, it might depend on your workload, but it can perfectly fit
> some needs IMHO.
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>

Re: Hbase heap size

Posted by Adrien Mogenet <ad...@gmail.com>.
On Fri, Jan 18, 2013 at 3:24 AM, lars hofhansl <la...@apache.org> wrote:

> - The largest useful region size is 20G (at least that is the current
> common tribal knowledge).
>

I'm using a much larger region size (~200 GB) and it's not a real problem if
you're controlling compactions; am I right? It allows a strong reduction
in the number of regions, and thus fewer memstores, smoother flushes, etc. Of
course, as usual, it might depend on your workload, but it can perfectly fit
some needs IMHO.

-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Hbase heap size

Posted by Varun Sharma <va...@pinterest.com>.
Thanks, Lars!

In my case, the amount of data on disk is a lot lower, so I can do with
fewer regions. Nevertheless, even if I set the flush size too large, the
memstore lowerLimit and upperLimit will cause flushes before we need a lot
of heap to support all the memstores. But then I will probably get flushes
before reaching the 600M limit.

I just found out that a 128M memstore gives me an 8M HFile (the file is
fast_diff encoded), which sounds tiny to me. So I felt that I should
increase the flush size, since the output files will be small anyway. This
would help reduce compaction activity. But then yes, to your above comment:
with a 600M flush size and 3G for all memstores, I can probably support
around 5-10 regions per server. Otherwise, I will hit the 3G ceiling too
soon and memstore flushes will happen long before reaching the 600M limit.
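
To put rough numbers on the above, here is a small Python sketch of the arithmetic. The helper names are made up for illustration and are not HBase APIs. It assumes one memstore per region, a 3G global memstore budget, and (a big assumption) that the 128M-to-8M shrink ratio I observed carries over to larger flushes.

```python
def max_full_memstores(global_memstore_mb, flush_size_mb):
    """How many memstores can reach the flush size before the global budget is hit."""
    return global_memstore_mb // flush_size_mb


def avg_memstore_at_limit_mb(global_memstore_mb, active_regions):
    """Average memstore size when the global budget forces flushes,
    assuming all active regions are written to evenly."""
    return global_memstore_mb / active_regions


def estimated_hfile_mb(flush_size_mb, observed_memstore_mb=128, observed_hfile_mb=8):
    """Scale the observed memstore-to-HFile ratio to another flush size (assumption)."""
    return flush_size_mb * observed_hfile_mb / observed_memstore_mb


global_mb = 3 * 1024  # 3G for all memstores
print(max_full_memstores(global_mb, 600))       # 5
print(avg_memstore_at_limit_mb(global_mb, 10))  # 307.2
print(estimated_hfile_mb(600))                  # 37.5
```

So with 10 evenly written regions the memstores would be flushed at roughly half the configured 600M, which is the early-flush effect described above.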

On Fri, Jan 18, 2013 at 4:45 AM, Chalcy Raja
<Ch...@careerbuilder.com> wrote:

> Looking forward to the blog!
>
> Thanks,
> Chalcy
>
> -----Original Message-----
> From: lars hofhansl [mailto:larsh@apache.org]
> Sent: Thursday, January 17, 2013 9:24 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase heap size
>
> You'll need more memory then, or more machines with not much disk
> attached.
>
> You can look at it this way:
> - The largest useful region size is 20G (at least that is the current
> common tribal knowledge).
> - Each region has at least one memstore (one per column family actually,
> let's just say one for the sake of argument).
>
> If you have 10T disks per region server then you need ~170 regions per
> region server (3*20G*170 ~ 10T).
> If you give the memstore 35% of your heap and have 128M memstores you
> would need 170*128M/0.35 ~ 60G of heap. That's already too large.
> If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap
> (if all memstores are being written to simultaneously).
>
> There are ways to address that.
> If you expect that not all memstores are written to at the same time, you
> can leave them smaller and increase their size multipliers, which allows
> them to be temporarily larger.
>
> Again, this is just back of the envelope.
>
> This is a lengthy topic, I'm planning a blog post around this. There are a
> bunch of parameters that can be tweaked based on workload.
>
> The main takeaway for HBase is that you have to match disk space with
> Java heap.
>
> -- Lars

RE: Hbase heap size

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Looking forward to the blog!

Thanks,
Chalcy

-----Original Message-----
From: lars hofhansl [mailto:larsh@apache.org] 
Sent: Thursday, January 17, 2013 9:24 PM
To: user@hbase.apache.org
Subject: Re: Hbase heap size

You'll need more memory then, or more machines with not much disk attached.

You can look at it this way:
- The largest useful region size is 20G (at least that is the current common tribal knowledge).
- Each region has at least one memstore (one per column family actually, let's just say one for the sake of argument).

If you have 10T disks per region server then you need ~170 regions per region server (3*20G*170 ~ 10T).
If you give the memstore 35% of your heap and have 128M memstores you would need 170*128M/0.35 ~ 60G of heap. That's already too large.
If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap (if all memstores are being written to simultaneously).

There are ways to address that.
If you expect that not all memstores are written to at the same time, you can leave them smaller and increase their size multipliers, which allows them to be temporarily larger.

Again, this is just back of the envelope.

This is a lengthy topic, I'm planning a blog post around this. There are a bunch of parameters that can be tweaked based on workload.

The main takeaway for HBase is that you have to match disk space with Java heap.

-- Lars




Re: Hbase heap size

Posted by lars hofhansl <la...@apache.org>.
You'll need more memory then, or more machines with not much disk attached.

You can look at it this way:
- The largest useful region size is 20G (at least that is the current common tribal knowledge).
- Each region has at least one memstore (one per column family actually, let's just say one for the sake of argument).

If you have 10T disks per region server then you need ~170 regions per region server (3*20G*170 ~ 10T).
If you give the memstore 35% of your heap and have 128M memstores you would need 170*128M/0.35 ~ 60G of heap. That's already too large.
If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap (if all memstores are being written to simultaneously).
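
For anyone who wants to plug in their own numbers, the estimate above can be sketched in a few lines of Python. This is only back-of-the-envelope arithmetic; the function names and parameters are invented for illustration and are not HBase APIs. It assumes one memstore per region and that all memstores fill at the same time.

```python
import math


def regions_needed(disk_gb, region_gb=20, replication=3):
    """Regions per region server needed to cover the attached raw disk."""
    return math.ceil(disk_gb / (region_gb * replication))


def heap_needed_gb(regions, memstore_mb, memstore_fraction=0.35):
    """Heap required if every region's memstore can fill up simultaneously."""
    return regions * memstore_mb / memstore_fraction / 1024


print(regions_needed(10 * 1024))               # regions for 10T of disk
print(round(heap_needed_gb(170, 128), 1))      # heap in G with 128M memstores
print(round(heap_needed_gb(170, 600), 1))      # heap in G with 600M memstores
```

With 170 regions this comes out at about 61G for 128M memstores and about 285G for 600M memstores (or about 290G if you count 1000M per G), in line with the figures above.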

There are ways to address that.
If you expect that not all memstores are written to at the same time, you can leave them smaller and increase their size multipliers, which allows them to be temporarily larger.

Again, this is just back of the envelope.

This is a lengthy topic, I'm planning a blog post around this. There are a bunch of parameters that can be tweaked based on workload.

The main takeaway for HBase is that you have to match disk space with Java heap.

-- Lars



________________________________
 From: Varun Sharma <va...@pinterest.com>
To: user@hbase.apache.org; lars hofhansl <la...@apache.org> 
Sent: Thursday, January 17, 2013 3:24 PM
Subject: Re: Hbase heap size
 
Thanks for the info. I am looking for a balance where I have a write-heavy
workload and need excellent read latency. So 40% to block cache for
caching, 35% to memstore.

But I would also like to reduce the number of HFiles and the amount of
compaction activity. So, having fewer regions and a much larger
memstore flush size, like 640M. Could a large memstore flush be a problem
in some sense? Are updates blocked on memstore flush? In my case, I would
expect a 600M memstore to materialize into a 200-300M HFile.


Re: Hbase heap size

Posted by Varun Sharma <va...@pinterest.com>.
Thanks for the info. I am looking for a balance where I have a write-heavy
workload and need excellent read latency. So 40% to block cache for
caching, 35% to memstore.

But I would also like to reduce the number of HFiles and the amount of
compaction activity. So, having fewer regions and a much larger
memstore flush size, like 640M. Could a large memstore flush be a problem
in some sense? Are updates blocked on memstore flush? In my case, I would
expect a 600M memstore to materialize into a 200-300M HFile.
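
To make the split concrete, here is a tiny Python sketch of how a 40% / 35% division carves up a heap. The 8G heap is purely a hypothetical value for a 15G machine (I have not settled on a heap size), and the function name is made up for this example.

```python
def heap_split_mb(heap_mb, block_cache_pct=40, memstore_pct=35):
    """Return (block cache, memstore, everything else) in MB for a given heap."""
    block_cache = heap_mb * block_cache_pct // 100
    memstore = heap_mb * memstore_pct // 100
    other = heap_mb - block_cache - memstore
    return block_cache, memstore, other


# Hypothetical 8G heap, leaving the rest of the 15G for the data node and OS cache
print(heap_split_mb(8 * 1024))  # (3276, 2867, 2049)
```

That leaves only a quarter of the heap for everything else the region server does, which is worth keeping in mind when picking the fractions.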

On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl <la...@apache.org> wrote:

> A good rule of thumb that I found is to give each region server a Java
> heap that is roughly 1/100th of the size of the disk space per region
> server.
> (that is assuming all the default settings: 10G regions, 128M memstores,
> 40% of heap for memstores, 20% of heap for block cache, 3-way replication)
>
>
> That is, if you give the region server a 10G heap, you can expect to be
> able to serve about 1T worth of disk space.
>
> That can be tweaked of course (increase the region size to 20G; if your
> load is mostly read-only, shrink the memstores; etc.).
> That way you can reduce that ratio to 1/200 or even less.
>
>
> I'm sure other folks will have more detailed input.
>
>
> -- Lars

Re: Hbase heap size

Posted by lars hofhansl <la...@apache.org>.
A good rule of thumb that I found is to give each region server a Java heap that is roughly 1/100th of the size of the disk space per region server.
(that is assuming all the default settings: 10G regions, 128M memstores, 40% of heap for memstores, 20% of heap for block cache, 3-way replication)


That is, if you give the region server a 10G heap, you can expect to be able to serve about 1T worth of disk space.

That can be tweaked of course (increase the region size to 20G; if your load is mostly read-only, shrink the memstores; etc.).
That way you can reduce that ratio to 1/200 or even less.
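
As a rough illustration of where the 1/100 figure comes from, here is a minimal Python sketch of the arithmetic under the settings listed above. The function and parameter names are made up for this example and are not HBase APIs; it assumes one memstore per region and that every memstore may fill up at the same time.

```python
def servable_disk_gb(heap_gb, memstore_pct=40, memstore_mb=128,
                     region_gb=10, replication=3):
    """Return (raw disk in GB, region count) one region server can back."""
    # Heap reserved for memstores, in MB (integer math to avoid rounding noise)
    memstore_budget_mb = heap_gb * 1024 * memstore_pct // 100
    # Each region needs room for one full memstore
    regions = memstore_budget_mb // memstore_mb
    # Every region occupies region_gb on disk, times the replication factor
    return regions * region_gb * replication, regions


print(servable_disk_gb(10))                # (960, 32)
print(servable_disk_gb(10, region_gb=20))  # (1920, 32)
```

A 10G heap gives 32 regions and about 1T of raw disk, and doubling the region size to 20G roughly doubles that, which is where the 1/200 ratio comes from.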


I'm sure other folks will have more detailed input.


-- Lars



________________________________
 From: Varun Sharma <va...@pinterest.com>
To: user@hbase.apache.org 
Sent: Thursday, January 17, 2013 1:15 PM
Subject: Hbase heap size
 
Hi,

I was wondering how much memory folks typically give to HBase and how much
they leave for the file system cache on the region server. I am using HBase
0.94 and running only the region server and data node daemons. I have a
system with 15G of RAM.

Thanks