Posted to user@hbase.apache.org by Jonathan Gray <jg...@facebook.com> on 2010/08/01 20:39:14 UTC

RE: Memory Consumption and Processing questions


> -----Original Message-----
> From: Jacques [mailto:whshub@gmail.com]
> Sent: Friday, July 30, 2010 1:16 PM
> To: user@hbase.apache.org
> Subject: Memory Consumption and Processing questions
> 
> Hello all,
> 
> I'm planning an hbase implementation and had some questions I was
> hoping
> someone could help with.
> 
> 1. Can someone give me a basic overview of how memory is used in HBase?
> Various places on the web state that 16-24gb is the minimum for region
> servers if they also operate as hdfs/mr nodes.  Assuming that hdfs/mr
> nodes consume ~8gb, that leaves a "minimum" of 8-16gb for hbase.  It seems
> like lots of people suggest using even 24gb+ for hbase.  Why so much?
> Is it simply to avoid gc problems?  Have data in memory for fast random
> reads? Or?

Where exactly are you reading this from?  I'm not actually aware of people using 24GB+ heaps for HBase.

I would not recommend using less than 4GB for RegionServers.  Beyond that, it very much depends on your application.  8GB is often sufficient but I've seen as much as 16GB used in production.

You need at least 4GB because of GC.  General experience has been that below that the CMS GC does not work well.

Memory is used primarily for the MemStores (write cache) and Block Cache (read cache).  In addition, memory is allocated as part of normal operations to store in-memory state and in processing reads.
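
To make that concrete, the two big fractions can be read straight from the
configuration.  This is just an illustrative sketch -- it assumes the 0.90-era
property names (hfile.block.cache.size for the block cache,
hbase.regionserver.global.memstore.upperLimit for the MemStores) and the
shipped defaults of roughly 20% and 40% of heap:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HeapBudget {
      public static void main(String[] args) {
        // Picks up hbase-default.xml / hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();

        // Fraction of the heap the block cache (reads) may use.
        float blockCache = conf.getFloat("hfile.block.cache.size", 0.2f);

        // Fraction of the heap all MemStores (writes) may use before forced flushes.
        float memstores = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);

        System.out.println("block cache fraction:     " + blockCache);
        System.out.println("global memstore fraction: " + memstores);
      }
    }

Whatever heap is left over after those two fractions covers indexes, RPC
buffers, and the per-request allocation mentioned above.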

> 2. What types of things put more/less pressure on memory?  I saw an
> insinuation that insert speed can create substantial memory pressure.
> What type of relative memory pressure do scanners, random reads, random
> writes, region quantity and compactions cause?

Writes are buffered and flushed to disk when the write buffer gets to a local or global limit.  The local limit (per region) defaults to 64MB.  The global limit is based on the total amount of heap available (default, I think, is 40%).  So there is interplay between how much heap you have and how many regions are actively written to.  If you have too many regions and not enough memory to allow them to hit the local/region limit, you end up flushing undersized files.
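
As a back-of-the-envelope illustration of that interplay (just a sketch using
the defaults above -- a 64MB per-region flush size and a ~40% global limit --
with a made-up 8GB heap):

    public class FlushMath {
      public static void main(String[] args) {
        long heapBytes = 8L * 1024 * 1024 * 1024;   // 8GB RegionServer heap (illustrative)
        double globalFraction = 0.4;                 // global MemStore limit, ~40% of heap
        long perRegionFlush = 64L * 1024 * 1024;     // per-region flush threshold, 64MB

        long globalLimit = (long) (heapBytes * globalFraction);
        long regions = globalLimit / perRegionFlush;

        // Prints ~51: beyond roughly that many actively written regions, the
        // global limit is hit before any single region reaches 64MB, so the
        // resulting flush files get smaller and smaller.
        System.out.println("active regions before undersized flushes: " + regions);
      }
    }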

Scanning/random reading will utilize the block cache, if configured to.  The more room for the block cache, the more data you can keep in-memory.  Reads from the block cache are significantly faster than non-cached reads, obviously.
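
From the client side, the knobs that interact with the block cache look
roughly like this (a sketch against the 0.90-era client API; the table and
column family names are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // hypothetical table

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("cf"));  // hypothetical column family
        scan.setCaching(100);                 // rows fetched per RPC round trip
        scan.setCacheBlocks(false);           // a one-off full scan shouldn't evict hot blocks

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process each row here
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }

Repeated random reads over the same rows are the case that benefits most from
leaving setCacheBlocks at its default (true).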

Compactions are not generally an issue.

> 3. How cpu intensive are the region servers?  It seems like most of their
> performance is based on i/o.  (I've noted the caution in starving region
> servers of cycles--which seems primarily focused on avoiding zk timeout ->
> region reassignment problems.)  Does anyone suggest or recommend against
> dedicating only one or two cores to a region server?  Do individual
> compactions benefit from multiple cores or are they single-threaded?

I would dedicate at least one core to a region server, but as we add more and more concurrency, it may become important to have two cores available.  Many things, like compactions, are only single threaded today but there's a very good chance you will be able to configure multiple threads in the next major release.

> 4. What are the memory and cpu resource demands of the master server?  It
> seems like more and more of that load is moving to zk.

Not too much.  I'm putting a change in TRUNK right now that keeps all region assignments in the master, so there is some memory usage, but not much.  I would think 2GB heap and 1-2 cores is sufficient.

> 5. General HDFS question-- when the namenode dies, what happens to the
> datanodes and how does that relate to Hbase?  E.g., can hbase continue to
> operate in a read-only mode (assuming no datanode/regionserver failures
> post namenode failure)?

Today, HBase will probably die ungracefully once it does start to hit the NN.  There are some open JIRAs about HBase behavior under different HDFS faults and trying to be as graceful as possible when they happen, including HBASE-2183 about riding over an HDFS restart.

> 
> Thanks for your help,
> Jacques

Re: Memory Consumption and Processing questions

Posted by Jacques <wh...@gmail.com>.
For others' future record, it looks like that is HBASE-57.

Thanks for the info,
Jacques

On Mon, Aug 2, 2010 at 4:44 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> > cluster restart, is there any "memory" of which region servers last
> served
> > which regions or some other method to improve data locality?
>
> Nope, not yet. The new master code for 0.90 has some basics, but it's
> a bit complicated and we're not there yet. It basically requires
> asking the Namenode for the locations of every block of every region,
> and computing what should go where.
>
> J-D
>

Re: Memory Consumption and Processing questions

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> cluster restart, is there any "memory" of which region servers last served
> which regions or some other method to improve data locality?

Nope, not yet. The new master code for 0.90 has some basics, but it's
a bit complicated and we're not there yet. It basically requires
asking the Namenode for the locations of every block of every region,
and computing what should go where.
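
The bookkeeping involved looks roughly like the sketch below -- purely
illustrative, not the actual master code; it assumes you already know the
HDFS paths of a region's store files and simply tallies which host holds the
most bytes of them:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalityProbe {
      /** Tallies how many bytes of the given store files each datanode host holds. */
      public static Map<String, Long> bytesPerHost(Configuration conf, Path... storeFiles)
          throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Map<String, Long> tally = new HashMap<String, Long>();
        for (Path p : storeFiles) {
          FileStatus status = fs.getFileStatus(p);
          for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            for (String host : block.getHosts()) {
              Long prev = tally.get(host);
              tally.put(host, (prev == null ? 0L : prev) + block.getLength());
            }
          }
        }
        // The regionserver co-located with the biggest tally is the best home
        // for the region, locality-wise.
        return tally;
      }
    }

Doing that for every region of every table on every restart is where the "a
bit complicated" part comes in.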

J-D

Re: Memory Consumption and Processing questions

Posted by Jacques <wh...@gmail.com>.
Wow, with that in mind, it seems like the block cache is way more important
than I originally thought (versus the os cache).  It also precludes (or
reduces) effective use of things like L2ARC SSDs on OpenSolaris.  Thanks for
pointing that out.

Your mention of locality reminds me of a question that came up after reading
Lars George's excellent writeup here:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

Upon
cluster restart, is there any "memory" of which region servers last served
which regions or some other method to improve data locality?

I know I could get this answer by reviewing the code, but I just haven't
gotten to that level of detail yet.

thanks


Re: Memory Consumption and Processing questions

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Something to keep in mind is that the block cache is within the region
server's JVM, whereas it has to go on the network to get data from the
DNs (which should always be slower even if it's in the OS cache). But,
on a production system, regions don't move that much, so the local DN
should always contain the blocks for its RS's regions. If
https://issues.apache.org/jira/browse/HDFS-347 were there, block
caching could be almost useless if the OS is given a lot of room, and
there would be no need for IB and whatnot.
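
A crude way to see the difference from the client side (just a sketch -- the
table and row key are hypothetical, and the first read may already be warm if
something else has touched those blocks):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CachedReadTiming {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // hypothetical table

        Get get = new Get(Bytes.toBytes("somerow"));   // hypothetical row key

        long t0 = System.nanoTime();
        table.get(get);   // cold: blocks come from the DN (or its OS cache) over the wire
        long cold = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        table.get(get);   // warm: the same blocks are now in the RS block cache
        long warm = System.nanoTime() - t1;

        System.out.println("cold read: " + (cold / 1000) + "us, warm read: " + (warm / 1000) + "us");
        table.close();
      }
    }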

J-D


Re: Memory Consumption and Processing questions

Posted by Jacques <wh...@gmail.com>.
Makes me wonder if high-speed interconnects and little to no block cache
would work better--basically rely on each machine to hold the highly used
blocks in os cache and push them around quickly if they are needed
elsewhere.  Of course it's all just a thought experiment at this point.  The
cost of having high-speed interconnects would probably be substantially more
than provisioning extra memory to hold cached blocks twice.  There is also
the thought that if the blocks are cached by HBase, they would appear rarely
used from the os standpoint and are, therefore, unlikely to be in cache.





Re: Memory Consumption and Processing questions

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 2, 2010 at 11:33 AM, Jacques <wh...@gmail.com> wrote:
> You're right, of course.  I shouldn't generalize too much.  I'm more trying
> to understand the landscape than pinpoint anything specific.
>
> Quick question: since the block cache is unaware of the location of files,
> wouldn't it overlap the os cache for hfiles once they are localized after
> compaction?  Any guidance on how to tune the two?
>
> thanks,
> Jacques
>

Interesting question. The problem is that Java is unaware of what is
in the VFS cache, so theoretically you could end up with the same data
in the BlockCache and in the VFS cache. Committing the memory to the
JVM will give less to the system, and as a result the system will have
less memory left for the VFS cache.

Re: Memory Consumption and Processing questions

Posted by Jacques <wh...@gmail.com>.
You're right, of course.  I shouldn't generalize too much.  I'm more trying
to understand the landscape than pinpoint anything specific.

Quick question: since the block cache is unaware of the location of files,
wouldn't it overlap the os cache for hfiles once they are localized after
compaction?  Any guidance on how to tune the two?

thanks,
Jacques


RE: Memory Consumption and Processing questions

Posted by Jonathan Gray <jg...@facebook.com>.
One reason not to extrapolate that is that leaving lots of memory for the linux buffer cache is a good way to improve overall performance of typically i/o bound applications like Hadoop and HBase.

Also, I'm unsure that "most people use ~8 for hdfs/mr".  DataNodes generally require no significant memory (though they are typically run with a 1GB heap); their performance will improve with more free memory for the os buffer cache.  As for MR, this completely depends on the tasks running.  The TaskTrackers also don't require significant memory, so this completely depends on the number of tasks per node and the memory requirements of the tasks.
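
For the MR side, the per-node worst case is basically slots times child heap.
A back-of-the-envelope sketch (the slot counts and heap size are made-up
numbers; the real values come from mapred.tasktracker.map.tasks.maximum,
mapred.tasktracker.reduce.tasks.maximum and the -Xmx in
mapred.child.java.opts):

    public class MrMemoryBudget {
      public static void main(String[] args) {
        int mapSlots = 4;          // mapred.tasktracker.map.tasks.maximum (illustrative)
        int reduceSlots = 2;       // mapred.tasktracker.reduce.tasks.maximum (illustrative)
        long childHeapMb = 512;    // e.g. -Xmx512m in mapred.child.java.opts

        long worstCaseMb = (mapSlots + reduceSlots) * childHeapMb;
        System.out.println("peak MR child heap on this node: " + worstCaseMb + "MB");
      }
    }

That number, plus the DataNode and RegionServer heaps, is what has to fit
alongside whatever you want to leave for the os buffer cache.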

Unfortunately you can't always generalize the requirements too much, especially in MR.

JG


Re: Memory Consumption and Processing questions

Posted by Jacques <wh...@gmail.com>.
Thanks, that was very helpful.

Regarding 24gb-- I saw people using servers with 32gb of server memory (a
recent thread here and hstack.org).  I extrapolated the use since it seems
most people use ~8 for hdfs/mr.

-Jacques

