Posted to user@hbase.apache.org by Fernando Padilla <fe...@alum.mit.edu> on 2009/07/14 23:58:13 UTC

hbase/zookeeper

So.. what's the recommendation for zookeeper?

should I run zookeeper nodes on the same region servers?
should I run zookeeper nodes external to the region servers?
how much memory should I give zookeeper, if it's just used for hbase?

Re: hbase/zookeeper

Posted by Jonathan Gray <jl...@streamy.com>.
The DataNode is not especially memory-intensive (you want to leave 
memory to the OS so that it can do fs caching).  ZK is recommended to 
have 1GB, so I'd move the .5 from DN to ZK.

Otherwise that looks reasonable.
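
For reference, a minimal sketch of where those numbers get set, assuming a
stock 0.20-era layout with HBase managing the ZooKeeper quorum (variable
names are from the shipped hadoop-env.sh / hbase-env.sh; check yours before
relying on them):

# conf/hadoop-env.sh -- heap, in MB, for each Hadoop daemon on the node
export HADOOP_HEAPSIZE=1000           # the 1 GB DataNode heap

# conf/hbase-env.sh -- HBASE_HEAPSIZE sizes every JVM HBase launches, in MB
export HBASE_HEAPSIZE=2000            # the 2 GB RegionServer heap
# Per-daemon overrides exist in later hbase-env.sh revisions; if yours
# lacks them, run ZK standalone so it can be sized independently:
export HBASE_ZOOKEEPER_OPTS="-Xmx1g"  # the 1 GB ZK peer heap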

Fernando Padilla wrote:
> thank you!
> 
> I'll pay attention to the CPU load then.  Any tips about the memory 
> distribution?  This is what I'm expecting, but I'm a newb. :)
> 
> DataNode - 1.5G
> TaskTracker - .5G
> Zookeeper - .5G
> RegionServer - 2G
> M/R - 2G
> 
> 
> Jonathan Gray wrote:
>> IMO, you can fit those things into 6.5G without a problem.  Of course, 
>> the more you give it the better your performance.
>>
>> However, medium instances have only 2 cores... That's going to be a 
>> problem.  Under heavy load (especially in an upload/import situation) 
>> you will starve threads in at least one of these processes... At a 
>> minimum, you really want a core each for DN, ZK, RS and then your 
>> requirements for your MR tasks would depend on the nature of them.  If 
>> they are at all CPU intensive, then you need to be sure to dedicate 
>> sufficient resources to them.
>>
>> In general, we recommend XL instances because they are quad core. 
>> Otherwise you will likely run into issues with this many processes on 
>> two cores.
>>
>> JG
>>
>> Fernando Padilla wrote:
>>> OK, if you don't mind me stretching this simple conversation a bit 
>>> more..
>>>
>>> Say I use the medium ec2 instance.. that's about 7.5G of ram, so I 
>>> have about 6.5 total.
>>>
>>> On any one node I would have:
>>>
>>> DataNode
>>> TaskTracker
>>> Zookeeper
>>> RegionServer
>>> +Map/Reduce Tasks?
>>>
>>>
>>> What would your gut be for distributing the memory?
>>>
>>> Can I run my M/R Tasks all sharing one JVM to share the same memory, 
>>> or does each Map or Reduce have its own JVM/Memory requirements?
>>>
>>>
>>> I'm thinking between 5 to 10 nodes.  I know that this seems stingy 
>>> for what you guys are used to.. but this is my worst case or minimum 
>>> allocation.. if need be I can plan to get more nodes and spread 
>>> around the load (bursting on heavy days, etc).. but I don't want to 
>>> plan/budget for a large number of nodes until we see good ROI, etc 
>>> etc etc..
>>>
>>>
>>>
>>> On 7/14/09 11:54 PM, Nitay wrote:
>>>> Yes, Ryan's right. While we recommend running ZooKeeper on separate 
>>>> hosts,
>>>> it is really only if you can afford to do so. Otherwise, choose some 
>>>> of your
>>>> region server machines and run ZooKeeper alongside those.
>>>>
>>>> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<ry...@gmail.com>  
>>>> wrote:
>>>>
>>>>> You can probably host it all on one set of machines.  You'll need the
>>>>> large sized.
>>>>>
>>>>> Let us know how EC2 works, performance might be off due to the
>>>>> virtualization.
>>>>>
>>>>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu>
>>>>> wrote:
>>>>>> The reason I ask, is that I'm planning on setting up a small HBase
>>>>> cluster
>>>>>> in ec2..
>>>>>>
>>>>>> having 3 to 5 instances just for zookeeper, while having only 3 to 5
>>>>>> instances for Hbase.. it sounds lop-sided. :)
>>>>>>
>>>>>> Does anyone here have any experience with HBase in EC2?
>>>>>>
>>>>>>
>>>>>> Ryan Rawson wrote:
>>>>>>> I run my ZK quorum on my regionservers, but I also have 16 GB ram 
>>>>>>> per
>>>>>>> regionserver.  I used to run 1gb, and never had problems. Now with
>>>>>>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>>>>>>> but better safe than sorry.
>>>>>>>
>>>>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com>  wrote:
>>>>>>>> Hi Fernando,
>>>>>>>>
>>>>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>>>>> Servers.
>>>>>>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>>>>>>> minimal
>>>>>>>> currently. However you definitely don't want it to swap and you 
>>>>>>>> want to
>>>>>>>> be
>>>>>>>> able to handle a large number of connections. A safe value would be
>>>>>>>> something like 1GB.
>>>>>>>>
>>>>>>>> -n
>>>>>>>>
>>>>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando 
>>>>>>>> Padilla<fe...@alum.mit.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>>>>
>>>>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>>>>> how much memory should I give zookeeper, if it's just used for 
>>>>>>>>> hbase?
>>>>>>>>>
>>>>
>>>
> 

Re: hbase/zookeeper

Posted by Michael Greene <mi...@gmail.com>.
This is the relevant graph.  I've heard 7 being recommended, but 5 or
7 seem to be the best options.
http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance

Michael
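
P.S. The arithmetic behind those sizes: an ensemble needs a strict majority
of peers up, so each step from 3 to 5 to 7 buys exactly one more tolerated
failure, while every write must still be acknowledged by a majority -- hence
the diminishing returns. A throwaway illustration:

for n in 3 5 7 9; do
  echo "$n peers: majority $(( n / 2 + 1 )), survives $(( (n - 1) / 2 )) failures"
done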

On Fri, Jul 17, 2009 at 3:04 PM, Andrew Purtell<ap...@apache.org> wrote:
> Hmm... Is that private communication or up on a Wiki somewhere? Or
> maybe in a mailing list archive? We should collect these tidbits into
> our wiki.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Ryan Rawson <ry...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 12:57:51 PM
> Subject: Re: hbase/zookeeper
>
> The ZK folks also recommend quorums of 5 nodes.  Said something about
> diminishing returns at 7 and 9...
>
> -ryan
>
> On Fri, Jul 17, 2009 at 12:52 PM, Andrew Purtell<ap...@apache.org> wrote:
>> Thanks. That's good advice.
>>
>> We tune our heap allocations based on metrics collected over typical
>> and peak usage cases.
>>
>>   - Andy
>>
>>
>>
>>
>>
>> ________________________________
>> From: Jonathan Gray <jg...@apache.org>
>> To: hbase-user@hadoop.apache.org
>> Sent: Friday, July 17, 2009 12:42:30 PM
>> Subject: Re: hbase/zookeeper
>>
>> ZK guys seem to say you should give it 1GB at least.
>>
>> This should not matter for 0.20.  In 0.21, our use of ZK will expand and it will need more memory.  If you plan on using ZK for anything besides HBase, make sure you give it more memory.  For now, you're probably okay with 256MB.
>>
>> Andrew Purtell wrote:
>>> That looks good to me, in line with the best practices that are gelling as
>>> we collectively gain operational experience.
>>> This is how we allocate RAM on our 8GB worker nodes:
>>>
>>>   Hadoop
>>>     DataNode     - 1 GB
>>>     TaskTracker  - 256 MB (JVM default)
>>>     map/reduce tasks - 200 MB (Hadoop default)
>>>
>>>   HBase
>>>     ZK           - 256 MB (JVM default)
>>>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>>>     RegionServer - 4 GB
>>>
>>> We have a Master and hot spare Master each running on one of the workers.
>>> Our workers are dual quad core so we have them configured for maximum
>>> concurrent task execution of 4 mappers and 2 reducers and we run the
>>> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
>>> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
>>> RegionServers ahead of them.
>>> Note that the Hadoop NameNode is a special case which runs the NN in a
>>> standalone configuration with block device level replication to a hot
>>> spare configured in the typical HA fashion: heartbeat monitoring,
>>> fencing via power control operations, virtual IP address and L3 fail
>>> over, etc.
>>> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
>>> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
>>> handle 1000s of clients, if the quorum peers are running on dedicated
>>> hardware. We are considering this type of deployment for the future.
>>> However, for now we colocate ZK quorum peers with (some) HBase
>>> regionservers.
>>> Our next generation will use 32GB. This can support aggressive caching
>>> and in memory tables.
>>>    - Andy
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Fernando Padilla <fe...@alum.mit.edu>
>>> To: hbase-user@hadoop.apache.org
>>> Sent: Friday, July 17, 2009 10:30:52 AM
>>> Subject: Re: hbase/zookeeper
>>>
>>> thank you!
>>>
>>> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
>>>
>>> DataNode - 1.5G
>>> TaskTracker - .5G
>>> Zookeeper - .5G
>>> RegionServer - 2G
>>> M/R - 2G
>>>
>>>
>>>
>>
>>
>>
>
>
>
>

Re: hbase/zookeeper

Posted by Andrew Purtell <ap...@apache.org>.
Hmm... Is that private communication or up on a Wiki somewhere? Or
maybe in a mailing list archive? We should collect these tidbits into
our wiki.

   - Andy




________________________________
From: Ryan Rawson <ry...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Friday, July 17, 2009 12:57:51 PM
Subject: Re: hbase/zookeeper

The ZK folks also recommend quorums of 5 nodes.  Said something about
diminishing returns at 7 and 9...

-ryan

On Fri, Jul 17, 2009 at 12:52 PM, Andrew Purtell<ap...@apache.org> wrote:
> Thanks. That's good advice.
>
> We tune our heap allocations based on metrics collected over typical
> and peak usage cases.
>
>   - Andy
>
>
>
>
>
> ________________________________
> From: Jonathan Gray <jg...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 12:42:30 PM
> Subject: Re: hbase/zookeeper
>
> ZK guys seem to say you should give it 1GB at least.
>
> This should not matter for 0.20.  In 0.21, our use of ZK will expand and it will need more memory.  If you plan on using ZK for anything besides HBase, make sure you give it more memory.  For now, you're probably okay with 256MB.
>
> Andrew Purtell wrote:
>> That looks good to me, in line with the best practices that are gelling as
>> we collectively gain operational experience.
>> This is how we allocate RAM on our 8GB worker nodes:
>>
>>   Hadoop
>>     DataNode     - 1 GB
>>     TaskTracker  - 256 MB (JVM default)
>>     map/reduce tasks - 200 MB (Hadoop default)
>>
>>   HBase
>>     ZK           - 256 MB (JVM default)
>>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>>     RegionServer - 4 GB
>>
>> We have a Master and hot spare Master each running on one of the workers.
>> Our workers are dual quad core so we have them configured for maximum
>> concurrent task execution of 4 mappers and 2 reducers and we run the
>> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
>> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
>> RegionServers ahead of them.
>> Note that the Hadoop NameNode is a special case which runs the NN in a
>> standalone configuration with block device level replication to a hot
>> spare configured in the typical HA fashion: heartbeat monitoring,
>> fencing via power control operations, virtual IP address and L3 fail
>> over, etc.
>> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
>> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
>> handle 1000s of clients, if the quorum peers are running on dedicated
>> hardware. We are considering this type of deployment for the future.
>> However, for now we colocate ZK quorum peers with (some) HBase
>> regionservers.
>> Our next generation will use 32GB. This can support aggressive caching
>> and in memory tables.
>>    - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Fernando Padilla <fe...@alum.mit.edu>
>> To: hbase-user@hadoop.apache.org
>> Sent: Friday, July 17, 2009 10:30:52 AM
>> Subject: Re: hbase/zookeeper
>>
>> thank you!
>>
>> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
>>
>> DataNode - 1.5G
>> TaskTracker - .5G
>> Zookeeper - .5G
>> RegionServer - 2G
>> M/R - 2G
>>
>>
>>
>
>
>



      

Re: hbase/zookeeper

Posted by Ryan Rawson <ry...@gmail.com>.
The ZK folks also recommend quorums of 5 nodes.  Said something about
diminishing returns at 7 and 9...

-ryan

On Fri, Jul 17, 2009 at 12:52 PM, Andrew Purtell<ap...@apache.org> wrote:
> Thanks. That's good advice.
>
> We tune our heap allocations based on metrics collected over typical
> and peak usage cases.
>
>   - Andy
>
>
>
>
>
> ________________________________
> From: Jonathan Gray <jg...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 12:42:30 PM
> Subject: Re: hbase/zookeeper
>
> ZK guys seem to say you should give it 1GB at least.
>
> This should not matter for 0.20.  In 0.21, our use of ZK will expand and it will need more memory.  If you plan on using ZK for anything besides HBase, make sure you give it more memory.  For now, you're probably okay with 256MB.
>
> Andrew Purtell wrote:
>> That looks good to me, in line with the best practices that are gelling as
>> we collectively gain operational experience.
>> This is how we allocate RAM on our 8GB worker nodes:
>>
>>   Hadoop
>>     DataNode     - 1 GB
>>     TaskTracker  - 256 MB (JVM default)
>>     map/reduce tasks - 200 MB (Hadoop default)
>>
>>   HBase
>>     ZK           - 256 MB (JVM default)
>>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>>     RegionServer - 4 GB
>>
>> We have a Master and hot spare Master each running on one of the workers.
>> Our workers are dual quad core so we have them configured for maximum
>> concurrent task execution of 4 mappers and 2 reducers and we run the
>> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
>> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
>> RegionServers ahead of them.
>> Note that the Hadoop NameNode is a special case which runs the NN in a
>> standalone configuration with block device level replication to a hot
>> spare configured in the typical HA fashion: heartbeat monitoring,
>> fencing via power control operations, virtual IP address and L3 fail
>> over, etc.
>> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
>> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
>> handle 1000s of clients, if the quorum peers are running on dedicated
>> hardware. We are considering this type of deployment for the future.
>> However, for now we colocate ZK quorum peers with (some) HBase
>> regionservers.
>> Our next generation will use 32GB. This can support aggressive caching
>> and in memory tables.
>>    - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Fernando Padilla <fe...@alum.mit.edu>
>> To: hbase-user@hadoop.apache.org
>> Sent: Friday, July 17, 2009 10:30:52 AM
>> Subject: Re: hbase/zookeeper
>>
>> thank you!
>>
>> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
>>
>> DataNode - 1.5G
>> TaskTracker - .5G
>> Zookeeper - .5G
>> RegionServer - 2G
>> M/R - 2G
>>
>>
>>
>
>
>

Re: hbase/zookeeper

Posted by Andrew Purtell <ap...@apache.org>.
Thanks. That's good advice.

We tune our heap allocations based on metrics collected over typical
and peak usage cases. 

   - Andy





________________________________
From: Jonathan Gray <jg...@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Friday, July 17, 2009 12:42:30 PM
Subject: Re: hbase/zookeeper

ZK guys seem to say you should give it 1GB at least.

This should not matter for 0.20.  In 0.21, our use of ZK will expand and it will need more memory.  If you plan on using ZK for anything besides HBase, make sure you give it more memory.  For now, you're probably okay with 256MB.

Andrew Purtell wrote:
> That looks good to me, in line with the best practices that are gelling as
> we collectively gain operational experience. 
> This is how we allocate RAM on our 8GB worker nodes:
> 
>   Hadoop
>     DataNode     - 1 GB
>     TaskTracker  - 256 MB (JVM default)
>     map/reduce tasks - 200 MB (Hadoop default)
> 
>   HBase
>     ZK           - 256 MB (JVM default)
>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>     RegionServer - 4 GB
> 
> We have a Master and hot spare Master each running on one of the workers. 
> Our workers are dual quad core so we have them configured for maximum
> concurrent task execution of 4 mappers and 2 reducers and we run the
> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> RegionServers ahead of them. 
> Note that the Hadoop NameNode is a special case which runs the NN in a
> standalone configuration with block device level replication to a hot
> spare configured in the typical HA fashion: heartbeat monitoring,
> fencing via power control operations, virtual IP address and L3 fail
> over, etc. 
> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> handle 1000s of clients, if the quorum peers are running on dedicated
> hardware. We are considering this type of deployment for the future.
> However, for now we colocate ZK quorum peers with (some) HBase
> regionservers. 
> Our next generation will use 32GB. This can support aggressive caching
> and in memory tables. 
>    - Andy
> 
> 
> 
> 
> ________________________________
> From: Fernando Padilla <fe...@alum.mit.edu>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 10:30:52 AM
> Subject: Re: hbase/zookeeper
> 
> thank you!
> 
> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
> 
> DataNode - 1.5G
> TaskTracker - .5G
> Zookeeper - .5G
> RegionServer - 2G
> M/R - 2G 
> 
> 
>      


      

Re: hbase/zookeeper

Posted by Jonathan Gray <jg...@apache.org>.
ZK guys seem to say you should give it 1GB at least.

This should not matter for 0.20.  In 0.21, our use of ZK will expand and 
it will need more memory.  If you plan on using ZK for anything besides 
HBase, make sure you give it more memory.  For now, you're probably okay 
with 256MB.

Andrew Purtell wrote:
> That looks good to me, in line with the best practices that are gelling as
> we collectively gain operational experience. 
> 
> This is how we allocate RAM on our 8GB worker nodes:
> 
>   Hadoop
>     DataNode     - 1 GB 
>     TaskTracker  - 256 MB (JVM default)
>     map/reduce tasks - 200 MB (Hadoop default)
> 
>   HBase
>     ZK           - 256 MB (JVM default)
>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>     RegionServer - 4 GB
> 
> We have a Master and hot spare Master each running on one of the 
> workers. 
> 
> Our workers are dual quad core so we have them configured for maximum
> concurrent task execution of 4 mappers and 2 reducers and we run the
> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> RegionServers ahead of them. 
> 
> Note that the Hadoop NameNode is a special case which runs the NN in a
> standalone configuration with block device level replication to a hot
> spare configured in the typical HA fashion: heartbeat monitoring,
> fencing via power control operations, virtual IP address and L3 fail
> over, etc. 
> 
> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> handle 1000s of clients, if the quorum peers are running on dedicated
> hardware. We are considering this type of deployment for the future.
> However, for now we colocate ZK quorum peers with (some) HBase
> regionservers. 
> 
> Our next generation will use 32GB. This can support aggressive caching
> and in memory tables. 
> 
>    - Andy
> 
> 
> 
> 
> ________________________________
> From: Fernando Padilla <fe...@alum.mit.edu>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 10:30:52 AM
> Subject: Re: hbase/zookeeper
> 
> thank you!
> 
> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
> 
> DataNode - 1.5G
> TaskTracker - .5G
> Zookeeper - .5G
> RegionServer - 2G
> M/R - 2G 
> 
> 
> 
>       

Re: hbase/zookeeper

Posted by Ninad Raut <hb...@gmail.com>.
Try the configuration we used in our EC2 cluster (medium/large machines)
with HBase. It helps avoid swapping and scanner timeouts for long-running
MR jobs... we came up with this after lots of tuning. I hope it helps:
<property>
  <name>dfs.replication</name>
  <value>3</value>
 </property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>20</value>
  <description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>0</value>
  <description>The number of milliseconds before a task will be terminated
if it neither reads an input, writes an output, nor updates its status
string. A value of 0, as here, disables the timeout.
  </description>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>1</value>
  <description>Expert: The maximum number of attempts per reduce task. In
other words, framework will try to execute a reduce task these many number
of times before giving up on it.
  </description>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
  <description>How many tasks to run per jvm. If -1 then no limit at all.
  </description>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <value>36000</value>
  <description>Expert: The time interval, in milliseconds, after which a
tasktracker is declared 'lost' if it doesn't send heartbeats.
  </description>
</property>
<property>
    <name>hbase.master.lease.period</name>
    <value>360000</value>
<description>HMaster server lease period in milliseconds. Default is 120
seconds.  Region servers must report in within this period else they are
considered dead.  On loaded cluster, may need to up this
    period.</description>
 </property>
<property>
    <name>hbase.regionserver.lease.period</name>
    <value>36000000</value>
    <description>HRegion server lease period in milliseconds. Default is 60
seconds. Clients must report in within this period else they are considered
dead.</description>
  </property>
  <property>
    <name>hbase.hregion.memcache.flush.size</name>
    <value>1048576</value>
    <description>
    A HRegion memcache will be flushed to disk if size of the memcache
exceeds this number of bytes.  Value is checked by a thread that runs every
hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>16777216</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles
has grown to exceed this value, the hosting HRegion is split in two.
Default: 256M.
    </description>
  </property>
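
A note on applying the above: the dfs.* and mapred.* properties belong in
Hadoop's config and the hbase.* ones in HBase's, and the daemons only pick
changes up on restart. A sketch, assuming stock 0.19/0.20-era scripts and
that HADOOP_HOME / HBASE_HOME point at the installs:

# dfs.*, mapred.*  -> $HADOOP_HOME/conf/hadoop-site.xml
# hbase.*          -> $HBASE_HOME/conf/hbase-site.xml
$HBASE_HOME/bin/stop-hbase.sh     # stop HBase before the filesystem under it
$HADOOP_HOME/bin/stop-all.sh
$HADOOP_HOME/bin/start-all.sh
$HBASE_HOME/bin/start-hbase.sh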



On Mon, Jul 20, 2009 at 12:15 AM, Andrew Purtell <ap...@apache.org>wrote:

> > How much memory are you giving the NameNode? and the SecondaryNameNode?
>
> We give the NN 4 GB and the 2NN the default 1 GB. Technically according
> to the Hadoop manual (which suggests the 2NN's task is as resource
> intensive as the NN's) this is wrong, but with the HA configuration of
> the NN, in our setup the 2NN is not critical, and it functions well
> enough. I'm not even sure we need it. Also given the current number of
> files in the filesystem, not all of the 4 GB heap allocated to the NN is
> actually required.
>
> > but do they take a lot of CPU?
>
> Because everything falls apart if HDFS falls apart, the NN deserves
> special consideration.
>
> It depends on the particulars of your workload but in general an
> environment which includes HBase will be more taxing on the balance.
>
> I think RAM is the critical resource for the NN. For example, to my
> understanding, Facebook runs at least one cluster with >= 20 GB heap
> for the NN. It obviously tracks the block locations for millions of
> files. Give your NN a lot of RAM in the beginning and there will be
> plenty of headroom to scale up into -- you can add more datanodes
> over time in a seamless manner and won't need to bring down HDFS to
> upgrade RAM on the NN.
>
> > if i ignore HA could they share a box with other services?
>
> If you ignore HA, my advice is to run the NN and the HBase Master on
> the same node. The Master spends most of its time suspended waiting
> for work, so this would be a good match. I also run a DataNode in
> addition to the NN and Master on one node on my test cluster and have
> never had an incident. Your mileage may vary. Something like this is
> suitable for testing only.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Fernando Padilla <fe...@alum.mit.edu>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 7:37:04 PM
> Subject: Re: hbase/zookeeper
>
> Ok.. so it seems like ZK and TT can be smaller than we thought.. at least
> it's an option. :)
>
> How much memory are you giving the NameNode? and the SecondaryNameNode? It
> looks like those are beefy on your setup for HA purposes.. but do they take
> a lot of CPU? if i ignore HA could they share a box with other services?
>
>
> Andrew Purtell wrote:
> > That looks good to me, in line with the best practices that are gelling
> as
> > we collectively gain operational experience.
> > This is how we allocate RAM on our 8GB worker nodes:
> >
> >   Hadoop
> >     DataNode     - 1 GB
> >     TaskTracker  - 256 MB (JVM default)
> >     map/reduce tasks - 200 MB (Hadoop default)
> >
> >   HBase
> >     ZK           - 256 MB (JVM default)
> >     Master       - 1 GB (HBase default, but actual use is < 500MB)
> >     RegionServer - 4 GB
> >
> > We have a Master and hot spare Master each running on one of the workers.
> > Our workers are dual quad core so we have them configured for maximum
> > concurrent task execution of 4 mappers and 2 reducers and we run the
> > TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> > the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> > RegionServers ahead of them.
> > Note that the Hadoop NameNode is a special case which runs the NN in a
> > standalone configuration with block device level replication to a hot
> > spare configured in the typical HA fashion: heartbeat monitoring,
> > fencing via power control operations, virtual IP address and L3 fail
> > over, etc.
> > Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> > reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> > handle 1000s of clients, if the quorum peers are running on dedicated
> > hardware. We are considering this type of deployment for the future.
> > However, for now we colocate ZK quorum peers with (some) HBase
> > regionservers.
> > Our next generation will use 32GB. This can support aggressive caching
> > and in memory tables.
> >    - Andy
> >
> >
> >
> >
> > ________________________________
> > From: Fernando Padilla <fe...@alum.mit.edu>
> > To: hbase-user@hadoop.apache.org
> > Sent: Friday, July 17, 2009 10:30:52 AM
> > Subject: Re: hbase/zookeeper
> >
> > thank you!
> >
> > I'll pay attention to the CPU load then.  Any tips about the memory
> distribution?  This is what I'm expecting, but I'm a newb. :)
> >
> > DataNode - 1.5G
> > TaskTracker - .5G
> > Zookeeper - .5G
> > RegionServer - 2G
> > M/R - 2G
> >
> >
> >
>
>
>
>

Re: hbase/zookeeper

Posted by Andrew Purtell <ap...@apache.org>.
> How much memory are you giving the NameNode? and the SecondaryNameNode?

We give the NN 4 GB and the 2NN the default 1 GB. Technically according
to the Hadoop manual (which suggests the 2NN's task is as resource 
intensive as the NN's) this is wrong, but with the HA configuration of
the NN, in our setup the 2NN is not critical, and it functions well
enough. I'm not even sure we need it. Also given the current number of
files in the filesystem, not all of the 4 GB heap allocated to the NN is
actually required. 

> but do they take a lot of CPU? 

Because everything falls apart if HDFS falls apart, the NN deserves
special consideration. 

It depends on the particulars of your workload but in general an
environment which includes HBase will be more taxing on the balance.

I think RAM is the critical resource for the NN. For example, to my
understanding, Facebook runs at least one cluster with >= 20 GB heap
for the NN. It obviously tracks the block locations for millions of
files. Give your NN a lot of RAM in the beginning and there will be
plenty of headroom to scale up into -- you can add more datanodes 
over time in a seamless manner and won't need to bring down HDFS to
upgrade RAM on the NN. 
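
A rough way to sanity-check NN heap is the often-quoted rule of thumb of
about 1 GB of NameNode heap per million namespace objects (files plus
blocks); treat it as an order-of-magnitude guide, not a spec. With
illustrative numbers:

objects=4000000   # hypothetical: files + blocks this NN must track
echo "suggests ~$(( objects / 1000000 )) GB of NN heap for $objects objects"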

> if i ignore HA could they share a box with other services?

If you ignore HA, my advice is to run the NN and the HBase Master on
the same node. The Master spends most of its time suspended waiting
for work, so this would be a good match. I also run a DataNode in
addition to the NN and Master on one node on my test cluster and have
never had an incident. Your mileage may vary. Something like this is
suitable for testing only. 

   - Andy




________________________________
From: Fernando Padilla <fe...@alum.mit.edu>
To: hbase-user@hadoop.apache.org
Sent: Friday, July 17, 2009 7:37:04 PM
Subject: Re: hbase/zookeeper

Ok.. so it seems like ZK and TT can be smaller than we thought.. at least it's an option. :)

How much memory are you giving the NameNode? and the SecondaryNameNode? It looks like those are beefy on your setup for HA purposes.. but do they take a lot of CPU? if i ignore HA could they share a box with other services?


Andrew Purtell wrote:
> That looks good to me, in line with the best practices that are gelling as
> we collectively gain operational experience. 
> This is how we allocate RAM on our 8GB worker nodes:
> 
>   Hadoop
>     DataNode     - 1 GB
>     TaskTracker  - 256 MB (JVM default)
>     map/reduce tasks - 200 MB (Hadoop default)
> 
>   HBase
>     ZK           - 256 MB (JVM default)
>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>     RegionServer - 4 GB
> 
> We have a Master and hot spare Master each running on one of the workers. 
> Our workers are dual quad core so we have them configured for maximum
> concurrent task execution of 4 mappers and 2 reducers and we run the
> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> RegionServers ahead of them. 
> Note that the Hadoop NameNode is a special case which runs the NN in a
> standalone configuration with block device level replication to a hot
> spare configured in the typical HA fashion: heartbeat monitoring,
> fencing via power control operations, virtual IP address and L3 fail
> over, etc. 
> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> handle 1000s of clients, if the quorum peers are running on dedicated
> hardware. We are considering this type of deployment for the future.
> However, for now we colocate ZK quorum peers with (some) HBase
> regionservers. 
> Our next generation will use 32GB. This can support aggressive caching
> and in memory tables. 
>    - Andy
> 
> 
> 
> 
> ________________________________
> From: Fernando Padilla <fe...@alum.mit.edu>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 10:30:52 AM
> Subject: Re: hbase/zookeeper
> 
> thank you!
> 
> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
> 
> DataNode - 1.5G
> TaskTracker - .5G
> Zookeeper - .5G
> RegionServer - 2G
> M/R - 2G 
> 
> 
>      


      

Re: hbase/zookeeper

Posted by Fernando Padilla <fe...@alum.mit.edu>.
Ok.. so it seems like ZK and TT can be smaller than we thought.. at 
least it's an option. :)

How much memory are you giving the NameNode? and the SecondaryNameNode? 
It looks like those are beefy on your setup for HA purposes.. but do 
they take a lot of CPU? if i ignore HA could they share a box with other 
services?


Andrew Purtell wrote:
> That looks good to me, in line with the best practices that are gelling as
> we collectively gain operational experience. 
> 
> This is how we allocate RAM on our 8GB worker nodes:
> 
>   Hadoop
>     DataNode     - 1 GB 
>     TaskTracker  - 256 MB (JVM default)
>     map/reduce tasks - 200 MB (Hadoop default)
> 
>   HBase
>     ZK           - 256 MB (JVM default)
>     Master       - 1 GB (HBase default, but actual use is < 500MB)
>     RegionServer - 4 GB
> 
> We have a Master and hot spare Master each running on one of the 
> workers. 
> 
> Our workers are dual quad core so we have them configured for maximum
> concurrent task execution of 4 mappers and 2 reducers and we run the
> TaskTracker (therefore, also the tasks) with niceness +10 to hint to
> the OS the importance of scheduling the DataNodes, ZK quorum peers, or
> RegionServers ahead of them. 
> 
> Note that the Hadoop NameNode is a special case which runs the NN in a
> standalone configuration with block device level replication to a hot
> spare configured in the typical HA fashion: heartbeat monitoring,
> fencing via power control operations, virtual IP address and L3 fail
> over, etc. 
> 
> Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
> reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
> handle 1000s of clients, if the quorum peers are running on dedicated
> hardware. We are considering this type of deployment for the future.
> However, for now we colocate ZK quorum peers with (some) HBase
> regionservers. 
> 
> Our next generation will use 32GB. This can support aggressive caching
> and in memory tables. 
> 
>    - Andy
> 
> 
> 
> 
> ________________________________
> From: Fernando Padilla <fe...@alum.mit.edu>
> To: hbase-user@hadoop.apache.org
> Sent: Friday, July 17, 2009 10:30:52 AM
> Subject: Re: hbase/zookeeper
> 
> thank you!
> 
> I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)
> 
> DataNode - 1.5G
> TaskTracker - .5G
> Zookeeper - .5G
> RegionServer - 2G
> M/R - 2G 
> 
> 
> 
>       

Re: hbase/zookeeper

Posted by Andrew Purtell <ap...@apache.org>.
That looks good to me, in line with the best practices that are gelling as
we collectively gain operational experience. 

This is how we allocate RAM on our 8GB worker nodes:

  Hadoop
    DataNode     - 1 GB 
    TaskTracker  - 256 MB (JVM default)
    map/reduce tasks - 200 MB (Hadoop default)

  HBase
    ZK           - 256 MB (JVM default)
    Master       - 1 GB (HBase default, but actual use is < 500MB)
    RegionServer - 4 GB

We have a Master and hot spare Master each running on one of the 
workers. 

Our workers are dual quad core so we have them configured for maximum
concurrent task execution of 4 mappers and 2 reducers and we run the
TaskTracker (therefore, also the tasks) with niceness +10 to hint to
the OS the importance of scheduling the DataNodes, ZK quorum peers, or
RegionServers ahead of them. 
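
One way to get that niceness split at startup: the stock hadoop-daemon.sh
of this era honors a HADOOP_NICENESS variable, but verify against your
scripts before depending on it. A sketch:

bin/hadoop-daemon.sh start datanode                        # normal priority
HADOOP_NICENESS=10 bin/hadoop-daemon.sh start tasktracker  # niced; tasks inherit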

Note that the Hadoop NameNode is a special case which runs the NN in a
standalone configuration with block device level replication to a hot
spare configured in the typical HA fashion: heartbeat monitoring,
fencing via power control operations, virtual IP address and L3 fail
over, etc. 

Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
handle 1000s of clients, if the quorum peers are running on dedicated
hardware. We are considering this type of deployment for the future.
However, for now we colocate ZK quorum peers with (some) HBase
regionservers. 

Our next generation will use 32GB. This can support aggressive caching
and in memory tables. 

   - Andy




________________________________
From: Fernando Padilla <fe...@alum.mit.edu>
To: hbase-user@hadoop.apache.org
Sent: Friday, July 17, 2009 10:30:52 AM
Subject: Re: hbase/zookeeper

thank you!

I'll pay attention to the CPU load then.  Any tips about the memory distribution?  This is what I'm expecting, but I'm a newb. :)

DataNode - 1.5G
TaskTracker - .5G
Zookeeper - .5G
RegionServer - 2G
M/R - 2G 



      

Re: hbase/zookeeper

Posted by Fernando Padilla <fe...@alum.mit.edu>.
thank you!

I'll pay attention to the CPU load then.  Any tips about the memory 
distribution?  This is what I'm expecting, but I'm a newb. :)

DataNode - 1.5G
TaskTracker - .5G
Zookeeper - .5G
RegionServer - 2G
M/R - 2G


Jonathan Gray wrote:
> IMO, you can fit those things into 6.5G without a problem.  Of course, 
> the more you give it the better your performance.
> 
> However, medium instances have only 2 cores... That's going to be a 
> problem.  Under heavy load (especially in an upload/import situation) 
> you will starve threads in at least one of these processes... At a 
> minimum, you really want a core each for DN, ZK, RS and then your 
> requirements for your MR tasks would depend on the nature of them.  If 
> they are at all CPU intensive, then you need to be sure to dedicate 
> sufficient resources to them.
> 
> In general, we recommend XL instances because they are quad core. 
> Otherwise you will likely run into issues with this many processes on 
> two cores.
> 
> JG
> 
> Fernando Padilla wrote:
>> OK, if you don't mind me stretching this simple conversation a bit more..
>>
>> Say I use the medium ec2 instance.. that's about 7.5G of ram, so I 
>> have about 6.5 total.
>>
>> On any one node I would have:
>>
>> DataNode
>> TaskTracker
>> Zookeeper
>> RegionServer
>> +Map/Reduce Tasks?
>>
>>
>> What would your gut be for distributing the memory?
>>
>> Can I run my M/R Tasks all sharing one JVM to share the same memory, 
>> or does each Map or Reduce have its own JVM/Memory requirements?
>>
>>
>> I'm thinking between 5 to 10 nodes.  I know that this seems stingy for 
>> what you guys are used to.. but this is my worst case or minimum 
>> allocation.. if need be I can plan to get more nodes and spread around 
>> the load (bursting on heavy days, etc).. but I don't want to 
>> plan/budget for a large number of nodes until we see good ROI, etc etc 
>> etc..
>>
>>
>>
>> On 7/14/09 11:54 PM, Nitay wrote:
>>> Yes, Ryan's right. While we recommend running ZooKeeper on separate 
>>> hosts,
>>> it is really only if you can afford to do so. Otherwise, choose some 
>>> of your
>>> region server machines and run ZooKeeper alongside those.
>>>
>>> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<ry...@gmail.com>  
>>> wrote:
>>>
>>>> You can probably host it all on one set of machines.  You'll need the
>>>> large sized.
>>>>
>>>> Let us know how EC2 works, performance might be off due to the
>>>> virtualization.
>>>>
>>>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu>
>>>> wrote:
>>>>> The reason I ask, is that I'm planning on setting up a small HBase
>>>> cluster
>>>>> in ec2..
>>>>>
>>>>> having 3 to 5 instances just for zookeeper, while having only 3 to 5
>>>>> instances for Hbase.. it sounds lop-sided. :)
>>>>>
>>>>> Does anyone here have any experience with HBase in EC2?
>>>>>
>>>>>
>>>>> Ryan Rawson wrote:
>>>>>> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
>>>>>> regionserver.  I used to run 1gb, and never had problems. Now with
>>>>>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>>>>>> but better safe than sorry.
>>>>>>
>>>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com>  wrote:
>>>>>>> Hi Fernando,
>>>>>>>
>>>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>>>> Servers.
>>>>>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>>>>>> minimal
>>>>>>> currently. However you definitely don't want it to swap and you 
>>>>>>> want to
>>>>>>> be
>>>>>>> able to handle a large number of connections. A safe value would be
>>>>>>> something like 1GB.
>>>>>>>
>>>>>>> -n
>>>>>>>
>>>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla<fe...@alum.mit.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>>>
>>>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>>>> how much memory should I give zookeeper, if it's just used for 
>>>>>>>> hbase?
>>>>>>>>
>>>
>>

Re: hbase/zookeeper

Posted by Jonathan Gray <jl...@streamy.com>.
IMO, you can fit those things into 6.5G without a problem.  Of course, 
the more you give it the better your performance.

However, medium instances have only 2 cores... That's going to be a 
problem.  Under heavy load (especially in an upload/import situation) 
you will starve threads in at least one of these processes... At a 
minimum, you really want a core each for DN, ZK, RS and then your 
requirements for your MR tasks would depend on the nature of them.  If 
they are at all CPU intensive, then you need to be sure to dedicate 
sufficient resources to them.

In general, we recommend XL instances because they are quad core. 
Otherwise you will likely run into issues with this many processes on 
two cores.

JG

Fernando Padilla wrote:
> OK, if you don't mind me stretching this simple conversation a bit more..
> 
> Say I use the medium ec2 instance.. that's about 7.5G of ram, so I have 
> about 6.5 total.
> 
> On any one node I would have:
> 
> DataNode
> TaskTracker
> Zookeeper
> RegionServer
> +Map/Reduce Tasks?
> 
> 
> What would your gut be for distributing the memory?
> 
> Can I run my M/R Tasks all sharing one JVM to share the same memory, or 
> does each Map or Reduce have its own JVM/Memory requirements?
> 
> 
> I'm thinking between 5 to 10 nodes.  I know that this seems stingy for 
> what you guys are used to.. but this is my worst case or minimum 
> allocation.. if need be I can plan to get more nodes and spread around 
> the load (bursting on heavy days, etc).. but I don't want to plan/budget 
> for a large number of nodes until we see good ROI, etc etc etc..
> 
> 
> 
> On 7/14/09 11:54 PM, Nitay wrote:
>> Yes, Ryan's right. While we recommend running ZooKeeper on separate 
>> hosts,
>> it is really only if you can afford to do so. Otherwise, choose some 
>> of your
>> region server machines and run ZooKeeper alongside those.
>>
>> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<ry...@gmail.com>  wrote:
>>
>>> You can probably host it all on one set of machines.  You'll need the
>>> large sized.
>>>
>>> Let us know how EC2 works, performance might be off due to the
>>> virtualization.
>>>
>>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu>
>>> wrote:
>>>> The reason I ask, is that I'm planning on setting up a small HBase
>>> cluster
>>>> in ec2..
>>>>
>>>> having 3 to 5 instances just for zookeeper, while having only 3 to 5
>>>> instances for Hbase.. it sounds lop-sided. :)
>>>>
>>>> Does anyone here have any experience with HBase in EC2?
>>>>
>>>>
>>>> Ryan Rawson wrote:
>>>>> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
>>>>> regionserver.  I used to run 1gb, and never had problems. Now with
>>>>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>>>>> but better safe than sorry.
>>>>>
>>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com>  wrote:
>>>>>> Hi Fernando,
>>>>>>
>>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>>> Servers.
>>>>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>>>>> minimal
>>>>>> currently. However you definitely don't want it to swap and you 
>>>>>> want to
>>>>>> be
>>>>>> able to handle a large number of connections. A safe value would be
>>>>>> something like 1GB.
>>>>>>
>>>>>> -n
>>>>>>
>>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla<fe...@alum.mit.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>>
>>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>>> how much memory should I give zookeeper, if it's just used for 
>>>>>>> hbase?
>>>>>>>
>>
> 

Re: hbase/zookeeper

Posted by Fernando Padilla <fe...@alum.mit.edu>.
OK, if you don't mind me stretching this simple conversation a bit more..

Say I use the medium ec2 instance.. that's about 7.5G of ram, so I have 
about 6.5 total.

On any one node I would have:

DataNode
TaskTracker
Zookeeper
RegionServer
+Map/Reduce Tasks?


What would your gut be for distributing the memory?

Can I run my M/R Tasks all sharing one JVM to share the same memory, or 
does each Map or Reduce have its own JVM/Memory requirements?


I'm thinking between 5 to 10 nodes.  I know that this seems stingy for 
what you guys are used to.. but this is my worst case or minimum 
allocation.. if need be I can plan to get more nodes and spread around 
the load (bursting on heavy days, etc).. but I don't want to plan/budget 
for a large number of nodes until we see good ROI, etc etc etc..
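
Re the JVM-sharing question: by default each map or reduce attempt gets its
own child JVM, sized by mapred.child.java.opts (the Hadoop default is
-Xmx200m), so memory cost scales with the number of concurrent task slots,
not the number of tasks. The mapred.job.reuse.jvm.num.tasks knob (see the
config posted elsewhere in this thread) reuses a JVM across tasks of one
job, but still keeps one JVM per concurrent slot. Back of the envelope,
with illustrative numbers:

slots=4           # concurrent map+reduce slots on the node
task_heap_mb=200  # mapred.child.java.opts default
echo "peak task heap: $(( slots * task_heap_mb )) MB across $slots slots"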



On 7/14/09 11:54 PM, Nitay wrote:
> Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
> it is really only if you can afford to do so. Otherwise, choose some of your
> region server machines and run ZooKeeper alongside those.
>
> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<ry...@gmail.com>  wrote:
>
>> You can probably host it all on one set of machines.  You'll need the
>> large sized.
>>
>> Let us know how EC2 works, performance might be off due to the
>> virtualization.
>>
>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu>
>> wrote:
>>> The reason I ask, is that I'm planning on setting up a small HBase
>> cluster
>>> in ec2..
>>>
>>> having 3 to 5 instances just for zookeeper, while having only 3 to 5
>>> instances for Hbase.. it sounds lop-sided. :)
>>>
>>> Does anyone here have any experience with HBase in EC2?
>>>
>>>
>>> Ryan Rawson wrote:
>>>> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
>>>> regionserver.  I used to run 1gb, and never had problems. Now with
>>>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>>>> but better safe than sorry.
>>>>
>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com>  wrote:
>>>>> Hi Fernando,
>>>>>
>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>> Servers.
>>>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>>>> minimal
>>>>> currently. However you definitely don't want it to swap and you want to
>>>>> be
>>>>> able to handle a large number of connections. A safe value would be
>>>>> something like 1GB.
>>>>>
>>>>> -n
>>>>>
>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla<fe...@alum.mit.edu>
>>>>> wrote:
>>>>>
>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>
>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>> how much memory should I give zookeeper, if it's just used for hbase?
>>>>>>
>

Re: hbase/zookeeper

Posted by Nitay <ni...@gmail.com>.
Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
it is really only if you can afford to do so. Otherwise, choose some of your
region server machines and run ZooKeeper alongside those.

On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson <ry...@gmail.com> wrote:

> You can probably host it all on one set of machines.  You'll need the
> large sized.
>
> Let us know how EC2 works, performance might be off due to the
> virtualization.
>
> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu>
> wrote:
> > The reason I ask, is that I'm planning on setting up a small HBase
> cluster
> > in ec2..
> >
> > having 3 to 5 instances just for zookeeper, while having only 3 to 5
> > instances for Hbase.. it sounds lop-sided. :)
> >
> > Does anyone here have any experience with HBase in EC2?
> >
> >
> > Ryan Rawson wrote:
> >>
> >> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
> >> regionserver.  I used to run 1gb, and never had problems. Now with
> >> hbase managing the quorum I have 5gb ram, and it's probably overkill
> >> but better safe than sorry.
> >>
> >> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com> wrote:
> >>>
> >>> Hi Fernando,
> >>>
> >>> It is recommended that you run ZooKeeper separate from the Region
> >>> Servers.
> >>> On the memory side, our use of ZooKeeper in terms of data stored is
> >>> minimal
> >>> currently. However you definitely don't want it to swap and you want to
> >>> be
> >>> able to handle a large number of connections. A safe value would be
> >>> something like 1GB.
> >>>
> >>> -n
> >>>
> >>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla <fe...@alum.mit.edu>
> >>> wrote:
> >>>
> >>>> So.. what's the recommendation for zookeeper?
> >>>>
> >>>> should I run zookeeper nodes on the same region servers?
> >>>> should I run zookeeper nodes external to the region servers?
> >>>> how much memory should I give zookeeper, if it's just used for hbase?
> >>>>
> >
>

Re: hbase/zookeeper

Posted by Ryan Rawson <ry...@gmail.com>.
You can probably host it all on one set of machines.  You'll need the
large sized.

Let us know how EC2 works, performance might be off due to the virtualization.

On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fe...@alum.mit.edu> wrote:
> The reason I ask, is that I'm planning on setting up a small HBase cluster
> in ec2..
>
> having 3 to 5 instances just for zookeeper, while having only 3 to 5
> instances for Hbase.. it sounds lop-sided. :)
>
> Does anyone here have any experience with HBase in EC2?
>
>
> Ryan Rawson wrote:
>>
>> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
>> regionserver.  I used to run 1gb, and never had problems. Now with
>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>> but better safe than sorry.
>>
>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com> wrote:
>>>
>>> Hi Fernando,
>>>
>>> It is recommended that you run ZooKeeper separate from the Region
>>> Servers.
>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>> minimal
>>> currently. However you definitely don't want it to swap and you want to
>>> be
>>> able to handle a large number of connections. A safe value would be
>>> something like 1GB.
>>>
>>> -n
>>>
>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla <fe...@alum.mit.edu>
>>> wrote:
>>>
>>>> So.. what's the recommendation for zookeeper?
>>>>
>>>> should I run zookeeper nodes on the same region servers?
>>>> should I run zookeeper nodes external to the region servers?
>>>> how much memory should I give zookeeper, if it's just used for hbase?
>>>>
>

Re: hbase/zookeeper

Posted by Fernando Padilla <fe...@alum.mit.edu>.
The reason I ask, is that I'm planning on setting up a small HBase 
cluster in ec2..

having 3 to 5 instances just for zookeeper, while having only 3 to 5 
instances for Hbase.. it sounds lop-sided. :)

Does anyone here have any experience with HBase in EC2?


Ryan Rawson wrote:
> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
> regionserver.  I used to run 1gb, and never had problems. Now with
> hbase managing the quorum I have 5gb ram, and it's probably overkill
> but better safe than sorry.
> 
> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com> wrote:
>> Hi Fernando,
>>
>> It is recommended that you run ZooKeeper separate from the Region Servers.
>> On the memory side, our use of ZooKeeper in terms of data stored is minimal
>> currently. However you definitely don't want it to swap and you want to be
>> able to handle a large number of connections. A safe value would be
>> something like 1GB.
>>
>> -n
>>
>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla <fe...@alum.mit.edu> wrote:
>>
>>> So.. what's the recommendation for zookeeper?
>>>
>>> should I run zookeeper nodes on the same region servers?
>>> should I run zookeeper nodes external to the region servers?
>>> how much memory should I give zookeeper, if it's just used for hbase?
>>>

Re: hbase/zookeeper

Posted by Ryan Rawson <ry...@gmail.com>.
I run my ZK quorum on my regionservers, but I also have 16 GB ram per
regionserver.  I used to run 1gb, and never had problems. Now with
hbase managing the quorum I have 5gb ram, and it's probably overkill
but better safe than sorry.

On Tue, Jul 14, 2009 at 6:07 PM, Nitay<ni...@gmail.com> wrote:
> Hi Fernando,
>
> It is recommended that you run ZooKeeper separate from the Region Servers.
> On the memory side, our use of ZooKeeper in terms of data stored is minimal
> currently. However you definitely don't want it to swap and you want to be
> able to handle a large number of connections. A safe value would be
> something like 1GB.
>
> -n
>
> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla <fe...@alum.mit.edu> wrote:
>
>> So.. what's the recommendation for zookeeper?
>>
>> should I run zookeeper nodes on the same region servers?
>> should I run zookeeper nodes external to the region servers?
>> how much memory should I give zookeeper, if it's just used for hbase?
>>
>

Re: hbase/zookeeper

Posted by Nitay <ni...@gmail.com>.
Hi Fernando,

It is recommended that you run ZooKeeper separate from the Region Servers.
On the memory side, our use of ZooKeeper in terms of data stored is minimal
currently. However you definitely don't want it to swap and you want to be
able to handle a large number of connections. A safe value would be
something like 1GB.

-n
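
P.S. If the quorum peers do run standalone, a minimal way to pin the heap:
the stock zkServer.sh sources conf/java.env when present (verify on your
ZooKeeper release; the file below is one you create yourself):

# conf/java.env on each quorum peer
export JVMFLAGS="-Xmx1g"   # 1 GB cap; keep it below RAM so ZK never swaps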

On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla <fe...@alum.mit.edu> wrote:

> So.. what's the recommendation for zookeeper?
>
> should I run zookeeper nodes on the same region servers?
> should I run zookeeper nodes external to the region servers?
> how much memory should I give zookeeper, if it's just used for hbase?
>