Posted to user@hbase.apache.org by MrE <el...@msn.com> on 2015/08/20 17:32:08 UTC

HBase on HDFS: proper way to setup

Hello,

I'm new to HBase, so pardon the stupid question.
HBase is meant to run on HDFS, I presume, although that is not the default in
the 'single host' setup.

My question is: assuming I have an HDFS cluster set up for storage (just HDFS),
what is the rule of thumb for deploying HBase instances? Should I have an
HBase instance on each HDFS node?
I assume the HBase instances should be close to the data to avoid network
latency, but do I need an HBase instance on each DataNode?
Is it of any use to have more HBase nodes than HDFS nodes?

All the basic tutorials explain setting up HBase on the local fs, and then
explain that to set up a cluster you 'just point to HDFS' for storage, but I
haven't found a clear explanation of how all these nodes should be arranged
together to be efficient.
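
For readers wondering what 'just point to HDFS' usually amounts to, here is a minimal sketch in Python that renders an hbase-site.xml. The property names (hbase.rootdir, hbase.cluster.distributed, hbase.zookeeper.quorum) are standard HBase settings; the host names, port, and the helper render_hbase_site are placeholders made up for illustration, not values from this thread.

```python
from xml.sax.saxutils import escape


def render_hbase_site(properties):
    """Render a dict of HBase settings as hbase-site.xml text."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical hosts and port: replace with your own NameNode and ZooKeeper nodes.
settings = {
    # Store HBase data in HDFS instead of the local filesystem.
    "hbase.rootdir": "hdfs://namenode.example.org:8020/hbase",
    # Run the Master and RegionServers as separate daemons (not standalone mode).
    "hbase.cluster.distributed": "true",
    # ZooKeeper ensemble used for cluster coordination.
    "hbase.zookeeper.quorum": "zk1.example.org,zk2.example.org,zk3.example.org",
}

xml_text = render_hbase_site(settings)
print(xml_text)
```

This only covers the storage pointer; it says nothing about which hosts should run RegionServers, which is the actual question here.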

Thanks for the help.
E



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-on-HDFS-proper-way-to-setup-tp4074047.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase on HDFS: proper way to setup

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hello E,

You got it right :) HBase works much more efficiently when RegionServers
are co-located with the DataNodes, i.e. at a 1:1 ratio, and that's what HBase
ops do in most deployments. However, I've seen deployments where ops choose
to deploy fewer RegionServers than DataNodes, or vice versa. There are
more caveats to having fewer RSs than DNs, especially around rebalancing of
HDFS blocks or when an RS goes down, and that deployment mode usually causes
more problems. Deploying multiple RSs on top of a single DN is
possible, but it depends on your workload and on whether the effort to get it
"right" is worth it.
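
As a rough illustration of the 1:1 rule of thumb, the sketch below compares a list of DataNode hosts with a list of RegionServer hosts and reports which hosts run both and which run only one. The host names and the function colocation_report are hypothetical; how you collect the two lists from your own cluster is up to you.

```python
def colocation_report(datanodes, regionservers):
    """Compare DataNode and RegionServer host lists for co-location."""
    dn = set(datanodes)
    rs = set(regionservers)
    return {
        # Hosts running both daemons: the recommended 1:1 layout.
        "colocated": sorted(dn & rs),
        # DataNodes without a RegionServer (fewer RSs than DNs).
        "datanode_only": sorted(dn - rs),
        # RegionServers without a local DataNode: no local HDFS blocks to read.
        "regionserver_only": sorted(rs - dn),
    }


# Hypothetical cluster layout for illustration only.
datanodes = ["node1", "node2", "node3"]
regionservers = ["node1", "node2", "node4"]

report = colocation_report(datanodes, regionservers)
for category, hosts in report.items():
    print(f"{category}: {', '.join(hosts) or '-'}")
```

An empty "datanode_only" and "regionserver_only" list means the cluster follows the 1:1 layout described above.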

cheers,
esteban.




--
Cloudera, Inc.



RE: HBase on HDFS: proper way to setup

Posted by MrE <el...@msn.com>.
Thanks for the very quick reply.
I am only planning right now, but I have a CoreOS-based cluster with HDFS units and HBase units. I just want to know whether I should have one of each on each node. Is it OK to start with fewer HBase instances and scale up later if needed? I guess the question is whether I 'need' to have an HBase instance per HDFS node, or whether it is totally up to me and the requirements of the application.
As for having more HBase nodes than HDFS nodes, I was thinking that for search-intensive applications it may be useful to have more 'search' nodes than data nodes, but maybe this doesn't apply well to HBase.
Thanks

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-on-HDFS-proper-way-to-setup-tp4074047p4074050.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase on HDFS: proper way to setup

Posted by Ted Yu <yu...@gmail.com>.
Whether to have an HBase instance on each data node depends on the amount of
data you have and the access pattern you expect.

bq. Is it of any use to have more HBase nodes than HDFS nodes?

I have never seen the above setup.

Do you have an HDFS cluster already? Can you let us know your use case?

Cheers
