Posted to user@hbase.apache.org by Ronen Itkin <ro...@taykey.com> on 2011/09/13 12:16:19 UTC

HBase best practice and Regions confusion

Hi all,
How are you?

I am new to HBase, and all I have been doing for the past week is reading
the information available on the web.
My goal is to master HBase from the system administrator's point of view and
to set up a Cloudera HBase cluster relying on HDFS storage (production on
Amazon Web Services).
Hadoop HDFS is running on EC2 Large instances (2 processing units, 7.5 GB
RAM each, 3 data nodes).
I am about to have 4 tables in HBase, and I was wondering what the best
practice is for my situation.
How many HRegionServers should I use? Will Large Amazon EC2 instances
be enough?

I am also confused about the -ROOT- and .META. regions, and about
how a client connects to HBase.
Where are these regions stored? How are they structured (rows,
columns)?
A client first contacts ZooKeeper and asks for the location of the -ROOT-
region. What happens next?
Please elaborate as much as you can.

Thanks and Best Regards,
*Ronen Itkin*

<http://www.taykey.com/>

RE: HBase best practice and Regions confusion

Posted by "Buttler, David" <bu...@llnl.gov>.
If you do that, then all data access will go over the network. Amazon's internal network is very busy, and you might see a lot of delays in processing data. This would be partially alleviated if you could run enough region servers to keep your entire table in memory in the block cache, but that is not a typical scenario, and it will not help at all with writes, as they must be flushed to disk (in the WAL) before they complete.
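
To make the read/write asymmetry concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions, not HBase code: the class, its fields, and the trip counter are all made up for illustration, and the cache is keyed by row rather than by block. It counts one network trip per WAL append and one per read that misses the in-memory cache.

```python
class ToyRegionServer:
    """Toy model of a region server whose HDFS datanodes are on other machines."""

    def __init__(self, hdfs):
        self.hdfs = hdfs          # dict standing in for data files on remote datanodes
        self.wal = []             # stands in for the write-ahead log, also kept in HDFS
        self.memstore = {}        # in-memory buffer of recent writes
        self.block_cache = {}     # in-memory cache filled by reads (per row here, for simplicity)
        self.network_trips = 0    # hypothetical counter of remote HDFS accesses

    def put(self, row, value):
        # The WAL entry has to reach HDFS before the write completes,
        # so every write pays for a network trip regardless of caching.
        self.network_trips += 1
        self.wal.append((row, value))
        self.memstore[row] = value

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        if row not in self.block_cache:
            # Cache miss: fetch from the remote datanodes and remember the result.
            self.network_trips += 1
            self.block_cache[row] = self.hdfs[row]
        return self.block_cache[row]


rs = ToyRegionServer({"a": 1})
rs.get("a")
rs.get("a")              # second read is served from the cache
reads_cost = rs.network_trips
rs.put("b", 2)
rs.put("c", 3)           # each write still goes over the network
print(reads_cost, rs.network_trips)
```

In this model repeated reads of the same row cost one trip in total, while each write costs one trip no matter how large the cache is.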

Dave

-----Original Message-----
From: Ronen Itkin [mailto:ronen@taykey.com] 
Sent: Tuesday, September 13, 2011 8:24 AM
To: user@hbase.apache.org
Subject: Re: HBase best practice and Regions confusion

Hi,

Thanks for the answer!
Another question: what should I take into account if I decide to run
HRegionServers on separate servers rather than on the HDFS datanodes?

Thanks!




Re: HBase best practice and Regions confusion

Posted by Ronen Itkin <ro...@taykey.com>.
Hi,

Thanks for the answer!
Another question: what should I take into account if I decide to run
HRegionServers on separate servers rather than on the HDFS datanodes?

Thanks!





-- 
*
Ronen Itkin*
Taykey | www.taykey.com

Re: HBase best practice and Regions confusion

Posted by Eric Charles <er...@gmail.com>.
On 13/09/11 05:26, Doug Meil wrote:
>
> Hi there-
>
> Regarding EC2, see this in the HBase book...
>
> http://hbase.apache.org/book.html#trouble.ec2
>

By the way, there's also the Whirr project (http://whirr.apache.org/),
which lets you deploy HBase on Amazon without trouble.

I can submit a patch if it makes sense to add a section to the book for
this.

> Regarding ROOT/META, see this in the HBase book
>
> http://hbase.apache.org/book.html#arch.catalog
>
http://ofps.oreilly.com/titles/9781449396107/adminapi.html could also help.

From my understanding, -ROOT- and .META. are system tables, although
they are persisted just like any other user table (via
store/memstore/hfile). You can even update them, at your own risk.

The client goes through ZooKeeper to find -ROOT-, then uses -ROOT- to find
the location of the relevant region of .META. Finally, .META. is used to
find the location of the user-space region of the target table.
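
As a rough illustration of that three-step lookup, here is a toy model in Python. It is not HBase client code: the server names, the "root-location" key, and the table contents are invented for the example, and it assumes -ROOT- is a single region. Each catalog is a sorted list of (region start key, location) pairs, and a lookup picks the region with the greatest start key that is not larger than the requested key.

```python
import bisect


def find_region(regions, key):
    """Return (start_key, location) of the region with the greatest start key <= key."""
    start_keys = [start for start, _ in regions]
    i = bisect.bisect_right(start_keys, key) - 1
    if i < 0:
        raise KeyError(key)
    return regions[i]


# Everything below is made-up sample data.
ZOOKEEPER = {"root-location": "rs1"}   # which server hosts -ROOT-

# -ROOT- (assumed to be one region): start key of each .META. region -> hosting server
ROOT = [(("", ""), "rs2"), (("orders", ""), "rs3")]

# .META. regions, keyed by hosting server: (table, region start row) -> hosting server
META = {
    "rs2": [(("events", ""), "rs4"), (("events", "m"), "rs5")],
    "rs3": [(("orders", ""), "rs4"), (("orders", "5000"), "rs6")],
}


def locate(table, row):
    """Return the servers visited: (-ROOT- host, .META. host, user region host)."""
    root_server = ZOOKEEPER["root-location"]                # step 1: ZooKeeper -> -ROOT-
    _, meta_server = find_region(ROOT, (table, row))        # step 2: -ROOT- -> .META. region
    (found_table, _), user_server = find_region(META[meta_server], (table, row))  # step 3
    if found_table != table:
        raise KeyError(table)                               # no such table in the toy catalog
    return root_server, meta_server, user_server


print(locate("events", "zebra"))
print(locate("orders", "1234"))
```

A real client would also cache the locations it finds, so most requests skip the first two steps; the toy model leaves that out.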

So .META. can span multiple regions; the lookup process allows for that.
What I'm not sure about is whether -ROOT- can span multiple regions (I
still have to look in the code). If it can, ZooKeeper would need multiple
entries. I guess the expected size of -ROOT- is small, so in most cases
it can reside in one region?

Thx.


-- 
Eric
http://about.echarles.net

Re: HBase best practice and Regions confusion

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

Regarding EC2, see this in the HBase book...

http://hbase.apache.org/book.html#trouble.ec2

Regarding ROOT/META, see this in the HBase book

http://hbase.apache.org/book.html#arch.catalog





