Posted to user@hbase.apache.org by Xin Jing <xi...@microsoft.com> on 2008/07/22 08:28:07 UTC

Some question about HBase

Hi,

I am a new user of HBase and am curious about how HBase handles inserts. Could you please explain it in detail?

The question is: suppose I create a table (with only one column, to keep the description simple) and insert a huge amount of data into it. I understand it is a B-Tree-like storage structure; what is the mechanism for building the table?

1. When the table size exceeds a threshold, how is it split?

2. When inserting data into the table, is all the data held in memory? If not, how is good performance ensured?

3. Once all the data has been inserted, there must be a lot of files, and their sizes may differ to some extent (some files are several MB, while others may be several hundred MB). Do I need to make the file sizes similar, and if so, how?

Thanks
-Xin

Re: Some question about HBase

Posted by Jean-Daniel Cryans <jd...@gmail.com>.

Xin,

Comments inline.

Regards,

J-D

On Tue, Jul 22, 2008 at 2:28 AM, Xin Jing <xi...@microsoft.com> wrote:

> Hi,
>
> I am a new user of HBase and am curious about how HBase handles inserts.
> Could you please explain it in detail?
>
> The question is: suppose I create a table (with only one column, to keep
> the description simple) and insert a huge amount of data into it. I
> understand it is a B-Tree-like storage structure; what is the mechanism
> for building the table?
>
> 1. When the table size exceeds a threshold, how is it split?

Each table is divided into regions which are distributed among the region
servers (nodes) and each region splits when growing larger than a configured
size. This is described here:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion
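
As a rough illustration of the splitting idea only, here is a toy Python
model. It is not HBase code: the Region and Table classes, the row-count
threshold (max_rows_per_region), and the split-at-the-middle-key rule are
all assumptions made for the sketch. The real threshold is a configured
size rather than a row count, and the assignment of regions to region
servers is not modeled here.

```python
from bisect import bisect_right, insort


class Region:
    """Toy region: a sorted run of row keys starting at start_key."""

    def __init__(self, start_key, rows=None):
        self.start_key = start_key
        self.rows = rows if rows is not None else []


class Table:
    """Toy table: an ordered list of regions that split when they grow too large."""

    def __init__(self, max_rows_per_region):
        # Hypothetical threshold counted in rows, chosen to keep the sketch small.
        self.max_rows = max_rows_per_region
        # Start with a single region covering the whole key space.
        self.regions = [Region("")]

    def _find(self, key):
        # The owning region is the last one whose start_key is <= key.
        starts = [r.start_key for r in self.regions]
        return bisect_right(starts, key) - 1

    def put(self, key):
        i = self._find(key)
        region = self.regions[i]
        if key not in region.rows:
            insort(region.rows, key)  # keep row keys sorted
        if len(region.rows) > self.max_rows:
            # Split at the middle key: the upper half becomes a new region.
            mid = len(region.rows) // 2
            upper = Region(region.rows[mid], region.rows[mid:])
            region.rows = region.rows[:mid]
            self.regions.insert(i + 1, upper)


if __name__ == "__main__":
    table = Table(max_rows_per_region=4)
    for n in range(10):
        table.put(f"row{n:02d}")
    for r in table.regions:
        print(repr(r.start_key), r.rows)
```

Inserting keys in sorted order, as in the usage lines above, makes only the
last region grow and split, which is why the earlier regions in this toy
model end up half full.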

>
> 2. When inserting data into the table, is all the data held in memory?
> If not, how is good performance ensured?

Also described in the link above.

>
> 3. Once all the data has been inserted, there must be a lot of files, and
> their sizes may differ to some extent (some files are several MB, while
> others may be several hundred MB). Do I need to make the file sizes
> similar, and if so, how?

This is also described in the link above.

>
> Thanks
> -Xin