Posted to user@hbase.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2009/04/09 08:02:49 UTC

How to check the distributed degree of table?

I made a table with 10,000 rows. BTW, it seems to be stored on a single node
at this time, and distributing it seems to take some time. Is that right?
Also, I would like to know whether the table has been distributed across the
nodes after bulk importing the data. Is there a tool for detecting this?

....
09/04/09 14:40:20 INFO mapred.JobClient:     Map output records=100000
09/04/09 14:40:20 INFO mapred.JobClient:     Reduce input records=100000
09/04/09 14:40:20 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/04/09 14:40:20 WARN mapred.JobClient: Use genericOptions for the
option -libjars
09/04/09 14:40:20 WARN mapred.JobClient: No job jar file set.  User
classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
0->d8g051.nhncorp.com:,00000000000,19,12-7842
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
1->d8g051.nhncorp.com:00000000000,19,12-7842,000000000000,15,3-6060
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
2->d8g051.nhncorp.com:000000000000,15,3-6060,000000000000,5,10-2207
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
3->d8g051.nhncorp.com:000000000000,5,10-2207,0000000000000,3,5-1310
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
4->d8g051.nhncorp.com:0000000000000,3,5-1310,
09/04/09 14:40:20 INFO mapred.JobClient: Running job: job_200904081716_0020
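
The split entries in this output already show where the map tasks will read
from: each one appears to have the form "index->host:keys", and all five name
the same host. A minimal Python sketch, assuming only that line format (it is
inferred from this log, not from HBase documentation), tallies the splits per
host. The names SPLIT_RE and splits_per_host are made up for illustration.

```python
import re
from collections import Counter

# Matches "split: <index>-><host>:" as printed by HTableInputFormatBase above.
# \s* also absorbs the line break that wraps each entry after "split:".
SPLIT_RE = re.compile(r"split:\s*(\d+)->([^:\s]+):")


def splits_per_host(log_text):
    """Return {host: number of input splits assigned to that host}."""
    return dict(Counter(host for _index, host in SPLIT_RE.findall(log_text)))


# The five split entries copied from the job output above.
LOG = """
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
0->d8g051.nhncorp.com:,00000000000,19,12-7842
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
1->d8g051.nhncorp.com:00000000000,19,12-7842,000000000000,15,3-6060
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
2->d8g051.nhncorp.com:000000000000,15,3-6060,000000000000,5,10-2207
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
3->d8g051.nhncorp.com:000000000000,5,10-2207,0000000000000,3,5-1310
09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
4->d8g051.nhncorp.com:0000000000000,3,5-1310,
09/04/09 14:40:20 INFO mapred.JobClient: Running job: job_200904081716_0020
"""

counts = splits_per_host(LOG)
for host, n in sorted(counts.items()):
    print(f"{host}: {n} split(s)")
if len(counts) <= 1:
    print("all splits are served by a single host")
```

Because the pattern tolerates whitespace after "split:", the same function
should work on both the wrapped text as mailed and the unwrapped console
output.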


-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: How to check the distributed degree of table?

Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, thanks for the nice information.

On Fri, Apr 10, 2009 at 10:31 AM, Ryan Rawson <ry...@gmail.com> wrote:
> Hey,
>
> In HBase, each table is split into regions.  Each region is a contiguous set
> of keys.  Once any specific region has a file that exceeds 256 MB, it is
> split in half into 2 regions.  The HBase master generally spreads these regions
> across all the regionservers.  I have not had problems with hot
> regionservers with too many regions.
>
> When a table starts out, it has 1 region.  If you don't hit the 256 MB limit,
> then you won't split, and thus won't have more than 1 region to distribute
> across regionservers.
>
> One way to get a handle on how many regions a table might have is querying
> .META. - if you search for the key 'table_name,,' and for the column
> 'info:regioninfo' you can get an upper bound (but not an exact count) of how
> many regions your table has.  To get the exact count, you have to parse the
> info:regioninfo and detect the split parent regions that don't really
> 'exist' but are kept around for garbage collection later.
>
> Just FYI, with 400m rows I get about 200-300 regions given a value size of
> about 30 bytes.  10,000 rows may not have been enough to trigger a split if
> the values are small.
>
> On Thu, Apr 9, 2009 at 4:48 PM, Edward J. Yoon <ed...@apache.org> wrote:
>
>> Can it be provided to clients as an API?
>>
>> I need to run two MR jobs consecutively, and I would like to add some
>> precondition checks in between, since I noticed that the second job always
>> runs on a single node.
>>
>> On Thu, Apr 9, 2009 at 8:07 PM, Jean-Daniel Cryans <jd...@apache.org>
>> wrote:
>> > I usually look in the web UI, clicking on the name of the table in the
>> > master page, to see if it's well distributed.
>> >
>> > Or are you thinking about a shell tool that tells you the level of
>> > distribution of a table?
>> >
>> > J-D
>>
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: How to check the distributed degree of table?

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

In HBase, each table is split into regions.  Each region is a contiguous set
of keys.  Once any specific region has a file that exceeds 256 MB, it is
split in half into 2 regions.  The HBase master generally spreads these regions
across all the regionservers.  I have not had problems with hot
regionservers with too many regions.

When a table starts out, it has 1 region.  If you don't hit the 256 MB limit,
then you won't split, and thus won't have more than 1 region to distribute
across regionservers.

One way to get a handle on how many regions a table might have is querying
.META. - if you search for the key 'table_name,,' and for the column
'info:regioninfo' you can get an upper bound (but not an exact count) of how
many regions your table has.  To get the exact count, you have to parse the
info:regioninfo and detect the split parent regions that don't really
'exist' but are kept around for garbage collection later.
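
The counting rule described here can be sketched in Python as follows. This is
not HBase client code: MetaRow is a hypothetical stand-in for one .META. row,
and the sketch assumes that a split parent is a row whose region info is
flagged both offline and split, with those two flags already parsed out of
info:regioninfo by the caller.

```python
from dataclasses import dataclass


@dataclass
class MetaRow:
    """Hypothetical stand-in for one .META. row; not an HBase class."""
    row_key: str   # assumed layout: "<table>,<start key>,<region id>"
    offline: bool  # assumed to come from the parsed info:regioninfo
    split: bool    # assumed to come from the parsed info:regioninfo


def count_regions(rows, table):
    """Return (upper_bound, exact) region counts for one table.

    upper_bound counts every row whose key starts with "<table>,".
    exact drops split parents (assumed: offline and split both set),
    which are only kept around until they are garbage collected.
    """
    prefix = table + ","
    candidates = [r for r in rows if r.row_key.startswith(prefix)]
    live = [r for r in candidates if not (r.offline and r.split)]
    return len(candidates), len(live)


rows = [
    MetaRow("table_name,,1", offline=True, split=True),    # split parent
    MetaRow("table_name,,2", offline=False, split=False),  # first daughter
    MetaRow("table_name,m,3", offline=False, split=False), # second daughter
    MetaRow("other_table,,4", offline=False, split=False), # different table
]
upper_bound, exact = count_regions(rows, "table_name")
print(upper_bound, exact)
```

Matching on the prefix "table_name," rather than "table_name" keeps a table
called, say, "table_name2" out of the count.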

Just FYI, with 400m rows I get about 200-300 regions given a value size of
about 30 bytes.  10,000 rows may not have been enough to trigger a split if
the values are small.
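
A rough back-of-the-envelope version of that estimate, as a Python sketch. It
assumes every row takes the same number of bytes on disk, a single store file,
and no compression, none of which is stated above; the function names are made
up for illustration. Only the 256 MB limit and the row and region counts come
from this message.

```python
SPLIT_THRESHOLD_BYTES = 256 * 1024 * 1024  # the 256 MB limit mentioned above


def rows_until_first_split(bytes_per_row):
    """Smallest row count whose total size exceeds the split threshold."""
    return SPLIT_THRESHOLD_BYTES // bytes_per_row + 1


def max_bytes_per_row(rows, regions):
    """Upper bound on stored bytes per row if every region were exactly full."""
    return regions * SPLIT_THRESHOLD_BYTES / rows


# Counting only the 30-byte values, 10,000 rows are far below the limit.
payload = 10_000 * 30
print(payload, "bytes of values vs", SPLIT_THRESHOLD_BYTES, "byte limit")

# Rows needed before 30-byte rows alone would pass the limit.
print(rows_until_first_split(30))

# 400M rows in about 250 regions bounds the stored size per row from above.
# Stored rows also carry keys, column names and timestamps, so this is
# larger than the 30-byte value size.
print(round(max_bytes_per_row(400_000_000, 250), 1))
```

Real regions sit somewhere below the limit, so the per-row figure is only an
upper bound, but either way 10,000 small rows stay orders of magnitude short
of a split.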

On Thu, Apr 9, 2009 at 4:48 PM, Edward J. Yoon <ed...@apache.org> wrote:

> Can it be provided to clients as an API?
>
> I need to run two MR jobs consecutively, and I would like to add some
> precondition checks in between, since I noticed that the second job always
> runs on a single node.
>
> On Thu, Apr 9, 2009 at 8:07 PM, Jean-Daniel Cryans <jd...@apache.org>
> wrote:
> > I usually look in the web UI, clicking on the name of the table in the
> > master page, to see if it's well distributed.
> >
> > Or are you thinking about a shell tool that tells you the level of
> > distribution of a table?
> >
> > J-D
>

Re: How to check the distributed degree of table?

Posted by "Edward J. Yoon" <ed...@apache.org>.
Can it be provided to clients as an API?

I need to run two MR jobs consecutively, and I would like to add some
precondition checks in between, since I noticed that the second job always
runs on a single node.
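
The kind of precondition check meant here could look like the following Python
sketch. It is hypothetical: it assumes the caller already has a mapping from
region name to the host serving it, obtained by whatever means, and the names
is_well_distributed, min_hosts and max_share are invented for illustration
rather than taken from any HBase API.

```python
from collections import Counter


def is_well_distributed(region_hosts, min_hosts=2, max_share=0.5):
    """Decide whether a table is spread widely enough to start the next job.

    region_hosts: dict mapping region name -> host serving that region.
    min_hosts:    minimum number of distinct hosts that must serve regions.
    max_share:    largest fraction of regions any single host may hold.
    """
    if not region_hosts:
        return False  # no regions known, so nothing can be confirmed
    per_host = Counter(region_hosts.values())
    largest_share = max(per_host.values()) / len(region_hosts)
    return len(per_host) >= min_hosts and largest_share <= max_share


# Five regions on one host: the second job should wait.
single = {f"region{i}": "d8g051.nhncorp.com" for i in range(5)}
print(is_well_distributed(single))

# Four regions over three hosts (host names are placeholders).
spread = {"r1": "hostA", "r2": "hostA", "r3": "hostB", "r4": "hostC"}
print(is_well_distributed(spread))
```

The two thresholds are arbitrary defaults; a driver program would call the
check between the two jobs and retry or abort when it returns False.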

On Thu, Apr 9, 2009 at 8:07 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> I usually look in the web UI, clicking on the name of the table in the
> master page, to see if it's well distributed.
>
> Or are you thinking about a shell tool that tells you the level of
> distribution of a table?
>
> J-D
>
> On Thu, Apr 9, 2009 at 2:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
>> I made a table with 10,000 rows. BTW, it seems to be stored on a single node
>> at this time, and distributing it seems to take some time. Is that right?
>> Also, I would like to know whether the table has been distributed across the
>> nodes after bulk importing the data. Is there a tool for detecting this?
>>
>> ....
>> 09/04/09 14:40:20 INFO mapred.JobClient:     Map output records=100000
>> 09/04/09 14:40:20 INFO mapred.JobClient:     Reduce input records=100000
>> 09/04/09 14:40:20 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the
>> same.
>> 09/04/09 14:40:20 WARN mapred.JobClient: Use genericOptions for the
>> option -libjars
>> 09/04/09 14:40:20 WARN mapred.JobClient: No job jar file set.  User
>> classes may not be found. See JobConf(Class) or
>> JobConf#setJar(String).
>> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
>> 0->d8g051.nhncorp.com:,00000000000,19,12-7842
>> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
>> 1->d8g051.nhncorp.com:00000000000,19,12-7842,000000000000,15,3-6060
>> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
>> 2->d8g051.nhncorp.com:000000000000,15,3-6060,000000000000,5,10-2207
>> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
>> 3->d8g051.nhncorp.com:000000000000,5,10-2207,0000000000000,3,5-1310
>> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
>> 4->d8g051.nhncorp.com:0000000000000,3,5-1310,
>> 09/04/09 14:40:20 INFO mapred.JobClient: Running job: job_200904081716_0020
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: How to check the distributed degree of table?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I usually look in the web UI, clicking on the name of the table in the
master page, to see if it's well distributed.

Or are you thinking about a shell tool that tells you the level of
distribution of a table?

J-D

On Thu, Apr 9, 2009 at 2:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
> I made a table with 10,000 rows. BTW, it seems to be stored on a single node
> at this time, and distributing it seems to take some time. Is that right?
> Also, I would like to know whether the table has been distributed across the
> nodes after bulk importing the data. Is there a tool for detecting this?
>
> ....
> 09/04/09 14:40:20 INFO mapred.JobClient:     Map output records=100000
> 09/04/09 14:40:20 INFO mapred.JobClient:     Reduce input records=100000
> 09/04/09 14:40:20 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the
> same.
> 09/04/09 14:40:20 WARN mapred.JobClient: Use genericOptions for the
> option -libjars
> 09/04/09 14:40:20 WARN mapred.JobClient: No job jar file set.  User
> classes may not be found. See JobConf(Class) or
> JobConf#setJar(String).
> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
> 0->d8g051.nhncorp.com:,00000000000,19,12-7842
> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
> 1->d8g051.nhncorp.com:00000000000,19,12-7842,000000000000,15,3-6060
> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
> 2->d8g051.nhncorp.com:000000000000,15,3-6060,000000000000,5,10-2207
> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
> 3->d8g051.nhncorp.com:000000000000,5,10-2207,0000000000000,3,5-1310
> 09/04/09 14:40:20 INFO mapred.HTableInputFormatBase: split:
> 4->d8g051.nhncorp.com:0000000000000,3,5-1310,
> 09/04/09 14:40:20 INFO mapred.JobClient: Running job: job_200904081716_0020
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>