You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Vidhyashankar Venkataraman <vi...@yahoo-inc.com> on 2010/08/28 02:01:51 UTC

Initial region loads in hbase..

Hi guys,
  A couple of days back, I had posted a problem on regions taking too much time to load when I restart Hbase.. I have a table that has around 80 K regions on 650 nodes (!) ..
  I was checking the logs in the master and I notice that the time it takes from assigning a region to a region server to the point when it recognizes that the region is open in that server takes around 20-30 minutes!
   Apart from master being the bottleneck here, can you guys let me know what the other possible cases are as to why this may happen?

Thank you
Vidhya

Below is an example for region with start key 0000003404803994 where the assignment takes place at 22:59 while the confirmation that it got open came at 23:19...

2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. is not valid;  serverAddress=, startCode=0 unknown.
2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE




AND THEN,

2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020
2010-08-27 23:18:53,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as region is open...



Re: Initial region loads in hbase..

Posted by Stack <st...@duboce.net>.
Is the master working hard?  If so, maybe make this number bigger:

  <property>
    <name>hbase.regionserver.msginterval</name>
    <value>1000</value>
    <description>Interval between messages from the RegionServer to HMaster
    in milliseconds. Use a high value like 3000 for clusters with more than 10
    nodes. Default is 1 second so that HBase seems more 'live'.
    </description>
  </property>

All regionservers check in every second by default.  When they check
in during startup, they'll report status on the regions that are
currently being opened... maybe this is flooding the master?

For your bigger cluster you should probably up the above number anyways?

Are the regions being assigned out slowly?  If you tail regionserver
log, it'll report OPENING messages... are regions taking a long time
to move to OPEN?

St.Ack



On Fri, Aug 27, 2010 at 5:35 PM, Stack <st...@duboce.net> wrote:
> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>  The server has lots of regions to open and its taking time?  Is the
> meta table being beat up?  Could this be holding up region opens?
>
> Good luck V,
> St.Ack
>
>
> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
>> Hi guys,
>>  A couple of days back, I had posted a problem on regions taking too much time to load when I restart Hbase.. I have a table that has around 80 K regions on 650 nodes (!) ..
>>  I was checking the logs in the master and I notice that the time it takes from assigning a region to a region server to the point when it recognizes that the region is open in that server takes around 20-30 minutes!
>>   Apart from master being the bottleneck here, can you guys let me know what the other possible cases are as to why this may happen?
>>
>> Thank you
>> Vidhya
>>
>> Below is an example for region with start key 0000003404803994 where the assignment takes place at 22:59 while the confirmation that it got open came at 23:19...
>>
>> 2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. is not valid;  serverAddress=, startCode=0 unknown.
>> 2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
>> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
>> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
>> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
>> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>> 2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
>> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
>> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>>
>>
>>
>>
>> AND THEN,
>>
>> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
>> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020
>> 2010-08-27 23:18:53,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as region is open...
>>
>>
>>
>

Re: Initial region loads in hbase..

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 30, 2010 at 1:40 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Vidhya,
>
> Thanks for keeping it up :)
>
> If you trace all the references to 0000003404803994 in the region
> server, what do you see? I think that most of the time is spent
> opening the region sequentially, it'd be nice to confirm.
>
> J-D
>
> On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
>> Thank you for the responses guys.. I will be looking at the problem later this week, I guess. The initial eyeballing suggested that the RS's respond to the master's open very quickly.. And I saw our rrd stats (which is similar to ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I haven't explored enough :) I will be coming back to the problem soon and will let you know about my findings..
>>
>>  I have been running scans and planning to run incremental loads on the db and get some useful numbers.. Scans are blazing fast with just one storefile per region: 75-95 MBps per node! (this is if locality between RS and data is kind of preserved: i.e., after a major compact is issued)..
>>
>> I will update on my status soon..
>> Thank you again
>> Vidhya
>>
>> On 8/27/10 5:42 PM, "Jonathan Gray" <jg...@facebook.com> wrote:
>>
>> Vidhya,
>>
>> Could you post a snippet of an RS log during this time?  You should be able to see what's happening between when the OPEN message gets there and the OPEN completes.
>>
>> Like Stack said, it's probably that its single-threaded in the version you're using and with all the file opening, your NN and DNs are under heavy load.  Do you see io-wait or anything else jump up across the cluster at this time?  You have ganglia setup on this bad boy?
>>
>> JG
>>
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>> Stack
>>> Sent: Friday, August 27, 2010 5:36 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Initial region loads in hbase..
>>>
>>> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>>>  The server has lots of regions to open and its taking time?  Is the
>>> meta table being beat up?  Could this be holding up region opens?
>>>
>>> Good luck V,
>>> St.Ack
>>>
>>>
>>> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
>>> <vi...@yahoo-inc.com> wrote:
>>> > Hi guys,
>>> >  A couple of days back, I had posted a problem on regions taking too
>>> much time to load when I restart Hbase.. I have a table that has around
>>> 80 K regions on 650 nodes (!) ..
>>> >  I was checking the logs in the master and I notice that the time it
>>> takes from assigning a region to a region server to the point when it
>>> recognizes that the region is open in that server takes around 20-30
>>> minutes!
>>> >   Apart from master being the bottleneck here, can you guys let me
>>> know what the other possible cases are as to why this may happen?
>>> >
>>> > Thank you
>>> > Vidhya
>>> >
>>> > Below is an example for region with start key 0000003404803994 where
>>> the assignment takes place at 22:59 while the confirmation that it got
>>> open came at 23:19...
>>> >
>>> > 2010-08-27 22:59:02,642 DEBUG
>>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> is not valid;  serverAddress=, startCode=0 unknown.
>>> > 2010-08-27 22:59:02,643 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
>>> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
>>> Event NodeCreated with state SyncConnected with path
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>>> NodeCreated with path
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>>> Got zkEvent NodeCreated state:SyncConnected
>>> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
>>> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
>>> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
>>> base.master.HMaster>Created ZNode
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Created/updated
>>> UNASSIGNED zNode
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in state M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
>>> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
>>> Event NodeChildrenChanged with state SyncConnected with path
>>> /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>>> NodeChildrenChanged with path /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>>> Got zkEvent NodeChildrenChanged state:SyncConnected
>>> path:/hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,647 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning for
>>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
>>> nregions to assign=1, regions to give other servers than this=0,
>>> isMetaAssign=false
>>> > 2010-08-27 22:59:02,647 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning
>>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>>> > 2010-08-27 22:59:02,647 INFO
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning region
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> to b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 22:59:02,653 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
>>> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
>>> in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,653 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in state M2ZK_REGION_OFFLINE
>>> >
>>> >
>>> >
>>> >
>>> > AND THEN,
>>> >
>>> > 2010-08-27 23:18:53,591 INFO
>>> org.apache.hadoop.hbase.master.RegionServerOperation:
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> open on b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 23:18:53,591 INFO
>>> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in region .META.,,1 with startcode=1282940735627,
>>> server=b3130020.yst.yahoo.net:60020
>>> > 2010-08-27 23:18:53,677 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
>>> region is open...
>>> >
>>> >
>>> >
>>
>>
>

Just an FYI. hadoop-cacti-jtg
(http://www.jointhegrid.com/hadoop/index.jsp) had excellent graph
coverage for all the jmx attributes in hbase-20.3.

http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_region_blockcache.jsp
http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_master.jsp

Has their been any major changes/additions to hbase jmx recently?

Edward

Re: Initial region loads in hbase..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
>> If you trace all the references to 0000003404803994 in the region
>> server, what do you see? I think that most of the time is spent
>> opening the region sequentially, it'd be nice to confirm.

I have attached the log snippets on the master-RS-master conversation..
Vidhya

In the master:
2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d3
25932ff1. is not valid;  serverAddress=, startCode=0 unknown.2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK
_REGION_OFFLINE2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20
b9a239d325932ff1
2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239
d325932ff12010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/U
NASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff12010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130
640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
2010-08-27 22:59:02,657 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeDataChanged with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1





In the region server:
2010-08-27 22:59:02,658 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
2010-08-27 22:59:02,658 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
2010-08-27 22:59:02,659 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 with [RS2ZK_REGION_OPENING] expected version = 0
2010-08-27 22:59:02,660 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
2010-08-27 22:59:02,660 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Creating region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
2010-08-27 22:59:02,718 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://b3130080.yst.yahoo.net:4600/hbase/DocDB/73c0f8fdb8ffbc20b9a239d325932ff1/bigColumn/7281813866313705842, isReference=false, isBulkLoadResult=true, bulkLoadTS=1282903668546, majorCompaction=true
2010-08-27 22:59:02,720 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.; next sequenceid=1
2010-08-27 22:59:02,720 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 with [RS2ZK_REGION_OPENED] expected version = 1

(no more references in the region server..)



And finally, in the master:
2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d32
5932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020


Re: Initial region loads in hbase..

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Vidhya,

Thanks for keeping it up :)

If you trace all the references to 0000003404803994 in the region
server, what do you see? I think that most of the time is spent
opening the region sequentially, it'd be nice to confirm.

J-D

On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> Thank you for the responses guys.. I will be looking at the problem later this week, I guess. The initial eyeballing suggested that the RS's respond to the master's open very quickly.. And I saw our rrd stats (which is similar to ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I haven't explored enough :) I will be coming back to the problem soon and will let you know about my findings..
>
>  I have been running scans and planning to run incremental loads on the db and get some useful numbers.. Scans are blazing fast with just one storefile per region: 75-95 MBps per node! (this is if locality between RS and data is kind of preserved: i.e., after a major compact is issued)..
>
> I will update on my status soon..
> Thank you again
> Vidhya
>
> On 8/27/10 5:42 PM, "Jonathan Gray" <jg...@facebook.com> wrote:
>
> Vidhya,
>
> Could you post a snippet of an RS log during this time?  You should be able to see what's happening between when the OPEN message gets there and the OPEN completes.
>
> Like Stack said, it's probably that its single-threaded in the version you're using and with all the file opening, your NN and DNs are under heavy load.  Do you see io-wait or anything else jump up across the cluster at this time?  You have ganglia setup on this bad boy?
>
> JG
>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>> Stack
>> Sent: Friday, August 27, 2010 5:36 PM
>> To: user@hbase.apache.org
>> Subject: Re: Initial region loads in hbase..
>>
>> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>>  The server has lots of regions to open and its taking time?  Is the
>> meta table being beat up?  Could this be holding up region opens?
>>
>> Good luck V,
>> St.Ack
>>
>>
>> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
>> <vi...@yahoo-inc.com> wrote:
>> > Hi guys,
>> >  A couple of days back, I had posted a problem on regions taking too
>> much time to load when I restart Hbase.. I have a table that has around
>> 80 K regions on 650 nodes (!) ..
>> >  I was checking the logs in the master and I notice that the time it
>> takes from assigning a region to a region server to the point when it
>> recognizes that the region is open in that server takes around 20-30
>> minutes!
>> >   Apart from master being the bottleneck here, can you guys let me
>> know what the other possible cases are as to why this may happen?
>> >
>> > Thank you
>> > Vidhya
>> >
>> > Below is an example for region with start key 0000003404803994 where
>> the assignment takes place at 22:59 while the confirmation that it got
>> open came at 23:19...
>> >
>> > 2010-08-27 22:59:02,642 DEBUG
>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> is not valid;  serverAddress=, startCode=0 unknown.
>> > 2010-08-27 22:59:02,643 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
>> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeCreated with state SyncConnected with path
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeCreated with path
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeCreated state:SyncConnected
>> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
>> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
>> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
>> base.master.HMaster>Created ZNode
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Created/updated
>> UNASSIGNED zNode
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in state M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
>> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeChildrenChanged with state SyncConnected with path
>> /hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeChildrenChanged with path /hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeChildrenChanged state:SyncConnected
>> path:/hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,647 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Assigning for
>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
>> nregions to assign=1, regions to give other servers than this=0,
>> isMetaAssign=false
>> > 2010-08-27 22:59:02,647 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Assigning
>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>> > 2010-08-27 22:59:02,647 INFO
>> org.apache.hadoop.hbase.master.RegionManager: Assigning region
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> to b3130020.yst.yahoo.net,60020,1282940735627
>> > 2010-08-27 22:59:02,653 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
>> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
>> in state = M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,653 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in state M2ZK_REGION_OFFLINE
>> >
>> >
>> >
>> >
>> > AND THEN,
>> >
>> > 2010-08-27 23:18:53,591 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation:
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> open on b3130020.yst.yahoo.net,60020,1282940735627
>> > 2010-08-27 23:18:53,591 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in region .META.,,1 with startcode=1282940735627,
>> server=b3130020.yst.yahoo.net:60020
>> > 2010-08-27 23:18:53,677 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
>> region is open...
>> >
>> >
>> >
>
>

Re: Initial region loads in hbase..

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Tim,

The bulk of the documentation is here:
http://hbase.apache.org/docs/r0.89.20100726/metrics.html

HBase uses Hadoop Metrics in order to publish its metrics, which in
turn can be fed to Ganglia.

J-D

On Mon, Aug 30, 2010 at 10:25 AM, Time Less <ti...@gmail.com> wrote:
>> Like Stack said, it's probably that its single-threaded in the version
>> you're using and with all the file opening, your NN and DNs are under heavy
>> load.  Do you see io-wait or anything else jump up across the cluster at
>> this time?  You have ganglia setup on this bad boy?
>>
>
> I did a search for Ganglia monitoring of HBase and couldn't turn up anything
> useful. Is there a URL for HBase plugins for Ganglia you coul share?
>
> --
> timeless(ness)
>

Re: Initial region loads in hbase..

Posted by Time Less <ti...@gmail.com>.
> Like Stack said, it's probably that its single-threaded in the version
> you're using and with all the file opening, your NN and DNs are under heavy
> load.  Do you see io-wait or anything else jump up across the cluster at
> this time?  You have ganglia setup on this bad boy?
>

I did a search for Ganglia monitoring of HBase and couldn't turn up anything
useful. Is there a URL for HBase plugins for Ganglia you coul share?

-- 
timeless(ness)

Re: Initial region loads in hbase..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Thank you for the responses guys.. I will be looking at the problem later this week, I guess. The initial eyeballing suggested that the RS's respond to the master's open very quickly.. And I saw our rrd stats (which is similar to ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I haven't explored enough :) I will be coming back to the problem soon and will let you know about my findings..

 I have been running scans and planning to run incremental loads on the db and get some useful numbers.. Scans are blazing fast with just one storefile per region: 75-95 MBps per node! (this is if locality between RS and data is kind of preserved: i.e., after a major compact is issued)..

I will update on my status soon..
Thank you again
Vidhya

On 8/27/10 5:42 PM, "Jonathan Gray" <jg...@facebook.com> wrote:

Vidhya,

Could you post a snippet of an RS log during this time?  You should be able to see what's happening between when the OPEN message gets there and the OPEN completes.

Like Stack said, it's probably that its single-threaded in the version you're using and with all the file opening, your NN and DNs are under heavy load.  Do you see io-wait or anything else jump up across the cluster at this time?  You have ganglia setup on this bad boy?

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Friday, August 27, 2010 5:36 PM
> To: user@hbase.apache.org
> Subject: Re: Initial region loads in hbase..
>
> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>  The server has lots of regions to open and its taking time?  Is the
> meta table being beat up?  Could this be holding up region opens?
>
> Good luck V,
> St.Ack
>
>
> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
> > Hi guys,
> >  A couple of days back, I had posted a problem on regions taking too
> much time to load when I restart Hbase.. I have a table that has around
> 80 K regions on 650 nodes (!) ..
> >  I was checking the logs in the master and I notice that the time it
> takes from assigning a region to a region server to the point when it
> recognizes that the region is open in that server takes around 20-30
> minutes!
> >   Apart from master being the bottleneck here, can you guys let me
> know what the other possible cases are as to why this may happen?
> >
> > Thank you
> > Vidhya
> >
> > Below is an example for region with start key 0000003404803994 where
> the assignment takes place at 22:59 while the confirmation that it got
> open came at 23:19...
> >
> > 2010-08-27 22:59:02,642 DEBUG
> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> is not valid;  serverAddress=, startCode=0 unknown.
> > 2010-08-27 22:59:02,643 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeCreated with state SyncConnected with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeCreated with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeCreated state:SyncConnected
> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
> base.master.HMaster>Created ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created/updated
> UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeChildrenChanged with state SyncConnected with path
> /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeChildrenChanged with path /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeChildrenChanged state:SyncConnected
> path:/hbase/UNASSIGNED
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning for
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
> nregions to assign=1, regions to give other servers than this=0,
> isMetaAssign=false
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> > 2010-08-27 22:59:02,647 INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> to b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
> in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> >
> >
> >
> >
> > AND THEN,
> >
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> open on b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in region .META.,,1 with startcode=1282940735627,
> server=b3130020.yst.yahoo.net:60020
> > 2010-08-27 23:18:53,677 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
> region is open...
> >
> >
> >


RE: Initial region loads in hbase..

Posted by Jonathan Gray <jg...@facebook.com>.
Vidhya,

Could you post a snippet of an RS log during this time?  You should be able to see what's happening between when the OPEN message gets there and the OPEN completes.

Like Stack said, it's probably that its single-threaded in the version you're using and with all the file opening, your NN and DNs are under heavy load.  Do you see io-wait or anything else jump up across the cluster at this time?  You have ganglia setup on this bad boy?

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Friday, August 27, 2010 5:36 PM
> To: user@hbase.apache.org
> Subject: Re: Initial region loads in hbase..
> 
> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>  The server has lots of regions to open and its taking time?  Is the
> meta table being beat up?  Could this be holding up region opens?
> 
> Good luck V,
> St.Ack
> 
> 
> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
> > Hi guys,
> >  A couple of days back, I had posted a problem on regions taking too
> much time to load when I restart Hbase.. I have a table that has around
> 80 K regions on 650 nodes (!) ..
> >  I was checking the logs in the master and I notice that the time it
> takes from assigning a region to a region server to the point when it
> recognizes that the region is open in that server takes around 20-30
> minutes!
> >   Apart from master being the bottleneck here, can you guys let me
> know what the other possible cases are as to why this may happen?
> >
> > Thank you
> > Vidhya
> >
> > Below is an example for region with start key 0000003404803994 where
> the assignment takes place at 22:59 while the confirmation that it got
> open came at 23:19...
> >
> > 2010-08-27 22:59:02,642 DEBUG
> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> is not valid;  serverAddress=, startCode=0 unknown.
> > 2010-08-27 22:59:02,643 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeCreated with state SyncConnected with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeCreated with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeCreated state:SyncConnected
> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
> base.master.HMaster>Created ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created/updated
> UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeChildrenChanged with state SyncConnected with path
> /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeChildrenChanged with path /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeChildrenChanged state:SyncConnected
> path:/hbase/UNASSIGNED
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning for
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
> nregions to assign=1, regions to give other servers than this=0,
> isMetaAssign=false
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> > 2010-08-27 22:59:02,647 INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> to b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
> in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> >
> >
> >
> >
> > AND THEN,
> >
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> open on b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in region .META.,,1 with startcode=1282940735627,
> server=b3130020.yst.yahoo.net:60020
> > 2010-08-27 23:18:53,677 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
> region is open...
> >
> >
> >

Re: Initial region loads in hbase..

Posted by Stack <st...@duboce.net>.
In 0.20, open on a regionserver is single-threaded.  Could that be it?
 The server has lots of regions to open and its taking time?  Is the
meta table being beat up?  Could this be holding up region opens?

Good luck V,
St.Ack


On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> Hi guys,
>  A couple of days back, I had posted a problem on regions taking too much time to load when I restart Hbase.. I have a table that has around 80 K regions on 650 nodes (!) ..
>  I was checking the logs in the master and I notice that the time it takes from assigning a region to a region server to the point when it recognizes that the region is open in that server takes around 20-30 minutes!
>   Apart from master being the bottleneck here, can you guys let me know what the other possible cases are as to why this may happen?
>
> Thank you
> Vidhya
>
> Below is an example for region with start key 0000003404803994 where the assignment takes place at 22:59 while the confirmation that it got open came at 23:19...
>
> 2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. is not valid;  serverAddress=, startCode=0 unknown.
> 2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> 2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>
>
>
>
> AND THEN,
>
> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020
> 2010-08-27 23:18:53,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as region is open...
>
>
>

Re: Initial region loads in hbase..

Posted by Ryan Rawson <ry...@gmail.com>.
Hi,

No one has run HBase at that scale that I know of.  Perhaps you can
poke at the problem some more and come up with a working theory and
possibly a code fix?

Good luck!
-ryan

On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> Hi guys,
>  A couple of days back, I had posted a problem on regions taking too much time to load when I restart Hbase.. I have a table that has around 80 K regions on 650 nodes (!) ..
>  I was checking the logs in the master and I notice that the time it takes from assigning a region to a region server to the point when it recognizes that the region is open in that server takes around 20-30 minutes!
>   Apart from master being the bottleneck here, can you guys let me know what the other possible cases are as to why this may happen?
>
> Thank you
> Vidhya
>
> Below is an example for region with start key 0000003404803994 where the assignment takes place at 22:59 while the confirmation that it got open came at 23:19...
>
> 2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. is not valid;  serverAddress=, startCode=0 unknown.
> 2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
> 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
> 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> 2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
> 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>
>
>
>
> AND THEN,
>
> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
> 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020
> 2010-08-27 23:18:53,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as region is open...
>
>
>