Posted to issues@hbase.apache.org by "Zhou wenjian (JIRA)" <ji...@apache.org> on 2012/08/21 09:51:38 UTC

[jira] [Created] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Zhou wenjian created HBASE-6625:
-----------------------------------

             Summary: If we have hundreds of thousands of regions getChildren will encounter zk exception
                 Key: HBASE-6625
                 URL: https://issues.apache.org/jira/browse/HBASE-6625
             Project: HBase
          Issue Type: Bug
            Reporter: Zhou wenjian
            Assignee: Zhou wenjian


2012-05-13 19:37:37,528 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=CreateNewTableWith100000Regions,\x05\xB3\x06g\xE8r\xBB]\x09\xCF,1336724029944.079cb2f8a375e66fa089291b82f2a03f. state=OFFLINE, ts=1336909053108
2012-05-13 19:37:37,528 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=CreateNewTableWith100000Regions,\x08s\x84\x88$7\xB1\xC4\xFCg,1336724030660.76c07780231942231013c7feb5e5eb14. state=OFFLINE, ts=1336909055089, server=dw76.kgb.sqa.cm4,60020,1336908983944
2012-05-13 19:37:37,528 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=CreateNewTableWith100000Regions,\x08s\x89\xCB\x9B\xF0\xE4\xCA\x97\xB0,1336724030660.fa38b9d8367387a64a327087cb43b3e0. state=OFFLINE, ts=1336909055089, server=dw76.kgb.sqa.cm4,60020,1336908983944
2012-05-13 19:37:37,528 INFO org.apache.hadoop.hbase.master.AssignmentManager: dw76.kgb.sqa.cm4,60020,1336908983944 unassigned znodes=58464 of total=120002 
2012-05-13 19:37:37,758 WARN org.apache.zookeeper.ClientCnxn: Session 0x13745fc2c8d0001 for server dw51.kgb.sqa.cm4/10.232.98.51:2180, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4320092 is out of range! 
        at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:710) 
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:869) 
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130) 
2012-05-13 19:37:37,860 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x13745fc2c8d0001 Unable to list children of znode /hbase-new4/unassigned
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-new4/unassigned 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) 
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:302) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:413) 
        at org.apache.hadoop.hbase.master.AssignmentManager.nodeChildrenChanged(AssignmentManager.java:759) 
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:314) 
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) 
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 
2012-05-13 19:37:37,861 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x13745fc2c8d0001 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-new4/unassigned 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) 
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:302) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:413) 
        at org.apache.hadoop.hbase.master.AssignmentManager.nodeChildrenChanged(AssignmentManager.java:759) 
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:314) 
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) 
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 
2012-05-13 19:37:37,861 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception reading unassigned children 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-new4/unassigned 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) 
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) 
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:302) 
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndGetNewChildren(ZKUtil.java:413) 
        at org.apache.hadoop.hbase.master.AssignmentManager.nodeChildrenChanged(AssignmentManager.java:759) 
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:314) 
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) 
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 
2012-05-13 19:37:37,861 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
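
The logged packet length can be reconstructed with a little arithmetic. The sketch below assumes ZooKeeper's Jute binary encoding for the reply (a 16-byte reply header of int xid, long zxid and int err, then an int child count, then each name as an int length followed by its bytes), that each child of /hbase-new4/unassigned is named by a region's 32-character encoded name (the hex suffix of the region names in the log), and that the listing contained all 120002 regions. The class and its methods are hypothetical.

```java
/**
 * Hypothetical helper: estimates the body length of a getChildren reply,
 * i.e. the value the client compares against its packet limit, assuming
 * Jute's binary encoding and fixed-length ASCII child names.
 */
public class GetChildrenReplySize {
    // ReplyHeader: int xid (4) + long zxid (8) + int err (4)
    static final long REPLY_HEADER_BYTES = 4 + 8 + 4;
    // Vector of strings: an int element count first...
    static final long VECTOR_COUNT_BYTES = 4;
    // ...then, per string, an int length prefix followed by the bytes
    static final long STRING_LENGTH_PREFIX_BYTES = 4;

    /** Reply length for `children` names of `nameLength` ASCII characters each. */
    public static long replyBytes(long children, int nameLength) {
        return REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES
                + children * (STRING_LENGTH_PREFIX_BYTES + nameLength);
    }

    /** Largest child count whose reply still satisfies len < limit. */
    public static long maxChildren(long limit, int nameLength) {
        long fixed = REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES;
        long perChild = STRING_LENGTH_PREFIX_BYTES + nameLength;
        // Need fixed + n * perChild <= limit - 1; integer division floors n.
        return (limit - 1 - fixed) / perChild;
    }

    public static void main(String[] args) {
        long defaultLimit = 4096L * 1024; // 4194304 bytes, the 4 MB default
        System.out.println(replyBytes(120002, 32));        // 4320092
        System.out.println(maxChildren(defaultLimit, 32)); // 116507
    }
}
```

Under these assumptions the estimate matches the logged "Packet len4320092" exactly, and the 4 MB default is crossed at just over 116,500 children; longer znode names would lower that threshold.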

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Zhou wenjian (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438522#comment-13438522 ] 

Zhou wenjian commented on HBASE-6625:
-------------------------------------

The log appears in 0.90.
ZK version: 3.3.3
It seems 3.4 is affected too.

When the client reads from ZK, it checks the packet length against ClientCnxn.packetLen, which defaults to 4 MB:
    if (len < 0 || len >= ClientCnxn.packetLen) {
        throw new IOException("Packet len" + len + " is out of range!");
    }
I think we could increase jute.maxbuffer when we start the cluster.
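
A minimal sketch of the check quoted above, assuming (as in the 3.3/3.4 clients) that the limit is read from the jute.maxbuffer Java system property with a 4096 * 1024 byte default. The class and method names are hypothetical; this mirrors the logic rather than reproducing ZooKeeper's ClientCnxn code.

```java
import java.io.IOException;

/**
 * Hypothetical sketch mirroring the client-side packet length check.
 * The limit comes from the jute.maxbuffer system property, default 4 MB.
 */
public class PacketLengthCheck {
    static final int DEFAULT_LIMIT = 4096 * 1024; // 4194304 bytes

    /** Returns the default when the property is unset or not a valid integer. */
    public static int packetLimit() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_LIMIT);
    }

    public static void checkLength(int len, int limit) throws IOException {
        if (len < 0 || len >= limit) {
            throw new IOException("Packet len" + len + " is out of range!");
        }
    }

    public static boolean accepted(int len, int limit) {
        try {
            checkLength(len, limit);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int observed = 4320092; // length from the log above
        System.out.println(accepted(observed, DEFAULT_LIMIT));   // false
        System.out.println(accepted(observed, 8 * 1024 * 1024)); // true
        System.out.println("effective limit: " + packetLimit());
    }
}
```

Running it as `java -Djute.maxbuffer=8388608 PacketLengthCheck` should report the raised limit. The ZooKeeper administrator documentation says jute.maxbuffer must be set on all servers and clients, so for HBase it would presumably need to go into the master and region server JVM options as well as the ZooKeeper servers' options.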
                

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439876#comment-13439876 ] 

Zhihong Ted Yu commented on HBASE-6625:
---------------------------------------

At the moment, merging regions is an action people don't usually want to tackle.
With a lowered hbase.regionserver.regionSplitLimit, some regions would grow almost without bound. How are we going to deal with that?
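
Ted's concern can be illustrated with a toy simulation. It assumes, as a simplification, that a region server stops splitting once it hosts regionSplitLimit regions and otherwise halves any region that reaches a size threshold, with writes spread evenly across regions. SplitLimitModel, the 10 GB threshold and the limit of 4 are hypothetical; this is not HBase's split code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not HBase code): a region server that halves any
 * region reaching splitSizeBytes, but only while it hosts fewer than
 * regionSplitLimit regions.
 */
public class SplitLimitModel {
    private final long splitSizeBytes;
    private final int regionSplitLimit;
    private final List<Long> regionSizes = new ArrayList<>();

    public SplitLimitModel(long splitSizeBytes, int regionSplitLimit) {
        this.splitSizeBytes = splitSizeBytes;
        this.regionSplitLimit = regionSplitLimit;
        regionSizes.add(0L); // start with one empty region
    }

    /** Spreads bytes evenly over all regions (remainder dropped), then splits. */
    public void write(long bytes) {
        int n = regionSizes.size();
        for (int i = 0; i < n; i++) {
            regionSizes.set(i, regionSizes.get(i) + bytes / n);
        }
        for (int i = 0; i < regionSizes.size(); i++) {
            if (regionSizes.size() >= regionSplitLimit) {
                break; // limit reached: oversized regions just keep growing
            }
            long size = regionSizes.get(i);
            if (size >= splitSizeBytes) {
                regionSizes.set(i, size / 2);
                regionSizes.add(size - size / 2); // daughter is rechecked later in this loop
            }
        }
    }

    public int regionCount() {
        return regionSizes.size();
    }

    public long largestRegion() {
        long max = 0;
        for (long size : regionSizes) {
            max = Math.max(max, size);
        }
        return max;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        SplitLimitModel limited = new SplitLimitModel(10 * gb, 4);
        SplitLimitModel unlimited = new SplitLimitModel(10 * gb, Integer.MAX_VALUE);
        for (int i = 0; i < 100; i++) { // 1000 GB of writes in 10 GB batches
            limited.write(10 * gb);
            unlimited.write(10 * gb);
        }
        System.out.println(limited.regionCount() + " regions, largest "
                + limited.largestRegion() / gb + " GB");
        System.out.println(unlimited.regionCount() + " regions, largest "
                + unlimited.largestRegion() / gb + " GB");
    }
}
```

With these inputs the limited model ends with 4 regions of 250 GB each, while the unlimited one keeps every region below 10 GB.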
                

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439928#comment-13439928 ] 

Jonathan Hsieh commented on HBASE-6625:
---------------------------------------

I feel that we should make merging regions a robust and constantly tested feature instead of just a script. There was some discussion about this with 0.92 because of HFile v2. Once we have that, we can write a long-running system test that merges/splits/merges/splits regions while a read/write load is going on.
                

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439953#comment-13439953 ] 

stack commented on HBASE-6625:
------------------------------

I like your suggestion of bounding the total, Jon (and the merge suggestion). It's simple. Let's get it in now and then have the discussion about how to go beyond this limitation.
                

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439936#comment-13439936 ] 

Jonathan Hsieh commented on HBASE-6625:
---------------------------------------

If we lowered regionsplit limit to something 10x what is considered reasonable?  Max int is quite large, 100k regions is also quite large.  If you have that many regions your are "doing it wrong" or purposely trying to break hbase. 

If we have 100k 10GB regions, that means 1 petabyte (10^15 bytes) of region data *per* region server. I believe the largest HDFS clusters haven't gotten to that size yet.
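A quick check of that arithmetic as a small Java program, assuming decimal units (1 GB = 10^9 bytes). RegionDataVolume is a hypothetical helper for illustration, not HBase code. It shows that 100,000 regions of 10 GB each come to 10^15 bytes, which is one petabyte; an exabyte would be 10^18 bytes.

```java
// Back-of-envelope data volume for N regions of a fixed size.
// Assumption: decimal units (GB = 10^9, PB = 10^15, EB = 10^18 bytes).
public class RegionDataVolume {
    static final long GB = 1_000_000_000L;
    static final long PB = 1_000_000L * GB;
    static final long EB = 1_000L * PB;

    /** Total bytes if every region is filled to regionSizeBytes; throws on long overflow. */
    static long totalBytes(long regions, long regionSizeBytes) {
        return Math.multiplyExact(regions, regionSizeBytes);
    }

    public static void main(String[] args) {
        long total = totalBytes(100_000L, 10L * GB);
        // prints: 1000000000000000 bytes = 1.0 PB = 0.001 EB
        System.out.println(total + " bytes = " + (double) total / PB + " PB = " + (double) total / EB + " EB");
    }
}
```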

I don't see the point of allowing that to happen (even accidentally). Setting it to something an order of magnitude larger than reasonable would hold us over for a year or so. :)

                

        

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439955#comment-13439955 ] 

Jonathan Hsieh commented on HBASE-6625:
---------------------------------------

I slightly misspoke in my previous comment: the exception is in the assignment manager, so this would be 100k regions per cluster. So it's a petabyte of data for the HBase cluster (not per region server). Still, large enough to last for at least a year or two. :)
                

        

[jira] [Commented] (HBASE-6625) If we have hundreds of thousands of regions getChildren will encounter zk exception

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439863#comment-13439863 ] 

Jonathan Hsieh commented on HBASE-6625:
---------------------------------------

jute.maxbuffer is a ZooKeeper setting. Maybe we should instead enforce a maximum number of regions per region server?
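To see why the getChildren reply on /hbase-new4/unassigned overflowed, here is a hedged Java estimate of the reply size. It assumes jute's encoding of the reply (a 16-byte header of xid, zxid and err, a 4-byte child count, then each child name as a 4-byte length prefix plus its bytes), 32-character encoded region names like those in the log, and a client-side limit of 4096 * 1024 bytes, which is our understanding of the ZooKeeper 3.3/3.4 client default when the jute.maxbuffer system property is unset. ChildrenReplySize and its methods are hypothetical, not ZooKeeper or HBase APIs.

```java
// Rough size estimate of a ZooKeeper getChildren reply, compared against an assumed
// client packet limit. Assumptions: 16-byte reply header (int xid, long zxid, int err),
// 4-byte vector count, 4-byte length prefix per child name, client rejects len >= limit.
public class ChildrenReplySize {
    static final int REPLY_HEADER_BYTES = 4 + 8 + 4; // xid + zxid + err
    static final int VECTOR_COUNT_BYTES = 4;
    static final int STRING_LENGTH_PREFIX_BYTES = 4;
    static final int ASSUMED_CLIENT_LIMIT = 4096 * 1024; // assumed default when jute.maxbuffer is unset

    /** Estimated reply length (excluding the 4-byte frame length) for children with equal-length names. */
    static long estimate(long children, int nameBytes) {
        return REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES
                + children * (STRING_LENGTH_PREFIX_BYTES + nameBytes);
    }

    /** Largest child count whose estimated reply stays strictly below the limit. */
    static long maxChildren(int nameBytes, int limit) {
        long perChild = STRING_LENGTH_PREFIX_BYTES + nameBytes;
        long budget = (long) limit - 1 - REPLY_HEADER_BYTES - VECTOR_COUNT_BYTES;
        return budget / perChild;
    }

    public static void main(String[] args) {
        // Encoded region names in the log are 32 hex characters, e.g. 079cb2f8a375e66fa089291b82f2a03f.
        int nameBytes = 32;
        long len = estimate(120_002, nameBytes);
        System.out.println("estimated reply length: " + len);                        // 4320092
        System.out.println("exceeds assumed limit: " + (len >= ASSUMED_CLIENT_LIMIT)); // true
        System.out.println("max children: " + maxChildren(nameBytes, ASSUMED_CLIENT_LIMIT)); // 116507
    }
}
```

Under those assumptions, 120002 children come to 20 + 120002 × 36 = 4,320,092 bytes, the same figure as "Packet len4320092" in the log, and about 116,507 children is the most that fits under the assumed limit. Raising jute.maxbuffer (a JVM system property, e.g. -Djute.maxbuffer=...) would move that ceiling but not remove it.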

Since most region servers should host at most about 100 regions on 0.92+ and about 1000 on 0.90, maybe we should instead change the default of hbase.regionserver.regionSplitLimit from MAX_INT to something like 1000 to avoid the problem.
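The split-limit idea can be sketched as a tiny Java model. It assumes, without checking the HBase source, that a region server stops splitting once its online region count reaches hbase.regionserver.regionSplitLimit; SplitLimitSketch, maySplit and clusterBound are hypothetical names used only for illustration.

```java
// Hypothetical model of a per-regionserver split limit (illustrative only, not HBase code).
// Assumption: a server may split only while its online region count is below the limit.
public class SplitLimitSketch {
    static final int MAX_INT_DEFAULT = Integer.MAX_VALUE; // default discussed in the comment
    static final int PROPOSED_DEFAULT = 1000;             // value suggested in the comment

    /** True if a server hosting onlineRegions regions may still split under splitLimit. */
    static boolean maySplit(int onlineRegions, int splitLimit) {
        return onlineRegions < splitLimit;
    }

    /** Rough cluster-wide region count if every server stops at splitLimit (ignores balancing and pre-split tables). */
    static long clusterBound(int regionServers, int splitLimit) {
        return (long) regionServers * splitLimit;
    }

    public static void main(String[] args) {
        System.out.println(maySplit(100_000, MAX_INT_DEFAULT));  // true: MAX_INT never stops splits in practice
        System.out.println(maySplit(999, PROPOSED_DEFAULT));     // true
        System.out.println(maySplit(1_000, PROPOSED_DEFAULT));   // false: limit reached
        System.out.println(clusterBound(100, PROPOSED_DEFAULT)); // 100000
    }
}
```

Because the limit applies per region server, the cluster-wide region count can still reach 100k (100 servers × 1000 regions). A split limit also would not stop a table that is created pre-split, as the CreateNewTableWith100000Regions table in the log appears to be.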
                