Posted to issues@hbase.apache.org by "gaojinchao (JIRA)" <ji...@apache.org> on 2012/06/20 03:54:42 UTC
[jira] [Commented] (HBASE-4246) Cluster with too many regions cannot withstand some master failover scenarios

    [ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397214#comment-13397214 ] 

gaojinchao commented on HBASE-4246:
-----------------------------------

Hi, this also happened in our cluster when we restarted the whole cluster (it has 129723 regions).

2012-06-19 19:29:00,961 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 80400ccd4a1f3438cc23774ca8a88d17 with OFFLINE state
2012-06-19 19:29:00,965 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=80400ccd4a1f3438cc23774ca8a88d17
2012-06-19 19:29:00,966 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 7f1a56641906ae0a6cc6919bd927df76 with OFFLINE state
2012-06-19 19:29:00,969 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=7f1a56641906ae0a6cc6919bd927df76
2012-06-19 19:29:01,070 WARN org.apache.zookeeper.ClientCnxn: Session 0x137ed2eb936fb85 for server 172-16-6-1/172.16.6.1:2181, unexpected error, closing socket connection and attempting reconnect
2012-06-19 19:29:01,070 WARN org.apache.zookeeper.ClientCnxn: Session 0x137ed2eb936fb85 for server 172-16-6-1/172.16.6.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4670048 is out of range!
	at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:721)
	at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:880)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
2012-06-19 19:29:01,174 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:20000-0x137ed2eb936fb85 Unable to list children of znode /hbase/unassigned 
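
As a rough sanity check on the "Packet len4670048" value above, the size of a getChildren reply can be estimated from the region count. The sketch below is only an estimate under stated assumptions: each child znode name is a 32-character encoded region name (as in the log lines above), the reply carries a 16-byte header (xid, zxid, err) followed by a 4-byte child count, each name is written as a 4-byte length plus its bytes, and the client-side limit is 4096 * 1024 bytes when jute.maxbuffer is not set. The class and method names are hypothetical and exist only for illustration.

```java
public class PacketLenEstimate {
    // Assumed client-side default limit when the jute.maxbuffer
    // system property is not set.
    static final int DEFAULT_CLIENT_MAX = 4096 * 1024;

    // Estimate the size in bytes of a getChildren reply, assuming a
    // reply header of int xid + long zxid + int err, a 4-byte child
    // count, and a 4-byte length prefix before every child name.
    static long estimateChildrenReplyBytes(int children, int nameLen) {
        long header = 4 + 8 + 4;     // xid, zxid, err
        long vectorCount = 4;        // number of children
        long perChild = 4 + nameLen; // length prefix + name bytes
        return header + vectorCount + (long) children * perChild;
    }

    public static void main(String[] args) {
        // 129723 regions, 32-character encoded region names.
        long estimate = estimateChildrenReplyBytes(129723, 32);
        System.out.println("estimated reply bytes: " + estimate);
        System.out.println("exceeds assumed default limit: "
                + (estimate > DEFAULT_CLIENT_MAX));
    }
}
```

Under these assumptions the estimate for 129723 children is 4670048 bytes, which matches the length in the exception and is above the assumed 4194304-byte limit. If that is right, starting the master JVM with a larger -Djute.maxbuffer value would be one possible workaround until the number of znodes under /hbase/unassigned is reduced.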
                
> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4246
>                 URL: https://issues.apache.org/jira/browse/HBASE-4246
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.
