Posted to issues@hbase.apache.org by "gaojinchao (JIRA)" <ji...@apache.org> on 2012/06/20 03:54:42 UTC
[jira] [Commented] (HBASE-4246) Cluster with too many regions
cannot withstand some master failover scenarios
[ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397214#comment-13397214 ]
gaojinchao commented on HBASE-4246:
-----------------------------------
Hi, this also happened in our cluster when we restarted the whole cluster (it has 129723 regions).
2012-06-19 19:29:00,961 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 80400ccd4a1f3438cc23774ca8a88d17 with OFFLINE state
2012-06-19 19:29:00,965 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=80400ccd4a1f3438cc23774ca8a88d17
2012-06-19 19:29:00,966 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 7f1a56641906ae0a6cc6919bd927df76 with OFFLINE state
2012-06-19 19:29:00,969 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=7f1a56641906ae0a6cc6919bd927df76
2012-06-19 19:29:01,070 WARN org.apache.zookeeper.ClientCnxn: Session 0x137ed2eb936fb85 for server 172-16-6-1/172.16.6.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4670048 is out of range!
at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:721)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:880)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
2012-06-19 19:29:01,174 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:20000-0x137ed2eb936fb85 Unable to list children of znode /hbase/unassigned
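The packet length in the log above can be roughly reproduced from the region count. The sketch below is only a back-of-the-envelope estimate, not code from HBase or ZooKeeper: the class and method names are hypothetical, and the wire layout (a 16-byte reply header, a 4-byte child count, and a 4-byte length prefix per child name) is an assumption about how the listChildren response is serialized. It also assumes every child of /hbase/unassigned is a 32-character encoded region name, and that the client limit is the jute.maxbuffer system property with a 4 MB default.

```java
// Rough size estimate for a ZooKeeper getChildren response on /hbase/unassigned.
// All names here are hypothetical; the byte layout is an assumption (see notes below).
class ListChildrenSizeEstimate {

    // Assumed client-side default when the jute.maxbuffer property is not set (4 MB).
    static final int DEFAULT_MAX_BUFFER = 4096 * 1024;

    // Estimate the response size for childCount children whose names are nameLength bytes each.
    static long estimateResponseBytes(long childCount, int nameLength) {
        long replyHeader = 4 + 8 + 4;   // assumed: xid (int) + zxid (long) + err (int)
        long vectorCount = 4;           // assumed: int holding the number of children
        long perChild = 4 + nameLength; // assumed: int length prefix + name bytes
        return replyHeader + vectorCount + childCount * perChild;
    }

    // Mirrors the kind of range check that produces "Packet len... is out of range!".
    static boolean exceedsLimit(long packetLen, int limit) {
        return packetLen < 0 || packetLen >= limit;
    }

    public static void main(String[] args) {
        int limit = Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
        long estimate = estimateResponseBytes(129723, 32);
        System.out.println("estimated response bytes: " + estimate);
        System.out.println("client limit bytes:       " + limit);
        System.out.println("exceeds limit:            " + exceedsLimit(estimate, limit));
    }
}
```

Under these assumptions, 129723 regions give an estimate of 4670048 bytes, which matches the length reported in the exception and is above the 4194304-byte default. This suggests that, until the fix lands, the failure point depends on the number of znodes under /hbase/unassigned rather than on anything specific to our cluster.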
> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
> Key: HBASE-4246
> URL: https://issues.apache.org/jira/browse/HBASE-4246
> Project: HBase
> Issue Type: Bug
> Components: master, zookeeper
> Affects Versions: 0.90.4
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.96.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.