Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2010/04/14 03:25:52 UTC
[jira] Commented: (HBASE-2441) ZK failures early in RS startup sequence cause infinite busy loop

    [ https://issues.apache.org/jira/browse/HBASE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856719#action_12856719 ] 

Todd Lipcon commented on HBASE-2441:
------------------------------------

I think I caused this by starting an RS while the master was down and then killing ZK. The first symptom was an NPE, because metrics wasn't initialized yet when abort() was called:
{code}
2010-04-13 17:40:28,495 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher 
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1263)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:373)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
{code}
After that, the RS looped forever with:
{code}
2010-04-13 18:00:19,158 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Start code already taken, trying another one
2010-04-13 18:00:19,158 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase/rs -- check quorum servers, currently=monster01.sf.cloudera.com:2222
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeRSLocation(ZooKeeperWrapper.java:586)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1339)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:428)
        at java.lang.Thread.run(Thread.java:619)
{code}
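
To make the failure mode concrete, here is a minimal standalone sketch of the ordering problem, not the actual HRegionServer code. All names in it (StartupAbortSketch, Metrics, abortUnsafe, abortSafe, registerLoop) are hypothetical, and the "safe" variant is only one possible reordering, not necessarily what the eventual patch does. It assumes the ZK event thread catches and logs the exception thrown by the watcher, as the "Error while calling watcher" line above suggests.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a region server; this is NOT the real HRegionServer. */
class StartupAbortSketch {
    /** Hypothetical metrics holder that is only created late in startup. */
    static final class Metrics {
        int aborts;
    }

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private Metrics metrics; // stays null until startup has finished

    void finishStartup() {
        metrics = new Metrics();
    }

    /** Buggy ordering: dereferences metrics before setting the stop flag. */
    void abortUnsafe() {
        metrics.aborts++;        // throws NullPointerException during early startup
        stopRequested.set(true); // never reached in that case
    }

    /** One possible fix (an assumption): set the stop flag first, then guard metrics. */
    void abortSafe() {
        stopRequested.set(true);
        if (metrics != null) {
            metrics.aborts++;
        }
    }

    boolean isStopRequested() {
        return stopRequested.get();
    }

    /**
     * Simulates a session-expired event followed by the registration loop.
     * maxAttempts bounds the loop so the sketch terminates; the real loop had no bound.
     * Returns the number of registration attempts made.
     */
    static int registerLoop(StartupAbortSketch rs, boolean safe, int maxAttempts) {
        try {
            if (safe) {
                rs.abortSafe();
            } else {
                rs.abortUnsafe();
            }
        } catch (NullPointerException e) {
            // Assumed behavior: the event thread logs the error and carries on.
        }
        int attempts = 0;
        while (!rs.isStopRequested() && attempts < maxAttempts) {
            attempts++; // stands in for a failed registration against the expired session
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("unsafe attempts: " + registerLoop(new StartupAbortSketch(), false, 5));
        System.out.println("safe attempts: " + registerLoop(new StartupAbortSketch(), true, 5));
    }
}
```

With the unsafe ordering the loop runs until the artificial cap is hit, because the stop flag was never set. With the safe ordering it exits before the first attempt.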

> ZK failures early in RS startup sequence cause infinite busy loop
> -----------------------------------------------------------------
>
>                 Key: HBASE-2441
>                 URL: https://issues.apache.org/jira/browse/HBASE-2441
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> If the RS loses its ZK session before it reports for duty, the abort() call will trigger an NPE, and then the stop boolean doesn't get toggled. The RS will then loop forever trying to register itself in the expired ZK session, and fill up the logs.
