Posted to issues@hbase.apache.org by "Shrijeet Paliwal (Commented) (JIRA)" <ji...@apache.org> on 2012/01/11 06:32:43 UTC

[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

    [ https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183875#comment-13183875 ] 

Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

We just hit this issue today in production. We did not do an FS bootstrap (I assume that by FS bootstrap you mean cleaning the /hbase directory in HDFS). It was a regular day; one RS was throwing NotServingRegionException, so I went ahead and restarted it. It was not serving META or ROOT. Following this RS restart, hbck started reporting holes in regions.

Later, for some unexplainable, crazy and panicky reason, I restarted the Master and all other region servers. This is the point where the Master started complaining that META was in OPENED state in ZK, for a server which no longer exists. And as Todd explained in the other Jira, the Master went into an unending loop.

The workaround was to clear all files from the ZK data directory.
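For the record, a minimal sketch of that workaround, simulated here on a scratch directory so it is safe to run anywhere; on a real cluster ZK_DATA_DIR would be the dataDir configured in zoo.cfg, and every ZK server must be stopped first:

```shell
# Simulated sketch of the workaround: wipe the ZooKeeper data directory
# so ZK restarts with a clean (empty) state. We use a scratch directory
# here; in production, substitute the dataDir from zoo.cfg.
ZK_DATA_DIR="$(mktemp -d)"

# Fake the files ZooKeeper keeps under dataDir: transaction logs and
# snapshots live in the version-2 subdirectory.
mkdir -p "$ZK_DATA_DIR/version-2"
touch "$ZK_DATA_DIR/version-2/log.100000001" \
      "$ZK_DATA_DIR/version-2/snapshot.0"

# The actual workaround (with the ZK quorum stopped):
rm -rf "$ZK_DATA_DIR"/*

ls -A "$ZK_DATA_DIR"    # prints nothing: the data directory is empty
```

A more surgical alternative, if you only want to drop HBase's state rather than all of ZK's, would be deleting just the /hbase znode recursively from the ZK CLI; wiping the whole data directory is simply what we did in the heat of the moment.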

What do you think Stack: can the Master pick up a *stale* ZK state which is not a leftover from a previous HBase install, in other words a stale state created by itself?
                
> If a FS bootstrap, need to also ensure ZK is cleaned
> ----------------------------------------------------
>
>                 Key: HBASE-3638
>                 URL: https://issues.apache.org/jira/browse/HBASE-3638
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>
> In a test environment cycling through start, operate, kill hbase (repeat), we noticed that we were doing a bootstrap on startup but then picking up the previous cycle's ZK state. It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073, region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem: BOOTSTRAP: creating ROOT and first META regions
> .....
> {code}
> Then, after setting a watcher on .META., we get:
> {code}
> 2011-03-11 06:42:58,301 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 1028785192 references a server no longer up X.X.X; letting RIT timeout so will be assigned elsewhere
> {code}
> We're all confused.
> We should at least clear ZK if a bootstrap happened.
