Posted to issues@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2016/06/03 21:08:59 UTC
[jira] [Commented] (HBASE-15406) Split / merge switch left disabled after early termination of hbck

    [ https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314814#comment-15314814 ] 

Enis Soztutar commented on HBASE-15406:
---------------------------------------

I've looked at this again, especially in relation to disabling the catalog janitor from HBCK in HBASE-15940. The patch as it stands only handles the split / merge switch and not the balancer (which is also disabled in master). I think we should disable the catalog janitor as well. But I think we should simplify this patch before 1.3 is released, since it is too hard to understand what is going on. The switches have 3 states? We call it a "lock", but save state there and then switch the state back? Sorry, but I think this is way too complex to be released. I thought the plan was to use an ephemeral node to track an active HBCK, but the final patch ended up doing something else.

The problem we are trying to solve is that during HBCK runs or some other "admin" operations, we should not have the balancer, catalog janitor and split/merge running. The difficulty is that an HBCK run is not tracked from the master, so if we disable these switches, they can be left disabled when the HBCK run is aborted.

Can we revert this patch and solve the root cause of the problem instead of adding all of this complexity? I propose we add a "Maintenance Mode" in master, similar to the region split / merge, balancer and other switches. The maintenance mode will effectively put all other switches in disabled mode. When an admin / HBCK puts the master in maintenance mode, she can optionally supply an ephemeral znode path that the master will watch. As soon as all ephemeral nodes go away, the master will go out of maintenance mode. Every instance of HBCK creates an ephemeral znode, so that even if more than one instance is running, there won't be issues if one finishes while the others are still going. wdyt?
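
To make the proposal concrete, here is a minimal in-memory sketch of the bookkeeping it implies. Everything in it is an assumption for illustration: MaintenanceModeTracker and its methods are hypothetical names, not HBase or ZooKeeper APIs, and the znode paths are made up. A real implementation would react to ZooKeeper watch events on ephemeral znodes rather than to direct method calls.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the proposed maintenance mode (not an HBase class).
 * Each tracked path stands in for an ephemeral znode owned by one HBCK instance.
 */
final class MaintenanceModeTracker {
    private final Set<String> activeNodes = new HashSet<>();

    /** An HBCK instance registered its ephemeral node. */
    synchronized void nodeCreated(String path) {
        activeNodes.add(path);
    }

    /** The node vanished: HBCK finished, or it was killed and its session expired. */
    synchronized void nodeDeleted(String path) {
        activeNodes.remove(path);
    }

    /** Maintenance mode lasts exactly as long as at least one node exists. */
    synchronized boolean inMaintenanceMode() {
        return !activeNodes.isEmpty();
    }

    /**
     * Balancer, catalog janitor, split and merge would all consult this.
     * The configured value is never overwritten, so there is nothing to restore.
     */
    synchronized boolean isEffectivelyEnabled(boolean configuredValue) {
        return configuredValue && !inMaintenanceMode();
    }

    public static void main(String[] args) {
        MaintenanceModeTracker tracker = new MaintenanceModeTracker();
        tracker.nodeCreated("/hypothetical/hbck-1");
        tracker.nodeCreated("/hypothetical/hbck-2");
        System.out.println(tracker.isEffectivelyEnabled(true)); // false: two runs active

        tracker.nodeDeleted("/hypothetical/hbck-1");            // first run ends or is killed
        System.out.println(tracker.isEffectivelyEnabled(true)); // false: one run still active

        tracker.nodeDeleted("/hypothetical/hbck-2");
        System.out.println(tracker.isEffectivelyEnabled(true)); // true: switches are back
    }
}
```

The point of this shape is that the switches themselves are never flipped: maintenance mode only masks them, so an aborted HBCK run cannot leave anything disabled once its node disappears.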

> Split / merge switch left disabled after early termination of hbck
> ------------------------------------------------------------------
>
>                 Key: HBASE-15406
>                 URL: https://issues.apache.org/jira/browse/HBASE-15406
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Heng Chen
>            Priority: Critical
>              Labels: reviewed
>             Fix For: 2.0.0, 1.3.0, 1.4.0
>
>         Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This was what I did on cluster with 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on gateway node of the cluster
> Terminate hbck early
> Enter hbase shell where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> Expectation is that the split / merge switches should be restored to default value after hbck exits.


