Posted to issues@hbase.apache.org by "Jonathan Hsieh (Created) (JIRA)" <ji...@apache.org> on 2012/01/05 03:38:39 UTC

[jira] [Created] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

[uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
-----------------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-5128
                 URL: https://issues.apache.org/jira/browse/HBASE-5128
             Project: HBase
          Issue Type: New Feature
            Reporter: Jonathan Hsieh
            Assignee: Jonathan Hsieh


The current (0.90.5, 0.92.0rc2) versions of hbck detect most of the invariant violations (orphan detection is new).  However, with '-fix' they can only automatically repair the region consistency cases that involve deployment problems.  This updated version should be able to handle all cases.  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several META-hole related issues.

{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.  
 * 
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance. 
 * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no
 * overlapping regions.
 * 
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 * 
 * For table integrity repairs, the tables' region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there 
 * are any orphan regions (regions with no .regioninfo files), or holes, new 
 * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.  
 * 
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * hbase region servers or master do not need to be running.  This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 * 
 * Region consistency requires three conditions -- 1) valid .regioninfo file 
 * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 * 
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully.  Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
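
To make the table integrity invariant above concrete, here is a minimal sketch of how a region chain could be checked for degenerate, backwards, overlapping and missing key ranges. It is not the HBaseFsck code: the class RegionChainSketch, the Region holder and the message strings are hypothetical, and keys are modelled as Java Strings compared lexicographically rather than as byte arrays. An empty end key is assumed to mean "to the end of the table".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; this is not the HBaseFsck implementation. */
class RegionChainSketch {

    /** A region covering [start, end); an empty end key means "to the end of the table". */
    static final class Region {
        final String name, start, end;
        Region(String name, String start, String end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    /** Returns one message per integrity problem; an empty list means the chain is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sane = new ArrayList<>();
        // Pass 1: set aside individually broken regions.
        for (Region r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.name);
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.name);
            } else {
                sane.add(r);
            }
        }
        // Pass 2: walk the remaining regions in start-key order.
        sane.sort(Comparator.comparing((Region r) -> r.start));
        String covered = "";        // every key below this one is already covered
        boolean toTableEnd = false; // set once a region with an empty end key is seen
        for (Region r : sane) {
            if (toTableEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.name);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE before " + r.name);
            }
            if (r.end.isEmpty()) {
                toTableEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!toTableEnd) {
            problems.add("HOLE after last region");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Keys in ["g", "m") belong to no region, so a hole is reported before r2.
        List<Region> table = Arrays.asList(
            new Region("r1", "", "g"),
            new Region("r2", "m", ""));
        System.out.println(check(table));
    }
}
```

A repair tool would act on each reported problem (fabricate a region for a hole, merge overlapping regions, sideline backwards or degenerate ones); the sketch only covers detection.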



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228838#comment-13228838 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Stack  The 0.90 version has been used against versions that didn't use the offline method, and I don't think that the order matters.   I'll double check and report back before I attempt any commits.  What are your thoughts on getting it into 0.92.1rcX if rc0 doesn't make it (not blocking it, but getting it in if the window opens up)?  
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no
>  * overlapping regions.
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion.
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
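
As an illustration of the three region consistency conditions listed in the quoted comment, the sketch below classifies a single region from three observations: whether a .regioninfo file exists in HDFS, whether META holds a row for the region, and which region servers currently have it deployed. The class, enum and parameter names are hypothetical and are not taken from HBaseFsck; how the real tool gathers these observations is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; names and categories are not taken from HBaseFsck. */
class RegionConsistencySketch {

    enum Verdict { CONSISTENT, MISSING_REGIONINFO_IN_HDFS, MISSING_ROW_IN_META, DEPLOYMENT_MISMATCH }

    /**
     * @param inHdfs         a valid .regioninfo file exists in the region dir
     * @param inMeta         META has a valid row for the region
     * @param assignedServer the server the region was assigned to, or null if none
     * @param deployedOn     the servers that actually report the region as open
     */
    static Verdict classify(boolean inHdfs, boolean inMeta,
                            String assignedServer, Set<String> deployedOn) {
        if (!inHdfs) {
            return Verdict.MISSING_REGIONINFO_IN_HDFS;
        }
        if (!inMeta) {
            return Verdict.MISSING_ROW_IN_META;
        }
        // Condition 3: deployed on exactly one server, and it is the assigned one.
        boolean deployedOnlyWhereAssigned = assignedServer != null
            && deployedOn.size() == 1
            && deployedOn.contains(assignedServer);
        return deployedOnlyWhereAssigned ? Verdict.CONSISTENT : Verdict.DEPLOYMENT_MISMATCH;
    }

    public static void main(String[] args) {
        // A region open on two servers at once violates condition 3.
        Set<String> servers = new HashSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(classify(true, true, "rs1", servers));
        System.out.println(classify(true, true, "rs1", Collections.singleton("rs1")));
    }
}
```

The deployment observation is the transient one, which is in line with the comment's remark that region consistency problems are less risky to repair than HDFS-level ones.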


        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237315#comment-13237315 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Thanks for all the reviews, LarsH, Stack and Ted!  This has been committed to the 0.90/0.92/0.94/trunk branches.
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187127#comment-13187127 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ted sounds good.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236165#comment-13236165 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ted 

bq. w.r.t. fixDupeAssignment(), can we call it closeAndOfflineRegion() or something similar ?

This doesn't seem to capture the fact that there are multiple places where the region needs to be closed.  Maybe fixMultiAssignment?
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186106#comment-13186106 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4384
-----------------------------------------------------------


Did a quick flyby... Looks great.


src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/3435/#comment9856>

    I liked this better before :)



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/3435/#comment9857>

    Should we add a double check here that the region is in fact offline (by checking .META.) or is that too expensive/not-needed?
    
    I'm thinking, once this method exists folks will eventually call it for other reasons.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9858>

    Nice documentation. This tool is awesome.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9860>

    nice!



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9861>

    Yeah, strange that we do not follow posix here.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9863>

    <0.90.6?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9864>

    I think you said in the intro, that you need to check the availability of this rpc.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9859>

    I know this is not new, but this ErrorReporter is used for status messages as well as error reporting. Should maybe have a different name.
    
    Also, should messages go to STDOUT (out) and errors go to STDERR (err)?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/3435/#comment9865>

    No wait in case of exception. Is that by design?


- Lars


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for a full review and consideration for commit. It has some problems that I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses the assignment manager's in-memory state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing problems and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

       

[jira] [Updated] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Summary: [uber hbck] Online automated repair of table integrity and region consistency problems  (was: [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.)

Issue renamed to be more concise.
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and no overlapping regions.
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
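
The table integrity checks described in the comment above (degenerate regions, backwards regions, holes, overlaps) can be illustrated with a minimal sketch. It rests on loud assumptions: row keys are modelled as Strings instead of byte arrays, an empty key stands for an unbounded start or end, and the names TableIntegritySketch, Region and check are hypothetical and not taken from HBaseFsck. It only detects problems; it does not sideline, fabricate, or merge anything.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified model of the table integrity checks (not HBaseFsck code). */
final class TableIntegritySketch {

    /** A region covering [startKey, endKey); "" means unbounded. Illustrative only. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean endsAtInfinity() {
            return endKey.isEmpty();
        }
    }

    /** Returns one message per detected problem; an empty list means the table is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();

        // Pass 1: per-region checks (degenerate and backwards regions).
        for (Region r : regions) {
            if (!r.endsAtInfinity() && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE [" + r.startKey + "," + r.endKey + ")");
            } else if (!r.endsAtInfinity() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS [" + r.startKey + "," + r.endKey + ")");
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order, tracking coverage.
        valid.sort(Comparator.comparing((Region r) -> r.startKey));
        String covered = "";          // every key below this one is covered so far
        boolean coveredToEnd = false; // true once a region with an unbounded end is seen
        for (Region r : valid) {
            if (coveredToEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.startKey);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + "," + r.startKey + ")");
            }
            if (r.endsAtInfinity()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + covered + ",)");
        }
        return problems;
    }
}
```

Sorting by start key and tracking the highest covered key finds holes and overlaps in a single pass. In this sketch, degenerate and backwards regions are set aside first so that they do not distort the coverage walk.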

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236121#comment-13236121 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

bq. Thanks for the mega review
The other way around: after going through the code in detail, I can see your effort in this tool. Thank you on behalf of hbase users.

w.r.t. fixDupeAssignment(), can we call it closeAndOfflineRegion() or something similar?

My comments are just advice. As long as a few bugs are addressed (in patches for trunk and 0.92), I am fine with follow-on work.
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187128#comment-13187128 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ted sounds good.
                

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-v4.patch
                hbase-5128-0.94-v4.patch
                hbase-5128-0.92-v4.patch

Updated to address Ted's last concern, along with arcanist fixes and a handful of FindBugs fixes.
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237357#comment-13237357 ] 

Hudson commented on HBASE-5128:
-------------------------------

Integrated in HBase-0.92 #337 (See [https://builds.apache.org/job/HBase-0.92/337/])
    HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304667)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java

                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228994#comment-13228994 ] 

stack commented on HBASE-5128:
------------------------------

@Jon Should go into 0.92 as soon as it's ready. On...

bq.  I'll double check and report back before I attempt any commits.

That'd be cool.  If you don't do it, I will.  It's pretty critical we not break rolling restart.  Good on you Jon.
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237337#comment-13237337 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1771
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1771>
bq.  >
bq.  >     I suggest renaming holeStart as startRow and renaming holeStop as stopRow.
bq.  >     Then you don't need the comment on 1700.

Renamed to holeStartKey and holeStopKey to make it clear.  Added a log message to inform the user about the action.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1812
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1812>
bq.  >
bq.  >     Should include maxMerge in the log.

great suggestion.  done.
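
To illustrate the suggestion above, here is a hedged sketch of a merge safeguard that reports maxMerge in its log message. The class name MaxMergeGuardSketch, the method handleOverlapGroup, and the message wording are assumptions for illustration only; the patch's actual log text and control flow are not shown in this thread.

```java
import java.util.List;

/** Hypothetical sketch of a "mega merge" safeguard; not the HBaseFsck implementation. */
final class MaxMergeGuardSketch {

    /**
     * Returns a log-style message describing the action taken for one overlap group.
     * The group is merged only when it holds at most maxMerge regions.
     */
    static String handleOverlapGroup(List<String> overlap, int maxMerge) {
        if (overlap.size() > maxMerge) {
            // Including maxMerge in the message tells the operator why nothing was merged.
            return "Skipping merge: overlap group has " + overlap.size()
                + " regions, which exceeds maxMerge=" + maxMerge;
        }
        return "Merging " + overlap.size() + " regions (maxMerge=" + maxMerge + "): "
            + String.join(",", overlap);
    }
}
```

Printing the limit next to the group size lets an operator see from the log alone whether raising the limit would let the repair proceed.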


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1849
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1849>
bq.  >
bq.  >     I wonder whether we should bail if there have been two IOE's, one on 1759 and one here.

This is soft state (doesn't modify the file system) so I'm less adamant about hard stopping when these conditions are reached.


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1863
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1863>
bq.  >
bq.  >     'Creating' -> 'Created'

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1864
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1864>
bq.  >
bq.  >     Are newRegion and region representing the same entity ?

Good catch, changed to:

       LOG.info("Created new empty container region: " +
            newRegion + " to contain regions: " + Joiner.on(",").join(overlap));


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1872
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1872>
bq.  >
bq.  >     If mergeRegionDirs() returns 0 (or less), should we note (partial) failure in merging ?

hm.. it is possible to have multiple empty overlapping regions merged that do no HFile moves, which would still count as a fix.  I've changed where the return value is added to just increment HBaseFsck's fixes count by 1.
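
The counting change described in this reply can be sketched as follows. This is an assumption-laden illustration: OverlapFixCountSketch, its simplified mergeRegionDirs stand-in, and the integer lists that model HFile counts per region are all hypothetical, and only the idea of counting one fix per merged overlap group comes from the reply.

```java
import java.util.List;

/** Hypothetical sketch: count one fix per merged overlap group, not per moved HFile. */
final class OverlapFixCountSketch {

    /** Stand-in for a merge step; returns how many HFiles would be moved. */
    static int mergeRegionDirs(List<Integer> hfilesPerRegion) {
        int moved = 0;
        for (int n : hfilesPerRegion) {
            moved += n;
        }
        return moved;
    }

    /** Each overlap group counts as exactly one fix, even if no HFiles were moved. */
    static int countFixes(List<List<Integer>> overlapGroups) {
        int fixes = 0;
        for (List<Integer> group : overlapGroups) {
            mergeRegionDirs(group); // return value is informational only
            fixes++;
        }
        return fixes;
    }
}
```

With this counting, a group of empty overlapping regions still registers as a repair, which matches the case raised in the reply.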


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2159
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2159>
bq.  >
bq.  >     Should say 'unable to get regions from master' or something similar

"Fatal error: unable to get root region location. Exiting..."


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2298
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2298>
bq.  >
bq.  >     Please remove this.

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2299
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2299>
bq.  >
bq.  >     'with not' -> 'without'
bq.  >     Should also include some info on the entry.

"with no"


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2311
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2311>
bq.  >
bq.  >     Please remove this.

done


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2821
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2821>
bq.  >
bq.  >     Typo: maximum

k


bq.  On 2012-03-22 18:10:45, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2705
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2705>
bq.  >
bq.  >     Nit: name hdfsRegiondirModtime as hdfsRegionDirModTime

k


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6229
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  There has not been much live testing on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
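
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small, self-contained sketch. This is not the HBaseFsck implementation: the `Region` class, the `check` method, and the use of `String` keys (with an empty end key meaning "end of table") are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TableIntegritySketch {

    /** Hypothetical key range [startKey, endKey); an empty endKey means "end of table". */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Returns a description of each degenerate, backwards, overlapping region and each hole. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sane = new ArrayList<>();
        for (Region r : regions) {
            if (!r.endKey.isEmpty() && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE " + r);   // endkey == startkey
            } else if (!r.endKey.isEmpty() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS " + r);    // startkey sorts after endkey
            } else {
                sane.add(r);
            }
        }
        sane.sort(Comparator.comparing((Region r) -> r.startKey));

        String covered = "";     // every key below this value is already covered
        boolean toEnd = false;   // true once some region extends to the end of the table
        for (Region r : sane) {
            if (toEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("OVERLAP " + r);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + ", " + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                toEnd = true;
            } else if (!toEnd && r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!toEnd) {
            problems.add("HOLE [" + covered + ", <end>)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // One region ends at "b" and the next starts at "c", leaving a hole between them.
        List<Region> regions = Arrays.asList(new Region("", "b"), new Region("c", ""));
        System.out.println(check(regions));
    }
}
```

The sketch only detects problems; the repair steps the comment describes (fabricating, sidelining, or merging regions) are out of its scope.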

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236014#comment-13236014 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 489
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line489>
bq.  >
bq.  >     Shall we continue with the remaining HFiles ?
bq.  
bq.  jmhsieh wrote:
bq.      good point. changed break to continue.

Actually, I think I'm going to change this back to break for the time being -- fail fast and make the user do something about it until we get testing to make sure this recovery makes sense.
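
For readers following the break-versus-continue discussion, here is a minimal sketch of the two behaviors. It is not the code at HBaseFsck.java line 489; the `process` method, the `readable` predicate, and the file names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FailFastSketch {

    /**
     * Walks the files in order and returns the ones that were handled.
     * With failFast the loop stops at the first unreadable file (break);
     * otherwise it skips that file and carries on with the rest (continue).
     */
    static List<String> process(List<String> files, Predicate<String> readable, boolean failFast) {
        List<String> handled = new ArrayList<>();
        for (String file : files) {
            if (!readable.test(file)) {
                if (failFast) {
                    break;      // stop and leave the remaining files for the user to inspect
                }
                continue;       // skip the bad file and keep going
            }
            handled.add(file);
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "bad", "c");
        Predicate<String> readable = f -> !f.equals("bad");
        System.out.println(process(files, readable, true));
        System.out.println(process(files, readable, false));
    }
}
```

With fail fast, nothing after the first bad file is touched, which matches the stated preference to stop until the recovery path has been tested.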


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6208
-----------------------------------------------------------




                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198399#comment-13198399 ] 

Luke Lu commented on HBASE-5128:
--------------------------------

It seems to me that there is nothing in hbck that's HDFS-specific. The comments/variables/methods that refer to "Hdfs" should just use "Dfs", IMO.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237669#comment-13237669 ] 

stack commented on HBASE-5128:
------------------------------

Hurray!!!!

Would suggest you stick something in the release note section, Jon, as a means of spreading the good news about this fat tool.  What about this section in the reference manual: http://hbase.apache.org/book.html#hbck  Should we update it some?

Good stuff
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236997#comment-13236997 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review6304
-----------------------------------------------------------

Ship it!


Went through a third.  Minors below that should not hold up commit.   Get it in!!!  Great stuff Jon.


src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment13682>

    Good doc (though I've said this previously, it's still good doc)



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment13683>

    Why a TreeMap if the keys are encoded region names?  These are hashes, so is there any value in sorting them?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment13684>

    Ditto on sort here?  Why sort by table name?  How does sort prevent dupes?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment13685>

    This almost recommends that HBaseFsck becomes a shell that does nothing but instantiate another class that does the actual fixup.  clearState in that case would throw away the instantiated 'Fsck' class and create a completely new instance rather than zero out data members as this does.  For the future.
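
    A minimal sketch of the suggestion above, assuming hypothetical names (`FsckShellSketch`, `FsckRun`, `report`, `errorCount`) that do not appear in the patch: the outer class holds no per-run state itself, and clearState swaps in a fresh worker instead of zeroing fields one by one.

```java
import java.util.ArrayList;
import java.util.List;

final class FsckShellSketch {

    /** Hypothetical worker that owns all state accumulated during one run. */
    static final class FsckRun {
        final List<String> errors = new ArrayList<>();

        void report(String error) {
            errors.add(error);
        }
    }

    // The shell's only field is the current worker.
    private FsckRun run = new FsckRun();

    void report(String error) {
        run.report(error);
    }

    int errorCount() {
        return run.errors.size();
    }

    /** Discards the old worker entirely, so no field can be forgotten during a reset. */
    void clearState() {
        run = new FsckRun();
    }

    public static void main(String[] args) {
        FsckShellSketch shell = new FsckShellSketch();
        shell.report("hole in table t1");
        System.out.println(shell.errorCount());
        shell.clearState();
        System.out.println(shell.errorCount());
    }
}
```

    The appeal of this shape is that adding a new data member to the worker cannot leave stale state behind after a reset.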


- Michael


On 2012-03-23 16:13:50, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-23 16:13:50)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This should be nearly ready for integration.  This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.  
bq.  
bq.  - It needs to track HTableDescriptors instead of reading them from the file system.
bq.  - It uses a different HBaseFsckRepair.forceOfflineInZK method -- which for some reason means we don't need HBASE-5563.
bq.  - Uses HServerAddress instead of ServerName
bq.  
bq.  This version is close to what we've used on production clusters.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All TestHBaseFsck unit tests pass.  Currently running full suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193499#comment-13193499 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

It was also suggested that I need to worry about compactions due to an HRegion flush when I close regions during overlap merging.  At least in 0.90, this is not actually necessary -- the closeRegion path on the HMaster side does flush, but it ignores the return flag from internalFlushcache that specifies whether a region needs to be compacted.

                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables their region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
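
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be sketched in a few lines. This is a hypothetical illustration, not code from HBaseFsck: the RegionChainCheck class, its Region type, and the problem labels are invented for this example, keys are plain strings rather than byte arrays, and an empty string stands for the open start or end of the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a table integrity check; not the HBaseFsck code. */
class RegionChainCheck {
    /** A region covering [start, end). An empty end key means "to the end of the table". */
    static final class Region {
        final String start;
        final String end;
        Region(String start, String end) { this.start = start; this.end = end; }
    }

    /** Returns one message per degenerate region, backwards region, hole, or overlap. */
    static List<String> findProblems(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sane = new ArrayList<>();
        // Pass 1: set aside regions that are individually broken.
        for (Region r : regions) {
            boolean openEnd = r.end.isEmpty();
            if (!openEnd && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.start);
            } else if (!openEnd && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.start + ".." + r.end);
            } else {
                sane.add(r);
            }
        }
        // Pass 2: walk the remaining regions in start-key order.
        sane.sort(Comparator.comparing((Region r) -> r.start));
        String covered = "";          // every key below this one is already covered
        boolean coveredToEnd = false; // true once a region with an open end was seen
        for (Region r : sane) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.start);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE " + covered + ".." + r.start);
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE " + covered + "..");
        }
        return problems;
    }
}
```

An empty result means the chain is clean. In this sketch the individually broken regions are only reported; the sidelining, fabricating, and merging steps described in the comment are not modeled.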

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186290#comment-13186290 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-13 23:33:00, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1059
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1059>
bq.  >
bq.  >     Good question. They look the same to me.
bq.  >     I think one, possibly clearRegionFromTransition, should be removed.

Do you think I should remove this in this patch, or file a separate JIRA for it?


bq.  On 2012-01-13 23:33:00, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 1744
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68919#file68919line1744>
bq.  >
bq.  >     I think you meant regionOffline()

yes.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4378
-----------------------------------------------------------


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review and consideration for commit. It also has some problems I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing problems and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should only the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
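
For Problem 2 in the review request above, one option is for the test (or hbck itself) to poll until the cluster state settles instead of asserting immediately. The following is a minimal sketch under stated assumptions: WaitUntil and waitFor are hypothetical names, and the condition passed in (for example "hbck reports no errors") is left to the caller. It is not taken from the patch.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of the HBASE-5128 patch. */
class WaitUntil {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A bounded wait like this keeps a test deterministic when the transition finishes in time and still fails cleanly when it does not.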


                

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235412#comment-13235412 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6213
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13390>

    Please check return value from delete() call.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13392>

    You renamed it to regionInfoMap, right ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13391>

    I think we should handle RejectedExecutionException and re-submit the item.
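
A rough sketch of what "handle RejectedExecutionException and re-submit" could look like is shown here. The class name ResubmitOnReject, the method submitWithRetry, and the retry and pause parameters are assumptions made for illustration; the patch may handle rejection differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical retry wrapper; not the code under review. */
class ResubmitOnReject {
    /**
     * Hands the work item to the pool, pausing and retrying when the pool
     * rejects it (for example because its queue is full).
     * Returns the number of attempts that were needed.
     */
    static int submitWithRetry(ExecutorService pool, Runnable work,
                               int maxAttempts, long pauseMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                pool.execute(work);
                return attempt;
            } catch (RejectedExecutionException e) {
                // A pool that is shut down will never accept the item, so do not retry.
                if (pool.isShutdown() || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

Bounding the number of attempts avoids spinning forever if the pool never drains.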



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13393>

    Shall we log something, since these two calls may take some time?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13394>

    Please move this to line 1178



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13395>

    Indentation.


- Ted


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235912#comment-13235912 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 554
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line554>
bq.  >
bq.  >     Can we do this in the current JIRA ?
bq.  >     
bq.  >     Why do we need to reload for every type of fix ?

I'd rather do it in a follow-on issue.  Correctness first, then performance.  This patch is massive already.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 404
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line404>
bq.  >
bq.  >     Should be 'what are online'

"get regions according to what is online on each RegionServer"


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 418
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line418>
bq.  >
bq.  >     checkAndRestoreConsistency() would be a better name.

Every other variable is fix*, so I think it's OK to keep this one as fix* as well.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 435
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line435>
bq.  >
bq.  >     I think master.synchronousBalanceSwitch() is better candidate for this action.

I agree, but since this method is only in the trunk/0.94 branches, I'll file a follow-on issue for this.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 457
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line457>
bq.  >
bq.  >     the trailing s of '.regioninfos' should be removed.

"Orphaned regions are regions without a .regioninfo file in them."


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 484
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line484>
bq.  >
bq.  >     I don't see where the hf is closed.

good catch!


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 488
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line488>
bq.  >
bq.  >     Should hfile be added to a list so that we can report them collectively ?
bq.  >     
bq.  >     Currently user has to search the output of hbck.

From my point of view it is easier to keep these all on separate lines so we can grep the output.  Adding the word "orphan" to the log message.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 489
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line489>
bq.  >
bq.  >     Shall we continue with the remaining HFiles ?

good point. changed break to continue.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 501
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line501>
bq.  >
bq.  >     Help me understand this comparison:
bq.  >     are we shrinking the range here ?

Good catch! 

The goal here is to indeed expand the region to cover the range of all the hfiles.  
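
The intended behavior (widen the range, never shrink it) can be sketched as follows. KeyRangeUnion, compare, and cover are hypothetical names for this example, and the comparison is a hand-written unsigned lexicographic compare rather than HBase's own byte utilities. The result is the smallest first key and the largest last key seen; turning an inclusive last key into an exclusive region end key is a separate step that this sketch does not cover.

```java
import java.util.List;

/** Hypothetical sketch of covering the key range of several hfiles. */
class KeyRangeUnion {
    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Each element of fileRanges is {firstKey, lastKey} for one file.
     * Returns {smallest firstKey, largest lastKey}, or {null, null} if the list is empty.
     */
    static byte[][] cover(List<byte[][]> fileRanges) {
        byte[] lo = null;
        byte[] hi = null;
        for (byte[][] r : fileRanges) {
            // Move the lower bound down and the upper bound up; never inward.
            if (lo == null || compare(r[0], lo) < 0) {
                lo = r[0];
            }
            if (hi == null || compare(r[1], hi) > 0) {
                hi = r[1];
            }
        }
        return new byte[][] { lo, hi };
    }
}
```

The direction of the two comparisons is the whole point: reversing either one shrinks the range instead of expanding it.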


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 531
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line531>
bq.  >
bq.  >     Should read 'If there are errors to be fixed'

   * This method determines if there are table integrity errors in HDFS.  If
   * there are errors and the appropriate "fix" options are enabled, the method
   * will first correct orphan regions making them into legit regiondirs, and
   * then reload to merge potentially overlapping regions.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 567
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line567>
bq.  >
bq.  >     Some assertion here for the declared state (no holes) ?

removed no orphans, no holes from comment - the overlap repairs could happen if the hdfs hole fix options are off.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 655
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line655>
bq.  >
bq.  >     This exception isn't used.
bq.  >     Do we need it ?

not needed and removed.  I believe this is in the 0.90 version and a remnant of porting back and forth between versions.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 702
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line702>
bq.  >
bq.  >     Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?

The guard makes this execute only once per table.  In the 0.90 version, the way I got a TableInfo was via a method call to get the HRegionInfo/HTableDescriptor, and I actually checked for inconsistencies there.  In 0.92+ there is only the .tableinfo file, so this consistency check isn't relevant (though there should be other .tableinfo checks specific to 0.92+, which I can file as a follow-on).


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 800
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line800>
bq.  >
bq.  >     Please put this on line 734

done.


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 924
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line924>
bq.  >
bq.  >     rename() returns a boolean, should we check the return value ?

added check similar to the one in the following call to rename.
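
The pattern behind this comment and the earlier delete() comment is to treat a false return value as a failure instead of ignoring it. Hadoop's FileSystem.rename() and delete() report failure through a boolean; to stay self-contained, the sketch below uses java.io.File, whose renameTo() and delete() behave the same way in that respect. CheckedFileOps and its method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helpers that turn boolean failure codes into exceptions. */
class CheckedFileOps {
    /** Renames src to dst, failing loudly if the rename did not happen. */
    static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("Failed to rename " + src + " to " + dst);
        }
    }

    /** Deletes f, failing loudly if the delete did not happen. */
    static void deleteOrThrow(File f) throws IOException {
        if (!f.delete()) {
            throw new IOException("Failed to delete " + f);
        }
    }
}
```

Whether a failed rename should abort the repair or only be logged is a policy decision for the caller; the sketch only makes the failure visible.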


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 817
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line817>
bq.  >
bq.  >     Why is tablesInfo declared again ?

removed


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 642
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line642>
bq.  >
bq.  >     This exception isn't used.
bq.  >     Do we need it ?

Removed from here. Not needed in this version, but is used in 0.90 version.  


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6208
-----------------------------------------------------------




                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233141#comment-13233141 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-11 01:25:43, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 2652
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line2652>
bq.  >
bq.  >     Can we deprecate this method in 0.94 and remove it in 0.96 ?

Completed in HBASE-5588.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5823
-----------------------------------------------------------


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236773#comment-13236773 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519649/hbase-5128-v4.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                       org.apache.hadoop.hbase.mapred.TestTableMapReduce
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235965#comment-13235965 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 948
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line948>
bq.  >
bq.  >     Please check return value from delete() call.

done


bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1040
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1040>
bq.  >
bq.  >     You renamed it to regionInfoMap, right ?

yes


bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1076
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1076>
bq.  >
bq.  >     I think we should handle RejectedExecutionException and re-submit the item.

Follow on issue.  Failing hard here is probably good, and the change here was just more logging.


bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1235
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1235>
bq.  >
bq.  >     Shall we log something since these two calls may take some time.

Do you mean between the two calls?  The close silently method fails after a 120s timeout.


bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1257
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1257>
bq.  >
bq.  >     Please move this to line 1178

sure


bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1272
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1272>
bq.  >
bq.  >     Indentation.

sure


- jmhsieh
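
The RejectedExecutionException suggestion above was deferred to a follow-on issue, and the patch itself fails hard. As a rough sketch of what "handle and re-submit" could look like, assuming a plain `java.util.concurrent` executor (the class `ResubmitOnReject` and the method `submitWithRetry` are hypothetical names, not part of HBaseFsck):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

/** Illustrative sketch only; not taken from the patch under review. */
final class ResubmitOnReject {
  private ResubmitOnReject() {}

  /**
   * Tries to hand the task to the pool, backing off and retrying when the pool
   * rejects it. Returns true if the task was accepted, false if it was given up on.
   */
  static boolean submitWithRetry(ExecutorService pool, Runnable task,
      int maxAttempts, long backoffMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        pool.execute(task);
        return true;
      } catch (RejectedExecutionException e) {
        // A pool that has been shut down will never accept work, so retrying cannot help.
        if (pool.isShutdown() || attempt == maxAttempts) {
          return false;
        }
        try {
          Thread.sleep(backoffMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return false;
        }
      }
    }
    return false;
  }
}
```

Whether retrying is preferable to failing hard depends on why the pool rejected the work; a saturated bounded queue may drain, while a shut-down pool will not, which is why the sketch distinguishes the two.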


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6213
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235398#comment-13235398 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6208
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13371>

    Should be 'what are online'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13372>

    checkAndRestoreConsistency() would be a better name.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13373>

    I think master.synchronousBalanceSwitch() is better candidate for this action.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13374>

    the trailing s of '.regioninfos' should be removed.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13375>

    I don't see where the hf is closed.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13376>

    Should hfile be added to a list so that we can report them collectively ?
    
    Currently user has to search the output of hbck.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13377>

    Shall we continue with the remaining HFiles ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13379>

    Help me understand this comparison:
    are we shrinking the range here ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13380>

    Should read 'If there are errors to be fixed'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13382>

    Can we do this in the current JIRA ?
    
    Why do we need to reload for every type of fix ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13381>

    Some assertion here for the declared state (no holes) ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13383>

    This exception isn't used.
    Do we need it ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13384>

    This exception isn't used.
    Do we need it ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13385>

    Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13386>

    Please put this on line 734



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13387>

    Why is tablesInfo declared again ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13388>

    rename() returns a boolean, should we check the return value ?


- Ted
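
The review comments about checking the boolean results of rename() and delete() can be sketched as below. Hadoop's `FileSystem.rename` and `FileSystem.delete` return a boolean instead of throwing on some failures; to keep this example free of Hadoop dependencies it uses `java.io.File`, whose `renameTo` and `delete` behave similarly. `CheckedFileOps` and its methods are hypothetical helper names, not code from the patch.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch only; java.io.File stands in for Hadoop's FileSystem here. */
final class CheckedFileOps {
  private CheckedFileOps() {}

  /** Renames src to dst and fails loudly if the underlying call reports failure. */
  static void renameOrThrow(File src, File dst) {
    if (!src.renameTo(dst)) {
      throw new UncheckedIOException(
          new IOException("Failed to rename " + src + " to " + dst));
    }
  }

  /** Deletes the file or empty directory and fails loudly if nothing was deleted. */
  static void deleteOrThrow(File target) {
    if (!target.delete()) {
      throw new UncheckedIOException(
          new IOException("Failed to delete " + target));
    }
  }
}
```

Wrapping the calls this way means a silent `false` cannot be ignored by accident; a repair tool might instead log the failure and continue, depending on how risky the skipped step is.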




                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Description: 
The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.

Here's the approach (from the comment at the top of the new version of the file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.  
 * 
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance. 
 * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no overlapping
 * regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables and their region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * HBase region servers or master do not need to be running.  This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 * 
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully.  Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}



  was:
The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically handle deployment problems with region consistency cases.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.

Here's the approach (from the comment at the top of the new version of the file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.  
 * 
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance. 
 * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no overlapping
 * regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables and their region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * HBase region servers or master do not need to be running.  This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 * 
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully.  Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
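
For illustration only (this is not code from any of the attached patches), the table integrity invariants listed in the comment above could be checked roughly as follows. The class name, the Region type, and the problem labels are hypothetical, and String keys are a simplification: HBase compares row keys as byte arrays. An empty key stands for the open start or end of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a table integrity check; not the HBaseFsck implementation.
final class TableIntegritySketch {

    /** Stand-in for a region's key range [startKey, endKey); "" is an open boundary. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Returns one message per violated invariant; an empty list means the table is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();

        // Pass 1: set aside degenerate (endkey==startkey) and backwards regions.
        for (Region r : regions) {
            boolean openEnd = r.endKey.isEmpty();
            if (!openEnd && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE " + r);
            } else if (!openEnd && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS " + r);
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order, looking for holes and overlaps.
        valid.sort(Comparator.comparing((Region r) -> r.startKey));
        String covered = "";          // every key below this is covered so far
        boolean coveredToEnd = false; // true once a region with an open end key is seen
        for (Region r : valid) {
            if (coveredToEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("OVERLAP " + r);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + ", " + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + covered + ", <end of table>)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A table with a gap between "b" and "c".
        List<Region> regions = Arrays.asList(new Region("", "b"), new Region("c", ""));
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

The sketch only detects problems. The repairs described above (fabricating regions for holes, sidelining bad regions, merging overlaps) act on HDFS and are out of scope here.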



    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5128:
------------------------------

    Attachment: 5128-trunk.addendum

Addendum for trunk.
Hadoop QA can't run when compilation is broken.
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Description: 
The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically handle deployment problems with region consistency cases.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.

Here's the approach (from the comment at the top of the new version of the file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.  
 * 
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance. 
 * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no overlapping
 * regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables and their region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * HBase region servers or master do not need to be running.  This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 * 
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully.  Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}



  was:
The current (0.90.5, 0.92.0rc2) versions of hbck detect most of the invariant violations (orphans is new).  However, with '-fix' it can only automatically handle deployment problems with region consistency cases.  This updated version should be able to handle all cases.  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several META-hole related problems.

{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.  
 * 
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance. 
 * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no overlapping
 * regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables and their region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * HBase region servers or master do not need to be running.  This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 * 
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully.  Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
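
As a rough sketch of the three region consistency conditions named in the comment above (valid .regioninfo in HDFS, valid row in META, deployed only on the assigned region server), the classification might look like the following. The class, the Finding enum, and the method signature are hypothetical and are not taken from HBaseFsck. The sketch also assumes every region should be deployed, so it ignores cases such as disabled tables or regions that are mid-split.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a region consistency classification; not the HBaseFsck implementation.
final class RegionConsistencySketch {

    /** Illustrative outcome labels; these are not hbck's actual error codes. */
    enum Finding {
        OK, MISSING_IN_HDFS, MISSING_IN_META, NOT_DEPLOYED, MULTIPLY_DEPLOYED, DEPLOYED_ELSEWHERE
    }

    /**
     * @param hasRegionInfoInHdfs condition 1: a valid .regioninfo file exists in the region dir
     * @param hasRowInMeta        condition 2: a valid row for the region exists in META
     * @param assignedServer      the region server META says the region is assigned to
     * @param deployedOn          the region servers that actually report serving the region
     */
    static Finding check(boolean hasRegionInfoInHdfs, boolean hasRowInMeta,
                         String assignedServer, Set<String> deployedOn) {
        if (!hasRegionInfoInHdfs) {
            return Finding.MISSING_IN_HDFS;
        }
        if (!hasRowInMeta) {
            return Finding.MISSING_IN_META;
        }
        // Condition 3: deployed on exactly one server, and that server is the assigned one.
        if (deployedOn.isEmpty()) {
            return Finding.NOT_DEPLOYED;
        }
        if (deployedOn.size() > 1) {
            return Finding.MULTIPLY_DEPLOYED;
        }
        if (!deployedOn.contains(assignedServer)) {
            return Finding.DEPLOYED_ELSEWHERE;
        }
        return Finding.OK;
    }

    public static void main(String[] args) {
        Set<String> twoServers = new HashSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(check(true, true, "rs1", Collections.singleton("rs1")));
        System.out.println(check(true, true, "rs1", twoServers));
    }
}
```

The first two inputs can be gathered from HDFS and META alone, while the deployment set has to come from the live master and region servers, which is why this part of the check needs a successful connect().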



    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-v3.patch
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185889#comment-13185889 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

@Jonathan:
Can we see your patch?
Compatibility checks sound great.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182898#comment-13182898 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ted,

For #1, I'd ideally like the tool to be backwards compatible with existing 0.90 releases.  I think this version will work for older versions in cases where the problem is table region holes.  The compatibility problem only arises when attempting to repair overlapping regions.  If I need to modify servers to update the unassign/close API, I'll probably put warnings in the code so that the user is aware of potential issues when using hbck to fix older versions (or possibly ask the user to fail over to another master).

For #2, makes sense -- I'll spend more time digging into what is "in-motion" causing the flaky tests.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions.
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion.
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
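
The table integrity invariants in the comment above (no degenerate or backwards regions, no holes, no overlaps) can be illustrated with a small standalone check. This is only a sketch under stated assumptions, not the HBaseFsck implementation: RegionChainSketch, Range and check are hypothetical names, row keys are modeled as plain strings, and the empty string stands for the unbounded first start key or last end key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the HBaseFsck implementation.
final class RegionChainSketch {

    // A region covers [start, end). Assumption: keys are plain strings and
    // "" means "unbounded" (first start key or last end key of the table).
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns a description of every integrity problem found; an empty
    // list means each row key resolves to exactly one region.
    static List<String> check(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        List<Range> usable = new ArrayList<>();
        for (Range r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.start);
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.start + ".." + r.end);
            } else {
                usable.add(r);
            }
        }
        usable.sort(Comparator.comparing((Range r) -> r.start));

        String coveredTo = "";       // largest end key seen so far
        boolean reachedEnd = false;  // true once a region with end "" is seen
        for (Range r : usable) {
            if (reachedEnd || r.start.compareTo(coveredTo) < 0) {
                problems.add("OVERLAP at " + r.start);
            } else if (r.start.compareTo(coveredTo) > 0) {
                problems.add("HOLE " + coveredTo + ".." + r.start);
            }
            if (r.end.isEmpty()) {
                reachedEnd = true;
            } else if (r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE " + coveredTo + "..");
        }
        return problems;
    }
}
```

In this sketch the problems are only reported; the repair strategy described in the comment (fabricating regions for holes, sidelining backwards or degenerate regions, merging overlaps) would act on each reported case.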


        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237311#comment-13237311 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519754/hbase-5128-0.90-v4.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1289//console

This message is automatically generated.
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236011#comment-13236011 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1354
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1354>
bq.  >
bq.  >     Please log some information about this region

done


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1358
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1358>
bq.  >
bq.  >     Redundant 'with'

done


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1363
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1363>
bq.  >
bq.  >     'reassigned' -> 'reassign'

done


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1375
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1375>
bq.  >
bq.  >     It would be nice to create method so that this block of code and lines 1271-1289 can be unified.

used in 3 places, sure.


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1410
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1410>
bq.  >
bq.  >     Please remove unused code.

done


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1436
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1436>
bq.  >
bq.  >     (inMeta && inHdfs) appears more than once above, is there a chance that this case mistakenly falls into one of them ?

This logic is unchanged from before I started modifying hbck.  I think those cases are handled in the healthy section.


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1530
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1530>
bq.  >
bq.  >     checkRegionChain() is synchronous.
bq.  >     Can we share one TableIntegrityErrorHandler and set its tInfo in the loop ?

I generally prefer a style where internal variables are set once in the constructor and get/set methods are avoided, since that makes the lifecycle of the object simpler and makes it easier to parallelize in the future.  Since this is the body of the loop, it should be easy for the JVM to keep on the stack.
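
As a rough illustration of that preference (a hypothetical sketch, not code from the patch; HandlerPerTableSketch, HoleHandler and reportHoles are names made up here), a handler whose table is fixed at construction might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the HBase patch.
final class HandlerPerTableSketch {

    // The handler's table is fixed at construction: there is no setter,
    // so an instance can never be observed halfway between two tables.
    static final class HoleHandler {
        private final String tableName;
        private final List<String> report;

        HoleHandler(String tableName, List<String> report) {
            this.tableName = tableName;
            this.report = report;
        }

        void handleHole(String fromKey, String toKey) {
            report.add(tableName + ": hole " + fromKey + ".." + toKey);
        }
    }

    // One short-lived handler per loop iteration instead of one shared,
    // mutable handler. For illustration, every table reports the same hole.
    static List<String> reportHoles(List<String> tableNames) {
        List<String> report = new ArrayList<>();
        for (String name : tableNames) {
            HoleHandler handler = new HoleHandler(name, report);
            handler.handleHole("a", "b");
        }
        return report;
    }
}
```

Because each handler is immutable after construction, separate tables could later be handed to separate threads without sharing handler state (the shared report list in this sketch would then need to be made thread-safe).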


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1542
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1542>
bq.  >
bq.  >     This would eclipse the global counter, right ?

The return value is added to the HBaseFsck object's fixed field at call sites.  I'll rename it and add comments about the return value.


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1668
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1668>
bq.  >
bq.  >     This class can be private

ok


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1728
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1728>
bq.  >
bq.  >     This class can be private

ok


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1759
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1759>
bq.  >
bq.  >     The following four lines are repeated 3 times in this class.
bq.  >     Refactor and create a new method.

ok


bq.  On 2012-03-22 16:55:08, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1445
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1445>
bq.  >
bq.  >     Looking at fixDupeAssignment(), it really does region closing and offlining.
bq.  >     Can we give it a better name ?

Do you have a suggestion?  In my mind, the name reflects the higher-level goal of the combined actions -- it tries to fix regions that have been assigned to too many places.  I could see fixMultiAssignment as a slight improvement.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6224
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5128:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519405/hbase-5128-0.90-v2b.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1254//console

This message is automatically generated.)
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182417#comment-13182417 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

I'm posting a preliminary version that I'm currently testing on real clusters.  The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing.  It's got some problems I need some advice figuring out.  

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail.  I think this is due to a few things:

1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state -- disable uses out-of-date in-memory region assignments.  If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing issues and transient states.

What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installations it is meant to repair)?

Problem 2:

Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META.  This means the old and new regions are confused with each other and something is still happening asynchronously.  I think this is because the new region is being assigned and is still transitioning.  Sound about right?  To make the unit tests deterministic, should hbck wait for these to settle, or should just the unit tests wait?
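
If the waiting ends up on the unit test side, one option is a bounded polling loop. The following is only a sketch under that assumption: SettleWaitSketch and waitFor are hypothetical names rather than existing hbck or test utility methods, and the condition passed in (for example, "no regions are still in transition") would have to be supplied by the test.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of hbck or the HBase test utilities.
final class SettleWaitSketch {

    // Polls until settled reports true or timeoutMs elapses.
    // Returns true if the condition was met, false on timeout or interrupt.
    static boolean waitFor(BooleanSupplier settled, long timeoutMs, long intervalMs) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        while (!settled.getAsBoolean()) {
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A test would call waitFor after the repair step and fail with a clear message if it returns false, which turns a timing-dependent assertion failure into an explicit timeout.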
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185961#comment-13185961 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4375
-----------------------------------------------------------


I noticed a lot of extra whitespace.


src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/3435/#comment9835>

    Lots of extra whitespace


- Alex


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state -- disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing issues and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installations it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
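
Problem 2 in the quoted request asks whether the unit test should wait for assignments to settle. One option is a small polling helper on the test side. The following is a minimal sketch only: the class SettleWait, the method waitFor, and the caller-supplied condition are hypothetical names that appear in neither the patch nor HBase, and it is written against a current JDK rather than the Java version used by the 0.90 branch.

```java
import java.util.function.BooleanSupplier;

final class SettleWait {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test could pass a condition such as "a fresh hbck run reports no errors" (an assumption about how the test is structured) and fail if waitFor returns false.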


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
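
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small check over region boundaries. This is a sketch under stated assumptions, not the logic in HBaseFsck: keys are plain Strings instead of byte arrays, an empty string stands for an unbounded start or end key, and the class name RegionChainCheck and the problem labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RegionChainCheck {
  /** A region covering [start, end); "" means unbounded (simplified from byte[] keys). */
  static final class Region {
    final String start;
    final String end;
    Region(String start, String end) { this.start = start; this.end = end; }
  }

  /** Returns a description of each integrity problem found; empty means the chain is clean. */
  static List<String> check(List<Region> regions) {
    List<String> problems = new ArrayList<>();
    List<Region> chain = new ArrayList<>();
    for (Region r : regions) {
      if (!r.end.isEmpty() && r.start.equals(r.end)) {
        problems.add("DEGENERATE " + r.start);              // endkey == startkey
      } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
        problems.add("BACKWARDS " + r.start + ".." + r.end); // startkey > endkey
      } else {
        chain.add(r);
      }
    }
    chain.sort(Comparator.comparing((Region r) -> r.start));

    String coveredTo = "";      // key space is covered from the beginning up to this key
    boolean reachedEnd = false; // true once a region with an unbounded end key is seen
    for (Region r : chain) {
      int cmp = r.start.compareTo(coveredTo);
      if (reachedEnd || cmp < 0) {
        problems.add("OVERLAP at " + r.start);
      } else if (cmp > 0) {
        problems.add("HOLE " + coveredTo + ".." + r.start);
      }
      if (r.end.isEmpty()) {
        reachedEnd = true;
      } else if (r.end.compareTo(coveredTo) > 0) {
        coveredTo = r.end;
      }
    }
    if (!reachedEnd) {
      problems.add("HOLE " + coveredTo + "..");              // nothing covers the tail
    }
    return problems;
  }
}
```

Degenerate and backwards regions are set aside before the chain is walked, which mirrors the order in the comment: they are sidelined individually, and holes and overlaps are then judged on the remaining regions.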

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235359#comment-13235359 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6205
-----------------------------------------------------------


Nice work.


src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13364>

    I think tablesToFix would be a better name for this member.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13365>

    'encoded region name' would be clearer.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13366>

    TInfo should be TableInfo



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13367>

    Currently this config is hidden.
    It would be nice to mention it in release notes.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13368>

    fixes is a global variable.
    I think the loop condition should check that fixes increases across iterations.
    If the count doesn't increase, we can break out of the loop.
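
    The loop condition suggested here can be sketched as follows. This is only an illustration of the idea, assuming a cumulative fix counter like the global "fixes" mentioned above; RepairLoop, runUntilNoProgress, and the pass callback are hypothetical names, not code from the patch.

```java
import java.util.function.IntSupplier;

final class RepairLoop {
  /**
   * Runs repair passes until a pass adds no new fixes or maxPasses is reached.
   * totalFixes returns the cumulative number of fixes made so far.
   * Returns the number of passes that were run.
   */
  static int runUntilNoProgress(int maxPasses, Runnable pass, IntSupplier totalFixes) {
    int passes = 0;
    int before = totalFixes.getAsInt();
    while (passes < maxPasses) {
      pass.run();
      passes++;
      int after = totalFixes.getAsInt();
      if (after <= before) {
        break; // the count did not increase, so another pass would not help
      }
      before = after;
    }
    return passes;
  }
}
```

    The maxPasses bound stays in place as a backstop in case every pass keeps reporting new fixes.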



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13369>

    Why is 2 treated specially here?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13370>

    Ideally a different return value (say -2) should be used.


- Ted


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-trunk.patch
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237324#comment-13237324 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-23 19:53:18, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 191
bq.  > <https://reviews.apache.org/r/3435/diff/6/?file=95002#file95002line191>
bq.  >
bq.  >     Why a TreeMap if the keys are encoded region names?  These are hashes, so is there any value in sorting them?

I think you are right.  The sorting is necessary in the range-managing data structure but not here.  I'll file a follow-up for this and the following issue.
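
To illustrate the distinction: a map keyed by encoded region name only needs exact lookups, while a map keyed by start key benefits from sorted order. The sketch below is an assumption-laden example, not code from HBaseFsck. The class RegionIndexes, its fields, and the String keys are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class RegionIndexes {
  // Keyed by encoded region name (a hash): exact lookups only, so ordering adds nothing.
  final Map<String, String> byEncodedName = new HashMap<>();
  // Keyed by start key: sorted order lets us find the region that contains a given row.
  final TreeMap<String, String> byStartKey = new TreeMap<>();

  void add(String encodedName, String startKey) {
    byEncodedName.put(encodedName, startKey);
    byStartKey.put(startKey, encodedName);
  }

  /** Encoded name of the region with the greatest start key <= row, or null if none. */
  String regionFor(String row) {
    Map.Entry<String, String> entry = byStartKey.floorEntry(row);
    return entry == null ? null : entry.getValue();
  }
}
```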


bq.  On 2012-03-23 19:53:18, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 344
bq.  > <https://reviews.apache.org/r/3435/diff/6/?file=95002#file95002line344>
bq.  >
bq.  >     This almost recommends that HBaseFsck become a shell that does nothing but instantiate another class that does the actual fixup.  clearState in that case would throw away the instantiated 'Fsck' class and create a completely new instance rather than zero out data members as this does.  For the future.

I'll file a follow on jira for that too.
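
A rough sketch of the shape suggested in the review comment, where clearState swaps in a fresh worker rather than resetting fields one by one. The names FsckShell and FsckRun and the single error counter are hypothetical stand-ins for illustration; the real class carries far more state.

```java
final class FsckShell {
  /** Holds all per-run state, so a new run simply starts from a new instance. */
  static final class FsckRun {
    private int errors; // stand-in for the real per-run data members

    void recordError() { errors++; }

    int errorCount() { return errors; }
  }

  private FsckRun current = new FsckRun();

  FsckRun run() { return current; }

  /** Discards the old worker instead of zeroing each data member by hand. */
  void clearState() { current = new FsckRun(); }
}
```

With this layout a newly added field cannot be forgotten in clearState, because no field is reset individually.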


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review6304
-----------------------------------------------------------


On 2012-03-23 16:13:50, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-23 16:13:50)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This should be nearly ready for integration.  This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.  
bq.  
bq.  - It needs to track HTableDescriptors instead of reading them from the file system.
bq.  - It uses a different HBaseFsckRepair.forceOfflineInZK method -- which for some reason means we don't need HBASE-5563.
bq.  - Uses HServerAddress instead of ServerName
bq.  
bq.  This version is close to what we've used on production clusters.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All TestHBaseFsck unit tests pass.  Currently running full suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237531#comment-13237531 ] 

Hudson commented on HBASE-5128:
-------------------------------

Integrated in HBase-TRUNK #2694 (See [https://builds.apache.org/job/HBase-TRUNK/2694/])
    HBASE-5128 Addendum adds two new files Jon forgot to add (Revision 1304702)
HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304665)

     Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java

jmhsieh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java

                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, each table's region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-trunk-v2.patch
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235191#comment-13235191 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/
-----------------------------------------------------------

(Updated 2012-03-21 23:24:13.538416)


Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.


Changes
-------

Updated from Ted and Stack's reviews.

Highlights:
- usage and actual command line params renamed and updated.


Summary
-------

This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

1) No trackHTD method needed since we can read from the file system.
2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
3) Fixed comparator in HRegionInfo
4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
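
To make item 2 concrete, here is a hedged Java sketch of what such a safeguard could look like. It is an illustration, not the logic in the patch: the class MergeGuard, the method mayMerge, the maxMerge limit, and the convention that an empty table set means "all tables" are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; names and limits are not taken from the patch.
class MergeGuard {
    private final Set<String> tablesToRepair; // empty set = repair every table (assumed convention)
    private final int maxMerge;               // largest overlap group that may be merged automatically

    MergeGuard(Set<String> tablesToRepair, int maxMerge) {
        this.tablesToRepair = tablesToRepair;
        this.maxMerge = maxMerge;
    }

    /** True if an overlap group of the given size in the given table may be merged. */
    boolean mayMerge(String table, int overlappingRegions) {
        boolean tableSelected = tablesToRepair.isEmpty() || tablesToRepair.contains(table);
        // A merge needs at least two regions; very large groups are left for an operator.
        return tableSelected && overlappingRegions >= 2 && overlappingRegions <= maxMerge;
    }

    public static void main(String[] args) {
        MergeGuard guard = new MergeGuard(new HashSet<>(Arrays.asList("usertable")), 5);
        System.out.println(guard.mayMerge("usertable", 3));   // small group in a selected table
        System.out.println(guard.mayMerge("usertable", 200)); // "mega merge" is refused
        System.out.println(guard.mayMerge("other", 3));       // table not selected for repair
        MergeGuard all = new MergeGuard(Collections.<String>emptySet(), 5);
        System.out.println(all.mayMerge("other", 3));         // empty set selects every table
    }
}
```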

I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 

Diff: https://reviews.apache.org/r/4280/diff


Testing
-------

Unit tests cover many situations and pass.  Most "live" testing has been done on the 0.90.x versions, and many improvements and features were added from that experience.  There has not been much live testing of the trunk versions.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Affects Version/s: 0.96.0
                       0.94.0
               Status: Patch Available  (was: In Progress)

This version is getting very large, and though imperfect it is quite useful.  I would prefer to get this in and then add follow-on JIRAs to improve it.

Attached version for trunk.  Will backport to 0.92/0.94.  Also have version for 0.90 but would like to get trunk version in first now.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180138#comment-13180138 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

I've been working on a new version of hbck that solves a whole bunch of potential problems in HBase tables.   Currently it is implemented with a variant of 0.90 in mind -- there will likely be some minor work to port it to stock 0.90.5, and significant work required to port it to trunk / 0.92.

                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199310#comment-13199310 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4781
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10510>

    Is this wrong? Should it be "> 0" here and "< 0" below?


- Jimmy


On 2012-01-25 17:24:41, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-25 17:24:41)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing problems and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other, so something is still happening asynchronously. I think the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
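
On the question in Problem 2 of the quoted review request (whether hbck or the unit test should wait for assignments to settle): either choice comes down to polling a condition with a timeout. Below is a minimal Java sketch of such a helper. The class and method names (WaitFor, waitFor) are hypothetical and are not part of HBase or of the patch; the condition passed in would be whatever check reports that no regions are still in transition.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not an HBase API.
class WaitFor {

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "no regions in transition": becomes true on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean settled = waitFor(5_000, 10, () -> polls.incrementAndGet() >= 3);
        System.out.println("settled=" + settled + " after " + polls.get() + " polls");
    }
}
```

Keeping the wait in the test, rather than in hbck itself, would leave the tool's behavior unchanged while still making the assertions deterministic; putting it in hbck would also help operators who re-run a check right after a repair. Which is preferable is the open question the review request raises.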


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235719#comment-13235719 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6224
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13407>

    Please log some information about this region



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13408>

    Redundant 'with'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13409>

    'reassigned' -> 'reassign'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13410>

    It would be nice to create a method so that this block of code and lines 1271-1289 can be unified.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13411>

    Please remove unused code.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13413>

    (inMeta && inHdfs) appears more than once above. Is there a chance that this case mistakenly falls into one of them?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13414>

    Looking at fixDupeAssignment(), it really does region closing and offlining.
    Can we give it a better name ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13415>

    checkRegionChain() is synchronous.
    Can we share one TableIntegrityErrorHandler and set its tInfo in the loop ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13416>

    This would eclipse the global counter, right ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13420>

    This class can be private



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13421>

    This class can be private



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13422>

    The following four lines are repeated 3 times in this class.
    Refactor and create a new method.


- Ted


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no
>  * overlapping regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables and their region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
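
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be checked by walking the regions in start-key order. Below is a minimal sketch under stated assumptions, not hbck's implementation: all names are hypothetical, keys are Java strings compared with `String.compareTo` rather than HBase's byte-wise ordering, an empty key stands for the open start or end of the table, and the problems are reported as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a region chain check; not taken from HBaseFsck.
final class TableIntegritySketch {

    static final class Region {
        final String startKey; // "" means the start of the table
        final String endKey;   // "" means the end of the table

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> chain = new ArrayList<>();
        for (Region r : regions) {
            if (!r.startKey.isEmpty() && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE [" + r.startKey + "," + r.endKey + ")");
            } else if (!r.endKey.isEmpty() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS [" + r.startKey + "," + r.endKey + ")");
            } else {
                chain.add(r); // well-formed on its own; check it against its neighbours
            }
        }
        chain.sort(Comparator.comparing((Region r) -> r.startKey));

        String covered = "";          // every key below this is already covered
        boolean coveredToEnd = false; // true once a region with an open end key is seen
        for (Region r : chain) {
            if (coveredToEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.startKey);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + "," + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + covered + ",)"); // nothing reaches the end of the table
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("", "b"), new Region("c", "f"), new Region("e", ""));
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

In the repair strategy quoted above, a reported hole would lead to a fabricated region and an overlap to a merge; the sketch only detects the conditions and repairs nothing.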


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235340#comment-13235340 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519401/hbase-5128-0.90-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1253//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239959#comment-13239959 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Docs jira is here: HBASE-5634.  

                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "fulin wang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236488#comment-13236488 ] 

fulin wang commented on HBASE-5128:
-----------------------------------

1.NOT_IN_META_OR_DEPLOYED
  handler.handleHoleInRegionChain(key, holeStopKey);
  
  NOT_IN_META
  HBaseFsckRepair.fixMetaHoleOnline(conf, hbi.getHdfsHRI());
   
  I think you should first check the table's region files around the hole and any region that lies in the hole; only then create a region for the hole.
  Otherwise you should not create a region.
  There are scenarios you should consider: the table's region is good, or the region is a junk file.

2.FIRST_REGION_STARTKEY_NOT_EMPTY and HOLE_IN_REGION_CHAIN
  I think that when this is the only type of error, you can create an empty region for the hole.
  If there is another error, you should handle the other error first.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185985#comment-13185985 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4378
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/3435/#comment9838>

    I think you meant regionOffline()



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/3435/#comment9837>

    Good question. They look the same to me.
    I think one, possibly clearRegionFromTransition, should be removed. 


- Ted


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems that I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
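
Problem 2 in the review request above asks whether hbck or the unit test should wait for assignments to settle. Whichever side waits, one common way to make such a check deterministic is a bounded polling loop. The sketch below is only an illustration of that technique: the class and method names are hypothetical, and the condition (for example "no regions in transition") is supplied by the caller rather than taken from any HBase API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of HBase or of the patch under review.
final class SettleWaitSketch {

    /**
     * Polls the condition until it holds or the timeout expires.
     *
     * @return true if the condition held before the deadline, false on
     *         timeout or if the waiting thread was interrupted
     */
    static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!settled.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 50 ms after start.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("settled=" + ok);
    }
}
```

A test would call `waitUntil` after the repair step and fail if it returns false, which turns a transient state into either a pass or a clear timeout.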


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235273#comment-13235273 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519382/hbase-5128-0.92-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1252//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables and their region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
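
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small, self-contained sketch.  This is not the HBaseFsck implementation: the RegionRange and TableIntegritySketch classes are hypothetical, keys are plain Java strings compared lexicographically (HBase compares byte arrays), and an empty key stands for the start or end of the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a region: "" means start/end of the table. */
final class RegionRange {
    final String name;
    final String startKey;
    final String endKey;

    RegionRange(String name, String startKey, String endKey) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
    }
}

final class TableIntegritySketch {
    /** Returns a description of each violation; empty means the table is intact. */
    static List<String> check(List<RegionRange> regions) {
        List<String> problems = new ArrayList<>();
        List<RegionRange> chain = new ArrayList<>();

        // Pass 1: set aside regions that are individually broken.
        for (RegionRange r : regions) {
            if (!r.endKey.isEmpty() && r.startKey.equals(r.endKey)) {
                problems.add("degenerate: " + r.name);
            } else if (!r.endKey.isEmpty() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("backwards: " + r.name);
            } else {
                chain.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order.
        chain.sort(Comparator.comparing((RegionRange r) -> r.startKey));
        String coveredUpTo = "";      // every key below this is already covered
        boolean reachedEnd = false;   // true once some region runs to end of table
        for (RegionRange r : chain) {
            int cmp = r.startKey.compareTo(coveredUpTo);
            if (reachedEnd || cmp < 0) {
                problems.add("overlap: " + r.name);
            } else if (cmp > 0) {
                problems.add("hole: [" + coveredUpTo + ", " + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                reachedEnd = true;
            } else if (r.endKey.compareTo(coveredUpTo) > 0) {
                coveredUpTo = r.endKey;
            }
        }
        if (!reachedEnd) {
            problems.add("hole: [" + coveredUpTo + ", end of table)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<RegionRange> regions = new ArrayList<>();
        regions.add(new RegionRange("r1", "", "b"));
        regions.add(new RegionRange("r2", "b", "d"));
        regions.add(new RegionRange("r3", "c", "e"));   // overlaps r2
        regions.add(new RegionRange("r4", "g", ""));    // leaves a hole before it
        for (String p : check(regions)) {
            System.out.println(p);
        }
    }
}
```

In this sketch a first region whose start key is not empty shows up as a hole at the start of the table; the real tool may well report that case separately.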


        

[jira] [Issue Comment Edited] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186312#comment-13186312 ] 

Zhihong Yu edited comment on HBASE-5128 at 1/14/12 9:27 PM:
------------------------------------------------------------

I think we should keep offline() and deprecate clearRegionFromTransition(). 
Let's remove clearRegionFromTransition() in another JIRA. 
                
      was (Author: zhihyu@ebaysf.com):
    I think we should keep offline() and deprecate the other method. 
Let's remove that one in another Jira. 
                  
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237358#comment-13237358 ] 

Hudson commented on HBASE-5128:
-------------------------------

Integrated in HBase-0.94 #50 (See [https://builds.apache.org/job/HBase-0.94/50/])
    HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304666)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java

                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234581#comment-13234581 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1689
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line1689>
bq.  >
bq.  >     Precede regioninfo with a dot.

done


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2879
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2879>
bq.  >
bq.  >     Can we name this option fixRegionHolesOnHdfs ?
bq.  >     It would be better to note which options can be run with cluster offline.

At the moment, hbck can only be run while HBase is online.  This has not been unified with OfflineMetaRebuild yet.


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2880
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2880>
bq.  >
bq.  >     Name this fixRegionOverlapsOnHdfs ?

I'm not sure what making the flag even longer buys us.  I was thinking about making it even more concise: -fixHoles, -fixOverlaps.  The assumption in this tool is that the data in the file system is golden and that everything should be reconstructed from there (the previous version trusted only the meta table).


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2949
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2949>
bq.  >
bq.  >     white space.

fixed


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 28
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line28>
bq.  >
bq.  >     Should read 'callbacks for handling particular table integrity invariant violations detected.'

updated to be in English. :)


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 33
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line33>
bq.  >
bq.  >     Please add javadoc for the handleXXX methods on what scenario each fixes.

done


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 52
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line52>
bq.  >
bq.  >     This class should be abstract.
bq.  >     It is better to put it in its own file.

done.


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 38
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line38>
bq.  >
bq.  >     Since region always belongs to some table, I suggest naming this method handleNonEmptyRegionStartKey()

renamed to handleRegionStartKeyNotEmpty
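
For readers following along, the shape being discussed (a callback interface with one documented handleXXX method per violation, plus an abstract base class in its own file) might look roughly like the sketch below.  Only the name handleRegionStartKeyNotEmpty comes from this review; the other method names, the String parameters, and the IntegrityHandlerSketch, IntegrityHandlerBase and HoleRecorder types are hypothetical stand-ins, not the classes in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callbacks for table integrity invariant violations. */
interface IntegrityHandlerSketch {
    /** The first region of the table does not start at the empty key. */
    void handleRegionStartKeyNotEmpty(String region);

    /** A region whose start key equals its end key. */
    void handleDegenerateRegion(String region);

    /** No region covers the keys in [holeStart, holeEnd). */
    void handleHoleInRegionChain(String holeStart, String holeEnd);
}

/** Abstract no-op base: subclasses override only the cases they care about. */
abstract class IntegrityHandlerBase implements IntegrityHandlerSketch {
    @Override
    public void handleRegionStartKeyNotEmpty(String region) { }

    @Override
    public void handleDegenerateRegion(String region) { }

    @Override
    public void handleHoleInRegionChain(String holeStart, String holeEnd) { }
}

/** Example subclass that only records holes and ignores everything else. */
final class HoleRecorder extends IntegrityHandlerBase {
    final List<String> holes = new ArrayList<>();

    @Override
    public void handleHoleInRegionChain(String holeStart, String holeEnd) {
        holes.add("[" + holeStart + ", " + holeEnd + ")");
    }
}
```

Keeping the base class abstract and free of behavior means a checker can call every callback unconditionally, while a repairing handler and a report-only handler differ just in which methods they override.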


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2878
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2878>
bq.  >
bq.  >     More than one option modifies .META. table.
bq.  >     Shall we name this option fixMetaUsingRegionInfoOnHdfs ?

There are two kinds of flags here -- individual flags like -fixAssignments, -fixMeta, -fixHdfs*, and combo flags that enable a few of them, such as -fixAll.  I'm going to change the combo flags to make them more distinct;  I'll change -fixAll to be -allFix or something like that to make it clearer.  I also need to update the usage info to be more accurate.


bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2876
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2876>
bq.  >
bq.  >     Looking at code @ line 2835 below, it seems -fixAssignments and -fix are equivalent.
bq.  >     What was the reason for deprecating -fix ?

-fix and -fixAssignments are equivalent to the original application's behavior.  I didn't want our front line supporters to use -fix assuming old behavior and have it fix durable state (hdfs modifications), so I added other flags to enable those modifications.  With the other options it seemed like changing the name to be consistent made sense. 
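
To make the flag scheme concrete, here is a minimal sketch of the idea: -fix stays an alias for the assignment-only repair, each durable-state repair needs its own flag, and a combo flag switches on several at once.  The RepairKind enum and HbckFlagSketch class are hypothetical, the -fixHdfsHoles and -fixHdfsOverlaps spellings are assumed stand-ins for the "-fixHdfs*" flags mentioned above, and treating -fixAll as "enable everything" is an assumption as well.

```java
import java.util.EnumSet;

/** Illustrative repair categories; not hbck's internal names. */
enum RepairKind { ASSIGNMENTS, META, HDFS_HOLES, HDFS_OVERLAPS }

final class HbckFlagSketch {
    /** Maps command-line flags to the set of repairs they enable. */
    static EnumSet<RepairKind> parse(String... args) {
        EnumSet<RepairKind> enabled = EnumSet.noneOf(RepairKind.class);
        for (String arg : args) {
            switch (arg) {
                case "-fix":              // old spelling, old (transient-only) behavior
                case "-fixAssignments":
                    enabled.add(RepairKind.ASSIGNMENTS);
                    break;
                case "-fixMeta":
                    enabled.add(RepairKind.META);
                    break;
                case "-fixHdfsHoles":     // assumed spelling of a -fixHdfs* flag
                    enabled.add(RepairKind.HDFS_HOLES);
                    break;
                case "-fixHdfsOverlaps":  // assumed spelling of a -fixHdfs* flag
                    enabled.add(RepairKind.HDFS_OVERLAPS);
                    break;
                case "-fixAll":           // combo flag; its exact contents are assumed
                    enabled.addAll(EnumSet.allOf(RepairKind.class));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + arg);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println(parse("-fix"));
        System.out.println(parse("-fixMeta", "-fixHdfsHoles"));
    }
}
```

The point of the design is visible in the first case: someone typing -fix out of habit never touches HDFS, because every durable-state repair requires an explicit opt-in.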


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5826
-----------------------------------------------------------


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235362#comment-13235362 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519405/hbase-5128-0.90-v2b.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1254//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228057#comment-13228057 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5860
-----------------------------------------------------------

Ship it!


I went through about half of this patch.  It's plain that there have been a bunch of improvements.  There is some great stuff in here.  I'm +1 on committing this because it's fat and full of goodies and then working on remaining problems in new issues.


src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
<https://reviews.apache.org/r/4280/#comment12760>

    Interesting.  https://issues.apache.org/jira/browse/HBASE-5563 is all about adding this.



src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
<https://reviews.apache.org/r/4280/#comment12761>

    Will this break compatibility?  Put at the end of the Interface and it might be ok.
    
    I think we need this one.  In the past, we've addressed this issue by having the user restart master.



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/4280/#comment12764>

    Good one.



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/4280/#comment12765>

    We overloaded the method here?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12766>

    Add <p> here if you want the line between paragraphs to come out in javadoc.  You add a white space for each empty line.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12770>

    that there 'are' no...



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12772>

    s/handleful/handful/



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12773>

    Capitalize 'replaces'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12774>

    Nice doc.  Helps.
    
    Would suggest this args explanation stuff only be done in the usage, not in usage and up here in class comment.  They have a tendency to diverge.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12777>

    This is 'destructive' in that it changes what's on hdfs?  If so, change the comment above.... it says 'determine'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12778>

    Declare and assign in one step?  As is, you declare it two lines above and then assign it here.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12779>

    Excellent



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12780>

    Is this method like http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/FSUtils.html#469 ?


- Michael


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on the bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
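
Item 2 of the quoted summary mentions safeguards against mega merges. The patch itself is not shown in this message, so the following is only a hypothetical sketch of what such a guard might look like: the class name, method name and maxMerge parameter are assumptions made for illustration, not taken from the patch.

```java
import java.util.List;

/** Hypothetical guard; names and the limit are illustrative assumptions. */
class MergeGuard {
    /**
     * Decides whether a group of overlapping regions may be merged automatically.
     * A group of one needs no merge; a group larger than maxMerge is left for a
     * human to inspect instead of being collapsed into a single huge region.
     */
    static boolean shouldMerge(List<String> overlapGroup, int maxMerge) {
        int size = overlapGroup.size();
        return size > 1 && size <= maxMerge;
    }
}
```

The point of the cap is that a repair tool which merges without limit can turn a small metadata mistake into one enormous region, so refusing and reporting is the safer default.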


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
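
The table integrity invariant in the quoted comment (every row key resolves to exactly one region) can be illustrated with a small self-contained sketch. This is not the HBaseFsck implementation: the class, the Range type and the problem strings are made up for illustration, and Strings stand in for HBase's byte[] row keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified sketch: Strings stand in for HBase's byte[] row keys. */
class RegionChainCheck {
    /** Hypothetical key range [start, end); an empty end means "to the end of the table". */
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    /** Returns a description of each integrity problem found; empty if the chain is clean. */
    static List<String> check(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        List<Range> usable = new ArrayList<>();
        for (Range r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.start);              // endkey == startkey
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.start + ".." + r.end); // startkey > endkey
            } else {
                usable.add(r);
            }
        }
        usable.sort(Comparator.comparing((Range r) -> r.start));

        String covered = "";          // keys below this are already owned by some region
        boolean coveredToEnd = false; // true once a region with an empty end key is seen
        for (Range r : usable) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.start);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE " + covered + ".." + r.start);
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE " + covered + "..(end of table)");
        }
        return problems;
    }
}
```

Sorting by start key and sweeping once is enough here because a hole or overlap always shows up as a mismatch between a region's start key and the highest end key seen so far.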

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193119#comment-13193119 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 91
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line91>
bq.  >
bq.  >     I think '.META.' should be used.

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 118
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line118>
bq.  >
bq.  >     Should read 'that it was assigned to'

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 154
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line154>
bq.  >
bq.  >     This is about fixing region assignment, right ?
bq.  >     Better include that in javadoc.

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 121
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line121>
bq.  >
bq.  >     Should read 'repairs require hbase ...'
bq.  >     
bq.  >     'to' at the end is not needed.

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 172
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line172>
bq.  >
bq.  >     Should read ' and correct '

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 174
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line174>
bq.  >
bq.  >     Would regionInfoMap be a better name ?

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 270
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line270>
bq.  >
bq.  >     Please correct this sentence's syntax.

sure


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 280
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line280>
bq.  >
bq.  >     We should impose maximum number of iterations for the loop, right ?

good point.
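
A minimal sketch of the iteration cap suggested above. The loop in the patch is not shown in this message, so the class and method names here are hypothetical, and the repair pass is modelled as a callback that reports whether it still found problems.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not part of HBaseFsck. */
class BoundedRepairLoop {
    /**
     * Runs repair passes until one reports that nothing is left to fix,
     * giving up after maxIterations passes instead of looping forever.
     *
     * @param passFoundProblems runs one pass; returns true if problems were still found
     * @param maxIterations upper bound on the number of passes
     * @return the number of passes that were run
     * @throws IllegalStateException if problems remain after maxIterations passes
     */
    static int repairUntilClean(BooleanSupplier passFoundProblems, int maxIterations) {
        for (int pass = 1; pass <= maxIterations; pass++) {
            if (!passFoundProblems.getAsBoolean()) {
                return pass;
            }
        }
        throw new IllegalStateException(
            "still inconsistent after " + maxIterations + " passes");
    }
}
```

Failing loudly once the cap is hit keeps a repair that cannot converge from spinning indefinitely against a live cluster.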


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 287
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line287>
bq.  >
bq.  >     Should read 'method requires cluster to be online ...'

done.


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 289
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line289>
bq.  >
bq.  >     Should read ' to be consistent'

reworded


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 337
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line337>
bq.  >
bq.  >     Should be called checkAndFixIntegrity()

ok.


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 334
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line334>
bq.  >
bq.  >     Should be called checkAndFixConsistency()

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 343
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line343>
bq.  >
bq.  >     This sentence can be omitted.
bq.  >     If you keep it, please move it after the @return line.

removed


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4379
-----------------------------------------------------------


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installations it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228844#comment-13228844 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

@Jonathan:
Can you address QA report @ 10/Mar/12 02:00 ?

There are outstanding review comments on the review board.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236162#comment-13236162 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-03-22 23:18:21.689735)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

Updated from Ted's mega review.  Hbck and OfflineMetaRebuild tests pass.  

I plan on committing this and filing several follow-up jiras if it passes the hadoop qa robot.


Summary
-------

This should be nearly ready for integration.  This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.  

- It needs to track HTableDescriptors instead of reading them from the file system.
- It uses a different HBaseFsckRepair.forceOfflineInZK method -- which for some reason means we don't need HBASE-5563.
- It uses HServerAddress instead of ServerName.

This version is close to what we've used on production clusters.


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 

Diff: https://reviews.apache.org/r/3435/diff


Testing
-------

All TestHBaseFsck unit tests pass.  Currently running full suite.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193121#comment-13193121 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1586
bq.  > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1586>
bq.  >
bq.  >     Should be 'to end key'.

updated this and a handful of other comments.


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1594
bq.  > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1594>
bq.  >
bq.  >     Should insert some text between newRegion and region.

updated


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1600
bq.  > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1600>
bq.  >
bq.  >     This should be outside the for loop.

done


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1602
bq.  > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1602>
bq.  >
bq.  >     Space between > and 0.

done


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4317
-----------------------------------------------------------


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installations it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail, reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
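
One way to make the tests from Problem 2 deterministic is to poll until the cluster has settled before asserting on the hbck report. The Java sketch below shows only the polling pattern. SettleWait, waitUntil, and the stand-in condition in main are hypothetical names for illustration, not part of hbck or the HBase test utilities; the real "settled" check (for example, that no regions are still in transition) would have to be supplied by the caller.

```java
import java.util.function.BooleanSupplier;

public class SettleWait {
    /**
     * Polls the supplied condition until it returns true or the timeout elapses.
     * Returns true if the condition was observed in time, false on timeout or
     * if the waiting thread was interrupted.
     */
    public static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!settled.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the condition held
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Hypothetical stand-in for "no regions in transition":
        // it becomes true roughly 50 ms after start.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("settled=" + ok);
    }
}
```

Whether such a wait belongs in hbck itself or only in the unit tests is exactly the open question raised above; the helper works the same way in either place.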


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
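
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small standalone check. This is a sketch under simplifying assumptions, not the HBaseFsck implementation: row keys are modeled as Java Strings rather than byte arrays, an empty end key means "through the end of the table", and the names TableIntegritySketch, Region, and check are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TableIntegritySketch {
    /** A region covering [start, end); an empty end key means "to the end of the table". */
    public static final class Region {
        final String start;
        final String end;
        public Region(String start, String end) { this.start = start; this.end = end; }
    }

    /** Returns the problems found; an empty list means the region chain is intact. */
    public static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();
        for (Region r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.start);          // endkey == startkey
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.start + ".." + r.end);
            } else {
                valid.add(r);
            }
        }
        valid.sort(Comparator.comparing((Region r) -> r.start));
        String cursor = "";          // first row key that does not yet have an owner
        boolean reachedEnd = false;  // true once a region covers through the end of the table
        for (Region r : valid) {
            if (reachedEnd || r.start.compareTo(cursor) < 0) {
                problems.add("OVERLAP at " + r.start);
            } else if (r.start.compareTo(cursor) > 0) {
                problems.add("HOLE " + cursor + ".." + r.start);
            }
            if (r.end.isEmpty()) {
                reachedEnd = true;
            } else if (!reachedEnd && r.end.compareTo(cursor) > 0) {
                cursor = r.end;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE " + cursor + "..(end)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // The key range b..c has no owner in this example.
        List<Region> withHole = Arrays.asList(new Region("", "b"), new Region("c", ""));
        System.out.println(check(withHole));
    }
}
```

The sketch only detects problems. The repairs described in the comment (fabricating regions for holes, sidelining backwards or degenerate regions, merging overlaps) are deliberately left out.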

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235357#comment-13235357 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-03-22 05:16:09.079201)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

Updated with safeguard features found in trunk/0.94/0.92 versions.


Summary (updated)
-------

This should be nearly ready for integration.  This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.  

- It needs to track HTableDescriptors instead of reading them from the file system.
- It uses a different HBaseFsckRepair.forceOfflineInZK method -- which for some reason means we don't need HBASE-5563.
- Uses HServerAddress instead of ServerName

This version is close to what we've used on production clusters.


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java de6ebe3 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 7404377 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1ec17cd 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 

Diff: https://reviews.apache.org/r/3435/diff


Testing (updated)
-------

All TestHBaseFsck unit tests pass.  Currently running full suite.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234491#comment-13234491 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 805
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90973#file90973line805>
bq.  >
bq.  >     Interesting.  https://issues.apache.org/jira/browse/HBASE-5563 is all about adding this.

HBASE-5563 committed.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java, line 229
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90974#file90974line229>
bq.  >
bq.  >     Will this break compatibility?  Put at the end of the Interface and it might be ok.
bq.  >     
bq.  >     I think we need this one.  In the past, we've addressed this issue by having the user restart master.

This is being handled in https://issues.apache.org/jira/browse/HBASE-5589.  According to the notes there, compatibility is not broken.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5860
-----------------------------------------------------------


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
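
The summary above mentions safeguards against mega merges and a way to isolate repairs to particular tables, but gives no details. Purely as an illustration of what such a guard could look like, the Java sketch below refuses to merge an overlap group that is larger than a limit or that belongs to a table that was not selected. All names (MergeSafeguardSketch, decide, maxMerge, tablesToRepair) are assumptions made for this example and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeSafeguardSketch {
    public enum Decision { MERGE, SKIP_TABLE_NOT_SELECTED, SKIP_TOO_MANY_REGIONS }

    /**
     * Decides whether a group of overlapping regions may be merged.
     * In this sketch an empty tablesToRepair set means "all tables are selected".
     */
    public static Decision decide(String table, List<String> overlapGroup,
                                  Set<String> tablesToRepair, int maxMerge) {
        if (!tablesToRepair.isEmpty() && !tablesToRepair.contains(table)) {
            return Decision.SKIP_TABLE_NOT_SELECTED;
        }
        if (overlapGroup.size() > maxMerge) {
            // Too many regions would be folded into one; leave this for an operator.
            return Decision.SKIP_TOO_MANY_REGIONS;
        }
        return Decision.MERGE;
    }

    public static void main(String[] args) {
        Set<String> selected = new HashSet<>(Arrays.asList("usertable"));
        List<String> group = Arrays.asList("regionA", "regionB", "regionC");
        System.out.println(decide("usertable", group, selected, 5));
        System.out.println(decide("usertable", group, selected, 2));
        System.out.println(decide("othertable", group, selected, 5));
    }
}
```

Keeping the decision separate from the merge itself makes the guard easy to unit test without a running cluster.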


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235245#comment-13235245 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519364/hbase-5128-trunk-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185891#comment-13185891 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

I'm working on it.  Was working on some of the TODOs and got caught with another snag.  It will come soon.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236704#comment-13236704 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Porting the changes to 0.90 is causing some test flakiness in that version.  My plan is to work these out; there are more constraints there: I need to figure out why the tests flake, avoid a master-side HBASE-5563 change, and figure out the ramifications.  I plan on opening a new issue to backport this patch to 0.90.  While the trunk/0.94/0.92 versions are very similar, 0.90 has several differences.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables their region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  These phase can be
>  * use to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that is was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
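
The table integrity invariants described in the quoted comment can be illustrated with a small sketch. This is not the HBaseFsck implementation: the RegionChainSketch class, its Region type, the check method, and the problem labels are hypothetical names invented for illustration. Keys are plain strings rather than byte arrays, and an empty string is assumed to stand for the open start or end of a table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: not the HBaseFsck code. All names here are hypothetical.
final class RegionChainSketch {

    // A region covers [startKey, endKey); "" means the open start or end of the table.
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns a description of every integrity problem found in one table's regions.
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();

        // Pass 1: set aside regions that are individually broken.
        for (Region r : regions) {
            if (!r.endKey.isEmpty() && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE " + r.startKey);
            } else if (!r.endKey.isEmpty() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS " + r.startKey + ".." + r.endKey);
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order.
        valid.sort(Comparator.comparing((Region r) -> r.startKey));
        String covered = "";         // every key below this is covered so far
        boolean reachedEnd = false;  // true once a region with an open end key is seen
        for (Region r : valid) {
            if (reachedEnd || r.startKey.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.startKey);
            } else if (r.startKey.compareTo(covered) > 0) {
                problems.add("HOLE " + covered + ".." + r.startKey);
            }
            if (r.endKey.isEmpty()) {
                reachedEnd = true;
            } else if (!reachedEnd && r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE " + covered + "..end");
        }
        return problems;
    }
}
```

In this sketch a table is intact only when check returns an empty list, that is, when every possible row key falls into exactly one region.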

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235727#comment-13235727 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 172
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line172>
bq.  >
bq.  >     I think tablesToFix would be a better name for this member.

agreed.


bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 186
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line186>
bq.  >
bq.  >     'encoded region name' would be clearer.

"It maps from encoded region name to HbckInfo structure. "


bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 198
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line198>
bq.  >
bq.  >     TInfo should be TableInfo

done


bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 363
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line363>
bq.  >
bq.  >     fixes is a global variable.
bq.  >     I think the loop condition should check that fixes increases across iterations.
bq.  >     If the count doesn't increase, we can break out of the loop.

clearState() resets the fixes count.  I'll add a comment.


bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 365
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line365>
bq.  >
bq.  >     Why is 2 specially treated here ?

Iteration 1 does repairs; iteration 2 verifies that things are fixed.  If more iterations are needed, something funny has happened.  Adding a comment.

Changed success logging message to info.
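
The two-pass behavior discussed in the replies above can be sketched as follows. This is an illustration rather than the patch's code: RepairLoopSketch, MAX_ITERATIONS, runOnce, and repairUntilClean are hypothetical names. The sketch assumes only what the replies state: the fix count starts from zero on each pass, a second pass verifies the first, and needing more passes signals a problem.

```java
import java.util.function.IntSupplier;

// Illustrative only: hypothetical names, not the HBaseFsck implementation.
final class RepairLoopSketch {

    static final int MAX_ITERATIONS = 2;  // pass 1 repairs, pass 2 verifies

    // runOnce performs one check-and-repair pass and returns how many fixes it
    // applied during that pass (the count starts from zero on every pass).
    // Returns the number of passes used, or -1 if fixes were still being
    // applied on the last allowed pass.
    static int repairUntilClean(IntSupplier runOnce) {
        for (int pass = 1; pass <= MAX_ITERATIONS; pass++) {
            int fixes = runOnce.getAsInt();
            if (fixes == 0) {
                return pass;  // nothing left to fix: the state is verified clean
            }
        }
        return -1;  // more passes would be needed: something funny has happened
    }
}
```

Returning a distinct value for the "still fixing" case lets the caller tell it apart from an ordinary failure, in the spirit of the separate return value requested in the review.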


bq.  On 2012-03-22 05:21:28, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 396
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line396>
bq.  >
bq.  >     Ideally a different return value (say -2) should be used.

done


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6205
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186013#comment-13186013 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4379
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9839>

    I think '.META.' should be used.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9840>

    Should read 'that it was assigned to'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9841>

    Should read 'repairs require hbase ...'
    
    'to' at the end is not needed.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9842>

    This is about fixing region assignment, right ?
    Better include that in javadoc.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9843>

    Should read ' and correct '



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9844>

    Would regionInfoMap be a better name ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9845>

    Please correct this sentence's syntax.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9846>

    We should impose maximum number of iterations for the loop, right ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9848>

    Should read 'method requires cluster to be online ...'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9847>

    Should read ' to be consistent'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9849>

    The method is called Repair, so the return value should be number of errors fixed, right ?
    I think a Pair return value would allow both errors detected and errors fixed to be returned.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9850>

    Should be called checkAndFixConsistency()



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9851>

    Should be called checkAndFixIntegrity()



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9852>

    This sentence can be omitted.
    If you keep it, please move it after the @return line.


- Ted


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state – disable uses out-of-date in-memory region assignments. If I use the unassign method, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing issues and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase clusters it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail, reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Does that sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182948#comment-13182948 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

The code in HBASE-1621 does something similar to my problem cases, so it might be the solution as well -- apparently meta regioninfos have an offline flag (not sure if this is just trunk, though).
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226711#comment-13226711 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517811/hbase-5128-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 24 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -122 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 163 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.master.TestDistributedLogSplitting
                  org.apache.hadoop.hbase.coprocessor.TestClassLoading
                  org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
                  org.apache.hadoop.hbase.master.TestRollingRestart
                  org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.client.TestShell

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199955#comment-13199955 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Update:

I recently found a case that may have been caused by a parent region not being removed by the catalog janitor.  Since we rely on HDFS being ground truth and this version did not check the offline/split status in META, the tool attempted to merge all regions into one mega region.  Harsh mentioned that the parent region cleanup issue might be related to HBASE-4799 (the target cluster didn't have this patch).

Next cuts will add some failsafes -- specifiers to repair only specific tables and to skip if a merge attempts to merge more than a specified number of regions into one region.
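
A minimal sketch of what such failsafes could look like, assuming a hypothetical merge cap (maxMerge) and a hypothetical set of tables selected for repair (tablesToRepair). These names are made up for illustration and are not taken from the patch or from hbck's options.

```java
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; names and parameters are assumptions, not hbck options. */
class RepairFailsafeSketch {

    /** An empty selection is treated as "repair every table". */
    static boolean tableSelected(String table, Set<String> tablesToRepair) {
        return tablesToRepair.isEmpty() || tablesToRepair.contains(table);
    }

    /** Refuse to merge when an overlap group is larger than the configured cap. */
    static boolean shouldMerge(List<String> overlapGroup, int maxMerge) {
        int n = overlapGroup.size();
        return n >= 2 && n <= maxMerge;
    }
}
```

With a cap like this, a situation such as the one described above (every region of a table appearing to overlap) would be skipped and reported rather than merged into one mega region.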

Also, at the moment I have first-cut versions for 0.92/trunk, but they have one flaky test.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236204#comment-13236204 ] 

Hadoop QA commented on HBASE-5128:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519538/hbase-5128-v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//console

This message is automatically generated.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184400#comment-13184400 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4317
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9714>

    Should be 'to end key'.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9715>

    Should insert some text between newRegion and region.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9716>

    This should be outside the for loop.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment9717>

    Space between > and 0.


- Ted


On 2012-01-11 12:46:37, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-11 12:46:37)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226667#comment-13226667 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/
-----------------------------------------------------------

Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.


Summary
-------

This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

1) No trackHTD method needed since we can read from the file system.
2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
3) Fixed comparator in HRegionInfo
4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 

Diff: https://reviews.apache.org/r/4280/diff


Testing
-------

Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235896#comment-13235896 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6239
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13498>

    Is @Override missing ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13474>

    Renaming this method is desirable as I mentioned earlier.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13473>

    Typo: assume



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13475>

    This method is called in two places where HBaseAdmin is available.
    
    Please change the method signature to avoid creating HBaseAdmin every time.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13478>

    Typo: handleHBCK



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13485>

    This is called in a loop in checkMetaRegion().
    It would be nice for this method to take a list of regions and wait for them to come out of RIT.
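
    One possible shape for such a batch wait, as a sketch only: it polls a caller-supplied check until every region in the list has left transition or a timeout expires. The class name, the isInTransition predicate, and the timeout parameters are hypothetical and do not correspond to existing HBaseFsckRepair methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative sketch only; not part of HBaseFsckRepair. */
class RitWaitSketch {

    /**
     * Polls until no region in the list is in transition or the timeout expires.
     * Returns the regions that were still in transition when the wait ended.
     */
    static List<String> waitUntilSettled(List<String> regions,
                                         Predicate<String> isInTransition,
                                         long timeoutMs,
                                         long pollMs) {
        List<String> pending = new ArrayList<>(regions);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            // Drop every region that has settled since the last poll.
            pending.removeIf(isInTransition.negate());
            if (pending.isEmpty() || System.currentTimeMillis() >= deadline) {
                return pending;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt and report what is still pending.
                Thread.currentThread().interrupt();
                return pending;
            }
        }
    }
}
```

    Waiting on the whole list at once means the caller pays for one timeout per batch instead of one per region.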



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13483>

    Why ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13484>

    success is no longer set in this method.
    This can be removed.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13486>

    Shall we return directly here ?
    The new exception would be caught at line 182



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13487>

    It would be nice to cache meta for subsequent calls.
    Can be done in another JIRA.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/4280/#comment13489>

    Please use this method in the three places of HBaseFsck I mentioned.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment13494>

    Javadoc for parameters.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment13495>

    Javadoc for parameters.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment13496>

    Javadoc for parameters.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
<https://reviews.apache.org/r/4280/#comment13497>

    Can this class be package-private ?



src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
<https://reviews.apache.org/r/4280/#comment13501>

    This check was added because of failed test ?



src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13502>

    Can we reuse the method from HBaseFsck ?


- Ted


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
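
The table integrity invariant in the quoted comment (every row key resolves to exactly one region) can be illustrated with a small sweep over sorted key ranges.  This is only a sketch under stated assumptions, not the hbck implementation: Range and check() are hypothetical names, keys are plain Strings rather than HBase byte arrays, and "" stands for the unbounded start or end of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TableIntegritySketch {
    /** Hypothetical stand-in for a region's [start, end) key range; "" means unbounded. */
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** Returns one message per degenerate region, backwards region, overlap, or hole. */
    static List<String> check(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        List<Range> valid = new ArrayList<>();
        // First pass: set aside regions that are individually malformed.
        for (Range r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r);   // endkey == startkey
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r);    // startkey sorts after endkey
            } else {
                valid.add(r);
            }
        }
        // Second pass: sweep the remaining regions in start-key order.
        // Note: String order is used here; HBase compares raw bytes.
        valid.sort(Comparator.comparing((Range r) -> r.start));
        String covered = "";          // keys below this point are already covered
        boolean reachedEnd = false;   // true once a region extends to the table end
        for (Range r : valid) {
            if (reachedEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP " + r);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + "," + r.start + ")");
            }
            if (r.end.isEmpty()) {
                reachedEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE [" + covered + ",)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
            new Range("", "b"), new Range("b", "d"),
            new Range("c", "f"), new Range("g", ""));
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

In this sketch the sweep only reports problems.  The repair steps the comment describes (fabricating regions for holes, sidelining backwards and degenerate regions, merging overlaps) are not modeled.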

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236098#comment-13236098 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1771
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1771>
bq.  >
bq.  >     Is @Override missing ?

Yeah, I missed all of them.


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 72
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line72>
bq.  >
bq.  >     Renaming this method is desirable as I mentioned earlier.

Suggestion?


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 92
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line92>
bq.  >
bq.  >     Typo: assume

"This assumes that info is in META."


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 99
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line99>
bq.  >
bq.  >     This method is called in two places where HBaseAdmin is available.
bq.  >     
bq.  >     Please change the method signature to avoid creating HBaseAdmin every time.

thanks.  This was something missed when porting back and forth between 0.90 and 0.92.


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 152
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line152>
bq.  >
bq.  >     Why ?

removed


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 161
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line161>
bq.  >
bq.  >     success is no longer set in this method.
bq.  >     This can be removed.

done (likely from 0.90 version)


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 185
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line185>
bq.  >
bq.  >     Shall we return directly here ?
bq.  >     The new exception would be caught at line 182

yes.  


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 215
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line215>
bq.  >
bq.  >     Please use this method in the three places of HBaseFsck I mentioned.

done


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java, line 274
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94419#file94419line274>
bq.  >
bq.  >     Can we reuse the method from HBaseFsck ?

done


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 1217
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94418#file94418line1217>
bq.  >
bq.  >     This check was added because of failed test ?

This is an unhandled case.  In one of the patches I had some extra ScrubMeta and DumpMeta methods that would clean this up -- this is follow on work for another jira.


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java, line 30
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94417#file94417line30>
bq.  >
bq.  >     Can this class be package-private ?

not yet -- hbck needs to be moved from o.a.h.h.util to o.a.h.h.util.hbck for this to be possible.


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 63
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line63>
bq.  >
bq.  >     Javadoc for parameters.

Updated in interface, added:

  /**
   * {@inheritDoc}
   */


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 71
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line71>
bq.  >
bq.  >     Javadoc for parameters.

Updated in interface, added:

  /**
   * {@inheritDoc}
   */


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 83
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line83>
bq.  >
bq.  >     Javadoc for parameters.

Updated in interface, added:

  /**
   * {@inheritDoc}
   */
(and for the other cases).


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 112
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line112>
bq.  >
bq.  >     Typo: handleHBCK

This comment is not relevant to this branch anymore, removing.


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 122
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line122>
bq.  >
bq.  >     This is called in a loop in checkMetaRegion().
bq.  >     It would be nice for this method to take a list of regions and wait for them to come out of RIT.

This was a cause of a bunch of flakiness or 5-second sleeps in the older hbck, so I updated this.
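
For illustration, the idea discussed here (wait on a list of regions to leave RIT by polling with a deadline rather than sleeping a fixed 5 seconds) could look roughly like the sketch below.  It is not the code from the patch: waitUntilOutOfTransition and the Supplier that reports regions in transition are hypothetical stand-ins for whatever the master actually exposes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class WaitForRegionsSketch {
    /**
     * Polls until none of the given regions is reported as in transition,
     * or until the timeout expires.  Returns true if all regions left
     * transition in time.  A region counts as done the first time it is
     * seen out of transition, even if it re-enters transition later.
     */
    static boolean waitUntilOutOfTransition(Collection<String> regions,
                                            Supplier<Set<String>> regionsInTransition,
                                            long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        Set<String> pending = new HashSet<>(regions);
        while (true) {
            // Keep only the regions that are still in transition.
            pending.retainAll(regionsInTransition.get());
            if (pending.isEmpty()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical source: nothing is ever in transition.
        Supplier<Set<String>> nothingInTransition = HashSet::new;
        boolean ok = waitUntilOutOfTransition(
            Arrays.asList("region-a", "region-b"), nothingInTransition, 1000, 10);
        System.out.println("all regions out of transition: " + ok);
    }
}
```

Taking the whole list at once means the caller pays for one bounded wait per batch instead of one sleep per region.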


bq.  On 2012-03-22 19:00:46, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 207
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line207>
bq.  >
bq.  >     It would be nice to cache meta for subsequent calls.
bq.  >     Can be done in another JIRA.

follow up jira.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6239
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  There has not been much live testing on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236727#comment-13236727 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-03-23 16:13:50.054043)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

Addresses a few last concerns and does some arcanist and findbugs related tweaks.


Summary
-------

This should be nearly ready for integration.  This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.  

- It needs to track HTableDescriptors instead of reading them from the file system.
- It uses a different HBaseFsckRepair.forceOfflineInZK method -- which for some reason means we don't need HBASE-5563.
- Uses HServerAddress instead of ServerName

This version is close to what we've used on production clusters.


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1 
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 

Diff: https://reviews.apache.org/r/3435/diff


Testing
-------

All TestHBaseFsck unit tests pass.  Currently running full suite.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5128:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519401/hbase-5128-0.90-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1253//console

This message is automatically generated.)
    

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186251#comment-13186251 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ram

I think I need a few days (test/polish) to get this completely ready -- if you are willing to wait/review to get this through, I'm willing to hack on it today/tomorrow to get it through.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237241#comment-13237241 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Full suites of 0.92/0.94/trunk versions pass.

Looks like the 0.90 version has always had flaky tests for the same reason as the 0.92/0.94/trunk versions.  The flakiness is related to assignment and HBASE-5563, but just didn't happen as often as in the 0.92/0.94/trunk versions (2/10 runs vs 5/10 runs).  HBASE-5563 would not be available on older clusters, but using this updated hbck against a version without that improvement won't cause permanent problems.

Suppose this hbck is used against an older 0.90-based cluster that doesn't have HBASE-5563 or HBASE-5589.  The side effect is that you may have to run 'hbck -fixAssignments' an extra time to fix region assignment/deployment problems after disabling and deleting a table that has been fixed, or alternatively, you may need to bounce the HMaster or the affected RegionServer to clean up this transient state.

I currently have a 0.90 version of HBASE-5563 (attached there), and an updated HBASE-5128 for 0.90 that is as close as possible to the 0.92/0.94/trunk versions.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, each table's region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
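
To make the table integrity invariants above concrete, here is a minimal Java sketch that walks a table's key ranges and reports degenerate regions, backwards regions, holes, and overlaps. It is an illustration only and not the HBaseFsck implementation: the class names and problem labels are hypothetical, and keys are plain Strings compared with String.compareTo, whereas HBase compares row keys as byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical key range [start, end); "" means open-ended on that side. */
final class KeyRangeSketch {
    final String start;
    final String end;

    KeyRangeSketch(String start, String end) {
        this.start = start;
        this.end = end;
    }
}

final class TableIntegritySketch {
    /** Returns one message per detected problem; an empty list means the chain is intact. */
    static List<String> check(List<KeyRangeSketch> regions) {
        List<String> problems = new ArrayList<>();
        List<KeyRangeSketch> valid = new ArrayList<>();
        for (KeyRangeSketch r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE [" + r.start + "," + r.end + ")");
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS [" + r.start + "," + r.end + ")");
            } else {
                valid.add(r);
            }
        }
        valid.sort(Comparator.comparing((KeyRangeSketch r) -> r.start));

        String covered = "";          // all keys below this one are covered so far
        boolean coveredToEnd = false; // true once a region with an empty end key is seen
        for (KeyRangeSketch r : valid) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r.start);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + "," + r.start + ")");
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + covered + ",)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<KeyRangeSketch> regions = Arrays.asList(
            new KeyRangeSketch("", "b"),
            new KeyRangeSketch("b", "d"),
            new KeyRangeSketch("c", "f"),  // overlaps [b,d)
            new KeyRangeSketch("g", "")); // leaves a hole [f,g)
        System.out.println(check(regions)); // prints [OVERLAP at c, HOLE [f,g)]
    }
}
```

The sketch only detects problems. The repairs described in the comment (fabricating regions for holes, sidelining backwards regions, merging overlaps) are not modeled here.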


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227057#comment-13227057 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5826
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12728>

    Precede regioninfo with a dot.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12724>

    Looking at code @ line 2835 below, it seems -fixAssignments and -fix are equivalent.
    What was the reason for deprecating -fix ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12722>

    More than one option modifies .META. table.
    Shall we name this option fixMetaUsingRegionInfoOnHdfs ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12721>

    Can we name this option fixRegionHolesOnHdfs ?
    It would be better to note which options can be run with cluster offline.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12723>

    Name this fixRegionOverlapsOnHdfs ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment12725>

    white space.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment12726>

    Should read 'callbacks for handling particular table integrity invariant violations detected.'



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment12730>

    Please add javadoc for the handleXXX methods on what scenario each fixes.



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment12729>

    Since region always belongs to some table, I suggest naming this method handleNonEmptyRegionStartKey()



src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
<https://reviews.apache.org/r/4280/#comment12727>

    This class should be abstract.
    It is better to put it in its own file.


- Ted
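
The last few review comments (document what scenario each handleXXX callback covers, and make the default implementation abstract and put it in its own file) describe a common callback-plus-adapter pattern. A minimal Java sketch of that pattern follows. All type and method names here are hypothetical and are not the signatures from the patch under review.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; method names are illustrative only. */
interface IntegrityHandlerSketch {
    /** Called when no region covers the key range [holeStart, holeEnd). */
    void handleHole(String holeStart, String holeEnd);

    /** Called when more than one region claims keys starting at overlapStart. */
    void handleOverlap(String overlapStart);
}

/**
 * Abstract adapter with no-op defaults, so a concrete handler only overrides
 * the scenarios it actually repairs or reports.
 */
abstract class IntegrityHandlerAdapterSketch implements IntegrityHandlerSketch {
    @Override
    public void handleHole(String holeStart, String holeEnd) {
        // no-op by default
    }

    @Override
    public void handleOverlap(String overlapStart) {
        // no-op by default
    }
}

/** Example handler that records holes and ignores everything else. */
final class HoleRecorderSketch extends IntegrityHandlerAdapterSketch {
    final List<String> holes = new ArrayList<>();

    @Override
    public void handleHole(String holeStart, String holeEnd) {
        holes.add("[" + holeStart + "," + holeEnd + ")");
    }
}
```

With this layout, a checker can invoke every callback unconditionally, and each handler decides which violations it cares about.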


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-0.90-v2.patch

For the 0.90v2 version, TestHBaseFsck passes consistently.  This is very close to the version we've used to repair production clusters. 

The 0.90 version has a different HBaseFsckRepair.forceOfflineInZK(), which is somehow responsible for that version not needing HBASE-5563.  I haven't investigated enough to determine why the equivalent method in the 0.92/0.94/trunk versions fails unit tests consistently.

                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237490#comment-13237490 ] 

Hudson commented on HBASE-5128:
-------------------------------

Integrated in HBase-0.94 #51 (See [https://builds.apache.org/job/HBase-0.94/51/])
    HBASE-5128 Addendum adds two missing new files (Revision 1304722)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java

                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193120#comment-13193120 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 586
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68919#file68919line586>
bq.  >
bq.  >     I liked this better before :)

I probably broke this out to be easier to step debug.   I can restore.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 154
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line154>
bq.  >
bq.  >     No wait in case of exception. Is that by design?

nice catch. 


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083>
bq.  >
bq.  >     I think you said in the intro, that you need to check the availability of this rpc.

done in next version.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1072
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1072>
bq.  >
bq.  >     <0.90.6?

updated to 0.90.6, with the assumption that this feature will not make it there (but hopefully into a 0.90.7)


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2275
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line2275>
bq.  >
bq.  >     I know this is not new, but this ErrorReporter is used for status messages as well as error reporting. Should maybe have a different name.
bq.  >     
bq.  >     Also should messages go to STDOUT (out) and error go to STDERR (err)?

TODO -- I'll follow up on this after the next round.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1053
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1053>
bq.  >
bq.  >     Should we add a double check here that the region is in fact offline (by checking .META.) or is that too expensive/not-needed?
bq.  >     
bq.  >     I'm thinking, once this method exists folks will eventually call it for other reasons.

Currently, we need this method to explicitly remove information from the Master's memory.  In the cases where this is used, I've "directly" removed data from meta (Delete into .META.) and closed the regions on region servers directly (HRegionInterface#closeRegion).

I haven't worked it out completely yet, but it probably makes sense to fix closeRegion to properly add a param that will remove this in-memory master state as well. I was under the gun to get something working, and now, having accomplished this, I'm definitely open to refactoring this to make it saner and cleaner.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 90
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line90>
bq.  >
bq.  >     Nice documentation. This tool is awesome.

thanks!


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4384
-----------------------------------------------------------


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state; disable then uses out-of-date in-memory region assignments. If I use the unassign method, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit tests deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
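
Problem 2 in the review request quoted above asks whether hbck or the unit test should wait for region transitions to settle. Either way, the waiting itself is usually a bounded poll loop. The following is a minimal Java sketch of such a loop; the class name, method name, and parameters are hypothetical and are not part of hbck or HBaseTestingUtility.

```java
import java.util.function.BooleanSupplier;

final class SettleWaitSketch {
    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in for "no regions in transition": becomes true after about 50 ms.
        boolean settled = waitUntil(
            () -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("settled=" + settled);
    }
}
```

A test would pass a condition that checks whatever "settled" means for it, for example that a follow-up hbck run reports no errors, and fail with a clear message when the call returns false.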


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no
>  * overlapping regions.
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, each table's region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion.
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
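
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small sweep over sorted regions. This is only a sketch under stated assumptions, not code from HBaseFsck: the class `TableIntegritySketch`, its `Region` type, and the message strings are hypothetical, and keys are plain Strings compared lexicographically, whereas HBase compares row keys as byte arrays. An empty end key stands for "to the end of the table".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: not taken from HBaseFsck. Keys are Strings for brevity. */
final class TableIntegritySketch {

  /** A region covering [startKey, endKey); an empty endKey means "to the end of the table". */
  static final class Region {
    final String startKey;
    final String endKey;

    Region(String startKey, String endKey) {
      this.startKey = startKey;
      this.endKey = endKey;
    }

    @Override
    public String toString() {
      return "[" + startKey + ", " + endKey + ")";
    }
  }

  /** Returns one message per violation; an empty list means the key space is covered exactly once. */
  static List<String> check(List<Region> regions) {
    List<String> problems = new ArrayList<>();
    List<Region> usable = new ArrayList<>();

    // Degenerate and backwards regions are reported and kept out of the sweep.
    for (Region r : regions) {
      boolean openEnded = r.endKey.isEmpty();
      if (!openEnded && r.startKey.equals(r.endKey)) {
        problems.add("DEGENERATE " + r);
      } else if (!openEnded && r.startKey.compareTo(r.endKey) > 0) {
        problems.add("BACKWARDS " + r);
      } else {
        usable.add(r);
      }
    }
    usable.sort(Comparator.comparing((Region r) -> r.startKey));

    // Keys below coveredUpTo already belong to a region; null means covered to the end.
    String coveredUpTo = "";
    for (Region r : usable) {
      if (coveredUpTo == null || r.startKey.compareTo(coveredUpTo) < 0) {
        problems.add("OVERLAP " + r);
      } else if (r.startKey.compareTo(coveredUpTo) > 0) {
        problems.add("HOLE [" + coveredUpTo + ", " + r.startKey + ")");
      }
      if (coveredUpTo != null) {
        if (r.endKey.isEmpty()) {
          coveredUpTo = null;
        } else if (r.endKey.compareTo(coveredUpTo) > 0) {
          coveredUpTo = r.endKey;
        }
      }
    }
    if (coveredUpTo != null) {
      problems.add("HOLE [" + coveredUpTo + ", )");
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Region> regions = Arrays.asList(
        new Region("", "b"), new Region("c", "e"), new Region("d", ""));
    for (String problem : check(regions)) {
      System.out.println(problem);
    }
  }
}
```

A real check would also have to decide which of two overlapping regions to report and how to group overlaps for merging; the sketch only flags the later-starting region.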

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237354#comment-13237354 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-11 14:37:01, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2879
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2879>
bq.  >
bq.  >     Can we name this option fixRegionHolesOnHdfs ?
bq.  >     It would be better to note which options can be run with cluster offline.
bq.  
bq.  jmhsieh wrote:
bq.      at the moment, hbck can only be run while hbase is online.  This has not been unified with OfflineMetaRebuild yet.

HBASE-5629


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5826
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features were added from that experience.  There has not been much live testing on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193139#comment-13193139 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-01-25 17:24:41.277326)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

This version includes updates after testing against real online but idle clusters with real induced corruptions.  This hbck was tested successfully against region servers built from the apache/0.90 branch with this patch applied, and against region servers on cdh3u2 (a 0.90.4-based hbase without the new offline method).

I'm going to post a usage description and images I've created to explain this better on the JIRA.

High level changes in this rev.
- hbck now wraps calls to the offline method and will use unassign if the target region server does not support offline.
- Restructured hdfs integrity repairs into more phases.  When compound problems were present we'd get into a loop where orphan repair would cause new overlaps on a subsequent integrity repair iteration.  This new approach should be deterministic.  The new phases are 1) find hdfs holes and patch them (postcondition: no more holes), 2) adopt orphan hdfs regions (postcondition: no orphan data in hdfs), and 3) reload and fix overlaps (precondition: no holes but overlaps possible; postcondition: no overlaps).  Previously integrity repairs would iterate doing all three until they converged (but this didn't always happen in practice!).
- Added more command line options that allow this hbck to attempt only certain repairs (which is necessary to get overlap repairs to work more deterministically, and needed to get hbases that do not support offline to converge).
- Added a few more test cases for new corruptions.
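
The phase ordering in the list above can be sketched as a fixed pipeline in which each phase runs once and its postcondition is checked before the next phase starts, instead of looping over all repairs until convergence. This is a toy model under loud assumptions: `PhasedRepairSketch`, `HdfsState`, and `Phase` are hypothetical names, the state is reduced to three counters, and the rule that every adopted orphan introduces one new overlap is an invented worst case used only to show why overlaps are handled last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Toy model of the three-phase ordering; none of these names come from HBaseFsck. */
final class PhasedRepairSketch {

  /** Stand-in for what a scan of HDFS would report; it only counts problems. */
  static final class HdfsState {
    int holes;
    int orphans;
    int overlaps;

    HdfsState(int holes, int orphans, int overlaps) {
      this.holes = holes;
      this.orphans = orphans;
      this.overlaps = overlaps;
    }
  }

  /** One repair step plus the condition that must hold once it has run. */
  static final class Phase {
    final String name;
    final Consumer<HdfsState> repair;
    final Predicate<HdfsState> postcondition;

    Phase(String name, Consumer<HdfsState> repair, Predicate<HdfsState> postcondition) {
      this.name = name;
      this.repair = repair;
      this.postcondition = postcondition;
    }
  }

  static final List<Phase> PHASES = Arrays.asList(
      // 1) Patch holes. Postcondition: no more holes.
      new Phase("patch holes", s -> s.holes = 0, s -> s.holes == 0),
      // 2) Adopt orphans. Assumed worst case: each adoption creates one overlap.
      new Phase("adopt orphans",
          s -> { s.overlaps += s.orphans; s.orphans = 0; },
          s -> s.holes == 0 && s.orphans == 0),
      // 3) Fix overlaps last, so nothing after it can introduce new ones.
      new Phase("fix overlaps", s -> s.overlaps = 0,
          s -> s.holes == 0 && s.orphans == 0 && s.overlaps == 0));

  /** Runs every phase exactly once, in order, and returns the names of the phases run. */
  static List<String> repair(HdfsState state) {
    List<String> completed = new ArrayList<>();
    for (Phase phase : PHASES) {
      phase.repair.accept(state);
      if (!phase.postcondition.test(state)) {
        throw new IllegalStateException("postcondition failed after: " + phase.name);
      }
      completed.add(phase.name);
    }
    return completed;
  }

  public static void main(String[] args) {
    HdfsState state = new HdfsState(2, 3, 1);
    System.out.println(repair(state));
  }
}
```

Because overlap repair runs after orphan adoption, overlaps created by adoption are cleaned up in the same pass, which is the property the earlier iterate-until-converged approach could not guarantee.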

One big caveat with this rev is that hbase was online but idle (no writes happening) during testing.  It was also suggested that I need to worry about compactions when I close regions during overlap merging (JD -- I didn't see anything in OnlineMerge -- why wasn't this a concern there?).  If so, I'd like advice on how to add guards to protect the user (is a glaring warning message or requiring confirmation sufficient?).  I'm going to do some initial testing on online and active cases, but ideally I would like this to come in follow-on JIRAs.


Summary
-------

I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review and consideration for commit. It has some problems that I need advice on.

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state -- disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing problems and transient states.

What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META (without modifying the 0.90 HBase instances it is meant to repair)?

Problem 2:

Sometimes tests fail, reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
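
Whichever side ends up waiting, the wait itself would probably be a bounded poll. The following is a minimal sketch of such a helper, not an answer to the question above: `SettleWaitSketch` and `waitUntil` are hypothetical names, and the `settled` condition is a placeholder for whatever check hbck or the test would actually use to decide that no regions are still in transition.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition until it holds or a timeout expires. */
final class SettleWaitSketch {

  /**
   * Returns true as soon as settled reports true, or false if the timeout
   * elapses or the calling thread is interrupted first.
   */
  static boolean waitUntil(BooleanSupplier settled, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!settled.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up: the condition never became true in time
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Placeholder condition: pretend the cluster settles after about 50 ms.
    boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
    System.out.println("settled: " + ok);
  }
}
```

A test could call such a helper before its assertions and fail with a clear timeout message when it returns false, which keeps a slow assignment from looking like an integrity error.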


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 

Diff: https://reviews.apache.org/r/3435/diff


Testing
-------

All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  

Not ready for commit.


Thanks,

jmhsieh


                

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236104#comment-13236104 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Ted,

Thanks for the mega review -- I know it must have taken a while.  This set of patches probably should have been broken up, but it has had a funny history of ports going back and forth between 0.90 and 0.92, plus a lot of hacks made in firefighting mode to get it working well enough.

I'll get the tests passing again and deal with the arcanist nits.  After that, do you mind if I start filing the new set of follow-on JIRAs and then commit?  There is plenty of follow-on work but plenty of goodness in here too.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235979#comment-13235979 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6254
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13525>

    Logging here would show the user that there is progress.


- Ted




                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237420#comment-13237420 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

Applied addendum to trunk so that Hadoop QA can function.
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
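
The table integrity invariant in the comment above (every row key resolves to exactly one region) can be illustrated with a small check over a table's region boundaries. This is a sketch under stated assumptions, not the hbck implementation: keys are plain strings rather than byte arrays, an empty string stands for an unbounded start or end key, and the class, method, and problem labels are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RegionChainCheck {
  /** A region covering [start, end); "" means unbounded on that side. */
  static final class Region {
    final String start;
    final String end;
    Region(String start, String end) { this.start = start; this.end = end; }
  }

  /** Returns a description of every integrity problem found (empty if none). */
  static List<String> check(List<Region> regions) {
    List<String> problems = new ArrayList<>();
    List<Region> valid = new ArrayList<>();
    for (Region r : regions) {
      if (!r.end.isEmpty() && r.start.equals(r.end)) {
        problems.add("DEGENERATE " + r.start);            // endkey == startkey
      } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
        problems.add("BACKWARDS " + r.start + ".." + r.end);
      } else {
        valid.add(r);
      }
    }
    // Walk the remaining regions in start-key order, tracking coverage so far.
    valid.sort(Comparator.comparing((Region r) -> r.start));
    String covered = "";        // all keys below this are covered
    boolean unbounded = false;  // true once a region with an open end key is seen
    for (Region r : valid) {
      if (unbounded || r.start.compareTo(covered) < 0) {
        problems.add("OVERLAP at " + r.start);
      } else if (r.start.compareTo(covered) > 0) {
        problems.add("HOLE " + covered + ".." + r.start);
      }
      if (r.end.isEmpty()) {
        unbounded = true;
      } else if (r.end.compareTo(covered) > 0) {
        covered = r.end;
      }
    }
    if (!unbounded) {
      problems.add("HOLE " + covered + "..end");          // last region missing
    }
    return problems;
  }
}
```

Degenerate and backwards regions are set aside first, mirroring the sidelining described in the comment, so that they do not hide holes or overlaps in the remaining chain.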


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198376#comment-13198376 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > We should deprecate clearRegionFromTransition().

done.  


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 202
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line202>
bq.  >
bq.  >     We should set interrupt flag.

Replaced with Thread.currentThread().interrupt();
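
For context, the usual reason to set the interrupt flag is that catching InterruptedException clears it, so code further up the stack can no longer tell that an interrupt was requested. A minimal, self-contained sketch of the pattern follows; the class and method names are hypothetical and this is not the HBaseFsckRepair code under review.

```java
import java.util.concurrent.TimeUnit;

final class InterruptExample {
  /**
   * Sleeps for the given time. Returns true if the full sleep completed and
   * false if the thread was interrupted; in the latter case the interrupt
   * flag is set again so callers can still observe it.
   */
  static boolean sleepPreservingInterrupt(long millis) {
    try {
      TimeUnit.MILLISECONDS.sleep(millis);
      return true;
    } catch (InterruptedException e) {
      // Catching InterruptedException cleared the flag; restore it.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt();   // simulate a pending interrupt
    boolean completed = sleepPreservingInterrupt(50);
    System.out.println("completed=" + completed
        + " interrupted=" + Thread.currentThread().isInterrupted());
  }
}
```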


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 197
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line197>
bq.  >
bq.  >     success is local variable.
bq.  >     Why don't we change return type to boolean and return its value ?

I've cleaned this up to reuse the connection from an HBaseAdmin.   v3 already has this update in some places -- this is one of the places missed.


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1636
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1636>
bq.  >
bq.  >     This TODO has been implemented, so we can remove it.

removed


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1131
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1131>
bq.  >
bq.  >     How about naming this method hasHdfsOnlyEdits() ?

renamed to containsOnlyHdfsEdits


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1081
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1081>
bq.  >
bq.  >     This sentence should be moved before ' from ...'

That code has been refactored in v3 but the message was a bit off.  I've updated it.


bq.  On 2012-01-25 18:01:32, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083>
bq.  >
bq.  >     We should handle potential exception from this method.
bq.  >     
bq.  >     Maybe we should check the availability of this rpc outside the loop and set a flag indicating whether Master supports this RPC.

This was something that I noted that I was going to handle in the next rev -- check out v3, I think it addresses the concern.
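
One way to read the suggestion about a "supports this RPC" flag is sketched below. It is an illustration only, not what v3 does: the Master interface is a hypothetical stand-in for the real master RPC interface, an unsupported call is assumed to surface as UnsupportedOperationException, and the flag is set lazily on the first failure rather than probed before the loop.

```java
import java.util.List;

final class RpcProbe {
  /** Hypothetical stand-in for the master RPC interface. */
  interface Master {
    void offline(String regionName);   // newer call; may be missing on old masters
    void unassign(String regionName);  // older call used as a fallback
  }

  /**
   * Takes every region offline, remembering after the first failure that the
   * master lacks the newer call. Returns how many regions used the fallback.
   */
  static int offlineAll(Master master, List<String> regionNames) {
    boolean supportsOffline = true;    // assumed until proven otherwise
    int fallbacks = 0;
    for (String region : regionNames) {
      if (supportsOffline) {
        try {
          master.offline(region);
          continue;
        } catch (UnsupportedOperationException e) {
          // Assumption: this is how a missing RPC shows up in this sketch.
          supportsOffline = false;
          System.err.println("WARNING: master does not support offline(); "
              + "falling back to unassign()");
        }
      }
      master.unassign(region);
      fallbacks++;
    }
    return fallbacks;
  }
}
```

With the flag in place, an old master sees one failed offline() attempt in total instead of one per region.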


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4591
-----------------------------------------------------------


On 2012-01-25 17:24:41, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-25 17:24:41)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. There are some problems I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing problems and transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installs it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


       

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237493#comment-13237493 ] 

Hudson commented on HBASE-5128:
-------------------------------

Integrated in HBase-0.92 #338 (See [https://builds.apache.org/job/HBase-0.92/338/])
    HBASE-5128 Addendum adds two missing new files (Revision 1304723)

     Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java

                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185373#comment-13185373 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

I still need to do some more testing on actual clusters, but I'm going to post another version that solves problems #1 and #2 later tonight.

#1 -- added offline(byte[] regionname) method to master ipc interface.  
#2 -- added code to wait for region to exit RIT status before moving on.  Test doesn't seem flaky anymore (all these tests now seem to pass about 25 times in a row).
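
The wait in #2 amounts to polling a condition with a timeout. A rough sketch of that idea follows, under stated assumptions: the helper and its names are hypothetical, the condition would be something like "the region is no longer in transition" supplied by the caller, and Java 8 lambdas are used purely for brevity.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class WaitUntil {
  /**
   * Polls the condition until it holds or the timeout elapses. Returns true if
   * the condition became true in time, false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier done, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!done.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;                          // gave up: still not settled
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();    // keep the interrupt visible
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulated region that leaves the in-transition state on the third poll.
    AtomicInteger polls = new AtomicInteger();
    boolean settled = waitFor(() -> polls.incrementAndGet() >= 3, 1000, 10);
    System.out.println("settled=" + settled + " after " + polls.get() + " polls");
  }
}
```

Bounding the wait keeps a region that never settles from hanging the repair or the unit test; the caller can then report the timeout instead.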

I really would like to have this in the 0.90.6 release if possible. Any complaints if I add some compatibility checks to see whether the new API is present, and blare some mean-sounding warnings if you attempt to use the overlap-fixing feature against a version that does not support it? (It will mostly work but will likely require an HMaster restart to be "clean" again.)


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228837#comment-13228837 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Lars I believe the ports to 0.94.0 and 0.92.x are likely identical and nearly trivial, and I was intending to do them.  The initial version was also for 0.90.x, and a version for that will be ported as well since my crew will be supporting that version for a while.  I may try to do a 0.90.x release at some point.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184031#comment-13184031 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-01-11 12:46:37.524636)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

Fixed bug link.  Added JD.  

JD -- the code that is similar to merging is 

- #handleOverlapGroup
- inMeta && !inHdfs && isDeployed  (in another rev I've added an unassign and believe I still have the disable/delete problem).


Summary
-------

I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. There are some problems I need advice on.

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing problems and transient states.

What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installs it is meant to repair)?

Problem 2:

Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 

Diff: https://reviews.apache.org/r/3435/diff


Testing
-------

All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  

Not ready for commit.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
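
The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small, self-contained sketch.  This is not the HBaseFsck code: the class RegionChainSketch, its Region type, the problem labels, and the use of String keys (with "" standing for an unbounded start or end key) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the HBaseFsck implementation.
class RegionChainSketch {

  // A region covering [start, end); "" means unbounded on that side.
  static final class Region {
    final String start;
    final String end;
    Region(String start, String end) { this.start = start; this.end = end; }
    @Override public String toString() { return "[" + start + "," + end + ")"; }
  }

  // Returns a description of every integrity problem found; empty if none.
  static List<String> check(List<Region> regions) {
    List<String> problems = new ArrayList<>();
    List<Region> valid = new ArrayList<>();
    for (Region r : regions) {
      if (!r.end.isEmpty() && r.start.equals(r.end)) {
        problems.add("DEGENERATE " + r);   // endkey == startkey
      } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
        problems.add("BACKWARDS " + r);    // startkey sorts after endkey
      } else {
        valid.add(r);
      }
    }
    // Walk the remaining regions in start-key order.
    valid.sort(Comparator.comparing((Region r) -> r.start));

    String next = "";              // first key not yet covered by any region
    boolean coveredToEnd = false;  // true once a region with end "" is seen
    for (Region r : valid) {
      if (coveredToEnd || r.start.compareTo(next) < 0) {
        problems.add("OVERLAP at " + r);
      } else if (r.start.compareTo(next) > 0) {
        problems.add("HOLE [" + next + "," + r.start + ")");
      }
      if (r.end.isEmpty()) {
        coveredToEnd = true;
      } else if (!coveredToEnd && r.end.compareTo(next) > 0) {
        next = r.end;
      }
    }
    if (!coveredToEnd) {
      problems.add("HOLE [" + next + ",)");  // nothing covers the tail of the key space
    }
    return problems;
  }
}
```

For a table whose only regions are [,b) and [c,), check() in this sketch reports a single problem, "HOLE [b,c)".  Degenerate and backwards regions are set aside before the chain is walked, loosely mirroring the comment's statement that such regions are sidelined.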

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235338#comment-13235338 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

The trunk, 0.94, and 0.92 versions pass the full unit test suites (or had flaky tests that passed locally).  

I plan on doing one final pass on these versions to find and fix findbugs/arcanist nits.  
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Release Note: 
HBaseFsck (hbck) has been updated with new repair capabilities.  hbck is a tool for checking the region consistency and the table integrity invariants of a running HBase cluster.  Checking region consistency verifies that .META., region deployment on region servers and the state of data in HDFS (.regioninfo files) all are in accordance.  Table integrity checks verify that all possible row keys resolve to exactly one region of a table -- e.g. there are no individual degenerate or backwards regions; no holes between regions; and no overlapping regions.  Previously hbck could diagnose inconsistencies but could only repair deployment-related region consistency problems.  The updated version adds the ability to repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detect and adopt orphaned regions (by fabricating a new .regioninfo file if it is missing in a region's dir).

Caveats:
* The new hbck selects repairs assuming HDFS is ground truth; the previous version treated .META. as ground truth.
* The hbck '-fix' option is present but deprecated and replaced with the '-fixAssignments' option.
* This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs.  The 0.90 version of the tool is compatible with HBase 0.90+, but may require restarting the master or an individual regionserver for table enable/disable/delete commands to work properly.

  was:
HBaseFsck (hbck) has been updated with new repair capabilities.  hbck is a tool for checking the region consistency and the table integrity invariants of a running HBase cluster.  Checking region consistency verifies that .META., region deployment on region servers and the state of data in HDFS (.regioninfo files) all are in accordance.  Table integrity checks verify that all possible row keys resolve to exactly one region of a table -- e.g. there are no individual degenerate or backwards regions; no holes between regions; and no overlapping regions.  Previously hbck had the ability to diagnose inconsistencies but only had the ability to repair deployment region consistency problems.  The updated version now has been augmented with the ability repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detecting and adopting orphaned regions (by fabricating new .regioninfo file if it is missing in a region's dir).

Caveats:
* The new hbck selects repairs assuming that HDFS as ground truth, the previous version treated .META. as ground truth.
* The hbck '-fix' option is present but deprecated and replaced with -fixAssignments option.
* This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs.  The 0.90 version of he tool is compatible with HBase 0.90+, but may require restarting the master or individual individual regionserver for table enable/disable/delete commands to work properly.

    
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185970#comment-13185970 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-01-13 23:01:49, Alex Newman wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1048
bq.  > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1048>
bq.  >
bq.  >     Lots of whitespaces

Yup, I'll get them in the next pass -- from my v2 comments, I still need to get a compatibility checking thing going on, and will get the new nits on that pass.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4375
-----------------------------------------------------------


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META (without modifying the 0.90 HBase versions it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
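
On the question in Problem 2 of who should wait for regions to settle: one option on the unit-test side is a bounded polling loop.  The helper below is a minimal sketch under that assumption.  SettleWaitSketch and waitUntil are hypothetical names, the condition to poll (for example, "a fresh hbck run reports no errors") is left to the caller, and none of this is part of HbckTestingUtil or any HBase API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of HBase.
class SettleWaitSketch {

  // Polls `settled` until it returns true or `timeoutMs` elapses.
  // Returns true if the condition was observed, false on timeout or interrupt.
  static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (settled.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test could call waitUntil before asserting on hbck's error list, so that a region still in transition does not show up as a spurious HOLE_IN_REGION_CHAIN.  Whether the wait belongs in the test or in hbck itself is the open question raised above; this sketch only shows the test-side variant.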


                

       

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-0.90-v2b.patch

The previous version accidentally included two dev tools that are not part of this patch (ScrubMeta and DumpMeta).
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186312#comment-13186312 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

I think we should keep offline() and deprecate the other method. 
Let's remove that one in another Jira. 
                

        

[jira] [Work started] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-5128 started by Jonathan Hsieh.

> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no
>  * overlapping regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
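
As an illustration of the table integrity rules in the comment above, here is a minimal sketch in Java. It is not the HBaseFsck implementation: the class name TableIntegritySketch, the use of String keys instead of byte[] row keys, and the problem labels are all assumptions made for the example. An empty start key stands for "start of table" and an empty end key for "end of table".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
final class TableIntegritySketch {

    /**
     * Each region is {startKey, endKey}, covering [startKey, endKey).
     * Returns a list of human-readable integrity problems (empty if none).
     */
    static List<String> check(String[][] regions) {
        List<String> problems = new ArrayList<>();
        List<String[]> valid = new ArrayList<>();

        // Pass 1: flag individually bad regions (degenerate or backwards).
        for (String[] r : regions) {
            String start = r[0], end = r[1];
            if (!end.isEmpty() && start.equals(end)) {
                problems.add("DEGENERATE [" + start + "," + end + ")");
            } else if (!end.isEmpty() && start.compareTo(end) > 0) {
                problems.add("BACKWARDS [" + start + "," + end + ")");
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order and look
        // for holes and overlaps in the covered key space.
        valid.sort(Comparator.comparing((String[] r) -> r[0]));
        String coveredTo = "";      // every key below this is covered so far
        boolean reachedEnd = false; // true once a region with an empty end key is seen
        for (String[] r : valid) {
            String start = r[0], end = r[1];
            if (reachedEnd || start.compareTo(coveredTo) < 0) {
                problems.add("OVERLAP at [" + start + "," + end + ")");
            } else if (start.compareTo(coveredTo) > 0) {
                problems.add("HOLE [" + coveredTo + "," + start + ")");
            }
            if (end.isEmpty()) {
                reachedEnd = true;
            } else if (!reachedEnd && end.compareTo(coveredTo) > 0) {
                coveredTo = end;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE [" + coveredTo + ",)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A table whose regions leave the key range [b, c) uncovered.
        System.out.println(check(new String[][] {{"", "b"}, {"c", ""}}));
    }
}
```

A repair tool would act on each reported problem (fabricating a region for a hole, merging overlapping regions, sidelining backwards regions); the sketch only covers detection.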


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187134#comment-13187134 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

I've been testing using failed splits generated by cycling the hbase master while doing a heavy write load with a high split frequency, prior to the HBASE-5196 patch.  A subset of the problems is fixed automatically, but there seems to be a class of problems with splitting regions that isn't being handled properly.  This is probably the case we are most likely to encounter.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228848#comment-13228848 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

@Zhihong 

No problem -- I intend to address the reviews.  

Sorry about the test failures -- these are actually related to HBASE-5563 -- I'll help chenhui there.  I've been in 0.92 and 0.90 land and then away for a little bit, and didn't realize that a failure in the medium tests skips all the large tests. (I fixed the medium tests and expected the run to pass, but then the large tests ran and failed.)




                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235667#comment-13235667 ] 

Lars Hofhansl commented on HBASE-5128:
--------------------------------------

We'll be trying to get this into 0.94.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-0.90-v4.patch

Cleaned up 0.90 version.  Requires HBASE-5563 to pass consistently.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Attachment: hbase-5128-0.92-v2.patch
                hbase-5128-0.94-v2.patch

0.94 and 0.92 versions have minor tweaks from trunk version and in both cases TestHBaseFsck passes.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235830#comment-13235830 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6229
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13451>

    I think we should distinguish the return value in this case (0) from that returned on line 1515.
    See comment on line 1792



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13428>

    I suggest renaming holeStart as startRow and renaming holeStop as stopRow.
    Then you don't need the comment on 1700.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13434>

    Should include maxMerge in the log.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13439>

    I wonder whether we should bail if there have been two IOE's, one on 1759 and one here.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13440>

    'Creating' -> 'Created'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13442>

    Are newRegion and region representing the same entity ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13453>

    If mergeRegionDirs() returns 0 (or less), should we note (partial) failure in merging ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13456>

    Should say 'unable to get regions from master' or something similar



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13458>

    Please remove this.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13460>

    'with not' -> 'without'
    Should also include some info on the entry.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13459>

    Please remove this.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13464>

    Nit: name hdfsRegiondirModtime as hdfsRegionDirModTime



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/4280/#comment13465>

    Typo: maximum


- Ted


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features have been added from experience.  There has not been much live testing on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
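
The "safeguards to prevent mega merges" item in the summary above, together with the maxMerge name mentioned in the review, suggests a cap on how many overlapping regions are merged at once. A minimal sketch of that idea follows. It is an assumption about the intent, not the code in the patch: the class MergeGuardSketch, the method selectMergeable, the representation of regions as strings, and the log message are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HBaseFsck code under review.
final class MergeGuardSketch {

    /**
     * Returns the overlap groups small enough to merge. Groups with more
     * than maxMerge regions are skipped, and a message that includes
     * maxMerge is appended to log for each skipped group.
     */
    static List<List<String>> selectMergeable(
            List<List<String>> overlapGroups, int maxMerge, List<String> log) {
        List<List<String>> mergeable = new ArrayList<>();
        for (List<String> group : overlapGroups) {
            if (group.size() > maxMerge) {
                log.add("Skipping merge of " + group.size()
                        + " overlapping regions; maxMerge=" + maxMerge);
            } else {
                mergeable.add(group);
            }
        }
        return mergeable;
    }
}
```

Skipping oversized groups instead of failing the whole run lets smaller overlaps still be repaired, while leaving large ones for an operator to inspect.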


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables their region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
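The table integrity invariant described in the comment above (every row key resolves to exactly one region) can be illustrated with a small, self-contained sketch. This is not the HBaseFsck code: the class name TableIntegritySketch, its Region type and the check() method are hypothetical, and row keys are modelled as Strings with "" standing for the open start or end of the key space, whereas HBase compares byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a table integrity check; not the HBaseFsck implementation. */
final class TableIntegritySketch {

    /** A region covering [start, end); "" means unbounded on that side (assumption). */
    static final class Region {
        final String start;
        final String end;

        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns one message per degenerate, backwards, overlapping or missing key range. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sane = new ArrayList<>();
        for (Region r : regions) {
            boolean openEnded = r.end.isEmpty();
            if (!openEnded && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r);   // endkey == startkey
            } else if (!openEnded && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r);    // startkey sorts after endkey
            } else {
                sane.add(r);
            }
        }

        // Walk the remaining regions in start-key order and track how far the
        // key space has been covered so far.
        sane.sort(Comparator.comparing((Region r) -> r.start));
        String covered = "";
        boolean reachedEnd = false;
        for (Region r : sane) {
            if (reachedEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP at " + r);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + ", " + r.start + ")");
            }
            if (r.end.isEmpty()) {
                reachedEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE [" + covered + ", )"); // nothing covers the tail of the key space
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> table = Arrays.asList(
            new Region("", "b"), new Region("c", "f"), new Region("e", ""));
        for (String problem : check(table)) {
            System.out.println(problem);
        }
    }
}
```

The sketch only detects problems. The repairs named in the comment (fabricating regions for holes, merging overlaps, sidelining backwards and degenerate regions) are left out.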

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

          Component/s: hbck
    Affects Version/s: 0.90.5
                       0.92.0
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237675#comment-13237675 ] 

Lars Hofhansl commented on HBASE-5128:
--------------------------------------

Thanks for getting this done for 0.94, Jon!
+1 on release notes and book update, but doesn't need to hold up 0.94rc
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237355#comment-13237355 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 07:11:34, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1076
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1076>
bq.  >
bq.  >     I think we should handle RejectedExecutionException and re-submit the item.
bq.  
bq.  jmhsieh wrote:
bq.      Follow on issue.  Failing hard here is probably good, and the change here was just more logging.

HBASE-5632
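
For readers following the exchange above, the two options (fail hard on a rejected work item, or re-submit it) can be sketched as follows. This is an illustration only and not what the patch does: WorkItemSubmitSketch, submitAll() and the maxRetries parameter are hypothetical names, and the linear backoff is an arbitrary choice.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical sketch; class and method names are not taken from HBaseFsck. */
final class WorkItemSubmitSketch {

    /**
     * Submits every item to the pool. A rejected submission is retried up to
     * maxRetries times; with maxRetries == 0 the first rejection is logged and
     * rethrown, which corresponds to failing hard.
     */
    static void submitAll(ExecutorService pool, List<Runnable> items, int maxRetries) {
        for (Runnable item : items) {
            int attempt = 0;
            while (true) {
                try {
                    pool.execute(item);
                    break; // accepted, move on to the next item
                } catch (RejectedExecutionException e) {
                    if (attempt++ >= maxRetries) {
                        System.err.println("work item rejected after " + attempt + " attempt(s)");
                        throw e;
                    }
                    try {
                        Thread.sleep(50L * attempt); // simple linear backoff before re-submitting
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
    }
}
```

Re-submitting only helps when the rejection is transient (for example a bounded queue that is momentarily full). A pool that has been shut down rejects every attempt, so the retry path ends in the same exception.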


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6213
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience.  There has not been much live testing on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186124#comment-13186124 ] 

ramkrishna.s.vasudevan commented on HBASE-5128:
-----------------------------------------------

@Jon
Do you want this in 0.90.6? I was actually planning to cut a release today.
One more thing: I was working on HBASE-5155, which changes some behaviour of enabling and disabling tables (in the 0.90 branch). You may want to take a look at it for your patch. I will check your patch as well.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199967#comment-13199967 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Jimmy mentions this actually may be HBASE-4238.  
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186254#comment-13186254 ] 

ramkrishna.s.vasudevan commented on HBASE-5128:
-----------------------------------------------

@Jon
Let me check.  Take your time before getting it through, Jon; there is no hurry.
Maybe we can take it in the next release? Please don't mind.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5128:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519382/hbase-5128-0.92-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 21 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1252//console

This message is automatically generated.)
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detects most of region consistency and table integrity invariant violations.  However with '-fix' it can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issue.
> Here's the approach (from the comment of at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, each table's region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
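
As an illustration of the table integrity invariant described in the comment above, here is a minimal sketch in plain Java. It is not taken from the patch: the `Region` class, the `check` method, and the problem labels are hypothetical, keys are simplified to strings, and an empty key is assumed to mean "unbounded" at the start or end of the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only; none of these names come from HBaseFsck. */
final class TableIntegritySketch {

    /** Hypothetical stand-in for a region's key range ("" = unbounded). */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns a list of problems; an empty list means every row key maps to exactly one region. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();

        // Per-region checks: degenerate (endkey == startkey) and backwards ranges.
        for (Region r : regions) {
            boolean openEnded = r.endKey.isEmpty();
            if (!openEnded && r.startKey.equals(r.endKey)) {
                problems.add("DEGENERATE " + r.name);
            } else if (!openEnded && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("BACKWARDS " + r.name);
            } else {
                valid.add(r);
            }
        }

        // Chain checks: walk the regions in start-key order and track coverage.
        valid.sort(Comparator.comparing((Region r) -> r.startKey));
        String coveredUpTo = "";    // every key below this one is covered
        boolean reachedEnd = false; // set once an open-ended region is seen
        for (Region r : valid) {
            if (reachedEnd || r.startKey.compareTo(coveredUpTo) < 0) {
                problems.add("OVERLAP at " + r.name);
            } else if (r.startKey.compareTo(coveredUpTo) > 0) {
                problems.add("HOLE before " + r.name);
            }
            if (r.endKey.isEmpty()) {
                reachedEnd = true;
            } else if (!reachedEnd && r.endKey.compareTo(coveredUpTo) > 0) {
                coveredUpTo = r.endKey;
            }
        }
        if (!reachedEnd) {
            problems.add("HOLE at end of table");
        }
        return problems;
    }
}
```

In this sketch a hole would be repaired by fabricating a region for the uncovered range and an overlap by merging, as the comment describes; the sketch only detects the conditions.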

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226974#comment-13226974 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5823
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
<https://reviews.apache.org/r/4280/#comment12720>

    Can we deprecate this method in 0.94 and remove it in 0.96 ?


- Ted


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on a bug in the HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many situations and pass.  Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience.  Not much live testing has been done on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185944#comment-13185944 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
-----------------------------------------------------------

(Updated 2012-01-13 22:49:33.927353)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.


Changes
-------

Version 2

Solved problem 1 by adding a new method to the master -- offline -- which properly removes in-memory state from the master's assignmentManager (which allows disable table and drop table to work properly).  I haven't added API compatibility checks (to gracefully handle the case where this hbck is used on a 0.90.5 cluster) yet -- that will be in the next version of the patch.

Solved problem 2 by adding a waitUntilAssigned.  The tests were looped and consistently pass now.
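
The review text does not show how waitUntilAssigned is implemented. As a rough sketch of the general idea only (poll until a condition holds or a timeout expires), something like the following could be used; the class name, method name, and parameters here are hypothetical and do not come from the patch.

```java
import java.util.function.BooleanSupplier;

/** Illustrative polling helper; not the patch's waitUntilAssigned. */
final class WaitSketch {

    /**
     * Polls the condition every pollMs milliseconds until it returns true
     * or timeoutMs milliseconds have elapsed. Returns true only if the
     * condition was observed to hold before the deadline.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, the condition would be whatever check reports that the region is no longer in transition; bounding the wait keeps a stuck assignment from hanging the test run.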

This version now "sidelines" data instead of deleting data -- so in the case where repairs go badly there is still a good chance for some manual recovery.
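
To illustrate the difference between sidelining and deleting, here is a minimal sketch that moves a region directory into a separate sideline directory instead of removing it. It is an assumption-laden illustration: it uses the local filesystem through java.nio.file rather than HDFS, and the `sideline` method and directory layout are hypothetical, not the ones used by hbck.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only; the real tool operates on HDFS, not the local filesystem. */
final class SidelineSketch {

    /**
     * Moves regionDir under sidelineRoot and returns the new location.
     * Nothing is deleted, so the data can still be recovered by hand.
     * Fails rather than overwriting if the target already exists.
     */
    static Path sideline(Path regionDir, Path sidelineRoot) {
        try {
            Files.createDirectories(sidelineRoot);
            Path target = sidelineRoot.resolve(regionDir.getFileName());
            return Files.move(regionDir, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the move refuses to replace an existing target, a second repair attempt cannot silently clobber data that was sidelined earlier.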

Fixed a bunch of typo/spacing nits; more to come.

I still need to do some testing on real clusters -- I'm going to use the bug from HBASE-5196 or manually inject failures to generate problematic tables.

I also need to forward port to trunk/0.92.x.


Summary
-------

I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review and consideration for commit. It has some problems that I need advice on.

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state -- so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing problems and transient states.

What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase clusters it is meant to repair)?

Problem 2:

Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit tests deterministic, should hbck wait for these to settle, or should just the unit test wait?


This addresses bug HBASE-5128.
    https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
  src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
  src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
  src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2 

Diff: https://reviews.apache.org/r/3435/diff


Testing
-------

All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  

Not ready for commit.


Thanks,

jmhsieh


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>

       

[jira] [Updated] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Release Note: 
HBaseFsck (hbck) has been updated with new repair capabilities.  hbck is a tool for checking the region consistency and the table integrity invariants of a running HBase cluster.  Region consistency checks verify that .META., region deployment on region servers, and the state of data in HDFS (.regioninfo files) are all in accordance.  Table integrity checks verify that all possible row keys resolve to exactly one region of a table -- e.g. there are no individual degenerate or backwards regions; no holes between regions; and no overlapping regions.  Previously hbck could diagnose inconsistencies but could only repair deployment-related region consistency problems.  The updated version can now also repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detect and adopt orphaned regions (by fabricating a new .regioninfo file if it is missing in a region's dir).

Caveats:
* The new hbck selects repairs assuming that HDFS is ground truth; the previous version treated .META. as ground truth.
* The hbck '-fix' option is present but deprecated and replaced with the -fixAssignments option.
* This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs.  The 0.90 version of the tool is compatible with HBase 0.90+, but may require restarting the master or individual regionservers for table enable/disable/delete commands to work properly.
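
The three sources that region consistency compares (HDFS, .META., and actual deployment) can be illustrated with a small classification sketch. This is not the logic in HBaseFsck: the `classify` method, its parameters, and the returned labels are hypothetical names chosen for illustration.

```java
import java.util.Set;

/** Illustrative only; labels and names are not hbck's error codes. */
final class RegionConsistencySketch {

    /**
     * Classifies a region from three observations:
     * inHdfs         - a valid .regioninfo file exists in the region dir
     * inMeta         - a valid row for the region exists in .META.
     * assignedServer - the server .META. says the region is assigned to
     * deployedOn     - the servers that actually report serving the region
     */
    static String classify(boolean inHdfs, boolean inMeta,
                           String assignedServer, Set<String> deployedOn) {
        if (!inHdfs) {
            return "MISSING_REGIONINFO_IN_HDFS";
        }
        if (!inMeta) {
            return "MISSING_ROW_IN_META";
        }
        if (deployedOn.isEmpty()) {
            return "NOT_DEPLOYED";
        }
        if (deployedOn.size() > 1) {
            return "MULTIPLY_DEPLOYED";
        }
        if (!deployedOn.contains(assignedServer)) {
            return "DEPLOYED_ON_WRONG_SERVER";
        }
        return "CONSISTENT";
    }
}
```

The sketch checks HDFS first, which mirrors the caveat above that the new hbck treats HDFS as ground truth, but the ordering of the remaining checks is an arbitrary choice.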

Updated release notes.  
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

       

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237474#comment-13237474 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Thanks Ted.  I've updated the rest.  Will do better next time. :)
                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227239#comment-13227239 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

bq. What was the reason for deprecating -fix ?
I guess -fixAll may take a long time to execute now that hbck is able to fix various types of problems.
Otherwise it may be desirable to let -fix correct all the problems.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198588#comment-13198588 ] 

Jonathan Hsieh commented on HBASE-5128:
---------------------------------------

Luke,

Good suggestion.  I'll integrate that into the next revs.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions.
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there
>  * are any orphan regions (regions with no .regioninfo files), or holes, new
>  * regions are fabricated.  Backwards regions are sidelined, as are empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.
>  *
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion.
>  *
>  * Region consistency requires three conditions -- 1) valid .regioninfo file
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  *
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
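
The three region consistency conditions in the quoted comment can be read as a small decision procedure. The following is a minimal sketch of that reading, not the logic in HBaseFsck: the Verdict enum and RegionConsistencySketch class are hypothetical, servers are identified by plain Strings, and it assumes the table is enabled (a region of a disabled table would legitimately be deployed nowhere).

```java
import java.util.Set;

/** Hypothetical outcome labels; they only loosely echo the error codes named in this thread. */
enum Verdict {
    CONSISTENT, NOT_IN_HDFS, NOT_IN_META, NOT_DEPLOYED, MULTI_DEPLOYED, SERVER_DOES_NOT_MATCH_META
}

final class RegionConsistencySketch {
    /**
     * Classifies one region from the three sources of truth: HDFS (.regioninfo present),
     * META (row present, with the server it names), and the servers actually hosting it.
     */
    static Verdict classify(boolean regioninfoInHdfs, boolean rowInMeta,
                            String serverInMeta, Set<String> deployedOn) {
        if (!regioninfoInHdfs) {
            return Verdict.NOT_IN_HDFS;                 // condition 1 violated
        }
        if (!rowInMeta) {
            return Verdict.NOT_IN_META;                 // condition 2 violated
        }
        if (deployedOn.isEmpty()) {
            return Verdict.NOT_DEPLOYED;                // condition 3: deployed nowhere
        }
        if (deployedOn.size() > 1) {
            return Verdict.MULTI_DEPLOYED;              // condition 3: deployed in several places
        }
        if (!deployedOn.contains(serverInMeta)) {
            return Verdict.SERVER_DOES_NOT_MATCH_META;  // condition 3: deployed on the wrong server
        }
        return Verdict.CONSISTENT;
    }
}
```

Checking HDFS first mirrors the repair order in the description, where table integrity on HDFS is fixed before META and assignments are reconciled.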


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182693#comment-13182693 ] 

Zhihong Yu commented on HBASE-5128:
-----------------------------------

For problem #1, I think AssignmentManager.unassign() needs to be modified - currently it only removes regions from its internal map upon getting a RemoteException.

For problem #2, I think hbck should wait. This scenario may happen in production.
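
One way hbck could wait for regions to settle is a bounded polling loop that reports the outcome as a boolean. This is a sketch under assumptions rather than anything taken from the patch: SettleWaitSketch and waitUntil are hypothetical names, and the "settled" condition is supplied by the caller (for example, a check that the region is no longer in transition).

```java
import java.util.function.BooleanSupplier;

final class SettleWaitSketch {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!settled.getAsBoolean()) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // timed out: the caller decides whether to retry or report
            }
            try {
                Thread.sleep(Math.min(pollMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
        return true;
    }
}
```

Returning a boolean rather than throwing lets the repair code log a timeout and move on, and restoring the interrupt flag keeps a shutdown request from being swallowed inside the loop.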
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228180#comment-13228180 ] 

stack commented on HBASE-5128:
------------------------------

The only issue that I can see -- the bulk of the patch is hbck stuff, changes in tests and in hbck package only -- is the addition to the master Interface where the offline method is added.  It needs to be moved to the end of the Interface so we don't break rolling restart (moving to the end of the Interface may be apocryphal but IIRC, that's the way to add a method w/o breaking backward compatibility).  We should get this into 0.92 and 0.94 after trunk commit.
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193154#comment-13193154 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4591
-----------------------------------------------------------


We should deprecate clearRegionFromTransition().


src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10238>

    I think a boolean return value would help determine the outcome of the action.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10237>

    This sentence should be moved before ' from ...'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10234>

    We should handle potential exception from this method.
    
    Maybe we should check the availability of this rpc outside the loop and set a flag indicating whether Master supports this RPC.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10240>

    I would expect a boolean return value since we may return without throwing exception (line 1125)



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10239>

    How about naming this method hasHdfsOnlyEdits() ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10233>

    This TODO has been implemented, so we can remove it.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
<https://reviews.apache.org/r/3435/#comment10232>

    More action is needed beyond a WARN message, right ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/3435/#comment10235>

    success is a local variable.
    Why don't we change the return type to boolean and return its value?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
<https://reviews.apache.org/r/3435/#comment10236>

    We should set interrupt flag.


- Ted


On 2012-01-25 17:24:41, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-25 17:24:41)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems that I need advice on.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state -- disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META (without modifying the 0.90 HBase instances it is meant to repair)?
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151 
bq.  
bq.  Diff: https://reviews.apache.org/r/3435/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All unit tests pass sometimes.  Some fail sometimes (generally the cases that fabricate new regions).  
bq.  
bq.  Not ready for commit.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                

       

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Doug Meil (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187948#comment-13187948 ] 

Doug Meil commented on HBASE-5128:
----------------------------------

Hey guys, a bunch of comments just wound up on a documentation ticket I just did (HBASE-5218) that I'm pretty sure were intended for this ticket.
                

        

[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237356#comment-13237356 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 435
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line435>
bq.  >
bq.  >     I think master.synchronousBalanceSwitch() is better candidate for this action.
bq.  
bq.  jmhsieh wrote:
bq.      I agree, but since this method is only in the trunk/0.94 branches I'll file a follow on issue for this.

HBASE-5630


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 554
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line554>
bq.  >
bq.  >     Can we do this in the current JIRA ?
bq.  >     
bq.  >     Why do we need to reload for every type of fix ?
bq.  
bq.  jmhsieh wrote:
bq.      I'd rather do it in a follow on issue.  Correctness first, then performance.  This patch is massive already.

HBASE-5628


bq.  On 2012-03-22 06:33:20, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 702
bq.  > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line702>
bq.  >
bq.  >     Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?
bq.  
bq.  jmhsieh wrote:
bq.      The guard means this is executed only once per table.  In the 0.90 version, the way I got a TableInfo was via a method call to get the HRegionInfo/HTableDescriptor and I actually checked for inconsistencies there -- in 0.92+ there is only the .tableinfo file so this consistency check isn't relevant (though there should be other .tableinfo checks specific to 0.92+, which I can file as a follow-on).

HBASE-5631


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review6208
-----------------------------------------------------------


On 2012-03-21 23:24:13, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 23:24:13)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.
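
The review summary above mentions safeguards against mega merges and isolating repairs to particular tables. A minimal sketch of what such a guard might look like follows; it is an illustration only, not code from the patch. RepairPolicySketch, its fields, and the maxMerge threshold are all hypothetical names and parameters.

```java
import java.util.Set;

final class RepairPolicySketch {
    private final Set<String> tablesToFix; // empty set is taken to mean "all tables"
    private final int maxMerge;            // largest overlap group that may be merged automatically

    RepairPolicySketch(Set<String> tablesToFix, int maxMerge) {
        this.tablesToFix = tablesToFix;
        this.maxMerge = maxMerge;
    }

    /** True if repairs are allowed to touch the given table. */
    boolean tableIncluded(String table) {
        return tablesToFix.isEmpty() || tablesToFix.contains(table);
    }

    /** True if a group of overlapping regions is small enough to merge without manual review. */
    boolean shouldMerge(String table, int overlappingRegions) {
        return tableIncluded(table)
            && overlappingRegions >= 2
            && overlappingRegions <= maxMerge;
    }
}
```

Groups that exceed the threshold would be reported and left for an operator, so that one bad overlap cannot collapse a large part of a table into a single region.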


                
> [uber hbck] Online automated repair of table integrity and region consistency problems
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
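
The table integrity rules in the comment above can be illustrated with a small standalone sketch.  It is not the HBaseFsck code: the class name TableIntegritySketch, the Region type, and the message strings are hypothetical, and String keys stand in for HBase's byte-array row keys.  An empty key is assumed to mean the start or end of the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; this is not the HBaseFsck implementation. */
final class TableIntegritySketch {

  /** A region covering [startKey, endKey); "" means start or end of the table. */
  static final class Region {
    final String name;
    final String startKey;
    final String endKey;

    Region(String name, String startKey, String endKey) {
      this.name = name;
      this.startKey = startKey;
      this.endKey = endKey;
    }
  }

  /** Returns one message per degenerate, backwards, or overlapping region and per hole. */
  static List<String> check(List<Region> regions) {
    List<String> problems = new ArrayList<>();
    List<Region> valid = new ArrayList<>();
    for (Region r : regions) {
      if (r.endKey.isEmpty()) {
        valid.add(r); // open-ended region: cannot be degenerate or backwards
      } else if (r.startKey.equals(r.endKey)) {
        problems.add("DEGENERATE " + r.name);
      } else if (r.startKey.compareTo(r.endKey) > 0) {
        problems.add("BACKWARDS " + r.name);
      } else {
        valid.add(r);
      }
    }
    // Sweep the remaining regions in start-key order.
    valid.sort(Comparator.comparing((Region r) -> r.startKey));

    String coveredUpTo = "";    // end key of the furthest-reaching region seen so far
    boolean reachedEnd = false; // set once a region with an empty end key is seen
    for (Region r : valid) {
      int cmp = r.startKey.compareTo(coveredUpTo);
      if (reachedEnd || cmp < 0) {
        problems.add("OVERLAP " + r.name);
      } else if (cmp > 0) {
        problems.add("HOLE [" + coveredUpTo + ", " + r.startKey + ")");
      }
      if (r.endKey.isEmpty()) {
        reachedEnd = true;
      } else if (r.endKey.compareTo(coveredUpTo) > 0) {
        coveredUpTo = r.endKey;
      }
    }
    if (!reachedEnd) {
      problems.add("HOLE [" + coveredUpTo + ", <end of table>)");
    }
    return problems;
  }
}
```

For three regions ['', b), [c, d), [d, '') the sketch reports a single hole, [b, c).  A real repair would then fabricate a region for the hole or merge overlapping regions, as the comment describes; the sketch only detects.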

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235047#comment-13235047 ] 

jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------



bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > I went through about half of this patch.  It's plain that there have been a bunch of improvements.  There is some great stuff in here.  I'm +1 on committing this because it's fat and full of goodies and then working on issues in new issues.

Sorry for the delay.  I'll have a new patch that addresses these comments up shortly, and will focus on porting to 0.94/0.92/0.90 to take account of HBASE-5563, HBASE-5588, and HBASE-5589.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 169
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line169>
bq.  >
bq.  >     Good one.

comment removed due to update in HBASE-5563.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 1025
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line1025>
bq.  >
bq.  >     We overloaded the method here?

This was a style thing -- I misread the method when I first read it, so I rewrote it to be more verbose but more readable.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 96
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line96>
bq.  >
bq.  >     Add <p> here if you want the line between paragraphs to come out in javadoc.  You add a white space for each empty line.

html-ized javadoc.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 102
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line102>
bq.  >
bq.  >     that there 'are' no...

fixed


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 131
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line131>
bq.  >
bq.  >     s/handleful/handful/

fixed


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 135
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line135>
bq.  >
bq.  >     Capitalize 'replaces'

removed from here and fixed in usage


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 139
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line139>
bq.  >
bq.  >     Nice doc.  Helps.
bq.  >     
bq.  >     Would suggest this args explanation stuff only be done in the usage, not in usage and up here in class comment.  They have a tendency to diverge.

removed flags and added link to the printUsageAndExit method.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 394
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line394>
bq.  >
bq.  >     Declare and assign in the one step?  As is, you declare two lines above and then assign it here

Done.  I likely reused the var a couple times in an earlier rev.


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 898
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line898>
bq.  >
bq.  >     Is this method like: http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/FSUtils.html#469
bq.  >     
bq.  >     ?

changed to use FSUtils.getRootDir().


bq.  On 2012-03-12 23:35:46, Michael Stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 365
bq.  > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line365>
bq.  >
bq.  >     This is 'destructive' in that it changes what's on hdfs?  If so, change the comment above.... it says 'determine'

changed to 'repair'


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4280/#review5860
-----------------------------------------------------------


On 2012-03-10 01:04:58, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4280/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-10 01:04:58)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.
bq.  
bq.  1) No trackHTD method needed since we can read from the file system.
bq.  2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
bq.  3) Fixed comparator in HRegionInfo
bq.  4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
bq.  
bq.  I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.
bq.  
bq.  This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.      https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548 
bq.  
bq.  Diff: https://reviews.apache.org/r/4280/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests cover many many situations and pass.  Most "live" testing has been done on 0.90.x versions.  Many improvements and features added from experience.  Not much testing live on the trunk versions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
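
The three region consistency conditions listed in the comment can be written down as a simple check.  This is a sketch under assumptions, not what hbck does internally: the class name RegionConsistencySketch, the method signature, and the message strings are hypothetical, and the inputs are assumed to have already been gathered from HDFS, META, and the region servers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the HBaseFsck implementation. */
final class RegionConsistencySketch {

  /**
   * Returns the violated conditions for one region; an empty list means the
   * region is consistent.
   *
   * @param regioninfoInHdfs true if a valid .regioninfo file is in the region dir
   * @param rowInMeta        true if META has a valid row for the region
   * @param assignedServer   the region server the region was assigned to
   * @param deployedServers  the region servers actually serving the region
   */
  static List<String> violations(boolean regioninfoInHdfs, boolean rowInMeta,
      String assignedServer, List<String> deployedServers) {
    List<String> found = new ArrayList<>();
    if (!regioninfoInHdfs) {
      found.add("no valid .regioninfo in HDFS");   // condition 1
    }
    if (!rowInMeta) {
      found.add("no valid row in META");           // condition 2
    }
    // Condition 3: deployed on exactly one server, and it is the assigned one.
    boolean onlyOnAssigned = deployedServers.size() == 1
        && deployedServers.get(0).equals(assignedServer);
    if (!onlyOnAssigned) {
      found.add("deployment does not match assignment");
    }
    return found;
  }
}
```

The sketch treats an undeployed region as a violation of the third condition; special cases such as regions of disabled tables are deliberately left out.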


        

[jira] [Updated] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Jonathan Hsieh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5128:
----------------------------------

    Fix Version/s: 0.96.0
                   0.94.0
                   0.92.2
                   0.90.7
    
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: hbase-5128-0.90-v2.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.94-v2.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}


        

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228077#comment-13228077 ] 

Lars Hofhansl commented on HBASE-5128:
--------------------------------------

What's the feeling about 0.94 vs 0.96? It seems the changes are isolated enough to be not too risky for 0.94.
                
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5128
>                 URL: https://issues.apache.org/jira/browse/HBASE-5128
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: hbase-5128-trunk.patch
>
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations.  However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems.  This updated version should be able to handle all cases (including a new orphan regiondir case).  When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
>  * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
>  * table integrity.  
>  * 
>  * Region consistency checks verify that META, region deployment on
>  * region servers and the state of data in HDFS (.regioninfo files) all are in
>  * accordance. 
>  * 
>  * Table integrity checks verify that all possible row keys can resolve to
>  * exactly one region of a table.  This means there are no individual degenerate
>  * or backwards regions; no holes between regions; and that there are no overlapping
>  * regions. 
>  * 
>  * The general repair strategy works in these steps.
>  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
>  * 2) Repair Region Consistency with META and assignments
>  * 
>  * For table integrity repairs, the tables' region directories are scanned
>  * for .regioninfo files.  Each table's integrity is then verified.  If there 
>  * are any orphan regions (regions with no .regioninfo files), or holes, new 
>  * regions are fabricated.  Backwards regions are sidelined as well as empty
>  * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
>  * a new region is created and all data is merged into the new region.  
>  * 
>  * Table integrity repairs deal solely with HDFS and can be done offline -- the
>  * hbase region servers or master do not need to be running.  This phase can be
>  * used to completely reconstruct the META table in an offline fashion. 
>  * 
>  * Region consistency requires three conditions -- 1) valid .regioninfo file 
>  * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
>  * and 3) a region is deployed only at the regionserver that it was assigned to.
>  * 
>  * Region consistency requires hbck to contact the HBase master and region
>  * servers, so the connect() must first be called successfully.  Much of the
>  * region consistency information is transient and less risky to repair.
>  */
> {code}
