You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2011/10/03 20:11:35 UTC

[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

    [ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119451#comment-13119451 ] 

jiraposter@reviews.apache.org commented on HBASE-4377:
------------------------------------------------------



bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > How long did it take to scan the cluster with 2700 inconsistencies ?
bq.  > I see certain places in the code where more parallelism can be achieved if practical use of this feature takes long time.

The cluster that had 12k total regions after clenaup. It took 2m to run (this was localdisk accesses).  I didn't feel that the runtime was something to be concerned about.  And I honestly hope this code doesn't get used too often!

We could use the same WorkItem trick to speed up the code but my feeling is that straightforward and correct is the right first step.  


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 374
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line374>
bq.  >
bq.  >     Better replace root with -ROOT-

done.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 376
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line376>
bq.  >
bq.  >     b is not needed here, same with question mark.

k


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 391
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line391>
bq.  >
bq.  >     Please remove b and question mark.

k


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 307
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line307>
bq.  >
bq.  >     The .META. region is open upon return.
bq.  >     I think we should document this.

changed "live" to "open"


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 288
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line288>
bq.  >
bq.  >     It would be nice to log the path for the underlying region.
bq.  >     Otherwise what purpose does this catch/rethrow serve ?

nice catch.  Updated to include table name and path.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 309
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line309>
bq.  >
bq.  >     Looking at the usage below, maybe createNewRootAndMeta would be a better name.

done


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 352
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line352>
bq.  >
bq.  >     This log doesn't match the check above.
bq.  >     If we only produce Put for the first HbckInfo, we'd better declare that in the log.

updated error message and change behavior so that it bails out.  In this particular case, the invariant is checked before this method is called, but I'll just make it more explicit.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 356
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line356>
bq.  >
bq.  >     This would produce exception if his.size() == 0.

problem avoided with update.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 378
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line378>
bq.  >
bq.  >     Do you plan to do this in the next patch or in another JIRA ?
bq.  >     I haven't looked at the other JIRAs you mentioned, pardon me.

I'll file it as a follow-on jira.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 428
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line428>
bq.  >
bq.  >     Is there something we can do in case we get IOE from this call ?

added error logging and an attempt to revert.


bq.  On 2011-09-30 21:27:16, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 377
bq.  > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line377>
bq.  >
bq.  >     Better use boolean for return value to indicate success/failure.

done.


- jmhsieh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2126/#review2231
-----------------------------------------------------------


On 2011-09-30 00:02:16, jmhsieh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2126/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-30 00:02:16)
bq.  
bq.  
bq.  Review request for hbase, Michael Stack and Andrew Purtell.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
bq.  Author: Jonathan Hsieh <jo...@cloudera.com>
bq.  Date:   Wed Sep 28 10:18:11 2011 -0700
bq.  
bq.      HBASE-4377 [hbck] Offline rebuild .META. from fs data only
bq.      
bq.  
bq.  This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory.  It depends on the yet to be committed HBASE-4515 (either my verison or Gary's version), HBASE-4509, and HBASE-4506.  
bq.  
bq.  Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation.
bq.  
bq.  
bq.  This addresses bug HBASE-4377.
bq.      https://issues.apache.org/jira/browse/HBASE-4377
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2126/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, the some regions would start splitting and then die due to oome's).  This was not actually used on a live cluster -- it was used to reconstruct a .META. from .regioninfo's laid out in hbase's directory structure.
bq.  
bq.  Note also that this is not an automatic fix -- whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions.  It is up to the user to merge regions, to create .regioninfo files to plug hole, and to do any potential data loosing operations.
bq.  
bq.  The tests demonstrate current expected behavior -- rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.


                
> [hbck] Offline rebuild .META. from fs data only.
> ------------------------------------------------
>
>                 Key: HBASE-4377
>                 URL: https://issues.apache.org/jira/browse/HBASE-4377
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>         Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch
>
>
> In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch.  Users could move bad regions out until there is a clean rebuild.  
> It would likely fill in region split holes.  Follow on work could given options to merge or select regions that overlap, or do online rebuilds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira