Posted to issues@hbase.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2011/09/27 08:39:14 UTC

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115261#comment-13115261 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------

Review request for hbase and Jonathan Gray.


Summary
-------

Make the Meta* operations against meta retry.  We do it by using HTable instances
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
In 0.89, we had a special RetryableMetaOperation class, a subclass of
Callable, that reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop.  Now we just use HTable instead (it costs some on setup but
otherwise we avoid duplicating code).  Upped the retries on the server side too.
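
To make the retry idea concrete, here is a minimal sketch of the kind of
retry loop a Callable-based helper wraps around a meta operation.  It is
plain Java with no HBase dependency; RetrySketch, callWithRetries,
maxAttempts and pauseMillis are hypothetical names for illustration only,
not the code in this patch or in HConnection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical illustration only; not HBase code. */
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Runs the operation up to maxAttempts times, sleeping pauseMillis
   * between failed attempts. Once attempts are exhausted, the last
   * failure is rethrown wrapped in an IOException.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // never swallow interrupts
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis);
        }
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Fails twice (as if meta had moved), then succeeds on the third call.
    String result = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("meta moved");
      }
      return "ok";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

Keeping one loop like this in one place is the point of going through HTable:
callers get the retrying for free instead of each carrying its own copy.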

Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods getting proxy connections to meta and
root servers.  CT needs to be cut back.  This patch closes down access on
(unused) public methods and removes being able to get an HRegionInterface on
meta and root -- this stuff is used internally to CT only now; use MetaEditor
or MetaReader if you want to update or read catalog tables.  Opening a new
issue to cut back CT use over the code base.

A little off topic, but since I was in MetaReader and MetaEditor trying to
clean them up, I ended up moving the meta migration code out to its
own class rather than keeping it all inside MetaEditor.

Here is some detail to help reviewers.

M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Clean up.  Shut down access to some of these unused methods.  Don't
  let out HRegionInterface instances in particular since we are going
  away from raw HRI use to instead use a connection with retries:
  i.e. HTable.

  Comments on state of this class. Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated so don't use it
  in constructor.  Override MetaNodeTracker and on node delete
  reset meta location (We used to do this over in MetaNodeTracker
  but to do that we had to have a CatalogTracker over in zk package
  which is silly -- bad package encapsulation).

  (waitForRootServer) Renamed getRootServerConnection and changed it
  from public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Changed from public to package private.
  Use MetaReader to read the meta location in root rather than a
  raw HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable.
  Not all accesses were synchronized.
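
As a rough illustration of the locking point above, here is a sketch of
guarding a cached meta location with a monitor on an AtomicBoolean, plus
timeout helpers in the spirit of remaining/timedout.  The class name,
fields and method signatures are assumptions made for the example; this is
not the CatalogTracker code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; names echo the notes above but this is not CatalogTracker. */
final class MetaLocationSketch {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by the metaAvailable monitor

  void setMetaLocation(String location) {
    synchronized (metaAvailable) {
      metaLocation = location;
      metaAvailable.set(true);
      metaAvailable.notifyAll(); // wake up anyone blocked in waitForMeta
    }
  }

  /** Every writer takes the same monitor, so readers never see a half-reset state. */
  void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaLocation = null;
      metaAvailable.set(false);
    }
  }

  /** Milliseconds left until endTime, never negative. */
  static long remaining(long endTime, long now) {
    return Math.max(0L, endTime - now);
  }

  /** True once now has reached endTime. */
  static boolean timedout(long endTime, long now) {
    return now >= endTime;
  }

  /** Returns the location, or null if it did not show up within timeoutMillis. */
  String waitForMeta(long timeoutMillis) throws InterruptedException {
    long endTime = System.currentTimeMillis() + timeoutMillis;
    synchronized (metaAvailable) {
      while (!metaAvailable.get()) {
        long now = System.currentTimeMillis();
        if (timedout(endTime, now)) {
          return null;
        }
        // remaining() is at least 1 here, so this never turns into wait(0).
        metaAvailable.wait(remaining(endTime, now));
      }
      return metaLocation;
    }
  }
}
```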

M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  Refactor to use HTable instead of raw HRegionInterface so we get
  retrying.  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to own
  class since these methods are for a one-time migration only.
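
The "get a table, use it, then close it" shape described for MetaEditor can
be sketched as below.  CatalogTable and TableFactory are stand-in interfaces
invented for this example (so it runs without HBase on the classpath), and
the method bodies are an assumption about the pattern, not the patch itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the per-operation lifecycle; not MetaEditor itself. */
final class MetaEditSketch {
  private MetaEditSketch() {}

  /** Stand-in for a table handle that retries internally. */
  interface CatalogTable extends Closeable {
    void put(String row) throws IOException;
  }

  /** Stand-in for whatever hands out a fresh handle per operation. */
  interface TableFactory {
    CatalogTable open() throws IOException;
  }

  /** Opens a handle, applies every put, and always closes the handle. */
  static void putsToMetaTable(TableFactory factory, List<String> rows) throws IOException {
    try (CatalogTable table = factory.open()) {
      for (String row : rows) {
        table.put(row);
      }
    }
  }

  /** Single-row convenience wrapper over putsToMetaTable. */
  static void putToMetaTable(TableFactory factory, String row) throws IOException {
    putsToMetaTable(factory, Collections.singletonList(row));
  }

  /** In-memory fake that records what happened to it. */
  static final class RecordingTable implements CatalogTable {
    final List<String> rows = new ArrayList<>();
    boolean closed;

    @Override public void put(String row) { rows.add(row); }
    @Override public void close() { closed = true; }
  }
}
```

The setup cost mentioned in the summary is the open() per call; the trade is
that no caller can forget to close a handle or skip the retrying.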
    
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
  New class that holds all the Meta* methods that update the meta table
  during the one-time migration done to meta on startup.  This class
  is marked deprecated because it's going to be dropped in 0.94.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
    getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing
  this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result.
  (CollectAllVisitor) Added.
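
For reviewers less familiar with the visitor style of scan, here is a small
sketch of a full scan driven by a Visitor, with a collect-all visitor on top.
It scans an in-memory Iterable rather than a catalog table, and the generic
signatures are assumptions for illustration; they are not MetaReader's.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of visitor-driven scanning; not MetaReader itself. */
final class ScanVisitorSketch {
  private ScanVisitorSketch() {}

  interface Visitor<R> {
    /** Return false to stop the scan early. */
    boolean visit(R row);
  }

  /** Visitor that keeps every row it is shown. */
  static final class CollectAllVisitor<R> implements Visitor<R> {
    private final List<R> results = new ArrayList<>();

    @Override public boolean visit(R row) {
      results.add(row);
      return true; // never stops early
    }

    List<R> getResults() {
      return results;
    }
  }

  /** Feeds rows to the visitor until it declines or rows run out. */
  static <R> void fullScan(Iterable<R> rows, Visitor<R> visitor) {
    for (R row : rows) {
      if (!visitor.visit(row)) {
        break;
      }
    }
  }

  /** Convenience form: scan everything and return it as a list. */
  static <R> List<R> fullScan(Iterable<R> rows) {
    CollectAllVisitor<R> visitor = new CollectAllVisitor<>();
    fullScan(rows, visitor);
    return visitor.getResults();
  }
}
```
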

M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Handle a few cases where methods throw InterruptedException
  (Don't let it out on the HBaseAdmin public API).

M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.

M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  Add to DEBUG message the connection name we were using.

M src/main/java/org/apache/hadoop/hbase/client/Result.java
  (getServerNameFromCatalogResult, parseCatalogResult,
    parseHRegionInfoFromCatalogResult) Added

M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
  Added new ThrowableWithExtraContext that takes extra context info.
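
The idea of carrying extra context alongside each failed attempt can be
sketched as follows.  FailureWithContext and describe are hypothetical names
and the message layout is made up for the example; see the diff for what
ThrowableWithExtraContext actually holds.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-attempt failure context; not the HBase class. */
final class RetryContextSketch {
  private RetryContextSketch() {}

  /** One failed attempt: what was thrown, when, and any extra detail. */
  static final class FailureWithContext {
    final Throwable cause;
    final long whenMillis;
    final String extras;

    FailureWithContext(Throwable cause, long whenMillis, String extras) {
      this.cause = cause;
      this.whenMillis = whenMillis;
      this.extras = extras;
    }

    @Override public String toString() {
      return whenMillis + ", " + extras + ", " + cause;
    }
  }

  /** Builds a single message listing every failed attempt in order. */
  static String describe(int attempts, List<FailureWithContext> failures) {
    StringBuilder sb = new StringBuilder("Failed after attempts=")
        .append(attempts).append(", exceptions:\n");
    for (FailureWithContext failure : failures) {
      sb.append(failure).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<FailureWithContext> failures = new ArrayList<>();
    failures.add(new FailureWithContext(
        new java.io.IOException("connection refused"), 1000L, "server=host1"));
    failures.add(new FailureWithContext(
        new java.io.IOException("region moved"), 2000L, "server=host2"));
    System.out.print(describe(2, failures));
  }
}
```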

M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
  instantiateServer renamed as connect

M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  Javadoc.  Renamed instantiateServer as connect.

M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Javadoc. Use MetaReader method instead of handcoding.

M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  Allow that hris can come back null when we ask for table regions.

M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
  Remove import of CatalogTracker.

M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
  Use the utility in MetaReader instead of handcoding it.

M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
  Use the new HConnectionTestingUtility for mocking in tests (need to use it
  because it's a bit harder mocking tests now that we use HTable instead
  of the more direct HRegionInterface).
  Add some tests of broken out utility methods.

M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
  Add tests

A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
  Add test of 3669 retrying.

A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
  New test utility that helps with mocking HConnection: you can mock
  an HConnection and then have an HTable use the mocked connection.  Can do
  either a mock or a spied-on HConnection.

M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
  The migration code moved.  Reference new location.

M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
  Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.


This addresses bug hbase-3446.
    https://issues.apache.org/jira/browse/hbase-3446


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
  src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
  src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
  src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
  src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
  src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
  src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
  src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
  src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
  src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
  src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 

Diff: https://reviews.apache.org/r/2065/diff


Testing
-------

All tests passed recently.  Rerunning.


Thanks,

Michael


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed.
