Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/01/17 03:29:43 UTC

[jira] Created: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

ProcessServerShutdown fails if META moves, orphaning lots of regions
--------------------------------------------------------------------

                 Key: HBASE-3446
                 URL: https://issues.apache.org/jira/browse/HBASE-3446
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: Todd Lipcon
            Priority: Blocker


I ran a rolling restart on a 5-node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v9.txt

Improved retries exception reporting.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118936#comment-13118936 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------

(Updated 2011-10-02 00:01:07.515888)


Review request for hbase and Jonathan Gray.


Changes
-------

Patch addresses Ted and Jon reviews.

I also redid the catalog package changes -- MetaReader/MetaEditor and CatalogTracker -- so the change is minimized: methods are cleanly added and none are removed, only deprecated, unless they were private and are no longer used or have been moved out to the new MetaMigration class.  Hopefully this makes the patch easier to digest.


Summary
-------

Make the Meta* operations against meta retry.  We do it by using HTable instances
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
In 0.89, we had a special RetryableMetaOperation class, a subclass of
Callable that reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop.  Now we just use HTable instead (it costs some on setup, but
otherwise we avoid duplicating code).  Upped the retries on the server side too.
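
To make the idea of a bounded retry loop concrete, here is a minimal, self-contained sketch.  It is not the HBase code: RetrySketch, callWithRetries, RetriesExhausted, maxAttempts and pauseMillis are hypothetical names chosen for illustration, and the fixed pause between attempts is an assumption (a real client may well back off differently).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch of a bounded retry loop; not HBase's actual code. */
final class RetrySketch {

  /** Thrown when every attempt failed; keeps each failure for reporting. */
  static final class RetriesExhausted extends Exception {
    final List<Throwable> causes;

    RetriesExhausted(int attempts, List<Throwable> causes) {
      super("Failed after " + attempts + " attempts; last: "
          + causes.get(causes.size() - 1));
      this.causes = causes;
    }
  }

  /** Runs op up to maxAttempts times, sleeping pauseMillis between tries. */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
      throws RetriesExhausted, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // do not swallow interrupts inside the retry loop
      } catch (Exception e) {
        failures.add(e); // remember why this attempt failed
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis);
        }
      }
    }
    throw new RetriesExhausted(maxAttempts, failures);
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulates a catalog server that is down for the first two attempts.
    String result = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("Server not running");
      }
      return "scanned";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Recording every failed attempt, rather than only the last one, is what lets the final exception say what went wrong on each try.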

Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods to get proxy connections to the meta
and root servers.  CT needs to be cut back.  This patch closes down access to
(unused) public methods and removes the ability to get an HRegionInterface on
meta and root -- this is used internally by CT only now; use MetaEditor or
MetaReader if you want to update or read catalog tables.  Opening a new issue
to cut back CT use across the code base.

A little off topic, but since I was in MetaReader and MetaEditor
trying to clean them up, I ended up moving the meta migration code out to its
own class rather than keeping it all inside MetaEditor.

Here is some detail to help reviews.

M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Clean up.  Shutdown access on some of these unused methods.  Don't
  let out HRegionInterface instances in particular since we are going
  away from raw HRI use to instead use a connection with retries:
  i.e. HTable.

  Comments on state of this class. Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated so don't use it
  in constructor.  Override MetaNodeTracker and on node delete
  reset meta location (We used to do this over in MetaNodeTracker
  but to do that we had to have a CatalogTracker over in zk package
  which is silly -- bad package encapsulation).

  (waitForRootServer) Renamed getRootServerConnection and change it
  from public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Change from public to package private.
  Use MetaReader to read the meta location in root rather than a
  raw HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable.
  Not all accesses were synchronized.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  Refactor to use HTable instead of raw HRegionInterface so we get
  retrying.  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to own
  class since these methods are for a one-time migration only.
    
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
  New class that holds all Meta* methods updating meta table used
  doing the one-time migration done to meta on startup.  This class
  is marked deprecated because it's going to be dropped in 0.94.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
    getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing
  this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result
  (CollectAllVisitor) Added

M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Handle few cases where methods throw InterruptedException
  (Don't let it out on the HBaseAdmin public API)

M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.

M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  Add to DEBUG message the connection name we were using.

M src/main/java/org/apache/hadoop/hbase/client/Result.java
  (getServerNameFromCatalogResult, parseCatalogResult,
    parseHRegionInfoFromCatalogResult) Added

M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
  Added new ThrowableWithExtraContext that takes extra context info.

M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
  instantiateServer renamed as connect

M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  Javadoc.  Renamed instantiateServer as connect.

M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Javadoc. Use MetaReader method instead of handcoding.

M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  Allow that hris can come back null when we ask for table regions.

M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
  Remove import of CatalogTracker.

M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
  Use utility in MetaReader instead of hand-coding it.

M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
  Use new HConnectionTestingUtility for mocking in tests (need to use it
  because it's a bit harder to mock now that we use HTable instead
  of the more direct HRegionInterface).
  Add some tests of broken out utility methods.

M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
  Add tests

A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
  Add test of 3669 retrying.

A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
  New test utility that helps with mocking HConnection, so that an HTable
  can be made to use the mocked connection.  Can supply either a mock or a
  spied-on HConnection.

M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
  The migration code moved.  Reference new location.

M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
  Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
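
The MetaEditor entry above describes getting a table for each operation, using it, then closing it.  A minimal sketch of that open/use/close shape follows.  Everything in it is hypothetical: CatalogTable and FakeTable are in-memory stand-ins invented for illustration (not HTable), and try-with-resources is simply the modern Java way to guarantee the close; the patch itself may do this differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of the open/use/close pattern; not the MetaEditor code. */
final class OpenUseCloseSketch {

  /** Stand-in for a table handle; close() is narrowed to throw nothing. */
  interface CatalogTable extends AutoCloseable {
    void put(String row);

    @Override
    void close();
  }

  /** In-memory fake that records puts and whether it was closed. */
  static final class FakeTable implements CatalogTable {
    final List<String> rows = new ArrayList<>();
    boolean closed;

    @Override
    public void put(String row) {
      if (closed) {
        throw new IllegalStateException("table closed");
      }
      rows.add(row);
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Opens a handle for a single put and always closes it, even on failure. */
  static void putToTable(Supplier<CatalogTable> opener, String row) {
    try (CatalogTable table = opener.get()) {
      table.put(row);
    }
  }

  public static void main(String[] args) {
    FakeTable fake = new FakeTable();
    putToTable(() -> fake, "region-a");
    System.out.println("rows=" + fake.rows + " closed=" + fake.closed);
  }
}
```

The trade-off is the one the summary already names: opening a handle per operation costs something on setup, in exchange for not holding long-lived raw connections.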


This addresses bug hbase-3446.
    https://issues.apache.org/jira/browse/hbase-3446


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006 
  src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939 
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945 
  src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
  src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
  src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c 
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232 
  src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b 
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32 
  src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
  src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
  src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
  src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 

Diff: https://reviews.apache.org/r/2065/diff


Testing
-------

All tests passed recently.  Rerunning.


Thanks,

Michael


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982453#action_12982453 ] 

Todd Lipcon commented on HBASE-3446:
------------------------------------

After digging through the logs, I found the following:

2011-01-16 18:03:26,164 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and split region usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167.; checking daughter presence
2011-01-16 18:03:26,169 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2360)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1754)
...
        at $Proxy6.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:260)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.isDaughterMissing(ServerShutdownHandler.java:256)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughter(ServerShutdownHandler.java:214)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughters(ServerShutdownHandler.java:196)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.processDeadRegion(ServerShutdownHandler.java:181)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:151)

Neither the MetaReader code nor the ServerShutdown handler has any kind of retry/blocking behavior built in here. So many of the regions on the server were left unassigned.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119406#comment-13119406 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------



bq.  On 2011-10-03 01:13:26, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 149
bq.  > <https://reviews.apache.org/r/2065/diff/2/?file=47168#file47168line149>
bq.  >
bq.  >     This is no longer true - see the fourth parameter below.

I can fix on commit


bq.  On 2011-10-03 01:13:26, Ted Yu wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 152
bq.  > <https://reviews.apache.org/r/2065/diff/2/?file=47168#file47168line152>
bq.  >
bq.  >     Out of date javadoc.

I can fix on commit.


I ran all tests; TestAdmin and TestMergeTool failed, but both passed when run individually.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2258
-----------------------------------------------------------




                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Fix Version/s: 0.90.1

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v12.txt

Testing on cluster found an NPE in a log message.  v12 also added a bit of info to other log messages.  Want to test more before commit.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446.txt

This is a start.  Not done yet.  Lots of javadoc of ServerCallable to explain what it's about.  MetaReader partially done.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116024#comment-13116024 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2124
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
<https://reviews.apache.org/r/2065/#comment4913>

    This doesn't seem right.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
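
The retry behaviour the summary relies on can be sketched as a bounded retry loop around a Callable. This is only an illustration under assumptions: RetryingCallerSketch, callWithRetries, maxRetries and pauseMillis are hypothetical names, and this is not the code of HConnection.getRegionServerWithRetries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper, not an HBase class: retries a call a bounded number of times. */
final class RetryingCallerSketch {
  private RetryingCallerSketch() {}

  /**
   * Runs the call until it succeeds or maxRetries attempts have failed.
   * Every failure is kept so the final exception can report all of them.
   */
  static <T> T callWithRetries(Callable<T> call, int maxRetries, long pauseMillis)
      throws IOException {
    if (maxRetries < 1) {
      throw new IllegalArgumentException("maxRetries must be >= 1");
    }
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (InterruptedException e) {
        // Do not retry on interrupt; restore the flag and give up.
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted on attempt " + attempt, e);
      } catch (Exception e) {
        failures.add(e);
      }
      if (attempt < maxRetries) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while pausing between retries", e);
        }
      }
    }
    IOException exhausted = new IOException("Failed after " + maxRetries + " attempts");
    for (Throwable t : failures) {
      exhausted.addSuppressed(t);
    }
    throw exhausted;
  }
}
```

A caller that reads a catalog row would wrap the read in a Callable and pass it in, so a server restart between attempts shows up as one more failed attempt rather than a failed operation.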
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than having it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
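
As a rough sketch of the two points above (timeout helper methods, and resetting the meta location under the same monitor that waiters use), assuming a tracker that guards its state with an AtomicBoolean named metaAvailable. The class MetaLocationSketch, the method signatures and the String location are assumptions for illustration; the patch's actual remaining/timedout signatures are not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the tracker; only locking and timeout arithmetic are shown. */
final class MetaLocationSketch {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by metaAvailable

  /** Time left before the deadline, never negative. */
  static long remaining(long endTime, long now) {
    return Math.max(0L, endTime - now);
  }

  /** True once a positive timeout has elapsed; a timeout <= 0 means wait forever. */
  static boolean timedOut(long startTime, long timeout, long now) {
    return timeout > 0 && now - startTime >= timeout;
  }

  void setMetaLocation(String location) {
    synchronized (metaAvailable) {
      metaLocation = location;
      metaAvailable.set(true);
      metaAvailable.notifyAll();
    }
  }

  /** Holds the same monitor as the waiters, so no reader sees a half-reset state. */
  void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaLocation = null;
      metaAvailable.set(false);
    }
  }

  /** Waits up to timeoutMillis for a location; returns null on timeout. */
  String waitForMeta(long timeoutMillis) throws InterruptedException {
    long start = System.currentTimeMillis();
    long end = start + timeoutMillis;
    synchronized (metaAvailable) {
      while (!metaAvailable.get()) {
        long now = System.currentTimeMillis();
        if (timedOut(start, timeoutMillis, now)) {
          return null;
        }
        // wait(0) would block forever, so never pass 0 when a timeout was requested.
        metaAvailable.wait(timeoutMillis > 0 ? Math.max(1L, remaining(end, now)) : 0L);
      }
      return metaLocation;
    }
  }
}
```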
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
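
The "get an HTable, use it, then close it" pattern might look roughly like the following. Table and TableFactory here are hypothetical stand-ins, not HBase types, and putToTable is an illustrative name rather than the actual MetaEditor method.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch of opening a table per operation and always closing it. */
final class PerOperationTableSketch {
  private PerOperationTableSketch() {}

  /** Stand-in for a table handle whose operations retry internally. */
  interface Table extends Closeable {
    void put(String row) throws IOException;
  }

  /** Stand-in for whatever creates the table handle. */
  interface TableFactory {
    Table open(String tableName) throws IOException;
  }

  /** Opens the table, performs one put, and closes the table even if the put fails. */
  static void putToTable(TableFactory factory, String tableName, String row)
      throws IOException {
    try (Table table = factory.open(tableName)) {
      table.put(row);
    }
  }
}
```

The cost is one handle setup per operation, which matches the "Costs some on setup" trade-off mentioned in the summary.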
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
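
A full scan driven by a visitor, with a collect-everything visitor on top, could be sketched like this. The generic Visitor interface, the CollectAll class and the Iterable source are assumptions for illustration; the real MetaReader visitor works on Result rows from a scanner and its exact interface is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a visitor-driven scan; not the MetaReader API. */
final class ScanVisitorSketch {
  private ScanVisitorSketch() {}

  /** Called once per row; return false to stop the scan early. */
  interface Visitor<R> {
    boolean visit(R row);
  }

  /** Visitor that keeps every row it is shown. */
  static final class CollectAll<R> implements Visitor<R> {
    private final List<R> rows = new ArrayList<>();

    @Override
    public boolean visit(R row) {
      rows.add(row);
      return true;
    }

    List<R> getRows() {
      return rows;
    }
  }

  /** Feeds rows to the visitor until it declines; returns how many rows were visited. */
  static <R> int fullScan(Iterable<R> rows, Visitor<R> visitor) {
    int visited = 0;
    for (R row : rows) {
      visited++;
      if (!visitor.visit(row)) {
        break;
      }
    }
    return visited;
  }
}
```

Routing every scan through one fullScan method means the scanning logic, and any retry behaviour underneath it, lives in a single place.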
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
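
One common way to keep InterruptedException off a public API that already throws IOException is to translate it and restore the interrupt flag. Whether the patch does exactly this is not stated in this thread, so treat the following as a sketch only; InterruptTranslationSketch, InterruptibleWork and runHidingInterrupt are hypothetical names.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical sketch: translate InterruptedException at an API boundary. */
final class InterruptTranslationSketch {
  private InterruptTranslationSketch() {}

  /** Work that may block and be interrupted. */
  interface InterruptibleWork {
    void run() throws InterruptedException, IOException;
  }

  /** Runs the work; an interrupt surfaces as InterruptedIOException with the flag restored. */
  static void runHidingInterrupt(InterruptibleWork work) throws IOException {
    try {
      work.run();
    } catch (InterruptedException e) {
      // Restore the flag so callers further up can still observe the interrupt.
      Thread.currentThread().interrupt();
      InterruptedIOException translated =
          new InterruptedIOException("Interrupted: " + e.getMessage());
      translated.initCause(e);
      throw translated;
    }
  }
}
```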
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
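
An exception that carries per-attempt context might be shaped roughly as below. This is a sketch under assumptions: RetriesExhaustedSketch and FailedAttempt are hypothetical names, and the fields (cause, timestamp, free-form context string) are a guess at what "extra context info" could cover, not the actual ThrowableWithExtraContext definition.

```java
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not the HBase class: reports every failed attempt with context. */
final class RetriesExhaustedSketch extends IOException {
  private static final long serialVersionUID = 1L;

  /** One failed attempt: the cause, when it happened, and free-form context. */
  static final class FailedAttempt implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Throwable cause;
    private final long whenMillis;
    private final String context;

    FailedAttempt(Throwable cause, long whenMillis, String context) {
      this.cause = cause;
      this.whenMillis = whenMillis;
      this.context = context;
    }

    @Override
    public String toString() {
      return whenMillis + ", " + context + ", " + cause;
    }
  }

  private final List<FailedAttempt> attempts;

  RetriesExhaustedSketch(List<FailedAttempt> attempts) {
    super(describe(attempts));
    this.attempts = Collections.unmodifiableList(new ArrayList<>(attempts));
  }

  List<FailedAttempt> getAttempts() {
    return attempts;
  }

  /** Builds a message with one line per failed attempt. */
  private static String describe(List<FailedAttempt> attempts) {
    StringBuilder sb = new StringBuilder("Failed after attempts=").append(attempts.size());
    for (FailedAttempt attempt : attempts) {
      sb.append("\n  ").append(attempt);
    }
    return sb.toString();
  }
}
```

With the server name in each line of the message, a log reader can tell whether retries kept hitting the same restarted server or followed the region to a new one.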
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use the utility in MetaReader instead of handcoding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility for mocking tests (need to use it
bq.    because it's a bit harder mocking tests now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection, so that you can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115804#comment-13115804 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2106
-----------------------------------------------------------


Only part way done, will finish in the afternoon.  I like the idea though, good stuff stack.


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4855>

    Supposed to read "When meta is moved to zk"?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4857>

    this comment talks a lot about what is wrong but it's not clear to me what changes are actually made right now.  i see you say server-side only, but what do you propose instead?  (i imagine i will find out reading the rest of the diff)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4858>

    update this javadoc a bit... it's missing conf and you might also add additional context to stuff like abortable (which now appears optional and falls back to the connection itself?)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4859>

    when would one override this?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4860>

    is the behavior of this method unchanged?  i guess now it returns before it's verified?  any specific reason for the name change?  (its behavior is definitely different from the old getRootServerConnection()).  is it to match the Meta method?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4861>

    i thought no verification in CT?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4862>

    this is public but one with specified timeout is private now?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4863>

    eeek, good catch


- Jonathan




                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115394#comment-13115394 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2084
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4732>

    Log should be changed accordingly.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
bq.  In 0.89, we had a special RetryableMetaOperation class, a subclass of Callable,
bq.  which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (it costs some on setup, but
bq.  otherwise we avoid duplicating code).  Upped the retries on the server side too.
bq.  
bq.  Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CatalogTracker (CT) methods for getting proxy connections to the meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access to (unused) public
bq.  methods and removes the ability to get an HRegionInterface on meta and root
bq.  -- this stuff is now used only internally to CT; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening a new issue
bq.  to cut back CT use over the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than having it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviewers.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and changed it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Changed from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to their own
bq.    class since these methods are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all the Meta* methods that update the meta table
bq.    during the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result.
bq.    (CollectAllVisitor) Added.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use utility in MetaReader instead of handcoding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility for mocking tests (need to use it
bq.    because it's a bit harder mocking tests now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection so that you can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.

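The retry behaviour described in the summary above can be sketched without HBase on the classpath. What follows is a minimal, hypothetical Java illustration and not the code in the patch: `RetrySketch`, `AttemptFailure`, `RetriesExhausted` and `callWithRetries` are invented names. `AttemptFailure` only loosely mirrors the idea behind RetriesExhaustedException.ThrowableWithExtraContext (keeping extra context alongside each failed attempt), and the linearly growing pause between attempts is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch of a retry loop; not the HBase implementation. */
final class RetrySketch {

  /** One failed attempt plus the context it failed in (invented type). */
  static final class AttemptFailure {
    final Throwable cause;
    final long whenMillis;
    final String context;

    AttemptFailure(Throwable cause, long whenMillis, String context) {
      this.cause = cause;
      this.whenMillis = whenMillis;
      this.context = context;
    }

    @Override
    public String toString() {
      return context + " at " + whenMillis + ": " + cause;
    }
  }

  /** Thrown once every attempt has failed; carries the full failure history. */
  static final class RetriesExhausted extends Exception {
    final List<AttemptFailure> failures;

    RetriesExhausted(int attempts, List<AttemptFailure> failures) {
      super("Failed after " + attempts + " attempts: " + failures);
      this.failures = failures;
    }
  }

  /** Runs op up to maxAttempts times, pausing a little longer after each failure. */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis,
      String context) throws RetriesExhausted, InterruptedException {
    List<AttemptFailure> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // never swallow an interrupt inside a retry loop
      } catch (Exception e) {
        failures.add(new AttemptFailure(e, System.currentTimeMillis(),
            context + ", attempt " + attempt));
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMillis * attempt); // assumed linear backoff
      }
    }
    throw new RetriesExhausted(maxAttempts, failures);
  }
}
```

A caller would wrap each catalog read or write in `callWithRetries`, so that a region server restart surfaces as a short delay rather than a failed operation. When the attempts run out, the exception message lists every failure with its context.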

                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v14.txt

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119118#comment-13119118 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2258
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment5236>

    This is no longer true - see the fourth parameter below.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment5237>

    Out of date javadoc.


- Ted


On 2011-10-02 00:01:07, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-02 00:01:07)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                

        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120711#comment-13120711 ] 

Jonathan Gray commented on HBASE-3446:
--------------------------------------

I've grasped most of the change and this is clearly a significant improvement.  Let's get it in!

+1 on latest patch up on RB if tests are passing.  TestMergeTool also fails on occasion for me.

Nice work stack!

You're thinking CatalogTracker follow-up in 0.94 w/ ROOT removal perhaps?
                

        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v2.txt

Some more progress.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982812#action_12982812 ] 

stack commented on HBASE-3446:
------------------------------

OK, this is the one you figured.  Way back, there was some argument for why HTable could not be used in place of HCM.  I'm not sure what it was now (or then really); but I just bought it.

Is this worse than 0.89?  There does not seem to be any retry facility in the scan of meta done in shutdown server handling there either. (Not that this makes the bug less severe... I'm just trying to talk about whether it should hold up 0.90.0.)

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118951#comment-13118951 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2248
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment5213>

    This visitor can be merged with the visitor in updateMetaWithNewRegionInfo() after refactoring.
    The only difference between them is the boolean parameter to updateHRI().
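
The refactoring suggested in this comment, two near-identical visitors collapsed into one that takes the differing boolean as a constructor argument, might look roughly like the sketch below. It is a hypothetical, self-contained Java illustration. `VisitorMergeSketch`, `Visitor`, `fullScan`, `MigratingVisitor` and `update` are stand-ins rather than the classes in the patch, rows are plain strings, and the meaning given to the flag (root versus meta) is an assumption, since the comment does not say what the boolean controls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of merging two visitors that differ by one flag. */
final class VisitorMergeSketch {

  /** Callback invoked once per scanned row; return false to stop the scan. */
  interface Visitor {
    boolean visit(String row);
  }

  /** Stand-in for a full catalog scan: feeds every row to the visitor. */
  static void fullScan(List<String> rows, Visitor visitor) {
    for (String row : rows) {
      if (!visitor.visit(row)) {
        return;
      }
    }
  }

  /** Stand-in for the per-row update; the flag's meaning is assumed. */
  static String update(String row, boolean forRoot) {
    return (forRoot ? "root:" : "meta:") + row;
  }

  /** One visitor replaces two: the only variation is the flag passed on. */
  static final class MigratingVisitor implements Visitor {
    private final boolean forRoot;
    final List<String> migrated = new ArrayList<>();

    MigratingVisitor(boolean forRoot) {
      this.forRoot = forRoot;
    }

    @Override
    public boolean visit(String row) {
      migrated.add(update(row, forRoot));
      return true; // keep scanning
    }
  }
}
```

Each call site then constructs `new MigratingVisitor(true)` or `new MigratingVisitor(false)` instead of declaring its own anonymous visitor.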



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment5214>

    The checking here seems fragile.
    I know it is in the current code base, so maybe extract the String in another JIRA.


- Ted


On 2011-10-02 00:01:07, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-02 00:01:07)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cut back CT use across the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to
bq.  its own class rather than keeping it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shut down access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and changed it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Changed from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use the utility in MetaReader instead of handcoding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility for mocking in tests (need to use it
bq.    because it's a bit harder to mock now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection, so that you can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.
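
The retry behaviour summarized in the quoted request (operations against meta
are retried, and the final exception carries context about each failed attempt)
can be sketched generically as below. This is not the HConnectionManager or
RetriesExhaustedException code: RetrySketch, AttemptFailure, RetriesExhausted
and callWithRetries are hypothetical names, and the fixed pause between attempts
is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of "retry, and remember why each attempt failed".
final class RetrySketch {
    private RetrySketch() {}

    /** One failed attempt plus the extra context recorded with it. */
    static final class AttemptFailure {
        final Throwable cause;
        final long whenMillis;
        final String context;

        AttemptFailure(Throwable cause, long whenMillis, String context) {
            this.cause = cause;
            this.whenMillis = whenMillis;
            this.context = context;
        }
    }

    /** Thrown once every attempt has failed; keeps all per-attempt failures. */
    static final class RetriesExhausted extends Exception {
        final List<AttemptFailure> failures;

        RetriesExhausted(int attempts, List<AttemptFailure> failures) {
            super("Failed after attempts=" + attempts + ", recorded failures=" + failures.size());
            this.failures = Collections.unmodifiableList(new ArrayList<>(failures));
        }
    }

    /** Runs op up to maxAttempts times, pausing pauseMillis between attempts. */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis, String context)
            throws RetriesExhausted, InterruptedException {
        List<AttemptFailure> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // never swallow an interrupt inside a retry loop
            } catch (Exception e) {
                failures.add(new AttemptFailure(e, System.currentTimeMillis(), context));
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        throw new RetriesExhausted(maxAttempts, failures);
    }
}
```

Keeping every failure, rather than only the last one, is what lets the final
exception report why each attempt failed, for example when the server hosting
the catalog region moved part-way through.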


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446v15.txt

Fixed TestCatalogTracker

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115592#comment-13115592 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2097
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4761>

    I was trying to minimize how many times we call System.currentTimeMillis().



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4762>

    Agreed



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4764>

    Will fix with a comment.  If timeout is 0, then we do not time out.  I should call that out explicitly.
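
    The timeout convention described above (0 means never time out) could be
    captured in small helpers like the following. This is a sketch under
    assumptions, not the actual CatalogTracker remaining/timedout methods:
    TimeoutSketch and its signatures are hypothetical, and the current time is
    passed in so a caller can read the clock once and reuse the value.

```java
// Hypothetical helpers; the real CatalogTracker methods may differ.
final class TimeoutSketch {
    private TimeoutSketch() {}

    /** True once a non-zero timeout has elapsed; a timeout of 0 never times out. */
    static boolean timedOut(long startMillis, long timeoutMillis, long nowMillis) {
        if (timeoutMillis == 0) {
            return false; // 0 is the "wait forever" convention
        }
        return nowMillis - startMillis >= timeoutMillis;
    }

    /** Milliseconds left before timing out; Long.MAX_VALUE when timeout is 0. */
    static long remaining(long startMillis, long timeoutMillis, long nowMillis) {
        if (timeoutMillis == 0) {
            return Long.MAX_VALUE;
        }
        return Math.max(0L, timeoutMillis - (nowMillis - startMillis));
    }
}
```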


- Michael


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cut back CT use across the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to
bq.  its own class rather than keeping it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shut down access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and changed it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Changed from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use the utility in MetaReader instead of handcoding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility for mocking in tests (need to use it
bq.    because it's a bit harder to mock now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection, so that you can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082959#comment-13082959 ] 

stack commented on HBASE-3446:
------------------------------

Running org.apache.hadoop.hbase.master.TestCatalogJanitor
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.683 sec <<< FAILURE!
Running org.apache.hadoop.hbase.io.TestHeapSize
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.187 sec <<< FAILURE!

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Assigned: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-3446:
----------------------------

    Assignee: stack

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127253#comment-13127253 ] 

Hudson commented on HBASE-3446:
-------------------------------

Integrated in HBase-0.92 #64 (See [https://builds.apache.org/job/HBase-0.92/64/])
    HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions
HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
* /hbase/branches/0.92/src/main/ruby/hbase/admin.rb
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/branches/0.92/src/test/ruby/hbase/admin_test.rb
* /hbase/branches/0.92/src/test/ruby/shell/shell_test.rb

                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Fix Version/s:     (was: 0.90.1)
                   0.90.2

Moving out of 0.90.1.  I won't have time to test this patch further before the cut-off tomorrow evening.  The patch appears to be working properly, but I keep tripping over other issues that I have to chase down to make sure this patch is not the cause.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.2
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Status: Patch Available  (was: Open)
    
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v7.txt

Still not done.  My fancy new unit test is turning up issues.  Still hunting them down.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115391#comment-13115391 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2083
-----------------------------------------------------------


Thanks for the cleanup.


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4725>

    Back to future :-)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4727>

    Nice.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4726>

    Extra so.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4730>

    Do we need to update now and pass it to the helper methods?
    The helper methods can easily work out what now should be themselves.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4729>

    I think > would be enough.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4731>

    I am confused by the condition here.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
bq.  In 0.89, we had a special RetryableMetaOperation class, a
bq.  subclass of Callable, which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (it costs some on setup, but
bq.  otherwise we avoid duplicating code).  Upped the retries on the server side too.
bq.  
bq.  Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CatalogTracker (CT) methods to get proxy connections to the meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access to (unused) public
bq.  methods and removes the ability to get an HRegionInterface on meta and root
bq.  -- this is now used only inside CT; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening a new issue
bq.  to cut back CT use across the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than keeping it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shut down access to some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to their own
bq.    class since these methods are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all the Meta* methods that update the meta table
bq.    during the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow for hris coming back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use the utility in MetaReader instead of hand-coding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use the new HConnectionTestingUtility for mocking in tests (needed
bq.    because it's a bit harder to mock now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection: it lets you mock
bq.    an HConnection and then have an HTable use the mocked connection.  Supports
bq.    either a mocked or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.
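
The quoted summary above describes getting retries by going through HTable instead of a hand-rolled loop, and recording extra context for each failure in RetriesExhaustedException. As a rough illustration of that retry-with-context idea only, here is a minimal generic Java sketch. It does not use any HBase classes: RetrySketch, Attempt, RetriesExhausted, and withRetries are hypothetical names made up for this example, and the fixed pause between attempts is an assumption, not the patch's behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

final class RetrySketch {
    private RetrySketch() {}

    /** Hypothetical record of one failed attempt: the cause, when it happened, and free-form context. */
    static final class Attempt {
        final Throwable cause;
        final long whenMillis;
        final String context;

        Attempt(Throwable cause, long whenMillis, String context) {
            this.cause = cause;
            this.whenMillis = whenMillis;
            this.context = context;
        }
    }

    /** Hypothetical exception thrown once every attempt has failed; it keeps all recorded attempts. */
    static final class RetriesExhausted extends Exception {
        final List<Attempt> attempts;

        RetriesExhausted(List<Attempt> attempts) {
            super("Failed after " + attempts.size() + " attempts");
            this.attempts = attempts;
        }
    }

    /** Runs op up to maxAttempts times, pausing pauseMillis between failures. */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long pauseMillis, String context)
            throws RetriesExhausted, InterruptedException {
        List<Attempt> failures = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not swallow interrupts; let the caller decide what to do.
                throw e;
            } catch (Exception e) {
                failures.add(new Attempt(e, System.currentTimeMillis(), context));
            }
            if (i < maxAttempts - 1) {
                Thread.sleep(pauseMillis);
            }
        }
        throw new RetriesExhausted(failures);
    }
}
```

A caller would wrap a single catalog read or write in the Callable, so that a region server moving mid-operation shows up as one more recorded attempt rather than an immediate failure.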


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v11.txt

Here is a version of the patch that incorporates Jon's review done over on review board.  Testing it on a cluster now to check for cluster issues before committing.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v11.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988643#comment-12988643 ] 

stack commented on HBASE-3446:
------------------------------

Just to say that I put up latest patch on review: https://review.cloudera.org/r/1499/

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115728#comment-13115728 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2099
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4835>

    The condition, now < stopTime, is reversed for isTimedOut().
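
To make the direction of the comparison concrete, here is a small Java sketch of deadline helpers in the spirit of the "remaining" and "timedout" utilities listed in the review request. The class name and method signatures are assumptions for illustration and are not the code under review; the only point is that a timed-out check should become true once now has passed stopTime, not before.

```java
final class DeadlineSketch {
    private DeadlineSketch() {}

    /** Milliseconds left before stopTime, never negative. */
    static long remaining(long stopTime, long now) {
        return Math.max(0L, stopTime - now);
    }

    /** True once now has passed stopTime (now > stopTime, not now < stopTime). */
    static boolean timedOut(long stopTime, long now) {
        return now > stopTime;
    }
}
```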


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
bq.  In 0.89, we had a special RetryableMetaOperation class, a
bq.  subclass of Callable, which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (it costs some on setup, but
bq.  otherwise we avoid duplicating code).  Upped the retries on the server side too.
bq.  
bq.  Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CatalogTracker (CT) methods to get proxy connections to the meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access to (unused) public
bq.  methods and removes the ability to get an HRegionInterface on meta and root
bq.  -- this is now used only inside CT; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening a new issue
bq.  to cut back CT use across the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than keeping it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shut down access to some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to their own
bq.    class since these methods are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all the Meta* methods that update the meta table
bq.    during the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow for hris coming back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use the utility in MetaReader instead of hand-coding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use the new HConnectionTestingUtility for mocking in tests (needed
bq.    because it's a bit harder to mock now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mock of HConnection making it so can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied on HConnection
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446v23.txt

Here is what I just committed to 0.92 and trunk.
                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982821#action_12982821 ] 

Todd Lipcon commented on HBASE-3446:
------------------------------------

In 0.89 we use RetryableMetaOperation.doWithRetries() so I think it would continue to retry until successful. 

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115261#comment-13115261 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------

Review request for hbase and Jonathan Gray.


Summary
-------

Make the Meta* operations against meta retry.  We do it by using HTable instances
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
In 0.89, we had a special RetryableMetaOperation class, a subclass of Callable,
that reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop.  Now we just use HTable instead (it costs some on setup, but
otherwise we avoid duplicating code).  Upped the retries on the server side too.
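
For readers who have not looked at the retry loop mentioned above, here is a minimal sketch of the general idea: run a Callable a bounded number of times and report every failure once the attempts run out.  This is not the HBase code.  RetrySketch, callWithRetries, maxAttempts and pauseMillis are names made up for illustration, and the real HConnection logic does more (for example, relocating the region between attempts).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper (not an HBase class): bounded retries around a Callable. */
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Calls op up to maxAttempts times, sleeping pauseMillis between failed
   * attempts.  Once attempts are exhausted, throws an IOException whose
   * message lists every failure that was seen.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Do not retry after an interrupt; restore the flag and give up.
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted on attempt " + attempt, e);
      } catch (Exception e) {
        // Remember the failure so the final exception can report all of them.
        failures.add(e);
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", e);
        }
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts: " + failures);
  }
}
```

A caller would wrap a single catalog read or write in the Callable.  The point of the patch is that HTable already does the equivalent internally, so this loop does not need to be duplicated in the catalog classes.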

Had a problem with CatalogJanitor.  MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods to get proxy connections to the meta and
root servers.  CT needs to be cut back.  This patch closes down access to (unused)
public methods and removes the ability to get an HRegionInterface on meta and root
-- this stuff is now used internally by CT only; use MetaEditor or
MetaReader if you want to update or read catalog tables.  Opening a new issue
to cut back CT use across the code base.

A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
trying to clean them up, I ended up moving the meta migration code out to its
own class rather than have it all inside MetaEditor.

Here is some detail to help reviewers.

M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Clean up.  Shut down access to some of these unused methods.  In particular,
  don't let out HRegionInterface instances, since we are moving
  away from raw HRI use toward a connection with retries,
  i.e. HTable.

  Comments on the state of this class.  Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated, so don't use it
  in the constructor.  Override MetaNodeTracker and, on node delete,
  reset the meta location.  (We used to do this over in MetaNodeTracker,
  but to do that we had to have a CatalogTracker over in the zk package,
  which is silly -- bad package encapsulation.)

  (waitForRootServer) Renamed getRootServerConnection and changed it
  from public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Changed from public to package private.
  Use MetaReader to read the meta location in root rather than a
  raw HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable.
  Not all accesses were synchronized.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  Refactor to use HTable instead of raw HRegionInterface so we get
  retrying.  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc.) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to their own
  class since these methods are for a one-time migration only.
    
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
  New class that holds all the Meta* methods that update the meta table
  during the one-time migration done to meta on startup.  This class
  is marked deprecated because it's going to be dropped in 0.94.

M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
    getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing
  this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result
  (CollectAllVisitor) Added

M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Handle a few cases where methods throw InterruptedException
  (Don't let it out on the HBaseAdmin public API)

M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.

M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  Add to DEBUG message the connection name we were using.

M src/main/java/org/apache/hadoop/hbase/client/Result.java
  (getServerNameFromCatalogResult, parseCatalogResult,
    parseHRegionInfoFromCatalogResult) Added

M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
  Added new ThrowableWithExtraContext that takes extra context info.

M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
  instantiateServer renamed as connect

M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  Javadoc.  Renamed instantiateServer as connect.

M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Javadoc. Use MetaReader method instead of handcoding.

M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
  Handle InterruptedException

M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  Allow that hris can come back null when we ask for table regions.

M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
  Remove import of CatalogTracker.

M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
  Use the utility in MetaReader instead of handcoding it.

M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
  Use the new HConnectionTestingUtility for mocking in tests (need to use it
  because it's a bit harder to mock now that we use HTable instead
  of the more direct HRegionInterface).
  Add some tests of broken-out utility methods.

M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
  Add tests

A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
  Add test of 3669 retrying.

A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
  New test utility that helps with mocking HConnection, so that you can mock
  an HConnection and then have an HTable use the mocked connection.  Can do
  a mock or a spied-on HConnection.

M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
  The migration code moved.  Reference new location.

M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
  Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
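
The MetaEditor note above says that for each operation we get an HTable, use it, then close it.  A minimal sketch of that open/use/close shape follows.  CatalogTableHandle, CatalogTableFactory and PerOperationTableSketch are hypothetical stand-ins invented for this example; they are not the HTable or MetaEditor API, and the real put takes HBase types rather than strings.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a table handle; not the HBase HTable API. */
interface CatalogTableHandle extends Closeable {
  void put(String row, String value) throws IOException;
}

/** Hypothetical factory that opens a fresh handle for each operation. */
interface CatalogTableFactory {
  CatalogTableHandle open() throws IOException;
}

final class PerOperationTableSketch {
  private PerOperationTableSketch() {}

  /**
   * Opens a handle, performs one put, and always closes the handle,
   * even if the put throws.
   */
  static void putToCatalog(CatalogTableFactory factory, String row, String value)
      throws IOException {
    try (CatalogTableHandle table = factory.open()) {
      table.put(row, value);
    }
  }
}
```

The trade-off is the one named in the summary: opening a handle per operation costs some setup each time, but the handle brings its retrying with it and nothing long-lived has to be tracked.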


This addresses bug hbase-3446.
    https://issues.apache.org/jira/browse/hbase-3446


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
  src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
  src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
  src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
  src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
  src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
  src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
  src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
  src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
  src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
  src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 

Diff: https://reviews.apache.org/r/2065/diff


Testing
-------

All tests passed recently.  Rerunning again.


Thanks,

Michael


                

        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

      Resolution: Fixed
    Release Note: Makes the catalog/* classes retry: e.g. MetaEditor, MetaReader and CatalogTracker.  Previously they would try once and fail if unsuccessful.  Retrying is courtesy of HTable instances.
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Got all tests to pass, eventually.

A bunch of tests were failing because waitForMeta just hung on the meta-is-available boolean during master startup, waiting for some background thread to set it to true once the meta location had been set.  This was fine in the old days, when we'd go get an HRegionInterface to .META. and verify its location over the HRegionInterface instance (with no retries).  Now we don't do such primitives; we've gone up the stack and have HTables/HConnections do the search and 'verify' of meta for us.  So we need to run a get over a connection to know if meta is available (if it is, the magic AtomicBoolean gets set).
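
To make the difference concrete, here is a rough sketch of a wait that actively probes instead of parking on a flag that only a background thread sets.  It is not the CatalogTracker code: MetaAvailabilitySketch, the probe Callable and the timeout/poll parameters are assumptions made for the example, with the probe standing in for a real get over a connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch (not CatalogTracker): wait for meta by probing it. */
final class MetaAvailabilitySketch {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  // Assumed stand-in for "run a get against meta"; returns true if meta answered.
  private final Callable<Boolean> probe;

  MetaAvailabilitySketch(Callable<Boolean> probe) {
    this.probe = probe;
  }

  /** Probes until meta answers or the timeout elapses; returns the flag. */
  boolean waitForMeta(long timeoutMillis, long pollMillis) throws InterruptedException {
    long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!metaAvailable.get()) {
      if (probeOnce()) {
        // The waiter sets the flag itself instead of relying on another thread.
        metaAvailable.set(true);
        break;
      }
      long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
      if (remainingMillis <= 0) {
        break;
      }
      Thread.sleep(Math.min(pollMillis, remainingMillis));
    }
    return metaAvailable.get();
  }

  /** A failed probe counts as "not available yet"; interrupts propagate. */
  private boolean probeOnce() throws InterruptedException {
    try {
      return Boolean.TRUE.equals(probe.call());
    } catch (InterruptedException e) {
      throw e;
    } catch (Exception e) {
      return false;
    }
  }
}
```

With this shape, a waiter cannot hang forever just because nothing else happened to flip the boolean; it either sees meta answer or times out.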

Other miscellaneous stuff like testshell was failing for me because it couldn't find the cluster; it needs to be set up with the cluster's configuration.

Moved more of the meta migration code into the MetaMigrationRemovingHTD class rather than have it spread all about.

Changed the LocalHBaseCluster#join method so it uses the old thread-dumping join, which prints a thread dump if we have been waiting more than 60 seconds for something to finish.  Helped me debug a few tests here.
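
For anyone unfamiliar with the idea, a thread-dumping join can be sketched roughly as below.  This is an illustration only, not the HBase utility: ThreadDumpingJoinSketch and its parameters are made-up names, and the interval is passed in rather than fixed at 60 seconds.

```java
import java.io.PrintStream;
import java.util.Map;

/** Hypothetical sketch: join a thread, dumping all stacks while it drags on. */
final class ThreadDumpingJoinSketch {
  private ThreadDumpingJoinSketch() {}

  /**
   * Waits for t to finish.  Each time dumpIntervalMillis passes and t is
   * still alive, writes the stack of every live thread to out.
   */
  static void threadDumpingJoin(Thread t, long dumpIntervalMillis, PrintStream out)
      throws InterruptedException {
    if (dumpIntervalMillis <= 0) {
      // join(0) would wait forever and never dump, so reject it.
      throw new IllegalArgumentException("dumpIntervalMillis must be > 0");
    }
    while (t.isAlive()) {
      t.join(dumpIntervalMillis);
      if (t.isAlive()) {
        out.println("Still waiting on " + t.getName() + "; thread dump follows");
        for (Map.Entry<Thread, StackTraceElement[]> entry
            : Thread.getAllStackTraces().entrySet()) {
          out.println(entry.getKey());
          for (StackTraceElement frame : entry.getValue()) {
            out.println("    at " + frame);
          }
        }
      }
    }
  }
}
```

In a hung test, the dump shows what every thread is blocked on at the moment the join is stuck, which is usually enough to see who is waiting on whom.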

Otherwise, this is what was up on Review Board.
                

        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115818#comment-13115818 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2107
-----------------------------------------------------------


Thanks for reviews Ted and Jon.  Will put up new patch when you fellas finish...


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4864>

    @Jon True.  I opened another issue with suggested fix.  I should at least reference it in here.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4865>

    Nah.  That's TODO.


- Michael


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072994#comment-13072994 ] 

stack commented on HBASE-3446:
------------------------------

So, I need to update the last patch here and work on the failures seen. I want to write a test too to prove that we have retries after this patch goes in.  

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126130#comment-13126130 ] 

stack commented on HBASE-3446:
------------------------------

Thanks for the +1s.  Some tests are failing, though, which is why I've yet to commit it.  Working on it.
                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Fix Version/s:     (was: 0.90.2)
                   0.92.0

Moving out of 0.90.  Too big a change after all is said and done to put against a patch release.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v3.txt

Most of the conversions are done.  Testing now.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446-v3.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128382#comment-13128382 ] 

Hudson commented on HBASE-3446:
-------------------------------

Integrated in HBase-TRUNK #2325 (See [https://builds.apache.org/job/HBase-TRUNK/2325/])
    HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
* /hbase/trunk/src/main/ruby/hbase/admin.rb
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/trunk/src/test/ruby/hbase/admin_test.rb
* /hbase/trunk/src/test/ruby/shell/shell_test.rb

                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116019#comment-13116019 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------



bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > There is a lot of excellence in here.  I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used.  I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff.
bq.  > 
bq.  > Great work!

Sorry about that.  Let me get you a better answer to your question.  I think it's not very clear because I myself was unclear on the scope of CT (CatalogTracker) when I started in.  What this patch has is an attempt at cutting back CT's scope, with subsequent work put off to HBASE-4495.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 513
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513>
bq.  >
bq.  >     maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard

Will do


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 575
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575>
bq.  >
bq.  >     verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server)

Good point.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194>
bq.  >
bq.  >     i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :)

In the old code above, we'd get the HRegionInterface and do the invocation on the actual interface.  The new code steps back and asks an HTable instance to do the work.  If there was an issue with the former, we'd just let the exception out.  With the latter, we do HTable retries before we let the exception out (and the retry count is raised when running in a server context).
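
For readers following along, here is a minimal sketch of the difference being described: a bounded retry loop that only lets the failure out once every attempt has failed, keeping some context per attempt.  It is an illustration under stated assumptions, not the patch's code.  RetrySketch, RetriesExhausted and callWithRetries are hypothetical names, the exception is unchecked only to keep the example short, and no HBase classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical illustration only; none of these names are HBase APIs. */
final class RetrySketch {

  /** Raised once every attempt has failed; keeps one line of context per attempt. */
  static final class RetriesExhausted extends RuntimeException {
    final List<String> attempts;

    RetriesExhausted(List<String> attempts) {
      super("Failed after " + attempts.size() + " attempts: " + attempts);
      this.attempts = attempts;
    }
  }

  /**
   * Runs op up to maxAttempts times, pausing between tries.  A direct call
   * (the old behaviour described above) would let the first exception out;
   * here the caller only sees a failure after all attempts are used up.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis) {
    List<String> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        // Remember what went wrong on this try so the final error is informative.
        failures.add("attempt " + attempt + ": " + e);
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException ie) {
          // Restore the interrupt flag and stop retrying.
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    throw new RetriesExhausted(failures);
  }
}
```

A caller would wrap a catalog read or write in the Callable, so that a server going away mid-operation costs a few retries rather than failing the whole shutdown handler.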


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, line 2
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2>
bq.  >
bq.  >     missing copyright and year?

Turns out that copyright is not actually needed; see https://issues.apache.org/jira/browse/HBASE-3870


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568>
bq.  >
bq.  >     seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result

OK.  It was kinda nice being able to do result.getServerNameFromCatalogResult.  I suppose it does pollute.  I can move it back to MetaReader since that seems like the next best place.  You are right that it shouldn't be generally public.  Will fix.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70>
bq.  >
bq.  >     this seems like an important public method.  i like the rename and your additional comments, but maybe we should add more.  default behavior is to use a cached location, if one is not found, it is looked up in a catalog.  setting reload to true bypasses the cache and forces the lookup to a catalog.  and then, under what cases do we get an exception?  does this verify that the server is actually hosting the region?  or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct?

Will look into this.
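
As a rough sketch of the semantics the reviewer describes (use a cached location by default, fall back to a catalog lookup on a miss, and bypass the cache when reload is true), something like the following captures the idea.  It is an assumption-laden illustration, not ServerCallable itself: LocationCacheSketch, locate and the catalogLookup function are hypothetical, and, as the reviewer guesses, nothing here verifies that the returned server is actually hosting the region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical illustration only; not the ServerCallable implementation. */
final class LocationCacheSketch {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Stand-in for "look the region up in a catalog table".
  private final Function<String, String> catalogLookup;

  LocationCacheSketch(Function<String, String> catalogLookup) {
    this.catalogLookup = catalogLookup;
  }

  /**
   * Returns the server believed to host regionKey.  With reload == false a
   * cached answer is returned if present; with reload == true the cache is
   * bypassed and the catalog is always consulted.  No check is made that
   * the server really hosts the region.
   */
  String locate(String regionKey, boolean reload) {
    if (!reload) {
      String cached = cache.get(regionKey);
      if (cached != null) {
        return cached;
      }
    }
    String fresh = catalogLookup.apply(regionKey);
    if (fresh == null) {
      // Drop any stale entry and report the failed lookup.
      cache.remove(regionKey);
      throw new IllegalStateException("No location found for " + regionKey);
    }
    cache.put(regionKey, fresh);
    return fresh;
  }
}
```

In a retry loop the first attempt would typically pass reload as false and later attempts pass true, so a region that has moved gets looked up afresh.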


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 1041-1047
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041>
bq.  >
bq.  >     why do you remove the javadoc on this method?

Will look into this.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, line 132
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132>
bq.  >
bq.  >     huh? :)

Let me fix.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java, line 52
bq.  > <https://reviews.apache.org/r/2065/diff/1/?file=45932#file45932line52>
bq.  >
bq.  >     30,000 ft desc?  i guess test name is self descriptive? :)

Will do.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
-----------------------------------------------------------


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had a special RetryableMetaOperation class, a
bq.  subclass of Callable, which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (it costs some on setup, but
bq.  otherwise we avoid duplicating code).  Upped the retries on the server side too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening a new issue
bq.  to cut back CT use across the code base.
bq.  
bq.  A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all the Meta* methods that update the meta table
bq.    as part of the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle a few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use utility in MetaReader instead of handcoding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility mocking tests (need to use it
bq.    because it's a bit harder mocking tests now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mocking HConnection, so you can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied-on HConnection.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116008#comment-13116008 ] 

jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
-----------------------------------------------------------


There is a lot of excellence in here.  I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used.  I'm a little unclear on the difference between the lines you'd like to draw and the lines you actually draw in this diff.

Great work!


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4873>

    maybe note here that the caller should not be synchronized on metaAvailable (the method will do so itself)... the next method below is nicely clear in this regard



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4874>

    verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this looks up which server hosts the passed region, but it only verifies that we can connect to the server we think is hosting the region and that the server is in fact hosting it (so this fails if we can't connect or if the region is not on this server)



src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
<https://reviews.apache.org/r/2065/#comment4875>

    i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :)



src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
<https://reviews.apache.org/r/2065/#comment4876>

    same here!  nice  (old stuff looks rife with race conditions)



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment4877>

    missing copyright and year?



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment4878>

    nice moving cruft to separate classes



src/main/java/org/apache/hadoop/hbase/client/Result.java
<https://reviews.apache.org/r/2065/#comment4881>

    seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result



src/main/java/org/apache/hadoop/hbase/client/Result.java
<https://reviews.apache.org/r/2065/#comment4882>

    yeah, shouldn't this be in MetaReader or some such class?



src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
<https://reviews.apache.org/r/2065/#comment4895>

    missed some whitespace



src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
<https://reviews.apache.org/r/2065/#comment4896>

    nice



src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
<https://reviews.apache.org/r/2065/#comment4897>

    this seems like an important public method.  i like the rename and your additional comments, but maybe we should add more.  default behavior is to use a cached location; if one is not found, it is looked up in a catalog.  setting reload to true bypasses the cache and forces the lookup to a catalog.  and then, under what cases do we get an exception?  does this verify that the server is actually hosting the region?  or does it just look up in the catalog (i guess failure there could cause an IOE) and, if it finds something, return a connection to that RS (w/ no verification)... correct?
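
    To make the question concrete, here is a minimal sketch of the behavior as described in the comment above: use the cache by default, fall back to a catalog lookup on a miss, and bypass the cache when reload is true. This is only an illustration of that reading, not the ServerCallable code. CachedLocator, locate and the catalogLookup function are hypothetical names, and the sketch deliberately does no verification that the server really hosts the region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a region location cache; not an HBase class. */
final class CachedLocator {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Assumed catalog lookup: maps a region name to a server name, or null if unknown.
  private final Function<String, String> catalogLookup;

  CachedLocator(Function<String, String> catalogLookup) {
    this.catalogLookup = catalogLookup;
  }

  /**
   * Returns the server believed to host the region.
   * reload == false: use the cached location, falling back to the catalog on a miss.
   * reload == true: bypass the cache and always ask the catalog.
   * No check is made that the returned server is actually hosting the region.
   */
  String locate(String region, boolean reload) {
    if (!reload) {
      String cached = cache.get(region);
      if (cached != null) {
        return cached;
      }
    }
    String fresh = catalogLookup.apply(region);
    if (fresh == null) {
      throw new IllegalStateException("No location in catalog for " + region);
    }
    cache.put(region, fresh);
    return fresh;
  }
}
```

    In this reading a stale cache entry is only discovered when a later call against the returned server fails, at which point the caller would retry with reload set to true.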



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/2065/#comment4898>

    why do you remove the javadoc on this method?



src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
<https://reviews.apache.org/r/2065/#comment4899>

    not even necessary to put this method in here at all now (we're just using it for getting the node name at this point but it's probably still nice to have the name in stacks and such)



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4900>

    yay!  <3



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4901>

    huh? :)



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4902>

    awesome



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
<https://reviews.apache.org/r/2065/#comment4903>

    30,000 ft desc?  i guess test name is self descriptive? :)


- Jonathan


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.      
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating the meta table that are
bq.    used by the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.      getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq.    (CollectAllVisitor) Added
bq.  M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq.    Handle few cases where methods throw InterruptedException
bq.    (Don't let it out on the HBaseAdmin public API)
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.    Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq.    on failure. Call ServerCallable connect AFTER beforeCall rather than
bq.    ServerCallable.instantiateServer BEFORE beforeCall.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq.    Add to DEBUG message the connection name we were using.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq.    (getServerNameFromCatalogResult, parseCatalogResult,
bq.      parseHRegionInfoFromCatalogResult) Added
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq.    Added new ThrowableWithExtraContext that takes extra context info.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq.    instantiateServer renamed as connect
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq.    Javadoc.  Renamed instantiateServer as connect.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Javadoc. Use MetaReader method instead of handcoding.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq.    Handle InterruptedException
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq.    Allow that hris can come back null when we ask for table regions.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq.    Remove import of CatalogTracker.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq.    Use utility in MetaReader instead of hand-coding it.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq.    Use new HConnectionTestingUtility for mocking in tests (need to use it
bq.    because it's a bit harder to mock now that we use HTable instead
bq.    of the more direct HRegionInterface).
bq.    Add some tests of broken out utility methods.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq.    Add tests
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq.    Add test of 3669 retrying.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq.    New test utility that helps with mock of HConnection making it so can mock
bq.    an HConnection and then have an HTable use the mocked connection.  Can do
bq.    a mock or a spied on HConnection
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq.    The migration code moved.  Reference new location.
bq.  
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq.  M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq.    Was waiting on wrong events.  Was waiting on Opens rather than Splits. Fix.
bq.  
bq.  
bq.  This addresses bug hbase-3446.
bq.      https://issues.apache.org/jira/browse/hbase-3446
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c 
bq.    src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 
bq.    src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 
bq.    src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d 
bq.    src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 
bq.    src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe 
bq.    src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 
bq.    src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 
bq.    src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 
bq.    src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e 
bq.    src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 
bq.  
bq.  Diff: https://reviews.apache.org/r/2065/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests passed recently.  Rerunning again.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.


                
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983016#action_12983016 ] 

stack commented on HBASE-3446:
------------------------------

As you said up on IRC, we actually weren't doing this fixup in shutdown handling previously.  It's a new facility in 0.90.0.  In 0.89, the basescanner would try to do this fixup every time it ran, so I suppose it would fail and then come along later and probably succeed.

So, we've just read .META. in shutdown handling and now we're processing it.  Part of processing it is this new check on daughters.  If .META. goes down after we have successfully scanned it but before we've finished walking the result to check on daughters, then we run into this issue.  Seems rare enough?
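
One way to tolerate that window is to retry the catalog operation rather than fail the whole shutdown handling on the first error. The following is a minimal sketch of such a retry loop, under the assumption that a moved .META. surfaces as an IOException. RetryingCaller and callWithRetries are hypothetical names for illustration and are not HBase classes.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of HBase: retries an operation that may fail while a catalog region moves. */
final class RetryingCaller {
  private RetryingCaller() {}

  /**
   * Calls op up to maxAttempts times, sleeping pauseMillis between failed attempts.
   * Only IOException is treated as retryable (an assumption of this sketch);
   * any other exception propagates immediately.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e; // remember the most recent failure for the final report
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis); // give the region time to be reassigned
        }
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }
}
```

With something like this, the daughter check could be wrapped as callWithRetries(() -> checkDaughter(...), n, pause), where checkDaughter is again a placeholder; whether a fixed pause or a backoff is the better fit is left open here.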

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3446:
-------------------------------

    Component/s: master

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v13.txt

Update to match TRUNK.  Compiles.  Running tests and cleaning up patch.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed


        

[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
-------------------------

    Attachment: 3446-v4.txt

More fixes for tests.

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082953#comment-13082953 ] 

stack commented on HBASE-3446:
------------------------------

Only the following has failed so far:

Running org.apache.hadoop.hbase.catalog.TestCatalogTracker
Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.373 sec <<< FAILURE!

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
