Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/01/17 03:29:43 UTC
[jira] Created: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
ProcessServerShutdown fails if META moves, orphaning lots of regions
--------------------------------------------------------------------
Key: HBASE-3446
URL: https://issues.apache.org/jira/browse/HBASE-3446
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
I ran a rolling restart on a 5-node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v9.txt
Improved retries exception reporting.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118936#comment-13118936 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------
(Updated 2011-10-02 00:01:07.515888)
Review request for hbase and Jonathan Gray.
Changes
-------
Patch addresses Ted and Jon reviews.
I also redid the catalog package changes -- MetaReader/Editor and CatalogTracker -- so the change is minimized; methods are cleanly added and none are removed, only deprecated, unless they were private and are no longer used or the method has been moved out to the new MetaMigration class. Hopefully this makes the patch easier to digest.
Summary
-------
Make the Meta* operations against meta retry. We do it by using HTable instances.
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.)
In 0.89, we had a special RetryableMetaOperation class that was a
subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop. Now we just use HTable instead (it costs some on setup, but
otherwise we avoid duplicating code). Upped the retries on the server side too.
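A minimal sketch of the kind of retry loop the summary refers to, written without any HBase dependency. RetryingCaller, callWithRetries, and the linear back-off are assumptions made for illustration; this is not the HConnection.getRegionServerWithRetries implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of HBase: retries a call a bounded number of times. */
final class RetryingCaller {
  private RetryingCaller() {}

  static <T> T callWithRetries(Callable<T> call, int maxAttempts, long pauseMillis)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (InterruptedException e) {
        // Do not retry when interrupted; restore the flag and give up.
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted during attempt " + attempt, e);
      } catch (Exception e) {
        // Remember the failure so the final exception can report every attempt.
        failures.add(e);
      }
      if (attempt < maxAttempts) {
        try {
          // Assumed policy: pause grows linearly with the attempt number.
          Thread.sleep(pauseMillis * attempt);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while pausing before retry", e);
        }
      }
    }
    IOException exhausted = new IOException("Failed after " + maxAttempts + " attempts");
    for (Throwable t : failures) {
      exhausted.addSuppressed(t);
    }
    throw exhausted;
  }
}
```

A caller would wrap a single catalog read or write in the Callable, so that a server restarting mid-operation costs a pause and a retry rather than failing the whole handler.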
Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods getting proxy connections to meta and root servers.
CT needs to be cut back. This patch closes down access on (unused) public
methods and removes being able to get an HRegionInterface on meta and root
-- this stuff is used internally to CT only now; use MetaEditor or
MetaReader if you want to update or read catalog tables. Opening a new issue
to cut back CT use over the code base.
A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
trying to clean them up, I ended up moving the meta migration code out to its
own class rather than have it all inside MetaEditor.
Here is some detail to help reviews.
M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
Clean up. Shut down access to some of these unused methods. Don't
let out HRegionInterface instances in particular since we are going
away from raw HRI use to instead use a connection with retries:
i.e. HTable.
Comments on state of this class. Javadoc edits.
getZooKeeperWatcher on HConnection is deprecated so don't use it
in constructor. Override MetaNodeTracker and on node delete
reset meta location (We used to do this over in MetaNodeTracker
but to do that we had to have a CatalogTracker over in zk package
which is silly -- bad package encapsulation).
(waitForRootServer) Renamed getRootServerConnection and changed it
from public to package private.
(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
(getMetaServerConnection) Change from public to package private.
Use MetaReader to read the meta location in root rather than a
raw HRegionInterface so we get retrying.
(remaining, timedout) Added utility methods.
(waitForMetaServer) Changed from public to private.
(resetMetaLocation) Made it synchronized on metaAvailable.
Not all accesses were synchronized.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
Refactor to use HTable instead of raw HRegionInterface so we get
retrying. For each operation we get an HTable, use it, then close it.
(putToMetaTable, putsToMetaTable, etc) Utility methods.
(updateRootWithMetaMigrationStatus, etc.) Moved out to own
class since these methods are for a one-time migration only.
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
New class that holds all the Meta* methods that update the meta table
during the one-time migration done to meta on startup. This class
is marked deprecated because it's going to be dropped in 0.94.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
Retrofit methods in here to use fullScan methods with Visitor.
(getCatalogRegionInterface, getCatalogRegionNameForTable,
getCatalogRegionNameForRegion) Removed.
(fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
(fullScanOfResults) Renamed as fullScan override.
(fullScanOfRoot) Added as deprecated. We should be doing
this against zk.
(metaRowToRegionPair, getServerNameFromResult) Moved to Result
(CollectAllVisitor) Added
M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
Handle a few cases where methods throw InterruptedException
(Don't let it out on the HBaseAdmin public API)
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
on failure. Call ServerCallable connect AFTER beforeCall rather than
ServerCallable.instantiateServer BEFORE beforeCall.
M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
Add to DEBUG message the connection name we were using.
M src/main/java/org/apache/hadoop/hbase/client/Result.java
(getServerNameFromCatalogResult, parseCatalogResult,
parseHRegionInfoFromCatalogResult) Added
M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
Added new ThrowableWithExtraContext that takes extra context info.
M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
instantiateServer renamed as connect
M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
Javadoc. Renamed instantiateServer as connect.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Javadoc. Use MetaReader method instead of handcoding.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
Handle InterruptedException
M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
Handle InterruptedException
M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
Allow for hris coming back null when we ask for table regions.
M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
Remove import of CatalogTracker.
M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
Use utility in MetaReader instead of handcoding it.
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
Use new HConnectionTestingUtility for mocking in tests (need to use it
because it's a bit harder to mock now that we use HTable instead
of the more direct HRegionInterface).
Add some tests of broken out utility methods.
M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
Add tests
A src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
Add test of 3669 retrying.
A src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
New test utility that helps with mocking HConnection: you can mock
an HConnection and then have an HTable use the mocked connection. Supports
either a mocked or a spied-on HConnection.
M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
The migration code moved. Reference new location.
M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
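To illustrate the idea behind the ThrowableWithExtraContext entry in the list above, here is a sketch of an exception that records, for each failed attempt, the cause plus a timestamp and a free-form context string. The class names, fields, and message format are assumptions for illustration only; the actual HBase class may look different.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical record of one failed attempt: what failed, when, and against what. */
final class AttemptFailure {
  final Throwable cause;
  final long whenMillis;
  final String extra;

  AttemptFailure(Throwable cause, long whenMillis, String extra) {
    this.cause = cause;
    this.whenMillis = whenMillis;
    this.extra = extra;
  }

  @Override
  public String toString() {
    return whenMillis + ", " + extra + ", " + cause;
  }
}

/** Hypothetical exception whose message lists every attempt, one per line. */
final class RetriesFailedException extends IOException {
  private static final long serialVersionUID = 1L;

  RetriesFailedException(int attempts, List<AttemptFailure> failures) {
    super(describe(attempts, failures));
  }

  private static String describe(int attempts, List<AttemptFailure> failures) {
    StringBuilder sb = new StringBuilder("Failed after attempts=")
        .append(attempts)
        .append(", exceptions:");
    for (AttemptFailure failure : failures) {
      sb.append('\n').append(failure);
    }
    return sb.toString();
  }
}
```

Carrying the server name (or similar) alongside each cause means a log line for an exhausted retry says which servers were tried and when, not just that the last attempt failed.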
This addresses bug hbase-3446.
https://issues.apache.org/jira/browse/hbase-3446
Diffs (updated)
-----
src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a
src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329
src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945
src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c
src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84
src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232
src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b
src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f
src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
Diff: https://reviews.apache.org/r/2065/diff
Testing
-------
All tests passed recently. Rerunning again.
Thanks,
Michael
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982453#action_12982453 ]
Todd Lipcon commented on HBASE-3446:
------------------------------------
After digging through the logs, I found the following:
2011-01-16 18:03:26,164 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and split region usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167.; checking daughter presence
2011-01-16 18:03:26,169 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2360)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1754)
...
at $Proxy6.openScanner(Unknown Source)
at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:260)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.isDaughterMissing(ServerShutdownHandler.java:256)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughter(ServerShutdownHandler.java:214)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughters(ServerShutdownHandler.java:196)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.processDeadRegion(ServerShutdownHandler.java:181)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:151)
Neither the MetaReader code nor the ServerShutdown handler has any kind of retry/blocking behavior built in here. As a result, many of the regions on the server were left unassigned.
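The blocking behavior that is missing here could look roughly like the sketch below: a caller waits, up to a deadline, for a catalog location to be published instead of failing on the first "Server not running". LocationTracker and its methods are hypothetical names, not HBase classes, and the wait/notify scheme is only one possible design.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical tracker: callers block until a location is set or a timeout expires. */
final class LocationTracker {
  private final Object lock = new Object();
  private String location; // null while the location is unknown

  /** Publishes a new location and wakes up every waiting caller. */
  void setLocation(String newLocation) {
    synchronized (lock) {
      location = newLocation;
      lock.notifyAll();
    }
  }

  /** Forgets the location, for example when the hosting server goes away. */
  void resetLocation() {
    synchronized (lock) {
      location = null;
    }
  }

  /** Returns the location, or null if none was published within timeoutMillis. */
  String waitForLocation(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    synchronized (lock) {
      while (location == null) {
        long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMillis <= 0) {
          return null; // timed out
        }
        // Loop guards against spurious wake-ups; wait(0) would block forever, hence the check above.
        lock.wait(remainingMillis);
      }
      return location;
    }
  }
}
```

A shutdown handler built on something like this would pause while the catalog region is being reassigned and resume once its new location is known, rather than aborting partway through.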
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119406#comment-13119406 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
bq. On 2011-10-03 01:13:26, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 149
bq. > <https://reviews.apache.org/r/2065/diff/2/?file=47168#file47168line149>
bq. >
bq. > This is no longer true - see the fourth parameter below.
I can fix on commit
bq. On 2011-10-03 01:13:26, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 152
bq. > <https://reviews.apache.org/r/2065/diff/2/?file=47168#file47168line152>
bq. >
bq. > Out of date javadoc.
I can fix on commit.
I ran all tests; TestAdmin and TestMergeTool failed, but they passed when run individually.
- Michael
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2258
-----------------------------------------------------------
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Fix Version/s: 0.90.1
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v12.txt
Testing on cluster found an NPE in a log message. v12 also added a bit of info to other log messages. Want to test more before commit.
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446.txt
This is a start. Not done yet. Lots of javadoc on ServerCallable to explain what it's about. MetaReader partially done.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116024#comment-13116024 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2124
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
<https://reviews.apache.org/r/2065/#comment4913>
This doesn't seem right.
- Ted
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had a special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than have it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shutdown access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility for mocking in tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection: you can mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mock or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning again.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
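The retry behaviour the summary above describes (run an operation against meta, and on failure pause and try again, remembering each failure so it can be reported once retries run out) can be sketched roughly as follows. This is only an illustration and not the HBase code: RetrySketch, callWithRetries, the attempt count and the pause are all hypothetical. Per the summary, the patch itself gets retrying by going through HTable, which calls HConnection.getRegionServerWithRetries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of a retry loop; not the HBase implementation.
class RetrySketch {
  /**
   * Runs op up to maxAttempts times, sleeping pauseMs between attempts.
   * Every failure is kept so the final exception can list all of them.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
      throws IOException, InterruptedException {
    List<Throwable> failures = new ArrayList<Throwable>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // do not swallow an interrupt
      } catch (Exception e) {
        failures.add(e); // remember why this attempt failed
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs);
      }
    }
    throw new IOException(
        "Failed after " + maxAttempts + " attempts: " + failures);
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulate a catalog region that is unreachable for the first two calls.
    String where = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("meta is moving");
      }
      return "meta-location";
    }, 5, 10L);
    System.out.println(where + " after " + calls[0] + " calls");
  }
}
```

Keeping the list of failures is what makes the final exception useful when a server restarts mid-operation: the caller sees every attempt's cause rather than only the last one.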
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115804#comment-13115804 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2106
-----------------------------------------------------------
Only part way done, will finish in the afternoon. I like the idea though, good stuff stack.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4855>
Supposed to read "When meta is moved to zk"?
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4857>
this comment talks a lot about what is wrong but it's not clear to me what changes are actually made right now. i see you say server-side only, but what do you propose instead? (i imagine i will find out reading the rest of the diff)
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4858>
update this javadoc a bit... it's missing conf and you might also add additional context to stuff like abortable (which now appears optional and falls back to the connection itself?)
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4859>
when would one override this?
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4860>
is the behavior of this method unchanged? i guess now it returns before it's verified? any specific reason for the name change? (its behavior is definitely different from the old getRootServerConnection()). is it to match the Meta method?
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4861>
i thought no verification in CT?
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4862>
this is public but one with specified timeout is private now?
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4863>
eeek, good catch
- Jonathan
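The review request notes that resetMetaLocation was made synchronized on metaAvailable because not all accesses were synchronized. A minimal sketch of that locking pattern, assuming an AtomicBoolean flag used as the monitor, is below. The class MetaLocationSketch, its fields, and the waitForMeta helper are hypothetical and only loosely mirror the names in the review text; this is not the CatalogTracker code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: every read and write of the location goes through
// the same monitor, so a reset can never interleave with a waiter's check.
class MetaLocationSketch {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by metaAvailable

  public void setMetaLocation(String location) {
    synchronized (metaAvailable) {
      metaLocation = location;
      metaAvailable.set(true);
      metaAvailable.notifyAll(); // wake anyone blocked in waitForMeta
    }
  }

  public void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaAvailable.set(false);
      metaLocation = null;
    }
  }

  /** Waits up to timeoutMs for a location; returns null on timeout. */
  public String waitForMeta(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (metaAvailable) {
      long remaining = timeoutMs;
      while (!metaAvailable.get() && remaining > 0) {
        metaAvailable.wait(remaining);
        remaining = deadline - System.currentTimeMillis();
      }
      return metaAvailable.get() ? metaLocation : null;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MetaLocationSketch tracker = new MetaLocationSketch();
    tracker.setMetaLocation("example-host:60020");
    System.out.println(tracker.waitForMeta(100L));
    tracker.resetMetaLocation();
    System.out.println(tracker.waitForMeta(100L));
  }
}
```

If the reset wrote the fields without taking the monitor, a waiter could observe the flag as true and then read a location that had already been cleared.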
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115394#comment-13115394 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2084
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4732>
Log should be changed accordingly.
- Ted
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving meta migration code out to its
bq. own class rather than have it in all inside in MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shutdown access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because its going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility for mocking tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection, so that we can mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mocked or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning again.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v14.txt
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119118#comment-13119118 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2258
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment5236>
This is no longer true - see the fourth parameter below.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment5237>
Out of date javadoc.
- Ted
On 2011-10-02 00:01:07, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-10-02 00:01:07)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had a special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (costs some on setup but
bq. otherwise we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening a new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than keeping it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shut down access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and changed it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Changed from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to their own
bq. class since these methods are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods that update the meta table
bq. during the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility for mocking tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection, so that we can mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mocked or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning again.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
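The CatalogTracker notes in the review request quoted above say that resetMetaLocation was made synchronized on metaAvailable because not all accesses were synchronized. A rough sketch of that idea follows. It is not the patch itself: the class name MetaAvailabilitySketch, the String location, and the method bodies are made up for illustration, and it assumes metaAvailable is an AtomicBoolean that doubles as the monitor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only: every access to the location goes through one monitor. */
final class MetaAvailabilitySketch {
  // Assumption: the flag object is also used as the lock.
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by metaAvailable

  void setMetaLocation(String location) {
    synchronized (metaAvailable) {
      metaLocation = location;
      metaAvailable.set(true);
      metaAvailable.notifyAll(); // wake up anyone blocked in waitForMeta
    }
  }

  void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaAvailable.set(false);
      metaLocation = null;
    }
  }

  /** Returns the location, or null if it did not show up before the timeout. */
  String waitForMeta(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (metaAvailable) {
      while (!metaAvailable.get()) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return null; // timed out
        }
        metaAvailable.wait(remaining);
      }
      return metaLocation;
    }
  }
}
```

Because the reset, the set, and the wait all take the same lock, a waiter cannot see the flag flip to true while the location is still stale.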
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120711#comment-13120711 ]
Jonathan Gray commented on HBASE-3446:
--------------------------------------
I've grasped most of the change and this is clearly a significant improvement. Let's get it in!
+1 on latest patch up on RB if tests are passing. TestMergeTool also fails on occasion for me.
Nice work stack!
You're thinking CatalogTracker follow-up in 0.94 w/ ROOT removal perhaps?
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v2.txt
Some more progress.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982812#action_12982812 ]
stack commented on HBASE-3446:
------------------------------
OK, this is the one you figured. Way back, there was some argument for why HTable could not be used in place of HCM. I'm not sure what it was now (or then really); but I just bought it.
Is this worse than 0.89? There does not seem to be any retry facility in the scan of meta done in shutdown server handling there either? (Not that this makes the bug less severe... I'm just trying to talk about whether it should hold up 0.90.0).
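For reference, the kind of retry facility being asked about can be sketched as below. This is only an illustration and not HBase client code: RetrySketch, callWithRetries, maxAttempts and pauseMs are made-up names, and the way failures are collected into the final message is an assumption about what a useful error might look like.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Illustrative only: a generic retry loop around a Callable. */
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Runs op up to maxAttempts times, sleeping pauseMs between failures.
   * Throws an IOException listing every failure once attempts run out.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    List<String> failures = new ArrayList<String>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // do not swallow interrupts
      } catch (Exception e) {
        // Remember what went wrong so the final error is informative.
        failures.add("attempt " + attempt + ": " + e);
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs);
      }
    }
    throw new IOException(
        "Failed after " + maxAttempts + " attempts: " + failures);
  }
}
```

A scan of meta wrapped this way would survive the region moving between attempts, as long as each attempt looks up the current location again rather than reusing a cached one.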
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118951#comment-13118951 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2248
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment5213>
This visitor can be merged with the visitor in updateMetaWithNewRegionInfo() after refactoring.
The only difference between them is the boolean parameter to updateHRI().
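One way to read this suggestion, sketched with stand-in types since the actual classes are not shown in this thread: a single visitor takes the boolean as a constructor argument and passes it through. Visitor, MigratingVisitor, and the updateHRI stub below are hypothetical simplifications. The real visitor would receive a Result rather than a String, and what the boolean means in the patch is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: two near-identical visitors folded into one. */
final class MergedVisitorSketch {
  private MergedVisitorSketch() {}

  /** Stand-in for the catalog scan callback. */
  interface Visitor {
    /** Returns true to keep scanning. */
    boolean visit(String row);
  }

  /** The flag that used to distinguish the two visitors is now a field. */
  static final class MigratingVisitor implements Visitor {
    private final boolean flag;
    final List<String> updated = new ArrayList<String>();

    MigratingVisitor(boolean flag) {
      this.flag = flag;
    }

    @Override
    public boolean visit(String row) {
      if (row == null || row.isEmpty()) {
        return true; // nothing to update, keep scanning
      }
      updated.add(updateHRI(row, flag));
      return true;
    }
  }

  /** Hypothetical stub standing in for updateHRI(). */
  static String updateHRI(String row, boolean flag) {
    return row + ":" + flag;
  }
}
```

Each call site would then construct the visitor with the value it needs instead of carrying its own copy of the class.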
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment5214>
The checking here seems fragile.
I know it is in the current code base, so maybe extract the String in another JIRA.
- Ted
On 2011-10-02 00:01:07, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-10-02 00:01:07)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving meta migration code out to its
bq. own class rather than have it in all inside in MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shutdown access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because its going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility for mocking in tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection: it lets you mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mock or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 54b4939
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 3570e6a
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java cf55329
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java c809945
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 417ec6c
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 65f5e84
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java e70bd83
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96b763b
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 7f21c9f
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java fc05615
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
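The MetaReader notes quoted above (full scans driven by a Visitor, plus a CollectAllVisitor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ScanSketch` class, the generic `Visitor<R>` interface and the `Iterable` row source are hypothetical stand-ins, not the signatures used in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a visitor-driven full scan over an in-memory row source. */
final class ScanSketch {
  private ScanSketch() {}

  /** Hypothetical visitor: return false to stop the scan early. */
  interface Visitor<R> {
    boolean visit(R row);
  }

  /** Visitor that simply gathers every row it is shown. */
  static final class CollectAllVisitor<R> implements Visitor<R> {
    private final List<R> results = new ArrayList<>();

    @Override
    public boolean visit(R row) {
      results.add(row);
      return true; // never stops the scan
    }

    List<R> getResults() {
      return results;
    }
  }

  /** Hands each row to the visitor until it returns false or rows run out. */
  static <R> void fullScan(Iterable<R> rows, Visitor<R> visitor) {
    for (R row : rows) {
      if (!visitor.visit(row)) {
        return;
      }
    }
  }

  /** Overload that collects all rows, built on the visitor variant. */
  static <R> List<R> fullScan(Iterable<R> rows) {
    CollectAllVisitor<R> collector = new CollectAllVisitor<>();
    fullScan(rows, collector);
    return collector.getResults();
  }
}
```

With this shape, a caller that wants every row uses the one-argument overload, while a caller that wants to stop at the first match passes its own visitor and returns false from `visit`.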
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446v15.txt
Fixed TestCatalogTracker
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115592#comment-13115592 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2097
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4761>
I was trying to minimize how many times we do System.currentTimeMillis
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4762>
Agreed
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4764>
Will fix w/ a comment. If timeout is 0, then we do not time out. I should call it out explicitly.
- Michael
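A minimal sketch of the points in this reply: helpers that take the current time as an argument (so `System.currentTimeMillis` is read once per loop iteration) and that treat a timeout of 0 as "never time out". The class name, method signatures and the 10 ms polling interval are hypothetical; the review only names `remaining` and `timedout` as utility methods and does not show their bodies.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: timeout arithmetic where a timeout of 0 means "no timeout". */
final class TimeoutMath {
  private TimeoutMath() {}

  /** True once more than timeoutMs has elapsed; always false when timeoutMs is 0. */
  static boolean timedOut(long startMs, long nowMs, long timeoutMs) {
    if (timeoutMs == 0) {
      return false; // 0 is called out explicitly: wait without limit
    }
    return nowMs - startMs > timeoutMs;
  }

  /** Milliseconds left before the timeout; Long.MAX_VALUE when timeoutMs is 0. */
  static long remaining(long startMs, long nowMs, long timeoutMs) {
    if (timeoutMs == 0) {
      return Long.MAX_VALUE;
    }
    return Math.max(0, timeoutMs - (nowMs - startMs));
  }

  /** Polls until ready or timed out, reading the clock once per iteration. */
  static boolean waitFor(BooleanSupplier ready, long timeoutMs) throws InterruptedException {
    final long start = System.currentTimeMillis();
    long now = start;
    while (!ready.getAsBoolean()) {
      if (timedOut(start, now, timeoutMs)) {
        return false;
      }
      // Sleep a short interval, but never longer than the time left.
      Thread.sleep(Math.min(10, Math.max(1, remaining(start, now, timeoutMs))));
      now = System.currentTimeMillis();
    }
    return true;
  }
}
```

Passing `now` into both helpers keeps them pure functions of their arguments, which also makes them easy to unit test without touching the clock.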
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had a special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but I couldn't help it since I was in MetaReader and MetaEditor
bq. trying to clean them up: I ended up moving the meta migration code out to its
bq. own class rather than have it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shut down access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility for mocking in tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection: it lets you mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mock or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
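The review summary quoted above describes routing catalog operations through a shared retry loop and reporting richer context when retries are exhausted. A minimal sketch of that general technique follows. It is not the HConnection or RetriesExhaustedException code: `RetrySketch`, `callWithRetries`, `RetriesExhausted` and the `context` string are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Illustrative only: a retry loop that records context for every failed attempt. */
final class RetrySketch {
  private RetrySketch() {}

  /** Thrown when every attempt failed; carries one entry per failure. */
  static final class RetriesExhausted extends RuntimeException {
    final List<String> attempts;

    RetriesExhausted(List<String> attempts) {
      super("Failed after " + attempts.size() + " attempts: " + attempts);
      this.attempts = attempts;
    }
  }

  /** Runs op up to maxAttempts times, pausing pauseMs between failed attempts. */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs, String context) {
    List<String> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        // Keep what failed and where, so the final exception is diagnosable.
        failures.add("attempt " + attempt + " [" + context + "]: " + e);
        if (attempt < maxAttempts) {
          try {
            Thread.sleep(pauseMs);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status and give up
            break;
          }
        }
      }
    }
    throw new RetriesExhausted(failures);
  }
}
```

Keeping the loop in one place means callers such as catalog readers and editors only supply the operation, which is the duplication the summary says the patch avoids by using HTable.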
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082959#comment-13082959 ]
stack commented on HBASE-3446:
------------------------------
Running org.apache.hadoop.hbase.master.TestCatalogJanitor
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.683 sec <<< FAILURE!
Running org.apache.hadoop.hbase.io.TestHeapSize
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.187 sec <<< FAILURE!
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Assigned: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reassigned HBASE-3446:
----------------------------
Assignee: stack
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127253#comment-13127253 ]
Hudson commented on HBASE-3446:
-------------------------------
Integrated in HBase-0.92 #64 (See [https://builds.apache.org/job/HBase-0.92/64/])
HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions
HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions
stack :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
stack :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
* /hbase/branches/0.92/src/main/ruby/hbase/admin.rb
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/branches/0.92/src/test/ruby/hbase/admin_test.rb
* /hbase/branches/0.92/src/test/ruby/shell/shell_test.rb
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Fix Version/s: (was: 0.90.1)
0.90.2
Moving out of 0.90.1. I won't have time to spend testing this patch more before tomorrow evening, the cut-off point. The patch looks to be working properly but I keep tripping over other issues that I have to follow to make sure this patch is not the cause.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.2
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Status: Patch Available (was: Open)
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v7.txt
Still not done. My fancy new unit test is turning up issues. Still hunting them down.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115391#comment-13115391 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2083
-----------------------------------------------------------
Thanks for the cleanup.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4725>
Back to future :-)
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4727>
Nice.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4726>
Extra so.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4730>
Do we need to update now and pass it to the helper methods?
The helper methods can easily figure out what now should be.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4729>
I think > would be enough.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4731>
I am confused by the condition here.
- Ted
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
bq. In 0.89, we had a special RetryableMetaOperation class, a subclass of Callable,
bq. that reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (it costs some on setup, but
bq. otherwise we avoid duplicating code). Upped the retries on the server side too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT (CatalogTracker) methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes the ability to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening a new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than having it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shut down access on some of these unused methods. In particular,
bq. don't let out HRegionInterface instances, since we are going
bq. away from raw HRI use to instead use a connection with retries,
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these methods are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all the Meta* methods that update the meta table
bq. during the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use the new HConnectionTestingUtility for mocking in tests (need to use it
bq. because it's a bit harder to mock now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection: you can mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mock or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning again.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
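The per-operation pattern that the MetaEditor notes in the quoted summary describe (get a table handle, use it, then close it) could look roughly like the following. This is only a sketch under stated assumptions: CatalogTable and TableFactory are hypothetical stand-ins invented for illustration, not HBase classes, and the actual patch uses HTable.

```java
import java.io.IOException;

final class PerOperationTableSketch {
  private PerOperationTableSketch() {}

  /** Hypothetical stand-in for a retrying table handle such as HTable. */
  interface CatalogTable extends AutoCloseable {
    void put(String row) throws IOException;

    @Override
    void close() throws IOException;
  }

  /** Hypothetical factory that opens a fresh handle for each operation. */
  interface TableFactory {
    CatalogTable open() throws IOException;
  }

  /**
   * Opens a handle, performs a single put, and closes the handle even if
   * the put throws. Setup cost is paid on every call.
   */
  static void putToCatalog(TableFactory factory, String row) throws IOException {
    try (CatalogTable table = factory.open()) {
      table.put(row);
    }
  }
}
```

The trade-off is the one the summary mentions: some setup cost per operation in exchange for not duplicating retry code in the caller.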
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v11.txt
Here is a version of the patch that incorporates Jon's review done over on review board. Testing now on a cluster to see if there are cluster issues before committing.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v11.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988643#comment-12988643 ]
stack commented on HBASE-3446:
------------------------------
Just to say that I put up latest patch on review: https://review.cloudera.org/r/1499/
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115728#comment-13115728 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2099
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4835>
The condition, now < stopTime, is reversed for isTimedOut().
- Ted
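For illustration, here is a minimal sketch of what deadline helpers along the lines of the (remaining, timedout) utilities might look like, with the comparison oriented so that "timed out" means the deadline has been reached. The class name, method names, and signatures are assumptions made for this example and are not taken from the patch.

```java
/** Hypothetical deadline helpers; not the actual CatalogTracker code. */
final class DeadlineSketch {
  private DeadlineSketch() {}

  /** Milliseconds left before stopTime, never negative. */
  static long remaining(long stopTime, long now) {
    return Math.max(0L, stopTime - now);
  }

  /** True once now has reached or passed stopTime. */
  static boolean timedOut(long stopTime, long now) {
    return now >= stopTime;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long stopTime = now + 50L; // assumed 50 ms budget, for demonstration only
    System.out.println("timedOut=" + timedOut(stopTime, now));
    System.out.println("remaining=" + remaining(stopTime, now));
  }
}
```

Writing the check as now < stopTime would instead report a timeout while time is still left, which is the reversal pointed out in the comment.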
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446v23.txt
Here is what I just committed to 0.92 and trunk.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982821#action_12982821 ]
Todd Lipcon commented on HBASE-3446:
------------------------------------
In 0.89 we use RetryableMetaOperation.doWithRetries() so I think it would continue to retry until successful.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115261#comment-13115261 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------
Review request for hbase and Jonathan Gray.
Summary
-------
Make the Meta* operations against meta retry. We do it by using HTable instances
(HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
In 0.89, we had a special RetryableMetaOperation class, a subclass of Callable,
that reproduced the guts of HConnection.getRegionServerWithRetries
with its retry loop. Now we just use HTable instead (it costs some on setup, but
otherwise we avoid duplicating code). Upped the retries on the server side too.
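As a rough illustration of the kind of retry loop being discussed, a generic wrapper around a Callable might be sketched as below. This is not the code of RetryableMetaOperation or of HConnection.getRegionServerWithRetries; the class name, the parameters, and the choice to treat every IOException as retryable are all assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry wrapper; names and policy are illustrative only. */
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Calls op up to maxAttempts times, pausing pauseMillis between attempts.
   * Assumption: any IOException is worth retrying (for example, the target
   * region moved); any other exception is treated as fatal.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e; // remember the failure and try again
      } catch (InterruptedException e) {
        throw e; // do not swallow interrupts
      } catch (Exception e) {
        throw new IOException("Non-retryable failure", e);
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMillis);
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }
}
```

Keeping the last failure as the cause of the final exception preserves some context for whoever reads the log after the retries run out.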
Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods to get proxy connections to the meta and root servers.
CT needs to be cut back. This patch closes down access to (unused) public
methods and removes the ability to get an HRegionInterface on meta and root
-- this stuff is now used internally by CT only; use MetaEditor or
MetaReader if you want to update or read catalog tables. Opening a new issue
to cut back CT use across the code base.
A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
trying to clean them up, I ended up moving the meta migration code out to its
own class rather than have it all inside MetaEditor.
Here is some detail to help with reviews.
M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
Clean up. Shut down access to some of these unused methods. In particular, don't
let out HRegionInterface instances, since we are moving
away from raw HRI use toward a connection with retries,
i.e. HTable.
Comments on state of this class. Javadoc edits.
getZooKeeperWatcher on HConnection is deprecated, so don't use it
in the constructor. Override MetaNodeTracker and, on node delete,
reset the meta location. (We used to do this over in MetaNodeTracker,
but to do that we had to have a CatalogTracker over in the zk package,
which is silly -- bad package encapsulation.)
(waitForRootServer) Renamed getRootServerConnection and changed it
from public to package private.
(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
(getMetaServerConnection) Changed from public to package private.
Use MetaReader to read the meta location in root rather than a
raw HRegionInterface so we get retrying.
(remaining, timedout) Added utility methods.
(waitForMetaServer) Changed from public to private.
(resetMetaLocation) Made it synchronized on metaAvailable.
Not all accesses were synchronized.
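To illustrate the synchronization point and the small timeout helpers, here is a sketch under assumptions. The names remaining, timedout, resetMetaLocation and metaAvailable come from the notes above, but the signatures and bodies are guesses made for illustration; AvailabilitySketch, setMetaLocation and this waitForMeta are hypothetical and not copied from CatalogTracker.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical tracker; illustrative only, not CatalogTracker itself. */
final class AvailabilitySketch {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);

  /** Milliseconds left before the deadline, never negative. */
  static long remaining(long startMs, long timeoutMs, long nowMs) {
    return Math.max(0L, timeoutMs - (nowMs - startMs));
  }

  /** True once timeoutMs or more has elapsed since startMs. */
  static boolean timedout(long startMs, long timeoutMs, long nowMs) {
    return nowMs - startMs >= timeoutMs;
  }

  /** Marks meta as available and wakes every waiter. */
  void setMetaLocation() {
    synchronized (metaAvailable) {
      metaAvailable.set(true);
      metaAvailable.notifyAll();
    }
  }

  /** Marks meta as unavailable; takes the same lock as the waiters. */
  void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaAvailable.set(false);
    }
  }

  /** Blocks until meta is available or timeoutMs elapses; returns the final state. */
  boolean waitForMeta(long timeoutMs) throws InterruptedException {
    long start = System.currentTimeMillis();
    synchronized (metaAvailable) {
      while (!metaAvailable.get()) {
        long now = System.currentTimeMillis();
        if (timedout(start, timeoutMs, now)) {
          break;
        }
        // remaining() is strictly positive here, so this never waits forever.
        metaAvailable.wait(remaining(start, timeoutMs, now));
      }
      return metaAvailable.get();
    }
  }
}
```

Because every read-then-wait and every write goes through the same monitor, a reset cannot slip in between a waiter's check and its wait.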
M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
Refactor to use HTable instead of raw HRegionInterface so we get
retrying. For each operation we get an HTable, use it, then close it.
(putToMetaTable, putsToMetaTable, etc) Utility methods.
(updateRootWithMetaMigrationStatus, etc.) Moved out to their own
class since these methods are for a one-time migration only.
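The "get an HTable, use it, then close it" shape can be sketched as follows. This assumes a modern Java with try-with-resources; CatalogTable, MetaEditSketch, putToCatalog and RecordingTable are hypothetical stand-ins and not the HTable or MetaEditor API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a table handle; not the HBase HTable API. */
interface CatalogTable extends Closeable {
  void put(String row, String value) throws IOException;
}

final class MetaEditSketch {
  private MetaEditSketch() {}

  /** Opens a fresh handle per operation, uses it, and always closes it. */
  static void putToCatalog(Supplier<CatalogTable> opener, String row, String value)
      throws IOException {
    try (CatalogTable table = opener.get()) {
      table.put(row, value);
    }
  }
}

/** In-memory table that records puts and whether it was closed. */
final class RecordingTable implements CatalogTable {
  final List<String> puts = new ArrayList<>();
  boolean closed;

  @Override
  public void put(String row, String value) {
    puts.add(row + "=" + value);
  }

  @Override
  public void close() {
    closed = true;
  }
}
```

The per-operation open and close is the setup cost mentioned in the summary; in exchange the editor holds no long-lived raw connection that can go stale when meta moves.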
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
New class that holds all the Meta* methods that update the meta table
during the one-time migration done to meta on startup. This class
is marked deprecated because it's going to be dropped in 0.94.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
Retrofit methods in here to use fullScan methods with Visitor.
(getCatalogRegionInterface, getCatalogRegionNameForTable,
getCatalogRegionNameForRegion) Removed.
(fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
(fullScanOfResults) Renamed as fullScan override.
(fullScanOfRoot) Added as deprecated. We should be doing
this against zk.
(metaRowToRegionPair, getServerNameFromResult) Moved to Result
(CollectAllVisitor) Added
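A full scan driven by a visitor, with a collect-everything visitor on top, can be sketched like this. RowVisitor, CollectAllSketch and ScanSketch are hypothetical names, and the convention that returning false stops the scan is an assumption made for the sketch rather than something stated in these notes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical visitor; in this sketch, returning false stops the scan. */
interface RowVisitor<R> {
  boolean visit(R row);
}

/** Visitor that keeps every row it is shown. */
final class CollectAllSketch<R> implements RowVisitor<R> {
  final List<R> results = new ArrayList<>();

  @Override
  public boolean visit(R row) {
    results.add(row);
    return true;
  }
}

final class ScanSketch {
  private ScanSketch() {}

  /** Feeds each row to the visitor until rows run out or the visitor declines. */
  static <R> void fullScan(Iterable<R> rows, RowVisitor<R> visitor) {
    for (R row : rows) {
      if (!visitor.visit(row)) {
        return;
      }
    }
  }
}
```

With this shape, the scan loop is written once and each reader method differs only in the visitor it supplies.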
M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
Handle few cases where methods throw InterruptedException
(Don't let it out on the HBaseAdmin public API)
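One common way to keep InterruptedException off a public API that already declares IOException is sketched below. The notes do not say how the patch handles it, so treat this as an assumption; InterruptSketch, Blocking and runForPublicApi are hypothetical names.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class InterruptSketch {
  private InterruptSketch() {}

  /** Hypothetical internal operation that may be interrupted. */
  interface Blocking {
    void run() throws InterruptedException, IOException;
  }

  /**
   * Runs op and converts an interrupt into an InterruptedIOException,
   * restoring the thread's interrupt flag so callers can still see it.
   */
  static void runForPublicApi(Blocking op) throws IOException {
    try {
      op.run();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      InterruptedIOException converted =
          new InterruptedIOException("Interrupted: " + e.getMessage());
      converted.initCause(e);
      throw converted;
    }
  }
}
```

Callers of the public method then only need to handle IOException, while the restored flag lets code further up the stack notice the interrupt.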
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
on failure. Call ServerCallable connect AFTER beforeCall rather than
ServerCallable.instantiateServer BEFORE beforeCall.
M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
Add to DEBUG message the connection name we were using.
M src/main/java/org/apache/hadoop/hbase/client/Result.java
(getServerNameFromCatalogResult, parseCatalogResult,
parseHRegionInfoFromCatalogResult) Added
M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
Added new ThrowableWithExtraContext that takes extra context info.
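The idea of pairing each failed attempt with extra context can be sketched as below. AttemptFailure and ExhaustedSketch are hypothetical, and the fields chosen here (cause, timestamp, free-form context) are an assumption for illustration, not the actual layout of ThrowableWithExtraContext.

```java
import java.util.List;

/** Hypothetical record of one failed attempt; not the HBase class. */
final class AttemptFailure {
  final Throwable cause;
  final long whenMs;
  final String context;

  AttemptFailure(Throwable cause, long whenMs, String context) {
    this.cause = cause;
    this.whenMs = whenMs;
    this.context = context;
  }

  @Override
  public String toString() {
    return whenMs + ", " + context + ", " + cause;
  }
}

final class ExhaustedSketch {
  private ExhaustedSketch() {}

  /** Builds a message with one line per failed attempt. */
  static String describe(int attempts, List<AttemptFailure> failures) {
    StringBuilder sb = new StringBuilder();
    sb.append("Failed after attempts=").append(attempts).append(", exceptions:");
    for (AttemptFailure failure : failures) {
      sb.append('\n').append(failure);
    }
    return sb.toString();
  }
}
```

Reporting which server each attempt went to, and when, is what makes a retries-exhausted error readable after meta has moved between attempts.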
M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
instantiateServer renamed as connect
M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
Javadoc. Renamed instantiateServer as connect.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Javadoc. Use MetaReader method instead of handcoding.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
Handle InterruptedException
M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
Handle InterruptedException
M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
Allow that hris can come back null when we ask for table regions.
M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
Remove import of CatalogTracker.
M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
Use utility in MetaReader instead of handcoding it.
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
Use new HConnectionTestingUtility for mocking in tests (need to use it
because it's a bit harder to mock now that we use HTable instead
of the more direct HRegionInterface).
Add some tests of broken out utility methods.
M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
Add tests
M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
Add test of 3669 retrying.
M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
New test utility that helps with mocking HConnection: you can mock
an HConnection and then have an HTable use the mocked connection. Can do
a mock or a spied-on HConnection.
M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
The migration code moved. Reference new location.
M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
This addresses bug hbase-3446.
https://issues.apache.org/jira/browse/hbase-3446
Diffs
-----
src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
Diff: https://reviews.apache.org/r/2065/diff
Testing
-------
All tests passed recently. Rerunning again.
Thanks,
Michael
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Resolution: Fixed
Release Note: Makes catalog/* classes retry: e.g. MetaEditor, MetaReader and CatalogTracker. Previously they would try once and unless successful, fail. Retrying is courtesy of HTable instances.
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
Got all tests to pass, eventually.
A bunch of tests were failing because waitForMeta just hung on the meta-is-available boolean on master startup, waiting for some background thread to set it to true once meta had been set. This was fine in the old days, when we'd go get an HRegionInterface to .META. and try to ensure it was in its location, wherever that was, with verifies over the HRegionInterface instances (with no retries). But now we don't do such primitives; we've gone up the stack and have HTables/HConnections do the search and 'verify' of meta for us. We need to run a connection get to know if meta is available (if it is available, the magic AtomicBoolean gets set).
Other miscellaneous stuff like testshell was failing for me because it couldn't find the cluster -- needed to set it up with the cluster's configuration.
Moved more of the meta migration code into the MetaMigrationRemovingHTD class rather than have it spread all about.
Changed the LocalHBaseCluster#join method so it uses the old thread-dumping join, which prints a thread dump if we have been waiting more than 60 seconds for something to finish. Helped me debug a few tests here.
Otherwise, it is what was up on rb.
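A thread-dumping join of the kind described can be sketched as follows. JoinSketch, dumpThreads and this threadDumpingJoin are hypothetical and written from scratch for illustration; the interval is a parameter here, where the comment above mentions 60 seconds.

```java
import java.util.Map;

final class JoinSketch {
  private JoinSketch() {}

  /** Returns a text dump of the stacks of all live threads. */
  static String dumpThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      sb.append("Thread ").append(entry.getKey().getName()).append('\n');
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /**
   * Joins t, printing a thread dump to stderr each time intervalMs passes
   * without t finishing. Returns the number of dumps printed.
   */
  static int threadDumpingJoin(Thread t, long intervalMs)
      throws InterruptedException {
    int dumps = 0;
    while (t.isAlive()) {
      t.join(intervalMs);
      if (t.isAlive()) {
        System.err.println(dumpThreads());
        dumps++;
      }
    }
    return dumps;
  }
}
```

When a test hangs in shutdown, the periodic dump shows which thread the join is stuck behind without needing to attach a debugger.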
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115818#comment-13115818 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2107
-----------------------------------------------------------
Thanks for reviews Ted and Jon. Will put up new patch when you fellas finish...
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4864>
@Jon True. I opened another issue with suggested fix. I should at least reference it in here.
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4865>
Nah. That's a TODO.
- Michael
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072994#comment-13072994 ]
stack commented on HBASE-3446:
------------------------------
So, I need to update the last patch here and work on the failures seen. I want to write a test too to prove that we have retries after this patch goes in.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126130#comment-13126130 ]
stack commented on HBASE-3446:
------------------------------
Thanks for the +1s. Some tests are failing, though, so that's why I've yet to commit it. Working on it.
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Fix Version/s: (was: 0.90.2)
0.92.0
Moving out of 0.90. Too big a change after all is said and done to put against a patch release.
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v3.txt
Most of the conversions are done. Testing now.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446-v3.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128382#comment-13128382 ]
Hudson commented on HBASE-3446:
-------------------------------
Integrated in HBase-TRUNK #2325 (See [https://builds.apache.org/job/HBase-TRUNK/2325/])
HBASE-3446 ProcessServerShutdown fails if META moves, orphaning lots of regions
stack :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
* /hbase/trunk/src/main/ruby/hbase/admin.rb
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
* /hbase/trunk/src/test/ruby/hbase/admin_test.rb
* /hbase/trunk/src/test/ruby/shell/shell_test.rb
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt, 3446v23.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116019#comment-13116019 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > There is a lot of excellence in here. I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used. I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff.
bq. >
bq. > Great work!
Sorry about that. Let me get you a better answer to your question. I think it's not very clear because I myself was unclear on the scope of CT when I started in. What this patch has here is an attempt at shutting down CT scope, with subsequent work put off for HBASE-4495.
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 513
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513>
bq. >
bq. > maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard
Will do
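To make the locking point above concrete, here is a minimal sketch of a method that takes the metaAvailable monitor itself, so callers do not need to hold it. The class name MetaLocationTracker, the String location, and the setMetaLocation and waitForMeta methods are assumptions made for illustration; this is not the CatalogTracker code from the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the tracker; not the real CatalogTracker.
class MetaLocationTracker {
  // Serves as the availability flag and as the monitor guarding metaLocation.
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by metaAvailable

  /** Records a location. Synchronizes internally on metaAvailable. */
  void setMetaLocation(String location) {
    synchronized (metaAvailable) {
      metaLocation = location;
      metaAvailable.set(true);
      metaAvailable.notifyAll(); // wake any thread blocked in waitForMeta
    }
  }

  /** Clears the location. Synchronizes internally on metaAvailable. */
  void resetMetaLocation() {
    synchronized (metaAvailable) {
      metaLocation = null;
      metaAvailable.set(false);
    }
  }

  /** Waits up to timeoutMs for a location; returns null on timeout. */
  String waitForMeta(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (metaAvailable) {
      while (!metaAvailable.get()) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return null; // timed out
        }
        metaAvailable.wait(remaining);
      }
      return metaLocation;
    }
  }
}
```

Keeping every read and write of the location inside the same monitor is the property the patch summary describes as "Not all accesses were synchronized" being fixed; the sketch only shows the shape of that idea.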
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 575
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575>
bq. >
bq. > verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server)
Good point.
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194>
bq. >
bq. > i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :)
In the above, we'd get the HRegionInterface and do the invocation on the actual interface. The alternative steps back and asks an HTable instance to do the work. If there was an issue with the former, we'd just let the exception out. In the alternative, we'll do HTable retries before we let the exception out (and the retries are boosted in server context).
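The difference between the two approaches can be sketched with a generic retry wrapper. This is only an illustration of "retry before letting the exception out": RetryingCaller, callWithRetries, and the pause policy are hypothetical names and choices, and the real retry loop lives in the HBase client connection code that HTable calls.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of the HBase API.
final class RetryingCaller {
  private RetryingCaller() {}

  /**
   * Runs op up to maxAttempts times. An IOException triggers a retry after a
   * pause that grows with the attempt number; the last one is rethrown wrapped.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // never swallow interrupts
      } catch (IOException e) {
        last = e; // remember the failure and try again
      } catch (Exception e) {
        throw new IOException("Non-retryable failure", e);
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs * attempt);
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }
}
```

With a raw interface call, the first IOException (for example because the catalog region just moved) reaches the caller directly. With a wrapper like this one, the caller only sees a failure once every attempt has failed.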
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, line 2
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2>
bq. >
bq. > missing copyright and year?
Turns out that copyright is not actually needed; see https://issues.apache.org/jira/browse/HBASE-3870
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568>
bq. >
bq. > seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result
OK. It was kinda nice being able to do result.getServerNameFromCatalogResult, but I suppose it does pollute. I can move it back to MetaReader since that seems like the next best place. You are right, it shouldn't be generally public stuff. Will fix.
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70>
bq. >
bq. > this seems like an important public method. i like the rename and your additional comments, but maybe we should add more. default behavior is to use a cached location, if one is not found, it is looked up in a catalog. setting reload to true bypasses the cache and forces the lookup to a catalog. and then, under what cases do we get an exception? does this verify that the server is actually hosting the region? or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct?
Will look into this.
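For reference while looking into it, the behavior described in the question (use a cached location by default, fall back to a catalog lookup, and let reload=true bypass the cache) can be sketched as below. RegionLocator, Catalog, and locate are hypothetical names, and the sketch deliberately does no verification that the server really hosts the region, which is the open question above rather than a statement about what ServerCallable does.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; this is not the HConnection or ServerCallable API.
class RegionLocator {
  /** Stand-in for a catalog lookup; may fail with an IOException. */
  interface Catalog {
    String lookup(String regionName) throws IOException;
  }

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Catalog catalog;

  RegionLocator(Catalog catalog) {
    this.catalog = catalog;
  }

  /**
   * Returns the server believed to host regionName. With reload=false a cached
   * answer is used when present; with reload=true the catalog is always asked.
   */
  String locate(String regionName, boolean reload) throws IOException {
    if (!reload) {
      String cached = cache.get(regionName);
      if (cached != null) {
        return cached; // possibly stale: nothing here checks the server
      }
    }
    String fresh = catalog.lookup(regionName);
    if (fresh == null) {
      throw new IOException("No location in catalog for " + regionName);
    }
    cache.put(regionName, fresh);
    return fresh;
  }
}
```

In this shape, an exception can only come from the catalog lookup itself or from a missing entry; a stale cache entry is returned silently until a caller passes reload=true.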
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 1041-1047
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041>
bq. >
bq. > why do you remove the javadoc on this method?
Will look into this.
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, line 132
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132>
bq. >
bq. > huh? :)
Let me fix.
bq. On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq. > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java, line 52
bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45932#file45932line52>
bq. >
bq. > 30,000 ft desc? i guess test name is self descriptive? :)
Will do.
- Michael
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
-----------------------------------------------------------
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening a new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than have it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shut down access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility mocking tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection, making it so you can mock
bq. an HConnection and then have an HTable use the mocked connection. Can do
bq. a mock or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning again.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
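The review request above says HConnectionManager now populates RetriesExhaustedException.ThrowableWithExtraContext on failure. A rough sketch of the idea, recording each failed attempt together with when it happened and some extra context so the final exception message is more useful, might look like the following. AttemptFailure, RetriesFailedException, and their fields are names invented for this illustration and are not the classes from the patch.

```java
import java.io.IOException;
import java.util.List;

// Illustrative only; names and fields are assumptions, not the patch's code.
class AttemptFailure {
  final Throwable cause;
  final long whenMs;   // when the attempt failed
  final String extras; // free-form context, e.g. the server that was tried

  AttemptFailure(Throwable cause, long whenMs, String extras) {
    this.cause = cause;
    this.whenMs = whenMs;
    this.extras = extras;
  }

  @Override
  public String toString() {
    return "at " + whenMs + "ms, " + extras + ", " + cause;
  }
}

// Thrown once all attempts are used up; the message lists every failure.
class RetriesFailedException extends IOException {
  RetriesFailedException(int attempts, List<AttemptFailure> failures) {
    super(describe(attempts, failures));
  }

  private static String describe(int attempts, List<AttemptFailure> failures) {
    StringBuilder sb = new StringBuilder("Failed after attempts=")
        .append(attempts)
        .append(", exceptions:\n");
    for (AttemptFailure f : failures) {
      sb.append(f).append('\n');
    }
    return sb.toString();
  }
}
```

A retry loop would add one AttemptFailure per failed try and throw the exception after the last one, so the log shows which server each attempt went to rather than only the final error.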
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116008#comment-13116008 ]
jiraposter@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
-----------------------------------------------------------
There is a lot of excellence in here. I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used. I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff.
Great work!
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4873>
maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard
src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/2065/#comment4874>
verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server)
src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
<https://reviews.apache.org/r/2065/#comment4875>
i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :)
src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
<https://reviews.apache.org/r/2065/#comment4876>
same here! nice (old stuff looks rife with race conditions)
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment4877>
missing copyright and year?
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
<https://reviews.apache.org/r/2065/#comment4878>
nice moving cruft to separate classes
src/main/java/org/apache/hadoop/hbase/client/Result.java
<https://reviews.apache.org/r/2065/#comment4881>
seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result
src/main/java/org/apache/hadoop/hbase/client/Result.java
<https://reviews.apache.org/r/2065/#comment4882>
yeah, shouldn't this be in MetaReader or some such class?
src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
<https://reviews.apache.org/r/2065/#comment4895>
missed some whitespace
src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
<https://reviews.apache.org/r/2065/#comment4896>
nice
src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
<https://reviews.apache.org/r/2065/#comment4897>
this seems like an important public method. i like the rename and your additional comments, but maybe we should add more. default behavior is to use a cached location, if one is not found, it is looked up in a catalog. setting reload to true bypasses the cache and forces the lookup to a catalog. and then, under what cases do we get an exception? does this verify that the server is actually hosting the region? or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct?
src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<https://reviews.apache.org/r/2065/#comment4898>
why do you remove the javadoc on this method?
src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
<https://reviews.apache.org/r/2065/#comment4899>
not even necessary to put this method in here at all now (we're just using it for getting the node name at this point but it's probably still nice to have the name in stacks and such)
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4900>
yay! <3
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4901>
huh? :)
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
<https://reviews.apache.org/r/2065/#comment4902>
awesome
src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
<https://reviews.apache.org/r/2065/#comment4903>
30,000 ft desc? i guess test name is self descriptive? :)
- Jonathan
On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening a new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than have it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq. Clean up. Shut down access on some of these unused methods. Don't
bq. let out HRegionInterface instances in particular since we are going
bq. away from raw HRI use to instead use a connection with retries:
bq. i.e. HTable.
bq.
bq. Comments on state of this class. Javadoc edits.
bq. getZooKeeperWatcher on HConnection is deprecated so don't use it
bq. in constructor. Override MetaNodeTracker and on node delete
bq. reset meta location (We used to do this over in MetaNodeTracker
bq. but to do that we had to have a CatalogTracker over in zk package
bq. which is silly -- bad package encapsulation).
bq.
bq. (waitForRootServer) Renamed getRootServerConnection and change it
bq. from public to package private.
bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq. (getMetaServerConnection) Change from public to package private.
bq. Use MetaReader to read the meta location in root rather than a
bq. raw HRegionInterface so we get retrying.
bq. (remaining, timedout) Added utility methods.
bq. (waitForMetaServer) Changed from public to private.
bq. (resetMetaLocation) Made it synchronized on metaAvailable.
bq. Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq. Refactor to use HTable instead of raw HRegionInterface so we get
bq. retrying. For each operation we get an HTable, use it, then close it.
bq. (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq. class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq. New class that holds all Meta* methods updating meta table used
bq. doing the one-time migration done to meta on startup. This class
bq. is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq. Retrofit methods in here to use fullScan methods with Visitor.
bq. (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq. getCatalogRegionNameForRegion) Removed.
bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq. (fullScanOfResults) Renamed as fullScan override.
bq. (fullScanOfRoot) Added as deprecated. We should be doing
bq. this against zk.
bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result
bq. (CollectAllVisitor) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
bq. Handle a few cases where methods throw InterruptedException
bq. (Don't let it out on the HBaseAdmin public API)
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext
bq. on failure. Call ServerCallable connect AFTER beforeCall rather than
bq. ServerCallable.instantiateServer BEFORE beforeCall.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
bq. Add to DEBUG message the connection name we were using.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java
bq. (getServerNameFromCatalogResult, parseCatalogResult,
bq. parseHRegionInfoFromCatalogResult) Added
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
bq. Added new ThrowableWithExtraContext that takes extra context info.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
bq. instantiateServer renamed as connect
bq.
bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
bq. Javadoc. Renamed instantiateServer as connect.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq. Javadoc. Use MetaReader method instead of handcoding.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
bq. Handle InterruptedException
bq.
bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
bq. Allow that hris can come back null when we ask for table regions.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
bq. Remove import of CatalogTracker.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
bq. Use utility in MetaReader instead of handcoding it.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
bq. Use new HConnectionTestingUtility mocking tests (need to use it
bq. because it's a bit harder mocking tests now that we use HTable instead
bq. of the more direct HRegionInterface).
bq. Add some tests of broken out utility methods.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
bq. Add tests
bq.
bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
bq. Add test of 3669 retrying.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
bq. New test utility that helps with mocking HConnection, so that a test can mock
bq. an HConnection and then have an HTable use the mocked connection. Supports
bq. either a mocked or a spied-on HConnection.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
bq. The migration code moved. Reference new location.
bq.
bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.
bq.
bq.
bq. This addresses bug hbase-3446.
bq. https://issues.apache.org/jira/browse/hbase-3446
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80
bq.
bq. Diff: https://reviews.apache.org/r/2065/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All tests passed recently. Rerunning now.
bq.
bq.
bq. Thanks,
bq.
bq. Michael
bq.
bq.
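The change summary above mentions a new ThrowableWithExtraContext that carries extra context for each failed retry. As a rough illustration of that idea only, and not the patch's actual code, here is a minimal sketch. Every name in it (AttemptFailure, RetriesExhaustedSketch, ServerCall, RetrySketch) is hypothetical, and the "extra context" is assumed to be the name of the server that was tried.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical record of one failed attempt: the cause, when it happened, and extra context. */
final class AttemptFailure {
    final Throwable cause;
    final long whenMillis;
    final String extraContext;

    AttemptFailure(Throwable cause, long whenMillis, String extraContext) {
        this.cause = cause;
        this.whenMillis = whenMillis;
        this.extraContext = extraContext;
    }

    @Override
    public String toString() {
        return cause + ", when=" + whenMillis + ", context=" + extraContext;
    }
}

/** Hypothetical exception that reports every failed attempt, not only the last one. */
final class RetriesExhaustedSketch extends RuntimeException {
    final List<AttemptFailure> failures;

    RetriesExhaustedSketch(List<AttemptFailure> failures) {
        super(describe(failures));
        this.failures = Collections.unmodifiableList(new ArrayList<>(failures));
    }

    private static String describe(List<AttemptFailure> failures) {
        StringBuilder sb = new StringBuilder("Failed after attempts=" + failures.size());
        for (AttemptFailure f : failures) {
            sb.append(System.lineSeparator()).append(f);
        }
        return sb.toString();
    }
}

/** Hypothetical unit of work that runs against a named server. */
interface ServerCall<T> {
    T call(String server) throws Exception;
}

final class RetrySketch {
    /** Re-resolves the server before each attempt and records why each attempt failed. */
    static <T> T withRetries(int maxAttempts, Supplier<String> locator, ServerCall<T> work) {
        List<AttemptFailure> failures = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String server = locator.get(); // assumption: location may change between attempts
            try {
                return work.call(server);
            } catch (Exception e) {
                failures.add(new AttemptFailure(e, System.currentTimeMillis(), server));
            }
        }
        throw new RetriesExhaustedSketch(failures);
    }
}
```

Keeping the per-attempt server name in the exception message is what makes a log line useful when the region has moved between attempts.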
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Commented: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983016#action_12983016 ]
stack commented on HBASE-3446:
------------------------------
As you said up on IRC, we actually weren't doing this fixup in shutdown handling previously. It's a new facility in 0.90.0. In 0.89, the basescanner would try to do this fixup every time it ran, so I suppose it would fail and then come along later and probably succeed.
So, we've just read .META. in shutdown handling and now we're processing it. Part of processing it is this new check on daughters. If .META. goes down after we have successfully scanned it and before we've finished walking the result to check on daughters, then we run into this issue. Seems rare enough?
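To make the window concrete, here is a minimal sketch of the scan-then-walk pattern described above. It is not the handler's real code: CatalogLookup, DaughterCheckSketch and the per-row retry are all hypothetical, and the retry is just one possible mitigation, not necessarily what the attached patches do.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of the catalog; a lookup may fail while the catalog region is moving. */
interface CatalogLookup {
    boolean daughtersPresent(String parentRegion) throws IOException;
}

final class DaughterCheckSketch {
    /**
     * Walks rows that were already scanned and returns the parents whose daughters are missing.
     * Each check goes back to the catalog, so it can fail even though the scan itself succeeded.
     */
    static List<String> findMissingDaughters(List<String> scannedParents,
                                             CatalogLookup catalog,
                                             int retriesPerRow) throws IOException {
        List<String> missing = new ArrayList<>();
        for (String parent : scannedParents) {
            int failures = 0;
            while (true) {
                try {
                    if (!catalog.daughtersPresent(parent)) {
                        missing.add(parent);
                    }
                    break; // this row is done, move on to the next one
                } catch (IOException e) {
                    // Without a retry, one failure here aborts the whole walk and
                    // leaves the remaining rows unprocessed.
                    if (++failures > retriesPerRow) {
                        throw e;
                    }
                    // Assumption: a real handler would wait for the catalog to be reassigned here.
                }
            }
        }
        return missing;
    }
}
```

With retriesPerRow set to 0 the sketch behaves like the failure described in this issue: a single lookup error in the middle of the walk ends processing for every remaining row.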
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HBASE-3446:
-------------------------------
Component/s: master
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v13.txt
Update to match TRUNK. Compiles. Running tests and cleaning up patch.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] Updated: (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3446:
-------------------------
Attachment: 3446-v4.txt
More fixes for tests.
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META
moves, orphaning lots of regions
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082953#comment-13082953 ]
stack commented on HBASE-3446:
------------------------------
Only the following has failed so far:
Running org.apache.hadoop.hbase.catalog.TestCatalogTracker
Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.373 sec <<< FAILURE!