Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2014/06/02 19:22:02 UTC
[jira] [Commented] (HBASE-11282) Load balancer may move a region which is participating in snapshot

    [ https://issues.apache.org/jira/browse/HBASE-11282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015596#comment-14015596 ] 

stack commented on HBASE-11282:
-------------------------------

bq. Mechanism of making load balancer be aware of region operation is desirable such that snapshot doesn't fail due to the above scenario.

Snapshots are allowed to fail, and they can do so for any of myriad reasons.

Tying together two systems we'd like to keep disparate -- the balancer and snapshotting -- seems like a bad direction to me unless it is really necessary.
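
If snapshot failure is an accepted outcome, one way for a caller to cope without coupling the balancer to snapshotting is simply to retry. The following is a minimal sketch under that assumption, not anything from HBase itself: `SnapshotRetry` and `runWithRetry` are hypothetical names, and the task passed in stands for whatever snapshot call the client actually makes.

```java
import java.util.concurrent.Callable;

public class SnapshotRetry {

    /**
     * Runs the task up to maxAttempts times, sleeping backoffMillis after each failure.
     * Returns the number of the attempt that succeeded; rethrows the last failure
     * if every attempt fails.
     */
    public static int runWithRetry(Callable<Void> task, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.call();
                return attempt;
            } catch (InterruptedException e) {
                // Do not retry when the calling thread has been interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // Assumption: any other failure (e.g. the region was closing) is worth retrying.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for a real snapshot request: fails once, as if the region moved mid-snapshot.
        int attempts = runWithRetry(() -> {
            if (++calls[0] < 2) {
                throw new IllegalStateException("region is closing");
            }
            return null;
        }, 3, 10L);
        System.out.println("snapshot succeeded on attempt " + attempts);
    }
}
```

Whether a blanket retry is appropriate depends on the failure; a real caller would probably want to distinguish transient errors from permanent ones.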


> Load balancer may move a region which is participating in snapshot
> ------------------------------------------------------------------
>
>                 Key: HBASE-11282
>                 URL: https://issues.apache.org/jira/browse/HBASE-11282
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> The region was tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.
> From master log:
> {code}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Found an existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. destination server is h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812 accepted as a dest server = true
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Using pre-existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.; plan=hri=tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7., src=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165, dest=h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> 2014-03-10 23:48:09,035 INFO  [AM.ZK.Worker-pool2-t42] master.RegionStates: Transitioned {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=CLOSED, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165} to {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=OFFLINE, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165}
> 2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] zookeeper.ZKAssign: master:60000-0x244aa9920190b04, quorum=h2-ubuntu12-sec-1394425849-hbase-8.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-1.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal:2181, baseZNode=/hbase Creating (or updating) unassigned node 289ebdee6adf0a3b9c2bbcbe2ff522e7 with OFFLINE state
> 2014-03-10 23:48:09,044 INFO  [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Assigning tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. to h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
> {code}
> From hbase-hbase-regionserver-h2-ubuntu12-sec-1394425849-hbase-9.log :
> {code}
> 2014-03-10 23:48:08,487 WARN  [member: 'h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165' subprocedure-pool1-thread-1] snapshot.RegionServerSnapshotManager: Got Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:325)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5699)
>   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5663)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
>   at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:65)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> {code}
> Load balancer's move of the underlying region caused FlushSnapshotSubprocedure to fail.
> Mechanism of making load balancer be aware of region operation is desirable such that snapshot doesn't fail due to the above scenario.


