Posted to issues@hbase.apache.org by "Aleksandr Shulman (JIRA)" <ji...@apache.org> on 2013/12/12 01:46:07 UTC

[jira] [Created] (HBASE-10136) [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table

Aleksandr Shulman created HBASE-10136:
-----------------------------------------

             Summary: [Online Schema Change]: Online Schema Change on a table conflicts with snapshot attempt on the table
                 Key: HBASE-10136
                 URL: https://issues.apache.org/jira/browse/HBASE-10136
             Project: HBase
          Issue Type: Bug
          Components: snapshots
    Affects Versions: 0.96.0, 0.98.1, 0.99.0
            Reporter: Aleksandr Shulman
            Assignee: Aleksandr Shulman


Expected behavior:
A user can take a snapshot of a table while that table is undergoing an online schema change.

Observed behavior:
Snapshot attempts time out when an online schema change is in progress, because a region is closed and reopened while the snapshot is running.

As a side note, I would expect the attempt to fail quickly rather than time out.
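
To illustrate the fail-fast behavior described above, here is a minimal, self-contained sketch of a client-side polling loop that stops as soon as a status check reports failure, rather than sleeping until the full timeout expires. This is not HBase code: the class `SnapshotPollSketch`, the `Status` enum, the `pollUntilFinished` method, and the fake status check are all hypothetical names used only for illustration, and it assumes the server side can actually report a failed state to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from HBase.
class SnapshotPollSketch {
    enum Status { IN_PROGRESS, DONE, FAILED }

    /**
     * Polls statusCheck until it reports DONE or FAILED, or until timeoutMs
     * has elapsed. Returns IN_PROGRESS only when the timeout was reached.
     */
    static Status pollUntilFinished(Supplier<Status> statusCheck,
                                    long timeoutMs, long sleepMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            Status status = statusCheck.get();
            if (status != Status.IN_PROGRESS) {
                // DONE or FAILED: return immediately instead of waiting out the timeout.
                return status;
            }
            Thread.sleep(sleepMs);
        }
        return Status.IN_PROGRESS; // timed out while still in progress
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger checks = new AtomicInteger();
        // Fake status source: in progress for two checks, then failed.
        Supplier<Status> fakeStatusCheck =
            () -> checks.incrementAndGet() < 3 ? Status.IN_PROGRESS : Status.FAILED;
        Status result = pollUntilFinished(fakeStatusCheck, 60_000, 10);
        System.out.println(result + " after " + checks.get() + " status checks");
    }
}
```

With a loop of this shape, a failure that is visible to the status check ends the wait on the next poll, so the caller is bounded by the polling interval rather than by the 60000 ms expected time.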

Furthermore, I have seen subsequent attempts to snapshot the table fail, apparently because of state/cleanup issues. This is also concerning.

Immediate error:
{code}type=FLUSH }' is still in progress!
2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 10000ms while waiting for snapshot completion.
2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master...
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress!
Snapshot failure occurred
org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:60000 ms
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
	at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
	at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}

Likely root cause of error:
{code}Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
	at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
	at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
	at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	... 5 more{code}


