You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2016/02/09 02:44:18 UTC
[jira] [Commented] (PHOENIX-2667) Race condition between IndexBuilder and Split for region lock

    [ https://issues.apache.org/jira/browse/PHOENIX-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138214#comment-15138214 ] 

Enis Soztutar commented on PHOENIX-2667:
----------------------------------------

I think the events are happening something like this: 
 (1) multi() request for a bunch of writes arrives at the region. We acquire the region read lock, then process the mutation in the primary table. 

(2) We write the edits to WAL. In the WAL path, we are preparing the index mutations and sending them to be committed to the index table. However, this prepare also have to read from the primary table region:
{code}
	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7755)
	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7741)
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:7650)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2484)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2479)
	at org.apache.phoenix.hbase.index.covered.data.LocalTable.getCurrentRowState(LocalTable.java:63)
	at org.apache.phoenix.hbase.index.covered.LocalTableState.ensureLocalStateInitialized(LocalTableState.java:158)
	at org.apache.phoenix.hbase.index.covered.LocalTableState.getIndexedColumnsTableState(LocalTableState.java:126)
	at org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(PhoenixIndexCodec.java:162)
	at org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(PhoenixIndexCodec.java:120)
	at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(CoveredColumnsIndexBuilder.java:403)
	at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(CoveredColumnsIndexBuilder.java:287)
	at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(CoveredColumnsIndexBuilder.java:239)
	at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(CoveredColumnsIndexBuilder.java:136)
	at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(CoveredColumnsIndexBuilder.java:99)
	at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
	at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:129)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
{code}

(3) concurrently, we have decided to split the region, and now we are waiting on acquiring the write lock. 
(4) The reads from the IndexBuilder to the primary region executes, however, they are blocked since they cannot acquire the region write lock due to the RWLock heuristics. Although ReentrantRWLock is not fair by default, it implements this heuristics: 
{code}
    static final class NonfairSync extends Sync {
        private static final long serialVersionUID = -8159625535654395037L;
        final boolean writerShouldBlock() {
            return false; // writers can always barge
        }
        final boolean readerShouldBlock() {
            /* As a heuristic to avoid indefinite writer starvation,
             * block if the thread that momentarily appears to be head
             * of queue, if one exists, is a waiting writer.  This is
             * only a probabilistic effect since a new reader will not
             * block if there is a waiting writer behind other enabled
             * readers that have not yet drained from the queue.
             */
            return apparentlyFirstQueuedIsExclusive();
        }
    }
{code} 

Above heuristics means that, if (1) and (2) happens, then we enqueue the write lock request from(4) before all reads have finished for the index updates, we are kind of in a (soft) dead lock. 

Notice that the RPC handler thread holds the read lock, and waits on the index builder thread pool. The index builder thread tries to acquire the read lock again, but ends up being queued up after the write lock. The write lock just waits for the handler thread to finish, resulting in this sophisticated dead lock. 


> Race condition between IndexBuilder and Split for region lock
> -------------------------------------------------------------
>
>                 Key: PHOENIX-2667
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2667
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>
> In a production cluster, we have seen a condition where the split did not finish for 30+ minutes. Also due to this, no request was being serviced in this time frame affectively making the region offline. 
> The jstack provides 3 types of threads waiting on the regions read or write locks. 
> First, the handlers are all blocked on trying to acquire the read lock on the region in multi(), most of the handlers are like this:
> {code}
> Thread 2328: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(int, long) @bci=122, line=1033 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(int, long) @bci=25, line=1326 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(long, java.util.concurrent.TimeUnit) @bci=10, line=873 (Compiled frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.lock(java.util.concurrent.locks.Lock, int) @bci=27, line=7754 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.lock(java.util.concurrent.locks.Lock) @bci=3, line=7741 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(org.apache.hadoop.hbase.regionserver.Region$Operation) @bci=211, line=7650 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(org.apache.hadoop.hbase.regionserver.HRegion$BatchOperationInProgress) @bci=21, line=2803 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(org.apache.hadoop.hbase.client.Mutation[], long, long) @bci=12, line=2760 (Compiled frame)
>  - org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$RegionActionResult$Builder, org.apache.hadoop.hbase.regionserver.Region, org.apache.
>  - org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(org.apache.hadoop.hbase.regionserver.Region, org.apache.hadoop.hbase.quotas.OperationQuota, org.apache.hadoop.hbase.protobuf
>  - org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(com.google.protobuf.RpcController, org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest) @bci=407, line=2032 (Compiled frame)
>  - org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Messa
>  - org.apache.hadoop.hbase.ipc.RpcServer.call(com.google.protobuf.BlockingService, com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.Message, org.apache.hadoop.hbase.CellScanner, long,
>  - org.apache.hadoop.hbase.ipc.CallRunner.run() @bci=345, line=101 (Compiled frame)
>  - org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(java.util.concurrent.BlockingQueue) @bci=54, line=130 (Compiled frame)
>  - org.apache.hadoop.hbase.ipc.RpcExecutor$1.run() @bci=20, line=107 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}
> second, the IndexBuilder threads from Phoenix index are also blocked waiting on the region read locks: 
> {code}
> Thread 17566: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(int, long) @bci=122, line=1033 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(int, long) @bci=25, line=1326 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(long, java.util.concurrent.TimeUnit) @bci=10, line=873 (Compiled frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.lock(java.util.concurrent.locks.Lock, int) @bci=27, line=7754 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.lock(java.util.concurrent.locks.Lock) @bci=3, line=7741 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(org.apache.hadoop.hbase.regionserver.Region$Operation) @bci=211, line=7650 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.getScanner(org.apache.hadoop.hbase.client.Scan, java.util.List) @bci=4, line=2484 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.getScanner(org.apache.hadoop.hbase.client.Scan) @bci=3, line=2479 (Compiled frame)
>  - org.apache.phoenix.hbase.index.covered.data.LocalTable.getCurrentRowState(org.apache.hadoop.hbase.client.Mutation, java.util.Collection) @bci=43, line=63 (Compiled frame)
>  - org.apache.phoenix.hbase.index.covered.LocalTableState.ensureLocalStateInitialized(java.util.Collection) @bci=29, line=158 (Compiled frame)
>  - org.apache.phoenix.hbase.index.covered.LocalTableState.getIndexedColumnsTableState(java.util.Collection) @bci=2, line=126 (Compiled frame)
>  - org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(org.apache.phoenix.hbase.index.covered.TableState, boolean) @bci=213, line=162 (Compiled frame)
>  - org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(org.apache.phoenix.hbase.index.covered.TableState) @bci=3, line=120 (Compiled frame)
>  - org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager, org.apache.phoenix.hbase.index.covered.LocalTab
>  - org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager, long, org.apache.phoenix.hbase.index.covere
>  - org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager, org.apache.phoenix.hbase.index.covered.Batch, or
>  - org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager, org.apache.hadoop.hbase.client.Mutation) @
>  - org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(org.apache.hadoop.hbase.client.Mutation) @bci=11, line=99 (Compiled frame)
>  - org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call() @bci=11, line=133 (Compiled frame)
>  - org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call() @bci=1, line=129 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=262 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}
> Finally, the splitting of the region is being executed, which is trying to close the region. This operation acquires the regions write lock, but it is also blocked: 
> {code}
> Thread 18417: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=945 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.doClose(boolean, org.apache.hadoop.hbase.monitoring.MonitoredTask) @bci=278, line=1401 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.HRegion.close(boolean) @bci=62, line=1349 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(org.apache.hadoop.hbase.Server, org.apache.hadoop.hbase.regionserver.RegionServerServices, boolean) @bci=238, line=394 (Interpr
>  - org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(org.apache.hadoop.hbase.Server, org.apache.hadoop.hbase.regionserver.RegionServerServices, org.apache.hadoop.hbase.security.Use
>  - org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(org.apache.hadoop.hbase.Server, org.apache.hadoop.hbase.regionserver.RegionServerServices, org.apache.hadoop.hbase.security.User) @bci=
>  - org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(org.apache.hadoop.hbase.security.User) @bci=357, line=82 (Interpreted frame)
>  - org.apache.hadoop.hbase.regionserver.SplitRequest.run() @bci=74, line=154 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}
> This looks like a race between the split thread and the global index updates racing for the region lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)