Posted to notifications@fluo.apache.org by GitBox <gi...@apache.org> on 2019/02/13 22:50:10 UTC

[GitHub] keith-turner opened a new issue #1069: Saw deadlock in fluo map reduce load job

URL: https://github.com/apache/fluo/issues/1069
 
 
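   While running the stress test, I saw MapReduce jobs hang while closing. A jstack of one of the hung MapReduce processes showed the following deadlock. The main thread holds the SharedResources monitor and waits on a CountDownLatch in SharedBatchWriter.close(), while the shared batch writer thread is blocked waiting for that same monitor in SharedResources.getTimestampTracker().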
   
   ```
   "main" #1 prio=5 os_prio=0 tid=0x00007fd42c016800 nid=0x5774 waiting on condition [0x00007fd434d87000]
      java.lang.Thread.State: WAITING (parking)
   	at sun.misc.Unsafe.park(Native Method)
   	- parking to wait for  <0x00000000ef5abf08> (a java.util.concurrent.CountDownLatch$Sync)
   	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
   	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
   	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
   	at org.apache.fluo.core.impl.SharedBatchWriter.close(SharedBatchWriter.java:192)
   	at org.apache.fluo.core.impl.SharedResources.close(SharedResources.java:211)
   	- locked <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
   	at org.apache.fluo.core.impl.Environment.close(Environment.java:254)
   	at org.apache.fluo.core.client.FluoClientImpl.close(FluoClientImpl.java:116)
   	at org.apache.fluo.mapreduce.FluoOutputFormat$2.close(FluoOutputFormat.java:96)
   	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:682)
   	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:805)
   	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
   	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
   	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
   ```
   
   ```
   "Fluo-0001-001-sharedBW" #53 daemon prio=5 os_prio=0 tid=0x00007fd42d935800 nid=0x59ef waiting for monitor entry [0x00007fd411154000]
      java.lang.Thread.State: BLOCKED (on object monitor)
   	at org.apache.fluo.core.impl.SharedResources.getTimestampTracker(SharedResources.java:138)
   	- waiting to lock <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
   	at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:771)
   	- locked <0x00000000eff41da0> (a org.apache.fluo.core.impl.TransactionImpl)
   	at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:777)
   	at org.apache.fluo.core.async.CommitManager$CQCommitObserver.finish(CommitManager.java:68)
   	at org.apache.fluo.core.async.CommitManager$CQCommitObserver.committed(CommitManager.java:87)
   	at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep.lambda$getMainOp$0(TransactionImpl.java:1377)
   	at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep$$Lambda$88/441397001.apply(Unknown Source)
   	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
   	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
   	at org.apache.fluo.core.impl.SharedBatchWriter$MutationBatch.countDown(SharedBatchWriter.java:74)
   	at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.processBatches(SharedBatchWriter.java:123)
   	at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.run(SharedBatchWriter.java:93)
   	at java.lang.Thread.run(Thread.java:748)
   ```
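
The lock pattern in the two traces can be reproduced in isolation. The following is a minimal sketch, not Fluo's code and not a proposed patch: `Resources`, `getTracker`, `closeHoldingLock`, and `closeWithoutLock` are hypothetical names, and a timeout is used so the deadlock-prone path returns instead of hanging. It contrasts waiting for the writer thread while holding the monitor with one possible alternative, which is to release the monitor before waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CloseDeadlockSketch {

  /** Hypothetical stand-in for a shared-resources holder; not Fluo's actual class. */
  static class Resources {
    private final CountDownLatch closeStarted = new CountDownLatch(1);
    private final CountDownLatch writerDone = new CountDownLatch(1);
    private final Object tracker = new Object();

    // Synchronized accessor, analogous to the frame the writer thread is blocked in.
    synchronized Object getTracker() {
      return tracker;
    }

    // Deadlock-prone: waits for the writer while still holding this object's monitor.
    // The timeout only exists so the sketch terminates.
    synchronized boolean closeHoldingLock(long timeoutMs) throws InterruptedException {
      closeStarted.countDown();
      return writerDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // One possible alternative: signal under the monitor, then wait outside it.
    boolean closeWithoutLock(long timeoutMs) throws InterruptedException {
      synchronized (this) {
        closeStarted.countDown();
      }
      return writerDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  /** Returns true if close saw the writer finish before the timeout. */
  static boolean run(boolean holdLock) throws InterruptedException {
    Resources r = new Resources();
    Thread writer = new Thread(() -> {
      try {
        r.closeStarted.await();   // wait until close is in progress
        r.getTracker();           // needs the monitor before it can finish
        r.writerDone.countDown(); // lets close proceed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "sketch-writer");
    writer.setDaemon(true);
    writer.start();
    boolean finished = holdLock ? r.closeHoldingLock(500) : r.closeWithoutLock(500);
    writer.join(); // the writer can always finish once close has returned
    return finished;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("close while holding monitor finished: " + run(true));
    System.out.println("close after releasing monitor finished: " + run(false));
  }
}
```

In the first call the writer cannot enter `getTracker()` until `closeHoldingLock` gives up, so the wait times out. With an untimed `await()`, as in the trace above, it would block indefinitely. Whether releasing the monitor before waiting is safe in Fluo depends on what else `SharedResources.close()` protects, which this sketch does not model.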
