Posted to issues@ratis.apache.org by "Tsz Wo Nicholas Sze (JIRA)" <ji...@apache.org> on 2018/11/09 21:03:00 UTC

[jira] [Assigned] (RATIS-404) Deadlock in ratis between appendEntries and RaftLogWorker

     [ https://issues.apache.org/jira/browse/RATIS-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze reassigned RATIS-404:
-----------------------------------------

    Resolution: Fixed
      Assignee: Mukul Kumar Singh

I have committed this.  Thanks, [~msingh]!

> Deadlock in ratis between appendEntries and RaftLogWorker
> ---------------------------------------------------------
>
>                 Key: RATIS-404
>                 URL: https://issues.apache.org/jira/browse/RATIS-404
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 0.3.0
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>            Priority: Major
>             Fix For: 0.3.0
>
>         Attachments: RATIS-404.001.patch
>
>
> The deadlock occurs when the RaftLogWorker queue is completely full. The following thread is blocked trying to enqueue a task while still holding the RaftServerImpl lock.
> {code}
> "grpc-default-executor-18" #459 daemon prio=5 os_prio=0 tid=0x00007f8cd4a4a000 nid=0x5f6 waiting on condition [0x00007f8c31df2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000098dd53d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>         at org.apache.ratis.server.storage.RaftLogWorker.addIOTask(RaftLogWorker.java:186)
>         at org.apache.ratis.server.storage.RaftLogWorker.writeLogEntry(RaftLogWorker.java:300)
>         at org.apache.ratis.server.storage.SegmentedRaftLog.appendEntry(SegmentedRaftLog.java:302)
>         at org.apache.ratis.server.storage.SegmentedRaftLog.append(SegmentedRaftLog.java:379)
>         at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:914)
>         - locked <0x000000009893b638> (a org.apache.ratis.server.impl.RaftServerImpl)
>         at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:821)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$18(RaftServerProxy.java:434)
>         at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$310/1439556067.apply(Unknown Source)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$5(RaftServerProxy.java:309)
>         at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$176/355487796.get(Unknown Source)
>         at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:82)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$6(RaftServerProxy.java:309)
>         at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$175/1025132044.apply(Unknown Source)
>         at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>         at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
>         at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:308)
>         at org.apache.ratis.server.impl.RaftServerProxy.appendEntriesAsync(RaftServerProxy.java:434)
>         at org.apache.ratis.grpc.server.GrpcServerProtocolService$1.onNext(GrpcServerProtocolService.java:76)
>         at org.apache.ratis.grpc.server.GrpcServerProtocolService$1.onNext(GrpcServerProtocolService.java:66)
>         at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:683)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>         at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> The RaftLogWorker thread, which is the only thread that can drain the queue, is in turn blocked waiting to acquire the same RaftServerImpl lock, as shown in the following trace.
> {code}
> "c5a4d441-cb73-47a2-94b5-fc8233061955-RaftLogWorker" #440 daemon prio=5 os_prio=0 tid=0x00000000026a2000 nid=0x5e3 waiting for monitor entry [0x00007f8c884aa000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.ratis.server.impl.RaftServerImpl.lambda$appendEntriesAsync$21(RaftServerImpl.java:925)
>         - waiting to lock <0x000000009893b638> (a org.apache.ratis.server.impl.RaftServerImpl)
>         at org.apache.ratis.server.impl.RaftServerImpl$$Lambda$316/47202155.apply(Unknown Source)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>         at org.apache.ratis.server.storage.SegmentedRaftLog$Task.done(SegmentedRaftLog.java:83)
>         at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:220)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
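
The two traces above describe a cycle: a producer waits for queue space while holding a monitor, and the consumer that would free that space waits for the same monitor inside a completion callback. The following is a minimal sketch of that pattern, not the committed patch and not Ratis code. The class name, the lock object, the capacity of 1 and the use of a timed offer() (so the sketch returns instead of hanging as put() would) are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Ratis.
class QueueLockDeadlockSketch {
    // Stand-in for the RaftServerImpl monitor.
    private final Object serverLock = new Object();
    // Stand-in for the bounded RaftLogWorker queue (capacity 1 so it fills quickly).
    private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);

    // Consumer: dequeues a task, then needs the server lock to run its callback.
    private void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Runnable task = queue.take();
                    synchronized (serverLock) {
                        task.run();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    // Returns {offer accepted while holding the lock, offer accepted after releasing it}.
    static boolean[] demo(long timeoutMs) {
        QueueLockDeadlockSketch s = new QueueLockDeadlockSketch();
        s.startWorker();
        Runnable noop = () -> { };
        try {
            boolean whileHoldingLock;
            synchronized (s.serverLock) {
                s.queue.put(noop); // worker takes this, then blocks on serverLock
                s.queue.put(noop); // returns once the first task is taken; queue is now full
                // Queue is full and the worker cannot progress: this times out.
                // An untimed put() here would wait forever, as in the first trace.
                whileHoldingLock = s.queue.offer(noop, timeoutMs, TimeUnit.MILLISECONDS);
            }
            // Lock released: the worker can run its callbacks and drain the queue.
            boolean afterRelease = s.queue.offer(noop, 5, TimeUnit.SECONDS);
            return new boolean[] {whileHoldingLock, afterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo(300);
        System.out.println("offer while holding lock accepted: " + r[0]);
        System.out.println("offer after releasing lock accepted: " + r[1]);
    }
}
```

In this sketch the stall goes away as soon as the producer stops holding the lock while it waits for queue space. Whether the attached patch takes that approach or a different one is not stated in this message.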


