Posted to issues@ratis.apache.org by "SzyWilliam (via GitHub)" <gi...@apache.org> on 2023/04/18 16:37:39 UTC

[GitHub] [ratis] SzyWilliam commented on pull request #876: RATIS-1834. ServerRequestStreamObserver is not properly closed.

SzyWilliam commented on PR #876:
URL: https://github.com/apache/ratis/pull/876#issuecomment-1513478739

   Thanks very much for the patch! I applied it and ran some tests; here is what I found.
   The leader successfully sent 175 installSnapshot RPCs to the follower (thanks to the streaming timeout). 
   ```java
   2023-04-18 11:29:51,004 [grpc-default-executor-15] INFO  o.a.r.g.s.GrpcLogAppender$InstallSnapshotResponseHandler:530 - 7@group-000100000002->9-InstallSnapshotResponseHandler: Completed InstallSnapshot. Reply: serverReply {
     requestorId: "7"
     replyId: "9"
     raftGroupId {
       id: "GGGGGGGGGG\000\001\000\000\000\002"
     }
     success: true
   }
   term: 1
   requestIndex: 175
    
   2023-04-18 11:29:51,004 [grpc-default-executor-15] INFO  o.a.r.s.i.FollowerInfoImpl:126 - Follower 7@group-000100000002->9 acknowledged installing snapshot 
   ```
   However, after sending the 175th RPC, the leader did not proceed. Instead, it stalled for about 4.4s, after which the installSnapshot stream was cancelled and closed by an RST_STREAM.
   ```java
   2023-04-18 11:29:55,480 [grpc-default-executor-15] WARN  o.a.ratis.util.LogUtils:122 - 7@group-000100000002->9-InstallSnapshotResponseHandler: Failed InstallSnapshot: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL
   ```
   
   On the follower (9) side, the 175th installSnapshot RPC was answered normally, and about 4.4s later the follower discovered that the leader had cancelled the stream.
   ```java
   2023-04-18 11:29:50,946 [grpc-default-executor-1] INFO  o.a.r.s.i.SnapshotInstallationHandler:100 - 9@group-000100000002: reply installSnapshot: 7<-9#0:OK-t1,SUCCESS,requestIndex=138 
   2023-04-18 11:29:55,423 [grpc-default-executor-0] WARN  o.a.ratis.util.LogUtils:122 - 9: INSTALL_SNAPSHOT onError, lastRequest: 7->9#0-t1,chunk:04d6f0e0-41d8-4a40-b65f-f195bce7a405,175: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client cancelled 
   ```
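For reference, the stall length can be read directly off the log timestamps above. The following is a small standalone sketch (the class and method names are illustrative, not part of Ratis) that parses the `yyyy-MM-dd HH:mm:ss,SSS` prefix of the log lines and prints the gap in milliseconds for the two leader lines and for the two follower lines; both come out at roughly 4.48s.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper, not Ratis code: measures the gap between two log timestamps.
final class StallGap {
    // Matches the timestamp prefix of the log lines, e.g. "2023-04-18 11:29:51,004".
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long gapMillis(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT), LocalDateTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // Leader: last "Completed InstallSnapshot" -> "Failed InstallSnapshot ... CANCELLED"
        System.out.println(gapMillis("2023-04-18 11:29:51,004", "2023-04-18 11:29:55,480")); // 4476
        // Follower: "reply installSnapshot" -> "INSTALL_SNAPSHOT onError"
        System.out.println(gapMillis("2023-04-18 11:29:50,946", "2023-04-18 11:29:55,423")); // 4477
    }
}
```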
   
   No GC pauses or other anomalies were detected in the meantime.
   
   This situation repeated 12 times, each time stuck at installSnapshot request index **175**. I therefore guess that the 175th request carries the last chunk of this snapshot, and I suspect a deadlock in the **completion** of the streaming installSnapshot.
   So it seems that `ServerRequestStreamObserver` is not to blame for this deadlock. Is there anything I missed?
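To make the suspected failure mode concrete, here is a hypothetical model of a completion handshake between two stream endpoints. It is not Ratis code and makes no claim about where the actual bug is; it only assumes that each side might wait for the peer's completion signal (think `onCompleted()`) before sending its own. If both sides wait, neither ever completes, and the stall lasts until something external cancels it, much like the RST_STREAM seen in the logs. The names `CompletionHandshakeSketch` and `completes` are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a completion handshake; not Ratis code.
final class CompletionHandshakeSketch {

    /**
     * Runs both endpoints on separate threads and returns true if each
     * completion signal arrives within timeoutMs, false if the handshake stalls.
     */
    static boolean completes(boolean leaderWaitsForFollower,
                             boolean followerWaitsForLeader,
                             long timeoutMs) {
        CountDownLatch leaderCompleted = new CountDownLatch(1);
        CountDownLatch followerCompleted = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> {
                if (leaderWaitsForFollower) {
                    followerCompleted.await(); // leader waits for the follower to finish first
                }
                leaderCompleted.countDown();   // stands in for the leader's completion signal
                return null;
            });
            pool.submit(() -> {
                if (followerWaitsForLeader) {
                    leaderCompleted.await();   // follower waits for the leader to finish first
                }
                followerCompleted.countDown(); // stands in for the follower's completion signal
                return null;
            });
            return leaderCompleted.await(timeoutMs, TimeUnit.MILLISECONDS)
                && followerCompleted.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt any endpoint still blocked: the analogue of cancelling the stream.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(completes(true, false, 200)); // true: only one side waits
        System.out.println(completes(true, true, 200));  // false: circular wait, times out
    }
}
```

The one-sided case finishes because the follower signals unconditionally; only the mutual wait stalls, which would match a hang that occurs exactly at the last chunk and is broken only by cancellation.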


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.