You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ratis.apache.org by GitBox <gi...@apache.org> on 2020/10/23 08:01:39 UTC

[GitHub] [incubator-ratis] szetszwo opened a new pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

szetszwo opened a new pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231


   See https://issues.apache.org/jira/browse/RATIS-1105


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715505860


   @runzhiwang  thanks for identifying the out-of-order reply.
   
   @szetszwo it sounds surprising that the ordering sounds like was guaranteed by the order of replies so potentially because faster code execution, replies could become out-of-order (some replies run faster than earlier ones) . I thought the ordering is maintained by `orderedStreamAsync`. E.g. there is a sequence number so a reply is available only when all lower sequence numbers have replies. Maybe this is not the case in Ratis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on a change in pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#discussion_r511237331



##########
File path: ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java
##########
@@ -234,7 +236,8 @@ public void channelRead(ChannelHandlerContext ctx, Object msg) throws IOExceptio
           }
         }
 
-        JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
+        previous = previous.thenCombine(JavaUtils.allOf(remoteWrites), (u, v) -> null)
+            .thenCombine(localWrite, (v, bytesWritten) -> {

Review comment:
       > ... slow previous will slow down the following request.  ...
   
   It is a correct behaviour for Streaming.  Streaming is FIFO like a queue.  We should guarantee the replies are in-ordered.
   
   It is not an unordered RPC which allows later requests replied before the earlier requests.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang removed a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang removed a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715189784


   @szetszwo The following failed ut seems related to this PR.
   
   `TestDataStream.testDataStreamMultipleServer:164->runTestDataStream:177->runTestDataStream:214 expected:<993818> but was:<745180>`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715197069


   @szetszwo The following failed ut seems related to this PR.
   
   `TestDataStream.testDataStreamMultipleServer:164->runTestDataStream:177->runTestDataStream:214 expected:<993818> but was:<745180>`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715633677


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on a change in pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#discussion_r511198017



##########
File path: ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java
##########
@@ -234,7 +236,8 @@ public void channelRead(ChannelHandlerContext ctx, Object msg) throws IOExceptio
           }
         }
 
-        JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
+        previous = previous.thenCombine(JavaUtils.allOf(remoteWrites), (u, v) -> null)
+            .thenCombine(localWrite, (v, bytesWritten) -> {

Review comment:
       this makes sense!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang merged pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang merged pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715505860


   @runzhiwang  thanks for identifying the out-of-order reply.
   
   @szetszwo it sounds surprising that the ordering sounds like was guaranteed by the order of replies so potentially because faster code execution, replies could become out-of-order (some replies run faster than earlier ones) . I thought the ordering is maintained by `orderedStreamAsync`. E.g. there is a sequence number so a reply is available only when all lower sequence numbers have replies. Maybe this is not the case in Ratis. Will take a look on the mechanism for the `orderedStreamAsync`  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715623720


   > ... it sounds surprising that the ordering sounds like was guaranteed by the order of replies ...
   
   There is a bug in NettyServerStreamRpc.getServerHandler().  The replies are sent asynchronously so that the replies can be sent out of order.  The current reply should wait for the previous reply as shown in 61c63e6f3aa25dc7f6730d2106fb7bd451f610ad .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe the reply of some messages were out of order, or some reply was missed.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715310758


   Oops, the TestDataStream.testDataStreamMultipleServer failed the same way.  Let me check again. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715409662


   @runzhiwang , thanks for digging it up.  As you mentioned, the replies are out of order.  The problem probably is because the refactoring makes the code run faster since it replaces Java API ByteBuffer.allocateDirect(..) with a faster Netty API PooledByteBufAllocator.DEFAULT.directBuffer(..).  Therefore, the replies could go out of order.  I will fix it.
   
   BTW, I do notice that the test runs faster after the refactoring.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe the reply of some messages were mixed.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-717632358


   @szetszwo @amaliujia The out of order still exist. I will try to find the reason.
   ```
   dataLength:623279 bytesWritten:623279
   dataLength:623279 bytesWritten:623279
   dataLength:601357 bytesWritten:601357
   dataLength:601357 bytesWritten:601357
   dataLength:601357 bytesWritten:601357
   assert i:7 size:623279 written:601357
   assert i:8 size:601357 written:623279
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715505860


   @runzhiwang  thanks for identifying the out-of-order reply.
   
   @szetszwo it sounds surprising that the ordering sounds like was guaranteed by the order of replies so potentially because faster code execution, replies could become out-of-order (some replies run faster than earlier ones) . I thought the ordering is maintained by `OrderedStreamAsync`. E.g. there is a sequence number so a reply is available only when all lower sequence numbers have replies. Maybe this is not the case in Ratis. Will take a look on the mechanism for the `OrderedStreamAsync`  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715617431


   I might be wrong: after check `OrderedStreamAsync` and `SlidingWindow`, I think it only guarantees sending requests are in an order but not check replies. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 **System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);**
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         **System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());**
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe the reply of some messages were mixed.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715617431


   I might be wrong: after check `OrderedStreamAsync` and `SlidingWindow`, I think it only guarantees sending requests are in an order but not check replies. If so, the server can easily break the ordering by reply out-of-order.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
       }
   ```
   
   Following is the print result when test fail, we can find written: 791264  is the next and next message of dataLength: 987326, but the `assert i:7` need they are equal, so assert fails.  We can find the reply of assert i:7, i:8, i:9  were out of order.
   
   ```
   assert i:5 size:993851 written:993851, dataLen0, offset3911413 stream:1
   assert i:6 size:916080 written:916080, dataLen0, offset4905264 stream:1
   dataLength:987326 bytesWritten:987326
   dataLength:987326 bytesWritten:987326
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   assert i:7 size:987326 written:791264, dataLen0, offset7494622 stream:1
   assert i:8 size:685952 written:987326, dataLen0, offset5821344 stream:1
   assert i:9 size:791264 written:685952, dataLen0, offset6808670 stream:1
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 791264  is the next and next message of dataLength: 987326, but the `assert i:7` need they are equal, so assert fails.  We can find the reply of assert i:7, i:8, i:9  were out of order.
   
   ```
   assert i:5 size:993851 written:993851
   assert i:6 size:916080 written:916080
   dataLength:987326 bytesWritten:987326
   dataLength:987326 bytesWritten:987326
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   assert i:7 size:987326 written:791264
   assert i:8 size:685952 written:987326
   assert i:9 size:791264 written:685952
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. some reply was out of order.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715505860


   @runzhiwang  thanks for identifying the out-of-order reply.
   
   @szetszwo it sounds surprising that the ordering sounds like was guaranteed by the order of replies so potentially because faster code execution, replies could become out-of-order. I thought the ordering is maintained by `orderedStreamAsync`. E.g. there is a sequence number so a reply is available only when all lower sequence numbers have replies. Maybe this is not the case in Ratis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe the reply of some messages were mixed.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe the reply of some messages were out of order.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715285048


   @szetszwo Let me find the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
       }
   ```
   
   Following is the print result when test fail, we can find written: 791264  is the next and next message of dataLength: 987326, but the `assert i:7` need they are equal, so assert fails.  We can find the reply of assert i:7, i:8, i:8  were out of order.
   
   ```
   assert i:5 size:993851 written:993851, dataLen0, offset3911413 stream:1
   assert i:6 size:916080 written:916080, dataLen0, offset4905264 stream:1
   dataLength:987326 bytesWritten:987326
   dataLength:987326 bytesWritten:987326
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:685952 bytesWritten:685952
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   dataLength:791264 bytesWritten:791264
   assert i:7 size:987326 written:791264, dataLen0, offset7494622 stream:1
   assert i:8 size:685952 written:987326, dataLen0, offset5821344 stream:1
   assert i:9 size:791264 written:685952, dataLen0, offset6808670 stream:1
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang edited a comment on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715370067


   @szetszwo There are some clues,  I print some message as following.
   ```
           JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
                 System.err.println("dataLength:" + request.getDataLength() + " bytesWritten:" + bytesWritten);
                 buf.release();
                 sendReply(remoteWrites, request, bytesWritten, ctx);
                 return null;
           });
   ```
   
   ```
       // check writeAsync requests
       for(int i = 0; i < futures.size(); i++) {
         final DataStreamReply reply = futures.get(i).join();
         Assert.assertTrue(reply.isSuccess());
         System.err.println("assert i:" + i + " size:" + sizes.get(i).longValue() +
             " written:" + reply.getBytesWritten() + ", dataLen" + reply.getDataLength() + ", offset" +
             reply.getStreamOffset() + " stream:" + reply.getStreamId());
         Assert.assertEquals(sizes.get(i).longValue(), reply.getBytesWritten());
       }
   ```
   
   Following is the print result when test fail, we can find written: 669041 is the next message of dataLength: 895899, but the assert need they are equal, so assert fails. Maybe some reply was out of order,  or missed.
   
   ```
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:895899 bytesWritten:895899
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   dataLength:669041 bytesWritten:669041
   
   assert i:6 size:895899 written:669041, dataLen0, offset4837228 stream:1
   
   Error:  testDataStreamMultipleServer(org.apache.ratis.datastream.TestDataStream)  Time elapsed: 0.78 s  <<< FAILURE!
   java.lang.AssertionError: expected:<895899> but was:<669041>
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on a change in pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on a change in pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#discussion_r511210979



##########
File path: ratis-netty/src/main/java/org/apache/ratis/netty/server/NettyServerStreamRpc.java
##########
@@ -234,7 +236,8 @@ public void channelRead(ChannelHandlerContext ctx, Object msg) throws IOExceptio
           }
         }
 
-        JavaUtils.allOf(remoteWrites).thenCombine(localWrite, (v, bytesWritten) -> {
+        previous = previous.thenCombine(JavaUtils.allOf(remoteWrites), (u, v) -> null)
+            .thenCombine(localWrite, (v, bytesWritten) -> {

Review comment:
       @szetszwo Because the slow previous will slow down the following request. So I am considering another fix: Use Map instead of queue for the [replies](https://github.com/apache/incubator-ratis/blob/master/ratis-netty/src/main/java/org/apache/ratis/netty/client/NettyClientStreamRpc.java#L46). If we use queue, when send request1, request2, but response reply2, reply1, when client receive reply2 it will complete request1, which should complete request2,  then error happen. If we use Map instead of queue, we do not need to wait the previous.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715240428


   @runzhiwang ,  thanks.  Will check the test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715255425


   I have run TestDataStream.testDataStreamMultipleServer repeatedly.  It failed with "too many files" exception.  Found that the workerGroup in NettyClientStreamRpc is not shutdown.
   
   After fixing it, the test can be run 2000+ times without any failure.  Unfortunately, I cannot reproduce the same failure as in the build.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715189784


   @szetszwo The following failed ut seems related to this PR.
   
   `TestDataStream.testDataStreamMultipleServer:164->runTestDataStream:177->runTestDataStream:214 expected:<993818> but was:<745180>`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #231: RATIS-1105. Refactor Netty streaming encoder and decoder.

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #231:
URL: https://github.com/apache/incubator-ratis/pull/231#issuecomment-715653248


   @szetszwo Thanks the patch. @amaliujia Thanks for review. I have merged it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org