You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/09/21 09:59:47 UTC

[GitHub] [bookkeeper] eolivelli opened a new pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

eolivelli opened a new pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801


   - Fixes #2787
   - On some Linux systems the value 1234 is rounded to 1236, this patch simply uses 1236 as value in order to prevent failures


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r714861523



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       @dlg99 is it possible that 1234 is corrupted to 1236 due an endianness problem ?
   but 1236 is not changed to another value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r713018874



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       I have this setup:
   
   > mvn -v
   > Maven home: /usr/share/maven
   > Java version: 1.8.0_292, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
   > Default locale: en, platform encoding: UTF-8
   > OS name: "linux", version: "4.15.0-64-generic", arch: "amd64", family: "unix"
   
   >  uname -a
   > Linux jenkins-pulsar-basic-runner-324 4.15.0-64-generic #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
   
   The problem always reproduces on our Jenkins slaves at DataStax, basically every test run starts a new VM, installs Ubuntu and Java and then builds bookkeeper from the master ASF branch.
   it is 100% reproducible on those machines.
   
   I had not time to investigate, because I should learn more about the machines.
   We should check that kernel version + Java version....
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli merged pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r713031254



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       @eolivelli do you know if with this change the test is passing consistently in your environment? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r712912670



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       Sure, that's the RFC, but where did you see that some linux implementations round it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#issuecomment-925873069


   The tests passed on this branch on DataStax CI
   
   ```
   [INFO] Running org.apache.bookkeeper.proto.TestPerChannelBookieClient
   [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.099 s - in org.apache.bookkeeper.proto.TestPerChannelBookieClient
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r712908658



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       Some references
   
   https://datatracker.ietf.org/doc/html/rfc5482
   
   I didn't find more




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#issuecomment-923824544


   @RaulGracia PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] dlg99 commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
dlg99 commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r714379665



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       @eolivelli this smells like an endianness problem somewhere rather than a rounding problem. See https://github.com/jnr/jnr-unixsocket/pull/55 for example. Did it start at some BK version (dependencies updates etc) or it was the first run on these VMs?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r712891678



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       Out of curiosity, is it documented anywhere? How did you find that this rounding is going, @eolivelli ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] dlg99 commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
dlg99 commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r714994150



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       TBH I do not know.
   I experimented with endianness conversion, neither 1234 nor 1236 seem problematic or radically different, even if I add casting to unsigned int.
   OTOH, TCP_USER_TIMEOUT is defined as a 15bit unsigned in the RFC but I don't think 15 vs 16bit matters for either value. 
   
   but extra 2ms timeout making the test pass makes even less sense




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r715324160



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       The other cure for this problem is to revert the change at all.
   
   @RaulGracia said that in the test environments this feature is a good enhancement and fixes a problem.
   
   So probably we can change the test to set a value that is likely to be used in production.
   If Netty/Linux or other parts om the stack do not preserve every value it is not a BK problem.
   We can open a ticket on Netty, I am pretty sure that we cannot so anything in BK.
   
   I want to fix this test that is consistently failing and it is blocking some CI pipeline.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli edited a comment on pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli edited a comment on pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#issuecomment-923824544


   @RaulGracia @nicoloboschi @Ghatage  @dlg99 PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r714507750



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       The test was born a few weeks ago.
   It passes on GH CI and it fails on Datastax CI.
   
   Let me paste the results of the tests on this branch




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] dlg99 commented on a change in pull request #2801: Issue 2787: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout fails

Posted by GitBox <gi...@apache.org>.
dlg99 commented on a change in pull request #2801:
URL: https://github.com/apache/bookkeeper/pull/2801#discussion_r714475685



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -314,7 +314,7 @@ public void testEpollChannelTcpUserTimeout() throws Exception {
         EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();
         OrderedExecutor executor = getOrderedSafeExecutor();
         ClientConfiguration conf = new ClientConfiguration();
-        int tcpUserTimeout = 1234;
+        int tcpUserTimeout = 1236; // this value may be rounded on some Linux implementations

Review comment:
       Another suspect is that [TCP_USER_TIMEOUT](https://man7.org/linux/man-pages/man7/tcp.7.html) (tcpUserTimeout feeds into it) is `unsigned int` but netty's interface to jni passes it as it is jint: https://github.com/netty/netty/blob/23405e2000427e9cad104913e7071d8d08e91a3c/transport-native-epoll/src/main/c/netty_epoll_linuxsocket.c#L144-L146
   
   i.e. for multicast netty casts parameter to `u_int`: https://github.com/netty/netty/blob/23405e2000427e9cad104913e7071d8d08e91a3c/transport-native-epoll/src/main/c/netty_epoll_linuxsocket.c#L78-L81 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org