You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/08/12 10:56:36 UTC

[GitHub] [bookkeeper] RaulGracia opened a new pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

RaulGracia opened a new pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761


   ### Motivation
   
   Added `TCP_USER_TIMEOUT` in Epoll channel config to limit the time a connection is left sending keepalives to a non-responding Bookie.
   
   ### Changes
   
   The original issue reported that in scenarios where Bookies may go down unexpectedly and change their IP (e.g., Kubernetes), the Bookkeeper client may be left for some time attempting to connect with the old IP of the restarted Bookie (see #2482 for details). To prevent this problem from happening (in Epoll channels), we introduce the following changes:
   - Epoll channels are now configured with `TCP_USER_TIMEOUT`. This parameter rules over the underlying TCP keepalive configuration (see https://datatracker.ietf.org/doc/html/rfc5482), which may be defaulted to retry for too long depending on the environment (e.g., 10-15 minutes in our experience).
   - To prevent adding more configuration parameters, the existing `clientConnectTimeoutMillis` value in `ClientConfiguration` is the one used to set `TCP_USER_TIMEOUT` due to its similarity.
   
   ### Validation
   
   We have reproduced the original testing environment in which this problem appears consistently:
   - Cluster with 4 Bookies and 3 Kubernetes nodes, in addition to https://pravega.io which uses the Bookkeeper client.
   - Deployed an application to do IO to Pravega (and therefore, to Bookkeeper).
   - Periodically shut down a Kubernetes node, so Bookkeeper pods on it are restarted as well.
   
   Considering this test procedure, without the proposed PR we consistently observe Bookkeeper clients getting stuck trying to contact with old IPs from Bookies. With this change, we confirmed via logs that the configuration change takes place and we have not been able to reproduce the original problem so far after performing multiple node reboots.
   
   Master Issue: #2482
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r694119185



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +300,30 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {
+        OrderedExecutor executor = getOrderedSafeExecutor();
+        EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();

Review comment:
       I tried on Mac and didn't pass for me:
   
   ```
   [INFO] Running org.apache.bookkeeper.proto.TestPerChannelBookieClient
   [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 35.007 s <<< FAILURE! - in org.apache.bookkeeper.proto.TestPerChannelBookieClient
   [ERROR] org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout  Time elapsed: 2.99 s  <<< ERROR!
   java.lang.UnsatisfiedLinkError: failed to load the required native library
   	at io.netty.channel.epoll.Epoll.ensureAvailability(Epoll.java:80)
   	at io.netty.channel.epoll.EpollEventLoop.<clinit>(EpollEventLoop.java:51)
   	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:150)
   	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:35)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
   	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:112)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:99)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:76)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:52)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:45)
   	at org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout(TestPerChannelBookieClient.java:312)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ExceptionInInitializerError
   	at io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:39)
   	... 27 more
   Caused by: java.lang.IllegalStateException: Only supported on Linux
   	at io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:284)
   	at io.netty.channel.epoll.Native.<clinit>(Native.java:70)
   	... 28 more
   
   [ERROR] org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout  Time elapsed: 0.843 s  <<< ERROR!
   java.lang.NoClassDefFoundError: Could not initialize class io.netty.channel.epoll.EpollEventLoop
   	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:150)
   	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:35)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
   	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
   	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:112)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:99)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:76)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:52)
   	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:45)
   	at org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout(TestPerChannelBookieClient.java:312)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.lang.Thread.run(Thread.java:748)
   
   [ERROR] org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout  Time elapsed: 30.8 s  <<< ERROR!
   java.util.concurrent.TimeoutException
   	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.startBookie(BookKeeperClusterTestCase.java:673)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.startAndAddBookie(BookKeeperClusterTestCase.java:640)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.startNewBookieAndReturnAddress(BookKeeperClusterTestCase.java:629)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.startNewBookie(BookKeeperClusterTestCase.java:622)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.startBKCluster(BookKeeperClusterTestCase.java:252)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.setUp(BookKeeperClusterTestCase.java:160)
   	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.setUp(BookKeeperClusterTestCase.java:145)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.lang.Thread.run(Thread.java:748)
   
   [INFO] 
   [INFO] Results:
   [INFO] 
   [ERROR] Errors: 
   [ERROR] org.apache.bookkeeper.proto.TestPerChannelBookieClient.testEpollChannelTcpUserTimeout
   [ERROR]   Run 1: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout:312 » UnsatisfiedLink
   [ERROR]   Run 2: TestPerChannelBookieClient.testEpollChannelTcpUserTimeout:312 » NoClassDefFound
   [ERROR]   Run 3: TestPerChannelBookieClient>BookKeeperClusterTestCase.setUp:145->BookKeeperClusterTestCase.setUp:160->BookKeeperClusterTestCase.startBKCluster:252->BookKeeperClusterTestCase.startNewBookie:622->BookKeeperClusterTestCase.startNewBookieAndReturnAddress:629->BookKeeperClusterTestCase.startAndAddBookie:640->BookKeeperClusterTestCase.startBookie:673 » Timeout
   [INFO] 
   [INFO] 
   [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#issuecomment-900964892


   @eolivelli @fpj I have addressed your comments. We have also done another internal round of tests and this PR seems to fix the original issue. If you are ok to merge it, it would be great to have it in `4.14` branch to be included in a future release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r687636384



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll, also set the connection timeout via TCP_USER_TIMEOUT.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getClientConnectTimeoutMillis());

Review comment:
       I agree that it is better to add a new configuration parameter.
   
   also this option is not like `clientConnectTimeoutMillis`, that usually is meant to give a maximum time for the server to set up the connection, I see it more like `readTimeout` (https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ClientConfiguration.java#L586) that is a generic timeout for reads from network (after the connection is properly set up)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r695623522



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +302,42 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {

Review comment:
       I validated that it doesn't fail on MacOS because of the absence of epoll:
   
   ```
   [INFO] Running org.apache.bookkeeper.proto.TestPerChannelBookieClient
   [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 2.619 s - in org.apache.bookkeeper.proto.TestPerChannelBookieClient
   [INFO] 
   [INFO] Results:
   [INFO] 
   [WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  30.897 s
   [INFO] Finished at: 2021-08-24T23:03:34+02:00
   [INFO] ------------------------------------------------------------------------
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#issuecomment-900964892


   @eolivelli @fpj I have addressed your comments. We have also done another internal round of tests and this PR seems to fix the original issue. If you are ok to merge it, it would be great to have it in `4.14` branch to be included in a future release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691936646



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +300,30 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {
+        OrderedExecutor executor = getOrderedSafeExecutor();
+        EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();

Review comment:
       Is this test passing on Mac?
   Because Epoll is not available and the native libraries are not loaded.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r695471441



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll channels, we can set the TCP user timeout.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getTcpUserTimeoutMillis());

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r687629434



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll, also set the connection timeout via TCP_USER_TIMEOUT.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getClientConnectTimeoutMillis());

Review comment:
       It sounds right to add it here, but I don't really like the idea of reusing the configuration parameter. That has a different purpose. I'd say it is fine to add a configuration parameter. For a new configuration parameter, we can either have a default or not set it in the case the configuration does not include a value for it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691120423



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll channels, we can set the TCP user timeout.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getTcpUserTimeoutMillis());

Review comment:
       If the configuration option isn't set, then I'd rather not set a value. According to the RFC (https://datatracker.ietf.org/doc/html/rfc5482):
   
   > In the absence of an application-specified user timeout, the TCP specification [RFC0793] defines a default user timeout of 5 minutes.
   
   I'd leave it to the system defaults.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli merged pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691066016



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll, also set the connection timeout via TCP_USER_TIMEOUT.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getClientConnectTimeoutMillis());

Review comment:
       Sure, I have added as separate setting for this option.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691066016



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll, also set the connection timeout via TCP_USER_TIMEOUT.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getClientConnectTimeoutMillis());

Review comment:
       Sure, I have added as separate setting for this option.

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll channels, we can set the TCP user timeout.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getTcpUserTimeoutMillis());

Review comment:
       In all the `ClientConfiguration` timeout properties, there is defined a default value, so I have followed the same pattern. Note that there are properties in that class that set a different default as the one coming from the original library; for example, `CLIENT_CONNECT_TIMEOUT_MILLIS` has a default value in `ClientConfiguration` (10 seconds) different to the default value in Netty (30 seconds).
   
   Also, there are documents ([like this proposal in the gRPC repository](https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md)) that propose setting `TCP_USER_TIMEOUT` value to 20 seconds by default, which is similar to what we are setting now (10 seconds). 
   
   Overall, leaving this property unset may be hindering the benefits of this PR unless otherwise specified by the client (who may not know about this property).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r693793841



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +300,30 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {
+        OrderedExecutor executor = getOrderedSafeExecutor();
+        EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();

Review comment:
       Even without this test, I'm not able to build bookkeeper regularly on MacOS. I typically build it on an Ubuntu VM. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691265956



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll channels, we can set the TCP user timeout.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getTcpUserTimeoutMillis());

Review comment:
       In all the `ClientConfiguration` timeout properties, there is defined a default value, so I have followed the same pattern. Note that there are properties in that class that set a different default as the one coming from the original library; for example, `CLIENT_CONNECT_TIMEOUT_MILLIS` has a default value in `ClientConfiguration` (10 seconds) different to the default value in Netty (30 seconds).
   
   Also, there are documents ([like this proposal in the gRPC repository](https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md)) that propose setting `TCP_USER_TIMEOUT` value to 20 seconds by default, which is similar to what we are setting now (10 seconds). 
   
   Overall, leaving this property unset may be hindering the benefits of this PR unless otherwise specified by the client (who may not know about this property).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r691120423



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -540,6 +541,8 @@ protected ChannelFuture connect() {
         bootstrap.group(eventLoopGroup);
         if (eventLoopGroup instanceof EpollEventLoopGroup) {
             bootstrap.channel(EpollSocketChannel.class);
+            // For Epoll channels, we can set the TCP user timeout.
+            bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, conf.getTcpUserTimeoutMillis());

Review comment:
       If the configuration option isn't set, then I'd rather not set a value. According to the RFC (https://datatracker.ietf.org/doc/html/rfc5482):
   
   > In the absence of an application-specified user timeout, the TCP specification [RFC0793] defines a default user timeout of 5 minutes.
   
   I'd leave it to the system defaults.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r694094695



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +300,30 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {
+        OrderedExecutor executor = getOrderedSafeExecutor();
+        EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();

Review comment:
       The other thing, @eolivelli , is that this option only seems to be available with the epoll implementation. Consequently, it won't work with other transport implementations.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] RaulGracia commented on a change in pull request #2761: Issue 2482: Added TCP_USER_TIMEOUT to Epoll channel config

Posted by GitBox <gi...@apache.org>.
RaulGracia commented on a change in pull request #2761:
URL: https://github.com/apache/bookkeeper/pull/2761#discussion_r695471273



##########
File path: bookkeeper-server/src/test/java/org/apache/bookkeeper/proto/TestPerChannelBookieClient.java
##########
@@ -295,4 +300,30 @@ public void safeRun() {
         eventLoopGroup.shutdownGracefully();
         executor.shutdown();
     }
+
+    /**
+     * Test that TCP user timeout is correctly set in EpollEventLoopGroup.
+     */
+    @Test
+    public void testEpollChannelTcpUserTimeout() throws Exception {
+        OrderedExecutor executor = getOrderedSafeExecutor();
+        EventLoopGroup eventLoopGroup = new EpollEventLoopGroup();

Review comment:
       @eolivelli the test is passing now on a Mac (@fpj confirmed that).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org