You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/04 13:19:35 UTC

[GitHub] [ozone] ChenSammi opened a new pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

ChenSammi opened a new pull request #2498:
URL: https://github.com/apache/ozone/pull/2498


   https://issues.apache.org/jira/browse/HDDS-5537


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r726724584



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +67,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class channelType;
+
+    if (Epoll.isAvailable()) {
+      eventLoopGroup = new EpollEventLoopGroup(poolSize * 2);
+      channelType = EpollServerDomainSocketChannel.class;
+    } else {
+      eventLoopGroup = new NioEventLoopGroup(poolSize * 2);
+      channelType = NioServerSocketChannel.class;
+    }
+
     NettyServerBuilder nettyServerBuilder = NettyServerBuilder.forPort(port)
         .maxInboundMessageSize(OzoneConsts.OZONE_SCM_CHUNK_MAX_SIZE)
+        .workerEventLoopGroup(eventLoopGroup)
+        .bossEventLoopGroup(eventLoopGroup)
+        .channelType(channelType)

Review comment:
       @adoroszlai , thanks for the code review.   It's because EpollServerSocketChannel is not supported on Mac. I believe most of us  use Mac for code develoepment.  I'd like to make sure replication related unit tests will not fail on Mac book.   You can find that there is an channelType identify process ahead. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r773939981



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -109,12 +150,14 @@ public void start() throws IOException {
     }
 
     port = server.getPort();
-
   }
 
   public void stop() {
     try {
-      server.shutdown().awaitTermination(10L, TimeUnit.SECONDS);
+      eventLoopGroup.shutdownGracefully().sync();
+      executor.shutdown();
+      executor.awaitTermination(5L, TimeUnit.SECONDS);
+      server.shutdown().awaitTermination(5L, TimeUnit.SECONDS);

Review comment:
       Event loop group should be shutdown last (#2900 fixed similar problem in `XceiverServerGrpc`, introduced in #2824):
   
   ```suggestion
         executor.shutdown();
         executor.awaitTermination(5L, TimeUnit.SECONDS);
         server.shutdown().awaitTermination(5L, TimeUnit.SECONDS);
         eventLoopGroup.shutdownGracefully().sync();
   ```

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +68,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class<? extends ServerChannel> channelType;
+
+    if (Epoll.isAvailable()) {
+      eventLoopGroup = new EpollEventLoopGroup(poolSize / 4);

Review comment:
       Can you please explain how 4 is derived?  (Magic number?)

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
##########
@@ -72,13 +72,25 @@
    */
   @Config(key = "replication.streams.limit",
       type = ConfigType.INT,
-      defaultValue = "10",
+      defaultValue = "12",
       tags = {DATANODE},
       description = "The maximum number of replication commands a single " +
           "datanode can execute simultaneously"
   )
   private int replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
 
+  /**
+   * The maximum number of replication requests.
+   */
+  @Config(key = "replication.queue.limit",

Review comment:
       Cany you please add this new config to `ReplicationServer#ReplicationConfig` instead of `DatanodeConfig`?  That is a more specific class for config related to replication.  If we also moved existing `replicationMaxStreams`, too, then we could avoid introducing `DatanodeConfiguration` in `ReplicationServer`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r774423043



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
##########
@@ -72,13 +72,25 @@
    */
   @Config(key = "replication.streams.limit",
       type = ConfigType.INT,
-      defaultValue = "10",
+      defaultValue = "12",
       tags = {DATANODE},
       description = "The maximum number of replication commands a single " +
           "datanode can execute simultaneously"
   )
   private int replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
 
+  /**
+   * The maximum number of replication requests.
+   */
+  @Config(key = "replication.queue.limit",

Review comment:
       Moving existing config in #2943.

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
##########
@@ -72,13 +72,25 @@
    */
   @Config(key = "replication.streams.limit",
       type = ConfigType.INT,
-      defaultValue = "10",
+      defaultValue = "12",
       tags = {DATANODE},
       description = "The maximum number of replication commands a single " +
           "datanode can execute simultaneously"
   )
   private int replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
 
+  /**
+   * The maximum number of replication requests.
+   */
+  @Config(key = "replication.queue.limit",

Review comment:
       Can you please add this new config to `ReplicationServer#ReplicationConfig` instead of `DatanodeConfig`?  That is a more specific class for config related to replication.  If we also moved existing `replicationMaxStreams`, too, then we could avoid introducing `DatanodeConfiguration` in `ReplicationServer`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] github-actions[bot] closed pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #2498:
URL: https://github.com/apache/ozone/pull/2498


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-948914238


   @ChenSammi it seems unit test failures are related to this change, the same failures repeated twice:
   
   ```
   Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
   Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...
   Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer
   ```
   
   https://github.com/apache/ozone/runs/3933857760?check_suite_focus=true#step:4:1471
   https://github.com/apache/ozone/runs/3967634877?check_suite_focus=true#step:4:1461


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r683897522



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.replication;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.security.x509.SecurityConfig;
+import org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration;
+import org.junit.Test;
+
+import java.io.IOException;
+
+/**
+ * Test the replication server.
+ */
+public class TestReplicationServer {
+
+  @Test
+  public void testEpollEnabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+  }
+
+  @Test
+  public void testEpollDisabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    System.setProperty(
+        "org.apache.ratis.thirdparty.io.netty.transport.noNative", "true");
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {

Review comment:
       ditto.

##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.replication;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.security.x509.SecurityConfig;
+import org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration;
+import org.junit.Test;
+
+import java.io.IOException;
+
+/**
+ * Test the replication server.
+ */
+public class TestReplicationServer {
+
+  @Test
+  public void testEpollEnabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {
+      e.printStackTrace();
+    }

Review comment:
       you should just let it throw exception when there's one, otherwise it will still succeed even if an IOException is throw.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-963762403


   > 
   > 
   > This one turned out to be unrelated, fixed by #2806:
   > 
   > > ```
   > > Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...
   > > ```
   > 
   > But these two are still there:
   > 
   > > ```
   > > Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
   > > Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer
   > > ```
   > 
   > https://github.com/apache/ozone/runs/4139083271#step:4:1409
   
   Thanks @adoroszlai , will look at it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r733457198



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +68,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class<? extends ServerChannel> channelType;
+
+    if (Epoll.isAvailable()) {
+      eventLoopGroup = new EpollEventLoopGroup(poolSize * 2);
+      channelType = EpollServerDomainSocketChannel.class;

Review comment:
       How is this different from EpollServerSocketChannel?
   If we use domain socket do we bind to a domain socket instead like the example in here? https://stackoverflow.com/questions/48354562/binding-a-java-grpc-server-to-a-unix-domain-socket/48358536#48358536




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-1063334253


   /pending


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] github-actions[bot] commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-1083829243


   Thank you very much for the patch. I am closing this PR __temporarily__ as there was no activity recently and it is waiting for response from its author.
   
   It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.
   
   It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.
   
   If you need ANY help to finish this PR, please [contact the community](https://github.com/apache/hadoop-ozone#contact) on the mailing list or the slack channel."


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] jojochuang commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
jojochuang commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r732453715



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +67,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class channelType;
+
+    if (Epoll.isAvailable()) {
+      eventLoopGroup = new EpollEventLoopGroup(poolSize * 2);
+      channelType = EpollServerDomainSocketChannel.class;
+    } else {
+      eventLoopGroup = new NioEventLoopGroup(poolSize * 2);
+      channelType = NioServerSocketChannel.class;
+    }
+
     NettyServerBuilder nettyServerBuilder = NettyServerBuilder.forPort(port)
         .maxInboundMessageSize(OzoneConsts.OZONE_SCM_CHUNK_MAX_SIZE)
+        .workerEventLoopGroup(eventLoopGroup)
+        .bossEventLoopGroup(eventLoopGroup)
+        .channelType(channelType)

Review comment:
       we can consider adding KQueueSocketChannel support for Mac later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-963319041


   This one turned out to be unrelated, fixed by #2806:
   
   > ```
   > Error:    TestStorageVolumeChecker.testVolumeDeletion:231 » OutOfMemory unable to create...
   > ```
   
   But these two are still there:
   
   > ```
   > Error:    TestDatanodeStateMachine.testDatanodeStateContext:262 » Timeout Timed out wait...
   > Error:    TestDatanodeUpgradeToScmHA.testReadsDuringFinalization:138->addContainer:612->dispatchRequest:672->dispatchRequest:679 » NullPointer
   > ```
   
   https://github.com/apache/ozone/runs/4139083271#step:4:1409


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-895015754


   Hey  @elek ,  would you help to take another look of this patch ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#issuecomment-895014681


   > 
   > 
   > i don't like how the test is written but as long as it does fail when there's an error, I am fine.
   
   The test tries to verify that when epoll is not available, the eventLoopGroup still can be created.  Previously this part of logic is handled by grpc itself.  Just want to make sure it doesn't break anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r683969313



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.replication;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.security.x509.SecurityConfig;
+import org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration;
+import org.junit.Test;
+
+import java.io.IOException;
+
+/**
+ * Test the replication server.
+ */
+public class TestReplicationServer {
+
+  @Test
+  public void testEpollEnabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {
+      e.printStackTrace();
+    }

Review comment:
       Sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r782257512



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
##########
@@ -72,13 +72,25 @@
    */
   @Config(key = "replication.streams.limit",
       type = ConfigType.INT,
-      defaultValue = "10",
+      defaultValue = "12",
       tags = {DATANODE},
       description = "The maximum number of replication commands a single " +
           "datanode can execute simultaneously"
   )
   private int replicationMaxStreams = REPLICATION_MAX_STREAMS_DEFAULT;
 
+  /**
+   * The maximum number of replication requests.
+   */
+  @Config(key = "replication.queue.limit",

Review comment:
       Done in merge from `master`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r710396341



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +67,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class channelType;

Review comment:
       Nit:
   
   ```suggestion
       Class<? extends ServerChannel> channelType;
   ```
   
   (+ import)

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/ReplicationServer.java
##########
@@ -57,22 +67,52 @@
 
   private int port;
 
+  private int poolSize;
+
+  private int queueLimit;
+
+  private ThreadPoolExecutor executor;
+
+  private EventLoopGroup eventLoopGroup;
+
   public ReplicationServer(
       ContainerController controller,
       ReplicationConfig replicationConfig,
-      SecurityConfig secConf,
-      CertificateClient caClient
-  ) {
+      SecurityConfig secConf, DatanodeConfiguration dnConfig,
+      CertificateClient caClient) {
     this.secConf = secConf;
     this.caClient = caClient;
     this.controller = controller;
     this.port = replicationConfig.getPort();
+    this.poolSize = dnConfig.getReplicationMaxStreams();
+    this.queueLimit = dnConfig.getReplicationQueueLimit();
     init();
   }
 
   public void init() {
+    executor = new ThreadPoolExecutor(
+        poolSize, poolSize, 60, TimeUnit.SECONDS,
+        new LinkedBlockingQueue<>(queueLimit),
+        new ThreadFactoryBuilder().setDaemon(true)
+            .setNameFormat("ReplicationServerExecutor-%d")
+            .build());
+
+    Class channelType;
+
+    if (Epoll.isAvailable()) {
+      eventLoopGroup = new EpollEventLoopGroup(poolSize * 2);
+      channelType = EpollServerDomainSocketChannel.class;
+    } else {
+      eventLoopGroup = new NioEventLoopGroup(poolSize * 2);
+      channelType = NioServerSocketChannel.class;
+    }
+
     NettyServerBuilder nettyServerBuilder = NettyServerBuilder.forPort(port)
         .maxInboundMessageSize(OzoneConsts.OZONE_SCM_CHUNK_MAX_SIZE)
+        .workerEventLoopGroup(eventLoopGroup)
+        .bossEventLoopGroup(eventLoopGroup)
+        .channelType(channelType)

Review comment:
       Can you please explain why setting channel type and event loop group are needed?  Based on `NettyServerBuilder#channelType` javadoc this is the same as the default behavior:
   
   ```
   Specifies the channel type to use, by default we use {@code EpollServerSocketChannel} if
   available, otherwise using {@link NioServerSocketChannel}.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi commented on a change in pull request #2498: HDDS-5537. Limit grpc threads in ReplicationServer.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on a change in pull request #2498:
URL: https://github.com/apache/ozone/pull/2498#discussion_r683967584



##########
File path: hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationServer.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.replication;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.security.x509.SecurityConfig;
+import org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration;
+import org.junit.Test;
+
+import java.io.IOException;
+
+/**
+ * Test the replication server.
+ */
+public class TestReplicationServer {
+
+  @Test
+  public void testEpollEnabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+  }
+
+  @Test
+  public void testEpollDisabled() {
+    OzoneConfiguration config = new OzoneConfiguration();
+    DatanodeConfiguration dnConf =
+        config.getObject(DatanodeConfiguration.class);
+    ReplicationServer.ReplicationConfig replicationConfig =
+        config.getObject(ReplicationServer.ReplicationConfig.class);
+    SecurityConfig secConf = new SecurityConfig(config);
+    System.setProperty(
+        "org.apache.ratis.thirdparty.io.netty.transport.noNative", "true");
+    ReplicationServer server = new ReplicationServer(null, replicationConfig,
+        secConf, dnConf, null);
+    try {
+      server.start();
+      server.stop();
+    } catch (IOException e) {

Review comment:
       OK




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org