Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/07/14 08:31:32 UTC

[GitHub] [hadoop-ozone] ChenSammi commented on a change in pull request #1185: HDDS-3933. Fix memory leak because of too many Datanode State Machine Thread

ChenSammi commented on a change in pull request #1185:
URL: https://github.com/apache/hadoop-ozone/pull/1185#discussion_r454191790



##########
File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
##########
@@ -268,6 +268,13 @@
       datanode periodically send node report to SCM. Unit could be
       defined with postfix (ns,ms,s,m,h,d)</description>
   </property>
+  <property>
+    <name>hdds.statemachine.endpoint.task.thread.count</name>
+    <value>2</value>
+    <tag>OZONE, DATANODE, MANAGEMENT</tag>
+    <description>Maximum number of threads in the thread pool that Datanode
+      will use for GETVERSION, REGISTER, HEARTBEAT to SCMs.</description>
+  </property>

Review comment:
       Can we add more details about what the proper value for this property should be, in case users want to change it?
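       For context, a minimal sketch of how such a property is typically read
       and applied, assuming a standard Hadoop Configuration object (the
       variable names are illustrative and not taken from this PR):

           import java.util.concurrent.ExecutorService;
           import java.util.concurrent.Executors;
           import org.apache.hadoop.conf.Configuration;

           // Sketch only: read the new property and use it to cap the size of
           // the endpoint task pool. The default of 2 mirrors ozone-default.xml.
           Configuration conf = new Configuration();
           int threadCount = conf.getInt(
               "hdds.statemachine.endpoint.task.thread.count", 2);
           ExecutorService endpointTaskExecutor =
               Executors.newFixedThreadPool(threadCount);

       Presumably the value mainly needs to grow with the number of SCM
       endpoints the datanode reports to; stating that relationship in the
       description would answer the question above.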

##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/datanode/RunningDatanodeState.java
##########
@@ -200,22 +203,31 @@ public void execute(ExecutorService executor) {
   @Override
   public DatanodeStateMachine.DatanodeStates
       await(long duration, TimeUnit timeUnit)
-      throws InterruptedException, ExecutionException, TimeoutException {
+      throws InterruptedException {
     int count = connectionManager.getValues().size();
     int returned = 0;
-    long timeLeft = timeUnit.toMillis(duration);
+    long durationMS = timeUnit.toMillis(duration);
     long startTime = Time.monotonicNow();
-    List<Future<EndPointStates>> results = new LinkedList<>();
-
-    while (returned < count && timeLeft > 0) {
-      Future<EndPointStates> result =
-          ecs.poll(timeLeft, TimeUnit.MILLISECONDS);
-      if (result != null) {
-        results.add(result);
-        returned++;
+    Set<Future<EndPointStates>> results = new HashSet<>();
+
+    while (returned < count
+        && (durationMS - (Time.monotonicNow() - startTime))> 0) {
+      for (Future<EndPointStates> future : futures) {
+        if (future != null && future.isDone() && !results.contains(future)) {
+          results.add(future);
+          returned++;
+        }
       }
-      timeLeft = timeLeft - (Time.monotonicNow() - startTime);
+
+      Thread.sleep(durationMS / 10);
     }
+
+    for (Future<EndPointStates> future : futures) {
+      if (future != null && !future.isDone()) {
+        future.cancel(true);

Review comment:
       Instead of cancelling the future, I would suggest skipping the next execution and letting the current future finish. Also, please add a LOG message if this case happens.
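       For illustration, a rough sketch of what the suggested alternative could
       look like (futureMap, setSkipNextRun and LOG are assumptions for the
       sketch, not code from this PR):

           // Sketch only: rather than future.cancel(true), record that the task
           // overran, log it, and let the endpoint skip its next scheduled run
           // while the current invocation is allowed to finish.
           for (Map.Entry<EndpointStateMachine, Future<EndPointStates>> entry
               : futureMap.entrySet()) {
             Future<EndPointStates> future = entry.getValue();
             if (future != null && !future.isDone()) {
               LOG.warn("Endpoint task for {} did not complete within {} ms; "
                   + "skipping its next run instead of cancelling it.",
                   entry.getKey(), durationMS);
               entry.getKey().setSkipNextRun(true); // hypothetical flag
             }
           }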




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org