You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/18 04:26:16 UTC

[GitHub] [ozone] symious opened a new pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

symious opened a new pull request #2550:
URL: https://github.com/apache/ozone/pull/2550


   ## What changes were proposed in this pull request?
   
   Root cause is when MiniOzoneHAClusterImpl#bootstrapOzoneManager is creating a new OM, it may encounter a port conflict, this function will retry with a new port, but before that, the metadataManager of the first OM didn't close the lock on the rocksdb, which causes the test to fail for the retry.
   
   Options to solve:
   
   I tried to add a "metadataManager.stop()" in the constructor of OM when it fails to start RPC server, but it will prompt another error about the lock on ratis directory.
   I tried to stop the ratisServer too, but in https://github.com/apache/ratis/blob/dc0b68b4c0b8c187a08f669422a2cd099d7be0b7/ratis-common/src/main/java/org/apache/ratis/util/LifeCycle.java#L308, the close function will not be called, so the lock won't be released. Tried to call the closeMethod for State.NEW, but something wrong else happened.
   So I think it's much easier to just check if the port is available in MiniOzoneHAClusterImpl. 
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5632
   
   ## How was this patch tested?
   
   Test passed with following change.
   ```
   @@ -697,9 +698,11 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
    
        long leaderSnapshotIndex = getOMLeader().getRatisSnapshotIndex();
    
   +    int start = 0;
        while (true) {
          try {
   -        basePort = 10000 + RANDOM.nextInt(1000) * 4;
   +//        basePort = 10000 + RANDOM.nextInt(1000) * 4;
   +        basePort = 10000 + start * 4;
            OzoneConfiguration newConf = addNewOMToConfig(getOMServiceId(),
                omNodeId, basePort);
    
   @@ -721,6 +724,7 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
            if (e instanceof BindException ||
                e.getCause() instanceof BindException) {
              ++retryCount;
   +          start++;
              LOG.info("MiniOzoneHACluster port conflicts, retried {} times",
                  retryCount);
            } else {
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691083966



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       I'd go for calling it once for each distinct port.  Any port can be occupied independently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691394753



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       This reminds me: we could use `NetUtils.createLocalServerAddress(int)` from Ratis to get multiple free ports.
   
   https://github.com/apache/ratis/blob/5e444084307dce717816dd16c7942d785fd3e9d1/ratis-common/src/main/java/org/apache/ratis/util/NetUtils.java#L119-L143




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691404343



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       The Ratis version is nice. It actually binds the socket and keeps it bound while it gets the rest of the requested sockets. That means it will return a list of unique sockets for sure, while the version we are working with, could return duplicate ports, although that is being handled via adding the port to the set.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691027164



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Rather than trying incremental ports, can we just ask the OS for a free one, eg:
   
   private int getFreePort() {
     ServerSocket ss = null;
     try {
       ss = new ServerSocket(0);
       return ss.getLocalPort();
     } catch { 
        ...
     } finally {
       if (ss != null) {
         ss.close();
       }
   }

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Rather than trying incremental ports, can we just ask the OS for a free one, eg:
   
   ```
   private int getFreePort() {
     ServerSocket ss = null;
     try {
       ss = new ServerSocket(0);
       return ss.getLocalPort();
     } catch { 
        ...
     } finally {
       if (ss != null) {
         ss.close();
       }
   }
   ```

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       The discussion here suggests a few problems etc with various approaches:
   
   https://stackoverflow.com/questions/434718/sockets-discover-port-availability-using-java

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       I wonder if the method could give the same port back on consecutive calls. I guess we can try it and see.
   
   It would be great if we could just pass port zero to OM and let it start on a free port by itself, but I guess other things need to connect to it, so they need the port, which would be unknown. Therefore the approach we are taking here is likely the most sensible.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {

Review comment:
       It would be good to reuse `SCMTestUtils.getReusableAddress()` here rather than creating a new version in this class. It returns an InetSocketAddress, but we can easily get the port from it.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       If getFreePort() returns -1, then it will just add it to the set, which will cause problems later I think?
   
   However if we switch to use `SCMTestUtils.getReusableAddress()` then we can handle the exception it throws and try again to get an address.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       The Ratis version is nice. It actually binds the socket and keeps it bound while it gets the rest of the requested sockets. That means it will return a list of unique sockets for sure, while the version we are working with, could return duplicate ports, although that is being handled via adding the port to the set.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r692044746



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       As we already have a dependency on Ratis across the project its fine to add the dependency. If we didn't I probably would copy the code, rather than take a dependency just for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#issuecomment-901945023


   Thanks for the review. @sodonnel @adoroszlai 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691768736



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       Yes, the ratis version is more mature. Have updated to use the ratis version.
   Got a small problem, if I didn't use ratis in the project, but I'd want to use the utils provided by ratis, should I copy the code or add the dependency?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691027164



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Rather than trying incremental ports, can we just ask the OS for a free one, eg:
   
   ```
   private int getFreePort() {
     ServerSocket ss = null;
     try {
       ss = new ServerSocket(0);
       return ss.getLocalPort();
     } catch { 
        ...
     } finally {
       if (ss != null) {
         ss.close();
       }
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691027164



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Rather than trying incremental ports, can we just ask the OS for a free one, eg:
   
   private int getFreePort() {
     ServerSocket ss = null;
     try {
       ss = new ServerSocket(0);
       return ss.getLocalPort();
     } catch { 
        ...
     } finally {
       if (ss != null) {
         ss.close();
       }
   }




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691380670



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       If getFreePort() returns -1, then it will just add it to the set, which will cause problems later I think?
   
   However if we switch to use `SCMTestUtils.getReusableAddress()` then we can handle the exception it throws and try again to get an address.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#issuecomment-900802673


   @adoroszlai Could you help to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691046302



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       There are already some tests that use random port using this `ServerSocket(0)` technique.  We could just call this method to get OM port:
   
   https://github.com/apache/ozone/blob/d96399142f1f84f54d6dd783dbff6b8415f1c1e6/hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/SCMTestUtils.java#L115-L122

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       I'd go for calling it once for each distinct port.  Any port can be occupied independently.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       This reminds me: we could use `NetUtils.createLocalServerAddress(int)` from Ratis to get multiple free ports.
   
   https://github.com/apache/ratis/blob/5e444084307dce717816dd16c7942d785fd3e9d1/ratis-common/src/main/java/org/apache/ratis/util/NetUtils.java#L119-L143




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#issuecomment-900802673


   @adoroszlai Could you help to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691377863



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {

Review comment:
       It would be good to reuse `SCMTestUtils.getReusableAddress()` here rather than creating a new version in this class. It returns an InetSocketAddress, but we can easily get the port from it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691037186



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       @sodonnel Thanks for review.
   Yeah, I was checking the same page, too.
   
   Normally we don't use ports under 1000, but in docker it should be fine?
   Another case is if the port returned is larger than 65531, it will exceed the port range?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691050325



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Sure. 
   Do we need to call the method 4 times, or add a restriction that basePort is under 65532?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691031199



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       The discussion here suggests a few problems etc with various approaches:
   
   https://stackoverflow.com/questions/434718/sockets-discover-port-availability-using-java




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2550:
URL: https://github.com/apache/ozone/pull/2550


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691116307



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       I wonder if the method could give the same port back on consecutive calls. I guess we can try it and see.
   
   It would be great if we could just pass port zero to OM and let it start on a free port by itself, but I guess other things need to connect to it, so they need the port, which would be unknown. Therefore the approach we are taking here is likely the most sensible.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691037186



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       @sodonnel Thanks for review.
   Yeah, I was checking the same page, too.
   
   Normally we don't use ports under 1000, but in docker it should be fine?
   Another case is if the port returned is larger than 65531, it will exceed the port range?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       Sure. 
   Do we need to call the method 4 times, or add a restriction that basePort is under 65532?

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       Sure, updated to use `SCMTestUtils.getReusableAddress()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691046302



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -700,6 +701,9 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
     while (true) {
       try {
         basePort = 10000 + RANDOM.nextInt(1000) * 4;
+        if (!isPortAvailable(basePort)) {

Review comment:
       There are already some tests that use random port using this `ServerSocket(0)` technique.  We could just call this method to get OM port:
   
   https://github.com/apache/ozone/blob/d96399142f1f84f54d6dd783dbff6b8415f1c1e6/hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/SCMTestUtils.java#L115-L122




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2550: HDDS-5632. Intermittent failure in TestOzoneManagerBootstrap#testBootstrapTwoNewOMs

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2550:
URL: https://github.com/apache/ozone/pull/2550#discussion_r691399209



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneHAClusterImpl.java
##########
@@ -966,4 +970,31 @@ public StorageContainerManager getStorageContainerManager() {
     return getStorageContainerManagers().get(0);
   }
 
+  private int getFreePort() {
+    ServerSocket ss = null;
+    try {
+      ss = new ServerSocket(0);
+      return ss.getLocalPort();
+    } catch (IOException e) {
+      e.printStackTrace();
+    } finally {
+      if (ss != null) {
+        try {
+          ss.close();
+        } catch (IOException e) {
+          LOG.error("Got exception while closing ServerSocket: " +
+              e.getMessage());
+        }
+      }
+    }
+    return -1;
+  }
+
+  private Set<Integer> getFreePortSet(int size) {
+    Set<Integer> portSet = new HashSet<>();
+    while (portSet.size() < size) {
+      portSet.add(getFreePort());

Review comment:
       Sure, updated to use `SCMTestUtils.getReusableAddress()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org