Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/19 13:39:51 UTC

[GitHub] [ozone] sodonnel opened a new pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

sodonnel opened a new pull request #2554:
URL: https://github.com/apache/ozone/pull/2554


   ## What changes were proposed in this pull request?
   
   The integration (ozone) test suite is the slowest part of the GitHub Actions build, usually taking over 2 hours. In a random PR I checked, it took 2hr 16min.
   
   Often in integration tests, a large part of the test time is spent creating a new mini-Ozone cluster for each test, which can take 10-20 seconds to start up.
   
   I also timed stopping a mini-cluster and found that it can take up to 10 seconds.
   
   Changing the tests to reuse the same cluster can be difficult, and it makes the tests less standalone and more brittle, which is not a good thing. Changing the tests is also time-consuming work.
   
   Assuming a test runs for longer than the time taken to set up a mini-cluster and stop it, it would make the tests faster if we pre-created a mini-cluster in the background. Then when one test completes, the next cluster is already there, saving the startup time. Obviously this costs more concurrent CPU to reduce the wall-clock time.
   
   We could also queue the shutdown of the clusters in another background thread.
   
   The slowest part of the Integration (Ozone) test suite is the decommission tests, which took 843 seconds on the last run I checked.
   
   This PR adds a Mini-Cluster provider to the Decommission tests as an experiment to see if it makes the runtime significantly faster in practice. If it does, this may be something we can roll out across other integration tests.
   
   As a baseline, I ran the decommission tests on my laptop, and it took 8min 37s.
   
   After the changes in this PR, the test suite ran in 3min 53s, a reduction of roughly 55%.
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-5644
   
   ## How was this patch tested?
   
   Existing tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] avijayanhwx commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-905013641


   This is a great improvement, thanks for working on this @sodonnel.




[GitHub] [ozone] sodonnel commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695566169



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;
+    private Thread createThread;
+    private Thread reapThread;
+
+    private final BlockingQueue<MiniOzoneCluster> clusters
+        = new ArrayBlockingQueue<>(preCreatedLimit);
+    private final BlockingQueue<MiniOzoneCluster> expiredClusters
+        = new ArrayBlockingQueue<>(1024);
+
+
+    public MiniClusterProvider(OzoneConfiguration conf,
+        MiniOzoneCluster.Builder builder) {
+      this.conf = conf;
+      this.builder = builder;
+      createThread = createClusters();
+      reapThread = reapClusters();
+    }
+
+    public MiniOzoneCluster provide() throws InterruptedException {
+      return clusters.poll(100, SECONDS);
+    }
+
+    public void destroy(MiniOzoneCluster c) throws InterruptedException {
+      expiredClusters.put(c);
+    }
+
+    public void shutdown() throws InterruptedException {
+      shouldRun = false;
+      createThread.interrupt();
+      createThread.join();
+      destroyRemainingClusters();
+      shouldReap = false;
+      reapThread.join();
+    }
+
+    private Thread reapClusters() {
+      Thread t = new Thread(() -> {
+        while(shouldReap || !expiredClusters.isEmpty()) {
+          try {
+            MiniOzoneCluster c = expiredClusters.take();
+            c.shutdown();
+          } catch (InterruptedException e) {
+            break;
+          }
+        }
+      });
+      t.start();
+      return t;
+    }
+
+    private Thread createClusters() {
+      Thread t = new Thread(() -> {
+        while (shouldRun && !Thread.interrupted()) {
+          MiniOzoneCluster cluster = null;
+          try {
+            builder.setClusterId(UUID.randomUUID().toString());
+
+            OzoneConfiguration newConf = new OzoneConfiguration(conf);
+            List<Integer> portList = getFreePortList(4);
+            newConf.set(OMConfigKeys.OZONE_OM_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(0));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTP_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(1));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTPS_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(2));
+            newConf.setInt(OMConfigKeys.OZONE_OM_RATIS_PORT_KEY,
+                portList.get(3));
+            builder.setConf(newConf);
+
+            cluster = builder.build();
+            cluster.waitForClusterToBeReady();
+            clusters.put(cluster);

Review comment:
       I think solving this is tricky. The tests don't really know how many clusters they will need. We can assume it's one cluster per test, so if the class has 5 tests, we configure 5. Later, someone will add another test and forget to update the setting, and the last test will need to wait on a cluster or get an error.
   
   Alternatively we could use reflection and check for methods starting with "test", but that seems like more hassle than it's worth.
   
   I guess if the provider throws an exception when you ask for more clusters than it was configured to provide, that would catch the case of not configuring enough clusters and hence would enforce setting the correct limit.
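
The fail-fast idea in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was committed: `FailFastProvider`, its `limit` parameter, and the generic type `T` (standing in for `MiniOzoneCluster`) are hypothetical names, and the ready queue is assumed to be filled by a background create thread elsewhere.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical fail-fast wrapper; T stands in for MiniOzoneCluster.
 * The names here are illustrative and are not taken from the PR.
 */
final class FailFastProvider<T> {
  // Number of clusters this test class is configured to request.
  private final int limit;
  // Assumed to be filled by a background create thread elsewhere.
  private final BlockingQueue<T> ready;
  // Counts every call to provide(), including rejected ones.
  private final AtomicInteger requested = new AtomicInteger();

  FailFastProvider(int limit, BlockingQueue<T> ready) {
    this.limit = limit;
    this.ready = ready;
  }

  T provide() throws InterruptedException {
    // Reject the request up front instead of waiting for a cluster
    // that the create thread will never build.
    if (requested.incrementAndGet() > limit) {
      throw new IllegalStateException(
          "Requested more than the configured limit of " + limit
              + " clusters");
    }
    T cluster = ready.poll(100, TimeUnit.SECONDS);
    if (cluster == null) {
      throw new IllegalStateException("Timed out waiting for a cluster");
    }
    return cluster;
  }
}
```

With a limit of 5, a sixth call to `provide()` throws immediately, so a newly added test that was not accounted for fails with a clear message rather than hanging on the 100 second poll.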






[GitHub] [ozone] sodonnel commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695622945



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;

Review comment:
       I think you are correct. I have refactored a bit to remove these variables as suggested.






[GitHub] [ozone] JacksonYao287 commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-905355727


   @sodonnel Thanks for this work, it looks like a really good improvement.




[GitHub] [ozone] sodonnel commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-909094616


   @errose28 @avijayanhwx Have you got any further comments or concerns here, or are you happy for me to commit this?




[GitHub] [ozone] sodonnel merged pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #2554:
URL: https://github.com/apache/ozone/pull/2554


   




[GitHub] [ozone] errose28 commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695934938



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;
+    private Thread createThread;
+    private Thread reapThread;
+
+    private final BlockingQueue<MiniOzoneCluster> clusters
+        = new ArrayBlockingQueue<>(preCreatedLimit);
+    private final BlockingQueue<MiniOzoneCluster> expiredClusters
+        = new ArrayBlockingQueue<>(1024);
+
+
+    public MiniClusterProvider(OzoneConfiguration conf,
+        MiniOzoneCluster.Builder builder) {
+      this.conf = conf;
+      this.builder = builder;
+      createThread = createClusters();
+      reapThread = reapClusters();
+    }
+
+    public MiniOzoneCluster provide() throws InterruptedException {
+      return clusters.poll(100, SECONDS);
+    }
+
+    public void destroy(MiniOzoneCluster c) throws InterruptedException {
+      expiredClusters.put(c);
+    }
+
+    public void shutdown() throws InterruptedException {
+      shouldRun = false;
+      createThread.interrupt();
+      createThread.join();
+      destroyRemainingClusters();
+      shouldReap = false;
+      reapThread.join();
+    }
+
+    private Thread reapClusters() {
+      Thread t = new Thread(() -> {
+        while(shouldReap || !expiredClusters.isEmpty()) {
+          try {
+            MiniOzoneCluster c = expiredClusters.take();
+            c.shutdown();
+          } catch (InterruptedException e) {
+            break;
+          }
+        }
+      });
+      t.start();
+      return t;
+    }
+
+    private Thread createClusters() {
+      Thread t = new Thread(() -> {
+        while (shouldRun && !Thread.interrupted()) {
+          MiniOzoneCluster cluster = null;
+          try {
+            builder.setClusterId(UUID.randomUUID().toString());
+
+            OzoneConfiguration newConf = new OzoneConfiguration(conf);
+            List<Integer> portList = getFreePortList(4);
+            newConf.set(OMConfigKeys.OZONE_OM_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(0));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTP_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(1));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTPS_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(2));
+            newConf.setInt(OMConfigKeys.OZONE_OM_RATIS_PORT_KEY,
+                portList.get(3));
+            builder.setConf(newConf);
+
+            cluster = builder.build();
+            cluster.waitForClusterToBeReady();
+            clusters.put(cluster);

Review comment:
       Having the provider throw an exception in this case sounds good to me. It will fail fast and be easy to fix when adding new tests.






[GitHub] [ozone] sodonnel commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-907268194


   @errose28 @avijayanhwx Could you guys have another look? I think I am finished with the changes and have addressed the earlier comments. Thanks.




[GitHub] [ozone] avijayanhwx commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695254418



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;
+    private Thread createThread;
+    private Thread reapThread;
+
+    private final BlockingQueue<MiniOzoneCluster> clusters
+        = new ArrayBlockingQueue<>(preCreatedLimit);
+    private final BlockingQueue<MiniOzoneCluster> expiredClusters
+        = new ArrayBlockingQueue<>(1024);
+
+
+    public MiniClusterProvider(OzoneConfiguration conf,
+        MiniOzoneCluster.Builder builder) {
+      this.conf = conf;
+      this.builder = builder;
+      createThread = createClusters();
+      reapThread = reapClusters();
+    }
+
+    public MiniOzoneCluster provide() throws InterruptedException {
+      return clusters.poll(100, SECONDS);
+    }
+
+    public void destroy(MiniOzoneCluster c) throws InterruptedException {
+      expiredClusters.put(c);
+    }
+
+    public void shutdown() throws InterruptedException {
+      shouldRun = false;
+      createThread.interrupt();
+      createThread.join();
+      destroyRemainingClusters();
+      shouldReap = false;
+      reapThread.join();
+    }
+
+    private Thread reapClusters() {
+      Thread t = new Thread(() -> {
+        while(shouldReap || !expiredClusters.isEmpty()) {
+          try {
+            MiniOzoneCluster c = expiredClusters.take();
+            c.shutdown();
+          } catch (InterruptedException e) {
+            break;
+          }
+        }
+      });
+      t.start();
+      return t;
+    }
+
+    private Thread createClusters() {
+      Thread t = new Thread(() -> {
+        while (shouldRun && !Thread.interrupted()) {
+          MiniOzoneCluster cluster = null;
+          try {
+            builder.setClusterId(UUID.randomUUID().toString());
+
+            OzoneConfiguration newConf = new OzoneConfiguration(conf);
+            List<Integer> portList = getFreePortList(4);
+            newConf.set(OMConfigKeys.OZONE_OM_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(0));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTP_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(1));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTPS_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(2));
+            newConf.setInt(OMConfigKeys.OZONE_OM_RATIS_PORT_KEY,
+                portList.get(3));
+            builder.setConf(newConf);
+
+            cluster = builder.build();
+            cluster.waitForClusterToBeReady();
+            clusters.put(cluster);

Review comment:
       +1 to @errose28's comment.






[GitHub] [ozone] sodonnel commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695971650



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;
+    private Thread createThread;
+    private Thread reapThread;
+
+    private final BlockingQueue<MiniOzoneCluster> clusters
+        = new ArrayBlockingQueue<>(preCreatedLimit);
+    private final BlockingQueue<MiniOzoneCluster> expiredClusters
+        = new ArrayBlockingQueue<>(1024);
+
+
+    public MiniClusterProvider(OzoneConfiguration conf,
+        MiniOzoneCluster.Builder builder) {
+      this.conf = conf;
+      this.builder = builder;
+      createThread = createClusters();
+      reapThread = reapClusters();
+    }
+
+    public MiniOzoneCluster provide() throws InterruptedException {
+      return clusters.poll(100, SECONDS);
+    }
+
+    public void destroy(MiniOzoneCluster c) throws InterruptedException {
+      expiredClusters.put(c);
+    }
+
+    public void shutdown() throws InterruptedException {
+      shouldRun = false;
+      createThread.interrupt();
+      createThread.join();
+      destroyRemainingClusters();
+      shouldReap = false;
+      reapThread.join();
+    }
+
+    private Thread reapClusters() {
+      Thread t = new Thread(() -> {
+        while(shouldReap || !expiredClusters.isEmpty()) {
+          try {
+            MiniOzoneCluster c = expiredClusters.take();
+            c.shutdown();
+          } catch (InterruptedException e) {
+            break;
+          }
+        }
+      });
+      t.start();
+      return t;
+    }
+
+    private Thread createClusters() {
+      Thread t = new Thread(() -> {
+        while (shouldRun && !Thread.interrupted()) {
+          MiniOzoneCluster cluster = null;
+          try {
+            builder.setClusterId(UUID.randomUUID().toString());
+
+            OzoneConfiguration newConf = new OzoneConfiguration(conf);
+            List<Integer> portList = getFreePortList(4);
+            newConf.set(OMConfigKeys.OZONE_OM_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(0));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTP_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(1));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTPS_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(2));
+            newConf.setInt(OMConfigKeys.OZONE_OM_RATIS_PORT_KEY,
+                portList.get(3));
+            builder.setConf(newConf);
+
+            cluster = builder.build();
+            cluster.waitForClusterToBeReady();
+            clusters.put(cluster);

Review comment:
       OK - I will make this change and see how it looks.






[GitHub] [ozone] JacksonYao287 removed a comment on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
JacksonYao287 removed a comment on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-905355727


   @sodonnel Thanks for this work, it looks like a really good improvement.




[GitHub] [ozone] sodonnel commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-905736716


   I was experimenting with some of the HA tests, e.g. TestOzoneManagerHAWithData, and one thing I noticed is that there are issues around JMX registration when multiple clusters are running at the same time. That might be something that needs fixing where the JMX objects are registered.
   
   I also suspect there may be some OM Ratis problems, but I have not got to the bottom of that yet.




[GitHub] [ozone] errose28 commented on a change in pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#discussion_r695052521



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;

Review comment:
       I think shouldRun and shouldReap should be volatile to make sure the create and reap threads see the latest values on each iteration. Do we actually need these variables or can we just interrupt the threads to stop the loops?
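
As a rough illustration of the "just interrupt the threads" alternative raised here, the sketch below stops a reaper loop purely through thread interruption, with no `shouldReap` flag. It is an assumption-laden sketch, not the refactored code from the PR: `InterruptibleReaper`, the `closer` callback (standing in for `MiniOzoneCluster.shutdown()`), and the unbounded queue are all hypothetical. Interrupt status is managed by the JVM, so no `volatile` field is needed for the loop to observe the stop request.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical reaper that is stopped only by interrupting its thread.
 * T stands in for MiniOzoneCluster and closer for cluster.shutdown().
 */
final class InterruptibleReaper<T> {
  private final BlockingQueue<T> expired = new LinkedBlockingQueue<>();
  private final Thread reapThread;

  InterruptibleReaper(Consumer<T> closer) {
    reapThread = new Thread(() -> {
      try {
        // Block on take() until shutdown() interrupts this thread.
        while (!Thread.currentThread().isInterrupted()) {
          closer.accept(expired.take());
        }
      } catch (InterruptedException e) {
        // Expected during shutdown; fall through and drain the queue.
      }
      // Non-blocking drain so nothing queued before shutdown is leaked.
      T item;
      while ((item = expired.poll()) != null) {
        closer.accept(item);
      }
    });
    reapThread.start();
  }

  /** Queue an item to be closed in the background. */
  void destroy(T item) {
    expired.add(item);
  }

  /** Interrupt the reaper and wait until it has drained the queue. */
  void shutdown() throws InterruptedException {
    reapThread.interrupt();
    reapThread.join();
  }
}
```

The drain after the loop replaces the `|| !expiredClusters.isEmpty()` condition in the quoted diff: everything passed to `destroy()` before `shutdown()` is still closed before `join()` returns.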

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDecommissionAndMaintenance.java
##########
@@ -98,8 +104,123 @@
 
   private ContainerOperationClient scmClient;
 
-  @Before
-  public void setUp() throws Exception {
+  private static MiniClusterProvider clusterProvider;
+
+  /**
+   * Class to create mini-clusters in the background.
+   */
+  public static class MiniClusterProvider {
+
+    private int preCreatedLimit = 1;
+
+    private final OzoneConfiguration conf;
+    private final MiniOzoneCluster.Builder builder;
+    private boolean shouldRun = true;
+    private boolean shouldReap = true;
+    private Thread createThread;
+    private Thread reapThread;
+
+    private final BlockingQueue<MiniOzoneCluster> clusters
+        = new ArrayBlockingQueue<>(preCreatedLimit);
+    private final BlockingQueue<MiniOzoneCluster> expiredClusters
+        = new ArrayBlockingQueue<>(1024);
+
+
+    public MiniClusterProvider(OzoneConfiguration conf,
+        MiniOzoneCluster.Builder builder) {
+      this.conf = conf;
+      this.builder = builder;
+      createThread = createClusters();
+      reapThread = reapClusters();
+    }
+
+    public MiniOzoneCluster provide() throws InterruptedException {
+      return clusters.poll(100, SECONDS);
+    }
+
+    public void destroy(MiniOzoneCluster c) throws InterruptedException {
+      expiredClusters.put(c);
+    }
+
+    public void shutdown() throws InterruptedException {
+      shouldRun = false;
+      createThread.interrupt();
+      createThread.join();
+      destroyRemainingClusters();
+      shouldReap = false;
+      reapThread.join();
+    }
+
+    private Thread reapClusters() {
+      Thread t = new Thread(() -> {
+        while(shouldReap || !expiredClusters.isEmpty()) {
+          try {
+            MiniOzoneCluster c = expiredClusters.take();
+            c.shutdown();
+          } catch (InterruptedException e) {
+            break;
+          }
+        }
+      });
+      t.start();
+      return t;
+    }
+
+    private Thread createClusters() {
+      Thread t = new Thread(() -> {
+        while (shouldRun && !Thread.interrupted()) {
+          MiniOzoneCluster cluster = null;
+          try {
+            builder.setClusterId(UUID.randomUUID().toString());
+
+            OzoneConfiguration newConf = new OzoneConfiguration(conf);
+            List<Integer> portList = getFreePortList(4);
+            newConf.set(OMConfigKeys.OZONE_OM_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(0));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTP_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(1));
+            newConf.set(OMConfigKeys.OZONE_OM_HTTPS_ADDRESS_KEY,
+                "127.0.0.1:" + portList.get(2));
+            newConf.setInt(OMConfigKeys.OZONE_OM_RATIS_PORT_KEY,
+                portList.get(3));
+            builder.setConf(newConf);
+
+            cluster = builder.build();
+            cluster.waitForClusterToBeReady();
+            clusters.put(cluster);

Review comment:
       It looks like for every test suite this will build an extra cluster that never gets used. Since all tests know how many clusters they need, perhaps the number of clusters to make can be configured, and then either:
   1. The create thread keeps creating clusters and queueing them until that number is reached.
       - Faster, but could result in lots of clusters being up at once, which might be bad for performance.

   or

   2. We only leave one cluster in the queue at a time, but the create thread checks if it has made the required number of clusters before making another one.
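
Option 2 could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `BoundedProvider`, the `Supplier` factory and the `total` parameter are hypothetical names, and the factory stands in for `builder.build()` plus `waitForClusterToBeReady()`. The queue has capacity 1, so at most one item waits in the queue while the creator may hold one more that it has built but not yet queued.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical provider that creates exactly `total` items, one queued at a time. */
class BoundedProvider<T> {
  private final BlockingQueue<T> ready = new ArrayBlockingQueue<>(1);
  private final AtomicInteger created = new AtomicInteger();
  private final Thread createThread;

  BoundedProvider(Supplier<T> factory, int total) {
    createThread = new Thread(() -> {
      try {
        // Stop after `total` items so no unused extra is ever built.
        for (int i = 0; i < total; i++) {
          T item = factory.get();
          created.incrementAndGet();
          // put() blocks while one item is already waiting in the queue.
          ready.put(item);
        }
      } catch (InterruptedException e) {
        // Interrupted while waiting to queue: stop creating. A real provider
        // would also need to shut down the item it was holding at this point.
      }
    });
    createThread.start();
  }

  /** Hand out the next pre-created item, waiting up to 100 seconds. */
  T provide() throws InterruptedException {
    return ready.poll(100, TimeUnit.SECONDS);
  }

  /** Stop the creator and report how many items were created in total. */
  int shutdown() throws InterruptedException {
    createThread.interrupt();
    createThread.join();
    return created.get();
  }
}
```

A test class would pass its own test count as `total`, which keeps the background pre-creation but avoids paying for a cluster that is started only to be torn down.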




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-902133041


   With this change the decommission tests ran:
   
   ```
   [INFO] Running org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance
   [INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 520.23 s - in org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance
   ```
   
   An older PR:
   
   ```
   [INFO] Running org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance
   [INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 843.013 s - in org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance
   ```
   
   So it looks like this saved about 323 seconds (843.013 s - 520.23 s, over 5 minutes). Also worth noting: the CI server runs these tests more slowly than my laptop does.




[GitHub] [ozone] sodonnel commented on pull request #2554: HDDS-5644. Speed up decommission tests using a background Mini Cluster provider

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2554:
URL: https://github.com/apache/ozone/pull/2554#issuecomment-909094616


   @errose28 @avijayanhwx Have you got any further comments or concerns here, or are you happy for me to commit this?

