Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2021/08/19 13:33:00 UTC

[jira] [Created] (HDDS-5644) Speed up decommission tests using a background Mini Cluster provider

Stephen O'Donnell created HDDS-5644:
---------------------------------------

             Summary: Speed up decommission tests using a background Mini Cluster provider
                 Key: HDDS-5644
                 URL: https://issues.apache.org/jira/browse/HDDS-5644
             Project: Apache Ozone
          Issue Type: Improvement
          Components: SCM
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


The integration (ozone) test suite is the slowest part of the GitHub Actions build, usually taking over 2 hours. In a random PR I checked, it took 2hr 16min.

Often in integration tests, a large part of the test time is spent creating a new mini-Ozone cluster for each test, which can take 10 - 20 seconds to start up.

I also timed stopping a mini-cluster and found that it can take up to 10 seconds.

Changing the tests to reuse the same cluster can be difficult, and it makes the tests less standalone and more brittle, which is not a good thing. Changing the tests is also time-consuming work.

Assuming a test runs for longer than the time taken to set up a mini-cluster and stop it, it would make the tests faster if we pre-created a mini-cluster in the background. Then when one test completes, the next cluster is already there, saving the startup time. Obviously this uses more concurrent CPU in exchange for less wall clock time.

We could also queue the shutdown of the clusters in another background thread.
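
To make the idea concrete, here is a minimal sketch of such a provider. It is not the code in the PR: the class name BackgroundProvider, its methods, and the generic factory/destroyer callbacks are hypothetical stand-ins for whatever builds and shuts down a mini-cluster. One background thread keeps a small queue of ready instances topped up, and a second thread destroys instances that tests have handed back.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: pre-creates instances of T on one background thread
 * and destroys handed-back instances on another. T stands in for a
 * mini-cluster; factory and destroyer stand in for its build and shutdown.
 */
class BackgroundProvider<T> {
  private final Supplier<T> factory;
  private final Consumer<T> destroyer;
  private final BlockingQueue<T> ready;
  private final BlockingQueue<T> expired = new LinkedBlockingQueue<>();
  private volatile boolean stopCreating;
  private volatile boolean stopDestroying;
  private final Thread createThread;
  private final Thread destroyThread;

  /** preCreate must be at least 1; it bounds the queue of ready instances. */
  BackgroundProvider(Supplier<T> factory, Consumer<T> destroyer, int preCreate) {
    this.factory = factory;
    this.destroyer = destroyer;
    this.ready = new ArrayBlockingQueue<>(preCreate);
    this.createThread = startDaemon(this::createLoop, "provider-create");
    this.destroyThread = startDaemon(this::destroyLoop, "provider-destroy");
  }

  /** Blocks until a pre-created instance is available. */
  T provide() throws InterruptedException {
    return ready.take();
  }

  /** Hands an instance back; it is destroyed later on the background thread. */
  void destroy(T instance) {
    expired.add(instance);
  }

  /** Stops creation, then waits until every unused or returned instance is destroyed. */
  void shutdown() throws InterruptedException {
    stopCreating = true;
    createThread.join();
    ready.drainTo(expired);      // instances that no test ever asked for
    stopDestroying = true;
    destroyThread.join();
  }

  private void createLoop() {
    while (!stopCreating) {
      T instance = factory.get();
      try {
        boolean queued = false;
        // Wait for space, waking up regularly to notice a shutdown request.
        while (!queued && !stopCreating) {
          queued = ready.offer(instance, 100, TimeUnit.MILLISECONDS);
        }
        if (!queued) {
          expired.add(instance); // built but never handed out
        }
      } catch (InterruptedException e) {
        expired.add(instance);
        return;
      }
    }
  }

  private void destroyLoop() {
    try {
      while (!stopDestroying || !expired.isEmpty()) {
        T instance = expired.poll(100, TimeUnit.MILLISECONDS);
        if (instance != null) {
          destroyer.accept(instance);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private static Thread startDaemon(Runnable body, String name) {
    Thread t = new Thread(body, name);
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

A test class would create one provider in its class-level setup, call provide() before each test and destroy() after each test, and call shutdown() in its class-level teardown. Note that the creator builds the next instance while the queue is full, so up to preCreate + 1 idle instances can exist at once; that is the extra concurrent resource cost mentioned above.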

The slowest part of the integration (ozone) test suite is the decommission tests, which took 843 seconds on the last run I checked.

This PR adds a mini-cluster provider to the decommission tests as an experiment to see if it makes the runtime significantly faster in practice. If it does, this may be something we can roll out across other integration tests.

As a baseline, I ran the decommission tests on my laptop, and it took 8min 37s.

After the changes in this PR, the test suite ran in 3min 53s, a saving of 4min 44s (about 55% less wall clock time).



