
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2019/11/04 14:03:23 UTC

[GitHub] [hadoop-ozone] elek edited a comment on issue #11: HDDS-2291. Acceptance tests for OM HA.

elek edited a comment on issue #11: HDDS-2291. Acceptance tests for OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/11#issuecomment-544410792

Thank you very much for the patch, @hanishakoneru. Overall I am very happy to have more HA tests with the robot framework, and I would be happy to commit it (after clean builds).

_Personally_ I would prefer a different approach, but that's only because I may think about it differently. It may not be better or worse. The only thing I would like to do here is to explain my view, because this is the fun part: understanding each other's thinking.

__1. The level of the tests__

To run acceptance tests we need to solve two problems:

1.) Create a running Ozone cluster (and possibly restart services during the tests)
2.) Execute commands / check the results (run tests + assert)

Currently these two roles/levels are separated.

The second one is implemented by the [robot tests](https://github.com/apache/hadoop-ozone/tree/master/hadoop-ozone/dist/src/main/smoketest) but the (existing) robot tests don't include any logic to start (or restart) services.

The environments are mainly defined with docker-compose files, and the logic to start them is defined by __shell scripts__ (for example, [this](https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozone/test.sh) is the simplest one).

So the two levels/roles are kept separate.
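
A minimal sketch of this separation, assuming hypothetical names (`Environment`, `FakeEnvironment`, `run_suite` are illustrations only, not part of the Ozone code base): the tests only run and assert, while cluster lifecycle sits behind an interface that a shell script, docker-compose or kubernetes based implementation could provide.

```python
from typing import Callable, Iterable, List, Protocol


class Environment(Protocol):
    """Role 1: owns the cluster lifecycle (hypothetical interface)."""

    def start(self) -> None: ...

    def stop(self) -> None: ...


class FakeEnvironment:
    """In-memory stand-in so the sketch runs without docker or kubernetes."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


def run_suite(env: Environment, tests: Iterable[Callable[[], bool]]) -> List[bool]:
    """Role 2 lives in `tests`; this driver only wires the two roles together."""
    env.start()
    try:
        # Each test executes commands and returns whether its assertion held.
        return [bool(test()) for test in tests]
    finally:
        # The environment is torn down even if a test raises.
        env.stop()
```

Because `run_suite` depends only on the interface, the same tests could in principle be pointed at any environment implementation without changes.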

__2. The flexibility__

The main advantage of this approach is that you can run the tests in different environments. For example, I can replace the __shell__ script based cluster creation process with anything else.

1. I can create Kubernetes clusters and execute the same robot tests inside.
2. Anybody can execute the same robot tests in any commercial Hadoop/Ozone distribution.

__3. Blockade__

Blockade based tests are slightly different. They do both 1 (cluster creation) and 2 (test + assertion), mainly because they are more interested in the environment setup (creating the cluster, shutting down nodes, etc.).

They do all the cluster setup / teardown with docker-compose, and the logic is defined in Python scripts.

__4. Docker + ssh__

This patch follows a different approach. Instead of using docker-compose to start/stop/restart services/nodes, it installs an additional ssh daemon inside the containers to make it possible to restart the JVM process instead of the containers. (docker-compose is used to start/stop services and the ssh daemons are used to restart them.)

This is usually not the recommended way to work in containerized environments. With docker it's usually easier to restart the containers and run only one process per container (which provides better separation and easier management).

__5. This patch__

But the previous approach (using docker-compose to start / stop instead of ssh) is not portable at all. For example, it can't be started inside Kubernetes with little effort.

On the other hand this *patch can be used very easily* in other environments as the "service restart" part of the environment management is included (with the help of ssh).
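
To make the contrast between sections 4 and 5 concrete, here is a rough sketch of the two restart styles as command lines. The function names and the `restart_command` argument are hypothetical placeholders: the actual command the patch runs over ssh is not shown in this comment, so it is passed in rather than guessed.

```python
from typing import List


def compose_restart_argv(service: str) -> List[str]:
    """Environment-side restart: restart the whole container of a service."""
    return ["docker-compose", "restart", service]


def ssh_restart_argv(host: str, restart_command: List[str]) -> List[str]:
    """Test-side restart: run a command inside the container over ssh.

    `restart_command` is a placeholder supplied by the caller; it stands for
    whatever restarts the JVM process inside the container.
    """
    return ["ssh", host, *restart_command]
```

The first form only works where docker-compose manages the cluster, while the second only needs a reachable ssh daemon, which is the portability trade-off described above.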

**Summary**:

* This is a slightly different approach from what we followed in the normal tests, and not the mainstream usage of containers
* But it's very effective and has some clear advantages (easier to re-use tests in different environments)
* I have ideas about how it could be done in a different way, but they have different drawbacks (and different advantages)

In other words: if we separate the _environment creation_ from the _test definitions_, where should we put the restart functionality? You put it where we have the _test definition_; I described a system where it can be put where we have the _environment creation_.

I think both approaches are acceptable, __I will commit this one after a green acceptance test run__.

(And we can continue thinking about how these tests can evolve. For example: do we need to separate these kinds of tests and create more tests where we restart clusters?)
