Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/07/23 14:38:52 UTC
[GitHub] flink pull request #6395: [FLINK-9900][tests] Harden ZooKeeperHighAvailabili...
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/6395
[FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase
## What is the purpose of the change
This PR makes a few modifications to the `ZooKeeperHighAvailabilityITCase` to reduce the chance of intermittent test failures and timeouts.
Changes:
## 1)
The test was moving files out of the HA storage directory with a simple loop using `File#renameTo`. The test enforced that every move succeeded; however, since old checkpoints may be deleted asynchronously, this may not always be the case.
We now use a `FileVisitor` and ignore `IOExceptions` that occur while moving.
If no checkpoint file could be moved the test will still fail.
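For illustration, the approach in 1) could look roughly like the sketch below. This is not the code from the PR: the class name `CheckpointMover`, the method `moveFilesIgnoringFailures`, and the flat target layout are assumptions made for this example. It only uses the standard `java.nio.file` API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Flink or of this PR.
public class CheckpointMover {

    /**
     * Moves every regular file found under sourceDir into targetDir and
     * returns how many files were actually moved. Failures on individual
     * files are ignored, since a file may be deleted concurrently.
     * Assumes targetDir is not located inside sourceDir.
     */
    public static int moveFilesIgnoringFailures(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        AtomicInteger moved = new AtomicInteger();

        Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                try {
                    Files.move(file, targetDir.resolve(file.getFileName()));
                    moved.incrementAndGet();
                } catch (IOException ignored) {
                    // The file may have been removed asynchronously; skip it.
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // The file vanished before it could be visited; keep walking.
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                // Ignore errors that occurred while iterating the directory.
                return FileVisitResult.CONTINUE;
            }
        });

        return moved.get();
    }
}
```

A caller would then fail the test only when the returned count is zero, which matches the "if no checkpoint file could be moved" condition above.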
## 2)
After the checkpoint files were moved out of the HA storage directory the job is thrown into a restart loop. To verify the restart behavior the test was polling the job state and checking for the `RESTARTING` and `FAILING` states.
Because the job is small, it stays in these states only for a short time, which effectively adds a race condition: the polling loop can miss them and may run for longer than anticipated. The largest outlier I got locally was 50 seconds, which isn't _that_ far off from the 2 minute timeout. I suspect this to be the failure cause raised in the JIRA, but I can't guarantee it.
Instead we now access the `fullRestarts` metric using a custom reporter to check how many restarts have occurred. The actual _state transitions_ should be irrelevant to the test.
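To show why a counter is more robust than polling for transient states, here is a minimal sketch in plain Java. It is not the code from the PR and does not use Flink's reporter API: the class `RestartCountWaiter`, the method `awaitRestarts`, and the `LongSupplier` standing in for the value of the `fullRestarts` metric are all assumptions made for this example. Since a restart count only grows, a slow poll cannot miss a restart the way it can miss a short-lived state.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Flink or of this PR.
public class RestartCountWaiter {

    /**
     * Polls restartCount until it reports at least minRestarts and returns
     * the observed value. Throws TimeoutException once the deadline passes.
     * The supplier is a stand-in for however the test reads the metric.
     */
    public static long awaitRestarts(LongSupplier restartCount, long minRestarts, Duration timeout)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            long current = restartCount.getAsLong();
            if (current >= minRestarts) {
                return current;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "Saw only " + current + " restarts, expected at least " + minRestarts);
            }
            // Short pause between polls; the count is retained, so nothing is lost.
            Thread.sleep(50);
        }
    }
}
```

In a test, the supplier would read whatever value the custom reporter last captured for the metric, and the timeout would sit well below the overall test timeout.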
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 9900
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6395.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6395
----
commit b8827dc3723558c52ad567bf88f24ae34129ea08
Author: zentol <ch...@...>
Date: 2018-07-23T14:21:32Z
[FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase
----
---