Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/07/23 14:38:52 UTC

[GitHub] flink pull request #6395: [FLINK-9900][tests] Harden ZooKeeperHighAvailabili...

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/6395

    [FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase

    ## What is the purpose of the change
    
    This PR makes a few modifications to the `ZooKeeperHighAvailabilityITCase` to reduce the chance of intermittent test failures and timeouts.
    
    Changes:
    ## 1)
    The test moved files out of the HA storage directory with a simple loop using `File#renameTo` and enforced that every move succeeded. However, since old checkpoints may be deleted asynchronously, a move can legitimately fail.
    We now use a `FileVisitor` and ignore any `IOException` that occurs while moving.
    If no checkpoint file could be moved, the test will still fail.
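
The `FileVisitor` approach described in 1) could look roughly like the sketch below. This is not the code from the PR: the class name `CheckpointMover` and the method `moveAll` are hypothetical, and the sketch assumes that the source is a directory and that the target lies outside of it. It returns the number of files that were moved, so a caller can still fail when that number is zero.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of Flink or of this PR.
final class CheckpointMover {

    /**
     * Moves every file found under {@code source} to the same relative
     * location under {@code target}. Failures of individual moves are
     * ignored. Returns the number of files that were actually moved.
     */
    static int moveAll(Path source, Path target) throws IOException {
        AtomicInteger moved = new AtomicInteger();

        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                try {
                    Path destination = target.resolve(source.relativize(file));
                    Files.createDirectories(destination.getParent());
                    Files.move(file, destination);
                    moved.incrementAndGet();
                } catch (IOException ignored) {
                    // The file may have been deleted concurrently; keep going.
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // A file that vanished before it could be visited is not an error here.
                return FileVisitResult.CONTINUE;
            }
        });

        return moved.get();
    }
}
```

A test using such a helper would assert that the returned count is greater than zero, which matches the "at least one checkpoint file was moved" requirement above.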
    
    ## 2)
    After the checkpoint files are moved out of the HA storage directory, the job is thrown into a restart loop. To verify the restart behavior, the test polled the job state and checked for the `RESTARTING` and `FAILING` states.
    Because the job is small, it is in these states only for a short time, which effectively adds a race condition. As a result this loop may run for longer than anticipated; the largest outlier I got locally was 50 seconds, which isn't _that_ far off from the 2 minute timeout. I suspect this to be the cause of the failure raised in the JIRA, but I can't guarantee it.
    Instead we now access the `fullRestarts` metric using a custom reporter to check how many restarts have occurred. The actual _state transitions_ should be irrelevant to the test.
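
The idea behind 2) is that a restart count only ever grows, so a polling loop cannot miss it the way it can miss a short-lived job state. The sketch below shows that pattern without depending on Flink: `RestartCountProbe`, `register` and `awaitAtLeast` are hypothetical names, and a plain `LongSupplier` stands in for the gauge that a custom metric reporter would capture. It does not show Flink's `MetricReporter` interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a test-only metric reporter; not Flink's API.
final class RestartCountProbe {

    // Gauges captured by name, e.g. when a reporter is notified of a new metric.
    private static final Map<String, LongSupplier> GAUGES = new ConcurrentHashMap<>();

    static void register(String name, LongSupplier gauge) {
        GAUGES.put(name, gauge);
    }

    /**
     * Polls the named gauge until it reports at least {@code expected}
     * or the timeout elapses. Returns true if the value was reached.
     */
    static boolean awaitAtLeast(String name, long expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            LongSupplier gauge = GAUGES.get(name);
            if (gauge != null && gauge.getAsLong() >= expected) {
                return true;
            }
            Thread.sleep(10);
        }
        return false;
    }
}
```

Since the check is `>=` on a monotonically increasing value, the polling interval affects only how quickly the test notices the restarts, not whether it notices them.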

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 9900

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6395
    
----
commit b8827dc3723558c52ad567bf88f24ae34129ea08
Author: zentol <ch...@...>
Date:   2018-07-23T14:21:32Z

    [FLINK-9900][tests] Harden ZooKeeperHighAvailabilityITCase

----

