
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/05/10 12:13:35 UTC

[GitHub] [flink] zentol opened a new pull request #8412: [FLINK-12111][tests] Harden AbstractTaskManagerProcessFailureRecoveryTest

URL: https://github.com/apache/flink/pull/8412

## What is the purpose of the change

An assortment of changes to harden the `AbstractTaskManagerProcessFailureRecoveryTest`.

## Brief change log

* removed an unused field
* no longer sets `taskManagerProcess1` to null, so that the process output is printed on failure
* wait until the destroyed process has actually shut down
  Prevents theoretical scenarios where the job can finish because the `destroy()` call takes a while to take effect.
* reduce the number of initial TMs to 1
  The batch test could still succeed (if ExecutionMode == BATCH) even if the new TM was never used.
  Reducing the number of initial TMs to 1 means that once that TM crashes, all tasks MUST be moved to the new TM.
  Doubled the number of slots to compensate for the loss of a TM.
* allow 2 restarts
  For some reason this test could fail multiple times, instead of just once.
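
For readers unfamiliar with the "wait until the destroyed process has actually shut down" item, here is a minimal sketch of the general pattern in plain Java. It is not the code from this PR. The class name `DestroyAndAwait`, the helper `destroyAndAwait`, the 10 second timeout, and the `sleep` stand-in process are all assumptions made for illustration. The idea is that `Process.destroy()` only requests termination, so a test that needs the process to be gone should block on `waitFor` before continuing.

```java
import java.util.concurrent.TimeUnit;

public class DestroyAndAwait {

    /**
     * Requests termination of the process and blocks until it has exited.
     * If the process is still alive after the timeout, escalates to
     * destroyForcibly() and waits without a timeout.
     * (Hypothetical helper, not part of Flink.)
     */
    static int destroyAndAwait(Process process, long timeout, TimeUnit unit)
            throws InterruptedException {
        // destroy() returns immediately; the process may keep running for a while.
        process.destroy();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();
            process.waitFor();
        }
        // Safe to call here because the process has terminated.
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a TaskManager process; assumes a Unix-like `sleep` binary.
        Process child = new ProcessBuilder("sleep", "60").start();
        destroyAndAwait(child, 10, TimeUnit.SECONDS);
        System.out.println("alive after destroy: " + child.isAlive());
    }
}
```

Blocking like this closes the window in which the job could still finish on a process that has been asked to stop but is still running.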

## Verifying this change

The issue with the BATCH execution mode could be reproduced easily (just skip the start of the third TM), and the change should fix this in an obvious way.

The restart fix is basically a shot in the dark + band-aid; ideally we would find the underlying cause.
