You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/07/17 11:46:20 UTC

[GitHub] flink pull request #6352: [FLINK-8163][yarn][tests] Harden tests against slo...

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/6352

    [FLINK-8163][yarn][tests] Harden tests against slow job shutdowns

    ## What is the purpose of the change
    
    This PR hardens the `YarnTestBase` against jobs that just don't want to shut down that quickly (i.e. within 500ms).
    The maximum waiting time has been increase to 10 seconds, during which we periodically check the state of all applications.
    
    Additionally, the failure condition from `@Before` was moved to the `@After` method.
    
    This change will allow us to better differentiate between simple timing issues and unsuccessful job shutdowns.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 8163

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6352.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6352
    
----
commit 0dd65378f0c9f477bb8f5712bbc0b1f31440f5f0
Author: zentol <ch...@...>
Date:   2018-07-17T11:29:16Z

    [FLINK-8163][yarn][tests] Harden tests against slow job shutdowns

----


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    the test failures may highlight tests that weren't shutting down the last application properly; previously this would've succeeded since the check was done in `@Before`.


---

[GitHub] flink pull request #6352: [FLINK-9815][yarn][tests] Harden tests against slo...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6352#discussion_r203293425
  
    --- Diff: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java ---
    @@ -186,39 +189,56 @@ public static void populateYarnSecureConfigurations(Configuration conf, String p
     		conf.set("hadoop.security.auth_to_local", "RULE:[1:$1] RULE:[2:$1]");
     	}
     
    -	/**
    -	 * Sleep a bit between the tests (we are re-using the YARN cluster for the tests).
    -	 */
    -	@After
    -	public void sleep() {
    -		try {
    -			Thread.sleep(500);
    -		} catch (InterruptedException e) {
    -			Assert.fail("Should not happen");
    -		}
    -	}
    -
     	@Before
    -	public void checkClusterEmpty() throws IOException, YarnException {
    +	public void checkClusterEmpty() {
     		if (yarnClient == null) {
     			yarnClient = YarnClient.createYarnClient();
     			yarnClient.init(getYarnConfiguration());
     			yarnClient.start();
     		}
     
    -		List<ApplicationReport> apps = yarnClient.getApplications();
    -		for (ApplicationReport app : apps) {
    -			if (app.getYarnApplicationState() != YarnApplicationState.FINISHED
    -					&& app.getYarnApplicationState() != YarnApplicationState.KILLED
    -					&& app.getYarnApplicationState() != YarnApplicationState.FAILED) {
    -				Assert.fail("There is at least one application on the cluster is not finished." +
    -						"App " + app.getApplicationId() + " is in state " + app.getYarnApplicationState());
    +		flinkConfiguration = new org.apache.flink.configuration.Configuration(globalConfiguration);
    +
    +		isNewMode = Objects.equals(TestBaseUtils.CodebaseType.NEW, TestBaseUtils.getCodebaseType());
    +	}
    +
    +	/**
    +	 * Sleep a bit between the tests (we are re-using the YARN cluster for the tests).
    +	 */
    +	@After
    +	public void sleep() throws IOException, YarnException {
    +		Deadline deadline = Deadline.now().plus(Duration.ofSeconds(10));
    +
    +		boolean isAnyJobRunning = yarnClient.getApplications().stream()
    +			.anyMatch(YarnTestBase::isApplicationRunning);
    +
    +		while (deadline.hasTimeLeft() && isAnyJobRunning) {
    +			try {
    +				Thread.sleep(500);
    +			} catch (InterruptedException e) {
    +				Assert.fail("Should not happen");
     			}
    +			isAnyJobRunning = yarnClient.getApplications().stream()
    +				.anyMatch(YarnTestBase::isApplicationRunning);
     		}
     
    -		flinkConfiguration = new org.apache.flink.configuration.Configuration(globalConfiguration);
    +		if (isAnyJobRunning) {
    +			final Optional<ApplicationReport> runningApp = yarnClient.getApplications().stream()
    +				.filter(YarnTestBase::isApplicationRunning)
    +				.findAny();
    +			if (runningApp.isPresent()) {
    +				final ApplicationReport app = runningApp.get();
    +				Assert.fail("There is at least one application on the cluster that is not finished." +
    +					"App " + app.getApplicationId() + " is in state " + app.getYarnApplicationState());
    --- End diff --
    
    Could we log all running applications instead of any?


---

[GitHub] flink pull request #6352: [FLINK-9815][yarn][tests] Harden tests against slo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/6352


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    will look into them tomorrow


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    The failing test case seems to be caused by `YARNSessionFIFOITCase#testJavaAPI` not calling `yarnCluster.shutdownCluster`. I've fixed this problem in master and the release branch. Rebasing onto the latest master should hopefully give a green Travis build.


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    The `YARNHighAvailabilityITCase` still has the same problem.


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    Interesting. I thought I would have executed `flink-yarn-tests` locally with all tests. But it looks as if `YARNHighAvailabilityITCase` suffers from the same problem as the `YARNSessionFIFOITcase#testJavaAPI` test.


---

[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6352
  
    yay travis is green.


---