Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2017/08/08 08:08:37 UTC

[GitHub] flink pull request #4497: [FLINK-7240] [tests] Stabilize ExternalizedCheckpo...

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/4497

    [FLINK-7240] [tests] Stabilize ExternalizedCheckpointITCase

    ## What is the purpose of the change
    
    Stabilize `ExternalizedCheckpointITCase`.
    
    ## Brief change log
    
    The problem was that, after canceling the job, the TestingCluster did not properly
    wait until the job had been completely removed from the cluster before the next job
    was submitted. This could lead to a NoResourceAvailableException, which ultimately
    made the job fail.
    
    ## Verifying this change
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixExternalizedCheckpointITCase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4497.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4497
    
----
commit 2a8adef594c965182324d3a02c1979e118ebf850
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-08-08T08:04:34Z

    [FLINK-7240] [tests] Stabilize ExternalizedCheckpointITCase
    
    The problem was that, after canceling the job, the TestingCluster did not properly
    wait until the job had been completely removed from the cluster before the next job
    was submitted. This could lead to a NoResourceAvailableException, which ultimately
    made the job fail.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4497: [FLINK-7240] [tests] Stabilize ExternalizedCheckpointITCa...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/4497
  
    Thanks for the review @StefanRRichter. Travis passes now. Merging this PR.



[GitHub] flink issue #4497: [FLINK-7240] [tests] Stabilize ExternalizedCheckpointITCa...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/4497
  
    Thanks for looking into this, @tillrohrmann! The fix looks good to me. +1 for merging.



[GitHub] flink issue #4497: [FLINK-7240] [tests] Stabilize ExternalizedCheckpointITCa...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/4497
  
    I actually think that this does not fully stabilize the test case. The problem is the following: waiting for all `Tasks` to be in state `RUNNING` is necessary but not sufficient. The `StreamTask` may still not be running at that point, in which case the checkpoint gets rejected. I propose adding a `CountDownLatch` to the source function to signal when all sources are really running. Only then will we trigger the checkpoint.
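
The latch idea proposed in the comment above can be illustrated with a minimal, Flink-free sketch. It is not the code from this pull request: `LatchedSourceSketch`, `SimulatedSource`, and `sourcesRunningAtTrigger` are hypothetical names, and plain threads stand in for the parallel source instances. Each simulated source counts the latch down only once it has entered its run method, and the caller triggers its "checkpoint" only after the latch reaches zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; plain threads stand in for Flink source tasks.
class LatchedSourceSketch {

    /** Simulated source that signals the latch once it is really running. */
    static final class SimulatedSource implements Runnable {
        private final CountDownLatch runningLatch;
        private final AtomicInteger runningSources;
        private volatile boolean cancelled = false;

        SimulatedSource(CountDownLatch runningLatch, AtomicInteger runningSources) {
            this.runningLatch = runningLatch;
            this.runningSources = runningSources;
        }

        @Override
        public void run() {
            // Record that this source is running, then signal the waiting caller.
            runningSources.incrementAndGet();
            runningLatch.countDown();
            // Keep "emitting" until cancelled, like a long-running source loop.
            while (!cancelled) {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        void cancel() {
            cancelled = true;
        }
    }

    /**
     * Starts the given number of simulated sources and waits on the latch.
     * Returns how many sources were running at the moment a checkpoint would
     * be triggered, or -1 if the wait timed out or was interrupted.
     */
    static int sourcesRunningAtTrigger(int parallelism, long timeoutMillis) {
        CountDownLatch runningLatch = new CountDownLatch(parallelism);
        AtomicInteger runningSources = new AtomicInteger();
        SimulatedSource[] sources = new SimulatedSource[parallelism];

        for (int i = 0; i < parallelism; i++) {
            sources[i] = new SimulatedSource(runningLatch, runningSources);
            Thread thread = new Thread(sources[i], "simulated-source-" + i);
            thread.setDaemon(true);
            thread.start();
        }

        int result = -1;
        try {
            // Only "trigger the checkpoint" after every source has signalled.
            if (runningLatch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                result = runningSources.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (SimulatedSource source : sources) {
                source.cancel();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("sources running at trigger: " + sourcesRunningAtTrigger(4, 5000L));
    }
}
```

In a real Flink test the latch would presumably need to be reachable from the deployed source instances (for example via a static field), since the source function is serialized before it runs; that detail is an assumption and is not shown here.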



[GitHub] flink pull request #4497: [FLINK-7240] [tests] Stabilize ExternalizedCheckpo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4497

