Posted to issues@flink.apache.org by gyfora <gi...@git.apache.org> on 2015/06/09 23:40:37 UTC

[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

GitHub user gyfora opened a pull request:

    https://github.com/apache/flink/pull/812

    [streaming] Allow force-enabling checkpoints for iterative jobs

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gyfora/flink iteration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #812
    
----
commit 3d547048901b05b2bd532bae94dc1bc552320b08
Author: Gyula Fora <gy...@apache.org>
Date:   2015-06-09T21:40:07Z

    [streaming] Allow force-enabling checkpoints for iterative jobs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by gyfora <gi...@git.apache.org>.

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/812#discussion_r32090193
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java ---
    @@ -241,6 +241,31 @@ public StreamExecutionEnvironment enableCheckpointing(long interval) {
     		streamGraph.setCheckpointingInterval(interval);
     		return this;
     	}
    +	
    +	/**
    +	 * Method for force-enabling fault-tolerance. Activates monitoring and
    +	 * backup of streaming operator states even for jobs containing iterations.
    +	 * 
    +	 * Please note that the checkpoint/restore guarantees for iterative jobs are
    +	 * only best-effort at the moment. Records inside the loops may be lost
    +	 * during failure.
    +	 * <p/>
    +	 * <p/>
    +	 * Setting this option assumes that the job is used in production and thus
    +	 * if not stated explicitly otherwise with calling with the
    +	 * {@link #setNumberOfExecutionRetries(int numberOfExecutionRetries)} method
    +	 * in case of failure the job will be resubmitted to the cluster
    +	 * indefinitely.
    +	 * 
    +	 * @param interval
    +	 *            Time interval between state checkpoints in millis
    +	 */
    +	public StreamExecutionEnvironment enableCheckpointing(long interval, boolean force) {
    --- End diff --
    
    Ah, you are right, fixing.



[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by rmetzger <gi...@git.apache.org>.

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/812#discussion_r32089514
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java ---
    @@ -241,6 +241,31 @@ public StreamExecutionEnvironment enableCheckpointing(long interval) {
     		streamGraph.setCheckpointingInterval(interval);
     		return this;
     	}
    +	
    +	/**
    +	 * Method for force-enabling fault-tolerance. Activates monitoring and
    +	 * backup of streaming operator states even for jobs containing iterations.
    +	 * 
    +	 * Please note that the checkpoint/restore guarantees for iterative jobs are
    +	 * only best-effort at the moment. Records inside the loops may be lost
    +	 * during failure.
    +	 * <p/>
    +	 * <p/>
    +	 * Setting this option assumes that the job is used in production and thus
    +	 * if not stated explicitly otherwise with calling with the
    +	 * {@link #setNumberOfExecutionRetries(int numberOfExecutionRetries)} method
    +	 * in case of failure the job will be resubmitted to the cluster
    +	 * indefinitely.
    +	 * 
    +	 * @param interval
    +	 *            Time interval between state checkpoints in millis
    +	 */
    +	public StreamExecutionEnvironment enableCheckpointing(long interval, boolean force) {
    --- End diff --
    
    the second argument `force` is not respected by the system
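
To illustrate the point of this comment, here is a minimal sketch of what it could mean for a `force` flag to be respected: the flag is stored when checkpointing is enabled and consulted when the job is validated. Everything in it is hypothetical, including the class `CheckpointSettingsSketch`, its fields, and the `validate()` step. Rejecting non-forced iterative jobs with an exception is an assumption made for the example; this is not the code from the pull request.

```java
// Hypothetical stand-in for the environment/graph settings; not a Flink class.
final class CheckpointSettingsSketch {
    private final boolean jobHasIterations;
    private long checkpointInterval = -1L;      // -1 means checkpointing is off
    private boolean forceCheckpointing = false;

    CheckpointSettingsSketch(boolean jobHasIterations) {
        this.jobHasIterations = jobHasIterations;
    }

    // Stores BOTH arguments; dropping `force` here is the kind of bug the comment describes.
    CheckpointSettingsSketch enableCheckpointing(long interval, boolean force) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.checkpointInterval = interval;
        this.forceCheckpointing = force;
        return this;
    }

    boolean isCheckpointingEnabled() {
        return checkpointInterval > 0;
    }

    // Assumed policy: iterative jobs may only be checkpointed when forced.
    void validate() {
        if (isCheckpointingEnabled() && jobHasIterations && !forceCheckpointing) {
            throw new UnsupportedOperationException(
                "Checkpointing an iterative job requires force = true");
        }
    }

    public static void main(String[] args) {
        // An iterative job with force = true passes validation.
        new CheckpointSettingsSketch(true).enableCheckpointing(500L, true).validate();
        System.out.println("forced iterative job accepted");
    }
}
```

A test for such a flag would enable checkpointing on an iterative job once with `force = false` and once with `force = true`, and check that only the second configuration is accepted.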



[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by mbalassi <gi...@git.apache.org>.

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/812#issuecomment-110595076
  
    Makes sense.



[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/812



[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by aljoscha <gi...@git.apache.org>.

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/812#issuecomment-110644940
  
    I would be against merging this in the current form. What I propose is to analyse the topology to verify that there are no checkpointed operators inside iterations. Operators before and after iterations can be checkpointed and we can safely allow the user to enable checkpointing.
    
    If we have the code to analyse which operators are inside iterations we could also disallow windows inside iterations. I think windows inside iterations don't make sense since elements in different "iterations" would end up in the same window. Maybe I'm wrong here though, then please correct me. :smile: 
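
The topology analysis proposed above could be sketched roughly as follows, assuming the job is modelled as a plain directed graph in which an iteration shows up as a cycle (the feedback edge from the iteration tail back to its head). The class `IterationCheckSketch`, the string operator names, and the adjacency-map representation are all hypothetical and chosen for illustration; none of this is Flink's StreamGraph API.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; operators are identified by name for simplicity.
final class IterationCheckSketch {

    // An operator is "inside an iteration" if it can reach itself along directed edges.
    static Set<String> operatorsInsideIterations(Map<String, List<String>> edges) {
        Set<String> inside = new TreeSet<>();
        for (String operator : edges.keySet()) {
            if (reaches(edges, operator, operator)) {
                inside.add(operator);
            }
        }
        return inside;
    }

    // Graph search starting from the successors of `from`, looking for `target`.
    private static boolean reaches(Map<String, List<String>> edges, String from, String target) {
        Deque<String> pending =
            new ArrayDeque<>(edges.getOrDefault(from, Collections.<String>emptyList()));
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(target)) {
                return true;
            }
            if (seen.add(current)) {
                pending.addAll(edges.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        return false;
    }

    // Checkpointing is considered safe if no checkpointed operator lies on a cycle.
    static boolean safeToCheckpoint(Map<String, List<String>> edges,
                                    Set<String> checkpointedOperators) {
        return Collections.disjoint(operatorsInsideIterations(edges), checkpointedOperators);
    }
}
```

The same set of in-iteration operators could also be used to reject windows placed inside iterations. The per-operator search is quadratic in the worst case, which seems acceptable for a job graph of ordinary size; a strongly-connected-components pass would be the usual alternative for larger graphs.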



[GitHub] flink pull request: [streaming] Allow force-enabling checkpoints f...

Posted by gyfora <gi...@git.apache.org>.

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/812#discussion_r32091115
  
    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java ---
    @@ -241,6 +241,31 @@ public StreamExecutionEnvironment enableCheckpointing(long interval) {
     		streamGraph.setCheckpointingInterval(interval);
     		return this;
     	}
    +	
    +	/**
    +	 * Method for force-enabling fault-tolerance. Activates monitoring and
    +	 * backup of streaming operator states even for jobs containing iterations.
    +	 * 
    +	 * Please note that the checkpoint/restore guarantees for iterative jobs are
    +	 * only best-effort at the moment. Records inside the loops may be lost
    +	 * during failure.
    +	 * <p/>
    +	 * <p/>
    +	 * Setting this option assumes that the job is used in production and thus
    +	 * if not stated explicitly otherwise with calling with the
    +	 * {@link #setNumberOfExecutionRetries(int numberOfExecutionRetries)} method
    +	 * in case of failure the job will be resubmitted to the cluster
    +	 * indefinitely.
    +	 * 
    +	 * @param interval
    +	 *            Time interval between state checkpoints in millis
    +	 */
    +	public StreamExecutionEnvironment enableCheckpointing(long interval, boolean force) {
    --- End diff --
    
    I added a test for this

