Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/04 11:38:10 UTC

[jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint

    [ https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361196#comment-15361196 ] 

ASF GitHub Bot commented on FLINK-3397:
---------------------------------------

GitHub user ramkrish86 opened a pull request:

    https://github.com/apache/flink/pull/2195

    FLINK-3397 Failed streaming jobs should fall back to the most recent

    Initial patch to check whether this is what the JIRA intends; I thought opening a PR would help me get better feedback. I tried to adapt an existing test case into a new one but could not get it to work. I followed what was done in SavePointITCase, particularly testRestoreFailure(), but I am not able to produce a flow in which there is both a checkpoint and a savepoint: that test case lets the notification happen when the job is removed, and that clears all the existing savepoints. So when the test case restores, it always uses the savepoint.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ramkrish86/flink FLINK-3397

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2195
    
----
commit 70e881fba6ab1964600b4fc932a8f7b683e2ff1e
Author: Ramkrishna <ra...@intel.com>
Date:   2016-07-04T11:32:11Z

    FLINK-3397 Failed streaming jobs should fall back to the most recent
    checkpoint/savepoint (Ram)

----


> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Priority: Minor
>
> The current fallback behaviour in case of a streaming job failure is slightly counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any), even if a more recent savepoint was taken. This means that the system does not regard savepoints as checkpoints, only as points from which a job can be manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints in case of a failure and are also used to automatically restore the streaming job.
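The difference between the current and the proposed fallback can be sketched as a choice over the list of completed snapshots. This is a minimal, hypothetical model, not Flink's checkpoint or savepoint coordinator code: the FallbackSketch class, the Snapshot record, the Kind enum, and both policy methods are illustrative names. It also assumes that checkpoints and savepoints share one monotonically increasing id, so a higher id means a more recent snapshot.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the restore decision discussed in FLINK-3397.
// None of these types or methods are Flink APIs.
class FallbackSketch {

    // Kind of completed snapshot (illustrative, not a Flink type).
    enum Kind { CHECKPOINT, SAVEPOINT }

    // Assumption: checkpoints and savepoints share one monotonically
    // increasing id, so a higher id means a more recent snapshot.
    record Snapshot(long id, Kind kind) {}

    // Behaviour described in the issue: only regular checkpoints are
    // considered when a failed job is restored automatically.
    static Optional<Snapshot> currentPolicy(List<Snapshot> completed) {
        return completed.stream()
                .filter(s -> s.kind() == Kind.CHECKPOINT)
                .max(Comparator.comparingLong(Snapshot::id));
    }

    // Proposed behaviour: savepoints count as checkpoints, so the most
    // recent snapshot of either kind is used.
    static Optional<Snapshot> proposedPolicy(List<Snapshot> completed) {
        return completed.stream()
                .max(Comparator.comparingLong(Snapshot::id));
    }

    public static void main(String[] args) {
        // Checkpoints 4 and 5 completed, then a savepoint was taken as snapshot 6.
        List<Snapshot> completed = List.of(
                new Snapshot(4, Kind.CHECKPOINT),
                new Snapshot(5, Kind.CHECKPOINT),
                new Snapshot(6, Kind.SAVEPOINT));

        // Print the id each policy would restore from (-1 if there is none).
        System.out.println("current: " + currentPolicy(completed).map(Snapshot::id).orElse(-1L));
        System.out.println("proposed: " + proposedPolicy(completed).map(Snapshot::id).orElse(-1L));
    }
}
```

Running `main` prints `current: 5` followed by `proposed: 6`: under the behaviour described in the issue, the job would restore from checkpoint 5 even though savepoint 6 is more recent. Ordering by a shared id is only one way to define "most recent"; a comparator on completion timestamps would express the same policy if the ids were not shared.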



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)