Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/12 11:19:04 UTC

[jira] [Commented] (FLINK-6565) Improve error messages for state restore failures

    [ https://issues.apache.org/jira/browse/FLINK-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007979#comment-16007979 ] 

ASF GitHub Bot commented on FLINK-6565:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/3882

    [FLINK-6565] Fail memory-backed state restores with meaningful message if previous serializer is unavailable

    Currently, without eager state registration, if the previous state serializer cannot be loaded when restoring memory-backed state (`DefaultOperatorStateBackend`, `HeapKeyedStateBackend`), perhaps because its implementation changed or it was simply removed from the classpath, the only option is to fail the job, because there is no serializer with which to read the previous state.
    
    Prior to this PR, the job failed as it should, but without a meaningful message (only an NPE).
    This PR adds a more meaningful message to the failure. It also adds tests for the memory-backed backends that verify the failure is as expected.
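
As a rough illustration of the idea described above (not the code from this pull request), the sketch below replaces an implicit NPE with an explicit, descriptive failure. The class `RestoreCheck`, the method `requireSerializer`, and the message wording are all hypothetical and are not Flink API.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not part of Flink.
class RestoreCheck {

    /**
     * Returns the restored serializer, or fails with a descriptive message
     * when it could not be loaded (represented here by null).
     */
    static <T> T requireSerializer(T restoredSerializer, String stateName) throws IOException {
        if (restoredSerializer == null) {
            // Fail explicitly here instead of letting a later dereference throw an NPE.
            throw new IOException(
                "Unable to restore state [" + stateName + "]: the previous serializer "
                    + "of the state could not be loaded. It must be present on the "
                    + "classpath to read the previous state.");
        }
        return restoredSerializer;
    }
}
```

The job still fails in this case, as before; only the reported reason changes.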

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-6565

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3882
    
----
commit 8f032dae560713af7d5d77dd4b85fb367332fa63
Author: Tzu-Li (Gordon) Tai <tz...@apache.org>
Date:   2017-05-12T11:11:25Z

    [FLINK-6565] Fail memory-backed state restores with meaningful message if previous serializer is unavailable

----


> Improve error messages for state restore failures
> -------------------------------------------------
>
>                 Key: FLINK-6565
>                 URL: https://issues.apache.org/jira/browse/FLINK-6565
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>
> The error messages thrown when a state restore fails need to be more explicit and clear about the actual reason.
> At least two cases have been seen so far:
> 1. When restoring operator state or memory-backed keyed state, the previous serializer must exist. If it doesn't, only a vague NPE is currently thrown, without a clear message stating the actual reason.
> 2. If the restore failure is due to an incompatible version of a serializer's config snapshot, the error should report something more informative than: "Incompatible version: found 1, required 1."
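
For the second case, one possible shape of a more informative message is sketched below. This is only an assumption about what "more informative" could mean: the class `VersionCheck`, the method `checkVersion`, its parameters, and the message text are hypothetical and not taken from Flink.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Flink.
class VersionCheck {

    /**
     * Fails with a message that names the component being restored, the
     * version found in the snapshot, and every version that can be read.
     */
    static void checkVersion(String component, int found, int[] readableVersions)
            throws IOException {
        for (int version : readableVersions) {
            if (version == found) {
                return; // the snapshot version is readable
            }
        }
        throw new IOException(
            "Cannot restore " + component + ": the snapshot was written with version "
                + found + ", but only versions " + Arrays.toString(readableVersions)
                + " can be read.");
    }
}
```

Naming the component in the message lets the reader tell which snapshot's version check failed, which a bare "found 1, required 1" does not.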



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)