Posted to dev@flink.apache.org by "Xiang Gao (Jira)" <ji...@apache.org> on 2020/09/21 04:10:00 UTC

[jira] [Created] (FLINK-19300) Timer loss after restoring from savepoint

Xiang Gao created FLINK-19300:
---------------------------------

             Summary: Timer loss after restoring from savepoint
                 Key: FLINK-19300
                 URL: https://issues.apache.org/jira/browse/FLINK-19300
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
            Reporter: Xiang Gao


While using heap-based timers, we are seeing occasional timer loss after restoring the program from a savepoint, especially when using remote savepoint storage (S3).

After some investigation, the issue seems to be related to [this line in deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65]. When checking the VERSIONED_IDENTIFIER, the input stream is not guaranteed to fill the byte array in a single read, which causes timers to be dropped for the affected key group.

We should consider reading until the expected number of bytes has been read or the end of the stream has been reached.
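
A minimal sketch of the suggested read loop, for illustration only. It is not the actual Flink code or a proposed patch; the class and method names (ReadFullySketch, readUpTo, OneByteAtATime) are hypothetical. It relies only on the documented InputStream contract that read(byte[], int, int) may return fewer bytes than requested and returns -1 at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadFullySketch {

    /**
     * Keeps reading until buf is full or the stream ends.
     * Returns the number of bytes actually placed in buf.
     */
    static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream reached before buf was filled
            }
            total += n;
        }
        return total;
    }

    /**
     * Hypothetical test stream that hands out at most one byte per bulk read,
     * imitating the short reads a remote store may produce.
     */
    static final class OneByteAtATime extends InputStream {
        private final InputStream delegate;

        OneByteAtATime(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            return delegate.read(b, off, 1); // deliberately short read
        }
    }
}
```

With a stream like OneByteAtATime, a single in.read(buf) call leaves most of a multi-byte buffer unfilled, while readUpTo fills it completely. The caller can compare the returned count with buf.length to distinguish a complete identifier from a stream that ended early.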



--
This message was sent by Atlassian Jira
(v8.3.4#803005)