Posted to issues@flink.apache.org by "Matthias (Jira)" <ji...@apache.org> on 2020/12/28 13:59:00 UTC
[jira] [Comment Edited] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream
[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255595#comment-17255595 ]
Matthias edited comment on FLINK-20654 at 12/28/20, 1:58 PM:
-------------------------------------------------------------
For the sake of completeness: [This build|https://dev.azure.com/mapohl/flink/_build/results?buildId=143&view=logs&j=6e55a443-5252-5db5-c632-109baf464772&t=9df6efca-61d0-513a-97ad-edb76d85786a&l=8813] and [that build|https://dev.azure.com/mapohl/flink/_build/results?buildId=144&view=logs&j=6e55a443-5252-5db5-c632-109baf464772&t=9df6efca-61d0-513a-97ad-edb76d85786a&l=8807] failed due to {{IndexOutOfBoundsException}} (FLINK-20662).
was (Author: mapohl):
For the sake of completeness: [Build failed|https://dev.azure.com/mapohl/flink/_build/results?buildId=143&view=logs&j=6e55a443-5252-5db5-c632-109baf464772&t=9df6efca-61d0-513a-97ad-edb76d85786a&l=8813] due to {{IndexOutOfBoundsException}} (FLINK-20662).
> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.0
> Reporter: Arvid Heise
> Assignee: Roman Khachatryan
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.13.0, 1.12.1
>
>
> The fix for FLINK-20433 reveals potential corruption after recovery in all variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple of hundred times. The issue showed up for me in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests in one of the following ways:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException due to overflow (because a number that should be in [0;100000] has been mis-deserialized)
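To illustrate the last symptom: a minimal sketch, not taken from the Flink code base, of how reading a serialized number from a shifted byte offset can produce a value far outside [0;100000] and trigger an ArithmeticException in an overflow-checked conversion. The class {{MisalignedReadDemo}}, its helper methods, and the fixed-width 8-byte encoding are assumptions made for illustration. The report does not say which check throws in the test, so {{Math.toIntExact}} is only a stand-in.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; none of these names come from Flink.
public class MisalignedReadDemo {

    // Serializes the values as consecutive 8-byte big-endian longs.
    public static byte[] serialize(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long value : values) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    // Reads one 8-byte long starting at the given byte offset.
    public static long readAt(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(1L, 100_000L);

        // Correct record boundary: yields the second value as written.
        long aligned = readAt(bytes, Long.BYTES);

        // One byte too early: the read spans the end of the first record
        // and the start of the second, so the result is garbage.
        long shifted = readAt(bytes, Long.BYTES - 1);

        System.out.println("aligned = " + aligned);
        System.out.println("shifted = " + shifted);

        try {
            // Stand-in for whatever overflow-checked arithmetic the test uses.
            Math.toIntExact(shifted);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

In this sketch the shifted read picks up the last byte of the first record as its most significant byte, which is why the value is far too large for an int. A corrupted stream after recovery could plausibly shift reads in a similar way, but the actual root cause is not described in this message.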
--
This message was sent by Atlassian Jira
(v8.3.4#803005)