Posted to issues@flink.apache.org by "Piotr Nowojski (Jira)" <ji...@apache.org> on 2020/01/02 12:29:00 UTC

[jira] [Comment Edited] (FLINK-15403) 'State Migration end-to-end test from 1.6' is unstable on travis.

    [ https://issues.apache.org/jira/browse/FLINK-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006758#comment-17006758 ] 

Piotr Nowojski edited comment on FLINK-15403 at 1/2/20 12:28 PM:
-----------------------------------------------------------------

Yes, indeed this is caused by FLINK-15317. I will prepare a fix for that.

The issue is that here, the {{CancelTaskException}} originates from a {{LocalInputChannel}}, from an upstream task that was already cancelled, while the downstream task has not (yet?) been cancelled.

In FLINK-15317 we assumed that {{CancelTaskException}} cannot occur in the stack trace of a task before that task was cancelled. Clearly that was a wrong assumption.

The code fix will be trivial, but it might be more difficult to add test coverage.
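
To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It does not use Flink's real classes: {{CancelRaceSketch}}, {{LocalPartition}}, {{DownstreamTask}} and the nested {{CancelTaskException}} are hypothetical stand-ins, and the returned status strings are invented for illustration. It only shows that a consumer reading a released local partition can see the exception before its own cancellation flag is set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelRaceSketch {

    /** Hypothetical stand-in for Flink's CancelTaskException (not the real class). */
    static class CancelTaskException extends RuntimeException {
        CancelTaskException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for an upstream partition read through a local channel. */
    static class LocalPartition {
        private final AtomicBoolean released = new AtomicBoolean(false);

        /** Called when the upstream task is cancelled and releases its partition. */
        void release() {
            released.set(true);
        }

        String getNextBuffer() {
            if (released.get()) {
                throw new CancelTaskException("Consumed partition has been released.");
            }
            return "buffer";
        }
    }

    /** Hypothetical downstream task with its own, independent cancellation flag. */
    static class DownstreamTask {
        private final AtomicBoolean canceled = new AtomicBoolean(false);
        private final LocalPartition input;

        DownstreamTask(LocalPartition input) {
            this.input = input;
        }

        void cancel() {
            canceled.set(true);
        }

        /** Attempts one read and reports which of the three situations occurred. */
        String processInput() {
            try {
                input.getNextBuffer();
                return "OK";
            } catch (CancelTaskException e) {
                // The assumption in question: this branch is only reached when canceled is true.
                return canceled.get() ? "EXPECTED_CANCEL" : "CANCEL_BEFORE_CANCELED";
            }
        }
    }

    public static void main(String[] args) {
        LocalPartition partition = new LocalPartition();
        DownstreamTask downstream = new DownstreamTask(partition);

        System.out.println(downstream.processInput()); // normal read

        partition.release(); // upstream is cancelled first
        System.out.println(downstream.processInput()); // downstream not yet cancelled

        downstream.cancel(); // cancellation reaches the downstream task later
        System.out.println(downstream.processInput());
    }
}
```

The second read is the case from the log: the exception arrives while the downstream task's own flag is still false, so treating it as a bug on the consumer side is too strict.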


was (Author: pnowojski):
Yes, indeed this is caused by FLINK-15317. I will prepare a fix for that.

> 'State Migration end-to-end test from 1.6' is unstable on travis.
> -----------------------------------------------------------------
>
>                 Key: FLINK-15403
>                 URL: https://issues.apache.org/jira/browse/FLINK-15403
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.10.0
>            Reporter: Xintong Song
>            Assignee: Piotr Nowojski
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.10.0
>
>
> -api.travis-ci.org/v3/job/629576631/log.txt-
> https://api.travis-ci.org/v3/job/631346939/log.txt
> The test case fails because the log contains the following error message.
> {code}
> 2019-12-26 09:19:35,537 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask           - Received CancelTaskException while we are not canceled. This is a bug and should be reported
> org.apache.flink.runtime.execution.CancelTaskException: Consumed partition PipelinedSubpartitionView(index: 0) of ResultPartition 3886657fb8cc980139fac67e32d6e380@8cfcbe851fe3bb3fa00e9afc370bd963 has been released.
> 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:190)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:509)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:487)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNext(SingleInputGate.java:475)
> 	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.pollNext(InputGateWithMetrics.java:75)
> 	at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:125)
> 	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
> 	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:311)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:488)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
> 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:702)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:527)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)