You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Nico Kruber (JIRA)" <ji...@apache.org> on 2017/02/13 15:52:41 UTC

[jira] [Assigned] (FLINK-5553) Job fails during deployment with IllegalStateException from subpartition request

     [ https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber reassigned FLINK-5553:
----------------------------------

    Assignee: Nico Kruber

> Job fails during deployment with IllegalStateException from subpartition request
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5553
>                 URL: https://issues.apache.org/jira/browse/FLINK-5553
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.3.0
>            Reporter: Robert Metzger
>            Assignee: Nico Kruber
>         Attachments: application-1484132267957-0076
>
>
> While running a test job with Flink 1.3-SNAPSHOT (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:
> {code}
> 2017-01-18 14:56:27,043 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,059 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,817 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-18 14:56:27,819 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the first exception that is reported to the jobmanager.
> I think this is related to missing network buffers. You see that from the next deployment after the restart, where the deployment fails with the insufficient number of buffers exception.
> I'll add logs to the JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)