You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/13 15:39:41 UTC

[jira] [Commented] (FLINK-5553) Job fails during deployment with IllegalStateException from subpartition request

    [ https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863839#comment-15863839 ] 

ASF GitHub Bot commented on FLINK-5553:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/3299

    [FLINK-5553] keep the original throwable in PartitionRequestClientHandler

    This way, when checking for a previous error in any input channel, we can throw a meaningful exception instead of the inspecific `IllegalStateException("There has been an error in the channel.")` before.
    
    Note that the original `Throwable` (from an existing channel) may or may not(!) have been printed by the `InputGate` yet. Any new input channel, however, did not get the `Throwable` and must fail through the (now enhanced) fallback mechanism.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-5553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3299
    
----
commit a722d7cd4c4218543c87c2e8a3b3bbc708bddf55
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-02-13T15:30:59Z

    [FLINK-5553] keep the original throwable in PartitionRequestClientHandler
    
    This way, when checking for a previous error in any input channel, we can throw
    a meaningful exception instead of the inspecific
    IllegalStateException("There has been an error in the channel.") before.
    
    Note that the original throwable (from an existing channel) may or may not(!)
    have been printed by the InputGate yet. Any new input channel, however, did not
    get the Throwable and must fail through the (now enhanced) fallback mechanism.

----


> Job fails during deployment with IllegalStateException from subpartition request
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5553
>                 URL: https://issues.apache.org/jira/browse/FLINK-5553
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.3.0
>            Reporter: Robert Metzger
>         Attachments: application-1484132267957-0076
>
>
> While running a test job with Flink 1.3-SNAPSHOT (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:
> {code}
> 2017-01-18 14:56:27,043 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,059 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,817 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-18 14:56:27,819 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the first exception that is reported to the jobmanager.
> I think this is related to missing network buffers. You see that from the next deployment after the restart, where the deployment fails with the insufficient number of buffers exception.
> I'll add logs to the JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)