Posted to issues@flink.apache.org by "Hequn Cheng (JIRA)" <ji...@apache.org> on 2018/10/26 14:27:00 UTC

[jira] [Comment Edited] (FLINK-10668) Streaming File Sink E2E test fails because not all legitimate exceptions are excluded

    [ https://issues.apache.org/jira/browse/FLINK-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665245#comment-16665245 ] 

Hequn Cheng edited comment on FLINK-10668 at 10/26/18 2:26 PM:
---------------------------------------------------------------

[~gjy] Hi, I need some advice from you. I see two options to solve the problem:
- Exclude all legitimate errors/exceptions in {{check_logs_for_exceptions}}
- Remove the logs in {{test_streaming_file_sink.sh}} when the test has finished, because `Streaming File Sink E2E test` produces exceptions that shouldn't fail the test. Furthermore, we have already checked the output results in this test.

I'm trying to solve the issue with the first option. However, I find that we would have to exclude a lot of exceptions, and I'm not sure whether that would have side effects for other tests.
So I prefer the second option. It is clearer and will not affect other tests.
What do you think?
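
To make the trade-off concrete, here is a minimal sketch of both options. The helper names {{count_unexpected_exceptions}} and {{discard_logs}} are hypothetical and only for illustration; they are not the actual functions in the Flink test scripts, and the real {{check_logs_for_exceptions}} may work differently.

```shell
#!/usr/bin/env bash

# Option 1 (hypothetical helper, NOT the real check_logs_for_exceptions):
# count log lines mentioning "exception" (case-insensitive), after dropping
# every line that contains one of the allow-listed fixed strings.
count_unexpected_exceptions() {
  local log_dir="$1"
  shift
  local grep_args=()
  local pattern
  for pattern in "$@"; do
    grep_args+=(-e "$pattern")
  done

  if [ "${#grep_args[@]}" -eq 0 ]; then
    # No allow-list given: every matching line counts.
    cat "$log_dir"/* | grep -ic "exception" || true
  else
    # -v drops allow-listed lines, -F treats the patterns as fixed strings.
    cat "$log_dir"/* | grep -v -F "${grep_args[@]}" | grep -ic "exception" || true
  fi
}

# Option 2 (hypothetical helper): discard the logs of a test whose output
# has already been verified, so a later log check has nothing to flag.
discard_logs() {
  local log_dir="$1"
  rm -f "$log_dir"/*
}
```

With option 1, every expected message (for example "Connecting the channel failed") has to be passed as an extra argument, and such an allow-list would apply to every test sharing the check. With option 2, only the one test that calls {{discard_logs}} is affected.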


was (Author: hequn8128):
[~gjy] Hi, I see two options to solve the problem:
- Exclude all legitimate errors/exceptions in {{check_logs_for_exceptions}}
- Remove the logs in {{test_streaming_file_sink.sh}} when the test has finished, because `Streaming File Sink E2E test` produces exceptions that shouldn't fail the test. Furthermore, we have already checked the output results in this test.

I'm trying to solve the issue with the first option. However, I find that we would have to exclude a lot of exceptions, and I'm not sure whether that would have side effects for other tests.
So I prefer the second option. It is clearer and will not affect other tests.
What do you think?

> Streaming File Sink E2E test fails because not all legitimate exceptions are excluded
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10668
>                 URL: https://issues.apache.org/jira/browse/FLINK-10668
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests
>    Affects Versions: 1.6.1, 1.7.0
>            Reporter: Gary Yao
>            Assignee: Hequn Cheng
>            Priority: Critical
>             Fix For: 1.6.3, 1.7.0
>
>
> Streaming File Sink E2E test fails because not all legitimate exceptions are excluded.
> The stacktrace below can appear in the logs generated by the test but {{check_logs_for_exceptions}} does not exclude all expected exceptions.
> {noformat}
> java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'xxxxxxx/10.0.x.xx:50849' has failed. This might indicate that the remote task manager has been lost.
> 	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196)
> 	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:133)
> 	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85)
> 	at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60)
> 	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:166)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:494)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:525)
> 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:165)
> 	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
> 	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager + 'xxxxxxx/10.0.x.xx:50849' has failed. This might indicate that the remote task manager has been lost.
> 	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:219)
> 	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:133)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> 	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> 	... 1 more
> Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxxxxxx/10.0.x.xx:50849
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:325)
> 	at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
> 	... 6 more
> Caused by: java.net.ConnectException: Connection refused
> 	... 10 more
> {noformat}
> The presence of this exception should be acceptable because TMs are being killed as part of the test.
> *How to reproduce*
> # Build Flink
> # Run test:
> {code}
> cd flink-end-to-end-tests
> FLINK_DIR=../build-target ./run-single-test.sh test-scripts/test_streaming_file_sink.sh
> {code}
> # Check logs in {{../build-target/log}}


