Posted to issues@flink.apache.org by "Yu Li (JIRA)" <ji...@apache.org> on 2019/07/24 12:09:00 UTC

[jira] [Comment Edited] (FLINK-13228) HadoopRecoverableWriterTest.testCommitAfterNormalClose fails on Travis

    [ https://issues.apache.org/jira/browse/FLINK-13228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891795#comment-16891795 ] 

Yu Li edited comment on FLINK-13228 at 7/24/19 12:08 PM:
---------------------------------------------------------

{noformat}
23:31:07,552 WARN org.apache.hadoop.hdfs.DataStreamer - DataStreamer Exception
java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:478)
{noformat}

After a closer check of the above log, I found the root cause is a {{ClosedByInterruptException}} thrown after {{channel.write(buf)}} has completed writing the data in {{SocketOutputStream#performIO}}, and I can now stably reproduce the issue with the attached v2 hadoop patch.
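For reference, here is a minimal, self-contained sketch of the NIO behaviour behind this: {{AbstractInterruptibleChannel#end()}} throws {{ClosedByInterruptException}} when the writing thread was interrupted around the I/O call, even if the write itself finished successfully. The sketch is simplified (it uses a {{FileChannel}} and sets the interrupt flag up front, because reliably interrupting mid-write is timing-dependent) and is not the actual HDFS {{DataStreamer}}/{{SocketOutputStream}} code; the exact trigger point may vary by JDK:

{code:java}
// Sketch only: demonstrates the interruptible-channel contract, not HDFS code.
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedByInterruptDemo {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("interrupt-demo", ".bin");
        FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE);

        // 1. The data is written out completely and successfully.
        ch.write(ByteBuffer.wrap("all data written".getBytes(StandardCharsets.UTF_8)));

        // 2. The writer thread gets interrupted (as happens during test teardown).
        Thread.currentThread().interrupt();

        // 3. The next operation on the interruptible channel closes it and throws,
        //    even though no data is lost.
        try {
            ch.write(ByteBuffer.allocate(0));
        } catch (ClosedByInterruptException e) {
            System.out.println("Data already written, still got: " + e);
        } finally {
            Thread.interrupted(); // clear the flag so cleanup below can run
            Files.deleteIfExists(tmp);
        }
    }
}
{code}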

Since all data is already written successfully and the file is also marked as complete by the NameNode, we should silently ignore the {{ClosedByInterruptException}} instead of throwing it as an error, which IMO is something Hadoop should fix. Will file a JIRA for HDFS once I figure out a proper solution.
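To make "silently ignore" concrete, the handling I have in mind looks roughly like the following. This is an illustration only, not Hadoop's actual {{SocketOutputStream}} code, and {{writeFully}} is a made-up helper:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.WritableByteChannel;

public final class InterruptTolerantWrite {

    // Hypothetical helper, not Hadoop code: swallow the interrupt only when
    // every byte has already gone out, otherwise keep failing loudly.
    static void writeFully(WritableByteChannel channel, ByteBuffer buf) throws IOException {
        try {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        } catch (ClosedByInterruptException e) {
            if (buf.hasRemaining()) {
                throw e; // data really was cut short: propagate as before
            }
            // All data was written before the interrupt was observed; the channel
            // is closed anyway, so just restore the interrupt flag and return.
            Thread.currentThread().interrupt();
        }
    }

    private InterruptTolerantWrite() {}
}
{code}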

As for how to fix the issue here: since the exception is thrown while closing the {{RecoverableFsDataOutputStream}} (easy to confirm after flattening the try-with-resources into a normal try-catch), I think we could simply try-catch the exception and ignore a failure to close the {{RecoverableFsDataOutputStream}}, because it is irrelevant to the target of the test case (checking whether commit after a normal close works). Wdyt? [~till.rohrmann] [~Zentol]

Will attach the draft patch here for a straightforward check.
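For illustration, the shape of the change would be roughly the following. This is a simplified sketch of the flattened test body, not the attached patch; the real code lives in {{AbstractRecoverableWriterTest#testCommitAfterNormalClose}} and may differ, and {{getNewFileSystemWriter()}} / {{getBasePathForTest()}} stand for whatever helpers the test already uses:

{code:java}
// Imports used by the sketch (fragment of the existing test class):
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.core.fs.Path;
import org.apache.flink.core.fs.RecoverableFsDataOutputStream;
import org.apache.flink.core.fs.RecoverableWriter;
import org.junit.Test;

@Test
public void testCommitAfterNormalClose() throws Exception {
    final RecoverableWriter writer = getNewFileSystemWriter();   // assumed existing helper
    final Path path = new Path(getBasePathForTest(), "part-0");  // assumed existing helper

    RecoverableFsDataOutputStream stream = null;
    try {
        stream = writer.open(path);
        stream.write("test data".getBytes(StandardCharsets.UTF_8));
        stream.closeForCommit().commit();          // the behaviour actually under test
    } finally {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException e) {
                // The data is already committed and the file is complete on the
                // NameNode; a failure that only happens while closing the stream
                // (the ClosedByInterruptException surfacing as "The stream is closed")
                // is irrelevant to what this test verifies, so ignore it.
            }
        }
    }
}
{code}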

 



> HadoopRecoverableWriterTest.testCommitAfterNormalClose fails on Travis
> ----------------------------------------------------------------------
>
>                 Key: FLINK-13228
>                 URL: https://issues.apache.org/jira/browse/FLINK-13228
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.9.0
>            Reporter: Till Rohrmann
>            Assignee: Yu Li
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.9.0
>
>         Attachments: FLINK-13228.hadoop.debug.patch, FLINK-13228.hadoop.debug.v2.patch
>
>
> {{HadoopRecoverableWriterTest.testCommitAfterNormalClose}} failed on Travis with
> {code}
> HadoopRecoverableWriterTest.testCommitAfterNormalClose » IO The stream is closed
> {code}
> https://api.travis-ci.org/v3/job/557293706/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)