Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2015/09/01 12:51:46 UTC

[jira] [Commented] (FLINK-2580) HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

    [ https://issues.apache.org/jira/browse/FLINK-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725176#comment-14725176 ] 

Stephan Ewen commented on FLINK-2580:
-------------------------------------

I'll add a temporary solution that allows you to get the original FileSystem and the original file streams.

> HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2580
>                 URL: https://issues.apache.org/jira/browse/FLINK-2580
>             Project: Flink
>          Issue Type: Improvement
>          Components: Hadoop Compatibility
>            Reporter: Arnaud Linz
>            Priority: Minor
>
> I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write to an HDFS file, calling org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
>  
> However, FSDataOutputStream exposes many methods, such as flush() and getPos(), but HadoopDataOutputStream only wraps write() and close().
>  
> For instance, flush() calls the default, empty implementation of OutputStream instead of the Hadoop one, which is confusing. Moreover, because of the restrictive OutputStream interface, hsync() and hflush() are not exposed to Flink.
> I see two options:
> - complete the class to wrap all methods of OutputStream and add a getWrappedStream() to access other stuff like hsync().
> - get rid of the Hadoop wrapping and directly use Hadoop file system objects.
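
The first option in the issue description can be sketched roughly as below. This is only an illustration, not Flink's actual HadoopDataOutputStream: the class names DelegatingOutputStream and DelegationDemo are hypothetical, and a java.io.BufferedOutputStream stands in for the Hadoop stream so that the example runs without Hadoop on the classpath. It shows why forwarding flush() matters: java.io.OutputStream.flush() does nothing by default, so a wrapper that does not override it silently drops the call.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (not the Flink class): forwards every OutputStream
 * method to the wrapped stream and exposes the wrapped stream itself.
 */
class DelegatingOutputStream<T extends OutputStream> extends OutputStream {

    private final T wrapped;

    DelegatingOutputStream(T wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        wrapped.write(b, off, len);
    }

    // Without this override, the empty OutputStream.flush() would be used.
    @Override
    public void flush() throws IOException {
        wrapped.flush();
    }

    @Override
    public void close() throws IOException {
        wrapped.close();
    }

    // Gives callers access to methods that OutputStream does not declare.
    public T getWrappedStream() {
        return wrapped;
    }
}

class DelegationDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // BufferedOutputStream is a stand-in for the Hadoop stream here.
        DelegatingOutputStream<BufferedOutputStream> out =
                new DelegatingOutputStream<>(new BufferedOutputStream(sink));

        out.write(new byte[] {1, 2, 3});
        System.out.println("bytes in sink before flush: " + sink.size());
        out.flush(); // forwarded to the wrapped stream, which empties its buffer
        System.out.println("bytes in sink after flush: " + sink.size());
        out.close();
    }
}
```

If the type parameter were org.apache.hadoop.fs.FSDataOutputStream, a caller could presumably reach hsync() and hflush() through getWrappedStream(), at the cost of tying user code to Hadoop types.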


