Posted to issues@flink.apache.org by "vinoyang (JIRA)" <ji...@apache.org> on 2019/02/26 02:20:00 UTC

[jira] [Commented] (FLINK-11737) Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output

    [ https://issues.apache.org/jira/browse/FLINK-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777496#comment-16777496 ] 

vinoyang commented on FLINK-11737:
----------------------------------

[~StephanEwen] updated. Constructing {{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} in Hadoop requires an instance of the {{TaskInputOutputContext}} interface, and the most common implementation of this interface is {{ReduceContextImpl}}. Constructing {{ReduceContextImpl}} requires a {{RawKeyValueIterator}} (which in turn requires an Iterator). The lowest-level {{OutputFormat}} in Flink, however, follows a single-record output model ({{OutputFormat#writeRecord}}). Currently, the only way I can use {{MultipleOutputs}} is through a {{MapPartitionFunction}}, which gives me an {{Iterator}}. What do you think of this issue? cc [~fhueske]
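
To make the mismatch concrete, here is a rough sketch in plain Java. It deliberately uses no Flink or Hadoop classes: {{NamedOutputs}} and {{routePartition}} are hypothetical stand-ins (for a multi-output writer and for partition-level processing), and the even/odd routing rule is only an example. It shows why having the whole iterator helps: the writer is created once and can route every record of the partition, which a one-record-per-call method cannot do on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultipleOutputsSketch {

    // Hypothetical multi-output writer: each record is appended to a named output.
    // This is NOT Hadoop's MultipleOutputs, only a simplified model of the idea.
    static final class NamedOutputs {
        private final Map<String, List<String>> outputs = new TreeMap<>();

        void write(String name, String value) {
            outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        Map<String, List<String>> contents() {
            return outputs;
        }
    }

    // Partition-level processing (hypothetical): the whole iterator is available,
    // so a single writer is set up once and reused for all records.
    static Map<String, List<String>> routePartition(Iterator<String> records) {
        NamedOutputs out = new NamedOutputs();
        while (records.hasNext()) {
            String record = records.next();
            // Example routing rule (an assumption): split by even/odd record length.
            String name = record.length() % 2 == 0 ? "even" : "odd";
            out.write(name, record);
        }
        return out.contents();
    }

    public static void main(String[] args) {
        Map<String, List<String>> result =
                routePartition(Arrays.asList("a", "bb", "ccc", "dddd").iterator());
        System.out.println(result); // prints {even=[bb, dddd], odd=[a, ccc]}
    }
}
```

In a real job the writer would have to be backed by an actual Hadoop task context, which is exactly the part that is hard to construct from a per-record {{writeRecord}} call.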

> Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output
> ---------------------------------------------------------------------
>
>                 Key: FLINK-11737
>                 URL: https://issues.apache.org/jira/browse/FLINK-11737
>             Project: Flink
>          Issue Type: Improvement
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> This issue is to improve Flink's compatibility with Hadoop. For the old Hadoop API, {{org.apache.hadoop.mapred.lib.MultipleOutputFormat}} can already be used directly. However, Flink does not currently support {{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} from the new Hadoop API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)