Posted to issues@flink.apache.org by "vinoyang (JIRA)" <ji...@apache.org> on 2019/02/26 02:20:00 UTC
[jira] [Commented] (FLINK-11737) Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output
[ https://issues.apache.org/jira/browse/FLINK-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777496#comment-16777496 ]
vinoyang commented on FLINK-11737:
----------------------------------
[~StephanEwen] updated. Constructing {{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} in Hadoop requires an instance of the {{TaskInputOutputContext}} interface, whose most common implementation is {{ReduceContextImpl}}. Constructing {{ReduceContextImpl}} in turn requires a {{RawKeyValueIterator}}, that is, an iterator over the input. The lowest-level {{OutputFormat}} in Flink, however, follows a single-record output model ({{OutputFormat#writeRecord}}). Currently, the only way I can use {{MultipleOutputs}} is from a {{MapPartitionFunction}}, which lets me get an {{Iterator}}. What do you think of this issue? cc [~fhueske]
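To make the mismatch a bit more concrete, here is a rough, dependency-free sketch. It deliberately uses no Flink or Hadoop classes: {{SingleRecordSink}}, {{NamedOutputs}}, {{routePartition}} and the routing rule are all hypothetical names invented for illustration. It only shows the shape of the problem: a per-record sink sees one record and one destination at a time, while a partition-wise function owns the iterator and can keep one multi-output writer open for the whole partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these types are Flink or Hadoop classes.
final class MultipleOutputsSketch {

    // Stand-in for a per-record sink (same shape as a writeRecord-style call):
    // it is handed one record at a time and has exactly one destination.
    interface SingleRecordSink {
        void writeRecord(String record);
    }

    // Stand-in for a writer with several named outputs that stays open
    // for the lifetime of one partition.
    static final class NamedOutputs {
        private final Map<String, List<String>> outputs = new TreeMap<>();

        void write(String namedOutput, String record) {
            outputs.computeIfAbsent(namedOutput, k -> new ArrayList<>()).add(record);
        }

        Map<String, List<String>> contents() {
            return outputs;
        }
    }

    // Partition-wise processing: the function owns the iterator, so it can create
    // the multi-output writer once and route every record of the partition through it.
    static Map<String, List<String>> routePartition(Iterator<String> records) {
        NamedOutputs out = new NamedOutputs();
        while (records.hasNext()) {
            String record = records.next();
            // Illustrative routing rule only (an assumption, not from the issue).
            String target = record.startsWith("error") ? "errors" : "events";
            out.write(target, record);
        }
        return out.contents();
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("error: disk", "click", "error: net");
        System.out.println(routePartition(partition.iterator()));
    }
}
```

In this sketch the writer is created inside the partition-level call, which is roughly the role a {{MapPartitionFunction}} plays in the workaround above; a sink limited to {{writeRecord}} has no equivalent place to hand over an iterator.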
> Support org.apache.hadoop.mapreduce.lib.output.MultipleOutputs output
> ---------------------------------------------------------------------
>
> Key: FLINK-11737
> URL: https://issues.apache.org/jira/browse/FLINK-11737
> Project: Flink
> Issue Type: Improvement
> Components: Batch Connectors and Input/Output Formats
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> This issue is to improve Flink's compatibility with Hadoop. For the old Hadoop API, {{org.apache.hadoop.mapred.lib.MultipleOutputFormat}} can already be used directly. However, Flink currently does not support {{org.apache.hadoop.mapreduce.lib.output.MultipleOutputs}} from the new Hadoop API.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)