Posted to issues@flink.apache.org by "James Cao (JIRA)" <ji...@apache.org> on 2015/09/09 02:10:45 UTC
[jira] [Commented] (FLINK-1919) Add HCatOutputFormat for Tuple data types
[ https://issues.apache.org/jira/browse/FLINK-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735873#comment-14735873 ]
James Cao commented on FLINK-1919:
----------------------------------
Pull request for this issue:
https://github.com/apache/flink/pull/1079
> Add HCatOutputFormat for Tuple data types
> -----------------------------------------
>
> Key: FLINK-1919
> URL: https://issues.apache.org/jira/browse/FLINK-1919
> Project: Flink
> Issue Type: New Feature
> Components: Java API, Scala API
> Reporter: Fabian Hueske
> Assignee: James Cao
> Priority: Minor
> Labels: starter
>
> It would be good to have an OutputFormat that can write data to HCatalog tables.
> The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes these to HCatalog tables. We can do the same thing by creating these `HCatRecord` objects with a Map function that precedes a `HadoopOutputFormat` wrapping the Hadoop `HCatOutputFormat`.
> Better support for Flink Tuples can be added by implementing a custom `HCatOutputFormat` that also depends on the Hadoop `HCatOutputFormat` but internally converts Flink Tuples to `HCatRecords`. This would also include checking whether the schema of the HCatalog table and the Flink tuples match. For data types other than tuples, the OutputFormat could either require a preceding Map function that converts to `HCatRecords` or let users specify a MapFunction and invoke that internally.
> We already have a Flink `HCatInputFormat` that does this in the reverse direction, i.e., it emits Flink Tuples from HCatalog tables.
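For illustration only, the tuple-to-record conversion and schema check described in the issue could be sketched roughly as below. This is not the code from the pull request and it uses no Flink or HCatalog classes: the tuple is modelled as an `Object[]`, the table schema as a list of expected Java classes, and the record as a `List<Object>`. The class name `TupleToRecordSketch` and the method `toRecord` are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Stand-in sketch (assumption: plain Java types replace Flink Tuples,
 * HCatSchema and HCatRecord, none of which are used here).
 */
public class TupleToRecordSketch {

    /** Checks arity and field types against the schema, then copies the fields into a record. */
    public static List<Object> toRecord(Object[] tuple, List<Class<?>> schema) {
        if (tuple.length != schema.size()) {
            throw new IllegalArgumentException(
                "Arity mismatch: tuple has " + tuple.length
                    + " fields, schema has " + schema.size());
        }
        List<Object> record = new ArrayList<>(tuple.length);
        for (int i = 0; i < tuple.length; i++) {
            Object field = tuple[i];
            // Null fields are passed through; only non-null values are type-checked.
            if (field != null && !schema.get(i).isInstance(field)) {
                throw new IllegalArgumentException(
                    "Type mismatch at field " + i + ": expected "
                        + schema.get(i).getSimpleName() + ", got "
                        + field.getClass().getSimpleName());
            }
            record.add(field);
        }
        return record;
    }

    public static void main(String[] args) {
        // Hypothetical two-column table: (name STRING, count INT).
        List<Class<?>> schema = Arrays.asList(String.class, Integer.class);
        System.out.println(toRecord(new Object[] {"flink", 42}, schema));
    }
}
```

In a real output format the check would presumably run once when the format is configured rather than per record; doing it per record here just keeps the sketch short.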