Posted to user@flink.apache.org by Philip Lee <ph...@gmail.com> on 2016/01/27 20:44:14 UTC

Reading ORC format on Flink

Hello,

Question about reading ORC format on Flink.

I want to use a dataset after converting CSV to ORC format with Hive (for load testing).
Can Flink support reading ORC format?

If so, please let me know how to use the dataset in Flink.

Best,
Phil

Re: Reading ORC format on Flink

Posted by Chiwan Park <ch...@apache.org>.
Hi Phil,

I think you can read ORC files using OrcInputFormat [1] with the readHadoopFile method.

There is an example for MapReduce [2] on Stack Overflow. The same approach also works in Flink. You may have to use a RichMapFunction [3] to initialize the OrcSerde and StructObjectInspector objects.
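The approach above could be sketched roughly as follows. This is an untested sketch, not a verified implementation: it assumes hive-exec and the Flink DataSet API on the classpath, and the path and the schema ("name", "age") are placeholders you would replace with your own table's columns.

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;

public class OrcReadSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Hive's OrcInputFormat is a Hadoop mapred InputFormat<NullWritable, OrcStruct>.
    // If your Flink version's readHadoopFile requires a FileInputFormat, wrap the
    // format in flink-hadoop-compatibility's HadoopInputFormat and use env.createInput
    // instead.
    DataSet<Tuple2<NullWritable, OrcStruct>> rows = env.readHadoopFile(
        new OrcInputFormat(), NullWritable.class, OrcStruct.class,
        "hdfs:///path/to/orc-table");  // placeholder path

    // OrcStruct fields are accessed through a StructObjectInspector. The inspector is
    // not serializable, so build it in open() of a RichMapFunction, as suggested in [3].
    DataSet<String> names = rows.map(
        new RichMapFunction<Tuple2<NullWritable, OrcStruct>, String>() {

          private transient StructObjectInspector inspector;

          @Override
          public void open(Configuration parameters) throws Exception {
            Properties table = new Properties();
            table.setProperty("columns", "name,age");          // placeholder schema
            table.setProperty("columns.types", "string:int");  // placeholder types
            OrcSerde serde = new OrcSerde();
            serde.initialize(new JobConf(), table);
            inspector = (StructObjectInspector) serde.getObjectInspector();
          }

          @Override
          public String map(Tuple2<NullWritable, OrcStruct> row) throws Exception {
            Object name = inspector.getStructFieldData(
                row.f1, inspector.getStructFieldRef("name"));
            return name == null ? null : name.toString();
          }
        });

    names.print();
  }
}
```

The key point is the split between the two objects: Flink deserializes the file splits into OrcStruct values via the Hadoop input format, while the SerDe's object inspector is only needed afterwards to pull typed fields out of each OrcStruct.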

Regards,
Chiwan Park

[1]: https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.html
[2]: http://stackoverflow.com/questions/22673222/how-do-you-use-orcfile-input-output-format-in-mapreduce
[3]: https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/functions/RichMapFunction.html
