Posted to users@apex.apache.org by rishi <ri...@target.com> on 2017/04/21 05:31:55 UTC

How to write data in ORC format to hdfs instead of text format

Hi,

I have a requirement to write ORC data directly to HDFS, instead of text,
using Apex. Please let me know if this is possible with any of the existing
operators. I went through the Apex documentation but didn't see anything
related to writing ORC files directly.

Any help is highly appreciated.

Thanks
Rishi



--
View this message in context: http://apache-apex-users-list.78494.x6.nabble.com/How-to-write-data-in-ORC-format-to-hdfs-instead-of-text-format-tp1539.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.

Re: How to write data in ORC format to hdfs instead of text format

Posted by Priyanka Gugale <pr...@apache.org>.
Oops, sorry, I read OCR instead of ORC in a hurry. Apologies for the confusion.

-Priyanka

On 21 Apr 2017 8:43 p.m., "Vlad Rozov" <v....@datatorrent.com> wrote:

> Both ORC and Parquet are columnar storage formats and require output
> operators that understand them (that is, that do the tuple-to-ORC-column
> translation).
>
> Thank you,
>
> Vlad
>
> On 4/20/17 22:46, Priyanka Gugale wrote:
>
> You can treat your OCR file as a binary file. AbstractFileOutputOperator can
> be used for binary files; it doesn't have to know the file format.
> Unless you want the operator to understand the OCR format and do some
> processing, you can go ahead with AbstractFileOutputOperator.
>
> -Priyanka
>
> On Fri, Apr 21, 2017 at 11:01 AM, rishi <ri...@target.com> wrote:
>
>> Hi,
>>
>> I have a requirement to write ORC data directly to HDFS, instead of text,
>> using Apex. Please let me know if this is possible with any of the
>> existing operators. I went through the Apex documentation but didn't see
>> anything related to writing ORC files directly.
>>
>> Any help is highly appreciated.
>>
>> Thanks
>> Rishi

Re: How to write data in ORC format to hdfs instead of text format

Posted by Vlad Rozov <v....@datatorrent.com>.
Hi Rishi,

The problem is that it is not possible to use AbstractFileOutputOperator to
write to columnar storage formats such as Parquet or ORC.
AbstractFileOutputOperator assumes row-oriented data formats. AFAIK, Malhar
does not have output operators that support columnar storage, so it will be
necessary to create a new output operator to write ORC files.

Thank you,

Vlad

On 4/27/17 00:05, rishi wrote:
> Vlad,
>
> Thanks for the reply!
>
> I have code which takes the input tuple and writes it to HDFS in ORC format.
> Now my challenge is to incorporate the same code in an operator which
> extends AbstractFileOutputOperator.
>
> I am attaching the code which writes the ORC file and the operator in
> which I am trying to incorporate it.
>
> ORC_Query_Apex.txt
> <http://apache-apex-users-list.78494.x6.nabble.com/file/n1558/ORC_Query_Apex.txt>
>
> Thanks
> Rishi


Re: How to write data in ORC format to hdfs instead of text format

Posted by rishi <ri...@target.com>.
Vlad,

Thanks for the reply!

I have code which takes the input tuple and writes it to HDFS in ORC format.
Now my challenge is to incorporate the same code in an operator which
extends AbstractFileOutputOperator.

I am attaching the code which writes the ORC file and the operator in
which I am trying to incorporate it.

ORC_Query_Apex.txt
<http://apache-apex-users-list.78494.x6.nabble.com/file/n1558/ORC_Query_Apex.txt>

Thanks
Rishi



--
View this message in context: http://apache-apex-users-list.78494.x6.nabble.com/How-to-write-data-in-ORC-format-to-hdfs-instead-of-text-format-tp1539p1558.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.

Re: How to write data in ORC format to hdfs instead of text format

Posted by Vlad Rozov <v....@datatorrent.com>.
Both ORC and Parquet are columnar storage formats and require output
operators that understand them (that is, that do the tuple-to-ORC-column
translation).
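
To illustrate what tuple-to-column translation means, here is a minimal
sketch in plain Java. It is not the ORC or Malhar API: ColumnBatch and its
methods are hypothetical names made up for this example, and it only shows
how row tuples can be buffered and pivoted into per-column lists. A real
output operator would presumably pass each filled batch to an ORC writer
and then reset it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration only: buffers row tuples and pivots them into
 * per-column lists, the shape a columnar writer consumes.
 */
class ColumnBatch {
    private final String[] columnNames;
    private final int capacity;
    private final List<List<Object>> columns = new ArrayList<>();
    private int rowCount = 0;

    ColumnBatch(String[] columnNames, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.columnNames = columnNames.clone();
        this.capacity = capacity;
        // One value list per column.
        for (int c = 0; c < columnNames.length; c++) {
            columns.add(new ArrayList<>(capacity));
        }
    }

    /** Adds one row; returns true when the batch is full and should be flushed. */
    boolean addRow(Object[] row) {
        if (row.length != columnNames.length) {
            throw new IllegalArgumentException("row width does not match schema");
        }
        if (rowCount == capacity) {
            throw new IllegalStateException("batch is full; flush and reset first");
        }
        // The row-to-column step: field c of the row goes to column c.
        for (int c = 0; c < row.length; c++) {
            columns.get(c).add(row[c]);
        }
        rowCount++;
        return rowCount == capacity;
    }

    /** Read-only view of the values buffered for one column. */
    List<Object> column(int index) {
        return Collections.unmodifiableList(columns.get(index));
    }

    int size() {
        return rowCount;
    }

    /** Clears all columns after the batch has been written out. */
    void reset() {
        for (List<Object> column : columns) {
            column.clear();
        }
        rowCount = 0;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(new String[] {"id", "name"}, 2);
        batch.addRow(new Object[] {1L, "alice"});
        boolean full = batch.addRow(new Object[] {2L, "bob"});
        // Each column now holds the values of one field across all rows.
        System.out.println(full + " " + batch.column(0) + " " + batch.column(1));
        batch.reset();
    }
}
```

The point of the sketch is that the writer sees whole columns rather than
one serialized row at a time, which is why an operator that only appends
row bytes to a stream does not fit.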

Thank you,

Vlad

On 4/20/17 22:46, Priyanka Gugale wrote:
> You can treat your OCR file as a binary file. AbstractFileOutputOperator
> can be used for binary files; it doesn't have to know the file format.
> Unless you want the operator to understand the OCR format and do some
> processing, you can go ahead with AbstractFileOutputOperator.
>
> -Priyanka
>
> On Fri, Apr 21, 2017 at 11:01 AM, rishi <rishi.mishra@target.com> wrote:
>
>     Hi,
>
>     I have a requirement to write ORC data directly to HDFS, instead of
>     text, using Apex. Please let me know if this is possible with any of
>     the existing operators. I went through the Apex documentation but
>     didn't see anything related to writing ORC files directly.
>
>     Any help is highly appreciated.
>
>     Thanks
>     Rishi


Re: How to write data in ORC format to hdfs instead of text format

Posted by Priyanka Gugale <pr...@apache.org>.
You can treat your OCR file as a binary file. AbstractFileOutputOperator can
be used for binary files; it doesn't have to know the file format.
Unless you want the operator to understand the OCR format and do some
processing, you can go ahead with AbstractFileOutputOperator.

-Priyanka

On Fri, Apr 21, 2017 at 11:01 AM, rishi <ri...@target.com> wrote:

> Hi,
>
> I have a requirement to write ORC data directly to HDFS, instead of text,
> using Apex. Please let me know if this is possible with any of the
> existing operators. I went through the Apex documentation but didn't see
> anything related to writing ORC files directly.
>
> Any help is highly appreciated.
>
> Thanks
> Rishi