Posted to users@apex.apache.org by "Ganelin, Ilya" <Il...@capitalone.com> on 2017/03/14 00:25:02 UTC

Preferred way to write compressed files to HDFS

What is the recommended way to write compressed data to HDFS? Should I extend AbstractFileOutputOperator, or is there existing support for this via a config?

Which formats are currently supported, and what would extending Apex to support Snappy or LZ4 look like?

Thanks in advance!

- Ilya Ganelin

________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Preferred way to write compressed files to HDFS

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Perfect, thanks! Testing out the Snappy compressor now and will push it back if it works as planned. Cheers.

- Ilya Ganelin

From: Munagala Ramanath <ra...@datatorrent.com>
Reply-To: "users@apex.apache.org" <us...@apex.apache.org>
Date: Monday, March 13, 2017 at 6:18 PM
To: "users@apex.apache.org" <us...@apex.apache.org>
Subject: Re: Preferred way to write compressed files to HDFS

Take a look at the testCompression() method in AbstractFileOutputOperatorTest.java for an example.

Ram



Re: Preferred way to write compressed files to HDFS

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Take a look at the testCompression() method in AbstractFileOutputOperatorTest.java for an example.

Ram
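
For readers without the test source at hand, here is a minimal sketch of the general technique: wrapping the raw output stream in a compressing filter stream before writing records. It deliberately uses no Apex classes. StreamWrapper, GZIP, writeRecords and gunzip are hypothetical names made up for illustration, and a ByteArrayOutputStream stands in for the HDFS file stream. Whether the referenced test is structured this way is an assumption, so treat the test itself as the authority on the operator's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical names throughout; none of these are Apex classes.
final class CompressedWriteSketch {

    /** Hypothetical hook: wraps a raw stream in a compressing filter stream. */
    interface StreamWrapper {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Gzip variant backed by the JDK. A Snappy or LZ4 variant would
     *  return that codec's output stream here instead. */
    static final StreamWrapper GZIP = raw -> new GZIPOutputStream(raw);

    /** Writes newline-terminated records through the wrapper into the sink.
     *  Closing the filter stream is what flushes the compressed trailer. */
    static void writeRecords(OutputStream sink, StreamWrapper wrapper, String... records)
            throws IOException {
        try (OutputStream out = wrapper.wrap(sink)) {
            for (String record : records) {
                out.write(record.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        }
    }

    /** Decompresses gzip bytes back to text so the round trip can be checked. */
    static String gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory sink standing in for an HDFS file stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeRecords(sink, GZIP, "alpha", "beta");
        System.out.print(gunzip(sink.toByteArray()));
    }
}
```

Keeping the codec behind a one-method interface means a different compressor only needs a different wrapper; the record-writing code stays the same.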

On Mon, Mar 13, 2017 at 5:25 PM, Ganelin, Ilya <Il...@capitalone.com> wrote:

> What is the recommended way to write compressed data to HDFS? Should I
> extend AbstractFileOutputOperator or is there existing support for this via
> a config?
>
> What formats are presently supported and if extending Apex to support
> Snappy or Lz4 what would that look like?
>
> Thanks in advance!
>
> - Ilya Ganelin



-- 

_______________________________________________________

Munagala V. Ramanath

Software Engineer

E: ram@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org