Posted to user@pig.apache.org by Alex Nastetsky <al...@vervemobile.com> on 2014/12/04 05:19:52 UTC

LZOP and pig.tmpfilecompression.codec=lzo

Why does the "lzo" value for "pig.tmpfilecompression.codec" convert to
"com.hadoop.compression.lzo.LzoCodec" instead of
"com.hadoop.compression.lzo.LzopCodec"?

(See org.apache.pig.impl.util.Utils.java in the Pig source.)

My understanding is that LzopCodec has headers and blocks and is therefore
splittable, while LzoCodec is stream-based without headers/blocks and is
not. Wouldn't we want a splittable compression codec for temp data?
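
To make the header distinction concrete, here is a minimal sketch of how one might check whether a byte stream starts with the lzop container magic. The function name is hypothetical and not part of Pig or hadoop-lzo, and the check is only a heuristic: it says whether the lzop header is present, not whether the rest of the file is valid or indexed for splitting.

```python
# The nine-byte magic that the lzop file format places at the start of a file.
LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])


def has_lzop_header(data: bytes) -> bool:
    """Return True if data begins with the lzop magic bytes.

    Hypothetical helper for illustration only; a headerless LZO stream
    is not expected to start with this sequence.
    """
    return data.startswith(LZOP_MAGIC)
```

For example, passing the first few bytes of a temp file to has_lzop_header would be one way to see which of the two formats was actually written.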

From the "Hadoop in Practice" book:

> What’s the difference between LZO and LZOP? Both LZO and LZOP codecs
> are supplied for use with Hadoop. LZO is a stream-based compression store
> that doesn’t have the notion of blocks or headers. LZOP has the notion of
> blocks (that are checksummed), and therefore is the codec you want to use,
> especially if you want your compressed output to be splittable.
> Confusingly, the Hadoop codecs by default treat files ending with the
> .lzo extension to be LZOP-encoded, and files ending with the
> .lzo_deflate extension to be LZO-encoded. Also, much of the
> documentation seems to use LZO and LZOP interchangeably.
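
The extension convention described in the quote can be sketched as a small lookup. This is an illustration under stated assumptions, not Hadoop's actual resolution code: the dictionary and function name are hypothetical, and the real lookup depends on which codecs are registered in the cluster configuration.

```python
import os

# Extension -> codec class, following the convention in the quoted passage.
# Hypothetical table for illustration; not read from any Hadoop configuration.
EXTENSION_TO_CODEC = {
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
}


def codec_for_path(path):
    """Return the codec class name implied by the file extension, or None."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_CODEC.get(ext)
```

Under this convention, the name "lzo" lines up with the LZOP extension for files but, per the question above, with LzoCodec for Pig's temp-file setting, which is the mismatch being asked about.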