Posted to common-user@hadoop.apache.org by JAX <ja...@gmail.com> on 2012/04/15 14:32:37 UTC

Snappy question related to last

Hi guys: related to the last Snappy question - how does Hadoop detect Snappy compression in the input dataset (i.e., how does Hadoop know when to decompress records via Snappy)?

Jay Vyas 
MMSB
UCHC

Re: Snappy question related to last

Posted by Harsh J <ha...@cloudera.com>.
This depends on the container you're using. SequenceFiles with Snappy
can be detected easily since the header of such files carry the codec
class used, and hence readers instantiate the right one to decompress
with.
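As an aside, that header check is easy to see by hand: a SequenceFile begins with the three magic bytes "SEQ" followed by a version byte, and for compressed files the codec class name appears later in the header. A minimal Python sketch of just the magic check (the full header parse needs Hadoop's Writable string encoding, which is omitted here):

```python
def sequencefile_version(data: bytes):
    """Return the SequenceFile format version if `data` begins with the
    'SEQ' magic bytes, else None. Raw Snappy data carries no such magic,
    which is why it cannot be sniffed the same way."""
    if len(data) >= 4 and data[:3] == b"SEQ":
        return data[3]
    return None
```

For example, feeding it the first four bytes of a file (open(path, "rb").read(4)) is enough to tell a SequenceFile apart from a raw stream.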

However, since Snappy is just a compression codec and does not provide
a container format
(http://code.google.com/p/snappy/issues/detail?id=34), there is
currently no way to "detect" whether a file/stream is Snappy-encoded,
unless the full stream is available (to test with, via python's
snappy.isValidCompressed, say).
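To illustrate why the full stream is needed: a raw Snappy stream has no magic bytes, so the only generic check is to attempt a complete decompression and see whether it succeeds. Snappy is not in Python's standard library, so the sketch below uses raw DEFLATE (which is likewise headerless when zlib runs with negative wbits) as a stand-in for the same trial-decompression idea:

```python
import zlib

def looks_like_headerless_stream(data: bytes) -> bool:
    # A headerless format (raw DEFLATE here, raw Snappy in this thread)
    # can only be "detected" by decompressing the whole stream and
    # checking whether it succeeds -- a partial stream is not enough.
    try:
        zlib.decompress(data, wbits=-15)  # wbits=-15: raw DEFLATE, no header
        return True
    except zlib.error:
        return False
```

Note this is probabilistic in the general case: short random byte runs can occasionally decompress "successfully", which is another reason codec metadata belongs in a container header rather than being sniffed.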

If you're using Snappy today, it's best used at the map-intermediate
level, and within other container formats such as Hadoop
SequenceFiles and Avro data files.
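For the map-intermediate case, the job configuration looks roughly like the following. These are the old mapred.* property names from the Hadoop 1.x era; newer releases use different (mapreduce.*) names, so treat this as a sketch rather than a definitive reference:

```xml
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```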

On Sun, Apr 15, 2012 at 6:02 PM, JAX <ja...@gmail.com> wrote:
> Hi guys: related to the last Snappy question - how does Hadoop detect Snappy compression in the input dataset (i.e., how does Hadoop
> know when to decompress records via Snappy)?
>
> Jay Vyas
> MMSB
> UCHC



-- 
Harsh J