Posted to user@pig.apache.org by Cameron Gandevia <cg...@gmail.com> on 2011/10/14 03:29:20 UTC

Snappy Compression Json Data

Hi

I currently have a bunch of JSON data in HDFS. I would like to use Pig to
load it, dedupe it, and store it back using Snappy compression.

Currently I do something like this.

raw = LOAD '$INPUT' USING PigJsonLoader();
uniq = DISTINCT raw;
STORE uniq INTO '$OUTPUT' USING PigStorage();

If I add the following to the Pig job, it seems to write the files with a
'.snappy' extension:

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

Is this all I need to do, or do I need to write it in a different format?
Also, is there a way to load the Snappy-compressed JSON data, or do I need to
implement a new load function?

Any help is much appreciated.

Thanks

Re: Snappy Compression Json Data

Posted by Raghu Angadi <ra...@apache.org>.
If 'STORE' worked, LOAD should work fine too.
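
A minimal sketch of that round trip, in case it helps. It assumes a Pig
version whose SET statement accepts arbitrary Hadoop properties, that the
Snappy codec and its native libraries are installed on the cluster, and that
PigJsonLoader is already registered as in the original script. The alias
names and the LIMIT/DUMP check are only illustrative.

```pig
-- Sketch only, under the assumptions stated above.
-- Same three properties as the XML in the question, set from the script.
SET mapred.output.compress 'true';
SET mapred.output.compression.codec 'org.apache.hadoop.io.compress.SnappyCodec';
-- The compression type is read by SequenceFile output; PigStorage writes
-- plain text, so this line should be harmless but probably has no effect.
SET mapred.output.compression.type 'BLOCK';

raw  = LOAD '$INPUT' USING PigJsonLoader();
uniq = DISTINCT raw;
STORE uniq INTO '$OUTPUT' USING PigStorage();

-- Reading the result back. PigStorage wrote delimited text, not JSON,
-- so the same function is used to load it; the codec should be picked
-- up from the '.snappy' file extension.
back = LOAD '$OUTPUT' USING PigStorage();
top  = LIMIT back 10;
DUMP top;
```

Note that the stored files are PigStorage's delimited text rather than JSON,
so loading them back with the JSON loader would not be expected to work.
Snappy-compressed text files are also not splittable, so each output file is
read by a single map task.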