Posted to user@pig.apache.org by Cameron Gandevia <cg...@gmail.com> on 2011/10/14 03:29:20 UTC
Snappy Compression Json Data
Hi,
I currently have a bunch of data in JSON format in HDFS. I would like to use
Pig to load it, dedupe it, and store it back using Snappy compression.
Currently I do something like this:
raw = LOAD '$INPUT' USING PigJsonLoader();
uniq = DISTINCT raw;
STORE uniq INTO '$OUTPUT' USING PigStorage();
If I add the following properties to the Pig job, it seems to write the files with a
'.snappy' extension:
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
</property>
Is this all I need to do, or do I need to write it in a different format?
Also, is there a way to load the Snappy-compressed JSON data, or do I need to
implement a new load function?
Any help is much appreciated.
Thanks
Re: Snappy Compression Json Data
Posted by Raghu Angadi <ra...@apache.org>.
If STORE worked, LOAD should work fine too.
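A minimal sketch of what that round trip might look like, meant to be run as a separate script after the STORE job has finished. It assumes the Snappy codec is installed on the cluster and registered in io.compression.codecs, so that Hadoop's text input format can pick the codec from the '.snappy' file extension. Note that PigStorage writes tab-delimited text rather than JSON, so the sketch reads the data back with PigStorage, not PigJsonLoader. The aliases `back` and `first_rows` are only illustrative.

```pig
-- Sketch only: run after the job that did the STORE has completed.
-- $OUTPUT is the same parameter passed to the STORE statement in the question.
back = LOAD '$OUTPUT' USING PigStorage();

-- Print a handful of records to check that they decompress and parse cleanly.
first_rows = LIMIT back 10;
DUMP first_rows;
```

If the DUMP prints readable tuples, the compressed output is loadable with no custom load function.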