Posted to user@pig.apache.org by Krishnan K <kk...@gmail.com> on 2014/05/26 02:34:19 UTC

Reading SequenceFile

Hi, I'm trying to load a sequence file compressed with GzipCodec from HDFS
into Pig using org.apache.pig.piggybank.storage.SequenceFileLoader() from
the piggybank-0.12.jar file.

The file header is:
SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text'org.apache.hadoop.io.compress.GzipCodec
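That header line is the raw start of a Hadoop SequenceFile with its non-printing bytes dropped. Hadoop's SequenceFile writer emits the magic bytes SEQ, a version byte, the key class name and the value class name (each as a VInt length followed by UTF-8 bytes), two boolean flags (compressed, block-compressed) and, when compressed, the codec class name, followed by metadata and a sync marker. This explains the stray ' before GzipCodec: "org.apache.hadoop.io.compress.GzipCodec" is 39 characters long, and byte 39 (0x27) is the apostrophe. The Python sketch decodes those fields. It assumes header version 5 or later and single-byte length prefixes. The sample bytes, including the two flag values, are illustrative and were not read from the poster's file.

```python
from __future__ import annotations


def read_text_string(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a Hadoop Text.writeString value: VInt length, then UTF-8 bytes.

    Sketch limitation: only the single-byte VInt form (lengths 0..127) is handled.
    """
    length = buf[pos]
    if length > 127:
        raise ValueError("multi-byte VInt length not handled in this sketch")
    start = pos + 1
    end = start + length
    if end > len(buf):
        raise ValueError("header truncated")
    return buf[start:end].decode("utf-8"), end


def parse_sequencefile_header(buf: bytes) -> dict:
    """Decode magic, version, key/value classes, compression flags and codec."""
    if buf[:3] != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version = buf[3]
    if version < 5:
        raise ValueError("this sketch assumes header version 5 or later")
    pos = 4
    key_class, pos = read_text_string(buf, pos)
    value_class, pos = read_text_string(buf, pos)
    compressed = buf[pos] != 0
    block_compressed = buf[pos + 1] != 0
    pos += 2
    codec = None
    if compressed:
        codec, pos = read_text_string(buf, pos)
    # Metadata and the sync marker follow; they are not parsed in this sketch.
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


def encode_text_string(s: str) -> bytes:
    """Inverse of read_text_string, used only to build an illustrative header."""
    data = s.encode("utf-8")
    if len(data) > 127:
        raise ValueError("string too long for the single-byte VInt form")
    return bytes([len(data)]) + data


# Illustrative header: Text keys and values, compressed with GzipCodec.
# The flag bytes (compressed=1, block-compressed=0) are assumptions.
sample = (
    b"SEQ\x06"
    + encode_text_string("org.apache.hadoop.io.Text")
    + encode_text_string("org.apache.hadoop.io.Text")
    + b"\x01\x00"
    + encode_text_string("org.apache.hadoop.io.compress.GzipCodec")
)
print(parse_sequencefile_header(sample))
```

Printing the undecoded bytes as text hides the length and flag bytes (25 for the Text class names, 0x00/0x01 for the flags) and leaves only the 0x27 apostrophe visible, which matches the header line above.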

The file belongs to a Hive table, and in Hive I can see the data for all the
columns correctly.

A = LOAD '/user/a/test/part-r-00000' USING
    org.apache.pig.piggybank.storage.SequenceFileLoader() AS
    (user_id:chararray, flwd_id:chararray, intrst_id:chararray, vsblty_id:chararray);

STORE A INTO '/user/a/test/output' USING PigStorage(',');

After loading the file into a relation and running DUMP or STORE on it, I see
that the fields are all concatenated and some records are truncated.
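One likely cause of the concatenated fields, assuming the Hive table uses the default text SerDe settings: Hive separates columns with the non-printing \x01 (Ctrl-A) character, and a SequenceFile loader hands back each Text value as a single string, so the columns appear run together when dumped. As far as I can tell, the piggybank loader also returns just a (key, value) pair, so an AS clause listing four fields does not split the value by itself. The Python sketch shows the split that is needed; in Pig the analogous step would presumably apply the built-in STRSPLIT function to the value field. The column names come from the AS clause above; the row values are hypothetical.

```python
from __future__ import annotations

# Assumption: Hive's default field delimiter, '\x01' (Ctrl-A).
HIVE_FIELD_DELIM = "\x01"
# Column names taken from the AS clause in the question.
COLUMNS = ("user_id", "flwd_id", "intrst_id", "vsblty_id")


def split_hive_row(value: str) -> dict[str, str]:
    """Split one Text value from a Hive-written SequenceFile into named columns."""
    fields = value.split(HIVE_FIELD_DELIM)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Hypothetical row, as a loader would return it in a single chararray.
raw_value = HIVE_FIELD_DELIM.join(["u1", "u2", "i9", "public"])
# Ctrl-A does not print, so a dump of raw_value typically reads "u1u2i9public".
print(split_hive_row(raw_value))
```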

Please let me know whether this is the right way to read a Gzip-compressed
SequenceFile (created by Hive) into Pig.

Thanks!!

Re: Reading SequenceFile

Posted by abhishek dodda <ab...@gmail.com>.
Please try the Elephant Bird project for reading sequence files:

https://github.com/kevinweil/elephant-bird

You can get the jars from the Maven Central repository:

http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar

REGISTER /home/xyz/elephant-bird-pig-4.5.jar;
REGISTER /home/xyz/elephant-bird-pig-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5-sources.jar;
REGISTER /home/xyz/elephant-bird-core-4.5.jar;
REGISTER /home/xyz/elephant-bird-hadoop-compat-4.5.jar;

A = load '/etl/table=04' using
    com.twitter.elephantbird.pig.load.SequenceFileLoader(
        '-c com.twitter.elephantbird.pig.util.NullWritableConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter')
    AS (key, value:chararray);



-- 
Thanks,
Abhishek