Posted to user@pig.apache.org by Sameer Tilak <ss...@live.com> on 2013/10/23 21:30:17 UTC

XML files and Sequencefile

Hi There,

I have a lot of small (~0.5 MB to 3 MB) XML files that I would like to process using Apache Pig. Since dealing with a lot of small files is problematic, I was thinking of creating SequenceFiles such that each sequence file is between 60 and 64 MB and no XML file is split across two sequence files. Is there any utility that does the storing and loading of these files from Pig? For example, I could create a Pig job that reads these XML files and generates a few large sequence files such that no XML file is split across two sequence files. I would then write another Pig job that loads these sequence files and analyzes them. Each of these XML files contains a lot of information for a given entity, and the nesting can be quite deep. Any help with this would be great.
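
To make the packing constraint above concrete, here is a minimal sketch of how the XML files might be grouped before any sequence file is written. It is only a planning step under stated assumptions: the function name plan_batches, the file names, and the sizes are hypothetical, and the cap counts raw XML bytes only (SequenceFile headers, sync markers, and compression are ignored). Writing the batches out, for example with Hadoop's SequenceFile writer, is not shown.

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024


def plan_batches(files: Iterable[Tuple[str, int]],
                 max_bytes: int = 64 * MB) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs into batches of at most max_bytes.

    Hypothetical helper: each file is placed whole into exactly one batch,
    so no XML file is ever split across two batches.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} is larger than the batch cap")
        # Close the current batch if this file would push it over the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical input: 50 files of 3 MB each.
    sample = [(f"entity_{i:03d}.xml", 3 * MB) for i in range(50)]
    for i, batch in enumerate(plan_batches(sample)):
        print(f"batch {i}: {len(batch)} files")
```

This greedy, in-order grouping keeps every batch at or below the cap but does not guarantee the 60 MB lower bound: the last batch in particular can be much smaller, and a run of 3 MB files can close a batch a little under 62 MB. Sorting by size or using a bin-packing heuristic would pack tighter, at the cost of losing the original file order.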


Re: XML files and Sequencefile

Posted by Shahab Yunus <sh...@gmail.com>.
Have you looked at SequenceFileLoader for Pig?
http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html

Regards,
Shahab

