Posted to user@pig.apache.org by "dirk.mst@gmail.com" <di...@gmail.com> on 2011/06/17 10:44:45 UTC

CombinedLogLoader with .gz support?

Hello Pig mailing list,

I have around 10 TB of Apache log files (1 TB as .gz compressed files)
and analyze these files with Pig.
Apache log files compress very well with gzip, so
it would be great if Pig accepted the log files in compressed
form.

Is this possible with the CombinedLogLoader from contrib/piggybank or
is there any other way to do this? It is pretty easy with the normal
TextLoader. It automatically detects if the file is a .gz file.

If there is no way, would the RegExLoader be the correct class to extend?

Regards
Dirk

Re: CombinedLogLoader with .gz support?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Dirk, if you look at the code for PigStorage, you'll see that it looks at the file name and chooses the right input format to use based on that. You should be able to add the same logic to RegExLoader.
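
For illustration, here is a minimal sketch of the "look at the file name, then pick how to read it" idea in plain Java, with no Pig or Hadoop dependencies. It is not the code from PigStorage or RegExLoader. The class name GzipAwareOpener and its methods are hypothetical, and the sketch assumes a local file and UTF-8 text. A real loader would make the equivalent decision when it returns its Hadoop input format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
final class GzipAwareOpener {
    private GzipAwareOpener() {}

    // Decide from the file name alone whether the content is gzip-compressed.
    static boolean looksGzipped(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    // Open a log file for line-by-line reading, decompressing .gz files on the fly.
    // Assumption: the log text is UTF-8.
    static BufferedReader open(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        try {
            if (looksGzipped(file.getFileName().toString())) {
                in = new GZIPInputStream(in);
            }
        } catch (IOException e) {
            // The gzip header could not be read; release the raw stream.
            in.close();
            throw e;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

With this, a call such as GzipAwareOpener.open(path) returns a reader over the decompressed lines for a .gz file and over the raw lines otherwise, so the regex-matching code downstream would not need to know which kind of file it was given.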
