Posted to user@pig.apache.org by Earl Cahill <ca...@yahoo.com> on 2008/10/05 10:05:32 UTC

RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Howdy,

I was well on my way to writing a class that would load logs in Apache's common log format, as has been discussed on this list.  Then it hit me that I was really just loading based on a regex, so I rearchitected into the following classes:

RegExLoader - implements ReversibleLoadStoreFunc and does the heavy lifting of loading.  It is an abstract class with just one abstract method, getPattern(), which returns the pattern you would like to load with.  On a successful match, it walks through the capture groups and adds each group to a tuple list.

CommonLogLoader - extends RegExLoader with a regex for Apache's common log format

MyRegExLoader - extends RegExLoader and accepts an arbitrary regex from Pig Latin, like

A = LOAD 'file:test.txt' USING org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)');
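
To make the split between the three classes concrete, here is a minimal sketch of the same pattern in plain Java.  It is not the code from the patch: it leaves out every Pig type (no ReversibleLoadStoreFunc, no tuples), the class names RegexLineParser, ArbitraryRegexParser and CommonLogParser are hypothetical stand-ins, and the common log regex is my own assumption about what such a pattern could look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for RegExLoader: only the regex-to-fields step, no Pig types.
abstract class RegexLineParser {
    // Subclasses supply the pattern, mirroring the single abstract getPattern() method.
    abstract Pattern getPattern();

    // Returns the capture groups of the first match, or null when the line does not match.
    List<String> parseLine(String line) {
        Matcher m = getPattern().matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }
}

// Stand-in for MyRegExLoader: the pattern arrives as a constructor argument.
class ArbitraryRegexParser extends RegexLineParser {
    private final Pattern pattern;

    ArbitraryRegexParser(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    Pattern getPattern() {
        return pattern;
    }
}

// Stand-in for CommonLogLoader. This regex is an assumption, not the one in the patch.
class CommonLogParser extends RegexLineParser {
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    @Override
    Pattern getPattern() {
        return CLF;
    }
}

class RegexLoaderSketch {
    public static void main(String[] args) {
        RegexLineParser custom = new ArbitraryRegexParser("(\\d+)!+(\\w+)~+(\\w+)");
        System.out.println(custom.parseLine("123!!abc~~def")); // prints [123, abc, def]

        RegexLineParser clf = new CommonLogParser();
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(clf.parseLine(line).size()); // prints 7
    }
}
```

The point of the design is that a new log format only needs a new pattern; the matching and group-walking stay in the base class.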

I added unit tests and some simple Javadoc, but didn't want to go too far until I got a little feedback.  I attached a patch file, and hope I did it right.

I also attached an Eclipse formatter file.  A former co-worker went through Sun's standards, plus several others where Sun's standard was silent, and made an Eclipse formatter file.  My team has been using it and I like the idea.  It makes our code look consistent, and there are many style decisions to make besides four-space indent.  Anyway, even if all you Pig folks don't like this particular file, I definitely suggest having one for the project.

In case the files don't make it, I also put them here:

http://pig.holaservers.com/patch.txt
http://pig.holaservers.com/formatter.xml

Sorry for the potential double post, but I don't think anything I've sent has actually made it to pig-dev yet.

Enjoy!
Earl



Re: RegExLoader, MyRegExLoader, CommonLogLoader, formatting file

Posted by Alan Gates <ga...@yahoo-inc.com>.
Earl,

Thanks for the contributions.  The way to do this is to open a JIRA at
http://issues.apache.org/jira/browse/PIG and attach your patch to that.
We can then download the patch, review it, test it, etc.

Alan.
