Posted to user@pig.apache.org by Terry Healy <th...@bnl.gov> on 2013/03/06 23:06:32 UTC

Using CommonLogLoader for Bluecoat logs

Hello-

Using Pig version 0.10.0.

I'm trying to import BlueCoat proxy logs using Pig, and I'm having
difficulty with field delimiters (multiple spaces, spaces embedded
within the browser description string, etc.). This seems like it would
be a common Pig application.

I found an old example using ...piggybank.evaluation.string.EXTRACT that
seemed to fit the bill, but alas, EXTRACT has been deprecated.

EXTRACT allowed regex-like parsing, so I checked out
...piggybank.storage.RegExLoader, and then
...piggybank.storage.apachelog.CommonLogLoader.

I can't make sense of the Usage comments. Basically, I don't see where
to set the regex pattern string for my needs.

Can anyone explain the usage of either of these classes or point me to
an example?

Thanks,

Terry


RE: Using CommonLogLoader for Bluecoat logs

Posted by Brent Black <Br...@demandmedia.com>.
You need to write a new loader function in Java that extends the RegExLoader class and overrides its getPattern method to return the regex Pattern object you want to use. The CommonLogLoader function in Piggybank is an example of how to do this. The source code for that class is at the link below.

http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog/CommonLogLoader.java
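
As a rough illustration of the kind of pattern such a loader could return, here is a minimal standalone sketch. It uses only java.util.regex so it runs without Pig on the classpath. The six-field layout, the sample line, and the names ProxyLogPatternSketch, LINE_PATTERN, and parse are all hypothetical; they are not the real BlueCoat format, which would need its own pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch: no Pig classes are needed to try out the regex itself.
final class ProxyLogPatternSketch {

    // Hypothetical six-field layout (an assumption, not the BlueCoat format):
    //   date time client-ip method "user agent" status
    // \s+ absorbs runs of spaces between fields; the quoted group keeps
    // the spaces embedded in the browser description string.
    static final Pattern LINE_PATTERN = Pattern.compile(
        "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+\"([^\"]*)\"\\s+(\\S+)$");

    // In a class that extends RegExLoader, the overridden getPattern()
    // method would return a Pattern such as LINE_PATTERN.

    // Returns the captured fields in order, or null if the line does not match.
    static List<String> parse(String line) {
        Matcher m = LINE_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with double spaces and a quoted user agent.
        String sample = "2013-03-06 23:06:32  10.0.0.1 GET  "
            + "\"Mozilla/5.0 (Windows NT 6.1) Firefox/19.0\" 200";
        System.out.println(parse(sample));
    }
}
```

Testing the pattern on a few real log lines this way, before wrapping it in a loader, makes it easier to see which field breaks the match.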


You may also be able to just use the REGEX_EXTRACT_ALL function that is built into Pig with the TextLoader function to do this.

http://pig.apache.org/docs/r0.11.0/func.html#regex-extract-all


------------------------------
Brent Black
Sr. Data Warehouse Engineer
------------------------------
Demand Media
5808 Lake Washington Blvd.
Suite 300
Kirkland, WA. 98034
www.demandmedia.com
brent.black@demandmedia.com
425-298-2376
---------------------------------

