Posted to user@pig.apache.org by jr <jo...@io-consulting.net> on 2010/04/08 12:08:14 UTC

SPLIT with matches / not matches

Hello everybody,
I'm trying to split Apache log data into two output sets: one for all
URIs that match a certain criterion and one for URIs that don't match
it.
I've been trying:

SPLIT list INTO downloads IF uri matches '/downloads/.*\\.exe', pages if
(NOT uri matches '/downloads/.*\\.exe');

but this doesn't seem to do the trick (downloads is fine, but pages
contains the downloads too :/)
Any hints on how to do this?

Johannes


Re: SPLIT with matches / not matches

Posted by Rekha Joshi <re...@yahoo-inc.com>.
I suppose there may be some issues with SPLIT. It would be ideal to have your input set, but did you try a two-step FILTER operation instead of SPLIT, and did that give the correct answer? If not, I would look at the regular expression.
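
For reference, the two-step FILTER suggested here might look like the sketch below. It assumes the relation is named list and has a chararray field uri, as in the original statement. The explicit parentheses around the negated match are an added precaution, not a confirmed fix for the SPLIT behaviour.

```pig
-- Sketch only: assumes 'list' is already loaded and has a chararray field 'uri'.

-- Step 1: keep the rows whose uri matches the download pattern.
downloads = FILTER list BY uri matches '/downloads/.*\\.exe';

-- Step 2: keep the rows whose uri does not match it.
-- The parentheses make the scope of NOT explicit.
pages = FILTER list BY NOT (uri matches '/downloads/.*\\.exe');
```

Note that matches has to match the whole uri value, not just a substring, and that rows with a null uri should end up in neither relation, because a null condition does not pass a FILTER.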
