Posted to user@pig.apache.org by Jonathan Hoover <jh...@yahoo-inc.com> on 2011/04/20 01:00:24 UTC

regex_extract - escaping characters

Hello,

I am having a problem escaping a ":" and a "." in a regular expression passed to the REGEX_EXTRACT() function documented at http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT. Here's a simplified example, though the example in the docs gives me the same problem. I've also tried it without the "\" in front of the ":", but that doesn't work right either (it returns the whole line). How do I escape the ":"? I also need to escape a "." in my actual script.

------INPUT FILE------
hi:1    num1    num2    num3
hi:20   num1    blah    boo
ho:30   num1    blah    foo
bar:30  foo     foo     foo
bar:40  foo     far     away
bar:40  far     far     far

------PIG SCRIPT------
a = LOAD 'fromabs-colons' USING PigStorage AS (f1,f2,f3,f4);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\:(.*)',1) == 'hi';
DUMP b;

------WHAT I EXPECT---
(hi:1,num1,num2,num3)
(hi:20,num1,blah,boo)

------ERROR I GET-----
2011-04-19 22:55:43,844 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 40.  Encountered: ":" (58), after : "\'(.*)\\"

------PIG VERSION-----
Apache Pig version 0.8.0..1103222002 (r1084466)

Re: regex_extract - escaping characters

Posted by Sven Krasser <kr...@gmail.com>.
Hey Jonathan,

You need to escape the backslash as well, because the backslash is itself an
escape character in Pig string literals:

b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\:(.*)',1) == 'hi';

To match a single literal backslash, the pattern becomes '\\\\'.

Best,
-Sven
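
For anyone who wants to see the two layers of escaping in isolation, here is a minimal sketch in plain Java. It assumes, as the reply above describes, that a Pig string literal treats the backslash as an escape character the same way a Java string literal does, and that the pattern is then handed to java.util.regex. The class RegexEscapeSketch and its extract() helper are hypothetical stand-ins for illustration, not Pig's actual REGEX_EXTRACT implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexEscapeSketch {
    // Hypothetical stand-in for REGEX_EXTRACT: returns the requested
    // capture group, or null when the pattern is not found in the input.
    public static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Layer 1 (string literal): "\\:" in source becomes the two characters \: .
        // Layer 2 (regex engine): \: means a literal colon.
        System.out.println(extract("hi:20", "(.*)\\:(.*)", 1));   // hi

        // A colon is not a regex metacharacter, so it also matches unescaped.
        System.out.println(extract("hi:20", "(.*):(.*)", 1));     // hi

        // A dot is a metacharacter: escape it with a doubled backslash...
        System.out.println(extract("a.b", "(.*)\\.(.*)", 2));     // b

        // ...or put it in a character class, which needs no backslash at all.
        System.out.println(extract("a.b", "(.*)[.](.*)", 2));     // b

        // A literal backslash needs four in source: the literal layer halves
        // them to \\ and the regex layer reads \\ as one backslash.
        System.out.println(extract("a\\b", "(.*)\\\\(.*)", 1));   // a
    }
}
```

Under the same assumption, the "." in the original script would be written '\\.' or '[.]' inside the Pig pattern. A single backslash before the colon is rejected by the Java compiler as an illegal escape, which is analogous to the lexical error Pig reports above.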
