Posted to user@pig.apache.org by Jonathan Hoover <jh...@yahoo-inc.com> on 2011/04/20 01:00:24 UTC
regex_extract - escaping characters
Hello,
I am having a problem escaping a ":" and a "." in a regular expression within the REGEX_EXTRACT() function shown at http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT. Here's a simplified example, though the example in the docs gives me the same problem. I've tried it without the "\" in front of the ":", but that doesn't work right either (it returns the whole line). So, how do I escape the ":"? I also need to escape a "." in my actual script.
------INPUT FILE------
hi:1 num1 num2 num3
hi:20 num1 blah boo
ho:30 num1 blah foo
bar:30 foo foo foo
bar:40 foo far away
bar:40 far far far
------PIG SCRIPT------
a = LOAD 'fromabs-colons' USING PigStorage AS (f1,f2,f3,f4);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\:(.*)',1) == 'hi';
DUMP b;
------WHAT I EXPECT---
(hi:1,num1,num2,num3)
(hi:20,num1,blah,boo)
------ERROR I GET-----
2011-04-19 22:55:43,844 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 40. Encountered: ":" (58), after : "\'(.*)\\"
------PIG VERSION-----
Apache Pig version 0.8.0..1103222002 (r1084466)
Re: regex_extract - escaping characters
Posted by Sven Krasser <kr...@gmail.com>.
Hey Jonathan,
You need to escape the backslash as well (the backslash is itself an escape
character in Pig string literals):
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\:(.*)',1) == 'hi';
If you wanted to match a literal backslash, it would become '\\\\'.
Best,
-Sven
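
The same two layers of escaping can be tried outside Pig. The sketch below is plain Java, not Pig code: it assumes REGEX_EXTRACT behaves like java.util.regex with a find-style match, and the class and method names (PigRegexEscapeDemo, regexExtract) are made up for illustration. A Java string literal consumes one level of backslashes just as a Pig string literal does, so the doubled backslashes here mirror the ones in the reply above, including the "." case from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PigRegexEscapeDemo {
    // Hypothetical stand-in for Pig's REGEX_EXTRACT: returns the requested
    // capture group of the first match, or null when nothing matches.
    public static String regexExtract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // The literal "\\:" reaches the regex engine as the two characters \:
        String colon = "(.*)\\:(.*)";
        // Same rule for ".": the literal "\\." becomes the regex \.
        String dot = "(.*)\\.(.*)";
        // A literal backslash needs four in the source: the regex is \\
        String backslash = "(.*)\\\\(.*)";

        System.out.println(regexExtract("hi:1", colon, 1));     // hi
        System.out.println(regexExtract("bar:30", colon, 1));   // bar
        System.out.println(regexExtract("a.b", dot, 1));        // a
        System.out.println(regexExtract("a\\b", backslash, 1)); // a

        // ":" is not a regex metacharacter, so the unescaped form also matches it.
        System.out.println(regexExtract("hi:1", "(.*):(.*)", 1)); // hi

        // An unescaped "." matches any character, which is why it must be escaped.
        System.out.println(regexExtract("ab", "(.*).(.*)", 1));   // a
    }
}
```

Each pattern is written exactly as it would appear between the single quotes in the Pig script, so '(.*)\\.(.*)' would be the corresponding form for the "." case.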