Posted to user@pig.apache.org by John Meek <jo...@aol.com> on 2013/03/10 03:57:54 UTC

Pig Regex Help

Hi,

I'm trying to use the following statement in Pig to parse my data.

B = FOREACH A GENERATE FLATTEN(
REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (Field1:CHARARRAY,Field2:CHARARRAY,Date:CHARARRAY,Field3:CHARARRAY,Field4:CHARARRAY,Field5:CHARARRAY);

The input is basically a file with values in the following format:
a02s6pq0s1t-dl  20130106-UX    32
johnm-dl  20130106-DX    32

I need the output to be 6 columns like below:

a02s6pq0s1t dl  20130106 U X 32 
johnm dl  20130106 D X 32

Pig is giving me (). Please help.


John M

Re: Pig Regex Help

Posted by John Meek <jo...@aol.com>.
Harsha, thanks for your response. I needed to add USING PigStorage(',') to my LOAD statement. It works now.
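
A possible explanation for why the delimiter mattered, offered as a sketch rather than a confirmed diagnosis: PigStorage splits on tab by default, so if the fields in the input file are separated by tabs (an assumption; the thread does not say), a LOAD without a USING clause would put only the first field, such as "a02s6pq0s1t-dl", into line, and the pattern cannot match that. Since the data contains no commas, PigStorage(',') leaves the whole line in one field. The Python snippet below uses the re module as a stand-in for the Java regex engine behind REGEX_EXTRACT_ALL; the helper name extract_all is hypothetical.

```python
import re

# Pattern from the thread. Pig's '\\-' and '\\s' reach the regex engine
# as '\-' and '\s', so a raw string with single backslashes is equivalent.
PATTERN = re.compile(r'^(.+?)\-(.+?)\s(.+?)\-(.)(.)\s(.+)$')


def extract_all(line):
    """Return all capture groups if the whole line matches, else None.

    Hypothetical helper that mimics a whole-string match; it is not
    Pig's implementation.
    """
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None


# Assumption: the sample row is tab-separated.
full_line = "a02s6pq0s1t-dl\t20130106-UX\t32"

# What a tab-splitting loader would hand over as the first field.
first_field = full_line.split("\t")[0]

print(extract_all(full_line))    # six groups when the whole line is seen
print(extract_all(first_field))  # None: no whitespace left to match \s
```

If the real file uses spaces rather than tabs, this explanation would not apply as written.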
 

 

 



 

Re: Pig Regex Help

Posted by John Meek <jo...@aol.com>.
Hi Harsha,

I'm running release 0.11.0. Thanks.
 

 

 



 

Re: Pig Regex Help

Posted by Harsha <ha...@defun.org>.
Hi John,
     I ran these in Pig 0.9.2:
     A = LOAD 'data' as line:chararray;
     B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (Field1:CHARARRAY,Field2:CHARARRAY,Date:CHARARRAY,Field3:CHARARRAY,Field4:CHARARRAY,Field5:CHARARRAY);
     dump B;
which gives me the following:
(a02s6pq0s1t,dl,20130106,U,X,32)
(johnm,dl,20130106,D,X,32)


Which version of Pig are you running?
--
Harsha


On Saturday, March 9, 2013 at 6:57 PM, John Meek wrote:

> B = FOREACH A GENERATE FLATTEN(
> REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (Field1:CHARARRAY,Field2:CHARARRAY,Date:CHARARRAY,Field3:CHARARRAY,Field4:CHARARRAY,Field5:CHARARRAY);
>