Posted to user@pig.apache.org by James Kebinger <jk...@gmail.com> on 2012/09/27 17:54:23 UTC
Using matches in generate clause?
Hello, I'm having some trouble doing something I thought would be easy: I'd
like to use matches to generate a boolean flag but this seems to not
compile:
FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as wp_match:boolean;
I've tried wrapping it in parens too, with no luck.
Is this possible, or am I out of luck?
thanks
Re: Using matches in generate clause?
Posted by James Kebinger <jk...@gmail.com>.
That was Pig 0.10.
This line:
matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*') as wp_match:boolean;
gives me the error:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 93> Syntax error, unexpected symbol at or near 'html'
Taking off the parens gives:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 97> mismatched input 'matches' expecting SEMI_COLON
Converting to an int, as suggested later in the thread:
matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*' ? 1 : 0) as wp_match:int;
does work, so the int approach is a nice workaround.
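For anyone puzzling over the pattern itself: Pig's matches uses Java regular expressions and has to match the whole string, which is why the alternatives are wrapped in .* and why (?s) is there to let . cross newlines. Below is a minimal plain-Java sketch of what the ternary computes, using java.util.regex directly. The class name WpFlag and the sample strings are made up for illustration and are not part of the script.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Pig script: mirrors
// (html matches '<pattern>' ? 1 : 0) using java.util.regex.
class WpFlag {
    // Same pattern as in the Pig statement above.
    static final Pattern WP = Pattern.compile(
        "(?s).*generator\" content=\"WordPress.*|.*wp-content.*");

    // Returns 1 when the entire string matches the pattern, else 0.
    static int flag(String html) {
        return WP.matcher(html).matches() ? 1 : 0;
    }

    public static void main(String[] args) {
        // Multi-line page with a WordPress generator tag.
        String page = "<head>\n<meta name=\"generator\" content=\"WordPress 3.4\">\n</head>";
        System.out.println(flag(page));
        // Single-line fragment that references wp-content.
        System.out.println(flag("<a href=\"/wp-content/x.png\">"));
        // Page with neither marker.
        System.out.println(flag("<p>plain page</p>"));
    }
}
```

Matcher.matches() is used rather than find() because it has the same whole-string behaviour as the Pig operator.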
On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <ga...@hortonworks.com> wrote:
> What version of Pig are you using?
>
> Alan.
Re: Using matches in generate clause?
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
With Pig 0.9 you can do this, though:
FOREACH html_pages GENERATE portal_id, (html matches 'some pattern' ? 1 : 0) as wp_match:int;
On Thu, Sep 27, 2012 at 10:38 AM, Alan Gates <ga...@hortonworks.com> wrote:
> In Pig 0.9 boolean was not yet a first class data type, so boolean types
> were not allowed in foreach statements. In Pig 0.10 boolean became a first
> class type, so expressions that return booleans (such as matches) should
> work.
>
> Alan.
Re: Using matches in generate clause?
Posted by Alan Gates <ga...@hortonworks.com>.
In Pig 0.9 boolean was not yet a first class data type, so boolean types were not allowed in foreach statements. In Pig 0.10 boolean became a first class type, so expressions that return booleans (such as matches) should work.
Alan.
On Sep 27, 2012, at 10:34 AM, pablomar wrote:
> no idea why, but matches works with FILTER but it doesn't with FOREACH
> I've tried with pig 0.9.2
Re: Using matches in generate clause?
Posted by pablomar <pa...@gmail.com>.
No idea why, but matches works with FILTER and not with FOREACH.
I've tried with Pig 0.9.2.
Example (this works):
b = filter html_pages by html matches 'some pattern';
If you still want to do it with FOREACH, you can write your own UDF, something like:
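import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Returns true when the whole value matches the regex; arguments are (pattern, value).
public class MyMatch extends EvalFunc<Boolean>
{
    @Override
    public Boolean exec(Tuple input) throws IOException
    {
        // Pass nulls through rather than failing the task on a missing field.
        if (input == null || input.size() < 2 || input.get(0) == null || input.get(1) == null)
        {
            return null;
        }
        try
        {
            String pattern = (String) input.get(0);
            String value = (String) input.get(1);
            // String.matches requires the entire value to match the pattern.
            return value.matches(pattern);
        }
        catch (Exception e)
        {
            throw new IOException("MyMatch failed", e);
        }
    }
}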
and use it just like this:
b = foreach html_pages generate portal_id, MyMatch('some pattern', html) as wp_match;
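One thing to watch with a UDF like this: String.matches compiles the regex on every call, which can add up over a large relation. Here is a rough sketch of the same check with the compiled Pattern cached per pattern string. It is plain Java with no Pig classes so that it runs on its own, and the names CachedMatch and matches are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the body of MyMatch.exec, without Pig types.
class CachedMatch {
    // One compiled Pattern per distinct regex string.
    private final Map<String, Pattern> cache = new HashMap<String, Pattern>();

    // Returns null for null input, else whether the whole value matches.
    Boolean matches(String pattern, String value) {
        if (pattern == null || value == null) {
            return null;
        }
        Pattern compiled = cache.get(pattern);
        if (compiled == null) {
            compiled = Pattern.compile(pattern);
            cache.put(pattern, compiled);
        }
        return compiled.matcher(value).matches();
    }

    public static void main(String[] args) {
        CachedMatch m = new CachedMatch();
        // Pattern wrapped in .* so it can match the whole value.
        System.out.println(m.matches(".*wp-content.*", "/wp-content/themes/"));
        // Bare substring pattern, which has to match the entire value.
        System.out.println(m.matches("wp-content", "/wp-content/themes/"));
    }
}
```

The HashMap is not thread-safe as written, so this assumes each instance is only used from one thread.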
On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <ga...@hortonworks.com> wrote:
> What version of Pig are you using?
>
> Alan.
Re: Using matches in generate clause?
Posted by Alan Gates <ga...@hortonworks.com>.
What version of Pig are you using?
Alan.