Posted to user@pig.apache.org by "Duckworth, Will" <wd...@comscore.com> on 2012/08/17 16:03:48 UTC

Counters from Python UDF

Has anyone poked around to see if there is a way to create / increment counters from a Python UDF?  Thanks.



Will Duckworth Senior Vice President, Software Engineering | comScore, Inc. (NASDAQ:SCOR)

o +1 (703) 438-2108 | m +1 (301) 606-2977 | wduckworth@comscore.com

...........................................................................................................

Introducing Mobile Metrix 2.0 - The next generation of mobile behavioral measurement
www.comscore.com/MobileMetrix<http://www.comscore.com/Products_Services/Product_Index/Mobile_Metrix_2.0>



Re: Counters from Python UDF

Posted by Jonathan Coveney <jc...@gmail.com>.
That's good to know! Thanks for following up with that, Will. I guess
there's no reason "incrCounter" can't be static.

2012/8/24 Duckworth, Will <wd...@comscore.com>

> Code below works against trunk.

RE: Counters from Python UDF

Posted by "Duckworth, Will" <wd...@comscore.com>.
Code below works against trunk.

Apache Pig version 0.11.0-SNAPSHOT (r1372967)
compiled Aug 14 2012, 15:31:10

pig -f test_counter.pig -p in_path=/path/to/file/test_file.gz -p job_name=counter_test

*** test_counter.py
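from org.apache.pig.tools.counters import PigCounterHelper

# One helper per module, not one per call: PigCounterHelper buffers
# increments until Pig's status reporter is ready, so a fresh instance
# on every call could silently drop anything it had buffered.
counter = PigCounterHelper()

@outputSchema("line:chararray")
def testCounter(line):
    counter.incrCounter("Test", "udfcounter", 1)
    return line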

*** test_counter.pig
-- $in_path
-- $job_name

SET job.name '$job_name';

REGISTER '/path/to/python_file/test_counter.py' USING jython AS udf;

A = load '$in_path' using PigStorage('\n') as (line:chararray);

A2 = foreach A generate udf.testCounter(line) as line;
A3 = limit A2 10;
dump A3;




-----Original Message-----
From: Jonathan Coveney [mailto:jcoveney@gmail.com]
Sent: Friday, August 24, 2012 1:31 PM
To: user@pig.apache.org
Subject: Re: Counters from Python UDF

I think adding a method to Jython/JRuby is absolutely the way to go.


Re: Counters from Python UDF

Posted by Jonathan Coveney <jc...@gmail.com>.
I think adding a method to Jython/JRuby is absolutely the way to go.

2012/8/24 Aniket Mokashi <an...@gmail.com>

> I used the following in my Python UDF (on Pig 0.9), after referring to
> http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/
>
> from org.apache.pig.tools.pigstats import PigStatusReporter
> reporter = PigStatusReporter.getInstance()
>
> But it looks like the context is not set in PigStatusReporter when the UDF
> is invoked, so it fails. I think we need caching logic similar to
> PigCounterHelper's, holding increments until something sets the context.
> I wonder how this works.
>
> We could add a helper UDF in JythonScriptingEngine.init (or some such
> method) to expose these elegantly. Thoughts?
>
> ~Aniket

Re: Counters from Python UDF

Posted by Aniket Mokashi <an...@gmail.com>.
I used the following in my Python UDF (on Pig 0.9), after referring to
http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/

from org.apache.pig.tools.pigstats import PigStatusReporter
reporter = PigStatusReporter.getInstance()

But it looks like the context is not set in PigStatusReporter when the UDF is
invoked, so it fails. I think we need caching logic similar to
PigCounterHelper's, holding increments until something sets the context. I
wonder how this works.

We could add a helper UDF in JythonScriptingEngine.init (or some such method)
to expose these elegantly. Thoughts?

~Aniket
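The caching logic described above can be illustrated in plain Python: increments are held in a local map while no reporter is available and flushed on the first call that finds one, which is roughly the idea behind PigCounterHelper on the Java side. This is a sketch under stated assumptions only; `BufferedCounters`, the `get_reporter` callable, `FakeReporter`, and the reporter's `increment(group, name, amount)` method are hypothetical names, not Pig APIs.

```python
from collections import defaultdict


class BufferedCounters(object):
    """Buffers counter increments until a reporter becomes available.

    `get_reporter` is a hypothetical zero-argument callable returning an
    object with an increment(group, name, amount) method, or None while
    the task context has not been set yet.
    """

    def __init__(self, get_reporter):
        self._get_reporter = get_reporter
        self._pending = defaultdict(int)  # (group, name) -> buffered total

    def incr(self, group, name, amount=1):
        reporter = self._get_reporter()
        if reporter is None:
            # No context yet: remember the increment for later.
            self._pending[(group, name)] += amount
            return
        # Context is available: flush everything buffered, then apply this one.
        for (g, n), buffered in self._pending.items():
            reporter.increment(g, n, buffered)
        self._pending.clear()
        reporter.increment(group, name, amount)


class FakeReporter(object):
    """Hypothetical reporter used only to exercise the sketch."""

    def __init__(self):
        self.totals = defaultdict(int)

    def increment(self, group, name, amount):
        self.totals[(group, name)] += amount


# Usage: two increments arrive before the context exists, one after.
state = {"reporter": None}
counters = BufferedCounters(lambda: state["reporter"])
counters.incr("Test", "udfcounter")
counters.incr("Test", "udfcounter", 2)
state["reporter"] = FakeReporter()
counters.incr("Test", "udfcounter")  # flushes the buffered 3, then adds 1
print(state["reporter"].totals[("Test", "udfcounter")])  # 4
```

The key design point is that the buffer lives on a long-lived object; if a new instance were created per record, anything buffered before the context appeared would be lost.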

On Thu, Aug 23, 2012 at 2:43 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> In trunk this should be possible (it's possible in 0.10 as well; I'm just
> not sure if PigCounterHelper is there). Either way, take a look at
> PigCounterHelper. All you have to do is instantiate a copy in your UDF and
> use it from there.
>
> This hinges on all of the static stuff that Pig relies on working... I
> think that the way we invoke these scripting languages should work,
> but this will verify that :)

-- 
"...:::Aniket:::... Quetzalco@tl"

Re: Counters from Python UDF

Posted by Jonathan Coveney <jc...@gmail.com>.
In trunk this should be possible (it's possible in 0.10 as well; I'm just not
sure if PigCounterHelper is there). Either way, take a look at
PigCounterHelper. All you have to do is instantiate a copy in your UDF and
use it from there.

This hinges on all of the static stuff that Pig relies on working... I
think that the way we invoke these scripting languages should work,
but this will verify that :)

2012/8/23 Duckworth, Will <wd...@comscore.com>

> This may be a better question for the dev list, but... is it even possible
> / feasible? Could it be done by calling the Java classes from within
> Jython?
>
> I guess I would ask the same about Algebraic and Accumulator UDFs, which I
> know are available in Ruby.

RE: Counters from Python UDF

Posted by "Duckworth, Will" <wd...@comscore.com>.
This may be a better question for the dev list, but... is it even possible / feasible? Could it be done by calling the Java classes from within Jython?

I guess I would ask the same about Algebraic and Accumulator UDFs, which I know are available in Ruby.

-----Original Message-----
From: Aniket Mokashi [mailto:aniket486@gmail.com] 
Sent: Friday, August 17, 2012 5:54 PM
To: user@pig.apache.org
Subject: Re: Counters from Python UDF

I don't think there is a way at this point. You may have to open a JIRA.

Thanks,
Aniket


Re: Counters from Python UDF

Posted by Aniket Mokashi <an...@gmail.com>.
I don't think there is a way at this point. You may have to open a JIRA.

Thanks,
Aniket

On Fri, Aug 17, 2012 at 7:03 AM, Duckworth, Will <wd...@comscore.com> wrote:

> Has anyone poked around to see if there is a way to create /
> increment counters from a Python UDF?  Thanks.