You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/09/25 02:59:26 UTC

Review Request: PIG-2904 Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7217/
-----------------------------------------------------------

Review request for pig and Julien Le Dem.


Description
-------

Currently, it is not possible to pass arguments to scripting udf's constructor via define statements. However, this will be useful to support function closure in scripting udfs. Take the following udf for example,

//-----

# udf that returns a closure
@outputSchema("log:double")
def logn(base):
    def log(x):
        return math.log(x, float(base))
    return log

//-----

With this patch, now we can do:

//-----

REGISTER 'scriptingudf.py' using jython as myfuncs;

DEFINE log2 myfuncs.logn('2');
DEFINE log10 myfuncs.logn('10');

out2 = FOREACH in GENERATE log2($0);
out10 = FOREACH in GENERATE log10($0);

//-----


Please note that constructor arguments specified in define statements are interpreted as closure parameters, so it is not valid to pass constructor arguments to a udf that does not return a closure. For example, the following script will throw an exception:

//-----

# udf that returns a closure
@outputSchema("log:double")
def log2(x):
    return math.log(x, 2)

//-----

DEFINE log2 myfuncs.log2('2');

//-----


The changes include:
- Update LogicalPlanGenerator grammar so that jython functions can be re-registered with constructor parameters in define statement.
- Change JythonFunction's constructor so that it can take a list of string parameters.
- Change JythonFunction's exec method so that it can obtain a closure when constructor parameters are present.
- Update Pig.java so that all this also works with embedded scripts.

Note that this patch only includes work for Jython. Work for other supported scripting languages is yet to be added.


This addresses bug PIG-2904.
    https://issues.apache.org/jira/browse/PIG-2904


Diffs
-----

  src/org/apache/pig/parser/LogicalPlanGenerator.g 9b9c099 
  src/org/apache/pig/scripting/Pig.java 1e4065f 
  src/org/apache/pig/scripting/jython/JythonFunction.java 15f936d 
  test/e2e/pig/tests/nightly.conf ebcd66e 
  test/e2e/pig/tests/turing_jython.conf d800d78 
  test/e2e/pig/udfs/python/scriptingudf.py 54129b6 

Diff: https://reviews.apache.org/r/7217/diff/


Testing
-------

Added 3 test cases to e2e test suite:
1) Test for UDF that returns a closure with closure parameters in define statements. (positive test)
2) Test for UDF that doesn't returns a closure with closure parameters in define statements. (negative test)
3) Test for define statements in embedded scripts.

Ran ant test-commit.
Ran relevant e2e tests: Scripting_*, Jython_*.


Thanks,

Cheolsoo Park


Re: Review Request: PIG-2904 Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by Julien Le Dem <ju...@ledem.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7217/#review12686
-----------------------------------------------------------


That's a pretty good start Cheolsoo.
My main comment is we should avoid to have to test specifically for JythonFunction in the LogicalPlanGenerator. Instead we should look into generalizing how UDFs are resolved.
Thanks for improving the scripting extension!


src/org/apache/pig/parser/LogicalPlanGenerator.g
<https://reviews.apache.org/r/7217/#comment27048>

    I think this logic should leave in the upper layer. The ScriptEngine should deal with this. The LogicPlanGenerator should not have to special case the JythonFunction.
    Possibly the we can change how the LogicalPlanGenerator looks up functions to make it more generic?



src/org/apache/pig/scripting/Pig.java
<https://reviews.apache.org/r/7217/#comment27049>

    make sure you escape correctly in case args contain quote for example.



src/org/apache/pig/scripting/Pig.java
<https://reviews.apache.org/r/7217/#comment27050>

    sb.append(";\n").toString()


- Julien Le Dem


On Sept. 25, 2012, 1:11 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7217/
> -----------------------------------------------------------
> 
> (Updated Sept. 25, 2012, 1:11 a.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> -------
> 
> Currently, it is not possible to pass arguments to scripting udf's constructor via define statements. However, this will be useful to support function closure in scripting udfs. Take the following udf for example,
> 
> //-----
> 
> # udf that returns a closure
> @outputSchema("log:double")
> def logn(base):
>     def log(x):
>         return math.log(x, float(base))
>     return log
> 
> //-----
> 
> With this patch, now we can do:
> 
> //-----
> 
> REGISTER 'scriptingudf.py' using jython as myfuncs;
> 
> DEFINE log2 myfuncs.logn('2');
> DEFINE log10 myfuncs.logn('10');
> 
> out2 = FOREACH in GENERATE log2($0);
> out10 = FOREACH in GENERATE log10($0);
> 
> //-----
> 
> 
> Please note that constructor arguments specified in define statements are interpreted as closure parameters, so it is not valid to pass constructor arguments to a udf that does not return a closure. For example, the following script will throw an exception:
> 
> //-----
> 
> # udf that doesn't return a closure
> @outputSchema("log:double")
> def log2(x):
>     return math.log(x, 2)
> 
> //-----
> 
> DEFINE log2 myfuncs.log2('2');
> 
> //-----
> 
> 
> The changes include:
> - Update LogicalPlanGenerator grammar so that jython functions can be re-registered with constructor parameters in define statement.
> - Change JythonFunction's constructor so that it can take a list of string parameters.
> - Change JythonFunction's exec method so that it can obtain a closure when constructor parameters are present.
> - Update Pig.java so that all this also works with embedded scripts.
> 
> Note that this patch only includes work for Jython. Work for other supported scripting languages is yet to be added.
> 
> 
> This addresses bug PIG-2904.
>     https://issues.apache.org/jira/browse/PIG-2904
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/parser/LogicalPlanGenerator.g 9b9c099 
>   src/org/apache/pig/scripting/Pig.java 1e4065f 
>   src/org/apache/pig/scripting/jython/JythonFunction.java 15f936d 
>   test/e2e/pig/tests/nightly.conf ebcd66e 
>   test/e2e/pig/tests/turing_jython.conf d800d78 
>   test/e2e/pig/udfs/python/scriptingudf.py 54129b6 
> 
> Diff: https://reviews.apache.org/r/7217/diff/
> 
> 
> Testing
> -------
> 
> Added 3 test cases to e2e test suite:
> 1) Test for UDF that returns a closure with closure parameters in define statements. (positive test)
> 2) Test for UDF that doesn't returns a closure with closure parameters in define statements. (negative test)
> 3) Test for define statements in embedded scripts.
> 
> Ran ant test-commit.
> Ran relevant e2e tests: Scripting_*, Jython_*.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request: PIG-2904 Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by Cheolsoo Park <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7217/
-----------------------------------------------------------

(Updated Sept. 25, 2012, 1:11 a.m.)


Review request for pig and Julien Le Dem.


Changes
-------

Fixed a typo in description.


Description (updated)
-------

Currently, it is not possible to pass arguments to scripting udf's constructor via define statements. However, this will be useful to support function closure in scripting udfs. Take the following udf for example,

//-----

# udf that returns a closure
@outputSchema("log:double")
def logn(base):
    def log(x):
        return math.log(x, float(base))
    return log

//-----

With this patch, now we can do:

//-----

REGISTER 'scriptingudf.py' using jython as myfuncs;

DEFINE log2 myfuncs.logn('2');
DEFINE log10 myfuncs.logn('10');

out2 = FOREACH in GENERATE log2($0);
out10 = FOREACH in GENERATE log10($0);

//-----


Please note that constructor arguments specified in define statements are interpreted as closure parameters, so it is not valid to pass constructor arguments to a udf that does not return a closure. For example, the following script will throw an exception:

//-----

# udf that doesn't return a closure
@outputSchema("log:double")
def log2(x):
    return math.log(x, 2)

//-----

DEFINE log2 myfuncs.log2('2');

//-----


The changes include:
- Update LogicalPlanGenerator grammar so that jython functions can be re-registered with constructor parameters in define statement.
- Change JythonFunction's constructor so that it can take a list of string parameters.
- Change JythonFunction's exec method so that it can obtain a closure when constructor parameters are present.
- Update Pig.java so that all this also works with embedded scripts.

Note that this patch only includes work for Jython. Work for other supported scripting languages is yet to be added.


This addresses bug PIG-2904.
    https://issues.apache.org/jira/browse/PIG-2904


Diffs
-----

  src/org/apache/pig/parser/LogicalPlanGenerator.g 9b9c099 
  src/org/apache/pig/scripting/Pig.java 1e4065f 
  src/org/apache/pig/scripting/jython/JythonFunction.java 15f936d 
  test/e2e/pig/tests/nightly.conf ebcd66e 
  test/e2e/pig/tests/turing_jython.conf d800d78 
  test/e2e/pig/udfs/python/scriptingudf.py 54129b6 

Diff: https://reviews.apache.org/r/7217/diff/


Testing
-------

Added 3 test cases to e2e test suite:
1) Test for UDF that returns a closure with closure parameters in define statements. (positive test)
2) Test for UDF that doesn't returns a closure with closure parameters in define statements. (negative test)
3) Test for define statements in embedded scripts.

Ran ant test-commit.
Ran relevant e2e tests: Scripting_*, Jython_*.


Thanks,

Cheolsoo Park