Posted to dev@pig.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2012/09/04 19:37:07 UTC

[jira] [Created] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Julien Le Dem created PIG-2904:
----------------------------------

             Summary: Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
                 Key: PIG-2904
                 URL: https://issues.apache.org/jira/browse/PIG-2904
             Project: Pig
          Issue Type: New Feature
            Reporter: Julien Le Dem




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2904:
-------------------------------

    Status: Open  (was: Patch Available)

Cancelling patch available since I need to incorporate Julien's comments.
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>



[jira] [Assigned] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park reassigned PIG-2904:
----------------------------------

    Assignee: Cheolsoo Park
    
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>



[jira] [Updated] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2904:
-------------------------------

    Attachment: PIG-2904.patch

Attaching a patch that adds this feature for Python UDFs.

Review board:
https://reviews.apache.org/r/7217/

I'd like to know whether this is a good approach or not. If this is OK with everyone, I will go ahead and add the same support for the other supported scripting languages.

Thanks!
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482495#comment-13482495 ] 

Cheolsoo Park commented on PIG-2904:
------------------------------------

Thank you very much for reviewing it, Julien!

I agree with your comments. Please let me address them.
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>



[jira] [Updated] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2904:
-------------------------------

    Status: Patch Available  (was: Open)
    
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450454#comment-13450454 ] 

Cheolsoo Park commented on PIG-2904:
------------------------------------

Hello Julien,

I have a question about this jira.

I've been reading Pig UDF code to understand why I need different DEFINEs in order to pass different constructor parameters to the AvroStorage constructor. What I found is that aliases are mapped to FuncSpec instances that store not only function names but also the args specified in DEFINE statements. Later, when aliases are expanded by LogicalPlanBuilder, function objects are instantiated from those FuncSpec instances. As a result, the args specified in DEFINE statements are used to instantiate function objects instead of the ones specified in LOAD statements.
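The alias behavior described above can be modeled in a few lines of Python. This is only a rough sketch of the observed behavior, not Pig's actual code: `FuncSpecSketch`, `define`, `resolve`, and the schema strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncSpecSketch:
    """Hypothetical stand-in for a FuncSpec: a function name plus constructor args."""
    class_name: str
    ctor_args: tuple = ()


# Maps an alias to the spec recorded by its DEFINE statement.
aliases = {}


def define(alias, class_name, *ctor_args):
    """Model of DEFINE: remember the function name together with its args."""
    aliases[alias] = FuncSpecSketch(class_name, tuple(ctor_args))


def resolve(name, *inline_args):
    """Model of alias expansion: a DEFINEd alias wins over inline args."""
    if name in aliases:
        return aliases[name]
    return FuncSpecSketch(name, tuple(inline_args))


define("store_a", "AvroStorage", "schema_a")
print(resolve("store_a", "schema_b"))      # uses the args from DEFINE
print(resolve("AvroStorage", "schema_b"))  # no alias, so inline args are used
```

In this model, passing different constructor parameters requires a separate `define` call per parameter set, which matches the need for different DEFINEs noted above.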

My question is whether this jira is meant to solve the same problem or not. I am a bit confused because the title says "scripting UDFs", but I thought that scripting UDFs are EvalFuncs, and EvalFuncs take no parameters in their constructors. Please forgive me if I am misunderstanding something here. I am still learning Pig internals.

Thanks!
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457499#comment-13457499 ] 

Cheolsoo Park commented on PIG-2904:
------------------------------------

Given that the goal of this jira is to support function closures in scripting UDFs, here is how I'd like to implement this feature. (This is basically my understanding of Julien's proposal in PIG-928.)

Let's say we have a Python UDF 'logn' as follows:
{code:title=PyScript.py}
import math

def logn(base):
    # Returns a closure that computes the logarithm of x in the given base.
    def log(x):
        return math.log(x, base)
    return log
{code}
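The closure can be exercised in plain Python, outside Pig. This is a minimal sketch under one assumption: since the proposed constructor takes String varargs, args from DEFINE would reach the script as strings, so the sketch converts the base with `float()`. That conversion is an assumption made for illustration and is not taken from the patch.

```python
import math


def logn(base):
    # Assumption: DEFINE args arrive as strings (e.g. '2'), so convert once here.
    base = float(base)

    def log(x):
        return math.log(x, base)

    return log


# Each call to logn() yields an independent function bound to its own base.
log2 = logn('2')
log10 = logn('10')

print(log2(8))      # logarithm of 8 in base 2
print(log10(1000))  # logarithm of 1000 in base 10
```

Without the conversion, `math.log(x, '2')` would raise a `TypeError`, so the conversion has to happen somewhere, either in the script or on the Java side.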

Now in Pig script, I can define two aliases 'log2' and 'log10' as follows:
{code:title=PigScript.pig}
REGISTER PyScript.py USING jython AS py;
DEFINE log2 py.logn('2');
DEFINE log10 py.logn('10');
...
log_2 = FOREACH in GENERATE log2($0);   -- log2($0) => math.log($0, 2)
log_10 = FOREACH in GENERATE log10($0); -- log10($0) => math.log($0, 10)
{code}

To achieve that, I believe that the following work is required (Python only for now):
- Add a non-default constructor with varargs to JythonFunction: e.g. JythonFunction(String filename, String functionName, String...ctorargs)
- Update LogicalPlanGenerator/LogicalPlanBuilder so that FuncSpecs for scripting UDFs are built from DEFINE statements: currently, they are built directly from REGISTER statements using the default constructor, so it is not possible to pass any constructor parameters to them via DEFINE statements.
- Update the exec() method of JythonFunction so that if a function closure is given (i.e. the list of constructor args is not null), we first invoke the function (e.g. logn) to get the 2nd function (e.g. log2), and then invoke the 2nd function with parameters.
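The two-step invocation in the last item can be sketched in plain Python. This is an illustration only: `ScriptFunction` is a hypothetical stand-in for JythonFunction, `square` is a made-up example UDF, and caching the inner function after the first call is an assumption rather than something stated in the plan. The sketch also treats an empty argument list the same as a null one.

```python
import math


class ScriptFunction:
    """Hypothetical stand-in for JythonFunction; not Pig's actual class."""

    def __init__(self, namespace, function_name, *ctor_args):
        self._func = namespace[function_name]
        self._ctor_args = ctor_args
        self._inner = None  # resolved lazily on the first exec_() call

    def exec_(self, *inputs):
        if not self._ctor_args:
            # No DEFINE args: call the registered function directly.
            return self._func(*inputs)
        if self._inner is None:
            # DEFINE args given: call the outer function once to get the closure.
            self._inner = self._func(*self._ctor_args)
        # Then invoke the returned function with the actual input values.
        return self._inner(*inputs)


def logn(base):
    base = float(base)  # assumption: constructor args arrive as strings

    def log(x):
        return math.log(x, base)

    return log


def square(x):
    return x * x


namespace = {"logn": logn, "square": square}
log2 = ScriptFunction(namespace, "logn", "2")   # like DEFINE log2 py.logn('2')
plain = ScriptFunction(namespace, "square")     # UDF without constructor args

print(log2.exec_(8.0))
print(plain.exec_(3))
```

Resolving the closure once and reusing it avoids calling the outer function for every input tuple.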

I am sure that there will be many more details that I need to sort out, but I believe that this is doable at least for Python. Once I complete implementing function closure support in Python, I am going to add it to other supported scripting languages too.

Please let me know if you have any concerns/questions.

Thanks!
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482485#comment-13482485 ] 

Julien Le Dem commented on PIG-2904:
------------------------------------

That's a pretty good start, Cheolsoo.
My main comment is that we should avoid having to test specifically for JythonFunction in the LogicalPlanGenerator. Instead, we should look into generalizing how UDFs are resolved.
Thanks for improving the scripting extension!
See: https://reviews.apache.org/r/7217/
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456878#comment-13456878 ] 

Cheolsoo Park commented on PIG-2904:
------------------------------------

Now I understand what this jira is about after reading old jiras (e.g. PIG-928).

For Java UDFs (including EvalFunc), it is possible to define a non-default constructor that takes parameters and pass args to it via DEFINE statements. For example,
{code}
REGISTER MyUDF.jar;
DEFINE func MyUDF.MyFunc('param');
...
foo = FOREACH bar GENERATE func($0); -- The arg 'param' is available inside func().
...
{code}

However, this is not possible with scripting UDFs because:
# Script UDF classes such as JythonFunction only define the default constructors with no parameters.
# There is currently no mechanism via which we can pass args from Java to scripting languages.

While #1 seems trivial, #2 seems more involved (e.g. passing args via function closures, etc). I am wondering whether anyone is working on this; if not, I'd like to give it a try.

Thanks!
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>



[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481885#comment-13481885 ] 

Julien Le Dem commented on PIG-2904:
------------------------------------

Hi Cheolsoo,
I will look into this.
                
> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-2904
>                 URL: https://issues.apache.org/jira/browse/PIG-2904
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Julien Le Dem
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2904.patch
>
>

