Posted to dev@hive.apache.org by "Richard Lee (JIRA)" <ji...@apache.org> on 2009/02/27 01:56:01 UTC

[jira] Created: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

FunctionRegistry should allow loading UDFs and UDAFs from property file
-----------------------------------------------------------------------

                 Key: HIVE-309
                 URL: https://issues.apache.org/jira/browse/HIVE-309
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Richard Lee


FunctionRegistry.java hard-codes all UDF and UDAF definitions in a static initializer.  There is no way to add new functions without directly modifying this file.

FunctionRegistry SHOULD look for a property file in which new functions and their implementations can be specified.  This will allow third parties to extend Hive without maintaining patches against the codebase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions3.diff

Re-applied the patch to trunk.  This one should apply cleanly.



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680992#action_12680992 ] 

Zheng Shao commented on HIVE-309:
---------------------------------

Hi Richard, sorry for the delay.
It seems the only call to registerUDF in your new patch is: registerUDF(parts[2], clazz, operatorType, false); 

This does not implement what you mentioned:
{code}
udf.IDENTIFIER=FUNCTIONNAME,CLASSNAME[,OPERATORTYPE,[ISOPERATOR,[DISPLAYNAME]]]
{code}

Does it?





[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694207#action_12694207 ] 

Richard Lee commented on HIVE-309:
----------------------------------

Has anyone had a chance to look at the new patch?



[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions-2.diff

Here's a diff off the same revision of trunk as my last one.
This implements the function definition syntax that I described in my previous comment.



[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions5.diff

How embarrassing. I uploaded the same patch twice.  Too many version control systems...

This patch should have all of the files this time.

As for the implementation debate, naturally I prefer my implementation to a metastore implementation, but I'm not totally against a metastore-based solution.  My only requirement is that adding or changing UDFs be easily automatable.  Essentially, we have a continuous integration system which produces distributions with a build name.  We "roll out" builds via a simple rollout script, and at the moment all it does is modify a symlink, which changes the directory HIVE_AUX_JARS_PATH points at to a specific build.  If we ever enable the web client, the rollout script will also be responsible for restarting the Hive server.





[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681135#action_12681135 ] 

Richard Lee commented on HIVE-309:
----------------------------------

I would love to make hive-udf.properties part of the Hive configuration file, but the functions are presently defined in a static initializer which has no access to a HiveConf object.  That's why I went with a system property for defining which file to look in.
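
The constraint described above (a static initializer with no HiveConf available) can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name ExternalFunctionLoader and the system property name "hive.udf.properties" are assumptions made up for illustration. Only the file name hive-udf.properties comes from this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; not part of Hive.
final class ExternalFunctionLoader {

    // Assumed name of the system property that points at the definitions file.
    static final String LOCATION_PROPERTY = "hive.udf.properties";

    // Fallback resource name, taken from the file name used in this thread.
    static final String DEFAULT_RESOURCE = "hive-udf.properties";

    // Populated once, when the class is initialized.
    static final Properties EXTERNAL = new Properties();

    static {
        // A static initializer cannot be handed a configuration object,
        // so the location is read from a JVM-wide system property instead.
        EXTERNAL.putAll(load(System.getProperty(LOCATION_PROPERTY, DEFAULT_RESOURCE)));
    }

    /** Loads the definitions from a file path or, failing that, from the classpath. */
    static Properties load(String location) {
        Properties props = new Properties();
        try (InputStream in = open(location)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // An unreadable file leaves the result empty so that the
            // built-in functions could still be registered.
            System.err.println("Could not read " + location + ": " + e.getMessage());
        }
        return props;
    }

    private static InputStream open(String location) throws IOException {
        Path path = Paths.get(location);
        if (Files.isRegularFile(path)) {
            return Files.newInputStream(path);
        }
        // Returns null when the resource is not on the classpath.
        return ExternalFunctionLoader.class.getClassLoader().getResourceAsStream(location);
    }
}
```

Under these assumptions, the file would be selected at startup with something like -Dhive.udf.properties=/path/to/hive-udf.properties on the JVM command line.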





[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681072#action_12681072 ] 

Zheng Shao commented on HIVE-309:
---------------------------------

Had a second look. This looks OK to me now.
Will ping Ashish for his comments, since he worked on the last UDF refactoring.




[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions2.diff

Finally got a unit test working with loading UDFs via property files.  The test verifies that both UDF and UDAF mappings are picked up.

It seems like the classpath situation with junit/ant is a little different than regular execution... so the hive-udf.properties needed to be copied into the build directory where the unit tests are run from.



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695603#action_12695603 ] 

Ashish Thusoo commented on HIVE-309:
------------------------------------

I see your use case, Richard. I think a more reasonable approach, which Prasad mentioned, is to extend the UDF API to expose all the meta information about a particular UDF and use that to create the registration table. That way, jars are truly plug and play. To me, creating a .properties file is just creating another repository for metadata. I think we should either store the metadata with the object itself (as in the approach mentioned here) or, for a repository, use the metastore. Does this sound more reasonable, Richard?
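
One way to read "store the metadata with the object itself" is a runtime annotation on the UDF class that a registry inspects by reflection. The sketch below is only an illustration of that idea. The annotation FunctionInfo, its fields, and the class names are hypothetical and are not an existing Hive API. An interface method on the UDF base class would be another way to get the same effect.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Hive.
final class SelfDescribingUdfs {

    /** Metadata carried by the UDF class itself (field names are assumptions). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FunctionInfo {
        String name();
        boolean isOperator() default false;
        String displayName() default "";
    }

    /** Example UDF that declares its own function name. */
    @FunctionInfo(name = "testfoo")
    static final class TestFoo {
        public String evaluate(String a, String b) {
            return a + b;
        }
    }

    /** Builds the registration table from whatever metadata the classes carry. */
    static Map<String, Class<?>> buildRegistry(Class<?>... candidates) {
        Map<String, Class<?>> registry = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            FunctionInfo info = candidate.getAnnotation(FunctionInfo.class);
            if (info != null) {
                // Classes without the annotation are skipped.
                registry.put(info.name(), candidate);
            }
        }
        return registry;
    }
}
```

With this shape, dropping a jar on the classpath and scanning its classes would be enough to register its functions, with no separate properties file to keep in sync.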




[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694607#action_12694607 ] 

Namit Jain commented on HIVE-309:
---------------------------------

However, if a client has a particular UDF implementation and wants to use it, he will be forced to add it to the server classpath if we go with the metastore approach.

Will clients be able to add jar files to the server where the Hive server is running?



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681566#action_12681566 ] 

Richard Lee commented on HIVE-309:
----------------------------------

I couldn't seem to get hive-udf.properties to be picked up by the TestCliDriver. I tried adding the properties file to the test.classpath, and explicitly including it in the test-udf.jar, but no love.

The contents of my properties file are:
udf.PREFIX.testfoo=testfoo,org.apache.hadoop.hive.ql.udf.UDFConcat
udaf.testbar=testbar,org.apache.hadoop.hive.ql.udf.UDAFSum

and the .q file which should exercise these test functions contains:

SELECT testfoo(cast(src.key as string), src.value) FROM src WHERE src.key = 86;

SELECT testbar(src.key) FROM src;


If anyone has a suggestion as to how to modify the build process to pick up the properties file, I'd appreciate it :)



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695614#action_12695614 ] 

Richard Lee commented on HIVE-309:
----------------------------------

Adding a registration table with a per-jar registration sounds like a reasonable solution to me.

I think my only issue becomes timeframe. I'm assuming that this issue will get shelved until someone has time to implement the feature?  By the next release? The release after? I'd like to keep my Hive installation as mainline as possible.



[jira] Assigned: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-309:
----------------------------------

    Assignee: Richard Lee



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677428#action_12677428 ] 

Richard Lee commented on HIVE-309:
----------------------------------

I guess it depends on how complex the syntax should be.  I think I'd prefer to keep each function definition at one line.

What do you think about this:

udf.IDENTIFIER=FUNCTIONNAME,CLASSNAME[,OPERATORTYPE,[ISOPERATOR,[DISPLAYNAME]]]
udaf.IDENTIFIER=FUNCTIONNAME,CLASSNAME[,ISOPERATOR,[DISPLAYNAME]]

Here, IDENTIFIER is only used to distinguish keys within the property file.  FUNCTIONNAME is pulled into the value so that non-word characters can be used (otherwise defining operators would be less useful).  The only two parts of the line that are required are FUNCTIONNAME and CLASSNAME.

As an example:
udf.logicaland=and,org.apache.hadoop.hive.ql.udf.UDFOPAnd,INFIX,true
udf.doubleampersand=&&,org.apache.hadoop.hive.ql.udf.UDFOPAnd,INFIX,true,and

For most other non-operator, PREFIX functions, you can simply say:
udf.month=month,org.apache.hadoop.hive.ql.udf.UDFMonth
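
A minimal sketch of how the udf.* value syntax above could be parsed follows. It is not the code from the attached patches, and the class and field names are hypothetical. The defaults of PREFIX and non-operator follow the udf.month example. Defaulting DISPLAYNAME to FUNCTIONNAME is an assumption. A plain split on commas also means a FUNCTIONNAME cannot itself contain a comma.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical parser for the proposed syntax; not part of Hive.
final class UdfPropertyParser {

    /** One parsed "udf.*" entry; field names mirror the proposed syntax. */
    static final class UdfDef {
        final String functionName;
        final String className;
        final String operatorType;
        final boolean isOperator;
        final String displayName;

        UdfDef(String functionName, String className, String operatorType,
               boolean isOperator, String displayName) {
            this.functionName = functionName;
            this.className = className;
            this.operatorType = operatorType;
            this.isOperator = isOperator;
            this.displayName = displayName;
        }
    }

    /** Parses FUNCTIONNAME,CLASSNAME[,OPERATORTYPE,[ISOPERATOR,[DISPLAYNAME]]]. */
    static UdfDef parseUdf(String value) {
        String[] parts = value.split(",", -1);
        if (parts.length < 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException(
                "FUNCTIONNAME and CLASSNAME are required: " + value);
        }
        String functionName = parts[0].trim();
        String className = parts[1].trim();
        // Optional fields fall back to defaults when absent or empty.
        String operatorType = parts.length > 2 && !parts[2].trim().isEmpty()
            ? parts[2].trim() : "PREFIX";
        boolean isOperator = parts.length > 3 && Boolean.parseBoolean(parts[3].trim());
        String displayName = parts.length > 4 && !parts[4].trim().isEmpty()
            ? parts[4].trim() : functionName; // assumed default
        return new UdfDef(functionName, className, operatorType, isOperator, displayName);
    }

    /** Collects every "udf." key, indexed by function name rather than by IDENTIFIER. */
    static Map<String, UdfDef> parseAll(Properties props) {
        Map<String, UdfDef> byName = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("udf.")) {
                UdfDef def = parseUdf(props.getProperty(key));
                byName.put(def.functionName, def);
            }
        }
        return byName;
    }
}
```

A real loader would then resolve each class name with Class.forName and pass the result to the registry. The udaf.* form could be handled the same way with one fewer field.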




[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694286#action_12694286 ] 

Namit Jain commented on HIVE-309:
---------------------------------

I will take a look.

In the future, can you also click 'Submit Patch' when you attach a file? That way, more people will know that you have a patch ready for review.



[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions-0.4.0.diff

Patch updated to work on 0.4.0.  This should not be used once permanent function definitions in HiveQL are implemented.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>            Assignee: Richard Lee
>         Attachments: hive-external-functions-0.4.0.diff, hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff, hive-external-functions5.diff
>


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694288#action_12694288 ] 

Namit Jain commented on HIVE-309:
---------------------------------

I got the following error while applying the patch:

[njain@dev119 trunk]$ patch -p0 < /tmp/hive.309.patch 
(Stripping trailing CRs from patch.)
patching file build-common.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/clientpositive/udf_from_properties.q.out
(Stripping trailing CRs from patch.)
patching file ql/src/test/hive-udf.properties
(Stripping trailing CRs from patch.)
patching file ql/src/test/queries/clientpositive/udf_from_properties.q
(Stripping trailing CRs from patch.)
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
Hunk #3 FAILED at 159.
1 out of 3 hunks FAILED -- saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java.rej


Can you refresh and resubmit the patch?



[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677258#action_12677258 ] 

Zheng Shao commented on HIVE-309:
---------------------------------

Thanks Richard.

It seems you also removed displayName, and set isOperator always to false.

Is it possible to keep those 2 parameters?


> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694501#action_12694501 ] 

Ashish Thusoo commented on HIVE-309:
------------------------------------

The latest patch does not seem to have all the files either.

Also I still have some reservations about this approach. The reasons are as follows:

- This approach works quite well as long as the CLI is used to run the queries. As soon as you move to other clients, such as the web interface, it breaks down. In the web client a central server runs the queries submitted by the users, so all users share the same udf.properties file. A user cannot add his own functions to that file because he will not have access to the location where it exists. In this case, and in other cases where centralized servers serve up the queries (something that is possible to some extent now with the JDBC drivers), this approach will break, as the registerUDF function is only called on the udf side.

- Having said that, this approach does work well if you want private UDFs that you do not share with other folks and you are using the current version of the HiveCli (though that may also be made more client-server-like in the future).

To me though, storing all this meta information in the metastore through the

create function .. command

would address both of those issues.

Thoughts?


> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>            Assignee: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694327#action_12694327 ] 

Namit Jain commented on HIVE-309:
---------------------------------

Your code changes look good, but can you add a few tests as well?

You have ignored badly specified function names etc. Please add some good functions, as you mentioned above, and also some bad functions (which will just get ignored).

Once the tests are in, I will accept and commit the patch.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff


[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions.diff

This attachment modifies FunctionRegistry so that it looks for a SystemResource 'hive-udf.properties' which is overridable by setting org.apache.hadoop.hive.ql.exec.FunctionRegistry.propertyfile to the name of the resource to search for.  This allows placement of the resource anywhere in the classpath.
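
The lookup described above could be sketched roughly as follows. This is not the code from the attached diff; the class name PropertyFileLookupSketch and its methods are made up for illustration. Only the property key and the default resource name come from the description above.

```java
import java.net.URL;

public class PropertyFileLookupSketch {
    // System property named in the comment above; overrides the resource name.
    public static final String OVERRIDE_KEY =
        "org.apache.hadoop.hive.ql.exec.FunctionRegistry.propertyfile";

    // Default resource name named in the comment above.
    public static final String DEFAULT_RESOURCE = "hive-udf.properties";

    // Returns the override if the system property is set, else the default name.
    public static String resourceName() {
        return System.getProperty(OVERRIDE_KEY, DEFAULT_RESOURCE);
    }

    // Searches the system classpath; returns null when the resource is absent.
    public static URL locate() {
        return ClassLoader.getSystemResource(resourceName());
    }

    public static void main(String[] args) {
        System.out.println("resource name: " + resourceName());
        System.out.println("resolved URL:  " + locate());
    }
}
```

Running it with -Dorg.apache.hadoop.hive.ql.exec.FunctionRegistry.propertyfile=my-udfs.properties (a hypothetical file name) would make it search for that resource instead.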

The hive-udf.properties file contains definitions for UDF and UDAFs in the following way:

udf.OPERATORTYPE.FUNCTIONNAME=CLASSNAME
udaf.OPERATORTYPE.FUNCTIONNAME=CLASSNAME

OPERATORTYPE is a string that Enum.valueOf can parse into a FunctionInfo.OperatorType constant.
CLASSNAME is the fully qualified name of the function's implementation class.

Note that for UDAFs, OPERATORTYPE isn't significant.

This patch was generated from trunk revision 748227
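
For illustration, a minimal sketch of how entries in this format might be parsed. It is not the code from the diff. The OperatorType enum here is a stand-in for FunctionInfo.OperatorType and its constants are an assumption, as are the class name UdfPropertiesSketch, the Entry holder, and the com.example class names. Malformed entries are simply skipped, which is one reasonable reading of "broken entries get ignored" elsewhere in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class UdfPropertiesSketch {
    // Stand-in for FunctionInfo.OperatorType; the constants are assumed.
    public enum OperatorType { NO_OP, PREFIX, INFIX, POSTFIX }

    // One parsed line of the properties file (hypothetical holder type).
    public static final class Entry {
        public final boolean aggregate;       // true for udaf.*, false for udf.*
        public final OperatorType operatorType;
        public final String functionName;
        public final String className;

        Entry(boolean aggregate, OperatorType operatorType,
              String functionName, String className) {
            this.aggregate = aggregate;
            this.operatorType = operatorType;
            this.functionName = functionName;
            this.className = className;
        }
    }

    // Parses keys of the form udf.OPERATORTYPE.FUNCTIONNAME or
    // udaf.OPERATORTYPE.FUNCTIONNAME; anything else is ignored.
    public static List<Entry> parse(Properties props) {
        List<Entry> entries = new ArrayList<Entry>();
        for (String key : props.stringPropertyNames()) {
            String[] parts = key.split("\\.", 3);
            if (parts.length != 3) {
                continue; // not enough dot-separated fields
            }
            boolean isUdaf = parts[0].equals("udaf");
            if (!isUdaf && !parts[0].equals("udf")) {
                continue; // unknown prefix
            }
            OperatorType type;
            try {
                type = Enum.valueOf(OperatorType.class, parts[1]);
            } catch (IllegalArgumentException e) {
                continue; // operator type is not a known constant
            }
            String className = props.getProperty(key).trim();
            if (parts[2].isEmpty() || className.isEmpty()) {
                continue; // missing function name or class name
            }
            entries.add(new Entry(isUdaf, type, parts[2], className));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String sample =
            "udf.NO_OP.my_lower=com.example.udf.MyLower\n"
          + "udaf.NO_OP.my_sum=com.example.udaf.MySum\n"
          + "udf.BOGUS.broken=com.example.Broken\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));
        for (Entry e : parse(props)) {
            System.out.println((e.aggregate ? "UDAF " : "UDF ")
                + e.functionName + " -> " + e.className);
        }
    }
}
```

A real implementation would go on to load each class and hand it to the registry; that step is left out here because it depends on the registry's actual method signatures.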

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677435#action_12677435 ] 

Richard Lee commented on HIVE-309:
----------------------------------

For UDAFs, the syntax should just be

udaf.IDENTIFIER=FUNCTIONNAME,CLASSNAME

since registerUDAF only takes 2 params.
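
A small sketch of how the value side of such an entry might be split into its two fields. The class and method names are hypothetical, and rejecting anything that is not exactly two non-empty fields is an assumption, not something stated in this thread.

```java
public class UdafEntrySketch {
    // Splits "FUNCTIONNAME,CLASSNAME" into {functionName, className}.
    // Returns null when the value does not have exactly two non-empty fields.
    public static String[] parseUdafValue(String value) {
        if (value == null) {
            return null;
        }
        String[] parts = value.split(",", -1); // -1 keeps trailing empty fields
        if (parts.length != 2) {
            return null;
        }
        String functionName = parts[0].trim();
        String className = parts[1].trim();
        if (functionName.isEmpty() || className.isEmpty()) {
            return null;
        }
        return new String[] { functionName, className };
    }

    public static void main(String[] args) {
        // Hypothetical entry: udaf.sum1=my_sum,com.example.udaf.MySum
        String[] parsed = parseUdafValue("my_sum,com.example.udaf.MySum");
        System.out.println(parsed[0] + " -> " + parsed[1]);
    }
}
```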

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769293#action_12769293 ] 

Richard Lee commented on HIVE-309:
----------------------------------

Upgraded to Hive 0.4.0.  Will attach a massaged version of my UDF patch against 0.4.0.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>            Assignee: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff, hive-external-functions5.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694616#action_12694616 ] 

Prasad Chakka commented on HIVE-309:
------------------------------------

I don't think storing UDF jars in the metastore is necessary. I like the scheme that Richard proposes since it is lightweight. It is simple to add UDFs in Hive CLI mode. If one wants to add UDFs to HiveServer, the administrator can restart HiveServer. In the future, a command could be provided to copy the jar from the client to the server, to a well-known location that is not overwritten by a reinstall of the server. IMO, this functionality needn't be provided in the same patch.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>            Assignee: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681136#action_12681136 ] 

Ashish Thusoo commented on HIVE-309:
------------------------------------

Sorry to chime in a bit late on this, but I was out on vacation.

I think a better place to store this is the metastore, as I think the issue here is to allow third parties to write their own UDFs which are persistent. We do have the ability to create temporary UDF mappings. Am I right in my understanding?


> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681131#action_12681131 ] 

Namit Jain commented on HIVE-309:
---------------------------------

Looks good - I had some minor comments:

Instead of making the property file dependent on the system property org.apache.hadoop.hive.ql.exec.FunctionRegistry.propertyfile, would it be a good idea to make it a configurable parameter, with the default being hive-udf.properties in the configuration directory? Users can overwrite this file, or add a new one somewhere else and point to the new file.

Also, it would be great if you could add a dummy property file with some UDFs and UDAFs for testing.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff


[jira] Updated: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Lee updated HIVE-309:
-----------------------------

    Attachment: hive-external-functions4.diff

I forgot to svn add a couple of files in my previous diff, so the test cases didn't make it in. This one should fix that.
Added some garbage entries to the properties file to verify that broken entries get ignored.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681939#action_12681939 ] 

Richard Lee commented on HIVE-309:
----------------------------------

Whoops. I accidentally named the 2nd attempt at a patch the same as my previous one. Please use the latest upload, the one from 2009-03-13 04:42 PM.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681432#action_12681432 ] 

Richard Lee commented on HIVE-309:
----------------------------------

I think that I would prefer to go the property file route. It's really nice to just add a jar to the classpath and have the functions defined there show up.  I guess I should modify the patch to iterate through all hive-udf.properties files encountered, in case more than one jar or directory contains function definitions.

This essentially allows users to mix and match what they include by altering the HIVE_AUX_JARS_PATH. 
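
One possible way to iterate over every hive-udf.properties on the classpath is ClassLoader.getResources, which returns all matching resources rather than only the first. The sketch below is an assumption about how that could look, not the patch itself; the class and method names are hypothetical, and merging everything into a single Properties object (so that a later file wins when two files define the same key) is a design choice made here for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Properties;

public class AllUdfPropertiesSketch {
    // Loads every resource with the given name visible to the class loader
    // and merges the entries. On duplicate keys the file read later wins.
    public static Properties loadAll(ClassLoader loader, String resourceName)
            throws IOException {
        Properties merged = new Properties();
        Enumeration<URL> urls = loader.getResources(resourceName);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            InputStream in = url.openStream();
            try {
                Properties one = new Properties();
                one.load(in);
                merged.putAll(one);
            } finally {
                in.close();
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Properties all = loadAll(ClassLoader.getSystemClassLoader(),
                                 "hive-udf.properties");
        System.out.println("function definitions found: " + all.size());
    }
}
```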

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff


[jira] Commented: (HIVE-309) FunctionRegistry should allow loading UDFs and UDAFs from property file

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694700#action_12694700 ] 

Namit Jain commented on HIVE-309:
---------------------------------

Had a discussion with Ashish and Raghu about this.

Since we already have 'create temporary function', the above approach may not give us anything new. It would be better to go with the metastore approach. That means that the jar file needs to be copied to a well-known location on the server, and the function metadata needs to contain the location of the jar.

> FunctionRegistry should allow loading UDFs and UDAFs from property file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-309
>                 URL: https://issues.apache.org/jira/browse/HIVE-309
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Richard Lee
>            Assignee: Richard Lee
>         Attachments: hive-external-functions-2.diff, hive-external-functions.diff, hive-external-functions2.diff, hive-external-functions3.diff, hive-external-functions4.diff, hive-external-functions5.diff