Posted to user@hive.apache.org by Furcy Pin <fu...@flaminem.com> on 2015/12/17 13:14:34 UTC

Discussion: permanent UDF with database name

Hi Hive users,

I would like to continue the discussion that took place during the design of
the feature:
https://issues.apache.org/jira/browse/HIVE-6167

Some concerns were raised back then, and now that the feature has been
implemented, I think some user feedback could bring grist to the mill.

Even though I understand the value of grouping UDFs inside databases, I find
it really annoying that I cannot define my UDFs globally.

For me, one of the main uses of UDFs is to extend the built-in Hive
functions with the company's own user-defined functions, either because some
useful generic functions are missing from the built-in ones, or to add
business-specific functions.

In the latter case, I fully understand the need to qualify them with a
business-specific database name. But what about the former case?


Let's take an example:
Several times, we needed a Hive UDF that did not exist yet in the Hive
version we were running. To use it, all we had to do was take the UDF's
source code from a more recent version of Hive, build it into a JAR, and add
the UDF manually.

When we upgraded, we only had to remove our UDF, since it was now built-in.

(To be more specific, this happened with collect_list before Hive 0.13.)

With HIVE-6167, this became impossible, since we have to create a
"database_name.function_name" and call it by that name. Hence, when
upgrading, we need to replace "database_name.function_name" with
"function_name" everywhere.
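If the queries are kept as plain HiveQL text files, the rename described above can be scripted. The following is a minimal sketch, not an established tool: the helper name `unqualify_function` and the example names `udfs` and `collect_list` are hypothetical, and because it uses a regular expression rather than a HiveQL parser, it would also rewrite matches inside string literals or comments, and it ignores backtick-quoted identifiers.

```python
import re


def unqualify_function(query: str, database: str, function: str) -> str:
    """Replace calls like `database.function(` with `function(` in HiveQL text.

    Sketch only: a plain regex, not a HiveQL parser, so matches inside
    string literals or comments are rewritten too.
    """
    # \b prevents matching the tail of a longer name such as `my_udfs`.
    # The lookahead requires an opening parenthesis (optionally after spaces),
    # so table names like `udfs.collect_list_tbl` are left untouched.
    # IGNORECASE because Hive identifiers are case-insensitive.
    pattern = re.compile(
        r"\b" + re.escape(database) + r"\s*\.\s*" + re.escape(function) + r"(?=\s*\()",
        re.IGNORECASE,
    )
    # A function replacement avoids any backslash interpretation in `function`.
    return pattern.sub(lambda _match: function, query)


if __name__ == "__main__":
    q = "SELECT udfs.collect_list(x) FROM udfs.collect_list_tbl"
    print(unqualify_function(q, "udfs", "collect_list"))
```

Running it over every .hql file before an upgrade would turn the qualified calls back into calls to the built-in function, but the output should still be reviewed by hand given the limitations above.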

This is just an example, but I would like to emphasize that sometimes we
want to create permanent UDFs that are as global as built-in UDFs, without
having to care whether a function is built-in or user-defined. As someone
pointed out in HIVE-6167's discussion, imagine if all the built-in UDFs had
to be called as "sys.function_name".

I would just like to hear other Hive users' feedback on this matter.

Has anyone else had similar issues with this behavior? How did you deal
with them?

Maybe it would make sense to file a feature request for a GLOBAL keyword
that could be specified when creating a permanent UDF, for the cases where
we really want it to be global?

What do you think?

Regards,

Furcy

Re: Discussion: permanent UDF with database name

Posted by "jipengzeng@meilishuo.com" <ji...@meilishuo.com>.
@ Furcy Pin
I agree with your idea!
I found that since Hive 0.13, users can define permanent UDFs, but each one must be bound to a database name.
So if we want to use a UDF without the database name, we must create it in every database.
This causes another problem: when we create a new database, we need to retrieve all of the UDFs we have defined
and create them there one by one.
This is the biggest problem I have encountered in using this feature.
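The per-database workaround can at least be scripted. Below is a minimal sketch, assuming the UDF definitions are kept in a hypothetical in-code list and the database names come from `SHOW DATABASES`; the class name and JAR URI are placeholders. The generated statements follow Hive's `CREATE FUNCTION [db_name.]function_name AS class_name USING JAR 'file_uri'` form and would be fed to the Hive CLI or Beeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UdfDef:
    name: str        # function name, e.g. "collect_list"
    class_name: str  # fully qualified Java class implementing the UDF
    jar_uri: str     # location of the JAR, e.g. on HDFS


def create_statements(databases: Iterable[str], udfs: Iterable[UdfDef]) -> List[str]:
    """Return one CREATE FUNCTION statement per (database, UDF) pair."""
    udf_list = list(udfs)  # materialize so it can be reused for every database
    statements = []
    for db in databases:
        for udf in udf_list:
            statements.append(
                f"CREATE FUNCTION {db}.{udf.name} AS '{udf.class_name}' "
                f"USING JAR '{udf.jar_uri}';"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical values: replace with the output of SHOW DATABASES
    # and your own list of UDFs.
    udfs = [UdfDef("collect_list", "com.example.hive.udf.CollectList",
                   "hdfs:///user/hive/udfs/company-udfs.jar")]
    for stmt in create_statements(["default", "sales"], udfs):
        print(stmt)
```

When a new database is created, calling `create_statements` with just that database name would produce the statements needed to register the existing UDFs there, instead of recreating them one by one by hand.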

jipengzeng
