Posted to dev@impala.apache.org by xiaoqing gao <ga...@gmail.com> on 2022/11/07 08:41:47 UTC

CREATE GLOBAL FUNCTION works on all databases

Hi team!
When I execute a CREATE FUNCTION statement, the function only works in the
one database that I specified.
I would like a feature where a function created with the following statement
works in all databases. The syntax:
CREATE GLOBAL FUNCTION [IF NOT EXISTS] [db_name.]function_name([arg_type[, arg_type...]])
  RETURNS return_type
  LOCATION 'hdfs_path_to_dot_so'
  SYMBOL='symbol_name'

This would need a default database named _impala_global. Global functions
would be associated with _impala_global.

Do you have any ideas?

Best Regards,
Xiaoqing Gao

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by Csaba Ringhofer <cs...@cloudera.com>.
Hi!

I also like the idea of a fallback database for functions; it seems like a
fairly simple but very useful feature.
One thing I would consider is adding this as a query option instead of a
flag, but that is probably harder to implement, so I am OK with adding a flag
now and possibly adding a query option later that overrides it.
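
A minimal sketch of the precedence described above, assuming a per-query
option would take priority over the startup flag when both are set. The class,
method, and parameter names are hypothetical and are not Impala code; the flag
is the --global_function_database_name proposed elsewhere in this thread, and
the query option does not exist yet.

```java
import java.util.Optional;

/** Illustrative only: shows one way a query option could override a flag. */
final class FallbackDbSettingSketch {

    /**
     * Picks the fallback database for function resolution.
     * queryOption: hypothetical per-query setting (null or empty if unset).
     * startupFlag: value of the proposed startup flag (null or empty if unset).
     */
    static Optional<String> effectiveFallbackDb(String queryOption, String startupFlag) {
        if (queryOption != null && !queryOption.isEmpty()) {
            return Optional.of(queryOption);   // the query option wins when set
        }
        if (startupFlag != null && !startupFlag.isEmpty()) {
            return Optional.of(startupFlag);   // otherwise fall back to the flag
        }
        return Optional.empty();               // no fallback database configured
    }

    public static void main(String[] args) {
        System.out.println(effectiveFallbackDb("team_db", "util_db"));
        System.out.println(effectiveFallbackDb(null, "util_db"));
    }
}
```

With this shape, clusters that only set the flag keep working unchanged if a
query option is added later.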

Regards,
Csaba

On Wed, Nov 9, 2022 at 11:51 AM Johan du Plessis <jo...@gmail.com>
wrote:

> Hi,
>
> I think this is a good idea. It might allow the possibility to
> separate functions from the core of Impala and have "function packs" with
> their own release schedule and not depend on an upgrade to add those
> functions. E.g. imagine a "geometry function pack" that implements ST_xxxx
> functions. It will lower the barrier of entry and speed of development of
> additional functionality and will speed up adoption because there might not
> be any need to upgrade impala to get new functions.
>
> Regards,
> Johan du Plessis

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by Johan du Plessis <jo...@gmail.com>.
Hi,

I think this is a good idea. It might make it possible to separate
functions from the core of Impala and have "function packs" with their own
release schedule that do not depend on an upgrade to add those functions.
E.g. imagine a "geometry function pack" that implements ST_xxxx functions.
It would lower the barrier to entry, speed up development of additional
functionality, and speed up adoption, because there might not be any need
to upgrade Impala to get new functions.

Regards,
Johan du Plessis



On Tue, 8 Nov 2022 at 08:32, Quanlong Huang <hu...@gmail.com> wrote:

> Hi Xiaoqing,
>
> Thanks for raising this request! This requires creating a "_impala_global"
> database in Hive when installing Impala, since each function is associated
> with a db in HMS. Also need planner changes in resolving function names.
>
> Why not just create these "global" UDFs in a util db and use their fully
> qualified names (<database>.<func>)? Queries won't be lengthy if a short db
> name is used.
>
> Regards,
> Quanlong

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by xiaoqing gao <ga...@gmail.com>.
Hi Quanlong,

Yes, it can be understood that way.
The upper layers implement these UDFs in libimpala.so, because these UDFs are
business-aligned functions. Implementing them as built-in functions in Impala
is not appropriate. These functions are maintained by the upper layer.

Executing "use _impala_builtins; create function udf() returns string
location '/libimpala.so' symbol='xxx'" throws the exception "Cannot
modify system database".

I'll add a fallback db for resolving functions and file a JIRA in Hive.
Thanks for your help.

Regards,
Xiaoqing

Quanlong Huang <hu...@gmail.com> wrote on Wed, Nov 9, 2022 at 10:29:

> Hi Xiaoqing,
>
> Just curious, are they migrating from other systems to Impala? and those
> missing functions are built-in functions in that system? We can add those
> missing built-in functions in Impala as well.
>
> Regarding the code change, I think it's harmless to add a fallback db for
> resolving functions. This solution is more lightweight than introducing a
> global function type which might need design for new privileges.
>
> BTW, it'd be nice if Hive can add this feature too. So we don't introduce a
> new feature gap between Impala and Hive. Feel free to file JIRAs if there
> are no objections in this thread.
>
> Thanks,
> Quanlong

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by Quanlong Huang <hu...@gmail.com>.
Hi Xiaoqing,

Just curious, are they migrating from another system to Impala, and are the
missing functions built-in functions in that system? We can add those
missing built-in functions to Impala as well.

Regarding the code change, I think it's harmless to add a fallback db for
resolving functions. This solution is more lightweight than introducing a
global function type, which might require designing new privileges.

BTW, it'd be nice if Hive could add this feature too, so that we don't
introduce a new feature gap between Impala and Hive. Feel free to file JIRAs
if there are no objections in this thread.

Thanks,
Quanlong

On Tue, Nov 8, 2022 at 3:45 PM xiaoqing gao <ga...@gmail.com> wrote:

> Hi Quanlong,
>
> Thanks for your advice. I think it's a good way.
> But there were hundreds of queries at least persistenced in scripts. It's
> unfriendly to let customers change queries. So we have no choice but to be
> compatible.
>
> If I add a global flag, --global_function_database_name="util_db".
> In
>
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionName.java#L126
> First find the function function name in _impala_builtins, then find
> function name in global_function_database_name, at last find in analyzer.
> getDefaultDb().
>
> I test it works. What do you think?
>
> Regards,
> Xiaoqing

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by xiaoqing gao <ga...@gmail.com>.
Hi Quanlong,

Thanks for your advice. I think it's a good approach.
But there are at least hundreds of queries persisted in scripts, and it would
be unfriendly to ask customers to change them. So we have no choice but to
stay compatible.

Suppose I add a global flag, --global_function_database_name="util_db".
At
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionName.java#L126
the function name would first be looked up in _impala_builtins, then in the
database named by global_function_database_name, and finally in
analyzer.getDefaultDb().
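
The lookup order above can be sketched roughly as follows. This is a toy
model, not the code in FunctionName.java: the class, the method names, and the
map-based catalog are assumptions made for illustration, and fallbackDb stands
in for the value of the proposed --global_function_database_name flag.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Illustrative only: not Impala's actual analyzer or catalog code. */
final class FunctionLookupSketch {
    static final String BUILTINS_DB = "_impala_builtins";

    /**
     * Databases to search, in order, for an unqualified function name.
     * A null or empty fallbackDb means the proposed flag is unset.
     */
    static List<String> searchOrder(String fallbackDb, String defaultDb) {
        Set<String> order = new LinkedHashSet<>();  // keeps order, drops repeats
        order.add(BUILTINS_DB);                     // 1. built-in functions
        if (fallbackDb != null && !fallbackDb.isEmpty()) {
            order.add(fallbackDb);                  // 2. the fallback ("global") db
        }
        order.add(defaultDb);                       // 3. the session's default db
        return List.copyOf(order);
    }

    /** Returns the first database in the search order that defines fnName. */
    static Optional<String> resolveDb(String fnName, Map<String, Set<String>> catalog,
                                      String fallbackDb, String defaultDb) {
        for (String db : searchOrder(fallbackDb, defaultDb)) {
            if (catalog.getOrDefault(db, Set.of()).contains(fnName)) {
                return Optional.of(db);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Toy catalog: database name -> function names defined in it.
        Map<String, Set<String>> catalog = Map.of(
            BUILTINS_DB, Set.of("concat"),
            "util_db", Set.of("my_udf"),
            "sales", Set.of("local_fn"));
        System.out.println(resolveDb("my_udf", catalog, "util_db", "sales"));
    }
}
```

Fully qualified names (<database>.<func>) would bypass this search entirely,
so existing scripts that already qualify their functions are unaffected.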

I tested this and it works. What do you think?

Regards,
Xiaoqing



Quanlong Huang <hu...@gmail.com> wrote on Tue, Nov 8, 2022 at 14:32:

> Hi Xiaoqing,
>
> Thanks for raising this request! This requires creating a "_impala_global"
> database in Hive when installing Impala, since each function is associated
> with a db in HMS. Also need planner changes in resolving function names.
>
> Why not just create these "global" UDFs in a util db and use their fully
> qualified names (<database>.<func>)? Queries won't be lengthy if a short db
> name is used.
>
> Regards,
> Quanlong

Re: CREATE GLOBAL FUNCTION works on all databases

Posted by Quanlong Huang <hu...@gmail.com>.
Hi Xiaoqing,

Thanks for raising this request! This requires creating a "_impala_global"
database in Hive when installing Impala, since each function is associated
with a db in HMS. It would also need planner changes for resolving function names.

Why not just create these "global" UDFs in a util db and use their fully
qualified names (<database>.<func>)? Queries won't be lengthy if a short db
name is used.

Regards,
Quanlong

On Mon, Nov 7, 2022 at 4:42 PM xiaoqing gao <ga...@gmail.com> wrote:

> Hi team!
> When I execute the CREATE FUNCTION statement, It can only work on one
> database that I specified.
> I hope to support a feature when I execute the following statement, it can
> work on all databases. The Syntax:
> CREATE GLOBAL FUNCTION [IF NOT EXISTS] [db_name.]function_name([arg_type[,
> arg_type...])
>   RETURNS return_type
>   LOCATION 'hdfs_path_to_dot_so'
>   SYMBOL='symbol_name'
>
> It'll need a default database named _impala_global. The global function
> will be related to _impala_global.
>
> Do you have any ideas?
>
> Best Regards,
> Xiaoqing Gao
>