Posted to dev@flink.apache.org by Bowen Li <bo...@gmail.com> on 2019/08/27 18:49:39 UTC

[DISCUSS] FLIP-57 - Rework FunctionCatalog

Hi folks,

I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
It's critically helpful to improve function usability in SQL.

https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing

In short, it:
- adds support for precise function reference with fully/partially
qualified name
- redefines function resolution order for ambiguous function reference
- adds support for Hive's rich built-in functions (support for Hive user
defined functions was already added in 1.9.0)
- clarifies the concept of temporary functions

Would love to hear your thoughts.

Bowen

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by JingsongLee <lz...@aliyun.com.INVALID>.
Thank you for your wonderful points.

I like Timo's proposal to organize built-in functions into flexible function
 modules (for example, a financial module would be useful for banking systems).
 But I agree with Bowen: I don't think Hive functions deserve to be a
 function module. I think all function modules should be Flink built-in
 functions. That way we can control their standardization, rather
 than taking in controversial functions like Hive's.

About Kurt's concern: yes, every function added to Flink changes
 user behavior. But in the near future, we'll cover all
 of Hive's functions (those in the white list). So if the final form does
 not include Hive functions, this behavioral change will come sooner
 or later. So do we need to let users choose?

Back to the goal of Hive built-ins: I always thought it was just an
 intermediate solution. Do we need to offer a Hive built-in
 functions mode to users in the future?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <yk...@gmail.com>
Send Time: 2019-09-04 (Wednesday) 10:11
To:dev <de...@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
same as Bowen's. But after thinking about it, I'm currently leaning toward
Timo's suggestion.

The reason is backward compatibility. If we follow Bowen's approach, let's
say we first look for a function in Flink's built-in functions, and then in
Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
such a built-in function, so the user gets Hive's behavior for `foo`. In the
next release, Flink realizes this is a very popular function and adds it to
Flink's built-in functions, but with different behavior from Hive's. So in
the next release, the behavior changes.

With Timo's approach, IIUC, users have to tell the framework explicitly which
kind of built-in functions they would like to use. They can tell the framework
to abandon Flink's built-in functions and use Hive's instead. Users can only
choose between them, not use both at the same time. I think this approach is
more predictable.
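
To make the compatibility concern concrete, here is a minimal sketch of the
two lookup styles. It is only an illustrative model: the dicts stand in for
function registries, the helper names are made up for this example, and the
two behaviors given to `foo` are hypothetical. None of this is Flink or Hive
API.

```python
# Illustrative model only: these dicts stand in for function registries and
# are not Flink or Hive APIs. `foo` and its two behaviors are hypothetical.

def resolve_with_fallback(name, flink_builtins, hive_builtins):
    """Fallback lookup: Flink built-ins first, then Hive built-ins."""
    if name in flink_builtins:
        return flink_builtins[name]
    if name in hive_builtins:
        return hive_builtins[name]
    raise KeyError(f"function not found: {name}")


def resolve_with_explicit_choice(name, selected_builtins):
    """Explicit selection: the user picks exactly one set of built-ins."""
    if name in selected_builtins:
        return selected_builtins[name]
    raise KeyError(f"function not found: {name}")


hive_builtins = {"foo": lambda x: x.upper()}      # Hive-style behavior
flink_release_n = {}                              # Flink does not have `foo`
flink_release_n1 = {"foo": lambda x: x.lower()}   # Flink adds `foo` later

# Fallback: the same call changes behavior between the two releases.
before = resolve_with_fallback("foo", flink_release_n, hive_builtins)("Ab")
after = resolve_with_fallback("foo", flink_release_n1, hive_builtins)("Ab")

# Explicit choice of Hive's built-ins: the result does not depend on what
# Flink adds to its own built-ins.
stable = resolve_with_explicit_choice("foo", hive_builtins)("Ab")
```

With the fallback order, `before` and `after` differ even though the user's
query did not change; with an explicit choice, the result stays the same.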

Best,
Kurt


On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:

> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated; please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi Timo,
> >
> > Re> 1) We should not have the restriction "hive built-in functions can
> > > only be used when current catalog is hive catalog". Switching a catalog
> > > should only have implications on the cat.db.object resolution but not
> > > functions. It would be quite convenient for users to use Hive built-ins
> > > even if they use a Confluent schema registry or just the in-memory
> > > catalog.
> >
> > There might be a misunderstanding here.
> >
> > First of all, Hive built-in functions are not part of Flink built-in
> > functions; they are catalog functions. Thus, if the current catalog is
> > not a HiveCatalog but, say, a schema registry catalog, an ambiguous
> > function reference just shouldn't be resolved to a different catalog.
> >
> > Second, Hive built-in functions can potentially be referenced across
> > catalogs, but they don't have a db namespace and we currently just don't
> > have a SQL syntax for that. It can be enabled when such a SQL syntax is
> > defined, e.g. "catalog::function", but that's out of scope for this FLIP.
> >
> > Re> 2) I would propose to have separate concepts for catalog and built-in
> > > functions. In particular it would be nice to modularize built-in
> > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > > we add more experimental functions in the future or functions for some
> > > special application area (Geo functions, ML functions). A data platform
> > > team might not want to make every built-in function available. Or a
> > > function module like ML functions is in a different Maven module.
> >
> > I think this is orthogonal to this FLIP, especially since we don't have
> > "external built-in functions" anymore and the built-in function category
> > currently remains untouched.
> >
> > But just to share some thoughts on the proposal, I'm not sure about it:
> > - I don't know if any other databases handle built-in functions like
> > that. Maybe you can give some examples? IMHO, built-in functions are
> > system info and should be deterministic, not dependent on loaded
> > libraries. Geo functions should either be built-in already or just be
> > library functions, and library functions can be adapted to catalog APIs
> > or used through some other syntax
> > - I don't know if all the use cases hold up, and many can be achieved by
> > other approaches too. E.g. experimental functions can be handled well by
> > documentation, annotations, etc.
> > - the proposal basically introduces a concept like a pluggable built-in
> > function catalog, despite the already existing catalog APIs
> > - it brings even more complicated scenarios into the design. E.g. how do
> > you handle built-in functions in different modules but different names?
> >
> > In short, I'm not sure it really holds up, and it looks like overkill
> > to me. I'd rather not go down that route. Related discussion can happen
> > in its own thread.
> >
> > Re> 3) Following the suggestion above, we can have a separate discovery
> > > mechanism for built-in functions. Instead of just going through a static
> > > list like in BuiltInFunctionDefinitions, a platform team should be able
> > > to select function modules like
> > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > HiveFunctions) or via service discovery;
> >
> > Same as above. I'll leave it to its own thread.
> >
> > Re> 3) Dawid and I discussed the resolution order again. I agree with
> > > Kurt that we should unify built-in function (external or internal)
> > > under a common layer. However, the resolution order should be:
> > >   1. built-in functions
> > >   2. temporary functions
> > >   3. regular catalog resolution logic
> > > Otherwise a temporary function could cause clashes with Flink's
> > > built-in functions. If you take a look at other vendors, like SQL
> > > Server, they also do not allow overwriting built-in functions.
> >
> > "I agree with Kurt that we should unify built-in function (external or
> > internal) under a common layer." <- I don't think this is what Kurt
> > means. Kurt and I are in favor of unifying built-in functions of external
> > systems and catalog functions. Was that a typo?
> >
> > Besides, I'm not sure about the resolution order you proposed. Temporary
> > functions have a lifespan over a session and are only visible to the
> > session owner, they are unique to each user, and users create them on
> > purpose to be the highest priority in order to overwrite system info
> > (built-in functions in this case).
> >
> > In your case, why would users name a temporary function the same as a
> > built-in function then? Since using that name in ambiguous function
> > reference will always be resolved to built-in functions, creating a
> > same-named temp function would be meaningless in the end.
> >
> >
> > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> >
> >> Hi Jingsong,
> >>
> >> Re> 1.Hive built-in functions is an intermediate solution. So we should
> >> > not introduce interfaces to influence the framework. To make
> >> > Flink itself more powerful, we should implement the functions
> >> > we need to add.
> >>
> >> Yes, please see the doc.
> >>
> >> Re> 2.Non-flink built-in functions are easy for users to change their
> >> > behavior. If we support some flink built-in functions in the
> >> > future but act differently from non-flink built-in, this will lead to
> >> > changes in user behavior.
> >>
> >> There's no such concept as "external built-in functions" any more.
> >> Built-in functions of external systems will be treated as special
> >> catalog functions.
> >>
> >> Re> Another question is, does this fallback include all
> >> > hive built-in functions? As far as I know, some hive functions
> >> > are somewhat hacky. If possible, can we start with a white list?
> >> > Once we implement some functions to flink built-in, we can
> >> > also update the whitelist.
> >>
> >> Yes, that's something we thought of too. I don't think it's super
> >> critical to the scope of this FLIP, thus I'd like to leave it to future
> >> efforts as a nice-to-have feature.
> >>
> >>
> >> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> Re: > What I want to propose is we can merge #3 and #4, make them both
> >>> > under the "catalog" concept, by extending catalog function to give it
> >>> > the ability to have built-in catalog functions. Some benefits I can
> >>> > see from this approach:
> >>> > 1. We don't have to introduce a new concept like external built-in
> >>> > functions. Actually I don't see a full story about how to treat
> >>> > built-in functions, and it seems to conflict a little with catalog.
> >>> > As a result, you have to make some restriction like "hive built-in
> >>> > functions can only be used when current catalog is hive catalog".
> >>>
> >>> Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> >>> the doc. I've modified those sections, and they are up to date now.
> >>>
> >>> In short, built-in functions of external systems are now defined as a
> >>> special kind of catalog function in Flink, and handled by Flink as
> >>> follows:
> >>> - An external built-in function must be associated with a catalog for
> >>> the purpose of decoupling flink-table and external systems.
> >>> - It always resides in front of catalog functions in ambiguous function
> >>> reference order, just like in its own external system
> >>> - It is a special catalog function that doesn’t have a schema/database
> >>> namespace
> >>> - It goes thru the same instantiation logic as other user defined
> >>> catalog functions in the external system
> >>>
> >>> Please take another look at the doc, and let me know if you have more
> >>> questions.
> >>>
> >>>
> >>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
> >>>
> >>>> Hi Kurt,
> >>>>
> >>>> it should not affect the functions and operations we currently have in
> >>>> SQL. It just categorizes the available built-in functions. It is kind
> >>>> of an orthogonal concept to the catalog API, but built-in functions
> >>>> deserve this special kind of treatment. CatalogFunction still fits
> >>>> perfectly in there because the regular catalog object resolution logic
> >>>> is not affected. So tables and functions are resolved in the same way,
> >>>> but with built-in functions that have priority as in the original
> >>>> design.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 03.09.19 15:26, Kurt Young wrote:
> >>>> > Does this only affect the functions and operations we currently have
> >>>> > in SQL, with no effect on tables? Looks like this is a concept
> >>>> > orthogonal to Catalog?
> >>>> > If the answer to both is yes, then won't the catalog function be a
> >>>> > weird concept?
> >>>> >
> >>>> > Best,
> >>>> > Kurt
> >>>> >
> >>>> >
> >>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com> wrote:
> >>>> >
> >>>> >> The way you proposed is basically the same as what Calcite does; I
> >>>> >> think we are on the same page.
> >>>> >>
> >>>> >> Best,
> >>>> >> Danny Chan
> >>>> >> On Sep 3, 2019, 7:57 PM +0800, Timo Walther <tw...@apache.org> wrote:
> >>>> >>> This sounds exactly as the module approach I mentioned, no?
> >>>> >>>
> >>>> >>> Regards,
> >>>> >>> Timo
> >>>> >>>
> >>>> >>> On 03.09.19 13:42, Danny Chan wrote:
> >>>> >>>> Thanks Bowen for bringing up this topic, I think it’s a useful
> >>>> >>>> refactoring to make our function usage more user friendly.
> >>>> >>>> For the topic of how to organize the builtin operators and
> >>>> >>>> operators of Hive, here is a solution from Apache Calcite: the
> >>>> >>>> Calcite way is to make every dialect's operators a “Library”, and
> >>>> >>>> users can specify which libraries they want to use for a SQL
> >>>> >>>> query. The builtin operators always come first as first-class
> >>>> >>>> objects, and the others are used in the order they appear.
> >>>> >>>> Maybe you can take it as a reference.
> >>>> >>>> [1]
> >>>> >>>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >>>> >>>> Best,
> >>>> >>>> Danny Chan
> >>>> >>>> On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bo...@gmail.com> wrote:
> >>>> >>>>> Hi folks,
> >>>> >>>>>
> >>>> >>>>> I'd like to kick off a discussion on reworking Flink's
> >>>> >>>>> FunctionCatalog.
> >>>> >>>>> It's critically helpful to improve function usability in SQL.
> >>>> >>>>>
> >>>> >>>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>> >>>>>
> >>>> >>>>> In short, it:
> >>>> >>>>> - adds support for precise function reference with fully/partially
> >>>> >>>>> qualified name
> >>>> >>>>> - redefines function resolution order for ambiguous function
> >>>> >>>>> reference
> >>>> >>>>> - adds support for Hive's rich built-in functions (support for
> >>>> >>>>> Hive user defined functions was already added in 1.9.0)
> >>>> >>>>> - clarifies the concept of temporary functions
> >>>> >>>>>
> >>>> >>>>> Would love to hear your thoughts.
> >>>> >>>>>
> >>>> >>>>> Bowen
> >>>> >>>
> >>>>
> >>>>
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Thanks all for your input!

I've updated FLIP-57 accordingly. To summarize the changes:

   - introduced a new concept of "temporary system functions", which have no
   namespace and override built-in functions
   - repositioned "temporary functions" to be those with namespaces that
   override catalog functions
   - updated FunctionCatalog APIs
   - redefined the ambiguous function resolution order to be:

      1. temporary system functions
      2. built-in functions
      3. temporary functions, of the current catalog/db
      4. catalog functions, in the current catalog/db
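
The four-step order above can be sketched as a simple chain of lookups. This
is only an illustrative model, not the FunctionCatalog API: the registry
layout, the `resolve` helper, and all function names are assumptions made up
for this example.

```python
# Illustrative model only: not the FunctionCatalog API. The registry layout,
# the names, and the `resolve` helper are assumptions made for this sketch.

def resolve(name, current_catalog, current_db, registry):
    """Resolve an unqualified function name using the four-step order."""
    key = (current_catalog, current_db, name)
    lookups = [
        # Steps 1 and 2 have no namespace, so they are keyed by name only.
        ("temporary system function", registry["temp_system"], name),
        ("built-in function", registry["builtin"], name),
        # Steps 3 and 4 are scoped to the current catalog and database.
        ("temporary function", registry["temp"], key),
        ("catalog function", registry["catalog"], key),
    ]
    for kind, table, lookup_key in lookups:
        if lookup_key in table:
            return kind, table[lookup_key]
    raise KeyError(f"function not found: {name}")


registry = {
    "temp_system": {"concat": "user override of concat"},
    "builtin": {"concat": "built-in concat", "md5": "built-in md5"},
    "temp": {
        ("cat1", "db1", "md5"): "temp md5",
        ("cat1", "db1", "f"): "temp f",
    },
    "catalog": {
        ("cat1", "db1", "f"): "catalog f",
        ("cat1", "db1", "g"): "catalog g",
    },
}

concat_hit = resolve("concat", "cat1", "db1", registry)
md5_hit = resolve("md5", "cat1", "db1", registry)
f_hit = resolve("f", "cat1", "db1", registry)
g_hit = resolve("g", "cat1", "db1", registry)
```

In this model a temporary system function shadows the built-in `concat`,
while a namespaced temporary function named `md5` does not shadow the
built-in `md5` and only takes priority over catalog functions.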

Since we've reached consensus on the most critical pieces of the FLIP,
I've started a separate voting thread on it.

Cheers,
Bowen

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Jark Wu <im...@gmail.com>.
"SYSTEM" sounds good to me too.

Best,
Jark

On Mon, 23 Sep 2019 at 19:04, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for CREATE TEMPORARY SYSTEM FUNCTION xxx
>
> Cheers, Fabian
>
> On Sat, Sep 21, 2019 at 06:58, Bowen Li <bowenli86@gmail.com> wrote:
>
> > "SYSTEM" sounds good to me. FYI, this FLIP only impacts low level of the
> > SQL function stack and won't actually involve any DDL, thus I will just
> > document the decision and we should keep it in mind when it's time to
> > implement the DDLs.
> >
> > I'm in the process of updating the FLIP to reflect changes required for
> > option #2, will send a new version for review soon.
> >
> >
> >
> > On Fri, Sep 20, 2019 at 4:02 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > wrote:
> >
> > > I also like the 'System' keyword. I think we can assume we reached
> > > consensus on this topic.
> > >
> > > On Sat, 21 Sep 2019, 06:37 Xuefu Z, <us...@gmail.com> wrote:
> > >
> > > > +1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
> > > >
> > > > --Xuefu
> > > >
> > > > On Fri, Sep 20, 2019 at 3:28 PM Timo Walther <tw...@apache.org> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > > Sorry for the late reply. I also give +1 for option #2. Thus, I
> > > > > > guess we have a clear winner.
> > > > > >
> > > > > > I would also like to find a better keyword/syntax for this
> > > > > > statement. In particular, the BUILTIN keyword can confuse people,
> > > > > > because it could be written as BUILTIN, BUILDIN, BUILT_IN, or
> > > > > > BUILD_IN. And we would need to introduce a new reserved keyword in
> > > > > > the parser, which also affects non-DDL queries. How about:
> > > > >
> > > > > CREATE TEMPORARY SYSTEM FUNCTION xxx
> > > > >
> > > > > > The SYSTEM keyword is already a reserved keyword, and in FLIP-66 we
> > > > > > are discussing prefixing some of the functions with a SYSTEM_
> > > > > > prefix, like SYSTEM_WATERMARK. Also, SQL defines syntax like
> > > > > > "FOR SYSTEM_TIME AS OF".
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 20.09.19 05:45, Bowen Li wrote:
> > > > > > > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over
> > > > > > > "ALTER BUILTIN FUNCTION xxx TEMPORARILY" is: what if users want to
> > > > > > > drop the temporary built-in function in the same session? With the
> > > > > > > former, they can run something like "DROP TEMPORARY BUILTIN
> > > > > > > FUNCTION"; with the latter, I'm not sure how users can easily
> > > > > > > "restore" the original built-in function from an "altered" one
> > > > > > > without introducing further nonstandard SQL syntax.
> > > > > > >
> > > > > > > Also, please pardon me, as I realized using net may not be a good
> > > > > > > idea... I'm trying to fit this vote into the cases listed in the
> > > > > > > Flink Bylaws [1].
> > > > > > >
> > > > > > > From the following result, the majority seems to be #2 too, as it
> > > > > > > has the most approval so far and doesn't have a strong "-1".
> > > > > >
> > > > > > #1:3 (+1), 1 (0), 4(-1)
> > > > > > #2:4(0), 3 (+1), 1(+0.5)
> > > > > >         * Dawid -1/0 depending on keyword
> > > > > > #3:2(+1), 3(-1), 3(0)
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > > > >
> > > > > > > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >>
> > > > > > >> Thanks everyone for your votes. I summarized the result as follows:
> > > > > >>
> > > > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > > > >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> > > > > >>          Dawid -1/0 depending on keyword
> > > > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > > > >>
> > > > > > >> Given the result, I'd like to change my vote for #2 from 0 to +1, to
> > > > > > >> make it a stronger case with net +3.5. So the votes so far are:
> > > > > >>
> > > > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > > > >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> > > > > >>          Dawid -1/0 depending on keyword
> > > > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > > > >>
> > > > > > >> What do you think? Do you think we can conclude with this result?
> > > > > > >> Or would you like to take it as a formal FLIP vote with a 3-day
> > > > > > >> voting period?
> > > > > > >>
> > > > > > >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > > > > > >> BUILTIN FUNCTION xxx TEMPORARILY" because
> > > > > > >> 1. the syntax is more consistent with "CREATE FUNCTION" and
> > > > > > >> "CREATE TEMPORARY FUNCTION"
> > > > > > >> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a
> > > > > > >> built-in function, but it actually doesn't; the logic only creates
> > > > > > >> a temp function with higher priority than that built-in function
> > > > > > >> in the ambiguous resolution order, and it would behave
> > > > > > >> inconsistently with "ALTER FUNCTION".
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fhueske@gmail.com> wrote:
> > > > > >>
> > > > > > >>> I agree, it's very similar from the implementation point of view
> > > > > > >>> and in its implications.
> > > > > > >>>
> > > > > > >>> IMO, the difference is mostly in the mental model for the user.
> > > > > > >>> Instead of having a special class of temporary functions that
> > > > > > >>> have precedence over built-in functions, it suggests temporarily
> > > > > > >>> changing built-in functions.
> > > > > >>>
> > > > > >>> Fabian
> > > > > >>>
> > > > > > >>> On Thu, Sep 19, 2019 at 11:52, Kurt Young <ykt836@gmail.com> wrote:
> > > > > >>>> Hi Fabian,
> > > > > >>>>
> > > > > >>>> I think it's almost the same with #2 with different keyword:
> > > > > >>>>
> > > > > >>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Kurt
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fhueske@gmail.com> wrote:
> > > > > >>>>> Hi,
> > > > > >>>>>
> > > > > > >>>>> I thought about it a bit more and think that there is some good
> > > > > > >>>>> value in my last proposal.
> > > > > > >>>>>
> > > > > > >>>>> A lot of complexity comes from the fact that we want to allow
> > > > > > >>>>> overriding built-in functions, which are addressed differently
> > > > > > >>>>> from other functions (and db objects).
> > > > > > >>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the
> > > > > > >>>>> same thing as "CREATE FUNCTION" and treat both functions exactly
> > > > > > >>>>> the same, except that:
> > > > > > >>>>> 1) temp functions disappear at the end of the session
> > > > > > >>>>> 2) temp functions are resolved before other functions
> > > > > > >>>>>
> > > > > > >>>>> This would be Dawid's proposal from the beginning of this
> > > > > > >>>>> thread (in case you still remember... ;-) )
> > > > > > >>>>>
> > > > > > >>>>> Temporarily overriding built-in functions would be supported
> > > > > > >>>>> with an explicit command like
> > > > > > >>>>>
> > > > > > >>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> > > > > > >>>>>
> > > > > > >>>>> This would also address the concerns about accidentally
> > > > > > >>>>> changing the semantics of built-in functions.
> > > > > > >>>>> IMO, it can't get much more explicit than the above command.
> > > > > > >>>>>
> > > > > > >>>>> Sorry for bringing up a new option in the middle of the
> > > > > > >>>>> discussion, but as I said, I think it has a bunch of benefits
> > > > > > >>>>> and I don't see major drawbacks (maybe you do?).
> > > > > >>>>>
> > > > > >>>>> What do you think?
> > > > > >>>>>
> > > > > >>>>> Fabian
> > > > > >>>>>
> > > > > > >>>>> On Thu, Sep 19, 2019 at 11:24, Fabian Hueske <fhueske@gmail.com> wrote:
> > > > > >>>>>> Hi everyone,
> > > > > >>>>>>
> > > > > > >>>>>> I thought again about option #1, and something that I don't
> > > > > > >>>>>> like is that the resolved address of xyz is different in
> > > > > > >>>>>> "CREATE FUNCTION xyz" and "CREATE TEMPORARY FUNCTION xyz".
> > > > > > >>>>>> IMO, adding the keyword "TEMPORARY" should only change the
> > > > > > >>>>>> lifecycle of the function, but not where it is located. This
> > > > > > >>>>>> implicitly changed location might be confusing for users.
> > > > > > >>>>>> After all, a temp function should behave pretty much like any
> > > > > > >>>>>> other function, except for the fact that it disappears when
> > > > > > >>>>>> the session is closed.
> > > > > > >>>>>> Approach #2 with the additional keyword would make that pretty
> > > > > > >>>>>> clear, IMO.
> > > > > > >>>>>> However, I like neither GLOBAL (for reasons mentioned by
> > > > > > >>>>>> Dawid) nor BUILDIN (we are not adding a built-in function).
> > > > > > >>>>>> So I'd be OK with #2 if we find a good keyword. In fact,
> > > > > > >>>>>> approach #2 could also be an alias for approach #3 to avoid
> > > > > > >>>>>> explicit specification of the system catalog/db.
> > > > > > >>>>>>
> > > > > > >>>>>> Approach #3 would be consistent with other db objects and the
> > > > > > >>>>>> "CREATE FUNCTION" statement.
> > > > > > >>>>>> Adding a system catalog/db seems rather complex, but then
> > > > > > >>>>>> again, how often do we expect users to override built-in
> > > > > > >>>>>> functions? If this becomes a major issue, we can still add
> > > > > > >>>>>> option #2 as an alias.
> > > > > > >>>>>>
> > > > > > >>>>>> Not sure what's the best approach from an internal point of
> > > > > > >>>>>> view, but I certainly think that consistent behavior is
> > > > > > >>>>>> important.
> > > > > > >>>>>> Hence my votes are:
> > > > > > >>>>>>
> > > > > > >>>>>> -1 for #1
> > > > > > >>>>>> 0 for #2
> > > > > > >>>>>> 0 for #3
> > > > > > >>>>>>
> > > > > > >>>>>> Btw. Did we consider a completely separate command for
> > > > > > >>>>>> overriding built-in functions like "ALTER BUILTIN FUNCTION xxx
> > > > > > >>>>>> TEMPORARILY AS ..."?
> > > > > >>>>>>
> > > > > >>>>>> Cheers, Fabian
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > > >>>>>> On Thu, Sep 19, 2019 at 11:03, JingsongLee
> > > > > > >>>>>> <lz...@aliyun.com.invalid> wrote:
> > > > > >>>>>>
> > > > > > >>>>>>> I know Hive and Spark can shadow built-in functions with
> > > > > > >>>>>>> temporary functions.
> > > > > > >>>>>>> MySQL, Oracle, and SQL Server cannot shadow them.
> > > > > > >>>>>>> Users can use full names to access functions instead of
> > > > > > >>>>>>> shadowing.
> > > > > > >>>>>>>
> > > > > > >>>>>>> So I think it is a completely new thing, and the direct way
> > > > > > >>>>>>> to deal with new things is to add new grammar. So,
> > > > > > >>>>>>> +1 for #2, +0 for #3, -1 for #1
> > > > > >>>>>>>
> > > > > >>>>>>> Best,
> > > > > >>>>>>> Jingsong Lee
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> ------------------------------------------------------------------
> > > > > >>>>>>> From:Kurt Young <yk...@gmail.com>
> > > > > > >>>>>>> Send Time: 2019-09-19 (Thursday) 16:43
> > > > > >>>>>>> To:dev <de...@flink.apache.org>
> > > > > >>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > > > > >>>>>>>
> > > > > >>>>>>> And let me make my vote complete:
> > > > > >>>>>>>
> > > > > >>>>>>> -1 for #1
> > > > > >>>>>>> +1 for #2 with different keyword
> > > > > >>>>>>> -0 for #3
> > > > > >>>>>>>
> > > > > >>>>>>> Best,
> > > > > >>>>>>> Kurt
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <ykt836@gmail.com> wrote:
> > > > > > >>>>>>>> Looks like I'm the only person who is willing to +1 to #2
> > > > > > >>>>>>>> for now :-)
> > > > > > >>>>>>>> But I would suggest changing the keyword from GLOBAL to
> > > > > > >>>>>>>> something like BUILTIN.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I think #2 and #3 are almost the same proposal, just with a
> > > > > > >>>>>>>> different format to indicate whether it wants to override
> > > > > > >>>>>>>> built-in functions.
> > > > > > >>>>>>>> My biggest reason to choose it is I want this behavior to be
> > > > > > >>>>>>>> consistent with temporary tables. I will give some examples
> > > > > > >>>>>>>> to show the behavior and also make sure I'm not
> > > > > > >>>>>>>> misunderstanding anything here.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> For most DBs, when a user creates a temporary table with:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> CREATE TEMPORARY TABLE t1
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> It's actually equivalent to:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> If users change the current database, they will not be able
> > > > > > >>>>>>>> to access t1 without a fully qualified name, i.e. db1.t1
> > > > > > >>>>>>>> (assuming db1 was the current database when this temporary
> > > > > > >>>>>>>> table was created).
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Only #2 and #3 follow this behavior, and I would vote for
> > > > > > >>>>>>>> this since it makes such behavior consistent across temporary
> > > > > > >>>>>>>> tables and functions.
> > > > > > >>>>>>>> The reason I'm not voting for #3 is that a special catalog
> > > > > > >>>>>>>> and database just looks very hacky to me. It implies that our
> > > > > > >>>>>>>> built-in functions are saved in a special catalog and
> > > > > > >>>>>>>> database, which is actually not the case. Introducing a
> > > > > > >>>>>>>> dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
> > > > > > >>>>>>>> looks clearer and more straightforward. One can argue that we
> > > > > > >>>>>>>> should avoid introducing a new keyword, but it's also very
> > > > > > >>>>>>>> rare that a system can overwrite built-in functions. Since we
> > > > > > >>>>>>>> decided to support this, introducing a new keyword is not a
> > > > > > >>>>>>>> big deal IMO.
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Kurt
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > > >>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <piotr@ververica.com>
> > > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi,
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> It is quite a long discussion to follow, and I hope I didn’t
> > > > > > >>>>>>>>> misunderstand anything. From the proposals presented by
> > > > > > >>>>>>>>> Xuefu, I would vote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> -1 for #1 and #2
> > > > > > >>>>>>>>> +1 for #3
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Besides #3 being IMO more general and more consistent, having
> > > > > > >>>>>>>>> qualified names (#3) would make it easier for someone to use
> > > > > > >>>>>>>>> cross-database/catalog queries (joining multiple data
> > > > > > >>>>>>>>> sets/streams), for example with functions that
> > > > > > >>>>>>>>> manipulate/clean up/convert the data stored in different
> > > > > > >>>>>>>>> catalogs being registered in the respective catalogs.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Piotrek
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> I agree with Xuefu that inconsistent handling compared with
> > > > > > >>>>>>>>>> all the other objects is not a big problem.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Regarding option #3, the special "system.system" namespace
> > > > > > >>>>>>>>>> may confuse users.
> > > > > > >>>>>>>>>> Users need to know the set of built-in function names to
> > > > > > >>>>>>>>>> know when to use the "system.system" namespace.
> > > > > > >>>>>>>>>> What will happen if a user registers a non-built-in function
> > > > > > >>>>>>>>>> name under the "system.system" namespace?
> > > > > > >>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I
> > > > > > >>>>>>>>>> mentioned at the beginning of this thread.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> So here is my vote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> +1 for #1
> > > > > >>>>>>>>>> 0 for #2
> > > > > >>>>>>>>>> -1 for #3
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Jark
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <
> usxuefu@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>>>>>>> @Dawid, Re: we also don't need additional referencing
> the
> > > > > > >>>>>>>>> special catalog
> > > > > >>>>>>>>>>> anywhere.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> True. But once we allow such reference, then user can
> do
> > so
> > > > > >>> in
> > > > > >>>> any
> > > > > >>>>>>>>> possible
> > > > > >>>>>>>>>>> place where a function name is expected, for which we
> > have
> > > to
> > > > > >>>>>>> handle.
> > > > > >>>>>>>>>>> That's a big difference, I think.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>> Xuefu
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > > > > >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > > >>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> @Bowen I am not suggesting introducing additional
> > > catalog. I
> > > > > >>>>> think
> > > > > >>>>>>> we
> > > > > >>>>>>>>>>> need
> > > > > >>>>>>>>>>>> to get rid of the current built-in catalog.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> @Xuefu in option #3 we also don't need additional
> > > > > >>> referencing
> > > > > >>>> the
> > > > > >>>>>>>>> special
> > > > > >>>>>>>>>>>> catalog anywhere else besides in the CREATE statement.
> > The
> > > > > >>>>>>> resolution
> > > > > >>>>>>>>>>>> behaviour is exactly the same in both options.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <
> usxuefu@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>>>>>>>>> Hi Dawid,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the
> > > > > >>>> approach.
> > > > > >>>>> It
> > > > > >>>>>>>>> can
> > > > > >>>>>>>>>>> be
> > > > > >>>>>>>>>>>>> changed to something else for better.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> The difference between this and the #3 approach is
> that
> > > we
> > > > > >>>> only
> > > > > >>>>>>> need
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>> keyword for this create DDL. For other places (such
> as
> > > > > >>>> function
> > > > > >>>>>>>>>>>>> referencing), no keyword or special namespace is
> > needed.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>> Xuefu
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > > > >>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > > >>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>>>> I think it makes sense to start voting at this
> point.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Option 1: Only 1-part identifiers
> > > > > >>>>>>>>>>>>>> PROS:
> > > > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > > > >>>>>>>>>>>>>> CONS:
> > > > > > >>>>>>>>>>>>>> - inconsistent with all the other objects, both
> > > permanent &
> > > > > >>>>>>> temporary
> > > > > >>>>>>>>>>>>>> - does not allow shadowing catalog functions
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Option 2: Special keyword for built-in function
> > > > > >>>>>>>>>>>>>> I think this is quite similar to the special
> > catalog/db.
> > > > > >>> The
> > > > > >>>>>>> thing I
> > > > > >>>>>>>>>>> am
> > > > > >>>>>>>>>>>>>> strongly against in this proposal is the GLOBAL
> > keyword.
> > > > > >>> This
> > > > > >>>>>>>>> keyword
> > > > > >>>>>>>>>>>>> has a
> > > > > >>>>>>>>>>>>>> meaning in rdbms systems and means a function that
> is
> > > > > >>> present
> > > > > >>>>>>> for a
> > > > > >>>>>>>>>>>>>> lifetime of a session in which it was created, but
> > > > > >>> available
> > > > > >>>> in
> > > > > >>>>>>> all
> > > > > >>>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>> sessions. Therefore I really don't want to use this
> > > > > >>> keyword
> > > > > >>>> in
> > > > > >>>>> a
> > > > > >>>>>>>>>>>>> different
> > > > > >>>>>>>>>>>>>> context.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Option 3: Special catalog/db
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> PROS:
> > > > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > > > >>>>>>>>>>>>>> - allows shadowing catalog functions
> > > > > >>>>>>>>>>>>>> - consistent with other objects
> > > > > >>>>>>>>>>>>>> CONS:
> > > > > >>>>>>>>>>>>>> - we introduce a special namespace for built-in
> > > functions
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I don't see a problem with introducing the special
> > > > > >>> namespace.
> > > > > >>>>> In
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>>> end
> > > > > >>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>> is very similar to the keyword approach. In this
> case
> > > the
> > > > > >>>>>>> catalog/db
> > > > > >>>>>>>>>>>>>> combination would be the "keyword"
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Therefore my votes:
> > > > > >>>>>>>>>>>>>> Option 1: -0
> > > > > >>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up
> > > with
> > > > > >>> a
> > > > > >>>>>>> better
> > > > > >>>>>>>>>>>>> keyword)
> > > > > >>>>>>>>>>>>>> Option 3: +1
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>>> Dawid
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <
> > usxuefu@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>>>>>>>>>>>> Hi Aljoscha,
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks for the summary and these are great
> questions
> > to
> > > > > >>> be
> > > > > >>>>>>>>>>> answered.
> > > > > >>>>>>>>>>>>> The
> > > > > >>>>>>>>>>>>>>> answer to your first question is clear: there is a
> > > > > >>> general
> > > > > >>>>>>>>>>> agreement
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> override built-in functions with temp functions.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> However, your second and third questions are sort
> of
> > > > > >>>> related,
> > > > > >>>>>>> as a
> > > > > >>>>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>>> reference can be either just function name (like
> > > "func")
> > > > > >>> or
> > > > > >>>> in
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>>> form
> > > > > > >>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> "cat.db.func". When a reference is just function
> > name,
> > > it
> > > > > >>>> can
> > > > > >>>>>>> mean
> > > > > >>>>>>>>>>>>>> either a
> > > > > >>>>>>>>>>>>>>> built-in function or a function defined in the
> > current
> > > > > >>>> cat/db.
> > > > > >>>>>>> If
> > > > > >>>>>>>>>>> we
> > > > > >>>>>>>>>>>>>>> support overriding a built-in function with a temp
> > > > > >>> function,
> > > > > >>>>>>> such
> > > > > >>>>>>>>>>>>>>> overriding can also cover a function in the current
> > > > > >>> cat/db.
> > > > > >>>>>>>>>>>>>>> I think what Timo referred as "overriding a catalog
> > > > > >>>> function"
> > > > > >>>>>>>>>>> means a
> > > > > >>>>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>> function defined as "cat.db.func" overrides a
> catalog
> > > > > >>>> function
> > > > > >>>>>>>>>>> "func"
> > > > > >>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>> cat/db even if cat/db is not current. To support
> > this,
> > > > > >>> temp
> > > > > >>>>>>>>>>> function
> > > > > >>>>>>>>>>>>> has
> > > > > >>>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>> be tied to a cat/db. That's why I said above that
> the
> > > 2nd
> > > > > >>>> and
> > > > > >>>>>>> 3rd
> > > > > >>>>>>>>>>>>>> questions
> > > > > >>>>>>>>>>>>>>> are related. The problem with such support is the
> > > > > >>> ambiguity
> > > > > >>>>> when
> > > > > >>>>>>>>>>> user
> > > > > >>>>>>>>>>>>>>> defines a function w/o namespace, "CREATE TEMPORARY
> > > > > >>> FUNCTION
> > > > > >>>>>>> func
> > > > > >>>>>>>>>>>> ...".
> > > > > > >>>>>>>>>>>>>>> Here "func" can mean a global temp function, or a
> > temp
> > > > > >>>>>>> function in
> > > > > >>>>>>>>>>>>>> current
> > > > > >>>>>>>>>>>>>>> cat/db. If we can assume the former, this creates
> an
> > > > > >>>>>>> inconsistency
> > > > > >>>>>>>>>>>>>> because
> > > > > >>>>>>>>>>>>>>> "CREATE FUNCTION func" actually means a function in
> > > > > >>> current
> > > > > >>>>>>> cat/db.
> > > > > >>>>>>>>>>>> If
> > > > > >>>>>>>>>>>>> we
> > > > > >>>>>>>>>>>>>>> assume the latter, then there is no way for user to
> > > > > >>> create a
> > > > > >>>>>>> global
> > > > > >>>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>> function.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Giving a special namespace for built-in functions
> may
> > > > > >>> solve
> > > > > >>>>> the
> > > > > >>>>>>>>>>>>> ambiguity
> > > > > >>>>>>>>>>>>>>> problem above, but it also introduces artificial
> > > > > >>>>>>> catalog/database
> > > > > >>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>> needs special treatment and pollutes the cleanness
> of
> > > > > >>> the
> > > > > >>>>>>> code. I
> > > > > >>>>>>>>>>>>> would
> > > > > >>>>>>>>>>>>>>> rather introduce a syntax in DDL to solve the
> > problem,
> > > > > >>> like
> > > > > >>>>>>> "CREATE
> > > > > >>>>>>>>>>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate
> proposals
> > > for
> > > > > >>>>> voting
> > > > > >>>>>>>>>>>>>> purposes:
> > > > > >>>>>>>>>>>>>>> 1. Support only global, temporary functions without
> > > > > >>>> namespace.
> > > > > >>>>>>> Such
> > > > > >>>>>>>>>>>>> temp
> > > > > > >>>>>>>>>>>>>>> functions override built-in functions and catalog
> > > > > >>> functions
> > > > > >>>>> in
> > > > > >>>>>>>>>>>> current
> > > > > >>>>>>>>>>>>>>> cat/db. The resolution order is: temp functions ->
> > > > > >>> built-in
> > > > > >>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>> ->
> > > > > >>>>>>>>>>>>>>> catalog functions. (Partially or fully qualified
> > > > > >>> functions
> > > > > > >>>> have
> > > > > >>>>>>> no
> > > > > >>>>>>>>>>>>>>> ambiguity!)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 2. In addition to #1, support creating and
> > referencing
> > > > > >>>>> temporary
> > > > > >>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>> associated with a cat/db with "GLOBAL" qualifier in
> > DDL
> > > > > >>> for
> > > > > >>>>>>> global
> > > > > >>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>> functions. The resolution order is: global temp
> > > > > >>> functions ->
> > > > > >>>>>>>>>>> built-in
> > > > > >>>>>>>>>>>>>>> functions -> temp functions in current cat/db ->
> > > catalog
> > > > > >>>>>>> function.
> > > > > >>>>>>>>>>>>>>> (Resolution for partially or fully qualified
> function
> > > > > >>>>> reference
> > > > > >>>>>>> is:
> > > > > >>>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>> functions -> persistent functions.)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 3. In addition to #1, support creating and
> > referencing
> > > > > >>>>> temporary
> > > > > >>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>> associated with a cat/db with a special namespace
> for
> > > > > >>>> built-in
> > > > > >>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>> and global temp functions. The resolution is the
> same
> > > as
> > > > > >>> #2,
> > > > > >>>>>>> except
> > > > > >>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>> the special namespace might be prefixed to a
> > reference
> > > > > >>> to a
> > > > > >>>>>>>>>>> built-in
> > > > > >>>>>>>>>>>>>>> function or global temp function. (In absence of
> the
> > > > > >>> special
> > > > > >>>>>>>>>>>> namespace,
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> resolution order is the same as in #2.)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> My personal preference is #1, given the unknown use
> > > case
> > > > > >>> and
> > > > > >>>>>>>>>>>> introduced
> > > > > >>>>>>>>>>>>>>> complexity for #2 and #3. However, #2 is an
> > acceptable
> > > > > >>>>>>> alternative.
> > > > > >>>>>>>>>>>>> Thus,
> > > > > >>>>>>>>>>>>>>> my votes are:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> +1 for #1
> > > > > >>>>>>>>>>>>>>> +0 for #2
> > > > > >>>>>>>>>>>>>>> -1 for #3
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Everyone, please cast your vote (in above format
> > > > > >>> please!),
> > > > > >>>> or
> > > > > >>>>>>> let
> > > > > >>>>>>>>>>> me
> > > > > >>>>>>>>>>>>> know
> > > > > >>>>>>>>>>>>>>> if you have more questions or other candidates.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>> Xuefu
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > > > > >>>>>>>>>>>> aljoscha@apache.org>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64
> are
> > > very
> > > > > >>>>>>>>>>> connected.
> > > > > >>>>>>>>>>>>> To
> > > > > > >>>>>>>>>>>>>>>> resolve the differences, I think we have to think
> > about
> > > > > >>> the
> > > > > >>>>> basic
> > > > > >>>>>>>>>>>>>>> principles
> > > > > >>>>>>>>>>>>>>>> and find consensus there. The basic questions I
> see
> > > are:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> - Do we want to support overriding builtin
> > functions?
> > > > > >>>>>>>>>>>>>>>> - Do we want to support overriding catalog
> > functions?
> > > > > >>>>>>>>>>>>>>>> - And then later: should temporary functions be
> tied
> > > to
> > > > > >>> a
> > > > > >>>>>>>>>>>>>>>> catalog/database?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I don’t have much to say about these, except that
> we
> > > > > >>> should
> > > > > >>>>>>>>>>>> somewhat
> > > > > >>>>>>>>>>>>>>> stick
> > > > > >>>>>>>>>>>>>>>> to what the industry does. But I also understand
> > that
> > > > > >>> the
> > > > > >>>>>>>>>>> industry
> > > > > >>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>> already very divided on this.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>>>>> Aljoscha
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <
> > imjark@gmail.com
> > > >
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the
> > remaining
> > > > > >>>> topics.
> > > > > >>>>>>> We
> > > > > >>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>> close to the truth. It will waste a lot of time if
> > we
> > > > > >>>> resume
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>>>> topic
> > > > > >>>>>>>>>>>>>>> some
> > > > > >>>>>>>>>>>>>>>> time later.
> > > > > >>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with
> > Timo’s
> > > > > >>>>>>>>>>>> “cat.db.fun”
> > > > > >>>>>>>>>>>>>> way
> > > > > >>>>>>>>>>>>>>>> to override a catalog function.
> > > > > >>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”, it
> > > introduces a
> > > > > >>>>>>>>>>>> nonexistent
> > > > > >>>>>>>>>>>>>> cat
> > > > > >>>>>>>>>>>>>>>> & db? And we still need to do special treatment
> for
> > > the
> > > > > >>>>>>> dedicated
> > > > > >>>>>>>>>>>>>>>> system.system cat & db?
> > > > > >>>>>>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>>>>>> Jark
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <twalthr@apache.org> wrote:
> > > > > >>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many
> > things
> > > > > >>>>>>>>>>>>> incrementally.
> > > > > >>>>>>>>>>>>>>>> Users should be able to override all catalog
> objects
> > > > > >>>>>>> consistently
> > > > > >>>>>>>>>>>>>>> according
> > > > > >>>>>>>>>>>>>>>> to FLIP-64 (Support for Temporary Objects in Table
> > > > > >>> module).
> > > > > >>>>> If
> > > > > >>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>>> are treated completely different, we need more
> code
> > > and
> > > > > >>>>> special
> > > > > >>>>>>>>>>>>> cases.
> > > > > >>>>>>>>>>>>>>> From
> > > > > >>>>>>>>>>>>>>>> an implementation perspective, this topic only
> > affects
> > > > > >>> the
> > > > > >>>>>>> lookup
> > > > > >>>>>>>>>>>>> logic
> > > > > >>>>>>>>>>>>>>>> which is rather low implementation effort which is
> > > why I
> > > > > >>>>> would
> > > > > >>>>>>>>>>> like
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>> clarify the remaining items. As you said, we have
> a
> > > > > >>> slight
> > > > > > >>>>>>>>>>> consensus
> > > > > >>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>> overriding built-in functions; we should also
> strive
> > > for
> > > > > >>>>>>> reaching
> > > > > >>>>>>>>>>>>>>> consensus
> > > > > >>>>>>>>>>>>>>>> on the remaining topics.
> > > > > >>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it ensures
> registering
> > > > > >>>> catalog
> > > > > >>>>>>>>>>>> objects
> > > > > >>>>>>>>>>>>>>>> consistent and the overriding of built-in
> functions
> > > more
> > > > > >>>>>>>>>>> explicit.
> > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>> Timo
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > > > > >>>>>>>>>>>>>>>>>>> hi, everyone
> > > > > >>>>>>>>>>>>>>>>>>> I think this flip is very meaningful. it
> supports
> > > > > >>>>> functions
> > > > > >>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>> can
> > > > > >>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>> shared by different catalogs and dbs, reducing
> > the
> > > > > >>>>>>>>>>> duplication
> > > > > >>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>> functions.
> > > > > >>>>>>>>>>>>>>>>>>> Our group based on flink's sql parser module
> > > > > >>> implements
> > > > > >>>>>>>>>>> create
> > > > > >>>>>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>>>>>>> feature, stores the parsed function metadata
> and
> > > > > >>> schema
> > > > > >>>>> into
> > > > > >>>>>>>>>>>>> mysql,
> > > > > >>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>> also customizes the catalog, customizes
> > sql-client
> > > to
> > > > > >>>>>>> support
> > > > > >>>>>>>>>>>>>> custom
> > > > > >>>>>>>>>>>>>>>>>>> schemas and functions. Loaded, but the function
> > is
> > > > > >>>>> currently
> > > > > >>>>>>>>>>>>>> global,
> > > > > >>>>>>>>>>>>>>>> and is
> > > > > >>>>>>>>>>>>>>>>>>> not subdivided according to catalog and db.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in
> > the
> > > > > >>>>>>>>>>> development
> > > > > >>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>> flip, I have been paying attention to the
> > > community,
> > > > > >>> but
> > > > > >>>>>>>>>>> found
> > > > > >>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>>>> difficult to join.
> > > > > >>>>>>>>>>>>>>>>>>> thank you.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> Xuefu Z <us...@gmail.com> wrote on Tue, Sep 17, 2019 at 11:19 AM:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> It seems to me that there is a general
> consensus
> > > on
> > > > > >>>>> having
> > > > > >>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in
> > > > > >>>> functions.
> > > > > >>>>>>>>>>> (As
> > > > > >>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>> side
> > > > > >>>>>>>>>>>>>>>> note
> > > > > >>>>>>>>>>>>>>>>>>>> for comparability, the current user defined
> > > > > >>> functions
> > > > > >>>> are
> > > > > >>>>>>>>>>> all
> > > > > >>>>>>>>>>>>>>>> temporary and
> > > > > >>>>>>>>>>>>>>>>>>>> having no namespaces.)
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of
> having
> > > > > >>>>> namespaced
> > > > > >>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a
> > specific
> > > > > >>>>> cat/db.
> > > > > >>>>>>>>>>>>>> However,
> > > > > >>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can
> be
> > > > > >>> added
> > > > > >>>>>>>>>>>>>>> incrementally.
> > > > > >>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced
> temp
> > > > > >>>>> functions
> > > > > >>>>>>>>>>> now
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>> leave
> > > > > >>>>>>>>>>>>>>>>>>>> the door open for namespaced ones for later
> > > > > >>> releases as
> > > > > >>>>> the
> > > > > >>>>>>>>>>>>>>>> requirement
> > > > > >>>>>>>>>>>>>>>>>>>> might become more crystal? This also helps
> > shorten
> > > > > >>> the
> > > > > >>>>>>>>>>> debate
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>> allow us
> > > > > >>>>>>>>>>>>>>>>>>>> to make some progress along this direction.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated
> cat/db
> > to
> > > > > >>> host
> > > > > >>>>> the
> > > > > >>>>>>>>>>>>>>> temporary
> > > > > >>>>>>>>>>>>>>>> temp
> > > > > >>>>>>>>>>>>>>>>>>>> functions that don't have namespaces, my only
> > > > > >>> concern
> > > > > >>>> is
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>>>>> special
> > > > > >>>>>>>>>>>>>>>>>>>> treatment for a cat/db, which makes code less
> > > > > >>> clean, as
> > > > > >>>>>>>>>>>> evident
> > > > > >>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>> treating
> > > > > >>>>>>>>>>>>>>>>>>>> the built-in catalog currently.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >>>>>>>>>>>>>>>>>>>> Xuefu
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid
> > Wysakowicz <
> > > > > >>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's
> > > > > >>> suggestion.
> > > > > >>>>> How
> > > > > >>>>>>>>>>>> about
> > > > > >>>>>>>>>>>>>> we
> > > > > >>>>>>>>>>>>>>>> have a
> > > > > >>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for
> > > built-in
> > > > > >>>>>>>>>>> objects?
> > > > > >>>>>>>>>>>>> This
> > > > > >>>>>>>>>>>>>>>> catalog
> > > > > >>>>>>>>>>>>>>>>>>>>> would be invisible for users as Xuefu was
> > > > > >>> suggesting.
> > > > > >>>>>>>>>>>>>>>>>>>>> Then users could still override built-in
> > > > > >>> functions, if
> > > > > >>>>>>> they
> > > > > >>>>>>>>>>>>> fully
> > > > > >>>>>>>>>>>>>>>> qualify
> > > > > >>>>>>>>>>>>>>>>>>>>> object with the built-in namespace, but by
> > > default
> > > > > >>> the
> > > > > >>>>>>>>>>> common
> > > > > >>>>>>>>>>>>>> logic
> > > > > >>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>> current dB & cat would be used.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > > > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in current cat &
> > dB
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in cat db
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func
> > ...
> > > > > >>>>>>>>>>>>>>>>>>>>> Overrides built-in function with temporary
> > > function
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be
> > > writable
> > > > > >>>> for
> > > > > >>>>>>>>>>>>> permanent
> > > > > >>>>>>>>>>>>>>>>>>>> objects.
> > > > > >>>>>>>>>>>>>>>>>>>>> WDYT?
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> This way I think we can have benefits of both
> > > > > >>>> solutions.
> > > > > >>>>>>>>>>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>>>>>>>>>> Dawid
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > > > > >>>>>>>>>>> twalthr@apache.org
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of
> > overriding
> > > > > >>>>> certain
> > > > > >>>>>>>>>>>>>> built-in
> > > > > >>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if
> many
> > > > > >>> people
> > > > > >>>>>>>>>>> agree.
> > > > > >>>>>>>>>>>>>>>> However, it
> > > > > >>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding
> > > catalog
> > > > > >>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a
> > > query
> > > > > >>>> even
> > > > > >>>>>>>>>>>> though
> > > > > >>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available
> > > currently
> > > > > >>> or
> > > > > >>>>>>>>>>> should
> > > > > >>>>>>>>>>>>> not
> > > > > >>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both
> cases?
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > > > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and
> > > never
> > > > > > >>>>>>>>>>>> considers
> > > > > >>>>>>>>>>>>>>>> current
> > > > > >>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with
> other
> > > DDL
> > > > > >>> but
> > > > > >>>>>>>>>>>>> acceptable
> > > > > >>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>>> functions I guess.
> > > > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Regarding "Flink don't have any other
> built-in
> > > > > >>>> objects
> > > > > >>>>>>>>>>>>> (tables,
> > > > > >>>>>>>>>>>>>>>> views)
> > > > > >>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the
> > near
> > > > > >>>>> future.
> > > > > >>>>>>>>>>>> Take
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > https://issues.apache.org/jira/browse/FLINK-13900
> > > > > >>> as
> > > > > >>>>> an
> > > > > >>>>>>>>>>>>>> example.
> > > > > >>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>>>>>>>> Timo
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the
> least
> > > > > >>>> favorable
> > > > > >>>>>>>>>>>> thus I
> > > > > >>>>>>>>>>>>>>>> didn't
> > > > > >>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the
> > > > > >>> discussion
> > > > > >>>> is
> > > > > >>>>>>>>>>>> mainly
> > > > > >>>>>>>>>>>>>>>> between
> > > > > >>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not
> > override
> > > > > >>>>> builtin.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions
> > are
> > > > > >>>>>>>>>>> differently
> > > > > >>>>>>>>>>>>>>> treated
> > > > > >>>>>>>>>>>>>>>>>>>> than
> > > > > >>>>>>>>>>>>>>>>>>>>>>> other db objects.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from
> > the
> > > > > >>> fact
> > > > > >>>>> that
> > > > > >>>>>>>>>>>>>>> functions
> > > > > >>>>>>>>>>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink
> > don't
> > > > > >>> have
> > > > > >>>>> any
> > > > > >>>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>>>>>>> built-in
> > > > > >>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Bowen
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>>>>> Xuefu Zhang
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>> Xuefu Zhang
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> "In Honey We Trust!"
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>> Xuefu Zhang
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> "In Honey We Trust!"
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>> Xuefu Zhang
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> "In Honey We Trust!"
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>
> > > > >
> > > > >
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > >
> >
>
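
Editor's sketch: the lookup order proposed under option #2 in the quoted
thread above can be modelled roughly as follows. This is a minimal toy
model, not Flink's FunctionCatalog. The class name, the dictionaries, and
the choice of Python are all assumptions made for illustration. It also
assumes that a 2-part reference ("db.func") is completed with the current
catalog, which the thread does not spell out.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (catalog, database, function name)


class FunctionRegistry:
    """Toy model of the option #2 resolution order (hypothetical names)."""

    def __init__(self, current_catalog: str, current_database: str) -> None:
        self.current_catalog = current_catalog
        self.current_database = current_database
        self.temp_system: Dict[str, str] = {}   # "global" temp functions, no namespace
        self.builtin: Dict[str, str] = {}       # built-in functions
        self.temp_catalog: Dict[Key, str] = {}  # temp functions tied to a cat/db
        self.persistent: Dict[Key, str] = {}    # catalog (persistent) functions

    def resolve(self, reference: str) -> Optional[str]:
        parts = reference.split(".")
        if len(parts) == 1:
            # Ambiguous reference: global temp -> built-in ->
            # temp in current cat/db -> catalog function in current cat/db.
            name = parts[0]
            key = (self.current_catalog, self.current_database, name)
            if name in self.temp_system:
                return self.temp_system[name]
            if name in self.builtin:
                return self.builtin[name]
            if key in self.temp_catalog:
                return self.temp_catalog[key]
            return self.persistent.get(key)
        if len(parts) == 2:
            # Assumption: "db.func" is completed with the current catalog.
            parts = [self.current_catalog] + parts
        if len(parts) != 3:
            raise ValueError(f"unsupported function reference: {reference}")
        # Qualified reference: temp functions -> persistent functions.
        key = (parts[0], parts[1], parts[2])
        if key in self.temp_catalog:
            return self.temp_catalog[key]
        return self.persistent.get(key)
```

With such a model, a bare "abs" never reaches a catalog function of the
same name while a built-in "abs" exists, whereas "cat.db.abs" never reaches
the built-in one.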

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
+1 for CREATE TEMPORARY SYSTEM FUNCTION xxx

Cheers, Fabian
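
Editor's sketch: the behaviour discussed in the quoted messages below (a
temporary system function shadows a built-in one, and dropping it brings
the original back) could look roughly like this. The class and method
names are hypothetical and only mirror the proposed DDL; nothing here is
Flink API, and the matching DROP statement is itself an assumption.

```python
from typing import Callable, Dict


class SystemFunctions:
    """Toy model: temporary system functions shadow built-ins (hypothetical)."""

    def __init__(self, builtins: Dict[str, Callable]) -> None:
        self._builtin = dict(builtins)            # never modified afterwards
        self._temporary: Dict[str, Callable] = {}

    def create_temporary_system_function(self, name: str, impl: Callable) -> None:
        # Mirrors "CREATE TEMPORARY SYSTEM FUNCTION name ...".
        self._temporary[name] = impl

    def drop_temporary_system_function(self, name: str) -> None:
        # Assumed counterpart of the CREATE statement; only the shadow is removed.
        self._temporary.pop(name, None)

    def lookup(self, name: str) -> Callable:
        # The temporary function wins; otherwise fall back to the built-in.
        if name in self._temporary:
            return self._temporary[name]
        return self._builtin[name]
```

Because the built-in table is never touched, no extra "restore" syntax is
needed in this model, which is the point made against the ALTER variant.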

On Sat, Sep 21, 2019 at 06:58, Bowen Li <bo...@gmail.com> wrote:

> "SYSTEM" sounds good to me. FYI, this FLIP only impacts low level of the
> SQL function stack and won't actually involve any DDL, thus I will just
> document the decision and we should keep it in mind when it's time to
> implement the DDLs.
>
> I'm in the process of updating the FLIP to reflect changes required for
> option #2, will send a new version for review soon.
>
>
>
> On Fri, Sep 20, 2019 at 4:02 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > I also like the 'System' keyword. I think we can assume we reached
> > consensus on this topic.
> >
> > On Sat, 21 Sep 2019, 06:37 Xuefu Z, <us...@gmail.com> wrote:
> >
> > > +1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
> > >
> > > --Xuefu
> > >
> > > On Fri, Sep 20, 2019 at 3:28 PM Timo Walther <tw...@apache.org>
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > > sorry for the late reply. I also give +1 for option #2. Thus, I
> guess
> > > > we have a clear winner.
> > > >
> > > > I would also like to find a better keyword/syntax for this statement.
> > > > Esp. the BUILTIN keyword can confuse people, because it could be
> > written
> > > > as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to
> > > > introduce a new reserved keyword in the parser which affects also
> > > > non-DDL queries. How about:
> > > >
> > > > CREATE TEMPORARY SYSTEM FUNCTION xxx
> > > >
> > > > The SYSTEM keyword is already a reserved keyword and in FLIP-66 we
> are
> > > > > discussing to prefix some of the functions with a SYSTEM_ prefix like
> > > > SYSTEM_WATERMARK. Also SQL defines syntax like "FOR SYSTEM_TIME AS
> OF".
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > >
> > > > On 20.09.19 05:45, Bowen Li wrote:
> > > > > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over
> > "ALTER
> > > > > BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop
> the
> > > > > temporary built-in function in the same session? With the former
> one,
> > > > they
> > > > > can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the
> > > latter
> > > > > one, I'm not sure how users can "restore" the original builtin
> > function
> > > > > easily from an "altered" function without introducing further
> > > nonstandard
> > > > > SQL syntax.
> > > > >
> > > > > > Also, please pardon me, as I realized that using the net score may not
> > > > > > be a good idea... I'm trying to fit this vote into the cases listed in
> > > > > > the Flink Bylaws [1].
> > > > >
> > > > > > From the following result, the majority seems to be for #2 too, as it
> > > > > > has the most approval so far and doesn't have a strong "-1".
> > > > >
> > > > > #1:3 (+1), 1 (0), 4(-1)
> > > > > #2:4(0), 3 (+1), 1(+0.5)
> > > > >         * Dawid -1/0 depending on keyword
> > > > > #3:2(+1), 3(-1), 3(0)
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > > >
> > > > > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com>
> > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > > >> Thanks everyone for your votes. I summarized the result as follows:
> > > > >>
> > > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > > >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> > > > >>          Dawid -1/0 depending on keyword
> > > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > > >>
> > > > > >> Given the result, I'd like to change my vote for #2 from 0 to +1 to
> > > > > >> make it a stronger case with net +3.5. So the votes so far are:
> > > > >>
> > > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > > >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> > > > >>          Dawid -1/0 depending on keyword
> > > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > > >>
> > > > >> What do you think? Do you think we can conclude with this result?
> Or
> > > > would
> > > > >> you like to take it as a formal FLIP vote with 3 days voting
> period?
> > > > >>
> > > > > >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > > > > >> BUILTIN FUNCTION xxx TEMPORARILY" because
> > > > > >> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> > > > > >> TEMPORARY FUNCTION"
> > > > > >> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies that it alters a
> > > > > >> built-in function, but it actually doesn't; the logic only creates a
> > > > > >> temp function with higher priority than that built-in function in the
> > > > > >> ambiguous resolution order, and it would behave inconsistently with
> > > > > >> "ALTER FUNCTION".
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >>> I agree, it's very similar from the implementation point of view
> > and
> > > > the
> > > > >>> implications.
> > > > >>>
> > > > >>> IMO, the difference is mostly on the mental model for the user.
> > > > >>> Instead of having a special class of temporary functions that
> have
> > > > >>> precedence over builtin functions it suggests to temporarily
> change
> > > > >>> built-in functions.
> > > > >>>
> > > > >>> Fabian
> > > > >>>
> > > > > >>> On Thu, Sep 19, 2019 at 11:52 AM Kurt Young <ykt836@gmail.com> wrote:
> > > > >>>> Hi Fabian,
> > > > >>>>
> > > > > >>>> I think it's almost the same as #2 with a different keyword:
> > > > >>>>
> > > > >>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Kurt
> > > > >>>>
> > > > >>>>
> > > > >>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <
> fhueske@gmail.com>
> > > > >>> wrote:
> > > > >>>>> Hi,
> > > > >>>>>
> > > > > >>>>> I thought about it a bit more and think that there is some good
> > > > > >>>>> value in my last proposal.
> > > > > >>>>>
> > > > > >>>>> A lot of complexity comes from the fact that we want to allow
> > > > > >>>>> overriding built-in functions, which are addressed differently from
> > > > > >>>>> other functions (and db objects).
> > > > > >>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
> > > > > >>>>> thing as "CREATE FUNCTION" and treat both functions exactly the same,
> > > > > >>>>> except that:
> > > > > >>>>> 1) temp functions disappear at the end of the session
> > > > > >>>>> 2) temp functions are resolved before other functions
> > > > >>>>>
> > > > >>>>> This would be Dawid's proposal from the beginning of this
> thread
> > > (in
> > > > >>> case
> > > > >>>>> you still remember... ;-) )
> > > > >>>>>
> > > > >>>>> Temporarily overriding built-in functions would be supported
> with
> > > an
> > > > >>>>> explicit command like
> > > > >>>>>
> > > > >>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> > > > >>>>>
> > > > >>>>> This would also address the concerns about accidentally
> changing
> > > the
> > > > >>>>> semantics of built-in functions.
> > > > >>>>> IMO, it can't get much more explicit than the above command.
> > > > >>>>>
> > > > >>>>> Sorry for bringing up a new option in the middle of the
> > discussion,
> > > > >>> but
> > > > >>>> as
> > > > >>>>> I said, I think it has a bunch of benefits and I don't see
> major
> > > > >>>> drawbacks
> > > > >>>>> (maybe you do?).
> > > > >>>>>
> > > > >>>>> What do you think?
> > > > >>>>>
> > > > >>>>> Fabian
> > > > >>>>>
> > > > > >>>>> On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <fhueske@gmail.com> wrote:
> > > > >>>>>> Hi everyone,
> > > > >>>>>>
> > > > > >>>>>> I thought again about option #1, and something that I don't like is
> > > > > >>>>>> that the resolved address of xyz is different in "CREATE FUNCTION xyz"
> > > > > >>>>>> and "CREATE TEMPORARY FUNCTION xyz".
> > > > > >>>>>> IMO, adding the keyword "TEMPORARY" should only change the lifecycle
> > > > > >>>>>> of the function, but not where it is located. This implicitly changed
> > > > > >>>>>> location might be confusing for users.
> > > > > >>>>>> After all, a temp function should behave pretty much like any other
> > > > > >>>>>> function, except for the fact that it disappears when the session is
> > > > > >>>>>> closed.
> > > > > >>>>>> Approach #2 with the additional keyword would make that pretty clear,
> > > > > >>>>>> IMO.
> > > > > >>>>>> However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > > > > >>>>>> BUILTIN (we are not adding a built-in function).
> > > > > >>>>>> So I'd be OK with #2 if we find a good keyword. In fact, approach #2
> > > > > >>>>>> could also be an alias for approach #3 to avoid explicit specification
> > > > > >>>>>> of the system catalog/db.
> > > > >>>>>>
> > > > >>>>>> Approach #3 would be consistent with other db objects and the
> > > > >>> "CREATE
> > > > >>>>>> FUNCTION" statement.
> > > > >>>>>> Adding system catalog/db seems rather complex, but then again
> > how
> > > > >>> often
> > > > >>>>> do
> > > > >>>>>> we expect users to override built-in functions? If this
> becomes
> > a
> > > > >>> major
> > > > >>>>>> issue, we can still add option #2 as an alias.
> > > > >>>>>>
> > > > >>>>>> Not sure what's the best approach from an internal point of
> > view,
> > > > >>> but I
> > > > >>>>>> certainly think that consistent behavior is important.
> > > > >>>>>> Hence my votes are:
> > > > >>>>>>
> > > > >>>>>> -1 for #1
> > > > >>>>>> 0 for #2
> > > > >>>>>> 0 for #3
> > > > >>>>>>
> > > > >>>>>> Btw. Did we consider a completely separate command for
> > overriding
> > > > >>>>> built-in
> > > > >>>>>> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS
> ..."?
> > > > >>>>>>
> > > > >>>>>> Cheers, Fabian
> > > > >>>>>>
> > > > >>>>>>
> > > > > >>>>>> On Thu, Sep 19, 2019 at 11:03 AM JingsongLee
> > > > > >>>>>> <lz...@aliyun.com.invalid> wrote:
> > > > >>>>>>
> > > > > >>>>>>> I know Hive and Spark can shadow built-in functions with temporary
> > > > > >>>>>>> functions. MySQL, Oracle, and SQL Server cannot shadow them.
> > > > > >>>>>>> Users can use fully qualified names to access functions instead of
> > > > > >>>>>>> shadowing.
> > > > >>>>>>>
> > > > >>>>>>> So I think it is a completely new thing, and the direct way
> to
> > > deal
> > > > >>>> with
> > > > >>>>>>> new things is to add new grammar. So,
> > > > >>>>>>> +1 for #2, +0 for #3, -1 for #1
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Jingsong Lee
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > ------------------------------------------------------------------
> > > > >>>>>>> From:Kurt Young <yk...@gmail.com>
> > > > > >>>>>>> Send Time: Sep 19, 2019 (Thursday) 16:43
> > > > >>>>>>> To:dev <de...@flink.apache.org>
> > > > >>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > > > >>>>>>>
> > > > >>>>>>> And let me make my vote complete:
> > > > >>>>>>>
> > > > >>>>>>> -1 for #1
> > > > >>>>>>> +1 for #2 with different keyword
> > > > >>>>>>> -0 for #3
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Kurt
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <ykt836@gmail.com
> >
> > > > >>> wrote:
> > > > >>>>>>>> Looks like I'm the only person who is willing to +1 to #2
> for
> > > now
> > > > >>>> :-)
> > > > >>>>>>>> But I would suggest to change the keyword from GLOBAL to
> > > > >>>>>>>> something like BUILTIN.
> > > > >>>>>>>>
> > > > > >>>>>>>> I think #2 and #3 are almost the same proposal, just with a
> > > > > >>>>>>>> different format to indicate whether one wants to override built-in
> > > > > >>>>>>>> functions.
> > > > > >>>>>>>> My biggest reason for choosing it is that I want this behavior to be
> > > > > >>>>>>>> consistent with temporary tables. I will give some examples to show
> > > > > >>>>>>>> the behavior and also make sure I'm not misunderstanding anything
> > > > > >>>>>>>> here.
> > > > >>>>>>>>
> > > > > >>>>>>>> For most DBs, when a user creates a temporary table with:
> > > > > >>>>>>>>
> > > > > >>>>>>>> CREATE TEMPORARY TABLE t1
> > > > > >>>>>>>>
> > > > > >>>>>>>> it's actually equivalent to:
> > > > > >>>>>>>>
> > > > > >>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
> > > > > >>>>>>>>
> > > > > >>>>>>>> If the user changes the current database, they will not be able to
> > > > > >>>>>>>> access t1 without a fully qualified name, i.e. db1.t1 (assuming db1
> > > > > >>>>>>>> is the current database when this temporary table is created).
> > > > >>>>>>>>
> > > > > >>>>>>>> Only #2 and #3 follow this behavior, and I would vote for it since
> > > > > >>>>>>>> it makes the behavior consistent across temporary tables and
> > > > > >>>>>>>> functions.
> > > > > >>>>>>>> The reason I'm not voting for #3 is that a special catalog and
> > > > > >>>>>>>> database just looks very hacky to me. It implies that our built-in
> > > > > >>>>>>>> functions are saved in a special catalog and database, which is
> > > > > >>>>>>>> actually not the case. Introducing a dedicated keyword like
> > > > > >>>>>>>> CREATE TEMPORARY BUILTIN FUNCTION looks clearer and more
> > > > > >>>>>>>> straightforward. One can argue that we should avoid introducing a
> > > > > >>>>>>>> new keyword, but it's also very rare that a system can overwrite
> > > > > >>>>>>>> built-in functions. Since we decided to support this, introducing
> > > > > >>>>>>>> a new keyword is not a big deal IMO.
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Kurt
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
> > > > >>> piotr@ververica.com
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi,
> > > > >>>>>>>>>
> > > > >>>>>>>>> It is a quite long discussion to follow and I hope I didn’t
> > > > >>>>>>> misunderstand
> > > > >>>>>>>>> anything. From the proposals presented by Xuefu I would
> vote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> -1 for #1 and #2
> > > > >>>>>>>>> +1 for #3
> > > > >>>>>>>>>
> > > > > >>>>>>>>> Besides #3 being IMO more general and more consistent, having
> > > > > >>>>>>>>> qualified names (#3) would make it easier for someone to write
> > > > > >>>>>>>>> cross-database/catalog queries (joining multiple data
> > > > > >>>>>>>>> sets/streams), for example with functions that manipulate, clean
> > > > > >>>>>>>>> up, or convert the data stored in different catalogs, each
> > > > > >>>>>>>>> registered in its respective catalog.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Piotrek
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com>
> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I agree with Xuefu that inconsistent handling with all the
> > > > >>> other
> > > > >>>>>>>>> objects is
> > > > >>>>>>>>>> not a big problem.
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Regarding option #3, the special "system.system" namespace may
> > > > > >>>>>>>>>> confuse users.
> > > > > >>>>>>>>>> Users need to know the set of built-in function names to know when
> > > > > >>>>>>>>>> to use the "system.system" namespace.
> > > > > >>>>>>>>>> What will happen if a user registers a non-built-in function name
> > > > > >>>>>>>>>> under the "system.system" namespace?
> > > > > >>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I mentioned
> > > > > >>>>>>>>>> at the beginning of this thread.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> So here is my vote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> +1 for #1
> > > > >>>>>>>>>> 0 for #2
> > > > >>>>>>>>>> -1 for #3
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Jark
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
> > > > >>> wrote:
> > > > > >>>>>>>>>>> @Dawid, Re: "we also don't need additional referencing the
> > > > > >>>>>>>>>>> special catalog anywhere."
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> True. But once we allow such a reference, users can use it in any
> > > > > >>>>>>>>>>> place where a function name is expected, and we have to handle
> > > > > >>>>>>>>>>> that. That's a big difference, I think.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>> Xuefu
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > > > >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> @Bowen I am not suggesting introducing an additional catalog. I
> > > > > >>>>>>>>>>>> think we need to get rid of the current built-in catalog.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> @Xuefu In option #3 we also don't need to reference the special
> > > > > >>>>>>>>>>>> catalog anywhere besides the CREATE statement. The resolution
> > > > > >>>>>>>>>>>> behaviour is exactly the same in both options.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
> > > > >>> wrote:
> > > > >>>>>>>>>>>>> Hi Dawid,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the
> > > > >>>> approach.
> > > > >>>>> It
> > > > >>>>>>>>> can
> > > > >>>>>>>>>>> be
> > > > >>>>>>>>>>>>> changed to something else for better.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The difference between this and the #3 approach is that
> > we
> > > > >>>> only
> > > > >>>>>>> need
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>> keyword for this create DDL. For other places (such as
> > > > >>>> function
> > > > >>>>>>>>>>>>> referencing), no keyword or special namespace is
> needed.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>> Xuefu
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > > >>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > >>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>>>> I think it makes sense to start voting at this point.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Option 1: Only 1-part identifiers
> > > > >>>>>>>>>>>>>> PROS:
> > > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > > >>>>>>>>>>>>>> CONS:
> > > > > >>>>>>>>>>>>>> - inconsistent with all the other objects, both permanent &
> > > > > >>>>>>>>>>>>>> temporary
> > > > >>>>>>>>>>>>>> - does not allow shadowing catalog functions
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Option 2: Special keyword for built-in function
> > > > > >>>>>>>>>>>>>> I think this is quite similar to the special catalog/db. The
> > > > > >>>>>>>>>>>>>> thing I am strongly against in this proposal is the GLOBAL
> > > > > >>>>>>>>>>>>>> keyword. This keyword has a meaning in RDBMS systems and means
> > > > > >>>>>>>>>>>>>> a function that is present for the lifetime of the session in
> > > > > >>>>>>>>>>>>>> which it was created, but available in all other sessions.
> > > > > >>>>>>>>>>>>>> Therefore I really don't want to use this keyword in a
> > > > > >>>>>>>>>>>>>> different context.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Option 3: Special catalog/db
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> PROS:
> > > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > > >>>>>>>>>>>>>> - allows shadowing catalog functions
> > > > >>>>>>>>>>>>>> - consistent with other objects
> > > > >>>>>>>>>>>>>> CONS:
> > > > >>>>>>>>>>>>>> - we introduce a special namespace for built-in
> > functions
> > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I don't see a problem with introducing the special namespace.
> > > > > >>>>>>>>>>>>>> In the end it is very similar to the keyword approach. In this
> > > > > >>>>>>>>>>>>>> case the catalog/db combination would be the "keyword".
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Therefore my votes:
> > > > >>>>>>>>>>>>>> Option 1: -0
> > > > > >>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up with a
> > > > > >>>>>>>>>>>>>> better keyword)
> > > > >>>>>>>>>>>>>> Option 3: +1
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>>> Dawid
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <
> usxuefu@gmail.com>
> > > > >>>> wrote:
> > > > >>>>>>>>>>>>>>> Hi Aljoscha,
> > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thanks for the summary and these are great questions to be
> > > > > >>>>>>>>>>>>>>> answered. The answer to your first question is clear: there
> > > > > >>>>>>>>>>>>>>> is a general agreement to override built-in functions with
> > > > > >>>>>>>>>>>>>>> temp functions.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> However, your second and third questions are sort of related,
> > > > > >>>>>>>>>>>>>>> as a function reference can be either just a function name
> > > > > >>>>>>>>>>>>>>> (like "func") or in the form of "cat.db.func". When a
> > > > > >>>>>>>>>>>>>>> reference is just a function name, it can mean either a
> > > > > >>>>>>>>>>>>>>> built-in function or a function defined in the current
> > > > > >>>>>>>>>>>>>>> cat/db. If we support overriding a built-in function with a
> > > > > >>>>>>>>>>>>>>> temp function, such overriding can also cover a function in
> > > > > >>>>>>>>>>>>>>> the current cat/db.
> > > > > >>>>>>>>>>>>>>> I think what Timo referred to as "overriding a catalog
> > > > > >>>>>>>>>>>>>>> function" means that a temp function defined as "cat.db.func"
> > > > > >>>>>>>>>>>>>>> overrides a catalog function "func" in cat/db even if cat/db
> > > > > >>>>>>>>>>>>>>> is not current. To support this, a temp function has to be
> > > > > >>>>>>>>>>>>>>> tied to a cat/db. That's why I said above that the 2nd and
> > > > > >>>>>>>>>>>>>>> 3rd questions are related. The problem with such support is
> > > > > >>>>>>>>>>>>>>> the ambiguity when a user defines a function w/o namespace,
> > > > > >>>>>>>>>>>>>>> "CREATE TEMPORARY FUNCTION func ...". Here "func" can mean a
> > > > > >>>>>>>>>>>>>>> global temp function, or a temp function in the current
> > > > > >>>>>>>>>>>>>>> cat/db. If we assume the former, this creates an
> > > > > >>>>>>>>>>>>>>> inconsistency because "CREATE FUNCTION func" actually means a
> > > > > >>>>>>>>>>>>>>> function in the current cat/db. If we assume the latter, then
> > > > > >>>>>>>>>>>>>>> there is no way for a user to create a global temp function.
> > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Giving a special namespace for built-in functions may solve
> > > > > >>>>>>>>>>>>>>> the ambiguity problem above, but it also introduces an
> > > > > >>>>>>>>>>>>>>> artificial catalog/database that needs special treatment and
> > > > > >>>>>>>>>>>>>>> pollutes the cleanness of the code. I would rather introduce
> > > > > >>>>>>>>>>>>>>> a syntax in DDL to solve the problem, like "CREATE [GLOBAL]
> > > > > >>>>>>>>>>>>>>> TEMPORARY FUNCTION func".
> > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate proposals for
> > > > > >>>>>>>>>>>>>>> voting purposes:
> > > > > >>>>>>>>>>>>>>> 1. Support only global, temporary functions without
> > > > > >>>>>>>>>>>>>>> namespace. Such temp functions override built-in functions
> > > > > >>>>>>>>>>>>>>> and catalog functions in the current cat/db. The resolution
> > > > > >>>>>>>>>>>>>>> order is: temp functions -> built-in functions -> catalog
> > > > > >>>>>>>>>>>>>>> functions. (Partially or fully qualified functions have no
> > > > > >>>>>>>>>>>>>>> ambiguity!)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 2. In addition to #1, support creating and referencing
> > > > > >>>>>>>>>>>>>>> temporary functions associated with a cat/db, with a "GLOBAL"
> > > > > >>>>>>>>>>>>>>> qualifier in DDL for global temp functions. The resolution
> > > > > >>>>>>>>>>>>>>> order is: global temp functions -> built-in functions -> temp
> > > > > >>>>>>>>>>>>>>> functions in current cat/db -> catalog functions. (Resolution
> > > > > >>>>>>>>>>>>>>> for a partially or fully qualified function reference is:
> > > > > >>>>>>>>>>>>>>> temp functions -> persistent functions.)
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 3. In addition to #1, support creating and referencing
> > > > > >>>>>>>>>>>>>>> temporary functions associated with a cat/db, with a special
> > > > > >>>>>>>>>>>>>>> namespace for built-in functions and global temp functions.
> > > > > >>>>>>>>>>>>>>> The resolution is the same as #2, except that the special
> > > > > >>>>>>>>>>>>>>> namespace might be prefixed to a reference to a built-in
> > > > > >>>>>>>>>>>>>>> function or global temp function. (In the absence of the
> > > > > >>>>>>>>>>>>>>> special namespace, the resolution order is the same as in
> > > > > >>>>>>>>>>>>>>> #2.)
> > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> My personal preference is #1, given the unknown use case and
> > > > > >>>>>>>>>>>>>>> introduced complexity for #2 and #3. However, #2 is an
> > > > > >>>>>>>>>>>>>>> acceptable alternative. Thus, my votes are:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> +1 for #1
> > > > > >>>>>>>>>>>>>>> +0 for #2
> > > > > >>>>>>>>>>>>>>> -1 for #3
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Everyone, please cast your vote (in above format please!), or
> > > > > >>>>>>>>>>>>>>> let me know if you have more questions or other candidates.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>> Xuefu
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > > > >>>>>>>>>>>> aljoscha@apache.org>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64 are very
> > > > > >>>>>>>>>>>>>>>> connected. To resolve the differences, I think we have to
> > > > > >>>>>>>>>>>>>>>> think about the basic principles and find consensus there.
> > > > > >>>>>>>>>>>>>>>> The basic questions I see are:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> - Do we want to support overriding builtin functions?
> > > > > >>>>>>>>>>>>>>>> - Do we want to support overriding catalog functions?
> > > > > >>>>>>>>>>>>>>>> - And then later: should temporary functions be tied to a
> > > > > >>>>>>>>>>>>>>>> catalog/database?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I don’t have much to say about these, except that we should
> > > > > >>>>>>>>>>>>>>>> somewhat stick to what the industry does. But I also
> > > > > >>>>>>>>>>>>>>>> understand that the industry is already very divided on
> > > > > >>>>>>>>>>>>>>>> this.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>>>>> Aljoscha
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <
> imjark@gmail.com
> > >
> > > > >>>>> wrote:
> > > > >>>>>>>>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the remaining
> > > > > >>>>>>>>>>>>>>>>> topics. We are close to the truth. It will waste a lot of
> > > > > >>>>>>>>>>>>>>>>> time if we resume the topic some time later.
> > > > > >>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> > > > > >>>>>>>>>>>>>>>>> “cat.db.fun” way to override a catalog function.
> > > > > >>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”: it introduces a
> > > > > >>>>>>>>>>>>>>>>> nonexistent cat & db? And we still need to do special
> > > > > >>>>>>>>>>>>>>>>> treatment for the dedicated system.system cat & db?
> > > > >>>>>>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>>>>>> Jark
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <twalthr@apache.org> wrote:
> > > > >>>>>>>>>>>>>>>>>> Hi everyone,
> > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things
> > > > > >>>>>>>>>>>>>>>>>> incrementally. Users should be able to override all
> > > > > >>>>>>>>>>>>>>>>>> catalog objects consistently according to FLIP-64
> > > > > >>>>>>>>>>>>>>>>>> (Support for Temporary Objects in Table module). If
> > > > > >>>>>>>>>>>>>>>>>> functions are treated completely differently, we need
> > > > > >>>>>>>>>>>>>>>>>> more code and special cases. From an implementation
> > > > > >>>>>>>>>>>>>>>>>> perspective, this topic only affects the lookup logic,
> > > > > >>>>>>>>>>>>>>>>>> which is rather low implementation effort, which is why
> > > > > >>>>>>>>>>>>>>>>>> I would like to clarify the remaining items. As you
> > > > > >>>>>>>>>>>>>>>>>> said, we have a slight consensus on overriding built-in
> > > > > >>>>>>>>>>>>>>>>>> functions; we should also strive for reaching consensus
> > > > > >>>>>>>>>>>>>>>>>> on the remaining topics.
> > > > > >>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it makes registering catalog
> > > > > >>>>>>>>>>>>>>>>>> objects consistent and the overriding of built-in
> > > > > >>>>>>>>>>>>>>>>>> functions more explicit.
> > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>> Timo
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > > > >>>>>>>>>>>>>>>>>>> hi, everyone
> > > > > >>>>>>>>>>>>>>>>>>> I think this FLIP is very meaningful. It supports
> > > > > >>>>>>>>>>>>>>>>>>> functions that can be shared by different catalogs and
> > > > > >>>>>>>>>>>>>>>>>>> dbs, reducing the duplication of functions.
> > > > > >>>>>>>>>>>>>>>>>>> Our group implemented a create function feature based
> > > > > >>>>>>>>>>>>>>>>>>> on Flink's SQL parser module. It stores the parsed
> > > > > >>>>>>>>>>>>>>>>>>> function metadata and schema in MySQL; we also
> > > > > >>>>>>>>>>>>>>>>>>> customized the catalog and the sql-client to support
> > > > > >>>>>>>>>>>>>>>>>>> loading custom schemas and functions. But the functions
> > > > > >>>>>>>>>>>>>>>>>>> are currently global and are not subdivided by catalog
> > > > > >>>>>>>>>>>>>>>>>>> and db.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in the
> > > > > >>>>>>>>>>>>>>>>>>> development of this FLIP. I have been paying attention
> > > > > >>>>>>>>>>>>>>>>>>> to the community, but found it rather difficult to
> > > > > >>>>>>>>>>>>>>>>>>> join.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Thanks to Tmo and Dawid for sharing thoughts.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> It seems to me that there is a general consensus
> > on
> > > > >>>>> having
> > > > >>>>>>>>>>>> temp
> > > > >>>>>>>>>>>>>>>> functions
> > > > >>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in
> > > > >>>> functions.
> > > > >>>>>>>>>>> (As
> > > > >>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>> side
> > > > >>>>>>>>>>>>>>>> note
> > > > >>>>>>>>>>>>>>>>>>>> for comparability, the current user defined
> > > > >>> functions
> > > > >>>> are
> > > > >>>>>>>>>>> all
> > > > >>>>>>>>>>>>>>>> temporary and
> > > > >>>>>>>>>>>>>>>>>>>> have no namespaces.)
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having
> > > > >>>>> namespaced
> > > > >>>>>>>>>>>> temp
> > > > >>>>>>>>>>>>>>>> functions
> > > > >>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a
> specific
> > > > >>>>> cat/db.
> > > > >>>>>>>>>>>>>> However,
> > > > >>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can be
> > > > >>> added
> > > > >>>>>>>>>>>>>>> incrementally.
> > > > >>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced temp
> > > > >>>>> functions
> > > > >>>>>>>>>>> now
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>> leave
> > > > >>>>>>>>>>>>>>>>>>>> the door open for namespaced ones for later
> > > > >>> releases as
> > > > >>>>> the
> > > > >>>>>>>>>>>>>>>> requirement
> > > > >>>>>>>>>>>>>>>>>>>> might become clearer? This also helps
> shorten
> > > > >>> the
> > > > >>>>>>>>>>> debate
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>> allow us
> > > > >>>>>>>>>>>>>>>>>>>> to make some progress along this direction.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db
> to
> > > > >>> host
> > > > >>>>> the
> > > > >>>>>>>>>>>>>>> temporary
> > > > >>>>>>>>>>>>>>>> temp
> > > > >>>>>>>>>>>>>>>>>>>> functions that don't have namespaces, my only
> > > > >>> concern
> > > > >>>> is
> > > > >>>>>>> the
> > > > >>>>>>>>>>>>>> special
> > > > >>>>>>>>>>>>>>>>>>>> treatment for a cat/db, which makes code less
> > > > >>> clean, as
> > > > >>>>>>>>>>>> evident
> > > > >>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>> treating
> > > > >>>>>>>>>>>>>>>>>>>> the built-in catalog currently.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>>> Xuefu
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid
> Wysakowicz <
> > > > >>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's
> > > > >>> suggestion.
> > > > >>>>> How
> > > > >>>>>>>>>>>> about
> > > > >>>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>> have a
> > > > >>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for
> > built-in
> > > > >>>>>>>>>>> objects?
> > > > >>>>>>>>>>>>> This
> > > > >>>>>>>>>>>>>>>> catalog
> > > > >>>>>>>>>>>>>>>>>>>>> would be invisible for users as Xuefu was
> > > > >>> suggesting.
> > > > >>>>>>>>>>>>>>>>>>>>> Then users could still override built-in
> > > > >>> functions, if
> > > > >>>>>>> they
> > > > >>>>>>>>>>>>> fully
> > > > >>>>>>>>>>>>>>>> qualify
> > > > >>>>>>>>>>>>>>>>>>>>> object with the built-in namespace, but by
> > default
> > > > >>> the
> > > > >>>>>>>>>>> common
> > > > >>>>>>>>>>>>>> logic
> > > > >>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>>>> current db & cat would be used.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in current cat &
> db
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in cat db
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func
> ...
> > > > >>>>>>>>>>>>>>>>>>>>> Overrides built-in function with temporary
> > function
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be
> > writable
> > > > >>>> for
> > > > >>>>>>>>>>>>> permanent
> > > > >>>>>>>>>>>>>>>>>>>> objects.
> > > > >>>>>>>>>>>>>>>>>>>>> WDYT?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> This way I think we can have benefits of both
> > > > >>>> solutions.
> > > > >>>>>>>>>>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>>>>>>>>>> Dawid
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > > > >>>>>>>>>>> twalthr@apache.org
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of
> overriding
> > > > >>>>> certain
> > > > >>>>>>>>>>>>>> built-in
> > > > >>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many
> > > > >>> people
> > > > >>>>>>>>>>> agree.
> > > > >>>>>>>>>>>>>>>> However, it
> > > > >>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding
> > catalog
> > > > >>>>>>>>>>> functions
> > > > >>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a
> > query
> > > > >>>> even
> > > > >>>>>>>>>>>> though
> > > > >>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available
> > currently
> > > > >>> or
> > > > >>>>>>>>>>> should
> > > > >>>>>>>>>>>>> not
> > > > >>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and
> > never
> > > > >>>>>>>>>>>> considers
> > > > >>>>>>>>>>>>>>>> current
> > > > >>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with other
> > DDL
> > > > >>> but
> > > > >>>>>>>>>>>>> acceptable
> > > > >>>>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>>>>>>>> functions I guess.
> > > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Regarding "Flink don't have any other built-in
> > > > >>>> objects
> > > > >>>>>>>>>>>>> (tables,
> > > > >>>>>>>>>>>>>>>> views)
> > > > >>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the
> near
> > > > >>>>> future.
> > > > >>>>>>>>>>>> Take
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > https://issues.apache.org/jira/browse/FLINK-13900
> > > > >>> as
> > > > >>>>> an
> > > > >>>>>>>>>>>>>> example.
> > > > >>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>>>>>>>>>> Timo
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> > > > >>>> favorable
> > > > >>>>>>>>>>>> thus I
> > > > >>>>>>>>>>>>>>>> didn't
> > > > >>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the
> > > > >>> discussion
> > > > >>>> is
> > > > >>>>>>>>>>>> mainly
> > > > >>>>>>>>>>>>>>>> between
> > > > >>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not
> override
> > > > >>>>> builtin.
> > > > >>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions
> are
> > > > >>>>>>>>>>> differently
> > > > >>>>>>>>>>>>>>> treated
> > > > >>>>>>>>>>>>>>>>>>>> than
> > > > >>>>>>>>>>>>>>>>>>>>>>> other db objects.
> > > > >>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from
> the
> > > > >>> fact
> > > > >>>>> that
> > > > >>>>>>>>>>>>>>> functions
> > > > >>>>>>>>>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink
> don't
> > > > >>> have
> > > > >>>>> any
> > > > >>>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>>>>>>> built-in
> > > > >>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>>>>>>>>>>>> Bowen
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>> Xuefu Zhang
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> > > > >>>>>>>>>>>>>>>>>>>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
"SYSTEM" sounds good to me. FYI, this FLIP only impacts low level of the
SQL function stack and won't actually involve any DDL, so I will just
document the decision, and we should keep it in mind when it's time to
implement the DDL.

I'm in the process of updating the FLIP to reflect the changes required for
option #2, and will send a new version for review soon.
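
For reference, the ambiguous-reference resolution order of option #2 can be
sketched roughly as below. This is a toy Python model, not Flink code: the
class name FunctionCatalogSketch, its four dictionaries, and the resolve()
signature are all hypothetical and only illustrate the order described in
this thread (temporary system functions, then built-in functions, then
temporary functions in the current cat/db, then catalog functions; a
partially or fully qualified reference checks temporary functions before
persistent ones).

```python
from typing import Dict, Optional, Tuple

# (catalog, database, function name); purely illustrative.
QualifiedName = Tuple[str, str, str]


class FunctionCatalogSketch:
    """Hypothetical model of the option #2 resolution order (not Flink's API)."""

    def __init__(self, current_catalog: str, current_database: str) -> None:
        self.current_catalog = current_catalog
        self.current_database = current_database
        # Session-scoped functions without a namespace
        # (what CREATE TEMPORARY SYSTEM FUNCTION would register).
        self.temp_system: Dict[str, str] = {}
        # Functions shipped with the engine.
        self.builtin: Dict[str, str] = {}
        # Session-scoped functions tied to a cat/db.
        self.temp_catalog: Dict[QualifiedName, str] = {}
        # Persistent functions stored in a catalog.
        self.catalog: Dict[QualifiedName, str] = {}

    def resolve(
        self,
        name: str,
        catalog: Optional[str] = None,
        database: Optional[str] = None,
    ) -> Optional[str]:
        # Missing parts of a partially qualified name fall back to the
        # current catalog/database.
        cat = catalog if catalog is not None else self.current_catalog
        db = database if database is not None else self.current_database
        key = (cat, db, name)

        if catalog is None and database is None:
            # Ambiguous (1-part) reference: system-level lookups come first.
            candidates = [
                self.temp_system.get(name),
                self.builtin.get(name),
                self.temp_catalog.get(key),
                self.catalog.get(key),
            ]
        else:
            # Qualified reference: temporary before persistent, and no
            # system-level lookup at all.
            candidates = [self.temp_catalog.get(key), self.catalog.get(key)]

        for candidate in candidates:
            if candidate is not None:
                return candidate
        return None
```

In this sketch, registering "abs" in temp_system shadows the built-in "abs"
for unqualified references only; resolve("abs", "cat", "db") still goes
straight to the functions tied to cat/db.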



On Fri, Sep 20, 2019 at 4:02 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> I also like the 'System' keyword. I think we can assume we reached
> consensus on this topic.
>
> On Sat, 21 Sep 2019, 06:37 Xuefu Z, <us...@gmail.com> wrote:
>
> > +1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
> >
> > --Xuefu
> >
> > On Fri, Sep 20, 2019 at 3:28 PM Timo Walther <tw...@apache.org> wrote:
> >
> > > Hi everyone,
> > >
> > > > sorry for the late reply. I also give +1 for option #2. Thus, I guess
> > > we have a clear winner.
> > >
> > > I would also like to find a better keyword/syntax for this statement.
> > > Esp. the BUILTIN keyword can confuse people, because it could be
> written
> > > as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to
> > > introduce a new reserved keyword in the parser which affects also
> > > non-DDL queries. How about:
> > >
> > > CREATE TEMPORARY SYSTEM FUNCTION xxx
> > >
> > > The SYSTEM keyword is already a reserved keyword and in FLIP-66 we are
> > > > discussing whether to prefix some of the functions with a SYSTEM_ prefix like
> > > SYSTEM_WATERMARK. Also SQL defines syntax like "FOR SYSTEM_TIME AS OF".
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > > On 20.09.19 05:45, Bowen Li wrote:
> > > > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over
> "ALTER
> > > > BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
> > > > temporary built-in function in the same session? With the former one,
> > > they
> > > > can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the
> > latter
> > > > one, I'm not sure how users can "restore" the original builtin
> function
> > > > easily from an "altered" function without introducing further
> > nonstandard
> > > > SQL syntax.
> > > >
> > > > Also please pardon me as I realized using net may not be a good
> idea...
> > > I'm
> > > > trying to fit this vote into cases listed in Flink Bylaw [1].
> > > >
> > > > From the following result, the majority seems to be #2 too, as it has
> > the
> > > > most approval so far and doesn't have strong "-1".
> > > >
> > > > #1:3 (+1), 1 (0), 4(-1)
> > > > #2:4(0), 3 (+1), 1(+0.5)
> > > >         * Dawid -1/0 depending on keyword
> > > > #3:2(+1), 3(-1), 3(0)
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > >
> > > > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com>
> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Thanks everyone for your votes. I summarized the result as
> follows:
> > > >>
> > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> > > >>          Dawid -1/0 depending on keyword
> > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > >>
> > > >> Given the result, I'd like to change my vote for #2 from 0 to +1, to
> > > make
> > > >> it a stronger case with net +3.5. So the votes so far are:
> > > >>
> > > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > > >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> > > >>          Dawid -1/0 depending on keyword
> > > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > > >>
> > > >> What do you think? Do you think we can conclude with this result? Or
> > > would
> > > >> you like to take it as a formal FLIP vote with 3 days voting period?
> > > >>
> > > >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > BUILTIN
> > > >> FUNCTION xxx TEMPORARILY" because
> > > >> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> > > >> TEMPORARY FUNCTION"
> > > >> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a
> > built-in
> > > >> function but it actually doesn't, the logic only creates a temp
> > function
> > > >> with higher priority than that built-in function in ambiguous
> > resolution
> > > >> order; and it would behave inconsistently with "ALTER FUNCTION".
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com>
> > > wrote:
> > > >>
> > > >>> I agree, it's very similar from the implementation point of view
> and
> > > the
> > > >>> implications.
> > > >>>
> > > >>> IMO, the difference is mostly on the mental model for the user.
> > > >>> Instead of having a special class of temporary functions that have
> > > >>> precedence over builtin functions it suggests to temporarily change
> > > >>> built-in functions.
> > > >>>
> > > >>> Fabian
> > > >>>
> > > > >>> On Thu, Sep 19, 2019 at 11:52, Kurt Young <
> > > > ykt836@gmail.com
> > > > >>>> wrote:
> > > >>>> Hi Fabian,
> > > >>>>
> > > >>>> I think it's almost the same with #2 with different keyword:
> > > >>>>
> > > >>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
> > > >>>>
> > > >>>> Best,
> > > >>>> Kurt
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com>
> > > >>> wrote:
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> I thought about it a bit more and think that there is some good
> > value
> > > >>> in
> > > >>>> my
> > > >>>>> last proposal.
> > > >>>>>
> > > >>>>> A lot of complexity comes from the fact that we want to allow
> > > >>> overriding
> > > > >>>>> built-in functions which are addressed differently from other
> > functions
> > > >>>> (and
> > > >>>>> db objects).
> > > >>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the
> same
> > > >>> thing
> > > >>>> as
> > > >>>>> "CREATE FUNCTION" and treat both functions exactly the same
> except
> > > >>> that:
> > > >>>>> 1) temp functions disappear at the end of the session
> > > > >>>>> 2) temp functions are resolved before other functions
> > > >>>>>
> > > >>>>> This would be Dawid's proposal from the beginning of this thread
> > (in
> > > >>> case
> > > >>>>> you still remember... ;-) )
> > > >>>>>
> > > >>>>> Temporarily overriding built-in functions would be supported with
> > an
> > > >>>>> explicit command like
> > > >>>>>
> > > >>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> > > >>>>>
> > > >>>>> This would also address the concerns about accidentally changing
> > the
> > > >>>>> semantics of built-in functions.
> > > >>>>> IMO, it can't get much more explicit than the above command.
> > > >>>>>
> > > >>>>> Sorry for bringing up a new option in the middle of the
> discussion,
> > > >>> but
> > > >>>> as
> > > >>>>> I said, I think it has a bunch of benefits and I don't see major
> > > >>>> drawbacks
> > > >>>>> (maybe you do?).
> > > >>>>>
> > > >>>>> What do you think?
> > > >>>>>
> > > >>>>> Fabian
> > > >>>>>
> > > > >>>>> On Thu, Sep 19, 2019 at 11:24, Fabian Hueske <
> > > > >>>>> fhueske@gmail.com
> > > > >>>>>> wrote:
> > > >>>>>> Hi everyone,
> > > >>>>>>
> > > >>>>>> I thought again about option #1 and something that I don't like
> is
> > > >>> that
> > > >>>>>> the resolved address of xyz is different in "CREATE FUNCTION
> xyz"
> > > >>> and
> > > >>>>>> "CREATE TEMPORARY FUNCTION xyz".
> > > >>>>>> IMO, adding the keyword "TEMPORARY" should only change the
> > > >>> lifecycle of
> > > >>>>>> the function, but not where it is located. This implicitly
> changed
> > > >>>>> location
> > > >>>>>> might be confusing for users.
> > > >>>>>> After all, a temp function should behave pretty much like any
> > other
> > > >>>>>> function, except for the fact that it disappears when the
> session
> > is
> > > >>>>> closed.
> > > >>>>>> Approach #2 with the additional keyword would make that pretty
> > > >>> clear,
> > > >>>>> IMO.
> > > > >>>>>> However, I like neither GLOBAL (for reasons mentioned by Dawid)
> nor
> > > > >>>>> BUILTIN
> > > >>>>>> (we are not adding a built-in function).
> > > >>>>>> So I'd be OK with #2 if we find a good keyword. In fact,
> approach
> > #2
> > > >>>>> could
> > > >>>>>> also be an alias for approach #3 to avoid explicit specification
> > of
> > > >>> the
> > > >>>>>> system catalog/db.
> > > >>>>>>
> > > >>>>>> Approach #3 would be consistent with other db objects and the
> > > >>> "CREATE
> > > >>>>>> FUNCTION" statement.
> > > >>>>>> Adding system catalog/db seems rather complex, but then again
> how
> > > >>> often
> > > >>>>> do
> > > >>>>>> we expect users to override built-in functions? If this becomes
> a
> > > >>> major
> > > >>>>>> issue, we can still add option #2 as an alias.
> > > >>>>>>
> > > >>>>>> Not sure what's the best approach from an internal point of
> view,
> > > >>> but I
> > > >>>>>> certainly think that consistent behavior is important.
> > > >>>>>> Hence my votes are:
> > > >>>>>>
> > > >>>>>> -1 for #1
> > > >>>>>> 0 for #2
> > > >>>>>> 0 for #3
> > > >>>>>>
> > > >>>>>> Btw. Did we consider a completely separate command for
> overriding
> > > >>>>> built-in
> > > >>>>>> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> > > >>>>>>
> > > >>>>>> Cheers, Fabian
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>> On Thu, Sep 19, 2019 at 11:03, JingsongLee
> > > > >>>>>> <lz...@aliyun.com.invalid> wrote:
> > > >>>>>>
> > > >>>>>>> I know Hive and Spark can shadow built-in functions by
> temporary
> > > >>>>> function.
> > > > >>>>>>> MySQL, Oracle, and SQL Server cannot shadow them.
> > > >>>>>>> User can use full names to access functions instead of
> shadowing.
> > > >>>>>>>
> > > >>>>>>> So I think it is a completely new thing, and the direct way to
> > deal
> > > >>>> with
> > > >>>>>>> new things is to add new grammar. So,
> > > >>>>>>> +1 for #2, +0 for #3, -1 for #1
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Jingsong Lee
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > ------------------------------------------------------------------
> > > >>>>>>> From:Kurt Young <yk...@gmail.com>
> > > > >>>>>>> Send Time: Sep 19, 2019 (Thursday) 16:43
> > > >>>>>>> To:dev <de...@flink.apache.org>
> > > >>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > > >>>>>>>
> > > >>>>>>> And let me make my vote complete:
> > > >>>>>>>
> > > >>>>>>> -1 for #1
> > > >>>>>>> +1 for #2 with different keyword
> > > >>>>>>> -0 for #3
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Kurt
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
> > > >>> wrote:
> > > >>>>>>>> Looks like I'm the only person who is willing to +1 to #2 for
> > now
> > > >>>> :-)
> > > >>>>>>>> But I would suggest to change the keyword from GLOBAL to
> > > >>>>>>>> something like BUILTIN.
> > > >>>>>>>>
> > > >>>>>>>> I think #2 and #3 are almost the same proposal, just with
> > > >>> different
> > > >>>>>>>> format to indicate whether it want to override built-in
> > > >>> functions.
> > > > >>>>>>>> My biggest reason to choose it is that I want this behavior to be
> > > > >>> consistent
> > > > >>>>>>>> with temporary tables. I will give some examples to show the
> > > >>> behavior
> > > >>>>>>>> and also make sure I'm not misunderstanding anything here.
> > > >>>>>>>>
> > > > >>>>>>>> For most DBs, when a user creates a temporary table with:
> > > > >>>>>>>>
> > > > >>>>>>>> CREATE TEMPORARY TABLE t1
> > > > >>>>>>>>
> > > > >>>>>>>> It's actually equivalent to:
> > > > >>>>>>>>
> > > > >>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
> > > > >>>>>>>>
> > > > >>>>>>>> If the user changes the current database, they will not be able to
> access
> > > > >>> t1
> > > > >>>>>>> without
> > > > >>>>>>>> a fully qualified name, i.e. db1.t1 (assuming db1 is the current
> > > >>> database
> > > >>>>> when
> > > >>>>>>>> this temporary table is created).
> > > >>>>>>>>
> > > >>>>>>>> Only #2 and #3 followed this behavior and I would vote for
> this
> > > >>>> since
> > > >>>>>>> this
> > > > >>>>>>>> makes such behavior consistent across temporary tables and
> > > >>>> functions.
> > > >>>>>>>> Why I'm not voting for #3 is a special catalog and database
> just
> > > >>>> looks
> > > >>>>>>> very
> > > > >>>>>>>> hacky to me. It implies that our built-in functions are saved
> > > > >>> in a
> > > > >>>>>>>> special
> > > > >>>>>>>> catalog and database, which is actually not the case. Introducing a
> > > >>> dedicated
> > > >>>>>>>> keyword
> > > >>>>>>>> like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and
> > > >>>>>>>> straightforward. One can argue that we should avoid
> introducing
> > > >>> new
> > > >>>>>>>> keyword,
> > > >>>>>>>> but it's also very rare that a system can overwrite built-in
> > > >>>>> functions.
> > > >>>>>>>> Since we
> > > > >>>>>>>> decided to support this, introducing a new keyword is not a big
> > > >>> deal
> > > >>>>> IMO.
> > > >>>>>>>> Best,
> > > >>>>>>>> Kurt
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
> > > >>> piotr@ververica.com
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> It is a quite long discussion to follow and I hope I didn’t
> > > >>>>>>> misunderstand
> > > >>>>>>>>> anything. From the proposals presented by Xuefu I would vote:
> > > >>>>>>>>>
> > > >>>>>>>>> -1 for #1 and #2
> > > >>>>>>>>> +1 for #3
> > > >>>>>>>>>
> > > >>>>>>>>> Besides #3 being IMO more general and more consistent, having
> > > >>>>> qualified
> > > > >>>>>>>>> names (#3) would make it easier for someone to use cross
> > > >>>>>>>>> databases/catalogs queries (joining multiple data
> > sets/streams).
> > > >>>> For
> > > >>>>>>>>> example with some functions to manipulate/clean up/convert
> the
> > > >>>> stored
> > > >>>>>>> data
> > > >>>>>>>>> in different catalogs registered in the respective catalogs.
> > > >>>>>>>>>
> > > >>>>>>>>> Piotrek
> > > >>>>>>>>>
> > > >>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> I agree with Xuefu that inconsistent handling with all the
> > > >>> other
> > > >>>>>>>>> objects is
> > > >>>>>>>>>> not a big problem.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> Regarding option #3, the special "system.system" namespace
> > > >>> may
> > > >>>>>>> confuse
> > > >>>>>>>>>> users.
> > > >>>>>>>>>> Users need to know the set of built-in function names to
> know
> > > >>>> when
> > > >>>>> to
> > > >>>>>>>>> use
> > > >>>>>>>>>> "system.system" namespace.
> > > > >>>>>>>>>> What will happen if a user registers a non-builtin function
> name
> > > >>>>> under
> > > >>>>>>> the
> > > >>>>>>>>>> "system.system" namespace?
> > > >>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I
> > > >>>> mentioned
> > > >>>>>>> at
> > > >>>>>>>>> the
> > > >>>>>>>>>> beginning of this thread.
> > > >>>>>>>>>>
> > > >>>>>>>>>> So here is my vote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> +1 for #1
> > > >>>>>>>>>> 0 for #2
> > > >>>>>>>>>> -1 for #3
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Jark
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
> > > >>> wrote:
> > > >>>>>>>>>>> @Dawid, Re: we also don't need additional referencing the
> > > > >>>>>>>>> special catalog
> > > >>>>>>>>>>> anywhere.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> True. But once we allow such reference, then user can do so
> > > >>> in
> > > >>>> any
> > > >>>>>>>>> possible
> > > >>>>>>>>>>> place where a function name is expected, for which we have
> to
> > > >>>>>>> handle.
> > > >>>>>>>>>>> That's a big difference, I think.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks,
> > > >>>>>>>>>>> Xuefu
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > > >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> @Bowen I am not suggesting introducing an additional
> catalog. I
> > > >>>>> think
> > > >>>>>>> we
> > > >>>>>>>>>>> need
> > > >>>>>>>>>>>> to get rid of the current built-in catalog.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> @Xuefu in option #3 we also don't need additional
> > > >>> referencing
> > > >>>> the
> > > >>>>>>>>> special
> > > >>>>>>>>>>>> catalog anywhere else besides in the CREATE statement. The
> > > >>>>>>> resolution
> > > >>>>>>>>>>>> behaviour is exactly the same in both options.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
> > > >>> wrote:
> > > >>>>>>>>>>>>> Hi Dawid,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the
> > > >>>> approach.
> > > >>>>> It
> > > >>>>>>>>> can
> > > >>>>>>>>>>> be
> > > > >>>>>>>>>>>>> changed to something better.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> The difference between this and the #3 approach is that
> we
> > > >>>> only
> > > >>>>>>> need
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>> keyword for this create DDL. For other places (such as
> > > >>>> function
> > > >>>>>>>>>>>>> referencing), no keyword or special namespace is needed.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Xuefu
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > >>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>> I think it makes sense to start voting at this point.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Option 1: Only 1-part identifiers
> > > >>>>>>>>>>>>>> PROS:
> > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > >>>>>>>>>>>>>> CONS:
> > > > >>>>>>>>>>>>>> - inconsistent with all the other objects, both
> permanent &
> > > >>>>>>> temporary
> > > >>>>>>>>>>>>>> - does not allow shadowing catalog functions
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Option 2: Special keyword for built-in function
> > > >>>>>>>>>>>>>> I think this is quite similar to the special catalog/db.
> > > >>> The
> > > >>>>>>> thing I
> > > >>>>>>>>>>> am
> > > >>>>>>>>>>>>>> strongly against in this proposal is the GLOBAL keyword.
> > > >>> This
> > > >>>>>>>>> keyword
> > > >>>>>>>>>>>>> has a
> > > >>>>>>>>>>>>>> meaning in rdbms systems and means a function that is
> > > >>> present
> > > >>>>>>> for a
> > > >>>>>>>>>>>>>> lifetime of a session in which it was created, but
> > > >>> available
> > > >>>> in
> > > >>>>>>> all
> > > >>>>>>>>>>>> other
> > > >>>>>>>>>>>>>> sessions. Therefore I really don't want to use this
> > > >>> keyword
> > > >>>> in
> > > >>>>> a
> > > >>>>>>>>>>>>> different
> > > >>>>>>>>>>>>>> context.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Option 3: Special catalog/db
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> PROS:
> > > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > > >>>>>>>>>>>>>> - allows shadowing catalog functions
> > > >>>>>>>>>>>>>> - consistent with other objects
> > > >>>>>>>>>>>>>> CONS:
> > > >>>>>>>>>>>>>> - we introduce a special namespace for built-in
> functions
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I don't see a problem with introducing the special
> > > >>> namespace.
> > > >>>>> In
> > > >>>>>>> the
> > > >>>>>>>>>>>> end
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>> is very similar to the keyword approach. In this case
> the
> > > >>>>>>> catalog/db
> > > >>>>>>>>>>>>>> combination would be the "keyword"
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Therefore my votes:
> > > >>>>>>>>>>>>>> Option 1: -0
> > > >>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up
> with
> > > >>> a
> > > >>>>>>> better
> > > >>>>>>>>>>>>> keyword)
> > > >>>>>>>>>>>>>> Option 3: +1
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>> Dawid
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
> > > >>>> wrote:
> > > >>>>>>>>>>>>>>> Hi Aljoscha,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks for the summary and these are great questions to
> > > >>> be
> > > >>>>>>>>>>> answered.
> > > >>>>>>>>>>>>> The
> > > >>>>>>>>>>>>>>> answer to your first question is clear: there is a
> > > >>> general
> > > >>>>>>>>>>> agreement
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> override built-in functions with temp functions.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> However, your second and third questions are sort of
> > > >>>> related,
> > > >>>>>>> as a
> > > >>>>>>>>>>>>>> function
> > > >>>>>>>>>>>>>>> reference can be either just function name (like
> "func")
> > > >>> or
> > > >>>> in
> > > >>>>>>> the
> > > >>>>>>>>>>>> form
> > > > >>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> "cat.db.func". When a reference is just function name,
> it
> > > >>>> can
> > > >>>>>>> mean
> > > >>>>>>>>>>>>>> either a
> > > >>>>>>>>>>>>>>> built-in function or a function defined in the current
> > > >>>> cat/db.
> > > >>>>>>> If
> > > >>>>>>>>>>> we
> > > >>>>>>>>>>>>>>> support overriding a built-in function with a temp
> > > >>> function,
> > > >>>>>>> such
> > > >>>>>>>>>>>>>>> overriding can also cover a function in the current
> > > >>> cat/db.
> > > > >>>>>>>>>>>>>>> I think what Timo referred to as "overriding a catalog
> > > >>>> function"
> > > >>>>>>>>>>> means a
> > > >>>>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>> function defined as "cat.db.func" overrides a catalog
> > > >>>> function
> > > >>>>>>>>>>> "func"
> > > >>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>> cat/db even if cat/db is not current. To support this,
> > > >>> temp
> > > >>>>>>>>>>> function
> > > >>>>>>>>>>>>> has
> > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> be tied to a cat/db. That's why I said above that the
> 2nd
> > > >>>> and
> > > >>>>>>> 3rd
> > > >>>>>>>>>>>>>> questions
> > > >>>>>>>>>>>>>>> are related. The problem with such support is the
> > > >>> ambiguity
> > > >>>>> when
> > > >>>>>>>>>>> user
> > > >>>>>>>>>>>>>>> defines a function w/o namespace, "CREATE TEMPORARY
> > > >>> FUNCTION
> > > >>>>>>> func
> > > >>>>>>>>>>>> ...".
> > > > >>>>>>>>>>>>>>> Here "func" can mean a global temp function, or a temp
> > > >>>>>>> function in
> > > >>>>>>>>>>>>>> current
> > > >>>>>>>>>>>>>>> cat/db. If we can assume the former, this creates an
> > > >>>>>>> inconsistency
> > > >>>>>>>>>>>>>> because
> > > >>>>>>>>>>>>>>> "CREATE FUNCTION func" actually means a function in
> > > >>> current
> > > >>>>>>> cat/db.
> > > >>>>>>>>>>>> If
> > > >>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>> assume the latter, then there is no way for user to
> > > >>> create a
> > > >>>>>>> global
> > > >>>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>> function.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Giving a special namespace for built-in functions may
> > > >>> solve
> > > >>>>> the
> > > >>>>>>>>>>>>> ambiguity
> > > >>>>>>>>>>>>>>> problem above, but it also introduces artificial
> > > >>>>>>> catalog/database
> > > >>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>> needs special treatment and pollutes the cleanness of
> > > >>> the
> > > >>>>>>> code. I
> > > >>>>>>>>>>>>> would
> > > >>>>>>>>>>>>>>> rather introduce a syntax in DDL to solve the problem,
> > > >>> like
> > > >>>>>>> "CREATE
> > > >>>>>>>>>>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate proposals
> for
> > > >>>>> voting
> > > >>>>>>>>>>>>>> purposes:
> > > >>>>>>>>>>>>>>> 1. Support only global, temporary functions without
> > > >>>> namespace.
> > > >>>>>>> Such
> > > >>>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>> functions override built-in functions and catalog
> > > >>> functions
> > > >>>>> in
> > > >>>>>>>>>>>> current
> > > >>>>>>>>>>>>>>> cat/db. The resolution order is: temp functions ->
> > > >>> built-in
> > > >>>>>>>>>>> functions
> > > >>>>>>>>>>>>> ->
> > > >>>>>>>>>>>>>>> catalog functions. (Partially or fully qualified
> > > >>> functions
> > > >>>> have
> > > >>>>>>> no
> > > >>>>>>>>>>>>>>> ambiguity!)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 2. In addition to #1, support creating and referencing
> > > >>>>> temporary
> > > >>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL
> > > >>> for
> > > >>>>>>> global
> > > >>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>> functions. The resolution order is: global temp
> > > >>> functions ->
> > > >>>>>>>>>>> built-in
> > > >>>>>>>>>>>>>>> functions -> temp functions in current cat/db ->
> catalog
> > > >>>>>>> function.
> > > >>>>>>>>>>>>>>> (Resolution for partially or fully qualified function
> > > >>>>> reference
> > > >>>>>>> is:
> > > >>>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>> functions -> persistent functions.)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 3. In addition to #1, support creating and referencing
> > > >>>>> temporary
> > > >>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>> associated with a cat/db with a special namespace for
> > > >>>> built-in
> > > >>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>> and global temp functions. The resolution is the same
> as
> > > >>> #2,
> > > >>>>>>> except
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>> the special namespace might be prefixed to a reference
> > > >>> to a
> > > >>>>>>>>>>> built-in
> > > >>>>>>>>>>>>>>> function or global temp function. (In absence of the
> > > >>> special
> > > >>>>>>>>>>>> namespace,
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> resolution order is the same as in #2.)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> My personal preference is #1, given the unknown use
> case
> > > >>> and
> > > >>>>>>>>>>>> introduced
> > > >>>>>>>>>>>>>>> complexity for #2 and #3. However, #2 is an acceptable
> > > >>>>>>> alternative.
> > > >>>>>>>>>>>>> Thus,
> > > >>>>>>>>>>>>>>> my votes are:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> +1 for #1
> > > >>>>>>>>>>>>>>> +0 for #2
> > > >>>>>>>>>>>>>>> -1 for #3
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Everyone, please cast your vote (in above format
> > > >>> please!),
> > > >>>> or
> > > >>>>>>> let
> > > >>>>>>>>>>> me
> > > >>>>>>>>>>>>> know
> > > >>>>>>>>>>>>>>> if you have more questions or other candidates.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>> Xuefu
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > > >>>>>>>>>>>> aljoscha@apache.org>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64 are
> very
> > > >>>>>>>>>>> connected.
> > > >>>>>>>>>>>>> To
> > > >>>>>>>>>>>>>>>> resolve the differences, I think we have to think about
> > > >>> the
> > > >>>>> basic
> > > >>>>>>>>>>>>>>> principles
> > > >>>>>>>>>>>>>>>> and find consensus there. The basic questions I see
> are:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> - Do we want to support overriding builtin functions?
> > > >>>>>>>>>>>>>>>> - Do we want to support overriding catalog functions?
> > > >>>>>>>>>>>>>>>> - And then later: should temporary functions be tied
> to
> > > >>> a
> > > >>>>>>>>>>>>>>>> catalog/database?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I don’t have much to say about these, except that we
> > > >>> should
> > > >>>>>>>>>>>> somewhat
> > > >>>>>>>>>>>>>>> stick
> > > >>>>>>>>>>>>>>>> to what the industry does. But I also understand that
> > > >>> the
> > > >>>>>>>>>>> industry
> > > >>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>> already very divided on this.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>> Aljoscha
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <imjark@gmail.com
> >
> > > >>>>> wrote:
> > > >>>>>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the remaining
> > > >>>> topics.
> > > >>>>>>> We
> > > >>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>> close to the truth. It will waste a lot of time if we
> > > >>>> resume
> > > >>>>>>> the
> > > >>>>>>>>>>>>> topic
> > > >>>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>> time later.
> > > >>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> > > >>>>>>>>>>>> “cat.db.fun”
> > > >>>>>>>>>>>>>> way
> > > >>>>>>>>>>>>>>>> to override a catalog function.
> > > >>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”, it
> introduces a
> > > >>>>>>>>>>>> nonexistent
> > > >>>>>>>>>>>>>> cat
> > > >>>>>>>>>>>>>>>> & db? And we still need to do special treatment for
> the
> > > >>>>>>> dedicated
> > > >>>>>>>>>>>>>>>> system.system cat & db?
> > > >>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>> Jark
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <twalthr@apache.org
> >
> > > >>> wrote:
> > > >>>>>>>>>>>>>>>>>> Hi everyone,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things
> > > >>>>>>>>>>>>> incrementally.
> > > >>>>>>>>>>>>>>>> Users should be able to override all catalog objects
> > > >>>>>>> consistently
> > > >>>>>>>>>>>>>>> according
> > > >>>>>>>>>>>>>>>> to FLIP-64 (Support for Temporary Objects in Table
> > > >>> module).
> > > >>>>> If
> > > >>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>>> are treated completely differently, we need more code
> and
> > > >>>>> special
> > > >>>>>>>>>>>>> cases.
> > > >>>>>>>>>>>>>>> From
> > > >>>>>>>>>>>>>>>> an implementation perspective, this topic only affects
> > > >>> the
> > > >>>>>>> lookup
> > > >>>>>>>>>>>>> logic
> > > >>>>>>>>>>>>>>>> which is rather low implementation effort which is
> why I
> > > >>>>> would
> > > >>>>>>>>>>> like
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>> clarify the remaining items. As you said, we have a
> > > >>> slight
> > > >>>>>>>>>>> consensus
> > > >>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>> overriding built-in functions; we should also strive
> for
> > > >>>>>>> reaching
> > > >>>>>>>>>>>>>>> consensus
> > > >>>>>>>>>>>>>>>> on the remaining topics.
> > > >>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it ensures catalog objects
> > > >>>>>>>>>>>>>>>> are registered consistently and makes the overriding of
> > > >>>>>>>>>>>>>>>> built-in functions more explicit.
> > > >>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > > >>>>>>>>>>>>>>>>>>> hi, everyone
> > > >>>>>>>>>>>>>>>>>>> I think this flip is very meaningful. it supports
> > > >>>>> functions
> > > >>>>>>>>>>>> that
> > > >>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>> shared by different catalogs and dbs, reducing the
> > > >>>>>>>>>>> duplication
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> functions.
> > > >>>>>>>>>>>>>>>>>>> Our group implemented a create function feature
> > > >>>>>>>>>>>>>>>>>>> based on Flink's SQL parser module. It stores the
> > > >>>>>>>>>>>>>>>>>>> parsed function metadata and schema in MySQL. We
> > > >>>>>>>>>>>>>>>>>>> also customized the catalog and the sql-client to
> > > >>>>>>>>>>>>>>>>>>> support loading custom schemas and functions, but
> > > >>>>>>>>>>>>>>>>>>> functions are currently global and are not
> > > >>>>>>>>>>>>>>>>>>> subdivided by catalog and db.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in the
> > > >>>>>>>>>>> development
> > > >>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>> flip, I have been paying attention to the
> community,
> > > >>> but
> > > >>>>>>>>>>> found
> > > >>>>>>>>>>>> it
> > > >>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>>>>>> difficult to join.
> > > >>>>>>>>>>>>>>>>>>> thank you.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com>
> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> It seems to me that there is a general consensus
> on
> > > >>>>> having
> > > >>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in
> > > >>>> functions.
> > > >>>>>>>>>>> (As
> > > >>>>>>>>>>>> a
> > > >>>>>>>>>>>>>> side
> > > >>>>>>>>>>>>>>>> note
> > > >>>>>>>>>>>>>>>>>>>> for comparability, the current user defined
> > > >>> functions
> > > >>>> are
> > > >>>>>>>>>>> all
> > > >>>>>>>>>>>>>>>> temporary and
> > > >>>>>>>>>>>>>>>>>>>> having no namespaces.)
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having
> > > >>>>> namespaced
> > > >>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a specific
> > > >>>>> cat/db.
> > > >>>>>>>>>>>>>> However,
> > > >>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can be
> > > >>> added
> > > >>>>>>>>>>>>>>> incrementally.
> > > >>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced temp
> > > >>>>> functions
> > > >>>>>>>>>>> now
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> leave
> > > >>>>>>>>>>>>>>>>>>>> the door open for namespaced ones for later
> > > >>> releases as
> > > >>>>> the
> > > >>>>>>>>>>>>>>>> requirement
> > > >>>>>>>>>>>>>>>>>>>> might become more crystal? This also helps shorten
> > > >>> the
> > > >>>>>>>>>>> debate
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> allow us
> > > >>>>>>>>>>>>>>>>>>>> to make some progress along this direction.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to
> > > >>> host
> > > >>>>> the
> > > >>>>>>>>>>>>>>> temporary
> > > >>>>>>>>>>>>>>>> temp
> > > >>>>>>>>>>>>>>>>>>>> functions that don't have namespaces, my only
> > > >>> concern
> > > >>>> is
> > > >>>>>>> the
> > > >>>>>>>>>>>>>> special
> > > >>>>>>>>>>>>>>>>>>>> treatment for a cat/db, which makes code less
> > > >>> clean, as
> > > >>>>>>>>>>>> evident
> > > >>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>> treating
> > > >>>>>>>>>>>>>>>>>>>> the built-in catalog currently.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>> Xuefu
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > > >>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's
> > > >>> suggestion.
> > > >>>>> How
> > > >>>>>>>>>>>> about
> > > >>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>> have a
> > > >>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for
> built-in
> > > >>>>>>>>>>> objects?
> > > >>>>>>>>>>>>> This
> > > >>>>>>>>>>>>>>>> catalog
> > > >>>>>>>>>>>>>>>>>>>>> would be invisible for users as Xuefu was
> > > >>> suggesting.
> > > >>>>>>>>>>>>>>>>>>>>> Then users could still override built-in
> > > >>> functions, if
> > > >>>>>>> they
> > > >>>>>>>>>>>>> fully
> > > >>>>>>>>>>>>>>>> qualify
> > > >>>>>>>>>>>>>>>>>>>>> object with the built-in namespace, but by
> default
> > > >>> the
> > > >>>>>>>>>>> common
> > > >>>>>>>>>>>>>> logic
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>> current db & cat would be used.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in current cat & db
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > >>>>>>>>>>>>>>>>>>>>> registers temporary function in cat db
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > > >>>>>>>>>>>>>>>>>>>>> Overrides built-in function with temporary
> function
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be
> writable
> > > >>>> for
> > > >>>>>>>>>>>>> permanent
> > > >>>>>>>>>>>>>>>>>>>> objects.
> > > >>>>>>>>>>>>>>>>>>>>> WDYT?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> This way I think we can have benefits of both
> > > >>>> solutions.
> > > >>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>> Dawid
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > > >>>>>>>>>>> twalthr@apache.org
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of overriding
> > > >>>>> certain
> > > >>>>>>>>>>>>>> built-in
> > > >>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many
> > > >>> people
> > > >>>>>>>>>>> agree.
> > > >>>>>>>>>>>>>>>> However, it
> > > >>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding
> catalog
> > > >>>>>>>>>>> functions
> > > >>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a
> query
> > > >>>> even
> > > >>>>>>>>>>>> though
> > > >>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available
> currently
> > > >>> or
> > > >>>>>>>>>>> should
> > > >>>>>>>>>>>>> not
> > > >>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and
> never
> > > >>>>>>>>>>>> considers
> > > >>>>>>>>>>>>>>>> current
> > > >>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with other
> DDL
> > > >>> but
> > > >>>>>>>>>>>>> acceptable
> > > >>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>>>> functions I guess.
> > > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Regarding "Flink doesn't have any other built-in
> > > >>>> objects
> > > >>>>>>>>>>>>> (tables,
> > > >>>>>>>>>>>>>>>> views)
> > > >>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the near
> > > >>>>> future.
> > > >>>>>>>>>>>> Take
> > > >>>>>>>>>>>>>>>>>>>>>>
> https://issues.apache.org/jira/browse/FLINK-13900
> > > >>> as
> > > >>>>> an
> > > >>>>>>>>>>>>>> example.
> > > >>>>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> > > >>>> favorable
> > > >>>>>>>>>>>> thus I
> > > >>>>>>>>>>>>>>>> didn't
> > > >>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the
> > > >>> discussion
> > > >>>> is
> > > >>>>>>>>>>>> mainly
> > > >>>>>>>>>>>>>>>> between
> > > >>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> > > >>>>> builtin.
> > > >>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions are
> > > >>>>>>>>>>> differently
> > > >>>>>>>>>>>>>>> treated
> > > >>>>>>>>>>>>>>>>>>>> than
> > > >>>>>>>>>>>>>>>>>>>>>>> other db objects.
> > > >>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from the
> > > >>> fact
> > > >>>>> that
> > > >>>>>>>>>>>>>>> functions
> > > >>>>>>>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink doesn't
> > > >>> have
> > > >>>>> any
> > > >>>>>>>>>>>> other
> > > >>>>>>>>>>>>>>>>>>>> built-in
> > > >>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>>>>>> Bowen
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>>> Xuefu Zhang
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> > > >>>>>>>>>>>>>>>>>>>>
> > >
> > >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
I also like the 'System' keyword. I think we can assume we reached
consensus on this topic.
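
For readers skimming the thread, the lookup order of option #2 can be sketched in a few lines. This is only an illustration under stated assumptions: the class and method names (FunctionCatalogSketch, resolve_ambiguous, resolve_qualified) are hypothetical and are not Flink's API, and "temporary system function" is used for what the thread earlier called a "global temp function".

```python
from typing import Dict, Optional, Tuple

# (catalog, database, function name); a hypothetical key type for this sketch.
QualifiedName = Tuple[str, str, str]


class FunctionCatalogSketch:
    """Hypothetical model of option #2's resolution order, not Flink's API."""

    def __init__(self, current_catalog: str, current_database: str) -> None:
        self.current_catalog = current_catalog
        self.current_database = current_database
        self.temp_system: Dict[str, str] = {}             # CREATE TEMPORARY SYSTEM FUNCTION
        self.builtin: Dict[str, str] = {}                 # built-in functions
        self.temp_catalog: Dict[QualifiedName, str] = {}  # CREATE TEMPORARY FUNCTION
        self.catalog: Dict[QualifiedName, str] = {}       # CREATE FUNCTION

    def resolve_ambiguous(self, name: str) -> Optional[str]:
        """1-part reference: temp system -> built-in -> temp in current cat/db -> catalog."""
        if name in self.temp_system:
            return self.temp_system[name]
        if name in self.builtin:
            return self.builtin[name]
        key = (self.current_catalog, self.current_database, name)
        if key in self.temp_catalog:
            return self.temp_catalog[key]
        return self.catalog.get(key)

    def resolve_qualified(self, cat: str, db: str, name: str) -> Optional[str]:
        """Qualified reference: temp function -> persistent function."""
        key = (cat, db, name)
        if key in self.temp_catalog:
            return self.temp_catalog[key]
        return self.catalog.get(key)


if __name__ == "__main__":
    fc = FunctionCatalogSketch("cat", "db")
    fc.builtin["abs"] = "built-in abs"
    fc.catalog[("cat", "db", "abs")] = "catalog abs"
    fc.temp_system["abs"] = "temporary system abs"
    print(fc.resolve_ambiguous("abs"))
    print(fc.resolve_qualified("cat", "db", "abs"))
```

In this sketch a qualified reference never sees built-in or temporary system functions, which matches the point made in the thread that the keyword is only needed in the CREATE statement and not at reference sites.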

On Sat, 21 Sep 2019, 06:37 Xuefu Z, <us...@gmail.com> wrote:

> +1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
>
> --Xuefu
>
> On Fri, Sep 20, 2019 at 3:28 PM Timo Walther <tw...@apache.org> wrote:
>
> > Hi everyone,
> >
> > > Sorry for the late reply. I also give +1 for option #2. Thus, I guess
> > we have a clear winner.
> >
> > I would also like to find a better keyword/syntax for this statement.
> > Esp. the BUILTIN keyword can confuse people, because it could be written
> > as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to
> > > introduce a new reserved keyword in the parser, which also affects
> > > non-DDL queries. How about:
> >
> > CREATE TEMPORARY SYSTEM FUNCTION xxx
> >
> > > The SYSTEM keyword is already a reserved keyword, and in FLIP-66 we are
> > > discussing prefixing some of the functions with SYSTEM_, like
> > > SYSTEM_WATERMARK. SQL also defines syntax like "FOR SYSTEM_TIME AS OF".
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> > On 20.09.19 05:45, Bowen Li wrote:
> > > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > > BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
> > > temporary built-in function in the same session? With the former one,
> > they
> > > can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the
> latter
> > > one, I'm not sure how users can "restore" the original builtin function
> > > easily from an "altered" function without introducing further
> nonstandard
> > > SQL syntax.
> > >
> > > Also please pardon me as I realized using net may not be a good idea...
> > I'm
> > > trying to fit this vote into cases listed in Flink Bylaw [1].
> > >
> > > > From the following result, the majority seems to favor #2 too as it has
> the
> > > most approval so far and doesn't have strong "-1".
> > >
> > > #1:3 (+1), 1 (0), 4(-1)
> > > #2:4(0), 3 (+1), 1(+0.5)
> > >         * Dawid -1/0 depending on keyword
> > > #3:2(+1), 3(-1), 3(0)
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > >
> > > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > > >> Thanks everyone for your votes. I summarized the result as follows:
> > >>
> > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> > >>          Dawid -1/0 depending on keyword
> > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > >>
> > >> Given the result, I'd like to change my vote for #2 from 0 to +1, to
> > make
> > >> it a stronger case with net +3.5. So the votes so far are:
> > >>
> > >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> > >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> > >>          Dawid -1/0 depending on keyword
> > >> #3:2(+1), 3(-1), 3(0)       - net: -1
> > >>
> > >> What do you think? Do you think we can conclude with this result? Or
> > would
> > >> you like to take it as a formal FLIP vote with 3 days voting period?
> > >>
> > >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> BUILTIN
> > >> FUNCTION xxx TEMPORARILY" because
> > >> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> > >> TEMPORARY FUNCTION"
> > >> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a
> built-in
> > >> function but it actually doesn't, the logic only creates a temp
> function
> > >> with higher priority than that built-in function in ambiguous
> resolution
> > >> order; and it would behave inconsistently with "ALTER FUNCTION".
> > >>
> > >>
> > >>
> > >> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com>
> > wrote:
> > >>
> > >>> I agree, it's very similar from the implementation point of view and
> > the
> > >>> implications.
> > >>>
> > >>> IMO, the difference is mostly on the mental model for the user.
> > >>> Instead of having a special class of temporary functions that have
> > >>> precedence over builtin functions it suggests to temporarily change
> > >>> built-in functions.
> > >>>
> > >>> Fabian
> > >>>
> > > >>> On Thu, Sep 19, 2019 at 11:52 AM Kurt Young <
> > > ykt836@gmail.com
> > > >>>> wrote:
> > >>>> Hi Fabian,
> > >>>>
> > > >>>> I think it's almost the same as #2 with a different keyword:
> > >>>>
> > >>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
> > >>>>
> > >>>> Best,
> > >>>> Kurt
> > >>>>
> > >>>>
> > >>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com>
> > >>> wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> I thought about it a bit more and think that there is some good
> value
> > >>> in
> > >>>> my
> > >>>>> last proposal.
> > >>>>>
> > >>>>> A lot of complexity comes from the fact that we want to allow
> > >>> overriding
> > > >>>>> built-in functions, which are addressed differently than other
> functions
> > >>>> (and
> > >>>>> db objects).
> > >>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
> > >>> thing
> > >>>> as
> > >>>>> "CREATE FUNCTION" and treat both functions exactly the same except
> > >>> that:
> > >>>>> 1) temp functions disappear at the end of the session
> > > >>>>> 2) temp functions are resolved before other functions
> > >>>>>
> > >>>>> This would be Dawid's proposal from the beginning of this thread
> (in
> > >>> case
> > >>>>> you still remember... ;-) )
> > >>>>>
> > >>>>> Temporarily overriding built-in functions would be supported with
> an
> > >>>>> explicit command like
> > >>>>>
> > >>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> > >>>>>
> > >>>>> This would also address the concerns about accidentally changing
> the
> > >>>>> semantics of built-in functions.
> > >>>>> IMO, it can't get much more explicit than the above command.
> > >>>>>
> > >>>>> Sorry for bringing up a new option in the middle of the discussion,
> > >>> but
> > >>>> as
> > >>>>> I said, I think it has a bunch of benefits and I don't see major
> > >>>> drawbacks
> > >>>>> (maybe you do?).
> > >>>>>
> > >>>>> What do you think?
> > >>>>>
> > >>>>> Fabian
> > >>>>>
> > > >>>>> On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <
> > > >>>>> fhueske@gmail.com
> > > >>>>>> wrote:
> > >>>>>> Hi everyone,
> > >>>>>>
> > >>>>>> I thought again about option #1 and something that I don't like is
> > >>> that
> > >>>>>> the resolved address of xyz is different in "CREATE FUNCTION xyz"
> > >>> and
> > >>>>>> "CREATE TEMPORARY FUNCTION xyz".
> > >>>>>> IMO, adding the keyword "TEMPORARY" should only change the
> > >>> lifecycle of
> > >>>>>> the function, but not where it is located. This implicitly changed
> > >>>>> location
> > >>>>>> might be confusing for users.
> > >>>>>> After all, a temp function should behave pretty much like any
> other
> > >>>>>> function, except for the fact that it disappears when the session
> is
> > >>>>> closed.
> > >>>>>> Approach #2 with the additional keyword would make that pretty
> > >>> clear,
> > >>>>> IMO.
> > > >>>>>> However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > > >>>>> BUILTIN
> > >>>>>> (we are not adding a built-in function).
> > >>>>>> So I'd be OK with #2 if we find a good keyword. In fact, approach
> #2
> > >>>>> could
> > >>>>>> also be an alias for approach #3 to avoid explicit specification
> of
> > >>> the
> > >>>>>> system catalog/db.
> > >>>>>>
> > >>>>>> Approach #3 would be consistent with other db objects and the
> > >>> "CREATE
> > >>>>>> FUNCTION" statement.
> > >>>>>> Adding system catalog/db seems rather complex, but then again how
> > >>> often
> > >>>>> do
> > >>>>>> we expect users to override built-in functions? If this becomes a
> > >>> major
> > >>>>>> issue, we can still add option #2 as an alias.
> > >>>>>>
> > >>>>>> Not sure what's the best approach from an internal point of view,
> > >>> but I
> > >>>>>> certainly think that consistent behavior is important.
> > >>>>>> Hence my votes are:
> > >>>>>>
> > >>>>>> -1 for #1
> > >>>>>> 0 for #2
> > >>>>>> 0 for #3
> > >>>>>>
> > >>>>>> Btw. Did we consider a completely separate command for overriding
> > >>>>> built-in
> > >>>>>> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> > >>>>>>
> > >>>>>> Cheers, Fabian
> > >>>>>>
> > >>>>>>
> > > >>>>>> On Thu, Sep 19, 2019 at 11:03 AM JingsongLee
> > > >>>>>> <lz...@aliyun.com.invalid> wrote:
> > >>>>>>
> > > >>>>>>> I know Hive and Spark can shadow built-in functions with temporary
> > > >>>>>>> functions.
> > > >>>>>>> MySQL, Oracle, and SQL Server cannot shadow them.
> > > >>>>>>> Users can use full names to access functions instead of shadowing.
> > >>>>>>>
> > >>>>>>> So I think it is a completely new thing, and the direct way to
> deal
> > >>>> with
> > >>>>>>> new things is to add new grammar. So,
> > >>>>>>> +1 for #2, +0 for #3, -1 for #1
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jingsong Lee
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> ------------------------------------------------------------------
> > >>>>>>> From:Kurt Young <yk...@gmail.com>
> > > >>>>>>> Send Time: Sep 19, 2019 (Thursday) 16:43
> > >>>>>>> To:dev <de...@flink.apache.org>
> > >>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > >>>>>>>
> > >>>>>>> And let me make my vote complete:
> > >>>>>>>
> > >>>>>>> -1 for #1
> > >>>>>>> +1 for #2 with different keyword
> > >>>>>>> -0 for #3
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Kurt
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
> > >>> wrote:
> > >>>>>>>> Looks like I'm the only person who is willing to +1 to #2 for
> now
> > >>>> :-)
> > >>>>>>>> But I would suggest to change the keyword from GLOBAL to
> > >>>>>>>> something like BUILTIN.
> > >>>>>>>>
> > > >>>>>>>> I think #2 and #3 are almost the same proposal, just with a
> > > >>>>>>>> different format to indicate whether the user wants to override
> > > >>>>>>>> built-in functions. My biggest reason to choose it is that I want
> > > >>>>>>>> this behavior to be consistent with temporary tables. I will give
> > > >>>>>>>> an example to show the behavior and also to make sure I'm not
> > > >>>>>>>> misunderstanding anything here.
> > >>>>>>>>
> > > >>>>>>>> For most DBs, when a user creates a temporary table with:
> > >>>>>>>>
> > >>>>>>>> CREATE TEMPORARY TABLE t1
> > >>>>>>>>
> > > >>>>>>>> It's actually equivalent to:
> > > >>>>>>>>
> > > >>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
> > >>>>>>>>
> > > >>>>>>>> If the user changes the current database, they will not be able to
> > > >>>>>>>> access t1 without a fully qualified name, i.e. db1.t1 (assuming
> > > >>>>>>>> db1 is the current database when this temporary table is
> > > >>>>>>>> created).
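
The temporary table behavior described above can be modeled with a small sketch. It is an assumption-laden toy, not any database's real implementation: SessionSketch and its methods are hypothetical names, and the only point is that an unqualified temporary object is bound to the database that is current at creation time.

```python
class SessionSketch:
    """Toy session: temporary tables are keyed by (database, table name)."""

    def __init__(self, current_db):
        self.current_db = current_db
        self.temp_tables = {}  # (db, table) -> definition

    def use(self, db):
        """Switch the current database, like USE db."""
        self.current_db = db

    def create_temporary_table(self, name, definition, db=None):
        # An unqualified name is bound to the database current at creation time.
        self.temp_tables[(db or self.current_db, name)] = definition

    def lookup(self, name, db=None):
        # An unqualified lookup only searches the current database.
        return self.temp_tables.get((db or self.current_db, name))


if __name__ == "__main__":
    session = SessionSketch("db1")
    session.create_temporary_table("t1", "temp table t1")
    session.use("db2")
    print(session.lookup("t1"))            # not visible unqualified
    print(session.lookup("t1", db="db1"))  # visible as db1.t1
```

Options #2 and #3 apply the same binding rule to CREATE TEMPORARY FUNCTION, which is the consistency argument made in this message.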
> > >>>>>>>>
> > > >>>>>>>> Only #2 and #3 follow this behavior, and I would vote for it since
> > > >>>>>>>> it makes the behavior consistent across temporary tables and
> > > >>>>>>>> functions.
> > > >>>>>>>> The reason I'm not voting for #3 is that a special catalog and
> > > >>>>>>>> database looks very hacky to me. It implies that our built-in
> > > >>>>>>>> functions are stored in a special catalog and database, which is
> > > >>>>>>>> not actually the case. Introducing a dedicated keyword like
> > > >>>>>>>> CREATE TEMPORARY BUILTIN FUNCTION looks clearer and more
> > > >>>>>>>> straightforward. One can argue that we should avoid introducing
> > > >>>>>>>> a new keyword, but it's also very rare that a system can
> > > >>>>>>>> overwrite built-in functions. Since we decided to support this,
> > > >>>>>>>> introducing a new keyword is not a big deal IMO.
> > >>>>>>>> Best,
> > >>>>>>>> Kurt
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
> > >>> piotr@ververica.com
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > > >>>>>>>>> It is quite a long discussion to follow and I hope I didn’t
> > >>>>>>> misunderstand
> > >>>>>>>>> anything. From the proposals presented by Xuefu I would vote:
> > >>>>>>>>>
> > >>>>>>>>> -1 for #1 and #2
> > >>>>>>>>> +1 for #3
> > >>>>>>>>>
> > > >>>>>>>>> Besides #3 being IMO more general and more consistent, having
> > > >>>>>>>>> qualified names (#3) would make it easier to write queries
> > > >>>>>>>>> across databases/catalogs (joining multiple data sets/streams).
> > > >>>>>>>>> For example, functions that manipulate, clean up, or convert the
> > > >>>>>>>>> data stored in different catalogs could be registered in the
> > > >>>>>>>>> respective catalogs.
> > >>>>>>>>>
> > >>>>>>>>> Piotrek
> > >>>>>>>>>
> > >>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I agree with Xuefu that inconsistent handling with all the
> > >>> other
> > >>>>>>>>> objects is
> > >>>>>>>>>> not a big problem.
> > >>>>>>>>>>
> > > >>>>>>>>>> Regarding option #3, the special "system.system" namespace
> > >>> may
> > >>>>>>> confuse
> > >>>>>>>>>> users.
> > >>>>>>>>>> Users need to know the set of built-in function names to know
> > >>>> when
> > >>>>> to
> > >>>>>>>>> use
> > >>>>>>>>>> "system.system" namespace.
> > >>>>>>>>>> What will happen if user registers a non-builtin function name
> > >>>>> under
> > >>>>>>> the
> > >>>>>>>>>> "system.system" namespace?
> > >>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I
> > >>>> mentioned
> > >>>>>>> at
> > >>>>>>>>> the
> > >>>>>>>>>> beginning of this thread.
> > >>>>>>>>>>
> > >>>>>>>>>> So here is my vote:
> > >>>>>>>>>>
> > >>>>>>>>>> +1 for #1
> > >>>>>>>>>> 0 for #2
> > >>>>>>>>>> -1 for #3
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jark
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
> > >>> wrote:
> > >>>>>>>>>>> @Dawid, Re: we also don't need additional referencing the
> > > >>>>>>>>> special catalog
> > >>>>>>>>>>> anywhere.
> > >>>>>>>>>>>
> > > >>>>>>>>>>> True. But once we allow such a reference, users can use it in
> > > >>>>>>>>>>> any place where a function name is expected, and we have to
> > > >>>>>>>>>>> handle all of those places.
> > >>>>>>>>>>> That's a big difference, I think.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>> Xuefu
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> @Bowen I am not suggesting introducing additional catalog. I
> > >>>>> think
> > >>>>>>> we
> > >>>>>>>>>>> need
> > >>>>>>>>>>>> to get rid of the current built-in catalog.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> @Xuefu in option #3 we also don't need additional
> > >>> referencing
> > >>>> the
> > >>>>>>>>> special
> > >>>>>>>>>>>> catalog anywhere else besides in the CREATE statement. The
> > >>>>>>> resolution
> > >>>>>>>>>>>> behaviour is exactly the same in both options.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
> > >>> wrote:
> > >>>>>>>>>>>>> Hi Dawid,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the
> > >>>> approach.
> > >>>>> It
> > >>>>>>>>> can
> > >>>>>>>>>>> be
> > >>>>>>>>>>>>> changed to something else for better.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The difference between this and the #3 approach is that we
> > >>>> only
> > >>>>>>> need
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>> keyword for this create DDL. For other places (such as
> > >>>> function
> > >>>>>>>>>>>>> referencing), no keyword or special namespace is needed.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Xuefu
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > >>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>> I think it makes sense to start voting at this point.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Option 1: Only 1-part identifiers
> > >>>>>>>>>>>>>> PROS:
> > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > >>>>>>>>>>>>>> CONS:
> > >>>>>>>>>>>>>> - inconsistent with all the other objects, both permanent &
> > >>>>>>> temporary
> > >>>>>>>>>>>>>> - does not allow shadowing catalog functions
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Option 2: Special keyword for built-in function
> > >>>>>>>>>>>>>> I think this is quite similar to the special catalog/db.
> > >>> The
> > >>>>>>> thing I
> > >>>>>>>>>>> am
> > >>>>>>>>>>>>>> strongly against in this proposal is the GLOBAL keyword.
> > >>> This
> > >>>>>>>>> keyword
> > >>>>>>>>>>>>> has a
> > >>>>>>>>>>>>>> meaning in rdbms systems and means a function that is
> > >>> present
> > >>>>>>> for a
> > >>>>>>>>>>>>>> lifetime of a session in which it was created, but
> > >>> available
> > >>>> in
> > >>>>>>> all
> > >>>>>>>>>>>> other
> > >>>>>>>>>>>>>> sessions. Therefore I really don't want to use this
> > >>> keyword
> > >>>> in
> > >>>>> a
> > >>>>>>>>>>>>> different
> > >>>>>>>>>>>>>> context.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Option 3: Special catalog/db
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> PROS:
> > >>>>>>>>>>>>>> - allows shadowing built-in functions
> > >>>>>>>>>>>>>> - allows shadowing catalog functions
> > >>>>>>>>>>>>>> - consistent with other objects
> > >>>>>>>>>>>>>> CONS:
> > >>>>>>>>>>>>>> - we introduce a special namespace for built-in functions
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I don't see a problem with introducing the special
> > >>> namespace.
> > >>>>> In
> > >>>>>>> the
> > >>>>>>>>>>>> end
> > >>>>>>>>>>>>> it
> > >>>>>>>>>>>>>> is very similar to the keyword approach. In this case the
> > >>>>>>> catalog/db
> > >>>>>>>>>>>>>> combination would be the "keyword"
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Therefore my votes:
> > >>>>>>>>>>>>>> Option 1: -0
> > >>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up with
> > >>> a
> > >>>>>>> better
> > >>>>>>>>>>>>> keyword)
> > >>>>>>>>>>>>>> Option 3: +1
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>> Dawid
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
> > >>>> wrote:
> > >>>>>>>>>>>>>>> Hi Aljoscha,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks for the summary and these are great questions to
> > >>> be
> > >>>>>>>>>>> answered.
> > >>>>>>>>>>>>> The
> > >>>>>>>>>>>>>>> answer to your first question is clear: there is a
> > >>> general
> > >>>>>>>>>>> agreement
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> override built-in functions with temp functions.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> However, your second and third questions are sort of
> > >>>> related,
> > >>>>>>> as a
> > >>>>>>>>>>>>>> function
> > >>>>>>>>>>>>>>> reference can be either just function name (like "func")
> > >>> or
> > >>>> in
> > >>>>>>> the
> > >>>>>>>>>>>> form
> > >>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> "cat.db.func". When a reference is just function name, it
> > >>>> can
> > >>>>>>> mean
> > >>>>>>>>>>>>>> either a
> > >>>>>>>>>>>>>>> built-in function or a function defined in the current
> > >>>> cat/db.
> > >>>>>>> If
> > >>>>>>>>>>> we
> > >>>>>>>>>>>>>>> support overriding a built-in function with a temp
> > >>> function,
> > >>>>>>> such
> > >>>>>>>>>>>>>>> overriding can also cover a function in the current
> > >>> cat/db.
> > >>>>>>>>>>>>>>> I think what Timo referred to as "overriding a catalog
> > >>>> function"
> > >>>>>>>>>>> means a
> > >>>>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>> function defined as "cat.db.func" overrides a catalog
> > >>>> function
> > >>>>>>>>>>> "func"
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> cat/db even if cat/db is not current. To support this,
> > >>> temp
> > >>>>>>>>>>> function
> > >>>>>>>>>>>>> has
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> be tied to a cat/db. That's why I said above that the 2nd
> > >>>> and
> > >>>>>>> 3rd
> > >>>>>>>>>>>>>> questions
> > >>>>>>>>>>>>>>> are related. The problem with such support is the
> > >>> ambiguity
> > >>>>> when
> > >>>>>>>>>>> user
> > >>>>>>>>>>>>>>> defines a function w/o namespace, "CREATE TEMPORARY
> > >>> FUNCTION
> > >>>>>>> func
> > >>>>>>>>>>>> ...".
> > >>>>>>>>>>>>>>> Here "func" can mean a global temp function, or a temp
> > >>>>>>> function in
> > >>>>>>>>>>>>>> current
> > >>>>>>>>>>>>>>> cat/db. If we can assume the former, this creates an
> > >>>>>>> inconsistency
> > >>>>>>>>>>>>>> because
> > >>>>>>>>>>>>>>> "CREATE FUNCTION func" actually means a function in
> > >>> current
> > >>>>>>> cat/db.
> > >>>>>>>>>>>> If
> > >>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>> assume the latter, then there is no way for user to
> > >>> create a
> > >>>>>>> global
> > >>>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>> function.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Giving a special namespace for built-in functions may
> > >>> solve
> > >>>>> the
> > >>>>>>>>>>>>> ambiguity
> > >>>>>>>>>>>>>>> problem above, but it also introduces artificial
> > >>>>>>> catalog/database
> > >>>>>>>>>>>> that
> > >>>>>>>>>>>>>>> needs special treatment and pollutes the cleanness of
> > >>> the
> > >>>>>>> code. I
> > >>>>>>>>>>>>> would
> > >>>>>>>>>>>>>>> rather introduce a syntax in DDL to solve the problem,
> > >>> like
> > >>>>>>> "CREATE
> > >>>>>>>>>>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate proposals for
> > >>>>> voting
> > >>>>>>>>>>>>>> purposes:
> > >>>>>>>>>>>>>>> 1. Support only global, temporary functions without
> > >>>> namespace.
> > >>>>>>> Such
> > >>>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>> functions override built-in functions and catalog
> > >>> functions
> > >>>>> in
> > >>>>>>>>>>>> current
> > >>>>>>>>>>>>>>> cat/db. The resolution order is: temp functions ->
> > >>> built-in
> > >>>>>>>>>>> functions
> > >>>>>>>>>>>>> ->
> > >>>>>>>>>>>>>>> catalog functions. (Partially or fully qualified
> > >>> functions
> > >>>> have
> > >>>>>>> no
> > >>>>>>>>>>>>>>> ambiguity!)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 2. In addition to #1, support creating and referencing
> > >>>>> temporary
> > >>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL
> > >>> for
> > >>>>>>> global
> > >>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>> functions. The resolution order is: global temp
> > >>> functions ->
> > >>>>>>>>>>> built-in
> > >>>>>>>>>>>>>>> functions -> temp functions in current cat/db -> catalog
> > >>>>>>> function.
> > >>>>>>>>>>>>>>> (Resolution for partially or fully qualified function
> > >>>>> reference
> > >>>>>>> is:
> > >>>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>> functions -> persistent functions.)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 3. In addition to #1, support creating and referencing
> > >>>>> temporary
> > >>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>> associated with a cat/db with a special namespace for
> > >>>> built-in
> > >>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>> and global temp functions. The resolution is the same as
> > >>> #2,
> > >>>>>>> except
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>> the special namespace might be prefixed to a reference
> > >>> to a
> > >>>>>>>>>>> built-in
> > >>>>>>>>>>>>>>> function or global temp function. (In absence of the
> > >>> special
> > >>>>>>>>>>>> namespace,
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> resolution order is the same as in #2.)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> My personal preference is #1, given the unknown use case
> > >>> and
> > >>>>>>>>>>>> introduced
> > >>>>>>>>>>>>>>> complexity for #2 and #3. However, #2 is an acceptable
> > >>>>>>> alternative.
> > >>>>>>>>>>>>> Thus,
> > >>>>>>>>>>>>>>> my votes are:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> +1 for #1
> > >>>>>>>>>>>>>>> +0 for #2
> > >>>>>>>>>>>>>>> -1 for #3
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Everyone, please cast your vote (in above format
> > >>> please!),
> > >>>> or
> > >>>>>>> let
> > >>>>>>>>>>> me
> > >>>>>>>>>>>>> know
> > >>>>>>>>>>>>>>> if you have more questions or other candidates.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>> Xuefu
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > >>>>>>>>>>>> aljoscha@apache.org>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64 are very
> > >>>>>>>>>>> connected.
> > >>>>>>>>>>>>> To
> > >>>>>>>>>>>>>>>> resolve the differences, I think we have to think about
> > >>> the
> > >>>>> basic
> > >>>>>>>>>>>>>>> principles
> > >>>>>>>>>>>>>>>> and find consensus there. The basic questions I see are:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - Do we want to support overriding builtin functions?
> > >>>>>>>>>>>>>>>> - Do we want to support overriding catalog functions?
> > >>>>>>>>>>>>>>>> - And then later: should temporary functions be tied to
> > >>> a
> > >>>>>>>>>>>>>>>> catalog/database?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I don’t have much to say about these, except that we
> > >>> should
> > >>>>>>>>>>>> somewhat
> > >>>>>>>>>>>>>>> stick
> > >>>>>>>>>>>>>>>> to what the industry does. But I also understand that
> > >>> the
> > >>>>>>>>>>> industry
> > >>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>> already very divided on this.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>> Aljoscha
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the remaining
> > >>>> topics.
> > >>>>>>> We
> > >>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>> close to the truth. It will waste a lot of time if we
> > >>>> resume
> > >>>>>>> the
> > >>>>>>>>>>>>> topic
> > >>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>> time later.
> > >>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> > >>>>>>>>>>>> “cat.db.fun”
> > >>>>>>>>>>>>>> way
> > >>>>>>>>>>>>>>>> to override a catalog function.
> > >>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> > >>>>>>>>>>>> nonexistent
> > >>>>>>>>>>>>>> cat
> > >>>>>>>>>>>>>>>> & db? And we still need to do special treatment for the
> > >>>>>>> dedicated
> > >>>>>>>>>>>>>>>> system.system cat & db?
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Jark
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org>
> > >>> wrote:
> > >>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things
> > >>>>>>>>>>>>> incrementally.
> > >>>>>>>>>>>>>>>> Users should be able to override all catalog objects
> > >>>>>>> consistently
> > >>>>>>>>>>>>>>> according
> > >>>>>>>>>>>>>>>> to FLIP-64 (Support for Temporary Objects in Table
> > >>> module).
> > >>>>> If
> > >>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>>> are treated completely differently, we need more code and
> > >>>>> special
> > >>>>>>>>>>>>> cases.
> > >>>>>>>>>>>>>>> From
> > >>>>>>>>>>>>>>>> an implementation perspective, this topic only affects
> > >>> the
> > >>>>>>> lookup
> > >>>>>>>>>>>>> logic
> > >>>>>>>>>>>>>>>> which is rather low implementation effort which is why I
> > >>>>> would
> > >>>>>>>>>>> like
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> clarify the remaining items. As you said, we have a
> > >>> slight
> > >>>>>>>>>>> consensus
> > >>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>> overriding built-in functions; we should also strive for
> > >>>>>>> reaching
> > >>>>>>>>>>>>>>> consensus
> > >>>>>>>>>>>>>>>> on the remaining topics.
> > >>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it ensures registering
> > >>>> catalog
> > >>>>>>>>>>>> objects
> > >>>>>>>>>>>>>>>> consistent and the overriding of built-in functions more
> > >>>>>>>>>>> explicit.
> > >>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > >>>>>>>>>>>>>>>>>>> hi, everyone
> > >>>>>>>>>>>>>>>>>>> I think this FLIP is very meaningful. It supports
> > >>>>> functions
> > >>>>>>>>>>>> that
> > >>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>> shared by different catalogs and dbs, reducing the
> > >>>>>>>>>>> duplication
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>> functions.
> > >>>>>>>>>>>>>>>>>>> Our group based on flink's sql parser module
> > >>> implements
> > >>>>>>>>>>> create
> > >>>>>>>>>>>>>>> function
> > >>>>>>>>>>>>>>>>>>> feature, stores the parsed function metadata and
> > >>> schema
> > >>>>> into
> > >>>>>>>>>>>>> mysql,
> > >>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>> also customizes the catalog, customizes sql-client to
> > >>>>>>> support
> > >>>>>>>>>>>>>> custom
> > >>>>>>>>>>>>>>>>>>> schemas and functions. Loaded, but the function is
> > >>>>> currently
> > >>>>>>>>>>>>>> global,
> > >>>>>>>>>>>>>>>> and is
> > >>>>>>>>>>>>>>>>>>> not subdivided according to catalog and db.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in the
> > >>>>>>>>>>> development
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>> flip, I have been paying attention to the community,
> > >>> but
> > >>>>>>>>>>> found
> > >>>>>>>>>>>> it
> > >>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>> more
> > >>>>>>>>>>>>>>>>>>> difficult to join.
> > >>>>>>>>>>>>>>>>>>> thank you.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> It seems to me that there is a general consensus on
> > >>>>> having
> > >>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in
> > >>>> functions.
> > >>>>>>>>>>> (As
> > >>>>>>>>>>>> a
> > >>>>>>>>>>>>>> side
> > >>>>>>>>>>>>>>>> note
> > >>>>>>>>>>>>>>>>>>>> for comparability, the current user defined
> > >>> functions
> > >>>> are
> > >>>>>>>>>>> all
> > >>>>>>>>>>>>>>>> temporary and
> > >>>>>>>>>>>>>>>>>>>> having no namespaces.)
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having
> > >>>>> namespaced
> > >>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a specific
> > >>>>> cat/db.
> > >>>>>>>>>>>>>> However,
> > >>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can be
> > >>> added
> > >>>>>>>>>>>>>>> incrementally.
> > >>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced temp
> > >>>>> functions
> > >>>>>>>>>>> now
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>> leave
> > >>>>>>>>>>>>>>>>>>>> the door open for namespaced ones for later
> > >>> releases as
> > >>>>> the
> > >>>>>>>>>>>>>>>> requirement
> > >>>>>>>>>>>>>>>>>>>> might become more crystal? This also helps shorten
> > >>> the
> > >>>>>>>>>>> debate
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>> allow us
> > >>>>>>>>>>>>>>>>>>>> to make some progress along this direction.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to
> > >>> host
> > >>>>> the
> > >>>>>>>>>>>>>>> temporary
> > >>>>>>>>>>>>>>>> temp
> > >>>>>>>>>>>>>>>>>>>> functions that don't have namespaces, my only
> > >>> concern
> > >>>> is
> > >>>>>>> the
> > >>>>>>>>>>>>>> special
> > >>>>>>>>>>>>>>>>>>>> treatment for a cat/db, which makes code less
> > >>> clean, as
> > >>>>>>>>>>>> evident
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>> treating
> > >>>>>>>>>>>>>>>>>>>> the built-in catalog currently.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>> Xuefu
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > >>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > >>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's
> > >>> suggestion.
> > >>>>> How
> > >>>>>>>>>>>> about
> > >>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>> have a
> > >>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for built-in
> > >>>>>>>>>>> objects?
> > >>>>>>>>>>>>> This
> > >>>>>>>>>>>>>>>> catalog
> > >>>>>>>>>>>>>>>>>>>>> would be invisible for users as Xuefu was
> > >>> suggesting.
> > >>>>>>>>>>>>>>>>>>>>> Then users could still override built-in
> > >>> functions, if
> > >>>>>>> they
> > >>>>>>>>>>>>> fully
> > >>>>>>>>>>>>>>>> qualify
> > >>>>>>>>>>>>>>>>>>>>> object with the built-in namespace, but by default
> > >>> the
> > >>>>>>>>>>> common
> > >>>>>>>>>>>>>> logic
> > >>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>> current db & cat would be used.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > >>>>>>>>>>>>>>>>>>>>> registers temporary function in current cat & db
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > >>>>>>>>>>>>>>>>>>>>> registers temporary function in cat db
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > >>>>>>>>>>>>>>>>>>>>> Overrides built-in function with temporary function
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be writable
> > >>>> for
> > >>>>>>>>>>>>> permanent
> > >>>>>>>>>>>>>>>>>>>> objects.
> > >>>>>>>>>>>>>>>>>>>>> WDYT?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> This way I think we can have benefits of both
> > >>>> solutions.
> > >>>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>>> Dawid
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > >>>>>>>>>>> twalthr@apache.org
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of overriding
> > >>>>> certain
> > >>>>>>>>>>>>>> built-in
> > >>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many
> > >>> people
> > >>>>>>>>>>> agree.
> > >>>>>>>>>>>>>>>> However, it
> > >>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding catalog
> > >>>>>>>>>>> functions
> > >>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a query
> > >>>> even
> > >>>>>>>>>>>> though
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available currently
> > >>> or
> > >>>>>>>>>>> should
> > >>>>>>>>>>>>> not
> > >>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and never
> > >>>>>>>>>>>> considers
> > >>>>>>>>>>>>>>>> current
> > >>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with other DDL
> > >>> but
> > >>>>>>>>>>>>> acceptable
> > >>>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>>>>>> functions I guess.
> > >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Regarding "Flink don't have any other built-in
> > >>>> objects
> > >>>>>>>>>>>>> (tables,
> > >>>>>>>>>>>>>>>> views)
> > >>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the near
> > >>>>> future.
> > >>>>>>>>>>>> Take
> > >>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900
> > >>> as
> > >>>>> an
> > >>>>>>>>>>>>>> example.
> > >>>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > >>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> > >>>> favorable
> > >>>>>>>>>>>> thus I
> > >>>>>>>>>>>>>>>> didn't
> > >>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the
> > >>> discussion
> > >>>> is
> > >>>>>>>>>>>> mainly
> > >>>>>>>>>>>>>>>> between
> > >>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> > >>>>> builtin.
> > >>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions are
> > >>>>>>>>>>> differently
> > >>>>>>>>>>>>>>> treated
> > >>>>>>>>>>>>>>>>>>>> than
> > >>>>>>>>>>>>>>>>>>>>>>> other db objects.
> > >>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from the
> > >>> fact
> > >>>>> that
> > >>>>>>>>>>>>>>> functions
> > >>>>>>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink don't
> > >>> have
> > >>>>> any
> > >>>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>>>>> built-in
> > >>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>>> Bowen
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>> Xuefu Zhang
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> > >>>>>>>>>>>>>>>>>>>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
+1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!

--Xuefu
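
For reference, the resolution order of option #2 (with the keyword renamed to SYSTEM) can be sketched as below. This is only a rough sketch under stated assumptions, not Flink's actual implementation: the class name FunctionCatalogSketch, its fields, and the resolve method are hypothetical names made up for illustration, and resolving a 2-part reference against the current catalog is an assumption as well.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (catalog, database, function) triple; a hypothetical type for this sketch.
QualifiedName = Tuple[str, str, str]


@dataclass
class FunctionCatalogSketch:
    """Hypothetical model of the option #2 lookup, not Flink's real class."""

    current_catalog: str
    current_database: str
    # Functions from "CREATE TEMPORARY SYSTEM FUNCTION f": no namespace.
    temp_system: Dict[str, str] = field(default_factory=dict)
    # Flink built-in functions: no namespace.
    builtin: Dict[str, str] = field(default_factory=dict)
    # Functions from "CREATE TEMPORARY FUNCTION cat.db.f": tied to a cat/db.
    temp_catalog: Dict[QualifiedName, str] = field(default_factory=dict)
    # Persistent catalog functions.
    persistent: Dict[QualifiedName, str] = field(default_factory=dict)

    def resolve(self, reference: str) -> Optional[str]:
        parts = reference.split(".")
        if len(parts) == 1:
            # Ambiguous reference: temp system -> built-in ->
            # temp in current cat/db -> catalog function.
            name = parts[0]
            key = (self.current_catalog, self.current_database, name)
            candidates = (
                self.temp_system.get(name),
                self.builtin.get(name),
                self.temp_catalog.get(key),
                self.persistent.get(key),
            )
        elif len(parts) == 2:
            # Assumption: "db.f" is resolved against the current catalog.
            key = (self.current_catalog, parts[0], parts[1])
            candidates = (self.temp_catalog.get(key), self.persistent.get(key))
        elif len(parts) == 3:
            # Qualified reference: temp function -> persistent function.
            key = (parts[0], parts[1], parts[2])
            candidates = (self.temp_catalog.get(key), self.persistent.get(key))
        else:
            raise ValueError(f"unsupported function reference: {reference}")
        return next((c for c in candidates if c is not None), None)
```

With this model, a temporary system function named abs shadows the built-in abs for 1-part references, while cat.db.abs still reaches the catalog function (or a temporary function registered under that cat/db).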

On Fri, Sep 20, 2019 at 3:28 PM Timo Walther <tw...@apache.org> wrote:

> Hi everyone,
>
> Sorry for the late reply. I also give +1 for option #2. Thus, I guess
> we have a clear winner.
>
> I would also like to find a better keyword/syntax for this statement.
> Esp. the BUILTIN keyword can confuse people, because it could be written
> as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to
> introduce a new reserved keyword in the parser which affects also
> non-DDL queries. How about:
>
> CREATE TEMPORARY SYSTEM FUNCTION xxx
>
> The SYSTEM keyword is already a reserved keyword and in FLIP-66 we are
> discussing whether to prefix some of the functions with SYSTEM_, like
> SYSTEM_WATERMARK. Also SQL defines syntax like "FOR SYSTEM_TIME AS OF".
>
> What do you think?
>
> Thanks,
> Timo
>
>
> On 20.09.19 05:45, Bowen Li wrote:
> > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
> > temporary built-in function in the same session? With the former one,
> they
> > can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the latter
> > one, I'm not sure how users can "restore" the original builtin function
> > easily from an "altered" function without introducing further nonstandard
> > SQL syntax.
> >
> > Also please pardon me as I realized using net may not be a good idea...
> I'm
> > trying to fit this vote into cases listed in Flink Bylaw [1].
> >
> > From the following result, the majority seems to be #2 too as it has the
> > most approval so far and doesn't have strong "-1".
> >
> > #1:3 (+1), 1 (0), 4(-1)
> > #2:4(0), 3 (+1), 1(+0.5)
> >         * Dawid -1/0 depending on keyword
> > #3:2(+1), 3(-1), 3(0)
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> >
> > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Thanks everyone for your votes. I summarized the result as follows:
> >>
> >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> >>          Dawid -1/0 depending on keyword
> >> #3:2(+1), 3(-1), 3(0)       - net: -1
> >>
> >> Given the result, I'd like to change my vote for #2 from 0 to +1, to
> make
> >> it a stronger case with net +3.5. So the votes so far are:
> >>
> >> #1:3 (+1), 1 (0), 4(-1)     - net: -1
> >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> >>          Dawid -1/0 depending on keyword
> >> #3:2(+1), 3(-1), 3(0)       - net: -1
> >>
> >> What do you think? Do you think we can conclude with this result? Or
> would
> >> you like to take it as a formal FLIP vote with 3 days voting period?
> >>
> >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
> >> FUNCTION xxx TEMPORARILY" because
> >> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> >> TEMPORARY FUNCTION"
> >> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a built-in
> >> function but it actually doesn't, the logic only creates a temp function
> >> with higher priority than that built-in function in ambiguous resolution
> >> order; and it would behave inconsistently with "ALTER FUNCTION".
> >>
> >>
> >>
> >> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com>
> wrote:
> >>
> >>> I agree, it's very similar from the implementation point of view and
> the
> >>> implications.
> >>>
> >>> IMO, the difference is mostly on the mental model for the user.
> >>> Instead of having a special class of temporary functions that have
> >>> precedence over builtin functions it suggests to temporarily change
> >>> built-in functions.
> >>>
> >>> Fabian
> >>>
> > >>> On Thu, 19 Sep 2019 at 11:52, Kurt Young <
> ykt836@gmail.com
> > >>>> wrote:
> >>>> Hi Fabian,
> >>>>
> > >>>> I think it's almost the same as #2 with a different keyword:
> >>>>
> >>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com>
> >>> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I thought about it a bit more and think that there is some good value
> >>> in
> >>>> my
> >>>>> last proposal.
> >>>>>
> >>>>> A lot of complexity comes from the fact that we want to allow
> >>> overriding
> > >>>>> built-in functions which are addressed differently from other functions
> >>>> (and
> >>>>> db objects).
> >>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
> >>> thing
> >>>> as
> >>>>> "CREATE FUNCTION" and treat both functions exactly the same except
> >>> that:
> >>>>> 1) temp functions disappear at the end of the session
> > >>>>> 2) temp functions are resolved before other functions
> >>>>>
> >>>>> This would be Dawid's proposal from the beginning of this thread (in
> >>> case
> >>>>> you still remember... ;-) )
> >>>>>
> >>>>> Temporarily overriding built-in functions would be supported with an
> >>>>> explicit command like
> >>>>>
> >>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> >>>>>
> >>>>> This would also address the concerns about accidentally changing the
> >>>>> semantics of built-in functions.
> >>>>> IMO, it can't get much more explicit than the above command.
> >>>>>
> >>>>> Sorry for bringing up a new option in the middle of the discussion,
> >>> but
> >>>> as
> >>>>> I said, I think it has a bunch of benefits and I don't see major
> >>>> drawbacks
> >>>>> (maybe you do?).
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Fabian
> >>>>>
> > >>>>> On Thu, 19 Sep 2019 at 11:24, Fabian Hueske <
> > >>>>> fhueske@gmail.com
> > >>>>>> wrote:
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> I thought again about option #1 and something that I don't like is
> >>> that
> >>>>>> the resolved address of xyz is different in "CREATE FUNCTION xyz"
> >>> and
> >>>>>> "CREATE TEMPORARY FUNCTION xyz".
> >>>>>> IMO, adding the keyword "TEMPORARY" should only change the
> >>> lifecycle of
> >>>>>> the function, but not where it is located. This implicitly changed
> >>>>> location
> >>>>>> might be confusing for users.
> >>>>>> After all, a temp function should behave pretty much like any other
> >>>>>> function, except for the fact that it disappears when the session is
> >>>>> closed.
> >>>>>> Approach #2 with the additional keyword would make that pretty
> >>> clear,
> >>>>> IMO.
> > >>>>>> However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > >>>>> BUILTIN
> >>>>>> (we are not adding a built-in function).
> >>>>>> So I'd be OK with #2 if we find a good keyword. In fact, approach #2
> >>>>> could
> >>>>>> also be an alias for approach #3 to avoid explicit specification of
> >>> the
> >>>>>> system catalog/db.
> >>>>>>
> >>>>>> Approach #3 would be consistent with other db objects and the
> >>> "CREATE
> >>>>>> FUNCTION" statement.
> >>>>>> Adding system catalog/db seems rather complex, but then again how
> >>> often
> >>>>> do
> >>>>>> we expect users to override built-in functions? If this becomes a
> >>> major
> >>>>>> issue, we can still add option #2 as an alias.
> >>>>>>
> >>>>>> Not sure what's the best approach from an internal point of view,
> >>> but I
> >>>>>> certainly think that consistent behavior is important.
> >>>>>> Hence my votes are:
> >>>>>>
> >>>>>> -1 for #1
> >>>>>> 0 for #2
> >>>>>> 0 for #3
> >>>>>>
> >>>>>> Btw. Did we consider a completely separate command for overriding
> >>>>> built-in
> >>>>>> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> >>>>>>
> >>>>>> Cheers, Fabian
> >>>>>>
> >>>>>>
> > >>>>>> On Thu, 19 Sep 2019 at 11:03, JingsongLee
> > >>>>>> <lz...@aliyun.com.invalid> wrote:
> >>>>>>
> >>>>>>> I know Hive and Spark can shadow built-in functions by temporary
> >>>>> function.
> > >>>>>>> MySQL, Oracle, and SQL Server cannot shadow them.
> > >>>>>>> Users can use full names to access functions instead of shadowing.
> >>>>>>>
> >>>>>>> So I think it is a completely new thing, and the direct way to deal
> >>>> with
> >>>>>>> new things is to add new grammar. So,
> >>>>>>> +1 for #2, +0 for #3, -1 for #1
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong Lee
> >>>>>>>
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------
> >>>>>>> From:Kurt Young <yk...@gmail.com>
> > >>>>>>> Send Time: Thursday, September 19, 2019, 16:43
> >>>>>>> To:dev <de...@flink.apache.org>
> >>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> >>>>>>>
> >>>>>>> And let me make my vote complete:
> >>>>>>>
> >>>>>>> -1 for #1
> >>>>>>> +1 for #2 with different keyword
> >>>>>>> -0 for #3
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Kurt
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
> >>> wrote:
> >>>>>>>> Looks like I'm the only person who is willing to +1 to #2 for now
> >>>> :-)
> >>>>>>>> But I would suggest to change the keyword from GLOBAL to
> >>>>>>>> something like BUILTIN.
> >>>>>>>>
> >>>>>>>> I think #2 and #3 are almost the same proposal, just with
> >>> different
> >>>>>>>> format to indicate whether it want to override built-in
> >>> functions.
> >>>>>>>> My biggest reason to choose it is I want this behavior be
> >>> consistent
> >>>>>>>> with temporary tables. I will give some examples to show the
> >>> behavior
> >>>>>>>> and also make sure I'm not misunderstanding anything here.
> >>>>>>>>
> >>>>>>>> For most DBs, when user create a temporary table with:
> >>>>>>>>
> >>>>>>>> CREATE TEMPORARY TABLE t1
> >>>>>>>>
> >>>>>>>> It's actually equivalent with:
> >>>>>>>>
> >>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
> >>>>>>>>
> >>>>>>>> If user change current database, they will not be able to access
> >>> t1
> >>>>>>> without
> >>>>>>>> fully qualified name, i.e. db1.t1 (assuming db1 is current
> >>> database
> >>>>> when
> >>>>>>>> this temporary table is created).
> >>>>>>>>
> >>>>>>>> Only #2 and #3 followed this behavior and I would vote for this
> >>>> since
> >>>>>>> this
> >>>>>>>> makes such behavior consistent across temporary tables and
> >>>> functions.
> >>>>>>>> Why I'm not voting for #3 is a special catalog and database just
> >>>> looks
> >>>>>>> very
> >>>>>>>> hacky to me. It implies that our built-in functions are saved
> >>> at a
> >>>>>>>> special
> >>>>>>>> catalog and database, which is actually not. Introducing a
> >>> dedicated
> >>>>>>>> keyword
> >>>>>>>> like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and
> >>>>>>>> straightforward. One can argue that we should avoid introducing
> >>> new
> >>>>>>>> keyword,
> >>>>>>>> but it's also very rare that a system can overwrite built-in
> >>>>> functions.
> >>>>>>>> Since we
> >>>>>>>> decided to support this, introduce a new keyword is not a big
> >>> deal
> >>>>> IMO.
> >>>>>>>> Best,
> >>>>>>>> Kurt
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
> >>> piotr@ververica.com
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> It is a quite long discussion to follow and I hope I didn’t
> >>>>>>> misunderstand
> >>>>>>>>> anything. From the proposals presented by Xuefu I would vote:
> >>>>>>>>>
> >>>>>>>>> -1 for #1 and #2
> >>>>>>>>> +1 for #3
> >>>>>>>>>
> >>>>>>>>> Besides #3 being IMO more general and more consistent, having
> >>>>> qualified
> >>>>>>>>> names (#3) would help/make easier for someone to use cross
> >>>>>>>>> databases/catalogs queries (joining multiple data sets/streams).
> >>>> For
> >>>>>>>>> example with some functions to manipulate/clean up/convert the
> >>>> stored
> >>>>>>> data
> >>>>>>>>> in different catalogs registered in the respective catalogs.
> >>>>>>>>>
> >>>>>>>>> Piotrek
> >>>>>>>>>
> >>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I agree with Xuefu that inconsistent handling with all the
> >>> other
> >>>>>>>>> objects is
> >>>>>>>>>> not a big problem.
> >>>>>>>>>>
> >>>>>>>>>> Regarding option #3, the special "system.system" namespace
> >>> may
> >>>>>>> confuse
> >>>>>>>>>> users.
> >>>>>>>>>> Users need to know the set of built-in function names to know
> >>>> when
> >>>>> to
> >>>>>>>>> use
> >>>>>>>>>> "system.system" namespace.
> >>>>>>>>>> What will happen if user registers a non-builtin function name
> >>>>> under
> >>>>>>> the
> >>>>>>>>>> "system.system" namespace?
> >>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I
> >>>> mentioned
> >>>>>>> at
> >>>>>>>>> the
> >>>>>>>>>> beginning of this thread.
> >>>>>>>>>>
> >>>>>>>>>> So here is my vote:
> >>>>>>>>>>
> >>>>>>>>>> +1 for #1
> >>>>>>>>>> 0 for #2
> >>>>>>>>>> -1 for #3
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jark
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
> >>> wrote:
> >>>>>>>>>>> @Dawid, Re: we also don't need additional referencing the
> >>>>>>>>> specialcatalog
> >>>>>>>>>>> anywhere.
> >>>>>>>>>>>
> >>>>>>>>>>> True. But once we allow such reference, then user can do so
> >>> in
> >>>> any
> >>>>>>>>> possible
> >>>>>>>>>>> place where a function name is expected, for which we have to
> >>>>>>> handle.
> >>>>>>>>>>> That's a big difference, I think.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Xuefu
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> @Bowen I am not suggesting introducing additional catalog. I
> >>>>> think
> >>>>>>> we
> >>>>>>>>>>> need
> >>>>>>>>>>>> to get rid of the current built-in catalog.
> >>>>>>>>>>>>
> >>>>>>>>>>>> @Xuefu in option #3 we also don't need additional
> >>> referencing
> >>>> the
> >>>>>>>>> special
> >>>>>>>>>>>> catalog anywhere else besides in the CREATE statement. The
> >>>>>>> resolution
> >>>>>>>>>>>> behaviour is exactly the same in both options.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
> >>> wrote:
> >>>>>>>>>>>>> Hi Dawid,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the
> >>>> approach.
> >>>>> It
> >>>>>>>>> can
> >>>>>>>>>>> be
> >>>>>>>>>>>>> changed to something else for better.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The difference between this and the #3 approach is that we
> >>>> only
> >>>>>>> need
> >>>>>>>>>>> the
> >>>>>>>>>>>>> keyword for this create DDL. For other places (such as
> >>>> function
> >>>>>>>>>>>>> referencing), no keyword or special namespace is needed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Xuefu
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> >>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>> I think it makes sense to start voting at this point.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Option 1: Only 1-part identifiers
> >>>>>>>>>>>>>> PROS:
> >>>>>>>>>>>>>> - allows shadowing built-in functions
> >>>>>>>>>>>>>> CONS:
> >>>>>>>>>>>>>> - inconsistent with all the other objects, both permanent &
> >>>>>>> temporary
> >>>>>>>>>>>>>> - does not allow shadowing catalog functions
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Option 2: Special keyword for built-in function
> >>>>>>>>>>>>>> I think this is quite similar to the special catalog/db.
> >>> The
> >>>>>>> thing I
> >>>>>>>>>>> am
> >>>>>>>>>>>>>> strongly against in this proposal is the GLOBAL keyword.
> >>> This
> >>>>>>>>> keyword
> >>>>>>>>>>>>> has a
> >>>>>>>>>>>>>> meaning in rdbms systems and means a function that is
> >>> present
> >>>>>>> for a
> >>>>>>>>>>>>>> lifetime of a session in which it was created, but
> >>> available
> >>>> in
> >>>>>>> all
> >>>>>>>>>>>> other
> >>>>>>>>>>>>>> sessions. Therefore I really don't want to use this
> >>> keyword
> >>>> in
> >>>>> a
> >>>>>>>>>>>>> different
> >>>>>>>>>>>>>> context.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Option 3: Special catalog/db
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> PROS:
> >>>>>>>>>>>>>> - allows shadowing built-in functions
> >>>>>>>>>>>>>> - allows shadowing catalog functions
> >>>>>>>>>>>>>> - consistent with other objects
> >>>>>>>>>>>>>> CONS:
> >>>>>>>>>>>>>> - we introduce a special namespace for built-in functions
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don't see a problem with introducing the special
> >>> namespace.
> >>>>> In
> >>>>>>> the
> >>>>>>>>>>>> end
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>> is very similar to the keyword approach. In this case the
> >>>>>>> catalog/db
> >>>>>>>>>>>>>> combination would be the "keyword"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Therefore my votes:
> >>>>>>>>>>>>>> Option 1: -0
> >>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up with
> >>> a
> >>>>>>> better
> >>>>>>>>>>>>> keyword)
> >>>>>>>>>>>>>> Option 3: +1
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Dawid
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
> >>>> wrote:
> >>>>>>>>>>>>>>> Hi Aljoscha,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the summary and these are great questions to
> >>> be
> >>>>>>>>>>> answered.
> >>>>>>>>>>>>> The
> >>>>>>>>>>>>>>> answer to your first question is clear: there is a
> >>> general
> >>>>>>>>>>> agreement
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> override built-in functions with temp functions.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> However, your second and third questions are sort of
> >>>> related,
> >>>>>>> as a
> >>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>> reference can be either just function name (like "func")
> >>> or
> >>>> in
> >>>>>>> the
> >>>>>>>>>>>> form
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> "cat.db.func". When a reference is just function name, it
> >>>> can
> >>>>>>> mean
> >>>>>>>>>>>>>> either a
> >>>>>>>>>>>>>>> built-in function or a function defined in the current
> >>>> cat/db.
> >>>>>>> If
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>> support overriding a built-in function with a temp
> >>> function,
> >>>>>>> such
> >>>>>>>>>>>>>>> overriding can also cover a function in the current
> >>> cat/db.
> >>>>>>>>>>>>>>> I think what Timo referred as "overriding a catalog
> >>>> function"
> >>>>>>>>>>> means a
> >>>>>>>>>>>>>> temp
> >>>>>>>>>>>>>>> function defined as "cat.db.func" overrides a catalog
> >>>> function
> >>>>>>>>>>> "func"
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> cat/db even if cat/db is not current. To support this,
> >>> temp
> >>>>>>>>>>> function
> >>>>>>>>>>>>> has
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> be tied to a cat/db. That's why I said above that the 2nd
> >>>> and
> >>>>>>> 3rd
> >>>>>>>>>>>>>> questions
> >>>>>>>>>>>>>>> are related. The problem with such support is the
> >>> ambiguity
> >>>>> when
> >>>>>>>>>>> user
> >>>>>>>>>>>>>>> defines a function w/o namespace, "CREATE TEMPORARY
> >>> FUNCTION
> >>>>>>> func
> >>>>>>>>>>>> ...".
> >>>>>>>>>>>>>>> Here "func" can mean a global temp function, or a temp
> >>>>>>> function in
> >>>>>>>>>>>>>> current
> >>>>>>>>>>>>>>> cat/db. If we can assume the former, this creates an
> >>>>>>> inconsistency
> >>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>> "CREATE FUNCTION func" actually means a function in
> >>> current
> >>>>>>> cat/db.
> >>>>>>>>>>>> If
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> assume the latter, then there is no way for user to
> >>> create a
> >>>>>>> global
> >>>>>>>>>>>>> temp
> >>>>>>>>>>>>>>> function.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Giving a special namespace for built-in functions may
> >>> solve
> >>>>> the
> >>>>>>>>>>>>> ambiguity
> >>>>>>>>>>>>>>> problem above, but it also introduces artificial
> >>>>>>> catalog/database
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>>> needs special treatment and pollutes the cleanness of
> >>> the
> >>>>>>> code. I
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>> rather introduce a syntax in DDL to solve the problem,
> >>> like
> >>>>>>> "CREATE
> >>>>>>>>>>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate proposals for
> >>>>> voting
> >>>>>>>>>>>>>> purposes:
> >>>>>>>>>>>>>>> 1. Support only global, temporary functions without
> >>>> namespace.
> >>>>>>> Such
> >>>>>>>>>>>>> temp
> >>>>>>>>>>>>>>> functions override built-in functions and catalog
> >>> functions
> >>>>> in
> >>>>>>>>>>>> current
> >>>>>>>>>>>>>>> cat/db. The resolution order is: temp functions ->
> >>> built-in
> >>>>>>>>>>> functions
> >>>>>>>>>>>>> ->
> >>>>>>>>>>>>>>> catalog functions. (Partially or fully qualified
> >>> functions
> >>>> has
> >>>>>>> no
> >>>>>>>>>>>>>>> ambiguity!)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2. In addition to #1, support creating and referencing
> >>>>> temporary
> >>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL
> >>> for
> >>>>>>> global
> >>>>>>>>>>>> temp
> >>>>>>>>>>>>>>> functions. The resolution order is: global temp
> >>> functions ->
> >>>>>>>>>>> built-in
> >>>>>>>>>>>>>>> functions -> temp functions in current cat/db -> catalog
> >>>>>>> function.
> >>>>>>>>>>>>>>> (Resolution for partially or fully qualified function
> >>>>> reference
> >>>>>>> is:
> >>>>>>>>>>>>> temp
> >>>>>>>>>>>>>>> functions -> persistent functions.)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. In addition to #1, support creating and referencing
> >>>>> temporary
> >>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>> associated with a cat/db with a special namespace for
> >>>> built-in
> >>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>> and global temp functions. The resolution is the same as
> >>> #2,
> >>>>>>> except
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>> the special namespace might be prefixed to a reference
> >>> to a
> >>>>>>>>>>> built-in
> >>>>>>>>>>>>>>> function or global temp function. (In absence of the
> >>> special
> >>>>>>>>>>>> namespace,
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> resolution order is the same as in #2.)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My personal preference is #1, given the unknown use case
> >>> and
> >>>>>>>>>>>> introduced
> >>>>>>>>>>>>>>> complexity for #2 and #3. However, #2 is an acceptable
> >>>>>>> alternative.
> >>>>>>>>>>>>> Thus,
> >>>>>>>>>>>>>>> my votes are:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +1 for #1
> >>>>>>>>>>>>>>> +0 for #2
> >>>>>>>>>>>>>>> -1 for #3
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Everyone, please cast your vote (in above format
> >>> please!),
> >>>> or
> >>>>>>> let
> >>>>>>>>>>> me
> >>>>>>>>>>>>> know
> >>>>>>>>>>>>>>> if you have more questions or other candidates.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Xuefu
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> >>>>>>>>>>>> aljoscha@apache.org>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64 are very
> >>>>>>>>>>> connected.
> >>>>>>>>>>>>> To
> >>>>>>>>>>>>>>>> resolve the differences, I think we have to think about
> >>> the
> >>>>> basic
> >>>>>>>>>>>>>>> principles
> >>>>>>>>>>>>>>>> and find consensus there. The basic questions I see are:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Do we want to support overriding builtin functions?
> >>>>>>>>>>>>>>>> - Do we want to support overriding catalog functions?
> >>>>>>>>>>>>>>>> - And then later: should temporary functions be tied to
> >>> a
> >>>>>>>>>>>>>>>> catalog/database?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don’t have much to say about these, except that we
> >>> should
> >>>>>>>>>>>> somewhat
> >>>>>>>>>>>>>>> stick
> >>>>>>>>>>>>>>>> to what the industry does. But I also understand that
> >>> the
> >>>>>>>>>>> industry
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> already very divided on this.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the remaining
> >>>> topics.
> >>>>>>> We
> >>>>>>>>>>>> are
> >>>>>>>>>>>>>>>> close to the truth. It will waste a lot of time if we
> >>>> resume
> >>>>>>> the
> >>>>>>>>>>>>> topic
> >>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>> time later.
> >>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> >>>>>>>>>>>> “cat.db.fun”
> >>>>>>>>>>>>>> way
> >>>>>>>>>>>>>>>> to override a catalog function.
> >>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> >>>>>>>>>>>> nonexistent
> >>>>>>>>>>>>>> cat
> >>>>>>>>>>>>>>>> & db? And we still need to do special treatment for the
> >>>>>>> dedicated
> >>>>>>>>>>>>>>>> system.system cat & db?
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org>
> >>> wrote:
> >>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things
> >>>>>>>>>>>>> incrementally.
> >>>>>>>>>>>>>>>> Users should be able to override all catalog objects
> >>>>>>> consistently
> >>>>>>>>>>>>>>> according
> >>>>>>>>>>>>>>>> to FLIP-64 (Support for Temporary Objects in Table
> >>> module).
> >>>>> If
> >>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>>> are treated completely different, we need more code and
> >>>>> special
> >>>>>>>>>>>>> cases.
> >>>>>>>>>>>>>>> From
> >>>>>>>>>>>>>>>> an implementation perspective, this topic only affects
> >>> the
> >>>>>>> lookup
> >>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>> which is rather low implementation effort which is why I
> >>>>> would
> >>>>>>>>>>> like
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> clarify the remaining items. As you said, we have a
> >>> slight
> >>>>>>>>>>> consensus
> >>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> overriding built-in functions; we should also strive for
> >>>>>>> reaching
> >>>>>>>>>>>>>>> consensus
> >>>>>>>>>>>>>>>> on the remaining topics.
> >>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it ensures registering
> >>>> catalog
> >>>>>>>>>>>> objects
> >>>>>>>>>>>>>>>> consistent and the overriding of built-in functions more
> >>>>>>>>>>> explicit.
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> >>>>>>>>>>>>>>>>>>> hi, everyone
> >>>>>>>>>>>>>>>>>>> I think this FLIP is very meaningful. It supports
> >>>>> functions
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>> shared by different catalogs and dbs, reducing the
> >>>>>>>>>>> duplication
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> functions.
> >>>>>>>>>>>>>>>>>>> Our group based on flink's sql parser module
> >>> implements
> >>>>>>>>>>> create
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>>>>>> feature, stores the parsed function metadata and
> >>> schema
> >>>>> into
> >>>>>>>>>>>>> mysql,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> also customizes the catalog, customizes sql-client to
> >>>>>>> support
> >>>>>>>>>>>>>> custom
> >>>>>>>>>>>>>>>>>>> schemas and functions. Loaded, but the function is
> >>>>> currently
> >>>>>>>>>>>>>> global,
> >>>>>>>>>>>>>>>> and is
> >>>>>>>>>>>>>>>>>>> not subdivided according to catalog and db.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in the
> >>>>>>>>>>> development
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> flip, I have been paying attention to the community,
> >>> but
> >>>>>>>>>>> found
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>> difficult to join.
> >>>>>>>>>>>>>>>>>>> thank you.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> It seems to me that there is a general consensus on
> >>>>> having
> >>>>>>>>>>>> temp
> >>>>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in
> >>>> functions.
> >>>>>>>>>>> (As
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>> side
> >>>>>>>>>>>>>>>> note
> >>>>>>>>>>>>>>>>>>>> for comparability, the current user defined
> >>> functions
> >>>> are
> >>>>>>>>>>> all
> >>>>>>>>>>>>>>>> temporary and
> >>>>>>>>>>>>>>>>>>>> having no namespaces.)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having
> >>>>> namespaced
> >>>>>>>>>>>> temp
> >>>>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a specific
> >>>>> cat/db.
> >>>>>>>>>>>>>> However,
> >>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can be
> >>> added
> >>>>>>>>>>>>>>> incrementally.
> >>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced temp
> >>>>> functions
> >>>>>>>>>>> now
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> leave
> >>>>>>>>>>>>>>>>>>>> the door open for namespaced ones for later
> >>> releases as
> >>>>> the
> >>>>>>>>>>>>>>>> requirement
> >>>>>>>>>>>>>>>>>>>> might become more crystal? This also helps shorten
> >>> the
> >>>>>>>>>>> debate
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> allow us
> >>>>>>>>>>>>>>>>>>>> to make some progress along this direction.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to
> >>> host
> >>>>> the
> >>>>>>>>>>>>>>> temporary
> >>>>>>>>>>>>>>>> temp
> >>>>>>>>>>>>>>>>>>>> functions that don't have namespaces, my only
> >>> concern
> >>>> is
> >>>>>>> the
> >>>>>>>>>>>>>> special
> >>>>>>>>>>>>>>>>>>>> treatment for a cat/db, which makes code less
> >>> clean, as
> >>>>>>>>>>>> evident
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>> treating
> >>>>>>>>>>>>>>>>>>>> the built-in catalog currently.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Xuefu
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's
> >>> suggestion.
> >>>>> How
> >>>>>>>>>>>> about
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for built-in
> >>>>>>>>>>> objects?
> >>>>>>>>>>>>> This
> >>>>>>>>>>>>>>>> catalog
> >>>>>>>>>>>>>>>>>>>>> would be invisible for users as Xuefu was
> >>> suggesting.
> >>>>>>>>>>>>>>>>>>>>> Then users could still override built-in
> >>> functions, if
> >>>>>>> they
> >>>>>>>>>>>>> fully
> >>>>>>>>>>>>>>>> qualify
> >>>>>>>>>>>>>>>>>>>>> object with the built-in namespace, but by default
> >>> the
> >>>>>>>>>>> common
> >>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> current dB & cat would be used.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> >>>>>>>>>>>>>>>>>>>>> registers temporary function in current cat & dB
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >>>>>>>>>>>>>>>>>>>>> registers temporary function in cat db
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >>>>>>>>>>>>>>>>>>>>> Overrides built-in function with temporary function
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be writable
> >>>> for
> >>>>>>>>>>>>> permanent
> >>>>>>>>>>>>>>>>>>>> objects.
> >>>>>>>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> This way I think we can have benefits of both
> >>>> solutions.
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Dawid
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> >>>>>>>>>>> twalthr@apache.org
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of overriding
> >>>>> certain
> >>>>>>>>>>>>>> built-in
> >>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many
> >>> people
> >>>>>>>>>>> agree.
> >>>>>>>>>>>>>>>> However, it
> >>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding catalog
> >>>>>>>>>>> functions
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a query
> >>>> even
> >>>>>>>>>>>> though
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available currently
> >>> or
> >>>>>>>>>>> should
> >>>>>>>>>>>>> not
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and never
> >>>>>>>>>>>> considers
> >>>>>>>>>>>>>>>> current
> >>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with other DDL
> >>> but
> >>>>>>>>>>>>> acceptable
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>> functions I guess.
> >>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regarding "Flink don't have any other built-in
> >>>> objects
> >>>>>>>>>>>>> (tables,
> >>>>>>>>>>>>>>>> views)
> >>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the near
> >>>>> future.
> >>>>>>>>>>>> Take
> >>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900
> >>> as
> >>>>> an
> >>>>>>>>>>>>>> example.
> >>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> >>>> favorable
> >>>>>>>>>>>> thus I
> >>>>>>>>>>>>>>>> didn't
> >>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the
> >>> discussion
> >>>> is
> >>>>>>>>>>>> mainly
> >>>>>>>>>>>>>>>> between
> >>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> >>>>> builtin.
> >>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions are
> >>>>>>>>>>> differently
> >>>>>>>>>>>>>>> treated
> >>>>>>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>>>>> other db objects.
> >>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from the
> >>> fact
> >>>>> that
> >>>>>>>>>>>>>>> functions
> >>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink don't
> >>> have
> >>>>> any
> >>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>>>>> built-in
> >>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> Xuefu Zhang
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>>>>>>>>>>>>
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

sorry for the late reply. I also give +1 for option #2. Thus, I guess
we have a clear winner.

I would also like to find a better keyword/syntax for this statement. 
Esp. the BUILTIN keyword can confuse people, because it could be written 
as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to 
introduce a new reserved keyword in the parser, which also affects
non-DDL queries. How about:

CREATE TEMPORARY SYSTEM FUNCTION xxx

The SYSTEM keyword is already a reserved keyword, and in FLIP-66 we are
discussing prefixing some of the functions with SYSTEM_, as in
SYSTEM_WATERMARK. SQL also defines syntax like "FOR SYSTEM_TIME AS OF".
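
To make the consequences of option #2 concrete, here is a rough sketch of
the lookup order it implies, following the order Xuefu listed earlier in
this thread (temp system functions, then built-in functions, then temp
functions in the current cat/db, then catalog functions; qualified
references check temp functions before persistent ones). It is a toy
model in Python, not Flink code: the class FunctionCatalogSketch, its
fields, and the string "definitions" are all hypothetical, and name
matching is assumed to be exact and case-sensitive.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (catalog, database, function); hypothetical key shape for this sketch.
QualifiedName = Tuple[str, str, str]


@dataclass
class FunctionCatalogSketch:
    """Toy model of option #2's resolution order (not a Flink class)."""

    current_catalog: str
    current_database: str
    # Functions created via the proposed CREATE TEMPORARY SYSTEM FUNCTION.
    temp_system: Dict[str, str] = field(default_factory=dict)
    # Flink's built-in functions.
    builtin: Dict[str, str] = field(default_factory=dict)
    # Functions created via CREATE TEMPORARY FUNCTION [cat.db.]name.
    temp_catalog: Dict[QualifiedName, str] = field(default_factory=dict)
    # Functions created via CREATE FUNCTION and stored in a catalog.
    persistent: Dict[QualifiedName, str] = field(default_factory=dict)

    def resolve(self, reference: str) -> Optional[str]:
        parts = reference.split(".")
        if len(parts) == 1:
            # Ambiguous reference: temp system -> built-in ->
            # temp in current cat/db -> catalog function.
            name = parts[0]
            key = (self.current_catalog, self.current_database, name)
            lookups = (
                (self.temp_system, name),
                (self.builtin, name),
                (self.temp_catalog, key),
                (self.persistent, key),
            )
            for table, k in lookups:
                if k in table:
                    return table[k]
            return None
        if len(parts) == 2:
            # Partially qualified: fill in the current catalog.
            key = (self.current_catalog, parts[0], parts[1])
        elif len(parts) == 3:
            key = (parts[0], parts[1], parts[2])
        else:
            raise ValueError(f"unsupported function reference: {reference}")
        # Qualified reference: temp functions -> persistent functions.
        for table in (self.temp_catalog, self.persistent):
            if key in table:
                return table[key]
        return None
```

In this model, a function registered through CREATE TEMPORARY SYSTEM
FUNCTION abs lands in temp_system and shadows the built-in abs for
unqualified references, while cat.db.abs still resolves against the
catalog and never touches the system functions.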

What do you think?

Thanks,
Timo


On 20.09.19 05:45, Bowen Li wrote:
> Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
> temporary built-in function in the same session? With the former one, they
> can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the latter
> one, I'm not sure how users can "restore" the original builtin function
> easily from an "altered" function without introducing further nonstandard
> SQL syntax.
>
> Also please pardon me as I realized using net may not be a good idea... I'm
> trying to fit this vote into cases listed in Flink Bylaw [1].
>
> From the following result, the majority seems to be #2 too as it has the
> most approval so far and doesn't have strong "-1".
>
> #1:3 (+1), 1 (0), 4(-1)
> #2:4(0), 3 (+1), 1(+0.5)
>         * Dawid -1/0 depending on keyword
> #3:2(+1), 3(-1), 3(0)
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
>
> On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks everyone for your votes. I summarized the result as follows:
>>
>> #1:3 (+1), 1 (0), 4(-1)     - net: -1
>> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
>>          Dawid -1/0 depending on keyword
>> #3:2(+1), 3(-1), 3(0)       - net: -1
>>
>> Given the result, I'd like to change my vote for #2 from 0 to +1, to make
>> it a stronger case with net +3.5. So the votes so far are:
>>
>> #1:3 (+1), 1 (0), 4(-1)     - net: -1
>> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
>>          Dawid -1/0 depending on keyword
>> #3:2(+1), 3(-1), 3(0)       - net: -1
>>
>> What do you think? Do you think we can conclude with this result? Or would
>> you like to take it as a formal FLIP vote with 3 days voting period?
>>
>> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
>> FUNCTION xxx TEMPORARILY" because
>> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
>> TEMPORARY FUNCTION"
>> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a built-in
>> function but it actually doesn't, the logic only creates a temp function
>> with higher priority than that built-in function in ambiguous resolution
>> order; and it would behave inconsistently with "ALTER FUNCTION".
>>
>>
>>
>> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com> wrote:
>>
>>> I agree, it's very similar from the implementation point of view and the
>>> implications.
>>>
>>> IMO, the difference is mostly on the mental model for the user.
>>> Instead of having a special class of temporary functions that have
>>> precedence over builtin functions it suggests to temporarily change
>>> built-in functions.
>>>
>>> Fabian
>>>
>>> On Thu, 19 Sep 2019 at 11:52, Kurt Young <ykt836@gmail.com>
>>> wrote:
>>>> Hi Fabian,
>>>>
>>>> I think it's almost the same with #2 with different keyword:
>>>>
>>>> CREATE TEMPORARY BUILTIN FUNCTION xxx
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com>
>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I thought about it a bit more and think that there is some good value
>>> in
>>>> my
>>>>> last proposal.
>>>>>
>>>>> A lot of complexity comes from the fact that we want to allow
>>> overriding
>>>>> built-in functions which are addressed differently than other functions
>>>> (and
>>>>> db objects).
>>>>> We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
>>> thing
>>>> as
>>>>> "CREATE FUNCTION" and treat both functions exactly the same except
>>> that:
>>>>> 1) temp functions disappear at the end of the session
>>>>> 2) temp functions are resolved before other functions
>>>>>
>>>>> This would be Dawid's proposal from the beginning of this thread (in
>>> case
>>>>> you still remember... ;-) )
>>>>>
>>>>> Temporarily overriding built-in functions would be supported with an
>>>>> explicit command like
>>>>>
>>>>> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
>>>>>
>>>>> This would also address the concerns about accidentally changing the
>>>>> semantics of built-in functions.
>>>>> IMO, it can't get much more explicit than the above command.
>>>>>
>>>>> Sorry for bringing up a new option in the middle of the discussion,
>>> but
>>>> as
>>>>> I said, I think it has a bunch of benefits and I don't see major
>>>> drawbacks
>>>>> (maybe you do?).
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Fabian
>>>>>
>>>>> On Thu, 19 Sep 2019 at 11:24, Fabian Hueske <fhueske@gmail.com>
>>>>> wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> I thought again about option #1 and something that I don't like is
>>> that
>>>>>> the resolved address of xyz is different in "CREATE FUNCTION xyz"
>>> and
>>>>>> "CREATE TEMPORARY FUNCTION xyz".
>>>>>> IMO, adding the keyword "TEMPORARY" should only change the
>>> lifecycle of
>>>>>> the function, but not where it is located. This implicitly changed
>>>>> location
>>>>>> might be confusing for users.
>>>>>> After all, a temp function should behave pretty much like any other
>>>>>> function, except for the fact that it disappears when the session is
>>>>> closed.
>>>>>> Approach #2 with the additional keyword would make that pretty
>>> clear,
>>>>> IMO.
>>>>>> However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
>>>>>> BUILTIN (we are not adding a built-in function).
>>>>>> So I'd be OK with #2 if we find a good keyword. In fact, approach #2
>>>>> could
>>>>>> also be an alias for approach #3 to avoid explicit specification of
>>> the
>>>>>> system catalog/db.
>>>>>>
>>>>>> Approach #3 would be consistent with other db objects and the
>>> "CREATE
>>>>>> FUNCTION" statement.
>>>>>> Adding system catalog/db seems rather complex, but then again how
>>> often
>>>>> do
>>>>>> we expect users to override built-in functions? If this becomes a
>>> major
>>>>>> issue, we can still add option #2 as an alias.
>>>>>>
>>>>>> Not sure what's the best approach from an internal point of view,
>>> but I
>>>>>> certainly think that consistent behavior is important.
>>>>>> Hence my votes are:
>>>>>>
>>>>>> -1 for #1
>>>>>> 0 for #2
>>>>>> 0 for #3
>>>>>>
>>>>>> Btw. Did we consider a completely separate command for overriding
>>>>> built-in
>>>>>> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
>>>>>>
>>>>>> Cheers, Fabian
>>>>>>
>>>>>>
>>>>>> On Thu, 19 Sep 2019 at 11:03, JingsongLee
>>>>>> <lz...@aliyun.com.invalid> wrote:
>>>>>>
>>>>>>> I know Hive and Spark can shadow built-in functions with temporary
>>>>>>> functions.
>>>>>>> MySQL, Oracle, and SQL Server cannot shadow them.
>>>>>>> Users can use full names to access functions instead of shadowing.
>>>>>>>
>>>>>>> So I think it is a completely new thing, and the direct way to deal with
>>>>>>> new things is to add new grammar. So,
>>>>>>> +1 for #2, +0 for #3, -1 for #1
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------
>>>>>>> From:Kurt Young <yk...@gmail.com>
>>>>>>> Send Time: 2019-09-19 (Thursday) 16:43
>>>>>>> To:dev <de...@flink.apache.org>
>>>>>>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
>>>>>>>
>>>>>>> And let me make my vote complete:
>>>>>>>
>>>>>>> -1 for #1
>>>>>>> +1 for #2 with different keyword
>>>>>>> -0 for #3
>>>>>>>
>>>>>>> Best,
>>>>>>> Kurt
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
>>> wrote:
>>>>>>>> Looks like I'm the only person who is willing to +1 #2 for now :-)
>>>>>>>> But I would suggest changing the keyword from GLOBAL to
>>>>>>>> something like BUILTIN.
>>>>>>>>
>>>>>>>> I think #2 and #3 are almost the same proposal, just with a different
>>>>>>>> format to indicate whether the user wants to override built-in functions.
>>>>>>>> My biggest reason to choose it is that I want this behavior to be
>>>>>>>> consistent with temporary tables. I will give some examples to show the
>>>>>>>> behavior and also make sure I'm not misunderstanding anything here.
>>>>>>>>
>>>>>>>> For most DBs, when a user creates a temporary table with:
>>>>>>>>
>>>>>>>> CREATE TEMPORARY TABLE t1
>>>>>>>>
>>>>>>>> it's actually equivalent to:
>>>>>>>>
>>>>>>>> CREATE TEMPORARY TABLE `current_db`.t1
>>>>>>>>
>>>>>>>> If the user changes the current database, they will not be able to
>>>>>>>> access t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was
>>>>>>>> the current database when this temporary table was created).
>>>>>>>>
>>>>>>>> Only #2 and #3 follow this behavior, and I would vote for it since
>>>>>>>> it makes such behavior consistent across temporary tables and functions.
>>>>>>>>
>>>>>>>> The reason I'm not voting for #3 is that a special catalog and database
>>>>>>>> just looks very hacky to me. It implies that our built-in functions are
>>>>>>>> stored in a special catalog and database, which is actually not the
>>>>>>>> case. Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN
>>>>>>>> FUNCTION looks clearer and more straightforward. One can argue that we
>>>>>>>> should avoid introducing a new keyword, but it's also very rare that a
>>>>>>>> system can overwrite built-in functions. Since we decided to support
>>>>>>>> this, introducing a new keyword is not a big deal IMO.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
>>> piotr@ververica.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It is quite a long discussion to follow and I hope I didn’t
>>>>>>>>> misunderstand anything. From the proposals presented by Xuefu I would
>>>>>>>>> vote:
>>>>>>>>>
>>>>>>>>> -1 for #1 and #2
>>>>>>>>> +1 for #3
>>>>>>>>>
>>>>>>>>> Besides #3 being IMO more general and more consistent, having qualified
>>>>>>>>> names (#3) would make it easier to write cross-database/cross-catalog
>>>>>>>>> queries (joining multiple data sets/streams), for example with
>>>>>>>>> functions that manipulate/clean up/convert the data stored in
>>>>>>>>> different catalogs being registered in the respective catalogs.
>>>>>>>>>
>>>>>>>>> Piotrek
>>>>>>>>>
>>>>>>>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I agree with Xuefu that inconsistent handling compared with all the
>>>>>>>>>> other objects is not a big problem.
>>>>>>>>>>
>>>>>>>>>> Regarding option #3, the special "system.system" namespace may confuse
>>>>>>>>>> users.
>>>>>>>>>> Users need to know the set of built-in function names to know when to
>>>>>>>>>> use the "system.system" namespace.
>>>>>>>>>> What will happen if a user registers a non-built-in function name under
>>>>>>>>>> the "system.system" namespace?
>>>>>>>>>> Besides, I think it doesn't solve the "explode" problem I mentioned at
>>>>>>>>>> the beginning of this thread.
>>>>>>>>>>
>>>>>>>>>> So here is my vote:
>>>>>>>>>>
>>>>>>>>>> +1 for #1
>>>>>>>>>> 0 for #2
>>>>>>>>>> -1 for #3
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
>>> wrote:
>>>>>>>>>>> @Dawid, Re: we also don't need additional referencing the special
>>>>>>>>>>> catalog anywhere.
>>>>>>>>>>>
>>>>>>>>>>> True. But once we allow such a reference, users can use it in any
>>>>>>>>>>> place where a function name is expected, and we have to handle that.
>>>>>>>>>>> That's a big difference, I think.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Xuefu
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>>>>>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> @Bowen I am not suggesting introducing an additional catalog. I think
>>>>>>>>>>>> we need to get rid of the current built-in catalog.
>>>>>>>>>>>>
>>>>>>>>>>>> @Xuefu in option #3 we also don't need additional referencing the
>>>>>>>>>>>> special catalog anywhere else besides in the CREATE statement. The
>>>>>>>>>>>> resolution behaviour is exactly the same in both options.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
>>> wrote:
>>>>>>>>>>>>> Hi Dawid,
>>>>>>>>>>>>>
>>>>>>>>>>>>> "GLOBAL" is a temporary keyword that was given to the approach. It
>>>>>>>>>>>>> can be changed to something better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The difference between this and the #3 approach is that we only need
>>>>>>>>>>>>> the keyword for this create DDL. For other places (such as function
>>>>>>>>>>>>> referencing), no keyword or special namespace is needed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Xuefu
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I think it makes sense to start voting at this point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Option 1: Only 1-part identifiers
>>>>>>>>>>>>>> PROS:
>>>>>>>>>>>>>> - allows shadowing built-in functions
>>>>>>>>>>>>>> CONS:
>>>>>>>>>>>>>> - inconsistent with all the other objects, both permanent & temporary
>>>>>>>>>>>>>> - does not allow shadowing catalog functions
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Option 2: Special keyword for built-in function
>>>>>>>>>>>>>> I think this is quite similar to the special catalog/db. The thing I
>>>>>>>>>>>>>> am strongly against in this proposal is the GLOBAL keyword. This
>>>>>>>>>>>>>> keyword has a meaning in RDBMS systems: it denotes a function that
>>>>>>>>>>>>>> exists for the lifetime of the session in which it was created, but
>>>>>>>>>>>>>> is available in all other sessions. Therefore I really don't want to
>>>>>>>>>>>>>> use this keyword in a different context.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Option 3: Special catalog/db
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PROS:
>>>>>>>>>>>>>> - allows shadowing built-in functions
>>>>>>>>>>>>>> - allows shadowing catalog functions
>>>>>>>>>>>>>> - consistent with other objects
>>>>>>>>>>>>>> CONS:
>>>>>>>>>>>>>> - we introduce a special namespace for built-in functions
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't see a problem with introducing the special namespace. In the
>>>>>>>>>>>>>> end it is very similar to the keyword approach. In this case the
>>>>>>>>>>>>>> catalog/db combination would be the "keyword".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore my votes:
>>>>>>>>>>>>>> Option 1: -0
>>>>>>>>>>>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>>>>>>>>>>>>>> keyword)
>>>>>>>>>>>>>> Option 3: +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
>>>> wrote:
>>>>>>>>>>>>>>> Hi Aljoscha,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the summary; these are great questions to be answered.
>>>>>>>>>>>>>>> The answer to your first question is clear: there is a general
>>>>>>>>>>>>>>> agreement to override built-in functions with temp functions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, your second and third questions are sort of related, as a
>>>>>>>>>>>>>>> function reference can be either just a function name (like "func")
>>>>>>>>>>>>>>> or in the form of "cat.db.func". When a reference is just a function
>>>>>>>>>>>>>>> name, it can mean either a built-in function or a function defined
>>>>>>>>>>>>>>> in the current cat/db. If we support overriding a built-in function
>>>>>>>>>>>>>>> with a temp function, such overriding can also cover a function in
>>>>>>>>>>>>>>> the current cat/db.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think what Timo referred to as "overriding a catalog function"
>>>>>>>>>>>>>>> means a temp function defined as "cat.db.func" overrides a catalog
>>>>>>>>>>>>>>> function "func" in cat/db even if cat/db is not current. To support
>>>>>>>>>>>>>>> this, a temp function has to be tied to a cat/db. That's why I said
>>>>>>>>>>>>>>> above that the 2nd and 3rd questions are related. The problem with
>>>>>>>>>>>>>>> such support is the ambiguity when a user defines a function w/o
>>>>>>>>>>>>>>> namespace, "CREATE TEMPORARY FUNCTION func ...". Here "func" can mean
>>>>>>>>>>>>>>> a global temp function, or a temp function in the current cat/db. If
>>>>>>>>>>>>>>> we assume the former, this creates an inconsistency because "CREATE
>>>>>>>>>>>>>>> FUNCTION func" actually means a function in the current cat/db. If we
>>>>>>>>>>>>>>> assume the latter, then there is no way for a user to create a global
>>>>>>>>>>>>>>> temp function.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Giving a special namespace to built-in functions may solve the
>>>>>>>>>>>>>>> ambiguity problem above, but it also introduces an artificial
>>>>>>>>>>>>>>> catalog/database that needs special treatment and makes the code
>>>>>>>>>>>>>>> less clean. I would rather introduce a syntax in DDL to solve the
>>>>>>>>>>>>>>> problem, like "CREATE [GLOBAL] TEMPORARY FUNCTION func".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>>>>>>>>>>>>>>> purposes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Support only global, temporary functions without namespace. Such temp
>>>>>>>>>>>>>>> functions override built-in functions and catalog functions in the current
>>>>>>>>>>>>>>> cat/db. The resolution order is: temp functions -> built-in functions ->
>>>>>>>>>>>>>>> catalog functions. (Partially or fully qualified functions have no
>>>>>>>>>>>>>>> ambiguity!)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. In addition to #1, support creating and referencing temporary functions
>>>>>>>>>>>>>>> associated with a cat/db, with a "GLOBAL" qualifier in DDL for global temp
>>>>>>>>>>>>>>> functions. The resolution order is: global temp functions -> built-in
>>>>>>>>>>>>>>> functions -> temp functions in current cat/db -> catalog function.
>>>>>>>>>>>>>>> (Resolution for partially or fully qualified function reference is: temp
>>>>>>>>>>>>>>> functions -> persistent functions.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. In addition to #1, support creating and referencing temporary functions
>>>>>>>>>>>>>>> associated with a cat/db, with a special namespace for built-in functions
>>>>>>>>>>>>>>> and global temp functions. The resolution is the same as #2, except that
>>>>>>>>>>>>>>> the special namespace might be prefixed to a reference to a built-in
>>>>>>>>>>>>>>> function or global temp function. (In absence of the special namespace,
>>>>>>>>>>>>>>> the resolution order is the same as in #2.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My personal preference is #1, given the unknown use case and the complexity
>>>>>>>>>>>>>>> introduced by #2 and #3. However, #2 is an acceptable alternative. Thus,
>>>>>>>>>>>>>>> my votes are:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 for #1
>>>>>>>>>>>>>>> +0 for #2
>>>>>>>>>>>>>>> -1 for #3
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Everyone, please cast your vote (in the above format please!), or let me
>>>>>>>>>>>>>>> know if you have more questions or other candidates.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Xuefu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>>>>>>>>>>>> aljoscha@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think this discussion and the one for FLIP-64 are very connected.
>>>>>>>>>>>>>>>> To resolve the differences, I think we have to think about the basic
>>>>>>>>>>>>>>>> principles and find consensus there. The basic questions I see are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Do we want to support overriding builtin functions?
>>>>>>>>>>>>>>>> - Do we want to support overriding catalog functions?
>>>>>>>>>>>>>>>> - And then later: should temporary functions be tied to a
>>>>>>>>>>>>>>>> catalog/database?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don’t have much to say about these, except that we should somewhat
>>>>>>>>>>>>>>>> stick to what the industry does. But I also understand that the
>>>>>>>>>>>>>>>> industry is already very divided on this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
>>>>> wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to strive for reaching consensus on the remaining topics. We are
>>>>>>>>>>>>>>>>> close to the truth. It will waste a lot of time if we resume the
>>>>>>>>>>>>>>>>> topic some time later.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
>>>>>>>>>>>>>>>>> way to override a catalog function.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I’m not sure about “system.system.fun”: does it introduce a
>>>>>>>>>>>>>>>>> nonexistent cat & db? And do we still need special treatment for
>>>>>>>>>>>>>>>>> the dedicated system.system cat & db?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things incrementally. Users
>>>>>>>>>>>>>>>>>> should be able to override all catalog objects consistently according to
>>>>>>>>>>>>>>>>>> FLIP-64 (Support for Temporary Objects in Table module). If functions are
>>>>>>>>>>>>>>>>>> treated completely differently, we need more code and special cases. From
>>>>>>>>>>>>>>>>>> an implementation perspective, this topic only affects the lookup logic,
>>>>>>>>>>>>>>>>>> which is rather low implementation effort, which is why I would like to
>>>>>>>>>>>>>>>>>> clarify the remaining items. As you said, we have a slight consensus on
>>>>>>>>>>>>>>>>>> overriding built-in functions; we should also strive for reaching
>>>>>>>>>>>>>>>>>> consensus on the remaining topics.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Dawid: I like your idea as it makes registering catalog objects
>>>>>>>>>>>>>>>>>> consistent and the overriding of built-in functions more explicit.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think this FLIP is very meaningful. It supports functions that can be
>>>>>>>>>>>>>>>>>>> shared by different catalogs and dbs, reducing the duplication of
>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Our group has implemented a create function feature based on Flink's SQL
>>>>>>>>>>>>>>>>>>> parser module. It stores the parsed function metadata and schema in MySQL;
>>>>>>>>>>>>>>>>>>> we also customized the catalog and sql-client to support custom schemas
>>>>>>>>>>>>>>>>>>> and functions. The functions can be loaded, but they are currently global
>>>>>>>>>>>>>>>>>>> and not subdivided by catalog and db.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition, I very much hope to participate in the development of this
>>>>>>>>>>>>>>>>>>> FLIP. I have been paying attention to the community, but found it rather
>>>>>>>>>>>>>>>>>>> difficult to join.
>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It seems to me that there is a general consensus on having temp functions
>>>>>>>>>>>>>>>>>>>> that have no namespaces and overwrite built-in functions. (As a side note
>>>>>>>>>>>>>>>>>>>> for comparability, the current user defined functions are all temporary
>>>>>>>>>>>>>>>>>>>> and have no namespaces.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced temp functions
>>>>>>>>>>>>>>>>>>>> that can overwrite functions defined in a specific cat/db. However, this
>>>>>>>>>>>>>>>>>>>> idea appears orthogonal to the former and can be added incrementally.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How about we first implement non-namespaced temp functions now and leave
>>>>>>>>>>>>>>>>>>>> the door open for namespaced ones in later releases, when the requirement
>>>>>>>>>>>>>>>>>>>> might become clearer? This also helps shorten the debate and allows us to
>>>>>>>>>>>>>>>>>>>> make some progress in this direction.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the temp functions
>>>>>>>>>>>>>>>>>>>> that don't have namespaces, my only concern is the special treatment for a
>>>>>>>>>>>>>>>>>>>> cat/db, which makes the code less clean, as is evident in how the built-in
>>>>>>>>>>>>>>>>>>>> catalog is treated currently.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Xuefu
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>>>>>>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>> Another idea to consider on top of Timo's suggestion: how about we have a
>>>>>>>>>>>>>>>>>>>>> special namespace (catalog + database) for built-in objects? This catalog
>>>>>>>>>>>>>>>>>>>>> would be invisible to users, as Xuefu was suggesting.
>>>>>>>>>>>>>>>>>>>>> Then users could still override built-in functions if they fully qualify
>>>>>>>>>>>>>>>>>>>>> the object with the built-in namespace, but by default the common logic of
>>>>>>>>>>>>>>>>>>>>> the current db & cat would be used.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>>>>>>>>>>>>>>>>>>>>> registers a temporary function in the current cat & db
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>>>>>>>>>>>>>>>>>>> registers a temporary function in the given cat & db
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>>>>>>>>>>>>>>>>>>> overrides a built-in function with a temporary function
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The built-in/system namespace would not be writable for permanent objects.
>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This way I think we can have the benefits of both solutions.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>>>>>>>>>>> twalthr@apache.org
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I understand the potential benefit of overriding certain built-in
>>>>>>>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many people agree. However, it
>>>>>>>>>>>>>>>>>>>>>> would be great to still support overriding catalog functions with
>>>>>>>>>>>>>>>>>>>>>> temporary functions in order to prototype a query even though a
>>>>>>>>>>>>>>>>>>>>>> catalog/database might not be available currently or should not be
>>>>>>>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and never considers the current
>>>>>>>>>>>>>>>>>>>>>> catalog and database; inconsistent with other DDL but acceptable for
>>>>>>>>>>>>>>>>>>>>>> functions I guess.
>>>>>>>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>>>>>>>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Regarding "Flink doesn't have any other built-in objects (tables, views)
>>>>>>>>>>>>>>>>>>>>>> except functions", this might change in the near future. Take
>>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable, thus I didn't
>>>>>>>>>>>>>>>>>>>>>>> include that as a voting option, and the discussion is mainly between
>>>>>>>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Re > However, it means that temp functions are differently treated than
>>>>>>>>>>>>>>>>>>>>>>> other db objects.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> IMO, the treatment difference results from the fact that functions are a
>>>>>>>>>>>>>>>>>>>>>>> bit different from other objects - Flink doesn't have any other built-in
>>>>>>>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>
>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>
>>>>>>>>>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
BUILTIN FUNCTION xxx TEMPORARILY" is: what if users want to drop the
temporary built-in function in the same session? With the former, they
can run something like "DROP TEMPORARY BUILTIN FUNCTION"; with the latter,
I'm not sure how users can easily "restore" the original built-in function
from an "altered" function without introducing further nonstandard
SQL syntax.
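
To make the create/drop semantics concrete, here is a minimal Python sketch.
It assumes the option #2 resolution order discussed in this thread (temporary
built-in -> built-in -> temp function in current cat/db -> catalog function).
FunctionRegistry and all of its method names are hypothetical and only for
illustration; they are not Flink's FunctionCatalog API.

```python
from typing import Callable, Dict, Tuple

Func = Callable[..., object]


class FunctionRegistry:
    """Toy model of option #2; every name here is hypothetical, not a Flink API."""

    def __init__(self, builtins: Dict[str, Func], current: Tuple[str, str]):
        self.builtins = dict(builtins)                        # built-in functions
        self.temp_builtins: Dict[str, Func] = {}              # "CREATE TEMPORARY BUILTIN FUNCTION"
        self.temp: Dict[Tuple[str, str, str], Func] = {}      # temp functions tied to a cat/db
        self.catalog: Dict[Tuple[str, str, str], Func] = {}   # persistent catalog functions
        self.current = current                                # (catalog, database)

    def create_temporary_builtin(self, name: str, fn: Func) -> None:
        # Shadows the built-in of the same name without modifying it.
        self.temp_builtins[name] = fn

    def drop_temporary_builtin(self, name: str) -> None:
        # Only the shadowing entry is removed, so the original built-in
        # becomes visible again; nothing has to be "restored".
        del self.temp_builtins[name]

    def resolve(self, name: str):
        """Resolve an unqualified (1-part) function reference."""
        cat, db = self.current
        lookups = (
            (self.temp_builtins, name),
            (self.builtins, name),
            (self.temp, (cat, db, name)),
            (self.catalog, (cat, db, name)),
        )
        for table, key in lookups:
            if key in table:
                return table[key]
        return None


registry = FunctionRegistry({"abs": abs}, current=("cat", "db"))
registry.create_temporary_builtin("abs", lambda x: "shadowed")
print(registry.resolve("abs")(-3))  # shadowed
registry.drop_temporary_builtin("abs")
print(registry.resolve("abs")(-3))  # 3
```

In this model, dropping the temporary built-in function is just the removal
of a shadowing entry, which is the property that an "ALTER ... TEMPORARILY"
style command would need extra syntax to express.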

Also please pardon me as I realized using net may not be a good idea... I'm
trying to fit this vote into cases listed in Flink Bylaw [1].

From the following result, the majority seems to favor #2 too, as it has the
most approval so far and doesn't have a strong "-1".

#1: 3 (+1), 1 (0), 4 (-1)
#2: 4 (0), 3 (+1), 1 (+0.5)
       * Dawid -1/0 depending on keyword
#3: 2 (+1), 3 (-1), 3 (0)

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
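
For anyone who wants to re-check the numbers, the tally above can be
recomputed with a short Python sketch. The vote counts are copied from the
list above; Dawid's conditional vote on #2 is left out because it depends on
the keyword. The helper names net and count_of are made up for this example,
and the net score is only an informal summary, not something defined by the
bylaws.

```python
# (vote value, number of voters) per option, copied from the tally above.
# Dawid's conditional -1/0 on #2 is intentionally not included.
tallies = {
    "#1": [(+1, 3), (0, 1), (-1, 4)],
    "#2": [(0, 4), (+1, 3), (+0.5, 1)],
    "#3": [(+1, 2), (-1, 3), (0, 3)],
}


def net(votes):
    """Sum of vote value times number of voters."""
    return sum(value * count for value, count in votes)


def count_of(votes, value):
    """Number of voters who cast exactly the given vote value."""
    return sum(count for v, count in votes if v == value)


for option, votes in tallies.items():
    print(option, "net:", net(votes),
          "+1:", count_of(votes, 1), "-1:", count_of(votes, -1))
# #1 net: -1 +1: 3 -1: 4
# #2 net: 3.5 +1: 3 -1: 0
# #3 net: -1 +1: 2 -1: 3
```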

On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bo...@gmail.com> wrote:

> Hi,
>
> Thanks everyone for your votes. I summarized the result as follows:
>
> #1: 3 (+1), 1 (0), 4 (-1)     - net: -1
> #2: 4 (0), 2 (+1), 1 (+0.5)   - net: +2.5
>         Dawid -1/0 depending on keyword
> #3: 2 (+1), 3 (-1), 3 (0)     - net: -1
>
> Given the result, I'd like to change my vote for #2 from 0 to +1, to make
> it a stronger case with net +3.5. So the votes so far are:
>
> #1: 3 (+1), 1 (0), 4 (-1)     - net: -1
> #2: 4 (0), 3 (+1), 1 (+0.5)   - net: +3.5
>         Dawid -1/0 depending on keyword
> #3: 2 (+1), 3 (-1), 3 (0)     - net: -1
>
> What do you think? Do you think we can conclude with this result? Or would
> you like to take it as a formal FLIP vote with a 3-day voting period?
>
> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
> FUNCTION xxx TEMPORARILY" because
> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> TEMPORARY FUNCTION"
> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a built-in
> function but it actually doesn't; the logic only creates a temp function
> with higher priority than that built-in function in ambiguous resolution
> order, and it would behave inconsistently with "ALTER FUNCTION".
>
>
>
> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com> wrote:
>
>> I agree, it's very similar from the implementation point of view and the
>> implications.
>>
>> IMO, the difference is mostly on the mental model for the user.
>> Instead of having a special class of temporary functions that have
>> precedence over builtin functions it suggests to temporarily change
>> built-in functions.
>>
>> Fabian
>>
>> On Thu, 19 Sep 2019 at 11:52, Kurt Young <ykt836@gmail.com> wrote:
>>
>> > Hi Fabian,
>> >
>> > I think it's almost the same as #2, just with a different keyword:
>> >
>> > CREATE TEMPORARY BUILTIN FUNCTION xxx
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com>
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I thought about it a bit more and think that there is some good value
>> > > in my last proposal.
>> > >
>> > > A lot of complexity comes from the fact that we want to allow overriding
>> > > built-in functions, which are addressed differently from other functions
>> > > (and db objects).
>> > > We could just have "CREATE TEMPORARY FUNCTION" do exactly the same thing
>> > > as "CREATE FUNCTION" and treat both functions exactly the same except
>> > > that:
>> > > 1) temp functions disappear at the end of the session
>> > > 2) temp functions are resolved before other functions
>> > >
>> > > This would be Dawid's proposal from the beginning of this thread (in
>> case
>> > > you still remember... ;-) )
>> > >
>> > > Temporarily overriding built-in functions would be supported with an
>> > > explicit command like
>> > >
>> > > ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
>> > >
>> > > This would also address the concerns about accidentally changing the
>> > > semantics of built-in functions.
>> > > IMO, it can't get much more explicit than the above command.
>> > >
>> > > Sorry for bringing up a new option in the middle of the discussion,
>> but
>> > as
>> > > I said, I think it has a bunch of benefits and I don't see major
>> > drawbacks
>> > > (maybe you do?).
>> > >
>> > > What do you think?
>> > >
>> > > Fabian
>> > >
>> > > On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <fhueske@gmail.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > I thought again about option #1 and something that I don't like is
>> that
>> > > > the resolved address of xyz is different in "CREATE FUNCTION xyz"
>> and
>> > > > "CREATE TEMPORARY FUNCTION xyz".
>> > > > IMO, adding the keyword "TEMPORARY" should only change the
>> lifecycle of
>> > > > the function, but not where it is located. This implicitly changed
>> > > location
>> > > > might be confusing for users.
>> > > > After all, a temp function should behave pretty much like any other
>> > > > function, except for the fact that it disappears when the session is
>> > > closed.
>> > > >
>> > > > Approach #2 with the additional keyword would make that pretty
>> clear,
>> > > IMO.
>> > > > However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
>> > > > BUILTIN (we are not adding a built-in function).
>> > > > So I'd be OK with #2 if we find a good keyword. In fact, approach #2
>> > > could
>> > > > also be an alias for approach #3 to avoid explicit specification of
>> the
>> > > > system catalog/db.
>> > > >
>> > > > Approach #3 would be consistent with other db objects and the
>> "CREATE
>> > > > FUNCTION" statement.
>> > > > Adding system catalog/db seems rather complex, but then again how
>> often
>> > > do
>> > > > we expect users to override built-in functions? If this becomes a
>> major
>> > > > issue, we can still add option #2 as an alias.
>> > > >
>> > > > Not sure what's the best approach from an internal point of view,
>> but I
>> > > > certainly think that consistent behavior is important.
>> > > > Hence my votes are:
>> > > >
>> > > > -1 for #1
>> > > > 0 for #2
>> > > > 0 for #3
>> > > >
>> > > > Btw. Did we consider a completely separate command for overriding
>> > > built-in
>> > > > functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
>> > > >
>> > > > Cheers, Fabian
>> > > >
>> > > >
>> > > > On Thu, Sep 19, 2019 at 11:03 AM JingsongLee <lz...@aliyun.com.invalid> wrote:
>> > > >
>> > > >> I know Hive and Spark can shadow built-in functions with temporary
>> > > >> functions.
>> > > >> MySQL, Oracle, and SQL Server cannot shadow them.
>> > > >> Users can use fully qualified names to access functions instead of
>> > > >> shadowing.
>> > > >>
>> > > >> So I think it is a completely new thing, and the direct way to deal
>> > with
>> > > >> new things is to add new grammar. So,
>> > > >> +1 for #2, +0 for #3, -1 for #1
>> > > >>
>> > > >> Best,
>> > > >> Jingsong Lee
>> > > >>
>> > > >>
>> > > >> ------------------------------------------------------------------
>> > > >> From:Kurt Young <yk...@gmail.com>
>> > > >> Send Time: Thursday, Sep 19, 2019, 16:43
>> > > >> To:dev <de...@flink.apache.org>
>> > > >> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
>> > > >>
>> > > >> And let me make my vote complete:
>> > > >>
>> > > >> -1 for #1
>> > > >> +1 for #2 with different keyword
>> > > >> -0 for #3
>> > > >>
>> > > >> Best,
>> > > >> Kurt
>> > > >>
>> > > >>
>> > > >> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
>> wrote:
>> > > >>
>> > > >> > Looks like I'm the only person who is willing to +1 #2 for now :-)
>> > > >> > But I would suggest changing the keyword from GLOBAL to
>> > > >> > something like BUILTIN.
>> > > >> >
>> > > >> > I think #2 and #3 are almost the same proposal, just with a different
>> > > >> > format to indicate whether the user wants to override built-in
>> > > >> > functions.
>> > > >> >
>> > > >> > My biggest reason to choose it is that I want this behavior to be
>> > > >> > consistent with temporary tables. I will give some examples to show
>> > > >> > the behavior and also make sure I'm not misunderstanding anything here.
>> > > >> >
>> > > >> > For most DBs, when a user creates a temporary table with:
>> > > >> >
>> > > >> > CREATE TEMPORARY TABLE t1
>> > > >> >
>> > > >> > it's actually equivalent to:
>> > > >> >
>> > > >> > CREATE TEMPORARY TABLE `current_db`.t1
>> > > >> >
>> > > >> > If the user changes the current database, they will not be able to
>> > > >> > access t1 without a fully qualified name, i.e. db1.t1 (assuming db1
>> > > >> > was the current database when this temporary table was created).
>> > > >> >
>> > > >> > Only #2 and #3 follow this behavior, and I would vote for it since
>> > > >> > it keeps such behavior consistent across temporary tables and
>> > > >> > functions.
>> > > >> >
>> > > >> > The reason I'm not voting for #3 is that a special catalog and
>> > > >> > database just looks very hacky to me. It implies that our built-in
>> > > >> > functions are stored in a special catalog and database, which is
>> > > >> > actually not the case. Introducing a dedicated keyword like
>> > > >> > CREATE TEMPORARY BUILTIN FUNCTION looks clearer and more
>> > > >> > straightforward. One can argue that we should avoid introducing a
>> > > >> > new keyword, but it's also very rare that a system can overwrite
>> > > >> > built-in functions. Since we decided to support this, introducing a
>> > > >> > new keyword is not a big deal IMO.
>> > > >> >
>> > > >> > Best,
>> > > >> > Kurt
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
>> piotr@ververica.com
>> > >
>> > > >> > wrote:
>> > > >> >
>> > > >> >> Hi,
>> > > >> >>
>> > > >> >> It is a quite long discussion to follow and I hope I didn’t
>> > > >> misunderstand
>> > > >> >> anything. From the proposals presented by Xuefu I would vote:
>> > > >> >>
>> > > >> >> -1 for #1 and #2
>> > > >> >> +1 for #3
>> > > >> >>
>> > > >> >> Besides #3 being IMO more general and more consistent, having
>> > > >> >> qualified names (#3) would make it easier for someone to use
>> > > >> >> cross-database/catalog queries (joining multiple data
>> > > >> >> sets/streams), for example with functions that manipulate, clean
>> > > >> >> up, or convert the data stored in different catalogs being
>> > > >> >> registered in the respective catalogs.
>> > > >> >>
>> > > >> >> Piotrek
>> > > >> >>
>> > > >> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>> > > >> >> >
>> > > >> >> > I agree with Xuefu that inconsistent handling with all the
>> other
>> > > >> >> objects is
>> > > >> >> > not a big problem.
>> > > >> >> >
>> > > >> >> > Regarding option #3, the special "system.system" namespace may
>> > > >> >> > confuse users.
>> > > >> >> > Users need to know the set of built-in function names to know
>> > when
>> > > to
>> > > >> >> use
>> > > >> >> > "system.system" namespace.
>> > > >> >> > What will happen if user registers a non-builtin function name
>> > > under
>> > > >> the
>> > > >> >> > "system.system" namespace?
>> > > >> >> > Besides, I think it doesn't solve the "explode" problem I
>> > mentioned
>> > > >> at
>> > > >> >> the
>> > > >> >> > beginning of this thread.
>> > > >> >> >
>> > > >> >> > So here is my vote:
>> > > >> >> >
>> > > >> >> > +1 for #1
>> > > >> >> > 0 for #2
>> > > >> >> > -1 for #3
>> > > >> >> >
>> > > >> >> > Best,
>> > > >> >> > Jark
>> > > >> >> >
>> > > >> >> >
>> > > >> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
>> wrote:
>> > > >> >> >
>> > > >> >> >> @Dawid, Re: we also don't need additional referencing the special
>> > > >> >> >> catalog anywhere.
>> > > >> >> >>
>> > > >> >> >> True. But once we allow such a reference, users can use it in any
>> > > >> >> >> place where a function name is expected, and we have to handle
>> > > >> >> >> all of those places. That's a big difference, I think.
>> > > >> >> >>
>> > > >> >> >> Thanks,
>> > > >> >> >> Xuefu
>> > > >> >> >>
>> > > >> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>> > > >> >> >> wysakowicz.dawid@gmail.com>
>> > > >> >> >> wrote:
>> > > >> >> >>
>> > > >> >> >>> @Bowen I am not suggesting introducing additional catalog. I
>> > > think
>> > > >> we
>> > > >> >> >> need
>> > > >> >> >>> to get rid of the current built-in catalog.
>> > > >> >> >>>
>> > > >> >> >>> @Xuefu in option #3 we also don't need additional
>> referencing
>> > the
>> > > >> >> special
>> > > >> >> >>> catalog anywhere else besides in the CREATE statement. The
>> > > >> resolution
>> > > >> >> >>> behaviour is exactly the same in both options.
>> > > >> >> >>>
>> > > >> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
>> wrote:
>> > > >> >> >>>
>> > > >> >> >>>> Hi Dawid,
>> > > >> >> >>>>
>> > > >> >> >>>> "GLOBAL" is a temporary keyword that was given to the
>> > approach.
>> > > It
>> > > >> >> can
>> > > >> >> >> be
>> > > >> >> >>>> changed to something else for better.
>> > > >> >> >>>>
>> > > >> >> >>>> The difference between this and the #3 approach is that we
>> > only
>> > > >> need
>> > > >> >> >> the
>> > > >> >> >>>> keyword for this create DDL. For other places (such as
>> > function
>> > > >> >> >>>> referencing), no keyword or special namespace is needed.
>> > > >> >> >>>>
>> > > >> >> >>>> Thanks,
>> > > >> >> >>>> Xuefu
>> > > >> >> >>>>
>> > > >> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>> > > >> >> >>>> wysakowicz.dawid@gmail.com>
>> > > >> >> >>>> wrote:
>> > > >> >> >>>>
>> > > >> >> >>>>> Hi,
>> > > >> >> >>>>> I think it makes sense to start voting at this point.
>> > > >> >> >>>>>
>> > > >> >> >>>>> Option 1: Only 1-part identifiers
>> > > >> >> >>>>> PROS:
>> > > >> >> >>>>> - allows shadowing built-in functions
>> > > >> >> >>>>> CONS:
>> > > >> >> >>>>> - inconsistent with all the other objects, both permanent &
>> > > >> >> >>>>> temporary
>> > > >> >> >>>>> - does not allow shadowing catalog functions
>> > > >> >> >>>>>
>> > > >> >> >>>>> Option 2: Special keyword for built-in function
>> > > >> >> >>>>> I think this is quite similar to the special catalog/db.
>> The
>> > > >> thing I
>> > > >> >> >> am
>> > > >> >> >>>>> strongly against in this proposal is the GLOBAL keyword.
>> This
>> > > >> >> keyword
>> > > >> >> >>>> has a
>> > > >> >> >>>>> meaning in rdbms systems and means a function that is
>> present
>> > > >> for a
>> > > >> >> >>>>> lifetime of a session in which it was created, but
>> available
>> > in
>> > > >> all
>> > > >> >> >>> other
>> > > >> >> >>>>> sessions. Therefore I really don't want to use this
>> keyword
>> > in
>> > > a
>> > > >> >> >>>> different
>> > > >> >> >>>>> context.
>> > > >> >> >>>>>
>> > > >> >> >>>>> Option 3: Special catalog/db
>> > > >> >> >>>>>
>> > > >> >> >>>>> PROS:
>> > > >> >> >>>>> - allows shadowing built-in functions
>> > > >> >> >>>>> - allows shadowing catalog functions
>> > > >> >> >>>>> - consistent with other objects
>> > > >> >> >>>>> CONS:
>> > > >> >> >>>>> - we introduce a special namespace for built-in functions
>> > > >> >> >>>>>
>> > > >> >> >>>>> I don't see a problem with introducing the special
>> namespace.
>> > > In
>> > > >> the
>> > > >> >> >>> end
>> > > >> >> >>>> it
>> > > >> >> >>>>> is very similar to the keyword approach. In this case the
>> > > >> catalog/db
>> > > >> >> >>>>> combination would be the "keyword"
>> > > >> >> >>>>>
>> > > >> >> >>>>> Therefore my votes:
>> > > >> >> >>>>> Option 1: -0
>> > > >> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with
>> a
>> > > >> better
>> > > >> >> >>>> keyword)
>> > > >> >> >>>>> Option 3: +1
>> > > >> >> >>>>>
>> > > >> >> >>>>> Best,
>> > > >> >> >>>>> Dawid
>> > > >> >> >>>>>
>> > > >> >> >>>>>
>> > > >> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
>> > wrote:
>> > > >> >> >>>>>
>> > > >> >> >>>>>> Hi Aljoscha,
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> Thanks for the summary and these are great questions to
>> be
>> > > >> >> >> answered.
>> > > >> >> >>>> The
>> > > >> >> >>>>>> answer to your first question is clear: there is a
>> general
>> > > >> >> >> agreement
>> > > >> >> >>> to
>> > > >> >> >>>>>> override built-in functions with temp functions.
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> However, your second and third questions are sort of related, as
>> > > >> >> >>>>>> a function reference can be either just a function name (like
>> > > >> >> >>>>>> "func") or in the form of "cat.db.func". When a reference is just
>> > > >> >> >>>>>> a function name, it can mean either a built-in function or a
>> > > >> >> >>>>>> function defined in the current cat/db. If we support overriding
>> > > >> >> >>>>>> a built-in function with a temp function, such overriding can
>> > > >> >> >>>>>> also cover a function in the current cat/db.
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> I think what Timo referred to as "overriding a catalog function"
>> > > >> >> >>>>>> means that a temp function defined as "cat.db.func" overrides a
>> > > >> >> >>>>>> catalog function "func" in cat/db even if cat/db is not current.
>> > > >> >> >>>>>> To support this, a temp function has to be tied to a cat/db.
>> > > >> >> >>>>>> That's why I said above that the 2nd and 3rd questions are
>> > > >> >> >>>>>> related. The problem with such support is the ambiguity when a
>> > > >> >> >>>>>> user defines a function w/o namespace, "CREATE TEMPORARY FUNCTION
>> > > >> >> >>>>>> func ...". Here "func" can mean a global temp function, or a temp
>> > > >> >> >>>>>> function in the current cat/db. If we assume the former, this
>> > > >> >> >>>>>> creates an inconsistency because "CREATE FUNCTION func" actually
>> > > >> >> >>>>>> means a function in the current cat/db. If we assume the latter,
>> > > >> >> >>>>>> then there is no way for the user to create a global temp
>> > > >> >> >>>>>> function.
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> Giving a special namespace for built-in functions may
>> solve
>> > > the
>> > > >> >> >>>> ambiguity
>> > > >> >> >>>>>> problem above, but it also introduces artificial
>> > > >> catalog/database
>> > > >> >> >>> that
>> > > >> >> >>>>>> needs special treatment and pollutes the cleanness of
>> the
>> > > >> code. I
>> > > >> >> >>>> would
>> > > >> >> >>>>>> rather introduce a syntax in DDL to solve the problem,
>> like
>> > > >> "CREATE
>> > > >> >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for
>> > > voting
>> > > >> >> >>>>> purposes:
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> 1. Support only global, temporary functions without namespace.
>> > > >> >> >>>>>> Such temp functions override built-in functions and catalog
>> > > >> >> >>>>>> functions in the current cat/db. The resolution order is: temp
>> > > >> >> >>>>>> functions -> built-in functions -> catalog functions. (Partially
>> > > >> >> >>>>>> or fully qualified functions have no ambiguity!)
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> 2. In addition to #1, support creating and referencing
>> > > temporary
>> > > >> >> >>>>> functions
>> > > >> >> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL
>> for
>> > > >> global
>> > > >> >> >>> temp
>> > > >> >> >>>>>> functions. The resolution order is: global temp
>> functions ->
>> > > >> >> >> built-in
>> > > >> >> >>>>>> functions -> temp functions in current cat/db -> catalog
>> > > >> function.
>> > > >> >> >>>>>> (Resolution for partially or fully qualified function
>> > > reference
>> > > >> is:
>> > > >> >> >>>> temp
>> > > >> >> >>>>>> functions -> persistent functions.)
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> 3. In addition to #1, support creating and referencing
>> > > temporary
>> > > >> >> >>>>> functions
>> > > >> >> >>>>>> associated with a cat/db with a special namespace for
>> > built-in
>> > > >> >> >>>> functions
>> > > >> >> >>>>>> and global temp functions. The resolution is the same as
>> #2,
>> > > >> except
>> > > >> >> >>>> that
>> > > >> >> >>>>>> the special namespace might be prefixed to a reference
>> to a
>> > > >> >> >> built-in
>> > > >> >> >>>>>> function or global temp function. (In absence of the
>> special
>> > > >> >> >>> namespace,
>> > > >> >> >>>>> the
>> > > >> >> >>>>>> resolution order is the same as in #2.)
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> My personal preference is #1, given the unknown use case
>> and
>> > > >> >> >>> introduced
>> > > >> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
>> > > >> alternative.
>> > > >> >> >>>> Thus,
>> > > >> >> >>>>>> my votes are:
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> +1 for #1
>> > > >> >> >>>>>> +0 for #2
>> > > >> >> >>>>>> -1 for #3
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> Everyone, please cast your vote (in above format
>> please!),
>> > or
>> > > >> let
>> > > >> >> >> me
>> > > >> >> >>>> know
>> > > >> >> >>>>>> if you have more questions or other candidates.
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> Thanks,
>> > > >> >> >>>>>> Xuefu
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>> > > >> >> >>> aljoscha@apache.org>
>> > > >> >> >>>>>> wrote:
>> > > >> >> >>>>>>
>> > > >> >> >>>>>>> Hi,
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
>> > > >> >> >>>>>>> connected. To resolve the differences, I think we have to think
>> > > >> >> >>>>>>> about the basic principles and find consensus there. The basic
>> > > >> >> >>>>>>> questions I see are:
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>> - Do we want to support overriding builtin functions?
>> > > >> >> >>>>>>> - Do we want to support overriding catalog functions?
>> > > >> >> >>>>>>> - And then later: should temporary functions be tied to
>> a
>> > > >> >> >>>>>>> catalog/database?
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>> I don’t have much to say about these, except that we
>> should
>> > > >> >> >>> somewhat
>> > > >> >> >>>>>> stick
>> > > >> >> >>>>>>> to what the industry does. But I also understand that
>> the
>> > > >> >> >> industry
>> > > >> >> >>> is
>> > > >> >> >>>>>>> already very divided on this.
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>> Best,
>> > > >> >> >>>>>>> Aljoscha
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
>> > > wrote:
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>> Hi,
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>> +1 to strive for reaching consensus on the remaining
>> > topics.
>> > > >> We
>> > > >> >> >>> are
>> > > >> >> >>>>>>> close to the truth. It will waste a lot of time if we
>> > resume
>> > > >> the
>> > > >> >> >>>> topic
>> > > >> >> >>>>>> some
>> > > >> >> >>>>>>> time later.
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
>> > > >> >> >>> “cat.db.fun”
>> > > >> >> >>>>> way
>> > > >> >> >>>>>>> to override a catalog function.
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
>> > > >> >> >>> nonexistent
>> > > >> >> >>>>> cat
>> > > >> >> >>>>>>> & db? And we still need to do special treatment for the
>> > > >> dedicated
>> > > >> >> >>>>>>> system.system cat & db?
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>> Best,
>> > > >> >> >>>>>>>> Jark
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>> Hi everyone,
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
>> > > >> >> >>>> incrementally.
>> > > >> >> >>>>>>> Users should be able to override all catalog objects
>> > > >> consistently
>> > > >> >> >>>>>> according
>> > > >> >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table
>> module).
>> > > If
>> > > >> >> >>>>> functions
>> > > >> >> >>>>>>> are treated completely different, we need more code and
>> > > special
>> > > >> >> >>>> cases.
>> > > >> >> >>>>>> From
>> > > >> >> >>>>>>> an implementation perspective, this topic only affects
>> the
>> > > >> lookup
>> > > >> >> >>>> logic
>> > > >> >> >>>>>>> which is rather low implementation effort which is why I
>> > > would
>> > > >> >> >> like
>> > > >> >> >>>> to
>> > > >> >> >>>>>>> clarify the remaining items. As you said, we have a slight consensus
>> > > >> >> >>>>>>> on overriding built-in functions; we should also strive for reaching
>> > > >> >> >>>>>>> consensus on the remaining topics.
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>> @Dawid: I like your idea as it makes registering catalog objects
>> > > >> >> >>>>>>> consistent and the overriding of built-in functions more explicit.
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>> Thanks,
>> > > >> >> >>>>>>>>> Timo
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>> > > >> >> >>>>>>>>>> hi, everyone
>> > > >> >> >>>>>>>>>> I think this flip is very meaningful. it supports
>> > > functions
>> > > >> >> >>> that
>> > > >> >> >>>>> can
>> > > >> >> >>>>>> be
>> > > >> >> >>>>>>>>>> shared by different catalogs and dbs, reducing the
>> > > >> >> >> duplication
>> > > >> >> >>> of
>> > > >> >> >>>>>>> functions.
>> > > >> >> >>>>>>>>>>
>> > > >> >> >>>>>>>>>> Our group based on flink's sql parser module
>> implements
>> > > >> >> >> create
>> > > >> >> >>>>>> function
>> > > >> >> >>>>>>>>>> feature, stores the parsed function metadata and
>> schema
>> > > into
>> > > >> >> >>>> mysql,
>> > > >> >> >>>>>> and
>> > > >> >> >>>>>>>>>> also customizes the catalog, customizes sql-client to
>> > > >> support
>> > > >> >> >>>>> custom
>> > > >> >> >>>>>>>>>> schemas and functions. Loaded, but the function is
>> > > currently
>> > > >> >> >>>>> global,
>> > > >> >> >>>>>>> and is
>> > > >> >> >>>>>>>>>> not subdivided according to catalog and db.
>> > > >> >> >>>>>>>>>>
>> > > >> >> >>>>>>>>>> In addition, I very much hope to participate in the
>> > > >> >> >> development
>> > > >> >> >>>> of
>> > > >> >> >>>>>> this
>> > > >> >> >>>>>>>>>> flip, I have been paying attention to the community,
>> but
>> > > >> >> >> found
>> > > >> >> >>> it
>> > > >> >> >>>>> is
>> > > >> >> >>>>>>> more
>> > > >> >> >>>>>>>>>> difficult to join.
>> > > >> >> >>>>>>>>>> thank you.
>> > > >> >> >>>>>>>>>>
>> > > >> >> >>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
>> > > >> >> >>>>>>>>>>
>> > > >> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> It seems to me that there is a general consensus on
>> > > having
>> > > >> >> >>> temp
>> > > >> >> >>>>>>> functions
>> > > >> >> >>>>>>>>>>> that have no namespaces and overwrite built-in
>> > functions.
>> > > >> >> >> (As
>> > > >> >> >>> a
>> > > >> >> >>>>> side
>> > > >> >> >>>>>>> note
>> > > >> >> >>>>>>>>>>> for comparability, the current user defined
>> functions
>> > are
>> > > >> >> >> all
>> > > >> >> >>>>>>> temporary and
>> > > >> >> >>>>>>>>>>> having no namespaces.)
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having
>> > > namespaced
>> > > >> >> >>> temp
>> > > >> >> >>>>>>> functions
>> > > >> >> >>>>>>>>>>> that can overwrite functions defined in a specific
>> > > cat/db.
>> > > >> >> >>>>> However,
>> > > >> >> >>>>>>> this
>> > > >> >> >>>>>>>>>>> idea appears orthogonal to the former and can be
>> added
>> > > >> >> >>>>>> incrementally.
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> How about we first implement non-namespaced temp
>> > > functions
>> > > >> >> >> now
>> > > >> >> >>>> and
>> > > >> >> >>>>>>> leave
>> > > >> >> >>>>>>>>>>> the door open for namespaced ones for later
>> releases as
>> > > the
>> > > >> >> >>>>>>> requirement
>> > > >> >> >>>>>>>>>>> might become more crystal? This also helps shorten
>> the
>> > > >> >> >> debate
>> > > >> >> >>>> and
>> > > >> >> >>>>>>> allow us
>> > > >> >> >>>>>>>>>>> to make some progress along this direction.
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to
>> host
>> > > the
>> > > >> >> >>>>>> temporary
>> > > >> >> >>>>>>> temp
>> > > >> >> >>>>>>>>>>> functions that don't have namespaces, my only
>> concern
>> > is
>> > > >> the
>> > > >> >> >>>>> special
>> > > >> >> >>>>>>>>>>> treatment for a cat/db, which makes code less
>> clean, as
>> > > >> >> >>> evident
>> > > >> >> >>>> in
>> > > >> >> >>>>>>> treating
>> > > >> >> >>>>>>>>>>> the built-in catalog currently.
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> Thanks,
>> > > >> >> >>>>>>>>>>> Xuefu
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>> > > >> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
>> > > >> >> >>>>>>>>>>> wrote:
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> Hi,
>> > > >> >> >>>>>>>>>>>> Another idea to consider on top of Timo's
>> suggestion.
>> > > How
>> > > >> >> >>> about
>> > > >> >> >>>>> we
>> > > >> >> >>>>>>> have a
>> > > >> >> >>>>>>>>>>>> special namespace (catalog + database) for built-in
>> > > >> >> >> objects?
>> > > >> >> >>>> This
>> > > >> >> >>>>>>> catalog
>> > > >> >> >>>>>>>>>>>> would be invisible for users as Xuefu was
>> suggesting.
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> Then users could still override built-in
>> functions, if
>> > > >> they
>> > > >> >> >>>> fully
>> > > >> >> >>>>>>> qualify
>> > > >> >> >>>>>>>>>>>> object with the built-in namespace, but by default
>> the
>> > > >> >> >> common
>> > > >> >> >>>>> logic
>> > > >> >> >>>>>>> of
>> > > >> >> >>>>>>>>>>>> current dB & cat would be used.
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>> > > >> >> >>>>>>>>>>>> registers temporary function in current cat & dB
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>> > > >> >> >>>>>>>>>>>> registers temporary function in cat db
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>> > > >> >> >>>>>>>>>>>> Overrides built-in function with temporary function
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> The built-in/system namespace would not be writable
>> > for
>> > > >> >> >>>> permanent
>> > > >> >> >>>>>>>>>>> objects.
>> > > >> >> >>>>>>>>>>>> WDYT?
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> This way I think we can have benefits of both
>> > solutions.
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> Best,
>> > > >> >> >>>>>>>>>>>> Dawid
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>> > > >> >> >> twalthr@apache.org
>> > > >> >> >>>>
>> > > >> >> >>>>>> wrote:
>> > > >> >> >>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> Hi Bowen,
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> I understand the potential benefit of overriding
>> > > certain
>> > > >> >> >>>>> built-in
>> > > >> >> >>>>>>>>>>>>> functions. I'm open to such a feature if many
>> people
>> > > >> >> >> agree.
>> > > >> >> >>>>>>> However, it
>> > > >> >> >>>>>>>>>>>>> would be great to still support overriding catalog
>> > > >> >> >> functions
>> > > >> >> >>>>> with
>> > > >> >> >>>>>>>>>>>>> temporary functions in order to prototype a query
>> > even
>> > > >> >> >>> though
>> > > >> >> >>>> a
>> > > >> >> >>>>>>>>>>>>> catalog/database might not be available currently
>> or
>> > > >> >> >> should
>> > > >> >> >>>> not
>> > > >> >> >>>>> be
>> > > >> >> >>>>>>>>>>>>> modified yet. How about we support both cases?
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>> > > >> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never considers the
>> > > >> >> >>>>>>>>>>>>> current catalog and database; inconsistent with other DDL but
>> > > >> >> >>>>>>>>>>>>> acceptable for functions I guess.
>> > > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>> > > >> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in
>> > objects
>> > > >> >> >>>> (tables,
>> > > >> >> >>>>>>> views)
>> > > >> >> >>>>>>>>>>>>> except functions", this might change in the near
>> > > future.
>> > > >> >> >>> Take
>> > > >> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900
>> as
>> > > an
>> > > >> >> >>>>> example.
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> Thanks,
>> > > >> >> >>>>>>>>>>>>> Timo
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>> > > >> >> >>>>>>>>>>>>>> Hi Fabian,
>> > > >> >> >>>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
>> > favorable
>> > > >> >> >>> thus I
>> > > >> >> >>>>>>> didn't
>> > > >> >> >>>>>>>>>>>>>> include that as a voting option, and the
>> discussion
>> > is
>> > > >> >> >>> mainly
>> > > >> >> >>>>>>> between
>> > > >> >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
>> > > builtin.
>> > > >> >> >>>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are
>> > > >> >> >> differently
>> > > >> >> >>>>>> treated
>> > > >> >> >>>>>>>>>>> than
>> > > >> >> >>>>>>>>>>>>>> other db objects.
>> > > >> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the
>> fact
>> > > that
>> > > >> >> >>>>>> functions
>> > > >> >> >>>>>>>>>>> are
>> > > >> >> >>>>>>>>>>>> a
>> > > >> >> >>>>>>>>>>>>>> bit different from other objects - Flink don't
>> have
>> > > any
>> > > >> >> >>> other
>> > > >> >> >>>>>>>>>>> built-in
>> > > >> >> >>>>>>>>>>>>>> objects (tables, views) except functions.
>> > > >> >> >>>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>>> Cheers,
>> > > >> >> >>>>>>>>>>>>>> Bowen
>> > > >> >> >>>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>>>
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> --
>> > > >> >> >>>>>>>>>>> Xuefu Zhang
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>>> "In Honey We Trust!"
>> > > >> >> >>>>>>>>>>>
>> > > >> >> >>>>>>>>>
>> > > >> >> >>>>>>>>
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>>
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> --
>> > > >> >> >>>>>> Xuefu Zhang
>> > > >> >> >>>>>>
>> > > >> >> >>>>>> "In Honey We Trust!"
>> > > >> >> >>>>>>
>> > > >> >> >>>>>
>> > > >> >> >>>>
>> > > >> >> >>>>
>> > > >> >> >>>> --
>> > > >> >> >>>> Xuefu Zhang
>> > > >> >> >>>>
>> > > >> >> >>>> "In Honey We Trust!"
>> > > >> >> >>>>
>> > > >> >> >>>
>> > > >> >> >>
>> > > >> >> >>
>> > > >> >> >> --
>> > > >> >> >> Xuefu Zhang
>> > > >> >> >>
>> > > >> >> >> "In Honey We Trust!"
>> > > >> >> >>
>> > > >> >>
>> > > >> >>
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi,

Thanks everyone for your votes. I've summarized the results as follows:

#1: 3 (+1), 1 (0), 4 (-1)     - net: -1
#2: 4 (0), 2 (+1), 1 (+0.5)   - net: +2.5
        Dawid: -1/0 depending on keyword
#3: 2 (+1), 3 (-1), 3 (0)     - net: -1

Given the result, I'd like to change my vote for #2 from 0 to +1, to make
it a stronger case with net +3.5. So the votes so far are:

#1: 3 (+1), 1 (0), 4 (-1)    - net: -1
#2: 4 (0), 3 (+1), 1 (+0.5)  - net: +3.5
        Dawid -1/0 depending on keyword
#3: 2 (+1), 3 (-1), 3 (0)    - net: -1
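
For anyone who wants to double-check the arithmetic, here is a minimal sketch
of how the net numbers come out. The counts are copied from the tallies as
written above; the names (`net`, `first_tally`, `second_tally`) are made up
for illustration, and Dawid's conditional -1/0 on #2 is left out, as it is in
the summary lines.

```python
def net(tally):
    """Sum of vote value times the number of people who cast that value."""
    return sum(value * count for value, count in tally.items())


# vote value -> number of votes, as listed in the first summary
first_tally = {
    "#1": {+1: 3, 0: 1, -1: 4},
    "#2": {0: 4, +1: 2, +0.5: 1},
    "#3": {+1: 2, -1: 3, 0: 3},
}

# same as above, with the +1 count for #2 raised from 2 to 3
second_tally = dict(first_tally)
second_tally["#2"] = {0: 4, +1: 3, +0.5: 1}

for option in ("#1", "#2", "#3"):
    print(option, net(first_tally[option]), net(second_tally[option]))
```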

What do you think? Do you think we can conclude with this result? Or would
you like to take it as a formal FLIP vote with a 3-day voting period?

BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
FUNCTION xxx TEMPORARILY" because
1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
TEMPORARY FUNCTION"
2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies that it alters a built-in
function, but it actually doesn't: the logic only creates a temp function
with higher priority than that built-in function in the ambiguous resolution
order. It would also behave inconsistently with "ALTER FUNCTION".
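
To make point 2 concrete, here is a rough sketch of the behavior I have in
mind. It is a toy model only: `ToyFunctionCatalog` and its method names are
hypothetical and are not Flink's FunctionCatalog API. The only things it
assumes from this thread are that the temp function is checked before the
built-in one for an unqualified reference, that the built-in itself is never
modified, and that temp functions disappear at the end of the session.

```python
class ToyFunctionCatalog:
    """Hypothetical stand-in for illustration; not Flink's FunctionCatalog."""

    def __init__(self, builtins):
        self.builtins = dict(builtins)   # never modified by temp functions
        self.temp_builtins = {}          # session-scoped shadows

    def create_temporary_builtin_function(self, name, impl):
        # Nothing is altered: a temp entry is added next to the built-in one.
        self.temp_builtins[name] = impl

    def resolve(self, name):
        # Unqualified reference: temp functions take priority over built-ins.
        if name in self.temp_builtins:
            return self.temp_builtins[name]
        return self.builtins[name]

    def close_session(self):
        # Temp functions disappear when the session ends.
        self.temp_builtins.clear()


catalog = ToyFunctionCatalog({"abs": abs})
catalog.create_temporary_builtin_function("abs", lambda x: x)  # shadows abs
shadowed = catalog.resolve("abs")(-3)     # uses the temp function
original = catalog.builtins["abs"](-3)    # built-in entry is untouched
catalog.close_session()
restored = catalog.resolve("abs")(-3)     # built-in is visible again
```

Since the built-in entry is never touched, "CREATE" describes what happens
more accurately than "ALTER".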



On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fh...@gmail.com> wrote:

> I agree, it's very similar from the implementation point of view and the
> implications.
>
> IMO, the difference is mostly in the mental model for the user.
> Instead of having a special class of temporary functions that have
> precedence over built-in functions, it suggests temporarily changing
> built-in functions.
>
> Fabian
>
> On Thu, Sep 19, 2019 at 11:52 AM Kurt Young <yk...@gmail.com> wrote:
>
> > Hi Fabian,
> >
> > I think it's almost the same as #2 with a different keyword:
> >
> > CREATE TEMPORARY BUILTIN FUNCTION xxx
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I thought about it a bit more and think that there is some good value
> in
> > my
> > > last proposal.
> > >
> > > A lot of complexity comes from the fact that we want to allow
> overriding
> > > built-in functions which are addressed differently than other functions
> > (and
> > > db objects).
> > > We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
> thing
> > as
> > > "CREATE FUNCTION" and treat both functions exactly the same except
> that:
> > > 1) temp functions disappear at the end of the session
> > > 2) temp functions are resolved before other functions
> > >
> > > This would be Dawid's proposal from the beginning of this thread (in
> case
> > > you still remember... ;-) )
> > >
> > > Temporarily overriding built-in functions would be supported with an
> > > explicit command like
> > >
> > > ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> > >
> > > This would also address the concerns about accidentally changing the
> > > semantics of built-in functions.
> > > IMO, it can't get much more explicit than the above command.
> > >
> > > Sorry for bringing up a new option in the middle of the discussion, but
> > as
> > > I said, I think it has a bunch of benefits and I don't see major
> > drawbacks
> > > (maybe you do?).
> > >
> > > What do you think?
> > >
> > > Fabian
> > >
> > > On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <
> > > fhueske@gmail.com
> > > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I thought again about option #1 and something that I don't like is
> that
> > > > the resolved address of xyz is different in "CREATE FUNCTION xyz" and
> > > > "CREATE TEMPORARY FUNCTION xyz".
> > > > IMO, adding the keyword "TEMPORARY" should only change the lifecycle
> of
> > > > the function, but not where it is located. This implicitly changed
> > > location
> > > > might be confusing for users.
> > > > After all, a temp function should behave pretty much like any other
> > > > function, except for the fact that it disappears when the session is
> > > closed.
> > > >
> > > > Approach #2 with the additional keyword would make that pretty clear,
> > > IMO.
> > > > However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > > BUILTIN
> > > > (we are not adding a built-in function).
> > > > So I'd be OK with #2 if we find a good keyword. In fact, approach #2
> > > could
> > > > also be an alias for approach #3 to avoid explicit specification of
> the
> > > > system catalog/db.
> > > >
> > > > Approach #3 would be consistent with other db objects and the "CREATE
> > > > FUNCTION" statement.
> > > > Adding system catalog/db seems rather complex, but then again how
> often
> > > do
> > > > we expect users to override built-in functions? If this becomes a
> major
> > > > issue, we can still add option #2 as an alias.
> > > >
> > > > Not sure what's the best approach from an internal point of view,
> but I
> > > > certainly think that consistent behavior is important.
> > > > Hence my votes are:
> > > >
> > > > -1 for #1
> > > > 0 for #2
> > > > 0 for #3
> > > >
> > > > Btw. Did we consider a completely separate command for overriding
> > > built-in
> > > > functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> > > >
> > > > Cheers, Fabian
> > > >
> > > >
> > > > On Thu, Sep 19, 2019 at 11:03 AM JingsongLee
> > > > <lz...@aliyun.com.invalid> wrote:
> > > >
> > > >> I know Hive and Spark can shadow built-in functions by temporary
> > > function.
> > > >> Mysql, Oracle, Sql server can not shadow.
> > > >> User can use full names to access functions instead of shadowing.
> > > >>
> > > >> So I think it is a completely new thing, and the direct way to deal
> > with
> > > >> new things is to add new grammar. So,
> > > >> +1 for #2, +0 for #3, -1 for #1
> > > >>
> > > >> Best,
> > > >> Jingsong Lee
> > > >>
> > > >>
> > > >> ------------------------------------------------------------------
> > > >> From:Kurt Young <yk...@gmail.com>
> > > > >> Send Time: 2019-09-19 (Thursday) 16:43
> > > >> To:dev <de...@flink.apache.org>
> > > >> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > > >>
> > > >> And let me make my vote complete:
> > > >>
> > > >> -1 for #1
> > > >> +1 for #2 with different keyword
> > > >> -0 for #3
> > > >>
> > > >> Best,
> > > >> Kurt
> > > >>
> > > >>
> > > >> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com>
> wrote:
> > > >>
> > > >> > Looks like I'm the only person who is willing to +1 to #2 for now
> > :-)
> > > >> > But I would suggest to change the keyword from GLOBAL to
> > > >> > something like BUILTIN.
> > > >> >
> > > >> > I think #2 and #3 are almost the same proposal, just with
> different
> > > >> > format to indicate whether it want to override built-in functions.
> > > >> >
> > > >> > My biggest reason to choose it is I want this behavior be
> consistent
> > > > >> > with temporary tables. I will give some examples to show the
> behavior
> > > >> > and also make sure I'm not misunderstanding anything here.
> > > >> >
> > > >> > For most DBs, when user create a temporary table with:
> > > >> >
> > > >> > CREATE TEMPORARY TABLE t1
> > > >> >
> > > >> > It's actually equivalent with:
> > > >> >
> > > > >> > CREATE TEMPORARY TABLE `current_db`.t1
> > > >> >
> > > >> > If user change current database, they will not be able to access
> t1
> > > >> without
> > > > >> > fully qualified name, i.e. db1.t1 (assuming db1 is current
> database
> > > when
> > > >> > this temporary table is created).
> > > >> >
> > > >> > Only #2 and #3 followed this behavior and I would vote for this
> > since
> > > >> this
> > > > >> > makes such behavior consistent across temporary tables and
> > functions.
> > > >> >
> > > >> > Why I'm not voting for #3 is a special catalog and database just
> > looks
> > > >> very
> > > > >> > hacky to me. It implies that our built-in functions are saved in a
> > > > >> > special
> > > > >> > catalog and database, which is actually not the case. Introducing a
> dedicated
> > > >> > keyword
> > > >> > like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and
> > > >> > straightforward. One can argue that we should avoid introducing
> new
> > > >> > keyword,
> > > >> > but it's also very rare that a system can overwrite built-in
> > > functions.
> > > >> > Since we
> > > >> > decided to support this, introduce a new keyword is not a big deal
> > > IMO.
> > > >> >
> > > >> > Best,
> > > >> > Kurt
> > > >> >
> > > >> >
> > > >> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
> piotr@ververica.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> It is a quite long discussion to follow and I hope I didn’t
> > > >> misunderstand
> > > >> >> anything. From the proposals presented by Xuefu I would vote:
> > > >> >>
> > > >> >> -1 for #1 and #2
> > > >> >> +1 for #3
> > > >> >>
> > > >> >> Besides #3 being IMO more general and more consistent, having
> > > qualified
> > > >> >> names (#3) would help/make easier for someone to use cross
> > > >> >> databases/catalogs queries (joining multiple data sets/streams).
> > For
> > > >> >> example with some functions to manipulate/clean up/convert the
> > stored
> > > >> data
> > > >> >> in different catalogs registered in the respective catalogs.
> > > >> >>
> > > >> >> Piotrek
> > > >> >>
> > > >> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> > > >> >> >
> > > >> >> > I agree with Xuefu that inconsistent handling with all the
> other
> > > >> >> objects is
> > > >> >> > not a big problem.
> > > >> >> >
> > > >> >> > Regarding to option#3, the special "system.system" namespace
> may
> > > >> confuse
> > > >> >> > users.
> > > >> >> > Users need to know the set of built-in function names to know
> > when
> > > to
> > > >> >> use
> > > >> >> > "system.system" namespace.
> > > >> >> > What will happen if user registers a non-builtin function name
> > > under
> > > >> the
> > > >> >> > "system.system" namespace?
> > > >> >> > Besides, I think it doesn't solve the "explode" problem I
> > mentioned
> > > >> at
> > > >> >> the
> > > >> >> > beginning of this thread.
> > > >> >> >
> > > >> >> > So here is my vote:
> > > >> >> >
> > > >> >> > +1 for #1
> > > >> >> > 0 for #2
> > > >> >> > -1 for #3
> > > >> >> >
> > > >> >> > Best,
> > > >> >> > Jark
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com>
> wrote:
> > > >> >> >
> > > >> >> >> @Dawid, Re: we also don't need additional referencing the
> > > > >> >> special catalog
> > > >> >> >> anywhere.
> > > >> >> >>
> > > >> >> >> True. But once we allow such reference, then user can do so in
> > any
> > > >> >> possible
> > > >> >> >> place where a function name is expected, for which we have to
> > > >> handle.
> > > >> >> >> That's a big difference, I think.
> > > >> >> >>
> > > >> >> >> Thanks,
> > > >> >> >> Xuefu
> > > >> >> >>
> > > >> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > > >> >> >> wysakowicz.dawid@gmail.com>
> > > >> >> >> wrote:
> > > >> >> >>
> > > >> >> >>> @Bowen I am not suggesting introducing additional catalog. I
> > > think
> > > >> we
> > > >> >> >> need
> > > >> >> >>> to get rid of the current built-in catalog.
> > > >> >> >>>
> > > >> >> >>> @Xuefu in option #3 we also don't need additional referencing
> > the
> > > >> >> special
> > > >> >> >>> catalog anywhere else besides in the CREATE statement. The
> > > >> resolution
> > > >> >> >>> behaviour is exactly the same in both options.
> > > >> >> >>>
> > > >> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com>
> wrote:
> > > >> >> >>>
> > > >> >> >>>> Hi Dawid,
> > > >> >> >>>>
> > > >> >> >>>> "GLOBAL" is a temporary keyword that was given to the
> > approach.
> > > It
> > > >> >> can
> > > >> >> >> be
> > > >> >> >>>> changed to something else for better.
> > > >> >> >>>>
> > > >> >> >>>> The difference between this and the #3 approach is that we
> > only
> > > >> need
> > > >> >> >> the
> > > >> >> >>>> keyword for this create DDL. For other places (such as
> > function
> > > >> >> >>>> referencing), no keyword or special namespace is needed.
> > > >> >> >>>>
> > > >> >> >>>> Thanks,
> > > >> >> >>>> Xuefu
> > > >> >> >>>>
> > > >> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > >> >> >>>> wysakowicz.dawid@gmail.com>
> > > >> >> >>>> wrote:
> > > >> >> >>>>
> > > >> >> >>>>> Hi,
> > > >> >> >>>>> I think it makes sense to start voting at this point.
> > > >> >> >>>>>
> > > >> >> >>>>> Option 1: Only 1-part identifiers
> > > >> >> >>>>> PROS:
> > > >> >> >>>>> - allows shadowing built-in functions
> > > >> >> >>>>> CONS:
> > > > >> >> >>>>> - inconsistent with all the other objects, both permanent &
> > > >> temporary
> > > >> >> >>>>> - does not allow shadowing catalog functions
> > > >> >> >>>>>
> > > >> >> >>>>> Option 2: Special keyword for built-in function
> > > >> >> >>>>> I think this is quite similar to the special catalog/db.
> The
> > > >> thing I
> > > >> >> >> am
> > > >> >> >>>>> strongly against in this proposal is the GLOBAL keyword.
> This
> > > >> >> keyword
> > > >> >> >>>> has a
> > > >> >> >>>>> meaning in rdbms systems and means a function that is
> present
> > > >> for a
> > > >> >> >>>>> lifetime of a session in which it was created, but
> available
> > in
> > > >> all
> > > >> >> >>> other
> > > >> >> >>>>> sessions. Therefore I really don't want to use this keyword
> > in
> > > a
> > > >> >> >>>> different
> > > >> >> >>>>> context.
> > > >> >> >>>>>
> > > >> >> >>>>> Option 3: Special catalog/db
> > > >> >> >>>>>
> > > >> >> >>>>> PROS:
> > > >> >> >>>>> - allows shadowing built-in functions
> > > >> >> >>>>> - allows shadowing catalog functions
> > > >> >> >>>>> - consistent with other objects
> > > >> >> >>>>> CONS:
> > > >> >> >>>>> - we introduce a special namespace for built-in functions
> > > >> >> >>>>>
> > > >> >> >>>>> I don't see a problem with introducing the special
> namespace.
> > > In
> > > >> the
> > > >> >> >>> end
> > > >> >> >>>> it
> > > >> >> >>>>> is very similar to the keyword approach. In this case the
> > > >> catalog/db
> > > >> >> >>>>> combination would be the "keyword"
> > > >> >> >>>>>
> > > >> >> >>>>> Therefore my votes:
> > > >> >> >>>>> Option 1: -0
> > > >> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a
> > > >> better
> > > >> >> >>>> keyword)
> > > >> >> >>>>> Option 3: +1
> > > >> >> >>>>>
> > > >> >> >>>>> Best,
> > > >> >> >>>>> Dawid
> > > >> >> >>>>>
> > > >> >> >>>>>
> > > >> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
> > wrote:
> > > >> >> >>>>>
> > > >> >> >>>>>> Hi Aljoscha,
> > > >> >> >>>>>>
> > > >> >> >>>>>> Thanks for the summary and these are great questions to be
> > > >> >> >> answered.
> > > >> >> >>>> The
> > > >> >> >>>>>> answer to your first question is clear: there is a general
> > > >> >> >> agreement
> > > >> >> >>> to
> > > >> >> >>>>>> override built-in functions with temp functions.
> > > >> >> >>>>>>
> > > >> >> >>>>>> However, your second and third questions are sort of
> > related,
> > > >> as a
> > > >> >> >>>>> function
> > > >> >> >>>>>> reference can be either just function name (like "func")
> or
> > in
> > > >> the
> > > >> >> >>> form
> > > > >> >> >>>>> of
> > > >> >> >>>>>> "cat.db.func". When a reference is just function name, it
> > can
> > > >> mean
> > > >> >> >>>>> either a
> > > >> >> >>>>>> built-in function or a function defined in the current
> > cat/db.
> > > >> If
> > > >> >> >> we
> > > >> >> >>>>>> support overriding a built-in function with a temp
> function,
> > > >> such
> > > >> >> >>>>>> overriding can also cover a function in the current
> cat/db.
> > > >> >> >>>>>>
> > > >> >> >>>>>> I think what Timo referred as "overriding a catalog
> > function"
> > > >> >> >> means a
> > > >> >> >>>>> temp
> > > >> >> >>>>>> function defined as "cat.db.func" overrides a catalog
> > function
> > > >> >> >> "func"
> > > >> >> >>>> in
> > > >> >> >>>>>> cat/db even if cat/db is not current. To support this,
> temp
> > > >> >> >> function
> > > >> >> >>>> has
> > > >> >> >>>>> to
> > > > >> >> >>>>>> be tied to a cat/db. That's why I said above that the 2nd
> > and
> > > >> 3rd
> > > >> >> >>>>> questions
> > > >> >> >>>>>> are related. The problem with such support is the
> ambiguity
> > > when
> > > >> >> >> user
> > > >> >> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY
> FUNCTION
> > > >> func
> > > >> >> >>> ...".
> > > > >> >> >>>>>> Here "func" can mean a global temp function, or a temp
> > > >> function in
> > > >> >> >>>>> current
> > > >> >> >>>>>> cat/db. If we can assume the former, this creates an
> > > >> inconsistency
> > > >> >> >>>>> because
> > > >> >> >>>>>> "CREATE FUNCTION func" actually means a function in
> current
> > > >> cat/db.
> > > >> >> >>> If
> > > >> >> >>>> we
> > > >> >> >>>>>> assume the latter, then there is no way for user to
> create a
> > > >> global
> > > >> >> >>>> temp
> > > >> >> >>>>>> function.
> > > >> >> >>>>>>
> > > >> >> >>>>>> Giving a special namespace for built-in functions may
> solve
> > > the
> > > >> >> >>>> ambiguity
> > > >> >> >>>>>> problem above, but it also introduces artificial
> > > >> catalog/database
> > > >> >> >>> that
> > > >> >> >>>>>> needs special treatment and pollutes the cleanness of  the
> > > >> code. I
> > > >> >> >>>> would
> > > >> >> >>>>>> rather introduce a syntax in DDL to solve the problem,
> like
> > > >> "CREATE
> > > >> >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> > > >> >> >>>>>>
> > > >> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for
> > > voting
> > > >> >> >>>>> purposes:
> > > >> >> >>>>>>
> > > >> >> >>>>>> 1. Support only global, temporary functions without
> > namespace.
> > > >> Such
> > > >> >> >>>> temp
> > > > >> >> >>>>>> functions override built-in functions and catalog
> functions
> > > in
> > > >> >> >>> current
> > > >> >> >>>>>> cat/db. The resolution order is: temp functions ->
> built-in
> > > >> >> >> functions
> > > >> >> >>>> ->
> > > >> >> >>>>>> catalog functions. (Partially or fully qualified functions
> > have
> > > >> no
> > > >> >> >>>>>> ambiguity!)
> > > >> >> >>>>>>
> > > >> >> >>>>>> 2. In addition to #1, support creating and referencing
> > > temporary
> > > >> >> >>>>> functions
> > > >> >> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL
> for
> > > >> global
> > > >> >> >>> temp
> > > >> >> >>>>>> functions. The resolution order is: global temp functions
> ->
> > > >> >> >> built-in
> > > >> >> >>>>>> functions -> temp functions in current cat/db -> catalog
> > > >> function.
> > > >> >> >>>>>> (Resolution for partially or fully qualified function
> > > reference
> > > >> is:
> > > >> >> >>>> temp
> > > >> >> >>>>>> functions -> persistent functions.)
> > > >> >> >>>>>>
> > > >> >> >>>>>> 3. In addition to #1, support creating and referencing
> > > temporary
> > > >> >> >>>>> functions
> > > >> >> >>>>>> associated with a cat/db with a special namespace for
> > built-in
> > > >> >> >>>> functions
> > > >> >> >>>>>> and global temp functions. The resolution is the same as
> #2,
> > > >> except
> > > >> >> >>>> that
> > > >> >> >>>>>> the special namespace might be prefixed to a reference to
> a
> > > >> >> >> built-in
> > > >> >> >>>>>> function or global temp function. (In absence of the
> special
> > > >> >> >>> namespace,
> > > >> >> >>>>> the
> > > >> >> >>>>>> resolution order is the same as in #2.)
> > > >> >> >>>>>>
> > > >> >> >>>>>> My personal preference is #1, given the unknown use case
> and
> > > >> >> >>> introduced
> > > >> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
> > > >> alternative.
> > > >> >> >>>> Thus,
> > > >> >> >>>>>> my votes are:
> > > >> >> >>>>>>
> > > >> >> >>>>>> +1 for #1
> > > >> >> >>>>>> +0 for #2
> > > >> >> >>>>>> -1 for #3
> > > >> >> >>>>>>
> > > >> >> >>>>>> Everyone, please cast your vote (in above format please!),
> > or
> > > >> let
> > > >> >> >> me
> > > >> >> >>>> know
> > > >> >> >>>>>> if you have more questions or other candidates.
> > > >> >> >>>>>>
> > > >> >> >>>>>> Thanks,
> > > >> >> >>>>>> Xuefu
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > > >> >> >>> aljoscha@apache.org>
> > > >> >> >>>>>> wrote:
> > > >> >> >>>>>>
> > > >> >> >>>>>>> Hi,
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
> > > >> >> >> connected.
> > > >> >> >>>> To
> > > > >> >> >>>>>>> resolve the differences, I think we have to think about the
> > > basic
> > > >> >> >>>>>> principles
> > > >> >> >>>>>>> and find consensus there. The basic questions I see are:
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> - Do we want to support overriding builtin functions?
> > > >> >> >>>>>>> - Do we want to support overriding catalog functions?
> > > >> >> >>>>>>> - And then later: should temporary functions be tied to a
> > > >> >> >>>>>>> catalog/database?
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> I don’t have much to say about these, except that we
> should
> > > >> >> >>> somewhat
> > > >> >> >>>>>> stick
> > > >> >> >>>>>>> to what the industry does. But I also understand that the
> > > >> >> >> industry
> > > >> >> >>> is
> > > >> >> >>>>>>> already very divided on this.
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> Best,
> > > >> >> >>>>>>> Aljoscha
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
> > > wrote:
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> Hi,
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> +1 to strive for reaching consensus on the remaining
> > topics.
> > > >> We
> > > >> >> >>> are
> > > >> >> >>>>>>> close to the truth. It will waste a lot of time if we
> > resume
> > > >> the
> > > >> >> >>>> topic
> > > >> >> >>>>>> some
> > > >> >> >>>>>>> time later.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> > > >> >> >>> “cat.db.fun”
> > > >> >> >>>>> way
> > > >> >> >>>>>>> to override a catalog function.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> > > >> >> >>> nonexistent
> > > >> >> >>>>> cat
> > > >> >> >>>>>>> & db? And we still need to do special treatment for the
> > > >> dedicated
> > > >> >> >>>>>>> system.system cat & db?
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> Best,
> > > >> >> >>>>>>>> Jark
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org>
> wrote:
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Hi everyone,
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
> > > >> >> >>>> incrementally.
> > > >> >> >>>>>>> Users should be able to override all catalog objects
> > > >> consistently
> > > >> >> >>>>>> according
> > > >> >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table
> module).
> > > If
> > > >> >> >>>>> functions
> > > >> >> >>>>>>> are treated completely different, we need more code and
> > > special
> > > >> >> >>>> cases.
> > > >> >> >>>>>> From
> > > >> >> >>>>>>> an implementation perspective, this topic only affects
> the
> > > >> lookup
> > > >> >> >>>> logic
> > > >> >> >>>>>>> which is rather low implementation effort which is why I
> > > would
> > > >> >> >> like
> > > >> >> >>>> to
> > > >> >> >>>>>>> clarify the remaining items. As you said, we have a
> slight
> > > > >> >> >> consensus
> > > >> >> >>>> on
> > > >> >> >>>>>>> overriding built-in functions; we should also strive for
> > > >> reaching
> > > >> >> >>>>>> consensus
> > > >> >> >>>>>>> on the remaining topics.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> @Dawid: I like your idea as it ensures registering
> > catalog
> > > >> >> >>> objects
> > > >> >> >>>>>>> consistent and the overriding of built-in functions more
> > > >> >> >> explicit.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Thanks,
> > > >> >> >>>>>>>>> Timo
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > > >> >> >>>>>>>>>> hi, everyone
> > > >> >> >>>>>>>>>> I think this flip is very meaningful. it supports
> > > functions
> > > >> >> >>> that
> > > >> >> >>>>> can
> > > >> >> >>>>>> be
> > > >> >> >>>>>>>>>> shared by different catalogs and dbs, reducing the
> > > >> >> >> duplication
> > > >> >> >>> of
> > > >> >> >>>>>>> functions.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> Our group based on flink's sql parser module
> implements
> > > >> >> >> create
> > > >> >> >>>>>> function
> > > >> >> >>>>>>>>>> feature, stores the parsed function metadata and
> schema
> > > into
> > > >> >> >>>> mysql,
> > > >> >> >>>>>> and
> > > >> >> >>>>>>>>>> also customizes the catalog, customizes sql-client to
> > > >> support
> > > >> >> >>>>> custom
> > > >> >> >>>>>>>>>> schemas and functions. Loaded, but the function is
> > > currently
> > > >> >> >>>>> global,
> > > >> >> >>>>>>> and is
> > > >> >> >>>>>>>>>> not subdivided according to catalog and db.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> In addition, I very much hope to participate in the
> > > >> >> >> development
> > > >> >> >>>> of
> > > >> >> >>>>>> this
> > > >> >> >>>>>>>>>> flip, I have been paying attention to the community,
> but
> > > >> >> >> found
> > > >> >> >>> it
> > > >> >> >>>>> is
> > > >> >> >>>>>>> more
> > > >> >> >>>>>>>>>> difficult to join.
> > > >> >> >>>>>>>>>> thank you.
> > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> It seems to me that there is a general consensus on
> > > having
> > > >> >> >>> temp
> > > >> >> >>>>>>> functions
> > > >> >> >>>>>>>>>>> that have no namespaces and overwrite built-in
> > functions.
> > > >> >> >> (As
> > > >> >> >>> a
> > > >> >> >>>>> side
> > > >> >> >>>>>>> note
> > > >> >> >>>>>>>>>>> for comparability, the current user defined functions
> > are
> > > >> >> >> all
> > > >> >> >>>>>>> temporary and
> > > >> >> >>>>>>>>>>> having no namespaces.)
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having
> > > namespaced
> > > >> >> >>> temp
> > > >> >> >>>>>>> functions
> > > >> >> >>>>>>>>>>> that can overwrite functions defined in a specific
> > > cat/db.
> > > >> >> >>>>> However,
> > > >> >> >>>>>>> this
> > > >> >> >>>>>>>>>>> idea appears orthogonal to the former and can be
> added
> > > >> >> >>>>>> incrementally.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> How about we first implement non-namespaced temp
> > > functions
> > > >> >> >> now
> > > >> >> >>>> and
> > > >> >> >>>>>>> leave
> > > >> >> >>>>>>>>>>> the door open for namespaced ones for later releases
> as
> > > the
> > > >> >> >>>>>>> requirement
> > > >> >> >>>>>>>>>>> might become more crystal? This also helps shorten
> the
> > > >> >> >> debate
> > > >> >> >>>> and
> > > >> >> >>>>>>> allow us
> > > >> >> >>>>>>>>>>> to make some progress along this direction.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to
> host
> > > the
> > > >> >> >>>>>> temporary
> > > >> >> >>>>>>> temp
> > > >> >> >>>>>>>>>>> functions that don't have namespaces, my only concern
> > is
> > > >> the
> > > >> >> >>>>> special
> > > >> >> >>>>>>>>>>> treatment for a cat/db, which makes code less clean,
> as
> > > >> >> >>> evident
> > > >> >> >>>> in
> > > >> >> >>>>>>> treating
> > > >> >> >>>>>>>>>>> the built-in catalog currently.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> Thanks,
> > > > >> >> >>>>>>>>>>> Xuefu
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > > >> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > > >> >> >>>>>>>>>>> wrote:
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> Hi,
> > > >> >> >>>>>>>>>>>> Another idea to consider on top of Timo's
> suggestion.
> > > How
> > > >> >> >>> about
> > > >> >> >>>>> we
> > > >> >> >>>>>>> have a
> > > >> >> >>>>>>>>>>>> special namespace (catalog + database) for built-in
> > > >> >> >> objects?
> > > >> >> >>>> This
> > > >> >> >>>>>>> catalog
> > > >> >> >>>>>>>>>>>> would be invisible for users as Xuefu was
> suggesting.
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> Then users could still override built-in functions,
> if
> > > >> they
> > > >> >> >>>> fully
> > > >> >> >>>>>>> qualify
> > > >> >> >>>>>>>>>>>> object with the built-in namespace, but by default
> the
> > > >> >> >> common
> > > >> >> >>>>> logic
> > > >> >> >>>>>>> of
> > > >> >> >>>>>>>>>>>> current dB & cat would be used.
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > > >> >> >>>>>>>>>>>> registers temporary function in current cat & dB
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > >> >> >>>>>>>>>>>> registers temporary function in cat db
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > > >> >> >>>>>>>>>>>> Overrides built-in function with temporary function
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> The built-in/system namespace would not be writable
> > for
> > > >> >> >>>> permanent
> > > >> >> >>>>>>>>>>> objects.
> > > >> >> >>>>>>>>>>>> WDYT?
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> This way I think we can have benefits of both
> > solutions.
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> Best,
> > > >> >> >>>>>>>>>>>> Dawid
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > > >> >> >> twalthr@apache.org
> > > >> >> >>>>
> > > >> >> >>>>>> wrote:
> > > >> >> >>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> Hi Bowen,
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> I understand the potential benefit of overriding
> > > certain
> > > >> >> >>>>> built-in
> > > >> >> >>>>>>>>>>>>> functions. I'm open to such a feature if many
> people
> > > >> >> >> agree.
> > > >> >> >>>>>>> However, it
> > > >> >> >>>>>>>>>>>>> would be great to still support overriding catalog
> > > >> >> >> functions
> > > >> >> >>>>> with
> > > >> >> >>>>>>>>>>>>> temporary functions in order to prototype a query
> > even
> > > >> >> >>> though
> > > >> >> >>>> a
> > > >> >> >>>>>>>>>>>>> catalog/database might not be available currently
> or
> > > >> >> >> should
> > > >> >> >>>> not
> > > >> >> >>>>> be
> > > >> >> >>>>>>>>>>>>> modified yet. How about we support both cases?
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > > >> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never
> > > > >> >> >>> considers
> > > >> >> >>>>>>> current
> > > >> >> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL
> but
> > > >> >> >>>> acceptable
> > > >> >> >>>>>> for
> > > >> >> >>>>>>>>>>>>> functions I guess.
> > > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > >> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in
> > objects
> > > >> >> >>>> (tables,
> > > >> >> >>>>>>> views)
> > > >> >> >>>>>>>>>>>>> except functions", this might change in the near
> > > future.
> > > >> >> >>> Take
> > > >> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900
> as
> > > an
> > > >> >> >>>>> example.
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> Thanks,
> > > >> >> >>>>>>>>>>>>> Timo
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > >> >> >>>>>>>>>>>>>> Hi Fabian,
> > > >> >> >>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> > favorable
> > > >> >> >>> thus I
> > > >> >> >>>>>>> didn't
> > > >> >> >>>>>>>>>>>>>> include that as a voting option, and the
> discussion
> > is
> > > >> >> >>> mainly
> > > >> >> >>>>>>> between
> > > >> >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> > > builtin.
> > > >> >> >>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are
> > > >> >> >> differently
> > > >> >> >>>>>> treated
> > > >> >> >>>>>>>>>>> than
> > > >> >> >>>>>>>>>>>>>> other db objects.
> > > >> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the
> fact
> > > that
> > > >> >> >>>>>> functions
> > > >> >> >>>>>>>>>>> are
> > > >> >> >>>>>>>>>>>> a
> > > >> >> >>>>>>>>>>>>>> bit different from other objects - Flink don't
> have
> > > any
> > > >> >> >>> other
> > > >> >> >>>>>>>>>>> built-in
> > > >> >> >>>>>>>>>>>>>> objects (tables, views) except functions.
> > > >> >> >>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>> Cheers,
> > > >> >> >>>>>>>>>>>>>> Bowen
> > > >> >> >>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> --
> > > >> >> >>>>>>>>>>> Xuefu Zhang
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> "In Honey We Trust!"
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>> --
> > > >> >> >>>>>> Xuefu Zhang
> > > >> >> >>>>>>
> > > >> >> >>>>>> "In Honey We Trust!"
> > > >> >> >>>>>>
> > > >> >> >>>>>
> > > >> >> >>>>
> > > >> >> >>>>
> > > >> >> >>>> --
> > > >> >> >>>> Xuefu Zhang
> > > >> >> >>>>
> > > >> >> >>>> "In Honey We Trust!"
> > > >> >> >>>>
> > > >> >> >>>
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> --
> > > >> >> >> Xuefu Zhang
> > > >> >> >>
> > > >> >> >> "In Honey We Trust!"
> > > >> >> >>
> > > >> >>
> > > >> >>
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
I agree, it's very similar in terms of implementation and implications.

IMO, the difference is mostly in the mental model for the user.
Instead of having a special class of temporary functions that take
precedence over built-in functions, it suggests temporarily changing
built-in functions.
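
To make the two mental models concrete, here is a minimal Python sketch. It is
not Flink code: the class and method names are hypothetical, and the rule that
the "alter" variant only accepts names of existing built-ins is an assumption
made for illustration. It only tries to show that both models resolve a name
the same way during a session and restore the original afterwards.

```python
class ShadowingSession:
    """Model A (hypothetical): temporary functions are a separate class
    that takes precedence over built-in functions at lookup time."""

    def __init__(self, builtins):
        self.builtins = dict(builtins)
        self.temp = {}

    def create_temporary_function(self, name, impl):
        self.temp[name] = impl

    def resolve(self, name):
        # Temporary functions are checked first, then built-ins.
        if name in self.temp:
            return self.temp[name]
        return self.builtins[name]

    def close(self):
        # Temporary functions disappear at the end of the session.
        self.temp.clear()


class AlteringSession:
    """Model B (hypothetical): the session's view of a built-in function
    is itself changed temporarily."""

    def __init__(self, builtins):
        self._original = dict(builtins)
        self._active = dict(builtins)

    def alter_builtin_temporarily(self, name, impl):
        # Assumption: only existing built-ins can be altered.
        if name not in self._original:
            raise KeyError(f"{name} is not a built-in function")
        self._active[name] = impl

    def resolve(self, name):
        return self._active[name]

    def close(self):
        # The original built-ins come back at the end of the session.
        self._active = dict(self._original)
```

In this sketch the only observable difference is that Model B refuses names
that are not built-ins, which is what makes the override explicit.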

Fabian

On Thu, Sep 19, 2019 at 11:52 AM Kurt Young <yk...@gmail.com> wrote:

> Hi Fabian,
>
> I think it's almost the same as #2, just with a different keyword:
>
> CREATE TEMPORARY BUILTIN FUNCTION xxx
>
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com> wrote:
>
> > Hi,
> >
> > I thought about it a bit more and think that there is some good value in
> my
> > last proposal.
> >
> > A lot of complexity comes from the fact that we want to allow overriding
> > built-in functions, which are addressed differently from other functions
> > (and db objects).
> > We could just have "CREATE TEMPORARY FUNCTION" do exactly the same thing
> as
> > "CREATE FUNCTION" and treat both functions exactly the same except that:
> > 1) temp functions disappear at the end of the session
> > 2) temp functions are resolved before other functions
> >
> > This would be Dawid's proposal from the beginning of this thread (in case
> > you still remember... ;-) )
> >
> > Temporarily overriding built-in functions would be supported with an
> > explicit command like
> >
> > ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
> >
> > This would also address the concerns about accidentally changing the
> > semantics of built-in functions.
> > IMO, it can't get much more explicit than the above command.
> >
> > Sorry for bringing up a new option in the middle of the discussion, but
> as
> > I said, I think it has a bunch of benefits and I don't see major
> drawbacks
> > (maybe you do?).
> >
> > What do you think?
> >
> > Fabian
> >
> > On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <fhueske@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I thought again about option #1 and something that I don't like is that
> > > the resolved address of xyz is different in "CREATE FUNCTION xyz" and
> > > "CREATE TEMPORARY FUNCTION xyz".
> > > IMO, adding the keyword "TEMPORARY" should only change the lifecycle of
> > > the function, but not where it is located. This implicitly changed
> > location
> > > might be confusing for users.
> > > After all, a temp function should behave pretty much like any other
> > > function, except for the fact that it disappears when the session is
> > closed.
> > >
> > > Approach #2 with the additional keyword would make that pretty clear,
> > IMO.
> > > However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > > BUILTIN (we are not adding a built-in function).
> > > So I'd be OK with #2 if we find a good keyword. In fact, approach #2
> > could
> > > also be an alias for approach #3 to avoid explicit specification of the
> > > system catalog/db.
> > >
> > > Approach #3 would be consistent with other db objects and the "CREATE
> > > FUNCTION" statement.
> > > Adding system catalog/db seems rather complex, but then again how often
> > do
> > > we expect users to override built-in functions? If this becomes a major
> > > issue, we can still add option #2 as an alias.
> > >
> > > Not sure what's the best approach from an internal point of view, but I
> > > certainly think that consistent behavior is important.
> > > Hence my votes are:
> > >
> > > -1 for #1
> > > 0 for #2
> > > 0 for #3
> > >
> > > Btw. Did we consider a completely separate command for overriding
> > built-in
> > > functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> > >
> > > Cheers, Fabian
> > >
> > >
> > > On Thu, Sep 19, 2019 at 11:03 AM JingsongLee <lz...@aliyun.com.invalid> wrote:
> > >
> > >> I know Hive and Spark can shadow built-in functions with temporary
> > >> functions. MySQL, Oracle, and SQL Server cannot.
> > >> Users can use fully qualified names to access functions instead of shadowing.
> > >>
> > >> So I think it is a completely new thing, and the direct way to deal
> with
> > >> new things is to add new grammar. So,
> > >> +1 for #2, +0 for #3, -1 for #1
> > >>
> > >> Best,
> > >> Jingsong Lee
> > >>
> > >>
> > >> ------------------------------------------------------------------
> > >> From:Kurt Young <yk...@gmail.com>
> > >> Send Time: Thursday, Sep 19, 2019, 16:43
> > >> To:dev <de...@flink.apache.org>
> > >> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> > >>
> > >> And let me make my vote complete:
> > >>
> > >> -1 for #1
> > >> +1 for #2 with different keyword
> > >> -0 for #3
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:
> > >>
> > >> > Looks like I'm the only person who is willing to +1 #2 for now :-)
> > >> > But I would suggest changing the keyword from GLOBAL to
> > >> > something like BUILTIN.
> > >> >
> > >> > I think #2 and #3 are almost the same proposal, just with a different
> > >> > format to indicate whether the user wants to override built-in functions.
> > >> >
> > >> > My main reason for choosing it is that I want this behavior to be
> > >> > consistent with temporary tables. I will give some examples to show the
> > >> > behavior and also make sure I'm not misunderstanding anything here.
> > >> >
> > >> > For most DBs, when a user creates a temporary table with:
> > >> >
> > >> > CREATE TEMPORARY TABLE t1
> > >> >
> > >> > it is actually equivalent to:
> > >> >
> > >> > CREATE TEMPORARY TABLE `current_db`.t1
> > >> >
> > >> > If the user changes the current database, they will not be able to access
> > >> > t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was the
> > >> > current database when this temporary table was created).
> > >> >
> > >> > Only #2 and #3 follow this behavior, and I would vote for it since it
> > >> > makes the behavior consistent across temporary tables and functions.
> > >> >
> > >> > The reason I'm not voting for #3 is that a special catalog and database
> > >> > looks very hacky to me. It implies that our built-in functions are stored
> > >> > in a special catalog and database, which is actually not the case.
> > >> > Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
> > >> > looks clearer and more straightforward. One can argue that we should
> > >> > avoid introducing a new keyword, but it's also very rare that a system
> > >> > can overwrite built-in functions. Since we decided to support this,
> > >> > introducing a new keyword is not a big deal IMO.
> > >> >
> > >> > Best,
> > >> > Kurt
> > >> >
> > >> >
> > >> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <piotr@ververica.com
> >
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> It is a quite long discussion to follow and I hope I didn’t
> > >> misunderstand
> > >> >> anything. From the proposals presented by Xuefu I would vote:
> > >> >>
> > >> >> -1 for #1 and #2
> > >> >> +1 for #3
> > >> >>
> > >> >> Besides #3 being IMO more general and more consistent, having
> > >> >> qualified names (#3) would make it easier to write cross
> > >> >> database/catalog queries (joining multiple data sets/streams), for
> > >> >> example with functions that manipulate/clean up/convert the data
> > >> >> stored in different catalogs being registered in the respective
> > >> >> catalogs.
> > >> >>
> > >> >> Piotrek
> > >> >>
> > >> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> > >> >> >
> > >> >> > I agree with Xuefu that handling functions inconsistently with all
> > >> >> > the other objects is not a big problem.
> > >> >> >
> > >> >> > Regarding option #3, the special "system.system" namespace may
> > >> >> > confuse users.
> > >> >> > Users need to know the set of built-in function names to know when
> > >> >> > to use the "system.system" namespace.
> > >> >> > What will happen if a user registers a non-built-in function name
> > >> >> > under the "system.system" namespace?
> > >> >> > Besides, I think it doesn't solve the "explode" problem I mentioned
> > >> >> > at the beginning of this thread.
> > >> >> >
> > >> >> > So here is my vote:
> > >> >> >
> > >> >> > +1 for #1
> > >> >> > 0 for #2
> > >> >> > -1 for #3
> > >> >> >
> > >> >> > Best,
> > >> >> > Jark
> > >> >> >
> > >> >> >
> > >> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
> > >> >> >
> > >> >> >> @Dawid, Re: we also don't need additional referencing the special
> > >> >> >> catalog anywhere.
> > >> >> >>
> > >> >> >> True. But once we allow such a reference, users can use it in any
> > >> >> >> place where a function name is expected, and we have to handle
> > >> >> >> that. That's a big difference, I think.
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >> Xuefu
> > >> >> >>
> > >> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> > >> >> >> wysakowicz.dawid@gmail.com>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >>> @Bowen I am not suggesting introducing additional catalog. I
> > think
> > >> we
> > >> >> >> need
> > >> >> >>> to get rid of the current built-in catalog.
> > >> >> >>>
> > >> >> >>> @Xuefu in option #3 we also don't need additional referencing
> the
> > >> >> special
> > >> >> >>> catalog anywhere else besides in the CREATE statement. The
> > >> resolution
> > >> >> >>> behaviour is exactly the same in both options.
> > >> >> >>>
> > >> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
> > >> >> >>>
> > >> >> >>>> Hi Dawid,
> > >> >> >>>>
> > >> >> >>>> "GLOBAL" is a temporary keyword that was given to the
> approach.
> > It
> > >> >> can
> > >> >> >> be
> > >> >> >>>> changed to something else for better.
> > >> >> >>>>
> > >> >> >>>> The difference between this and the #3 approach is that we
> only
> > >> need
> > >> >> >> the
> > >> >> >>>> keyword for this create DDL. For other places (such as
> function
> > >> >> >>>> referencing), no keyword or special namespace is needed.
> > >> >> >>>>
> > >> >> >>>> Thanks,
> > >> >> >>>> Xuefu
> > >> >> >>>>
> > >> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > >> >> >>>> wysakowicz.dawid@gmail.com>
> > >> >> >>>> wrote:
> > >> >> >>>>
> > >> >> >>>>> Hi,
> > >> >> >>>>> I think it makes sense to start voting at this point.
> > >> >> >>>>>
> > >> >> >>>>> Option 1: Only 1-part identifiers
> > >> >> >>>>> PROS:
> > >> >> >>>>> - allows shadowing built-in functions
> > >> >> >>>>> CONS:
> > >> >> >>>>> - inconsistent with all the other objects, both permanent & temporary
> > >> >> >>>>> - does not allow shadowing catalog functions
> > >> >> >>>>>
> > >> >> >>>>> Option 2: Special keyword for built-in function
> > >> >> >>>>> I think this is quite similar to the special catalog/db. The
> > >> thing I
> > >> >> >> am
> > >> >> >>>>> strongly against in this proposal is the GLOBAL keyword. This
> > >> >> keyword
> > >> >> >>>> has a
> > >> >> >>>>> meaning in rdbms systems and means a function that is present
> > >> for a
> > >> >> >>>>> lifetime of a session in which it was created, but available
> in
> > >> all
> > >> >> >>> other
> > >> >> >>>>> sessions. Therefore I really don't want to use this keyword
> in
> > a
> > >> >> >>>> different
> > >> >> >>>>> context.
> > >> >> >>>>>
> > >> >> >>>>> Option 3: Special catalog/db
> > >> >> >>>>>
> > >> >> >>>>> PROS:
> > >> >> >>>>> - allows shadowing built-in functions
> > >> >> >>>>> - allows shadowing catalog functions
> > >> >> >>>>> - consistent with other objects
> > >> >> >>>>> CONS:
> > >> >> >>>>> - we introduce a special namespace for built-in functions
> > >> >> >>>>>
> > >> >> >>>>> I don't see a problem with introducing the special namespace.
> > In
> > >> the
> > >> >> >>> end
> > >> >> >>>> it
> > >> >> >>>>> is very similar to the keyword approach. In this case the
> > >> catalog/db
> > >> >> >>>>> combination would be the "keyword"
> > >> >> >>>>>
> > >> >> >>>>> Therefore my votes:
> > >> >> >>>>> Option 1: -0
> > >> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a
> > >> better
> > >> >> >>>> keyword)
> > >> >> >>>>> Option 3: +1
> > >> >> >>>>>
> > >> >> >>>>> Best,
> > >> >> >>>>> Dawid
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com>
> wrote:
> > >> >> >>>>>
> > >> >> >>>>>> Hi Aljoscha,
> > >> >> >>>>>>
> > >> >> >>>>>> Thanks for the summary and these are great questions to be
> > >> >> >> answered.
> > >> >> >>>> The
> > >> >> >>>>>> answer to your first question is clear: there is a general
> > >> >> >> agreement
> > >> >> >>> to
> > >> >> >>>>>> override built-in functions with temp functions.
> > >> >> >>>>>>
> > >> >> >>>>>> However, your second and third questions are sort of related, as a
> > >> >> >>>>>> function reference can be either just a function name (like "func")
> > >> >> >>>>>> or in the form of "cat.db.func". When a reference is just a function
> > >> >> >>>>>> name, it can mean either a built-in function or a function defined
> > >> >> >>>>>> in the current cat/db. If we support overriding a built-in function
> > >> >> >>>>>> with a temp function, such overriding can also cover a function in
> > >> >> >>>>>> the current cat/db.
> > >> >> >>>>>>
> > >> >> >>>>>> I think what Timo referred to as "overriding a catalog function"
> > >> >> >>>>>> means that a temp function defined as "cat.db.func" overrides a
> > >> >> >>>>>> catalog function "func" in cat/db even if cat/db is not current. To
> > >> >> >>>>>> support this, a temp function has to be tied to a cat/db. That's why
> > >> >> >>>>>> I said above that the 2nd and 3rd questions are related. The problem
> > >> >> >>>>>> with such support is the ambiguity when a user defines a function
> > >> >> >>>>>> w/o namespace, "CREATE TEMPORARY FUNCTION func ...". Here "func" can
> > >> >> >>>>>> mean a global temp function, or a temp function in the current
> > >> >> >>>>>> cat/db. If we assume the former, this creates an inconsistency
> > >> >> >>>>>> because "CREATE FUNCTION func" actually means a function in the
> > >> >> >>>>>> current cat/db. If we assume the latter, then there is no way for a
> > >> >> >>>>>> user to create a global temp function.
> > >> >> >>>>>>
> > >> >> >>>>>> Giving a special namespace for built-in functions may solve
> > the
> > >> >> >>>> ambiguity
> > >> >> >>>>>> problem above, but it also introduces artificial
> > >> catalog/database
> > >> >> >>> that
> > >> >> >>>>>> needs special treatment and pollutes the cleanness of  the
> > >> code. I
> > >> >> >>>> would
> > >> >> >>>>>> rather introduce a syntax in DDL to solve the problem, like
> > >> "CREATE
> > >> >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> > >> >> >>>>>>
> > >> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for
> > voting
> > >> >> >>>>> purposes:
> > >> >> >>>>>>
> > >> >> >>>>>> 1. Support only global, temporary functions without namespace. Such
> > >> >> >>>>>> temp functions override built-in functions and catalog functions in
> > >> >> >>>>>> the current cat/db. The resolution order is: temp functions ->
> > >> >> >>>>>> built-in functions -> catalog functions. (Partially or fully
> > >> >> >>>>>> qualified function references have no ambiguity!)
> > >> >> >>>>>>
> > >> >> >>>>>> 2. In addition to #1, support creating and referencing temporary
> > >> >> >>>>>> functions associated with a cat/db, with a "GLOBAL" qualifier in DDL
> > >> >> >>>>>> for global temp functions. The resolution order is: global temp
> > >> >> >>>>>> functions -> built-in functions -> temp functions in current cat/db
> > >> >> >>>>>> -> catalog function. (Resolution for a partially or fully qualified
> > >> >> >>>>>> function reference is: temp functions -> persistent functions.)
> > >> >> >>>>>>
> > >> >> >>>>>> 3. In addition to #1, support creating and referencing temporary
> > >> >> >>>>>> functions associated with a cat/db, with a special namespace for
> > >> >> >>>>>> built-in functions and global temp functions. The resolution is the
> > >> >> >>>>>> same as #2, except that the special namespace might be prefixed to a
> > >> >> >>>>>> reference to a built-in function or global temp function. (In the
> > >> >> >>>>>> absence of the special namespace, the resolution order is the same
> > >> >> >>>>>> as in #2.)
> > >> >> >>>>>>
> > >> >> >>>>>> My personal preference is #1, given the unknown use case and
> > >> >> >>> introduced
> > >> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
> > >> alternative.
> > >> >> >>>> Thus,
> > >> >> >>>>>> my votes are:
> > >> >> >>>>>>
> > >> >> >>>>>> +1 for #1
> > >> >> >>>>>> +0 for #2
> > >> >> >>>>>> -1 for #3
> > >> >> >>>>>>
> > >> >> >>>>>> Everyone, please cast your vote (in above format please!),
> or
> > >> let
> > >> >> >> me
> > >> >> >>>> know
> > >> >> >>>>>> if you have more questions or other candidates.
> > >> >> >>>>>>
> > >> >> >>>>>> Thanks,
> > >> >> >>>>>> Xuefu
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > >> >> >>> aljoscha@apache.org>
> > >> >> >>>>>> wrote:
> > >> >> >>>>>>
> > >> >> >>>>>>> Hi,
> > >> >> >>>>>>>
> > >> >> >>>>>>> I think this discussion and the one for FLIP-64 are closely
> > >> >> >>>>>>> connected. To resolve the differences, I think we have to think
> > >> >> >>>>>>> about the basic principles and find consensus there. The basic
> > >> >> >>>>>>> questions I see are:
> > >> >> >>>>>>>
> > >> >> >>>>>>> - Do we want to support overriding builtin functions?
> > >> >> >>>>>>> - Do we want to support overriding catalog functions?
> > >> >> >>>>>>> - And then later: should temporary functions be tied to a
> > >> >> >>>>>>> catalog/database?
> > >> >> >>>>>>>
> > >> >> >>>>>>> I don’t have much to say about these, except that we should
> > >> >> >>> somewhat
> > >> >> >>>>>> stick
> > >> >> >>>>>>> to what the industry does. But I also understand that the
> > >> >> >> industry
> > >> >> >>> is
> > >> >> >>>>>>> already very divided on this.
> > >> >> >>>>>>>
> > >> >> >>>>>>> Best,
> > >> >> >>>>>>> Aljoscha
> > >> >> >>>>>>>
> > >> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
> > wrote:
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Hi,
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> +1 to strive for reaching consensus on the remaining
> topics.
> > >> We
> > >> >> >>> are
> > >> >> >>>>>>> close to the truth. It will waste a lot of time if we
> resume
> > >> the
> > >> >> >>>> topic
> > >> >> >>>>>> some
> > >> >> >>>>>>> time later.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> > >> >> >>> “cat.db.fun”
> > >> >> >>>>> way
> > >> >> >>>>>>> to override a catalog function.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> > >> >> >>> nonexistent
> > >> >> >>>>> cat
> > >> >> >>>>>>> & db? And we still need to do special treatment for the
> > >> dedicated
> > >> >> >>>>>>> system.system cat & db?
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Best,
> > >> >> >>>>>>>> Jark
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Hi everyone,
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
> > >> >> >>>> incrementally.
> > >> >> >>>>>>> Users should be able to override all catalog objects
> > >> consistently
> > >> >> >>>>>> according
> > >> >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module).
> > If
> > >> >> >>>>> functions
> > >> >> >>>>>>> are treated completely different, we need more code and
> > special
> > >> >> >>>> cases.
> > >> >> >>>>>> From
> > >> >> >>>>>>> an implementation perspective, this topic only affects the
> > >> lookup
> > >> >> >>>> logic
> > >> >> >>>>>>> which is rather low implementation effort which is why I
> > would
> > >> >> >> like
> > >> >> >>>> to
> > >> >> >>>>>>> clarify the remaining items. As you said, we have a slight consensus on
> > >> >> >>>>>>> overriding built-in functions; we should also strive for
> > >> reaching
> > >> >> >>>>>> consensus
> > >> >> >>>>>>> on the remaining topics.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> @Dawid: I like your idea as it ensures registering
> catalog
> > >> >> >>> objects
> > >> >> >>>>>>> consistent and the overriding of built-in functions more
> > >> >> >> explicit.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Thanks,
> > >> >> >>>>>>>>> Timo
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> > >> >> >>>>>>>>>> hi, everyone
> > >> >> >>>>>>>>>> I think this flip is very meaningful. it supports
> > functions
> > >> >> >>> that
> > >> >> >>>>> can
> > >> >> >>>>>> be
> > >> >> >>>>>>>>>> shared by different catalogs and dbs, reducing the
> > >> >> >> duplication
> > >> >> >>> of
> > >> >> >>>>>>> functions.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Our group has implemented a CREATE FUNCTION feature based on
> > >> >> >>>>>>>>>> Flink's SQL parser module. It stores the parsed function
> > >> >> >>>>>>>>>> metadata and schema in MySQL, and we also customized the
> > >> >> >>>>>>>>>> catalog and the SQL client to support custom schemas and
> > >> >> >>>>>>>>>> functions. The functions are loaded, but they are currently
> > >> >> >>>>>>>>>> global and not subdivided by catalog and db.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> In addition, I very much hope to participate in the
> > >> >> >> development
> > >> >> >>>> of
> > >> >> >>>>>> this
> > >> >> >>>>>>>>>> flip, I have been paying attention to the community, but
> > >> >> >> found
> > >> >> >>> it
> > >> >> >>>>> is
> > >> >> >>>>>>> more
> > >> >> >>>>>>>>>> difficult to join.
> > >> >> >>>>>>>>>> thank you.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> It seems to me that there is a general consensus on
> > having
> > >> >> >>> temp
> > >> >> >>>>>>> functions
> > >> >> >>>>>>>>>>> that have no namespaces and overwrite built-in
> functions.
> > >> >> >> (As
> > >> >> >>> a
> > >> >> >>>>> side
> > >> >> >>>>>>> note
> > >> >> >>>>>>>>>>> for comparability, the current user defined functions
> are
> > >> >> >> all
> > >> >> >>>>>>> temporary and
> > >> >> >>>>>>>>>>> having no namespaces.)
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having
> > namespaced
> > >> >> >>> temp
> > >> >> >>>>>>> functions
> > >> >> >>>>>>>>>>> that can overwrite functions defined in a specific
> > cat/db.
> > >> >> >>>>> However,
> > >> >> >>>>>>> this
> > >> >> >>>>>>>>>>> idea appears orthogonal to the former and can be added
> > >> >> >>>>>> incrementally.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> How about we first implement non-namespaced temp
> > functions
> > >> >> >> now
> > >> >> >>>> and
> > >> >> >>>>>>> leave
> > >> >> >>>>>>>>>>> the door open for namespaced ones for later releases as
> > the
> > >> >> >>>>>>> requirement
> > >> >> >>>>>>>>>>> might become more crystal? This also helps shorten the
> > >> >> >> debate
> > >> >> >>>> and
> > >> >> >>>>>>> allow us
> > >> >> >>>>>>>>>>> to make some progress along this direction.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
> > >> >> >>>>>>>>>>> temp functions that don't have namespaces, my only concern is
> > >> >> >>>>>>>>>>> the special treatment for a cat/db, which makes the code less
> > >> >> >>>>>>>>>>> clean, as is evident in how the built-in catalog is treated
> > >> >> >>>>>>>>>>> currently.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> Thanks,
> > >> >> >>>>>>>>>>> Xuefu
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > >> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> > >> >> >>>>>>>>>>> wrote:
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>>> Hi,
> > >> >> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion.
> > How
> > >> >> >>> about
> > >> >> >>>>> we
> > >> >> >>>>>>> have a
> > >> >> >>>>>>>>>>>> special namespace (catalog + database) for built-in
> > >> >> >> objects?
> > >> >> >>>> This
> > >> >> >>>>>>> catalog
> > >> >> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> Then users could still override built-in functions, if
> > >> they
> > >> >> >>>> fully
> > >> >> >>>>>>> qualify
> > >> >> >>>>>>>>>>>> object with the built-in namespace, but by default the
> > >> >> >> common
> > >> >> >>>>> logic
> > >> >> >>>>>>> of
> > >> >> >>>>>>>>>>>> current dB & cat would be used.
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> > >> >> >>>>>>>>>>>> registers temporary function in current cat & dB
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > >> >> >>>>>>>>>>>> registers temporary function in cat db
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > >> >> >>>>>>>>>>>> Overrides built-in function with temporary function
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> The built-in/system namespace would not be writable
> for
> > >> >> >>>> permanent
> > >> >> >>>>>>>>>>> objects.
> > >> >> >>>>>>>>>>>> WDYT?
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> This way I think we can have benefits of both
> solutions.
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> Best,
> > >> >> >>>>>>>>>>>> Dawid
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> > >> >> >> twalthr@apache.org
> > >> >> >>>>
> > >> >> >>>>>> wrote:
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> Hi Bowen,
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> I understand the potential benefit of overriding
> > certain
> > >> >> >>>>> built-in
> > >> >> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
> > >> >> >> agree.
> > >> >> >>>>>>> However, it
> > >> >> >>>>>>>>>>>>> would be great to still support overriding catalog
> > >> >> >> functions
> > >> >> >>>>> with
> > >> >> >>>>>>>>>>>>> temporary functions in order to prototype a query
> even
> > >> >> >>> though
> > >> >> >>>> a
> > >> >> >>>>>>>>>>>>> catalog/database might not be available currently or
> > >> >> >> should
> > >> >> >>>> not
> > >> >> >>>>> be
> > >> >> >>>>>>>>>>>>> modified yet. How about we support both cases?
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> > >> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never considers the
> > >> >> >>>>>>>>>>>>> current catalog and database; inconsistent with other DDL but
> > >> >> >>>>>>>>>>>>> acceptable for functions, I guess.
> > >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > >> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in
> objects
> > >> >> >>>> (tables,
> > >> >> >>>>>>> views)
> > >> >> >>>>>>>>>>>>> except functions", this might change in the near
> > future.
> > >> >> >>> Take
> > >> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as
> > an
> > >> >> >>>>> example.
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> Thanks,
> > >> >> >>>>>>>>>>>>> Timo
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > >> >> >>>>>>>>>>>>>> Hi Fabian,
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least
> favorable
> > >> >> >>> thus I
> > >> >> >>>>>>> didn't
> > >> >> >>>>>>>>>>>>>> include that as a voting option, and the discussion
> is
> > >> >> >>> mainly
> > >> >> >>>>>>> between
> > >> >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> > builtin.
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are
> > >> >> >> differently
> > >> >> >>>>>> treated
> > >> >> >>>>>>>>>>> than
> > >> >> >>>>>>>>>>>>>> other db objects.
> > >> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact
> > that
> > >> >> >>>>>> functions
> > >> >> >>>>>>>>>>> are
> > >> >> >>>>>>>>>>>> a
> > >> >> >>>>>>>>>>>>>> bit different from other objects - Flink don't have
> > any
> > >> >> >>> other
> > >> >> >>>>>>>>>>> built-in
> > >> >> >>>>>>>>>>>>>> objects (tables, views) except functions.
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> Cheers,
> > >> >> >>>>>>>>>>>>>> Bowen
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> --
> > >> >> >>>>>>>>>>> Xuefu Zhang
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> "In Honey We Trust!"
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>
> > >> >> >>>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> --
> > >> >> >>>>>> Xuefu Zhang
> > >> >> >>>>>>
> > >> >> >>>>>> "In Honey We Trust!"
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>
> > >> >> >>>>
> > >> >> >>>> --
> > >> >> >>>> Xuefu Zhang
> > >> >> >>>>
> > >> >> >>>> "In Honey We Trust!"
> > >> >> >>>>
> > >> >> >>>
> > >> >> >>
> > >> >> >>
> > >> >> >> --
> > >> >> >> Xuefu Zhang
> > >> >> >>
> > >> >> >> "In Honey We Trust!"
> > >> >> >>
> > >> >>
> > >> >>
> > >>
> > >
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
Hi Fabian,

I think it's almost the same as #2, just with a different keyword:

CREATE TEMPORARY BUILTIN FUNCTION xxx
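
For reference, the lookup order that Xuefu listed for #2 can be sketched
roughly as below. This is a plain Python illustration, not Flink's
FunctionCatalog: the function name, the dictionary-based tables, and the rule
that a two-part name is completed with the current catalog are all assumptions
made for the example.

```python
def resolve(reference, current, global_temp, builtins, temp_functions, catalog_functions):
    """Resolve a function reference using the order described for option #2.

    reference: "func", "db.func" or "cat.db.func"
    current: (catalog, database) of the session
    global_temp / builtins: dicts keyed by plain function name
    temp_functions / catalog_functions: dicts keyed by (cat, db, name)
    """
    parts = reference.split(".")
    if len(parts) == 1:
        # Ambiguous reference: global temp -> built-in ->
        # temp in current cat/db -> catalog function.
        qualified = (current[0], current[1], parts[0])
        candidates = [
            ("global temp", global_temp, parts[0]),
            ("built-in", builtins, parts[0]),
            ("temp in current cat/db", temp_functions, qualified),
            ("catalog", catalog_functions, qualified),
        ]
    else:
        if len(parts) == 2:
            # Assumption: a partially qualified name uses the current catalog.
            parts = [current[0]] + parts
        qualified = tuple(parts)
        # Qualified reference: temp functions -> persistent functions.
        candidates = [
            ("temp", temp_functions, qualified),
            ("persistent", catalog_functions, qualified),
        ]
    for kind, table, key in candidates:
        if key in table:
            return kind, table[key]
    raise LookupError(f"function not found: {reference}")
```

With this model, the BUILTIN (or GLOBAL) keyword only decides which table a
CREATE TEMPORARY statement writes to; references need no keyword or special
namespace.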

Best,
Kurt


On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fh...@gmail.com> wrote:

> Hi,
>
> I thought about it a bit more and think that there is some good value in my
> last proposal.
>
> A lot of complexity comes from the fact that we want to allow overriding
> built-in functions, which are addressed differently from other functions
> (and db objects).
> We could just have "CREATE TEMPORARY FUNCTION" do exactly the same thing as
> "CREATE FUNCTION" and treat both functions exactly the same except that:
> 1) temp functions disappear at the end of the session
> 2) temp functions are resolved before other functions
>
> This would be Dawid's proposal from the beginning of this thread (in case
> you still remember... ;-) )
>
> Temporarily overriding built-in functions would be supported with an
> explicit command like
>
> ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
>
> This would also address the concerns about accidentally changing the
> semantics of built-in functions.
> IMO, it can't get much more explicit than the above command.
>
> Sorry for bringing up a new option in the middle of the discussion, but as
> I said, I think it has a bunch of benefits and I don't see major drawbacks
> (maybe you do?).
>
> What do you think?
>
> Fabian
>
> On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <fhueske@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I thought again about option #1 and something that I don't like is that
> > the resolved address of xyz is different in "CREATE FUNCTION xyz" and
> > "CREATE TEMPORARY FUNCTION xyz".
> > IMO, adding the keyword "TEMPORARY" should only change the lifecycle of
> > the function, but not where it is located. This implicitly changed
> > location might be confusing for users.
> > After all, a temp function should behave pretty much like any other
> > function, except for the fact that it disappears when the session is
> > closed.
> >
> > Approach #2 with the additional keyword would make that pretty clear,
> > IMO.
> > However, I like neither GLOBAL (for reasons mentioned by Dawid) nor
> > BUILTIN (we are not adding a built-in function).
> > So I'd be OK with #2 if we find a good keyword. In fact, approach #2
> > could also be an alias for approach #3 to avoid explicit specification
> > of the system catalog/db.
> >
> > Approach #3 would be consistent with other db objects and the "CREATE
> > FUNCTION" statement.
> > Adding system catalog/db seems rather complex, but then again how often
> > do we expect users to override built-in functions? If this becomes a
> > major issue, we can still add option #2 as an alias.
> >
> > Not sure what's the best approach from an internal point of view, but I
> > certainly think that consistent behavior is important.
> > Hence my votes are:
> >
> > -1 for #1
> > 0 for #2
> > 0 for #3
> >
> > Btw. Did we consider a completely separate command for overriding
> > built-in functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
> >
> > Cheers, Fabian
> >
> >
> > On Thu, Sep 19, 2019 at 11:03 AM JingsongLee
> > <lz...@aliyun.com.invalid> wrote:
> >
> >> I know Hive and Spark can shadow built-in functions with temporary
> >> functions.
> >> MySQL, Oracle, and SQL Server cannot shadow them.
> >> Users can use full names to access functions instead of shadowing.
> >>
> >> So I think it is a completely new thing, and the direct way to deal with
> >> new things is to add new grammar. So,
> >> +1 for #2, +0 for #3, -1 for #1
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >>
> >> ------------------------------------------------------------------
> >> From:Kurt Young <yk...@gmail.com>
> >> Send Time: 2019-09-19 (Thursday) 16:43
> >> To:dev <de...@flink.apache.org>
> >> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> >>
> >> And let me make my vote complete:
> >>
> >> -1 for #1
> >> +1 for #2 with different keyword
> >> -0 for #3
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:
> >>
> >> > Looks like I'm the only person who is willing to +1 #2 for now :-)
> >> > But I would suggest changing the keyword from GLOBAL to
> >> > something like BUILTIN.
> >> >
> >> > I think #2 and #3 are almost the same proposal, just with a different
> >> > format to indicate whether the user wants to override built-in
> >> > functions.
> >> >
> >> > My biggest reason to choose it is that I want this behavior to be
> >> > consistent with temporary tables. I will give some examples to show
> >> > the behavior and also make sure I'm not misunderstanding anything here.
> >> >
> >> > For most DBs, when a user creates a temporary table with:
> >> >
> >> > CREATE TEMPORARY TABLE t1
> >> >
> >> > It's actually equivalent to:
> >> >
> >> > CREATE TEMPORARY TABLE `current_db`.t1
> >> >
> >> > If the user changes the current database, they will not be able to
> >> > access t1 without a fully qualified name, i.e. db1.t1 (assuming db1
> >> > was the current database when this temporary table was created).
> >> >
> >> > Only #2 and #3 follow this behavior, and I would vote for it since
> >> > this makes the behavior consistent across temporary tables and
> >> > functions.
> >> >
> >> > The reason I'm not voting for #3 is that a special catalog and
> >> > database just looks very hacky to me. It implies that our built-in
> >> > functions are stored in a special catalog and database, which is
> >> > actually not the case. Introducing a dedicated keyword like
> >> > CREATE TEMPORARY BUILTIN FUNCTION looks clearer and more
> >> > straightforward. One can argue that we should avoid introducing a new
> >> > keyword, but it's also very rare that a system can overwrite built-in
> >> > functions. Since we decided to support this, introducing a new keyword
> >> > is not a big deal IMO.
> >> >
> >> > Best,
> >> > Kurt
> >> >
> >> >
> >> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> It is a quite long discussion to follow and I hope I didn’t
> >> >> misunderstand anything. From the proposals presented by Xuefu I would
> >> >> vote:
> >> >>
> >> >> -1 for #1 and #2
> >> >> +1 for #3
> >> >>
> >> >> Besides #3 being IMO more general and more consistent, having
> >> >> qualified names (#3) would make it easier for someone to use
> >> >> cross-database/catalog queries (joining multiple data sets/streams),
> >> >> for example with some functions to manipulate/clean up/convert the
> >> >> stored data in different catalogs registered in the respective
> >> >> catalogs.
> >> >>
> >> >> Piotrek
> >> >>
> >> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> >> >> >
> >> >> > I agree with Xuefu that inconsistent handling with all the other
> >> >> objects is
> >> >> > not a big problem.
> >> >> >
> >> >> > Regarding to option#3, the special "system.system" namespace may
> >> confuse
> >> >> > users.
> >> >> > Users need to know the set of built-in function names to know when
> to
> >> >> use
> >> >> > "system.system" namespace.
> >> >> > What will happen if user registers a non-builtin function name
> under
> >> the
> >> >> > "system.system" namespace?
> >> >> > Besides, I think it doesn't solve the "explode" problem I mentioned
> >> at
> >> >> the
> >> >> > beginning of this thread.
> >> >> >
> >> >> > So here is my vote:
> >> >> >
> >> >> > +1 for #1
> >> >> > 0 for #2
> >> >> > -1 for #3
> >> >> >
> >> >> > Best,
> >> >> > Jark
> >> >> >
> >> >> >
> >> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
> >> >> >
> >> >> >> @Dawid, Re: we also don't need additional referencing the
> >> >> specialcatalog
> >> >> >> anywhere.
> >> >> >>
> >> >> >> True. But once we allow such reference, then user can do so in any
> >> >> possible
> >> >> >> place where a function name is expected, for which we have to
> >> handle.
> >> >> >> That's a big difference, I think.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Xuefu
> >> >> >>
> >> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> >> >> >> wysakowicz.dawid@gmail.com>
> >> >> >> wrote:
> >> >> >>
> >> >> >>> @Bowen I am not suggesting introducing additional catalog. I
> think
> >> we
> >> >> >> need
> >> >> >>> to get rid of the current built-in catalog.
> >> >> >>>
> >> >> >>> @Xuefu in option #3 we also don't need additional referencing the
> >> >> special
> >> >> >>> catalog anywhere else besides in the CREATE statement. The
> >> resolution
> >> >> >>> behaviour is exactly the same in both options.
> >> >> >>>
> >> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
> >> >> >>>
> >> >> >>>> Hi Dawid,
> >> >> >>>>
> >> >> >>>> "GLOBAL" is a temporary keyword that was given to the approach.
> It
> >> >> can
> >> >> >> be
> >> >> >>>> changed to something else for better.
> >> >> >>>>
> >> >> >>>> The difference between this and the #3 approach is that we only
> >> need
> >> >> >> the
> >> >> >>>> keyword for this create DDL. For other places (such as function
> >> >> >>>> referencing), no keyword or special namespace is needed.
> >> >> >>>>
> >> >> >>>> Thanks,
> >> >> >>>> Xuefu
> >> >> >>>>
> >> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> >> >> >>>> wysakowicz.dawid@gmail.com>
> >> >> >>>> wrote:
> >> >> >>>>
> >> >> >>>>> Hi,
> >> >> >>>>> I think it makes sense to start voting at this point.
> >> >> >>>>>
> >> >> >>>>> Option 1: Only 1-part identifiers
> >> >> >>>>> PROS:
> >> >> >>>>> - allows shadowing built-in functions
> >> >> >>>>> CONS:
> >> >> >>>>> - inconsistent with all the other objects, both permanent &
> >> >> >>>>> temporary
> >> >> >>>>> - does not allow shadowing catalog functions
> >> >> >>>>>
> >> >> >>>>> Option 2: Special keyword for built-in function
> >> >> >>>>> I think this is quite similar to the special catalog/db. The
> >> thing I
> >> >> >> am
> >> >> >>>>> strongly against in this proposal is the GLOBAL keyword. This
> >> >> keyword
> >> >> >>>> has a
> >> >> >>>>> meaning in rdbms systems and means a function that is present
> >> for a
> >> >> >>>>> lifetime of a session in which it was created, but available in
> >> all
> >> >> >>> other
> >> >> >>>>> sessions. Therefore I really don't want to use this keyword in
> a
> >> >> >>>> different
> >> >> >>>>> context.
> >> >> >>>>>
> >> >> >>>>> Option 3: Special catalog/db
> >> >> >>>>>
> >> >> >>>>> PROS:
> >> >> >>>>> - allows shadowing built-in functions
> >> >> >>>>> - allows shadowing catalog functions
> >> >> >>>>> - consistent with other objects
> >> >> >>>>> CONS:
> >> >> >>>>> - we introduce a special namespace for built-in functions
> >> >> >>>>>
> >> >> >>>>> I don't see a problem with introducing the special namespace.
> In
> >> the
> >> >> >>> end
> >> >> >>>> it
> >> >> >>>>> is very similar to the keyword approach. In this case the
> >> catalog/db
> >> >> >>>>> combination would be the "keyword"
> >> >> >>>>>
> >> >> >>>>> Therefore my votes:
> >> >> >>>>> Option 1: -0
> >> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a
> >> better
> >> >> >>>> keyword)
> >> >> >>>>> Option 3: +1
> >> >> >>>>>
> >> >> >>>>> Best,
> >> >> >>>>> Dawid
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >> >> >>>>>
> >> >> >>>>>> Hi Aljoscha,
> >> >> >>>>>>
> >> >> >>>>>> Thanks for the summary and these are great questions to be
> >> >> >> answered.
> >> >> >>>> The
> >> >> >>>>>> answer to your first question is clear: there is a general
> >> >> >> agreement
> >> >> >>> to
> >> >> >>>>>> override built-in functions with temp functions.
> >> >> >>>>>>
> >> >> >>>>>> However, your second and third questions are sort of related,
> >> as a
> >> >> >>>>> function
> >> >> >>>>>> reference can be either just function name (like "func") or in
> >> the
> >> >> >>> form
> >> >> >>>>> or
> >> >> >>>>>> "cat.db.func". When a reference is just function name, it can
> >> mean
> >> >> >>>>> either a
> >> >> >>>>>> built-in function or a function defined in the current cat/db.
> >> If
> >> >> >> we
> >> >> >>>>>> support overriding a built-in function with a temp function,
> >> such
> >> >> >>>>>> overriding can also cover a function in the current cat/db.
> >> >> >>>>>>
> >> >> >>>>>> I think what Timo referred as "overriding a catalog function"
> >> >> >> means a
> >> >> >>>>> temp
> >> >> >>>>>> function defined as "cat.db.func" overrides a catalog function
> >> >> >> "func"
> >> >> >>>> in
> >> >> >>>>>> cat/db even if cat/db is not current. To support this, temp
> >> >> >> function
> >> >> >>>> has
> >> >> >>>>> to
> >> >> >>>>>> be tied to a cat/db. What's why I said above that the 2nd and
> >> 3rd
> >> >> >>>>> questions
> >> >> >>>>>> are related. The problem with such support is the ambiguity
> when
> >> >> >> user
> >> >> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION
> >> func
> >> >> >>> ...".
> >> >> >>>>>> Here "func" can means a global temp function, or a temp
> >> function in
> >> >> >>>>> current
> >> >> >>>>>> cat/db. If we can assume the former, this creates an
> >> inconsistency
> >> >> >>>>> because
> >> >> >>>>>> "CREATE FUNCTION func" actually means a function in current
> >> cat/db.
> >> >> >>> If
> >> >> >>>> we
> >> >> >>>>>> assume the latter, then there is no way for user to create a
> >> global
> >> >> >>>> temp
> >> >> >>>>>> function.
> >> >> >>>>>>
> >> >> >>>>>> Giving a special namespace for built-in functions may solve
> the
> >> >> >>>> ambiguity
> >> >> >>>>>> problem above, but it also introduces artificial
> >> catalog/database
> >> >> >>> that
> >> >> >>>>>> needs special treatment and pollutes the cleanness of  the
> >> code. I
> >> >> >>>> would
> >> >> >>>>>> rather introduce a syntax in DDL to solve the problem, like
> >> "CREATE
> >> >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> >> >> >>>>>>
> >> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for
> voting
> >> >> >>>>> purposes:
> >> >> >>>>>>
> >> >> >>>>>> 1. Support only global, temporary functions without namespace.
> >> Such
> >> >> >>>> temp
> >> >> >>>>>> functions overrides built-in functions and catalog functions
> in
> >> >> >>> current
> >> >> >>>>>> cat/db. The resolution order is: temp functions -> built-in
> >> >> >> functions
> >> >> >>>> ->
> >> >> >>>>>> catalog functions. (Partially or fully qualified functions has
> >> no
> >> >> >>>>>> ambiguity!)
> >> >> >>>>>>
> >> >> >>>>>> 2. In addition to #1, support creating and referencing
> temporary
> >> >> >>>>> functions
> >> >> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for
> >> global
> >> >> >>> temp
> >> >> >>>>>> functions. The resolution order is: global temp functions ->
> >> >> >> built-in
> >> >> >>>>>> functions -> temp functions in current cat/db -> catalog
> >> function.
> >> >> >>>>>> (Resolution for partially or fully qualified function
> reference
> >> is:
> >> >> >>>> temp
> >> >> >>>>>> functions -> persistent functions.)
> >> >> >>>>>>
> >> >> >>>>>> 3. In addition to #1, support creating and referencing
> temporary
> >> >> >>>>> functions
> >> >> >>>>>> associated with a cat/db with a special namespace for built-in
> >> >> >>>> functions
> >> >> >>>>>> and global temp functions. The resolution is the same as #2,
> >> except
> >> >> >>>> that
> >> >> >>>>>> the special namespace might be prefixed to a reference to a
> >> >> >> built-in
> >> >> >>>>>> function or global temp function. (In absence of the special
> >> >> >>> namespace,
> >> >> >>>>> the
> >> >> >>>>>> resolution order is the same as in #2.)
> >> >> >>>>>>
> >> >> >>>>>> My personal preference is #1, given the unknown use case and
> >> >> >>> introduced
> >> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
> >> alternative.
> >> >> >>>> Thus,
> >> >> >>>>>> my votes are:
> >> >> >>>>>>
> >> >> >>>>>> +1 for #1
> >> >> >>>>>> +0 for #2
> >> >> >>>>>> -1 for #3
> >> >> >>>>>>
> >> >> >>>>>> Everyone, please cast your vote (in above format please!), or
> >> let
> >> >> >> me
> >> >> >>>> know
> >> >> >>>>>> if you have more questions or other candidates.
> >> >> >>>>>>
> >> >> >>>>>> Thanks,
> >> >> >>>>>> Xuefu
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> >> >> >>> aljoscha@apache.org>
> >> >> >>>>>> wrote:
> >> >> >>>>>>
> >> >> >>>>>>> Hi,
> >> >> >>>>>>>
> >> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
> >> >> >> connected.
> >> >> >>>> To
> >> >> >>>>>>> resolve the differences, think we have to think about the
> basic
> >> >> >>>>>> principles
> >> >> >>>>>>> and find consensus there. The basic questions I see are:
> >> >> >>>>>>>
> >> >> >>>>>>> - Do we want to support overriding builtin functions?
> >> >> >>>>>>> - Do we want to support overriding catalog functions?
> >> >> >>>>>>> - And then later: should temporary functions be tied to a
> >> >> >>>>>>> catalog/database?
> >> >> >>>>>>>
> >> >> >>>>>>> I don’t have much to say about these, except that we should
> >> >> >>> somewhat
> >> >> >>>>>> stick
> >> >> >>>>>>> to what the industry does. But I also understand that the
> >> >> >> industry
> >> >> >>> is
> >> >> >>>>>>> already very divided on this.
> >> >> >>>>>>>
> >> >> >>>>>>> Best,
> >> >> >>>>>>> Aljoscha
> >> >> >>>>>>>
> >> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com>
> wrote:
> >> >> >>>>>>>>
> >> >> >>>>>>>> Hi,
> >> >> >>>>>>>>
> >> >> >>>>>>>> +1 to strive for reaching consensus on the remaining topics.
> >> We
> >> >> >>> are
> >> >> >>>>>>> close to the truth. It will waste a lot of time if we resume
> >> the
> >> >> >>>> topic
> >> >> >>>>>> some
> >> >> >>>>>>> time later.
> >> >> >>>>>>>>
> >> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> >> >> >>> “cat.db.fun”
> >> >> >>>>> way
> >> >> >>>>>>> to override a catalog function.
> >> >> >>>>>>>>
> >> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> >> >> >>> nonexistent
> >> >> >>>>> cat
> >> >> >>>>>>> & db? And we still need to do special treatment for the
> >> dedicated
> >> >> >>>>>>> system.system cat & db?
> >> >> >>>>>>>>
> >> >> >>>>>>>> Best,
> >> >> >>>>>>>> Jark
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Hi everyone,
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
> >> >> >>>> incrementally.
> >> >> >>>>>>> Users should be able to override all catalog objects
> >> consistently
> >> >> >>>>>> according
> >> >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module).
> If
> >> >> >>>>> functions
> >> >> >>>>>>> are treated completely different, we need more code and
> special
> >> >> >>>> cases.
> >> >> >>>>>> From
> >> >> >>>>>>> an implementation perspective, this topic only affects the
> >> lookup
> >> >> >>>> logic
> >> >> >>>>>>> which is rather low implementation effort which is why I
> would
> >> >> >> like
> >> >> >>>> to
> >> >> >>>>>>> clarify the remaining items. As you said, we have a slight
> >> >> >> consenus
> >> >> >>>> on
> >> >> >>>>>>> overriding built-in functions; we should also strive for
> >> reaching
> >> >> >>>>>> consensus
> >> >> >>>>>>> on the remaining topics.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog
> >> >> >>> objects
> >> >> >>>>>>> consistent and the overriding of built-in functions more
> >> >> >> explicit.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Thanks,
> >> >> >>>>>>>>> Timo
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> >> >> >>>>>>>>>> hi, everyone
> >> >> >>>>>>>>>> I think this flip is very meaningful. it supports
> functions
> >> >> >>> that
> >> >> >>>>> can
> >> >> >>>>>> be
> >> >> >>>>>>>>>> shared by different catalogs and dbs, reducing the
> >> >> >> duplication
> >> >> >>> of
> >> >> >>>>>>> functions.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> Our group based on flink's sql parser module implements
> >> >> >> create
> >> >> >>>>>> function
> >> >> >>>>>>>>>> feature, stores the parsed function metadata and schema
> into
> >> >> >>>> mysql,
> >> >> >>>>>> and
> >> >> >>>>>>>>>> also customizes the catalog, customizes sql-client to
> >> support
> >> >> >>>>> custom
> >> >> >>>>>>>>>> schemas and functions. Loaded, but the function is
> currently
> >> >> >>>>> global,
> >> >> >>>>>>> and is
> >> >> >>>>>>>>>> not subdivided according to catalog and db.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> In addition, I very much hope to participate in the
> >> >> >> development
> >> >> >>>> of
> >> >> >>>>>> this
> >> >> >>>>>>>>>> flip, I have been paying attention to the community, but
> >> >> >> found
> >> >> >>> it
> >> >> >>>>> is
> >> >> >>>>>>> more
> >> >> >>>>>>>>>> difficult to join.
> >> >> >>>>>>>>>> thank you.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> Xuefu Z <us...@gmail.com> wrote on Tue, Sep 17, 2019 at 11:19 AM:
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> It seems to me that there is a general consensus on
> having
> >> >> >>> temp
> >> >> >>>>>>> functions
> >> >> >>>>>>>>>>> that have no namespaces and overwrite built-in functions.
> >> >> >> (As
> >> >> >>> a
> >> >> >>>>> side
> >> >> >>>>>>> note
> >> >> >>>>>>>>>>> for comparability, the current user defined functions are
> >> >> >> all
> >> >> >>>>>>> temporary and
> >> >> >>>>>>>>>>> having no namespaces.)
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having
> namespaced
> >> >> >>> temp
> >> >> >>>>>>> functions
> >> >> >>>>>>>>>>> that can overwrite functions defined in a specific
> cat/db.
> >> >> >>>>> However,
> >> >> >>>>>>> this
> >> >> >>>>>>>>>>> idea appears orthogonal to the former and can be added
> >> >> >>>>>> incrementally.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> How about we first implement non-namespaced temp
> functions
> >> >> >> now
> >> >> >>>> and
> >> >> >>>>>>> leave
> >> >> >>>>>>>>>>> the door open for namespaced ones for later releases as
> the
> >> >> >>>>>>> requirement
> >> >> >>>>>>>>>>> might become more crystal? This also helps shorten the
> >> >> >> debate
> >> >> >>>> and
> >> >> >>>>>>> allow us
> >> >> >>>>>>>>>>> to make some progress along this direction.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host
> the
> >> >> >>>>>> temporary
> >> >> >>>>>>> temp
> >> >> >>>>>>>>>>> functions that don't have namespaces, my only concern is
> >> the
> >> >> >>>>> special
> >> >> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
> >> >> >>> evident
> >> >> >>>> in
> >> >> >>>>>>> treating
> >> >> >>>>>>>>>>> the built-in catalog currently.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> Thanks,
> >> >> >>>>>>>>>>> Xuefiu
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >> >> >>>>>>>>>>> wrote:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>> Hi,
> >> >> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion.
> How
> >> >> >>> about
> >> >> >>>>> we
> >> >> >>>>>>> have a
> >> >> >>>>>>>>>>>> special namespace (catalog + database) for built-in
> >> >> >> objects?
> >> >> >>>> This
> >> >> >>>>>>> catalog
> >> >> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> Then users could still override built-in functions, if
> >> they
> >> >> >>>> fully
> >> >> >>>>>>> qualify
> >> >> >>>>>>>>>>>> object with the built-in namespace, but by default the
> >> >> >> common
> >> >> >>>>> logic
> >> >> >>>>>>> of
> >> >> >>>>>>>>>>>> current dB & cat would be used.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> >> >> >>>>>>>>>>>> registers temporary function in current cat & dB
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >> >> >>>>>>>>>>>> registers temporary function in cat db
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >> >> >>>>>>>>>>>> Overrides built-in function with temporary function
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> The built-in/system namespace would not be writable for
> >> >> >>>> permanent
> >> >> >>>>>>>>>>> objects.
> >> >> >>>>>>>>>>>> WDYT?
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> Best,
> >> >> >>>>>>>>>>>> Dawid
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> >> >> >> twalthr@apache.org
> >> >> >>>>
> >> >> >>>>>> wrote:
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> Hi Bowen,
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> I understand the potential benefit of overriding
> certain
> >> >> >>>>> built-in
> >> >> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
> >> >> >> agree.
> >> >> >>>>>>> However, it
> >> >> >>>>>>>>>>>>> would be great to still support overriding catalog
> >> >> >> functions
> >> >> >>>>> with
> >> >> >>>>>>>>>>>>> temporary functions in order to prototype a query even
> >> >> >>> though
> >> >> >>>> a
> >> >> >>>>>>>>>>>>> catalog/database might not be available currently or
> >> >> >> should
> >> >> >>>> not
> >> >> >>>>> be
> >> >> >>>>>>>>>>>>> modified yet. How about we support both cases?
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> >> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never
> >> >> >>>>>>>>>>>>> considers the current
> >> >> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
> >> >> >>>>>>>>>>>>> acceptable for
> >> >> >>>>>>>>>>>>> functions I guess.
> >> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
> >> >> >>>> (tables,
> >> >> >>>>>>> views)
> >> >> >>>>>>>>>>>>> except functions", this might change in the near
> future.
> >> >> >>> Take
> >> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as
> an
> >> >> >>>>> example.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> Thanks,
> >> >> >>>>>>>>>>>>> Timo
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >> >> >>>>>>>>>>>>>> Hi Fabian,
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
> >> >> >>> thus I
> >> >> >>>>>>> didn't
> >> >> >>>>>>>>>>>>>> include that as a voting option, and the discussion is
> >> >> >>> mainly
> >> >> >>>>>>> between
> >> >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override
> builtin.
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are
> >> >> >> differently
> >> >> >>>>>> treated
> >> >> >>>>>>>>>>> than
> >> >> >>>>>>>>>>>>>> other db objects.
> >> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact
> that
> >> >> >>>>>> functions
> >> >> >>>>>>>>>>> are
> >> >> >>>>>>>>>>>> a
> >> >> >>>>>>>>>>>>>> bit different from other objects - Flink don't have
> any
> >> >> >>> other
> >> >> >>>>>>>>>>> built-in
> >> >> >>>>>>>>>>>>>> objects (tables, views) except functions.
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> Cheers,
> >> >> >>>>>>>>>>>>>> Bowen
> >> >> >>>>>>>>>>>>>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

I thought about it a bit more and think that there is some good value in my
last proposal.

A lot of complexity comes from the fact that we want to allow overriding
built-in functions, which are addressed differently from other functions
(and db objects).
We could just have "CREATE TEMPORARY FUNCTION" do exactly the same thing as
"CREATE FUNCTION" and treat both functions exactly the same except that:
1) temp functions disappear at the end of the session
2) temp functions are resolved before other functions

This would be Dawid's proposal from the beginning of this thread (in case
you still remember... ;-) )

Temporarily overriding built-in functions would be supported with an
explicit command like

ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...

This would also address the concerns about accidentally changing the
semantics of built-in functions.
IMO, it can't get much more explicit than the above command.
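
To make the idea a bit more concrete, here is a rough Python sketch of the
lookup this proposal implies. It is only an illustration, not Flink's
FunctionCatalog: the class SessionFunctions and all of its methods are
hypothetical names. It also assumes that, for unqualified names, built-ins
(and their temporary overrides) are checked before functions in the current
catalog/database, which the proposal above does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (catalog, database, function name)


@dataclass
class SessionFunctions:
    """Hypothetical per-session registry; not Flink's actual API."""
    current_catalog: str
    current_database: str
    builtin: Dict[str, str] = field(default_factory=dict)
    builtin_overrides: Dict[str, str] = field(default_factory=dict)
    persistent: Dict[Key, str] = field(default_factory=dict)
    temporary: Dict[Key, str] = field(default_factory=dict)

    def _qualify(self, name: str) -> Key:
        # Expand 1-, 2- or 3-part names against the current catalog/database.
        parts = name.split(".")
        if len(parts) == 1:
            return (self.current_catalog, self.current_database, parts[0])
        if len(parts) == 2:
            return (self.current_catalog, parts[0], parts[1])
        if len(parts) == 3:
            return (parts[0], parts[1], parts[2])
        raise ValueError(f"invalid function name: {name}")

    def create_function(self, name: str, impl: str, temporary: bool = False) -> None:
        # CREATE [TEMPORARY] FUNCTION: same address either way, only the
        # lifecycle differs.
        target = self.temporary if temporary else self.persistent
        target[self._qualify(name)] = impl

    def alter_builtin_temporarily(self, name: str, impl: str) -> None:
        # ALTER BUILTIN FUNCTION name TEMPORARILY AS impl
        if name not in self.builtin:
            raise KeyError(f"not a built-in function: {name}")
        self.builtin_overrides[name] = impl

    def resolve(self, name: str) -> Optional[str]:
        if "." not in name:
            # Assumption: built-ins win for unqualified names, so a temp
            # function cannot change their semantics by accident.
            if name in self.builtin_overrides:
                return self.builtin_overrides[name]
            if name in self.builtin:
                return self.builtin[name]
        key = self._qualify(name)
        # A temp function is resolved before the persistent one at the
        # same address.
        if key in self.temporary:
            return self.temporary[key]
        return self.persistent.get(key)

    def close_session(self) -> None:
        # Temp functions and temporary built-in overrides end with the session.
        self.temporary.clear()
        self.builtin_overrides.clear()
```

With this shape, a temp function named like a built-in is only reachable
through its qualified name, and changing what the bare built-in name means
requires the explicit ALTER command.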

Sorry for bringing up a new option in the middle of the discussion, but as
I said, I think it has a bunch of benefits and I don't see major drawbacks
(maybe you do?).

What do you think?

Fabian

On Thu, Sep 19, 2019 at 11:24 AM Fabian Hueske <fhueske@gmail.com> wrote:

> Hi everyone,
>
> I thought again about option #1 and something that I don't like is that
> the resolved address of xyz is different in "CREATE FUNCTION xyz" and
> "CREATE TEMPORARY FUNCTION xyz".
> IMO, adding the keyword "TEMPORARY" should only change the lifecycle of
> the function, but not where it is located. This implicitly changed location
> might be confusing for users.
> After all, a temp function should behave pretty much like any other
> function, except for the fact that it disappears when the session is closed.
>
> Approach #2 with the additional keyword would make that pretty clear, IMO.
> However, I like neither GLOBAL (for reasons mentioned by Dawid) nor BUILTIN
> (we are not adding a built-in function).
> So I'd be OK with #2 if we find a good keyword. In fact, approach #2 could
> also be an alias for approach #3 to avoid explicit specification of the
> system catalog/db.
>
> Approach #3 would be consistent with other db objects and the "CREATE
> FUNCTION" statement.
> Adding system catalog/db seems rather complex, but then again how often do
> we expect users to override built-in functions? If this becomes a major
> issue, we can still add option #2 as an alias.
>
> Not sure what's the best approach from an internal point of view, but I
> certainly think that consistent behavior is important.
> Hence my votes are:
>
> -1 for #1
> 0 for #2
> 0 for #3
>
> Btw. Did we consider a completely separate command for overriding built-in
> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
>
> Cheers, Fabian
>
>
> On Thu, Sep 19, 2019 at 11:03 AM JingsongLee
> <lz...@aliyun.com.invalid> wrote:
>
>> I know Hive and Spark can shadow built-in functions with temporary functions.
>> MySQL, Oracle, and SQL Server cannot shadow them.
>> Users can use full names to access functions instead of shadowing.
>>
>> So I think it is a completely new thing, and the direct way to deal with
>> new things is to add new grammar. So,
>> +1 for #2, +0 for #3, -1 for #1
>>
>> Best,
>> Jingsong Lee
>>
>>
>> ------------------------------------------------------------------
>> From:Kurt Young <yk...@gmail.com>
>> Send Time: 2019-09-19 (Thursday) 16:43
>> To:dev <de...@flink.apache.org>
>> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
>>
>> And let me make my vote complete:
>>
>> -1 for #1
>> +1 for #2 with different keyword
>> -0 for #3
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:
>>
>> > Looks like I'm the only person who is willing to +1 #2 for now :-)
>> > But I would suggest changing the keyword from GLOBAL to
>> > something like BUILTIN.
>> >
>> > I think #2 and #3 are almost the same proposal, just with a different
>> > format to indicate whether the user wants to override built-in functions.
>> >
>> > My biggest reason to choose it is that I want this behavior to be
>> > consistent with temporary tables. I will give some examples to show the
>> > behavior and also make sure I'm not misunderstanding anything here.
>> >
>> > For most DBs, when a user creates a temporary table with:
>> >
>> > CREATE TEMPORARY TABLE t1
>> >
>> > It's actually equivalent to:
>> >
>> > CREATE TEMPORARY TABLE `current_db`.t1
>> >
>> > If the user changes the current database, they will not be able to access
>> > t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was the
>> > current database when this temporary table was created).
>> >
>> > Only #2 and #3 follow this behavior, and I would vote for it since this
>> > makes the behavior consistent across temporary tables and functions.
>> >
>> > The reason I'm not voting for #3 is that a special catalog and database
>> > just looks very hacky to me. It implies that our built-in functions are
>> > stored in a special catalog and database, which is actually not the case.
>> > Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
>> > looks clearer and more straightforward. One can argue that we should
>> > avoid introducing a new keyword, but it's also very rare that a system
>> > can overwrite built-in functions. Since we decided to support this,
>> > introducing a new keyword is not a big deal IMO.
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> It is quite a long discussion to follow and I hope I didn’t misunderstand
>> >> anything. From the proposals presented by Xuefu I would vote:
>> >>
>> >> -1 for #1 and #2
>> >> +1 for #3
>> >>
>> >> Besides #3 being IMO more general and more consistent, having qualified
>> >> names (#3) would make it easier for someone to run cross-database/catalog
>> >> queries (joining multiple data sets/streams), for example with functions
>> >> that manipulate/clean up/convert the stored data in different catalogs
>> >> being registered in the respective catalogs.
>> >>
>> >> Piotrek
>> >>
>> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>> >> >
>> >> > I agree with Xuefu that inconsistent handling with all the other
>> >> > objects is not a big problem.
>> >> >
>> >> > Regarding option #3, the special "system.system" namespace may confuse
>> >> > users.
>> >> > Users need to know the set of built-in function names to know when to
>> >> > use the "system.system" namespace.
>> >> > What will happen if a user registers a non-built-in function name under
>> >> > the "system.system" namespace?
>> >> > Besides, I think it doesn't solve the "explode" problem I mentioned at
>> >> > the beginning of this thread.
>> >> >
>> >> > So here is my vote:
>> >> >
>> >> > +1 for #1
>> >> > 0 for #2
>> >> > -1 for #3
>> >> >
>> >> > Best,
>> >> > Jark
>> >> >
>> >> >
>> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
>> >> >
>> >> >> @Dawid, Re: we also don't need additional referencing the
>> >> >> special catalog anywhere.
>> >> >>
>> >> >> True. But once we allow such a reference, users can use it in any
>> >> >> place where a function name is expected, and we have to handle that.
>> >> >> That's a big difference, I think.
>> >> >>
>> >> >> Thanks,
>> >> >> Xuefu
>> >> >>
>> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>> >> >> wysakowicz.dawid@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> @Bowen I am not suggesting introducing an additional catalog. I think
>> >> >>> we need to get rid of the current built-in catalog.
>> >> >>>
>> >> >>> @Xuefu in option #3 we also don't need to reference the special
>> >> >>> catalog anywhere else besides in the CREATE statement. The resolution
>> >> >>> behaviour is exactly the same in both options.
>> >> >>>
>> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
>> >> >>>
>> >> >>>> Hi Dawid,
>> >> >>>>
>> >> >>>> "GLOBAL" is a placeholder keyword that was given to the approach. It
>> >> >>>> can be changed to something better.
>> >> >>>>
>> >> >>>> The difference between this and the #3 approach is that we only need
>> >> >>>> the keyword for this create DDL. For other places (such as function
>> >> >>>> referencing), no keyword or special namespace is needed.
>> >> >>>>
>> >> >>>> Thanks,
>> >> >>>> Xuefu
>> >> >>>>
>> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>> >> >>>> wysakowicz.dawid@gmail.com>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> Hi,
>> >> >>>>> I think it makes sense to start voting at this point.
>> >> >>>>>
>> >> >>>>> Option 1: Only 1-part identifiers
>> >> >>>>> PROS:
>> >> >>>>> - allows shadowing built-in functions
>> >> >>>>> CONS:
>> >> >>>>> - inconsistent with all the other objects, both permanent & temporary
>> >> >>>>> - does not allow shadowing catalog functions
>> >> >>>>>
>> >> >>>>> Option 2: Special keyword for built-in function
>> >> >>>>> I think this is quite similar to the special catalog/db. The thing I
>> >> >>>>> am strongly against in this proposal is the GLOBAL keyword. This
>> >> >>>>> keyword has a meaning in RDBMS systems: it denotes a function that
>> >> >>>>> is present for the lifetime of the session in which it was created,
>> >> >>>>> but is available in all other sessions. Therefore I really don't want
>> >> >>>>> to use this keyword in a different context.
>> >> >>>>>
>> >> >>>>> Option 3: Special catalog/db
>> >> >>>>>
>> >> >>>>> PROS:
>> >> >>>>> - allows shadowing built-in functions
>> >> >>>>> - allows shadowing catalog functions
>> >> >>>>> - consistent with other objects
>> >> >>>>> CONS:
>> >> >>>>> - we introduce a special namespace for built-in functions
>> >> >>>>>
>> >> >>>>> I don't see a problem with introducing the special namespace. In the
>> >> >>>>> end it is very similar to the keyword approach. In this case the
>> >> >>>>> catalog/db combination would be the "keyword".
>> >> >>>>>
>> >> >>>>> Therefore my votes:
>> >> >>>>> Option 1: -0
>> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>> >> >>>>> keyword)
>> >> >>>>> Option 3: +1
>> >> >>>>>
>> >> >>>>> Best,
>> >> >>>>> Dawid
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>>> Hi Aljoscha,
>> >> >>>>>>
>> >> >>>>>> Thanks for the summary and these are great questions to be
>> >> >>>>>> answered. The answer to your first question is clear: there is a
>> >> >>>>>> general agreement to override built-in functions with temp
>> >> >>>>>> functions.
>> >> >>>>>>
>> >> >>>>>> However, your second and third questions are sort of related, as a
>> >> >>>>>> function reference can be either just a function name (like "func")
>> >> >>>>>> or in the form of "cat.db.func". When a reference is just a function
>> >> >>>>>> name, it can mean either a built-in function or a function defined
>> >> >>>>>> in the current cat/db. If we support overriding a built-in function
>> >> >>>>>> with a temp function, such overriding can also cover a function in
>> >> >>>>>> the current cat/db.
>> >> >>>>>>
>> >> >>>>>> I think what Timo referred to as "overriding a catalog function"
>> >> >>>>>> means that a temp function defined as "cat.db.func" overrides a
>> >> >>>>>> catalog function "func" in cat/db even if cat/db is not current. To
>> >> >>>>>> support this, a temp function has to be tied to a cat/db. That's why
>> >> >>>>>> I said above that the 2nd and 3rd questions are related. The problem
>> >> >>>>>> with such support is the ambiguity when a user defines a function
>> >> >>>>>> w/o namespace, "CREATE TEMPORARY FUNCTION func ...". Here "func" can
>> >> >>>>>> mean a global temp function, or a temp function in the current
>> >> >>>>>> cat/db. If we assume the former, this creates an inconsistency
>> >> >>>>>> because "CREATE FUNCTION func" actually means a function in the
>> >> >>>>>> current cat/db. If we assume the latter, then there is no way for a
>> >> >>>>>> user to create a global temp function.
>> >> >>>>>>
>> >> >>>>>> Giving a special namespace to built-in functions may solve the
>> >> >>>>>> ambiguity problem above, but it also introduces an artificial
>> >> >>>>>> catalog/database that needs special treatment and makes the code
>> >> >>>>>> less clean. I would rather introduce a syntax in DDL to solve the
>> >> >>>>>> problem, like "CREATE [GLOBAL] TEMPORARY FUNCTION func".
>> >> >>>>>>
>> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>> >> >>>>>> purposes:
>> >> >>>>>>
>> >> >>>>>> 1. Support only global, temporary functions without namespace. Such
>> >> >>>>>> temp functions override built-in functions and catalog functions in
>> >> >>>>>> the current cat/db. The resolution order is: temp functions ->
>> >> >>>>>> built-in functions -> catalog functions. (Partially or fully
>> >> >>>>>> qualified function references have no ambiguity!)
>> >> >>>>>>
>> >> >>>>>> 2. In addition to #1, support creating and referencing temporary
>> >> >>>>>> functions associated with a cat/db, with a "GLOBAL" qualifier in DDL
>> >> >>>>>> for global temp functions. The resolution order is: global temp
>> >> >>>>>> functions -> built-in functions -> temp functions in current cat/db
>> >> >>>>>> -> catalog functions. (Resolution for partially or fully qualified
>> >> >>>>>> function references is: temp functions -> persistent functions.)
>> >> >>>>>>
>> >> >>>>>> 3. In addition to #1, support creating and referencing temporary
>> >> >>>>>> functions associated with a cat/db, with a special namespace for
>> >> >>>>>> built-in functions and global temp functions. The resolution is the
>> >> >>>>>> same as #2, except that the special namespace might be prefixed to a
>> >> >>>>>> reference to a built-in function or global temp function. (In the
>> >> >>>>>> absence of the special namespace, the resolution order is the same
>> >> >>>>>> as in #2.)
>> >> >>>>>>
>> >> >>>>>> My personal preference is #1, given the unknown use case and the
>> >> >>>>>> complexity introduced by #2 and #3. However, #2 is an acceptable
>> >> >>>>>> alternative. Thus, my votes are:
>> >> >>>>>>
>> >> >>>>>> +1 for #1
>> >> >>>>>> +0 for #2
>> >> >>>>>> -1 for #3
>> >> >>>>>>
>> >> >>>>>> Everyone, please cast your vote (in the above format please!), or
>> >> >>>>>> let me know if you have more questions or other candidates.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Xuefu
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>> >> >>> aljoscha@apache.org>
>> >> >>>>>> wrote:
>> >> >>>>>>
>> >> >>>>>>> Hi,
>> >> >>>>>>>
>> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
>> >> >>>>>>> connected. To resolve the differences, I think we have to think
>> >> >>>>>>> about the basic principles and find consensus there. The basic
>> >> >>>>>>> questions I see are:
>> >> >>>>>>>
>> >> >>>>>>> - Do we want to support overriding builtin functions?
>> >> >>>>>>> - Do we want to support overriding catalog functions?
>> >> >>>>>>> - And then later: should temporary functions be tied to a
>> >> >>>>>>> catalog/database?
>> >> >>>>>>>
>> >> >>>>>>> I don’t have much to say about these, except that we should
>> >> >>>>>>> somewhat stick to what the industry does. But I also understand
>> >> >>>>>>> that the industry is already very divided on this.
>> >> >>>>>>>
>> >> >>>>>>> Best,
>> >> >>>>>>> Aljoscha
>> >> >>>>>>>
>> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>> Hi,
>> >> >>>>>>>>
>> >> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
>> >> >>>>>>>> are close to the truth. It will waste a lot of time if we resume
>> >> >>>>>>>> the topic some time later.
>> >> >>>>>>>>
>> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
>> >> >>>>>>>> way to override a catalog function.
>> >> >>>>>>>>
>> >> >>>>>>>> I’m not sure about “system.system.fun”: doesn’t it introduce a
>> >> >>>>>>>> nonexistent cat & db? And wouldn’t we still need special treatment
>> >> >>>>>>>> for the dedicated system.system cat & db?
>> >> >>>>>>>>
>> >> >>>>>>>> Best,
>> >> >>>>>>>> Jark
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>> Hi everyone,
>> >> >>>>>>>>>
>> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
>> >> >>>>>>>>> incrementally. Users should be able to override all catalog
>> >> >>>>>>>>> objects consistently according to FLIP-64 (Support for Temporary
>> >> >>>>>>>>> Objects in Table module). If functions are treated completely
>> >> >>>>>>>>> differently, we need more code and special cases. From an
>> >> >>>>>>>>> implementation perspective, this topic only affects the lookup
>> >> >>>>>>>>> logic, which is rather low implementation effort, which is why I
>> >> >>>>>>>>> would like to clarify the remaining items. As you said, we have a
>> >> >>>>>>>>> slight consensus on overriding built-in functions; we should also
>> >> >>>>>>>>> strive for reaching consensus on the remaining topics.
>> >> >>>>>>>>>
>> >> >>>>>>>>> @Dawid: I like your idea as it makes registering catalog objects
>> >> >>>>>>>>> consistent and the overriding of built-in functions more
>> >> >>>>>>>>> explicit.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Thanks,
>> >> >>>>>>>>> Timo
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>> >> >>>>>>>>>> hi, everyone
>> >> >>>>>>>>>> I think this FLIP is very meaningful. It supports functions that
>> >> >>>>>>>>>> can be shared by different catalogs and dbs, reducing the
>> >> >>>>>>>>>> duplication of functions.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Our group has implemented a create function feature based on
>> >> >>>>>>>>>> Flink's SQL parser module. It stores the parsed function metadata
>> >> >>>>>>>>>> and schema in MySQL. We also customized the catalog and the
>> >> >>>>>>>>>> sql-client to support loading custom schemas and functions.
>> >> >>>>>>>>>> However, the functions are currently global and are not
>> >> >>>>>>>>>> subdivided according to catalog and db.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> In addition, I very much hope to participate in the development
>> >> >>>>>>>>>> of this FLIP. I have been paying attention to the community, but
>> >> >>>>>>>>>> found it is rather difficult to join.
>> >> >>>>>>>>>> thank you.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Tue, 17 Sep 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> It seems to me that there is a general consensus on having temp
>> >> >>>>>>>>>>> functions that have no namespaces and overwrite built-in
>> >> >>>>>>>>>>> functions. (As a side note for comparability, the current user
>> >> >>>>>>>>>>> defined functions are all temporary and have no namespaces.)
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced temp
>> >> >>>>>>>>>>> functions that can overwrite functions defined in a specific
>> >> >>>>>>>>>>> cat/db. However, this idea appears orthogonal to the former and
>> >> >>>>>>>>>>> can be added incrementally.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> How about we first implement non-namespaced temp functions now
>> >> >>>>>>>>>>> and leave the door open for namespaced ones in later releases,
>> >> >>>>>>>>>>> as the requirement might become clearer? This also helps shorten
>> >> >>>>>>>>>>> the debate and allows us to make some progress in this direction.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
>> >> >>>>>>>>>>> temp functions that don't have namespaces, my only concern is
>> >> >>>>>>>>>>> the special treatment for a cat/db, which makes the code less
>> >> >>>>>>>>>>> clean, as is evident in how the built-in catalog is treated
>> >> >>>>>>>>>>> currently.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Thanks,
>> >> >>>>>>>>>>> Xuefu
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
>> >> >>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>> Hi,
>> >> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How about
>> >> >>>>>>>>>>>> we have a special namespace (catalog + database) for built-in
>> >> >>>>>>>>>>>> objects? This catalog would be invisible for users as Xuefu was
>> >> >>>>>>>>>>>> suggesting.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Then users could still override built-in functions, if they
>> >> >>>>>>>>>>>> fully qualify the object with the built-in namespace, but by
>> >> >>>>>>>>>>>> default the common logic of current db & cat would be used.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>> >> >>>>>>>>>>>> registers a temporary function in the current cat & db
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>> >> >>>>>>>>>>>> registers a temporary function in cat.db
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>> >> >>>>>>>>>>>> overrides a built-in function with a temporary function
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> The built-in/system namespace would not be writable for
>> >> >>>>>>>>>>>> permanent objects.
>> >> >>>>>>>>>>>> WDYT?
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Best,
>> >> >>>>>>>>>>>> Dawid
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <twalthr@apache.org>
>> >> >>>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Hi Bowen,
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> I understand the potential benefit of overriding certain
>> >> >>>>>>>>>>>>> built-in functions. I'm open to such a feature if many people
>> >> >>>>>>>>>>>>> agree. However, it would be great to still support overriding
>> >> >>>>>>>>>>>>> catalog functions with temporary functions in order to
>> >> >>>>>>>>>>>>> prototype a query even though a catalog/database might not be
>> >> >>>>>>>>>>>>> available currently or should not be modified yet. How about
>> >> >>>>>>>>>>>>> we support both cases?
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never considers
>> >> >>>>>>>>>>>>> current catalog and database; inconsistent with other DDL but
>> >> >>>>>>>>>>>>> acceptable for functions I guess.
>> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Regarding "Flink doesn't have any other built-in objects
>> >> >>>>>>>>>>>>> (tables, views) except functions", this might change in the
>> >> >>>>>>>>>>>>> near future. Take
>> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
>> >> >>>>>>>>>>>>> example.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Thanks,
>> >> >>>>>>>>>>>>> Timo
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>> >> >>>>>>>>>>>>>> Hi Fabian,
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable, thus I
>> >> >>>>>>>>>>>>>> didn't include that as a voting option, and the discussion is
>> >> >>>>>>>>>>>>>> mainly between 1-part/override builtin and 3-part/not override
>> >> >>>>>>>>>>>>>> builtin.
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are treated
>> >> >>>>>>>>>>>>>> differently than other db objects.
>> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
>> >> >>>>>>>>>>>>>> functions are a bit different from other objects - Flink
>> >> >>>>>>>>>>>>>> doesn't have any other built-in objects (tables, views) except
>> >> >>>>>>>>>>>>>> functions.
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> Cheers,
>> >> >>>>>>>>>>>>>> Bowen
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> --
>> >> >>>>>>>>>>> Xuefu Zhang
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> "In Honey We Trust!"
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> Xuefu Zhang
>> >> >>>>>>
>> >> >>>>>> "In Honey We Trust!"
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Xuefu Zhang
>> >> >>>>
>> >> >>>> "In Honey We Trust!"
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Xuefu Zhang
>> >> >>
>> >> >> "In Honey We Trust!"
>> >> >>
>> >>
>> >>
>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
Hi everyone,

I thought again about option #1 and something that I don't like is that the
resolved address of xyz is different in "CREATE FUNCTION xyz" and "CREATE
TEMPORARY FUNCTION xyz".
IMO, adding the keyword "TEMPORARY" should only change the lifecycle of the
function, but not where it is located. This implicitly changed location
might be confusing for users.
After all, a temp function should behave pretty much like any other
function, except for the fact that it disappears when the session is closed.
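
To illustrate the concern, here is a minimal Python sketch of where a 1-part
name would end up under option #1. It is not Flink code: `resolved_address`,
`CURRENT_CATALOG` and `CURRENT_DATABASE` are made-up names, and the
"no namespace" slot for temp functions is modeled as `(None, None, name)`
purely for illustration.

```python
from typing import Optional, Tuple

# Hypothetical session state; the names are illustrative only.
CURRENT_CATALOG = "cat"
CURRENT_DATABASE = "db"

Address = Tuple[Optional[str], Optional[str], str]


def resolved_address(name: str, temporary: bool) -> Address:
    """Where a 1-part function name is registered under option #1."""
    if temporary:
        # Option #1: temp functions are global and carry no namespace.
        return (None, None, name)
    # Permanent functions are registered in the current catalog/database.
    return (CURRENT_CATALOG, CURRENT_DATABASE, name)


permanent = resolved_address("xyz", temporary=False)  # CREATE FUNCTION xyz
temp = resolved_address("xyz", temporary=True)        # CREATE TEMPORARY FUNCTION xyz
print(permanent, temp)
```

The two statements differ only in the TEMPORARY keyword, yet the sketch
returns two different addresses, which is the implicit change of location
described above.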

Approach #2 with the additional keyword would make that pretty clear, IMO.
However, I like neither GLOBAL (for reasons mentioned by Dawid) nor BUILTIN
(we are not adding a built-in function).
So I'd be OK with #2 if we find a good keyword. In fact, approach #2 could
also be an alias for approach #3 to avoid explicit specification of the
system catalog/db.
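
A rough sketch of the alias idea, again in Python and under assumptions: the
keyword is still undecided, so it is modeled as a boolean flag
`overrides_builtin`; "system.system" is only the placeholder namespace used
earlier in this thread; `registration_target` is a hypothetical helper, not a
Flink API. Only 1-part and 3-part names are handled.

```python
from typing import Sequence, Tuple

# Placeholder namespace taken from the discussion; not an existing Flink catalog.
SYSTEM_NAMESPACE = ("system", "system")


def registration_target(
    name_parts: Sequence[str],
    overrides_builtin: bool,
    current: Tuple[str, str],
) -> Tuple[str, str, str]:
    """Map a CREATE TEMPORARY FUNCTION statement to (catalog, database, name)."""
    if overrides_builtin:
        # Keyword form (#2) is treated as an alias for the system namespace (#3).
        if len(name_parts) != 1:
            raise ValueError("the keyword form takes a 1-part name")
        return (SYSTEM_NAMESPACE[0], SYSTEM_NAMESPACE[1], name_parts[0])
    if len(name_parts) == 3:
        # Fully qualified name: used as written.
        return (name_parts[0], name_parts[1], name_parts[2])
    if len(name_parts) == 1:
        # 1-part name: falls back to the current catalog/database.
        return (current[0], current[1], name_parts[0])
    raise ValueError("only 1-part and 3-part names are handled in this sketch")


via_keyword = registration_target(["abs"], True, ("cat", "db"))
via_namespace = registration_target(["system", "system", "abs"], False, ("cat", "db"))
plain = registration_target(["abs"], False, ("cat", "db"))
print(via_keyword, via_namespace, plain)
```

Both spellings end up at the same address, so the keyword would only be
syntactic sugar, while a plain 1-part name stays in the current cat/db.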

Approach #3 would be consistent with other db objects and the "CREATE
FUNCTION" statement.
Adding a system catalog/db seems rather complex, but then again how often do
we expect users to override built-in functions? If this becomes a major
issue, we can still add option #2 as an alias.

Not sure what's the best approach from an internal point of view, but I
certainly think that consistent behavior is important.
Hence my votes are:

-1 for #1
0 for #2
0 for #3

Btw. Did we consider a completely separate command for overriding built-in
functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?

Cheers, Fabian


On Thu, 19 Sept 2019 at 11:03, JingsongLee
<lz...@aliyun.com.invalid> wrote:

> I know Hive and Spark can shadow built-in functions with temporary functions.
> MySQL, Oracle, and SQL Server cannot.
> Users can use full names to access functions instead of shadowing.
>
> So I think it is a completely new thing, and the direct way to deal with
> new things is to add new grammar. So,
> +1 for #2, +0 for #3, -1 for #1
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Kurt Young <yk...@gmail.com>
> Send Time: 2019-09-19 (Thursday) 16:43
> To:dev <de...@flink.apache.org>
> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
>
> And let me make my vote complete:
>
> -1 for #1
> +1 for #2 with different keyword
> -0 for #3
>
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:
>
> > Looks like I'm the only person who is willing to +1 to #2 for now :-)
> > But I would suggest to change the keyword from GLOBAL to
> > something like BUILTIN.
> >
> > I think #2 and #3 are almost the same proposal, just with a different
> > format to indicate whether the user wants to override built-in functions.
> >
> > My biggest reason for choosing it is that I want this behavior to be
> > consistent with temporary tables. I will give some examples to show the
> > behavior and also make sure I'm not misunderstanding anything here.
> >
> > In most DBs, when a user creates a temporary table with:
> >
> > CREATE TEMPORARY TABLE t1
> >
> > it is actually equivalent to:
> >
> > CREATE TEMPORARY TABLE `current_db`.t1
> >
> > If the user changes the current database, they will not be able to access
> > t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was the
> > current database when this temporary table was created).
> >
> > Only #2 and #3 follow this behavior, and I would vote for this since it
> > makes the behavior consistent across temporary tables and functions.
> >
> > The reason I'm not voting for #3 is that a special catalog and database
> > just looks very hacky to me. It implies that our built-in functions are
> > saved in a special catalog and database, which is actually not the case.
> > Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
> > looks clearer and more straightforward. One can argue that we should avoid
> > introducing a new keyword, but it's also very rare that a system can
> > overwrite built-in functions. Since we decided to support this,
> > introducing a new keyword is not a big deal IMO.
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
> > wrote:
> >
> >> Hi,
> >>
> >> It is quite a long discussion to follow and I hope I didn’t misunderstand
> >> anything. From the proposals presented by Xuefu I would vote:
> >>
> >> -1 for #1 and #2
> >> +1 for #3
> >>
> >> Besides #3 being IMO more general and more consistent, having qualified
> >> names (#3) would make it easier for someone to run cross-database/catalog
> >> queries (joining multiple data sets/streams), for example with functions
> >> that manipulate/clean up/convert the stored data in different catalogs
> >> being registered in the respective catalogs.
> >>
> >> Piotrek
> >>
> >> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> >> >
> >> > I agree with Xuefu that inconsistent handling with all the other
> >> > objects is not a big problem.
> >> >
> >> > Regarding option #3, the special "system.system" namespace may confuse
> >> > users.
> >> > Users need to know the set of built-in function names to know when to
> >> > use the "system.system" namespace.
> >> > What will happen if a user registers a non-built-in function name under
> >> > the "system.system" namespace?
> >> > Besides, I think it doesn't solve the "explode" problem I mentioned at
> >> > the beginning of this thread.
> >> >
> >> > So here is my vote:
> >> >
> >> > +1 for #1
> >> > 0 for #2
> >> > -1 for #3
> >> >
> >> > Best,
> >> > Jark
> >> >
> >> >
> >> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
> >> >
> >> >> @Dawid, Re: we also don't need additional referencing the
> >> >> special catalog anywhere.
> >> >>
> >> >> True. But once we allow such a reference, users can use it in any
> >> >> place where a function name is expected, and we have to handle that.
> >> >> That's a big difference, I think.
> >> >>
> >> >> Thanks,
> >> >> Xuefu
> >> >>
> >> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> >> >> wysakowicz.dawid@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> @Bowen I am not suggesting introducing an additional catalog. I think
> >> >>> we need to get rid of the current built-in catalog.
> >> >>>
> >> >>> @Xuefu in option #3 we also don't need to reference the special
> >> >>> catalog anywhere else besides in the CREATE statement. The resolution
> >> >>> behaviour is exactly the same in both options.
> >> >>>
> >> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
> >> >>>
> >> >>>> Hi Dawid,
> >> >>>>
> >> >>>> "GLOBAL" is a placeholder keyword that was given to the approach. It
> >> >>>> can be changed to something better.
> >> >>>>
> >> >>>> The difference between this and the #3 approach is that we only need
> >> >>>> the keyword for this create DDL. For other places (such as function
> >> >>>> referencing), no keyword or special namespace is needed.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Xuefu
> >> >>>>
> >> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> >> >>>> wysakowicz.dawid@gmail.com>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Hi,
> >> >>>>> I think it makes sense to start voting at this point.
> >> >>>>>
> >> >>>>> Option 1: Only 1-part identifiers
> >> >>>>> PROS:
> >> >>>>> - allows shadowing built-in functions
> >> >>>>> CONS:
> >> >>>>> - inconsistent with all the other objects, both permanent & temporary
> >> >>>>> - does not allow shadowing catalog functions
> >> >>>>>
> >> >>>>> Option 2: Special keyword for built-in function
> >> >>>>> I think this is quite similar to the special catalog/db. The thing I
> >> >>>>> am strongly against in this proposal is the GLOBAL keyword. This
> >> >>>>> keyword has a meaning in RDBMS systems: it denotes a function that
> >> >>>>> is present for the lifetime of the session in which it was created,
> >> >>>>> but is available in all other sessions. Therefore I really don't want
> >> >>>>> to use this keyword in a different context.
> >> >>>>>
> >> >>>>> Option 3: Special catalog/db
> >> >>>>>
> >> >>>>> PROS:
> >> >>>>> - allows shadowing built-in functions
> >> >>>>> - allows shadowing catalog functions
> >> >>>>> - consistent with other objects
> >> >>>>> CONS:
> >> >>>>> - we introduce a special namespace for built-in functions
> >> >>>>>
> >> >>>>> I don't see a problem with introducing the special namespace. In the
> >> >>>>> end it is very similar to the keyword approach. In this case the
> >> >>>>> catalog/db combination would be the "keyword".
> >> >>>>>
> >> >>>>> Therefore my votes:
> >> >>>>> Option 1: -0
> >> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better
> >> >>>>> keyword)
> >> >>>>> Option 3: +1
> >> >>>>>
> >> >>>>> Best,
> >> >>>>> Dawid
> >> >>>>>
> >> >>>>>
> >> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> Hi Aljoscha,
> >> >>>>>>
> >> >>>>>> Thanks for the summary and these are great questions to be
> >> >> answered.
> >> >>>> The
> >> >>>>>> answer to your first question is clear: there is a general
> >> >> agreement
> >> >>> to
> >> >>>>>> override built-in functions with temp functions.
> >> >>>>>>
> >> >>>>>> However, your second and third questions are sort of related, as
> a
> >> >>>>> function
> >> >>>>>> reference can be either just function name (like "func") or in
> the
> >> >>> form
> >> >>>>> or
> >> >>>>>> "cat.db.func". When a reference is just function name, it can
> mean
> >> >>>>> either a
> >> >>>>>> built-in function or a function defined in the current cat/db. If
> >> >> we
> >> >>>>>> support overriding a built-in function with a temp function, such
> >> >>>>>> overriding can also cover a function in the current cat/db.
> >> >>>>>>
> >> >>>>>> I think what Timo referred to as "overriding a catalog function"
> >> >> means a
> >> >>>>> temp
> >> >>>>>> function defined as "cat.db.func" overrides a catalog function
> >> >> "func"
> >> >>>> in
> >> >>>>>> cat/db even if cat/db is not current. To support this, temp
> >> >> function
> >> >>>> has
> >> >>>>> to
> >> >>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
> >> >>>>> questions
> >> >>>>>> are related. The problem with such support is the ambiguity when
> >> >> user
> >> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> >> >>> ...".
> >> >>>>>> Here "func" can mean a global temp function, or a temp function
> in
> >> >>>>> current
> >> >>>>>> cat/db. If we assume the former, this creates an
> inconsistency
> >> >>>>> because
> >> >>>>>> "CREATE FUNCTION func" actually means a function in current
> cat/db.
> >> >>> If
> >> >>>> we
> >> >>>>>> assume the latter, then there is no way for user to create a
> global
> >> >>>> temp
> >> >>>>>> function.
> >> >>>>>>
> >> >>>>>> Giving a special namespace for built-in functions may solve the
> >> >>>> ambiguity
> >> >>>>>> problem above, but it also introduces artificial catalog/database
> >> >>> that
> >> >>>>>> needs special treatment and pollutes the cleanness of  the code.
> I
> >> >>>> would
> >> >>>>>> rather introduce a syntax in DDL to solve the problem, like
> "CREATE
> >> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> >> >>>>>>
> >> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting
> >> >>>>> purposes:
> >> >>>>>>
> >> >>>>>> 1. Support only global, temporary functions without namespace.
> Such
> >> >>>> temp
> >> >>>>>> functions override built-in functions and catalog functions in
> >> >>> current
> >> >>>>>> cat/db. The resolution order is: temp functions -> built-in
> >> >> functions
> >> >>>> ->
> >> >>>>>> catalog functions. (Partially or fully qualified functions have no
> >> >>>>>> ambiguity!)
> >> >>>>>>
> >> >>>>>> 2. In addition to #1, support creating and referencing temporary
> >> >>>>> functions
> >> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for
> global
> >> >>> temp
> >> >>>>>> functions. The resolution order is: global temp functions ->
> >> >> built-in
> >> >>>>>> functions -> temp functions in current cat/db -> catalog
> function.
> >> >>>>>> (Resolution for partially or fully qualified function reference
> is:
> >> >>>> temp
> >> >>>>>> functions -> persistent functions.)
> >> >>>>>>
> >> >>>>>> 3. In addition to #1, support creating and referencing temporary
> >> >>>>> functions
> >> >>>>>> associated with a cat/db with a special namespace for built-in
> >> >>>> functions
> >> >>>>>> and global temp functions. The resolution is the same as #2,
> except
> >> >>>> that
> >> >>>>>> the special namespace might be prefixed to a reference to a
> >> >> built-in
> >> >>>>>> function or global temp function. (In absence of the special
> >> >>> namespace,
> >> >>>>> the
> >> >>>>>> resolution order is the same as in #2.)
> >> >>>>>>
> >> >>>>>> My personal preference is #1, given the unknown use case and
> >> >>> introduced
> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
> alternative.
> >> >>>> Thus,
> >> >>>>>> my votes are:
> >> >>>>>>
> >> >>>>>> +1 for #1
> >> >>>>>> +0 for #2
> >> >>>>>> -1 for #3
> >> >>>>>>
> >> >>>>>> Everyone, please cast your vote (in above format please!), or let
> >> >> me
> >> >>>> know
> >> >>>>>> if you have more questions or other candidates.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Xuefu
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> >> >>> aljoscha@apache.org>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>>> Hi,
> >> >>>>>>>
> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
> >> >> connected.
> >> >>>> To
> >> >>>>>>> resolve the differences, I think we have to think about the basic
> >> >>>>>> principles
> >> >>>>>>> and find consensus there. The basic questions I see are:
> >> >>>>>>>
> >> >>>>>>> - Do we want to support overriding builtin functions?
> >> >>>>>>> - Do we want to support overriding catalog functions?
> >> >>>>>>> - And then later: should temporary functions be tied to a
> >> >>>>>>> catalog/database?
> >> >>>>>>>
> >> >>>>>>> I don’t have much to say about these, except that we should
> >> >>> somewhat
> >> >>>>>> stick
> >> >>>>>>> to what the industry does. But I also understand that the
> >> >> industry
> >> >>> is
> >> >>>>>>> already very divided on this.
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> Aljoscha
> >> >>>>>>>
> >> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Hi,
> >> >>>>>>>>
> >> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
> >> >>> are
> >> >>>>>>> close to the truth. It will waste a lot of time if we resume the
> >> >>>> topic
> >> >>>>>> some
> >> >>>>>>> time later.
> >> >>>>>>>>
> >> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> >> >>> “cat.db.fun”
> >> >>>>> way
> >> >>>>>>> to override a catalog function.
> >> >>>>>>>>
> >> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> >> >>> nonexistent
> >> >>>>> cat
> >> >>>>>>> & db? And we still need to do special treatment for the
> dedicated
> >> >>>>>>> system.system cat & db?
> >> >>>>>>>>
> >> >>>>>>>> Best,
> >> >>>>>>>> Jark
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> Hi everyone,
> >> >>>>>>>>>
> >> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
> >> >>>> incrementally.
> >> >>>>>>> Users should be able to override all catalog objects
> consistently
> >> >>>>>> according
> >> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
> >> >>>>> functions
> >> >>>>>>> are treated completely different, we need more code and special
> >> >>>> cases.
> >> >>>>>> From
> >> >>>>>>> an implementation perspective, this topic only affects the
> lookup
> >> >>>> logic
> >> >>>>>>> which is rather low implementation effort which is why I would
> >> >> like
> >> >>>> to
> >> >>>>>>> clarify the remaining items. As you said, we have a slight
> >> >> consensus
> >> >>>> on
> >> >>>>>>> overriding built-in functions; we should also strive for
> reaching
> >> >>>>>> consensus
> >> >>>>>>> on the remaining topics.
> >> >>>>>>>>>
> >> >>>>>>>>> @Dawid: I like your idea as it makes registering catalog
> >> >>> objects
> >> >>>>>>> consistent and the overriding of built-in functions more
> >> >> explicit.
> >> >>>>>>>>>
> >> >>>>>>>>> Thanks,
> >> >>>>>>>>> Timo
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> >> >>>>>>>>>> hi, everyone
> >> >>>>>>>>>> I think this flip is very meaningful. it supports functions
> >> >>> that
> >> >>>>> can
> >> >>>>>> be
> >> >>>>>>>>>> shared by different catalogs and dbs, reducing the
> >> >> duplication
> >> >>> of
> >> >>>>>>> functions.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Our group based on flink's sql parser module implements
> >> >> create
> >> >>>>>> function
> >> >>>>>>>>>> feature, stores the parsed function metadata and schema into
> >> >>>> mysql,
> >> >>>>>> and
> >> >>>>>>>>>> also customizes the catalog, customizes sql-client to support
> >> >>>>> custom
> >> >>>>>>>>>> schemas and functions. Loaded, but the function is currently
> >> >>>>> global,
> >> >>>>>>> and is
> >> >>>>>>>>>> not subdivided according to catalog and db.
> >> >>>>>>>>>>
> >> >>>>>>>>>> In addition, I very much hope to participate in the
> >> >> development
> >> >>>> of
> >> >>>>>> this
> >> >>>>>>>>>> flip, I have been paying attention to the community, but
> >> >> found
> >> >>> it
> >> >>>>> is
> >> >>>>>>> more
> >> >>>>>>>>>> difficult to join.
> >> >>>>>>>>>> thank you.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Xuefu Z <us...@gmail.com> wrote on Tue, Sep 17, 2019 at 11:19 AM:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> It seems to me that there is a general consensus on having
> >> >>> temp
> >> >>>>>>> functions
> >> >>>>>>>>>>> that have no namespaces and overwrite built-in functions.
> >> >> (As
> >> >>> a
> >> >>>>> side
> >> >>>>>>> note
> >> >>>>>>>>>>> for comparability, the current user defined functions are
> >> >> all
> >> >>>>>>> temporary and
> >> >>>>>>>>>>> having no namespaces.)
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
> >> >>> temp
> >> >>>>>>> functions
> >> >>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
> >> >>>>> However,
> >> >>>>>>> this
> >> >>>>>>>>>>> idea appears orthogonal to the former and can be added
> >> >>>>>> incrementally.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> How about we first implement non-namespaced temp functions
> >> >> now
> >> >>>> and
> >> >>>>>>> leave
> >> >>>>>>>>>>> the door open for namespaced ones for later releases as the
> >> >>>>>>> requirement
> >> >>>>>>>>>>> might become more crystal? This also helps shorten the
> >> >> debate
> >> >>>> and
> >> >>>>>>> allow us
> >> >>>>>>>>>>> to make some progress along this direction.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
> >> >>>>>> temporary
> >> >>>>>>> temp
> >> >>>>>>>>>>> functions that don't have namespaces, my only concern is the
> >> >>>>> special
> >> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
> >> >>> evident
> >> >>>> in
> >> >>>>>>> treating
> >> >>>>>>>>>>> the built-in catalog currently.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Thanks,
> >> >>>>>>>>>>> Xuefu
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >> >>>>>>>>>>> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi,
> >> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
> >> >>> about
> >> >>>>> we
> >> >>>>>>> have a
> >> >>>>>>>>>>>> special namespace (catalog + database) for built-in
> >> >> objects?
> >> >>>> This
> >> >>>>>>> catalog
> >> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Then users could still override built-in functions, if they
> >> >>>> fully
> >> >>>>>>> qualify
> >> >>>>>>>>>>>> object with the built-in namespace, but by default the
> >> >> common
> >> >>>>> logic
> >> >>>>>>> of
> >> >>>>>>>>>>>> current dB & cat would be used.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> >> >>>>>>>>>>>> registers temporary function in current cat & dB
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >> >>>>>>>>>>>> registers temporary function in cat db
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >> >>>>>>>>>>>> Overrides built-in function with temporary function
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> The built-in/system namespace would not be writable for
> >> >>>> permanent
> >> >>>>>>>>>>> objects.
> >> >>>>>>>>>>>> WDYT?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Best,
> >> >>>>>>>>>>>> Dawid
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> >> >> twalthr@apache.org
> >> >>>>
> >> >>>>>> wrote:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> Hi Bowen,
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I understand the potential benefit of overriding certain
> >> >>>>> built-in
> >> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
> >> >> agree.
> >> >>>>>>> However, it
> >> >>>>>>>>>>>>> would be great to still support overriding catalog
> >> >> functions
> >> >>>>> with
> >> >>>>>>>>>>>>> temporary functions in order to prototype a query even
> >> >>> though
> >> >>>> a
> >> >>>>>>>>>>>>> catalog/database might not be available currently or
> >> >> should
> >> >>>> not
> >> >>>>> be
> >> >>>>>>>>>>>>> modified yet. How about we support both cases?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> >> >>>>>>>>>>>>> -> creates/overrides a built-in function and never
> >> >>> considers
> >> >>>>>>> current
> >> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
> >> >>>> acceptable
> >> >>>>>> for
> >> >>>>>>>>>>>>> functions I guess.
> >> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >> >>>>>>>>>>>>> -> creates/overrides a catalog function
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
> >> >>>> (tables,
> >> >>>>>>> views)
> >> >>>>>>>>>>>>> except functions", this might change in the near future.
> >> >>> Take
> >> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> >> >>>>> example.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>>> Timo
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >> >>>>>>>>>>>>>> Hi Fabian,
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
> >> >>> thus I
> >> >>>>>>> didn't
> >> >>>>>>>>>>>>>> include that as a voting option, and the discussion is
> >> >>> mainly
> >> >>>>>>> between
> >> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Re > However, it means that temp functions are
> >> >> differently
> >> >>>>>> treated
> >> >>>>>>>>>>> than
> >> >>>>>>>>>>>>>> other db objects.
> >> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
> >> >>>>>> functions
> >> >>>>>>>>>>> are
> >> >>>>>>>>>>>> a
> >> >>>>>>>>>>>>>> bit different from other objects - Flink doesn't have any
> >> >>> other
> >> >>>>>>>>>>> built-in
> >> >>>>>>>>>>>>>> objects (tables, views) except functions.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>>>> Bowen
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> --
> >> >>>>>>>>>>> Xuefu Zhang
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> "In Honey We Trust!"
> >> >>>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Xuefu Zhang
> >> >>>>>>
> >> >>>>>> "In Honey We Trust!"
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Xuefu Zhang
> >> >>>>
> >> >>>> "In Honey We Trust!"
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> Xuefu Zhang
> >> >>
> >> >> "In Honey We Trust!"
> >> >>
> >>
> >>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Piotr Nowojski <pi...@ververica.com>.
After reading Kurt’s reasoning I’m bumping my vote for #2 from -1 to +0, or even +0.5, so my final vote is:

-1 for #1
+0.5 for #2
+1 for #3

Re the confusion about “system_db”: quite a lot of DBs store some meta tables in a system (and often hidden) db/schema, so I don’t think doing the same with built-in functions will be that big of a deal. In the end, for both #2 and #3, users will have to check the documentation for the syntax for overriding built-in functions the first time they want to do it.

Piotrek

> On 19 Sep 2019, at 11:03, JingsongLee <lz...@aliyun.com.INVALID> wrote:
> 
> I know Hive and Spark can shadow built-in functions with temporary functions.
> MySQL, Oracle, and SQL Server cannot shadow them.
> Users can use fully qualified names to access functions instead of shadowing.
> 
> So I think it is a completely new thing, and the direct way to deal with new things is to add new grammar. So,
> +1 for #2, +0 for #3, -1 for #1
> 
> Best,
> Jingsong Lee
> 
> 
> ------------------------------------------------------------------
> From:Kurt Young <yk...@gmail.com>
> Send Time: 2019-09-19 (Thursday) 16:43
> To:dev <de...@flink.apache.org>
> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
> 
> And let me make my vote complete:
> 
> -1 for #1
> +1 for #2 with different keyword
> -0 for #3
> 
> Best,
> Kurt
> 
> 
> On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:
> 
>> Looks like I'm the only person who is willing to +1 to #2 for now :-)
>> But I would suggest changing the keyword from GLOBAL to
>> something like BUILTIN.
>>
>> I think #2 and #3 are almost the same proposal, just with a different
>> format to indicate whether one wants to override built-in functions.
>>
>> My biggest reason for choosing it is that I want this behavior to be
>> consistent with temporary tables. I will give some examples to show the
>> behavior and also make sure I'm not misunderstanding anything here.
>> 
>> For most DBs, when a user creates a temporary table with:
>>
>> CREATE TEMPORARY TABLE t1
>>
>> it's actually equivalent to:
>>
>> CREATE TEMPORARY TABLE `current_db`.t1
>>
>> If the user changes the current database, they will not be able to access t1
>> without a fully qualified name, i.e. db1.t1 (assuming db1 was the current
>> database when this temporary table was created).
>>
>> Only #2 and #3 follow this behavior, and I would vote for it since it keeps
>> such behavior consistent across temporary tables and functions.
>> 
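To make the temporary-table analogy concrete, here is a minimal Python sketch of that binding rule: an unqualified temporary name is tied to whatever database is current at creation time. The `Session` class and its method names are hypothetical and only illustrate the behavior described above; they are not Flink APIs.

```python
class Session:
    """Toy session: a temporary object is bound to the database that is
    current when it is created (hypothetical model, not a Flink API)."""

    def __init__(self, current_db):
        self.current_db = current_db
        self.temp_objects = {}  # (db, name) -> definition

    def create_temporary(self, name, definition):
        # "t1" binds to the current database; "db1.t1" binds to db1.
        db, _, obj = name.rpartition(".")
        self.temp_objects[(db or self.current_db, obj)] = definition

    def lookup(self, name):
        # Unqualified names are resolved against the *current* database.
        db, _, obj = name.rpartition(".")
        return self.temp_objects.get((db or self.current_db, obj))


session = Session(current_db="db1")
session.create_temporary("t1", "temporary table t1")

session.current_db = "db2"
print(session.lookup("t1"))      # no longer visible under the short name
print(session.lookup("db1.t1"))  # still reachable with the qualified name
```

After switching to `db2`, the short name `t1` no longer resolves, while `db1.t1` still does. Options #2 and #3 would give temporary functions the same behavior.
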
>> Why I'm not voting for #3 is that a special catalog and database just looks
>> very hacky to me. It implies that our built-in functions are stored in a
>> special catalog and database, which is actually not the case. Introducing a
>> dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION looks clearer and
>> more straightforward. One can argue that we should avoid introducing a new
>> keyword, but it's also very rare that a system can overwrite built-in
>> functions. Since we decided to support this, introducing a new keyword is
>> not a big deal IMO.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> It is quite a long discussion to follow and I hope I didn’t misunderstand
>>> anything. From the proposals presented by Xuefu I would vote:
>>> 
>>> -1 for #1 and #2
>>> +1 for #3
>>> 
>>> Besides #3 being IMO more general and more consistent, having qualified
>>> names (#3) would make it easier to write cross-database/catalog queries
>>> (joining multiple data sets/streams), for example with functions that
>>> manipulate/clean up/convert the data stored in different catalogs being
>>> registered in the respective catalogs.
>>> 
>>> Piotrek
>>> 
>>>> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>>>> 
>>>> I agree with Xuefu that inconsistent handling with all the other
>>> objects is
>>>> not a big problem.
>>>> 
>>>> Regarding option #3, the special "system.system" namespace may confuse
>>>> users.
>>>> Users need to know the set of built-in function names to know when to
>>> use
>>>> the "system.system" namespace.
>>>> What will happen if a user registers a non-built-in function name under the
>>>> "system.system" namespace?
>>>> Besides, I think it doesn't solve the "explode" problem I mentioned at
>>> the
>>>> beginning of this thread.
>>>> 
>>>> So here is my vote:
>>>> 
>>>> +1 for #1
>>>> 0 for #2
>>>> -1 for #3
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> 
>>>> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
>>>> 
>>>>> @Dawid, Re: we also don't need additional referencing the
>>> special catalog
>>>>> anywhere.
>>>>> 
>>>>> True. But once we allow such a reference, users can use it in any
>>> possible
>>>>> place where a function name is expected, and we have to handle that.
>>>>> That's a big difference, I think.
>>>>> 
>>>>> Thanks,
>>>>> Xuefu
>>>>> 
>>>>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>>>>> wysakowicz.dawid@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> @Bowen I am not suggesting introducing additional catalog. I think we
>>>>> need
>>>>>> to get rid of the current built-in catalog.
>>>>>> 
>>>>>> @Xuefu in option #3 we also don't need additional referencing the
>>> special
>>>>>> catalog anywhere else besides in the CREATE statement. The resolution
>>>>>> behaviour is exactly the same in both options.
>>>>>> 
>>>>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Dawid,
>>>>>>> 
>>>>>>> "GLOBAL" is a temporary keyword that was given to the approach. It
>>> can
>>>>> be
>>>>>>> changed to something better.
>>>>>>> 
>>>>>>> The difference between this and the #3 approach is that we only need
>>>>> the
>>>>>>> keyword for this create DDL. For other places (such as function
>>>>>>> referencing), no keyword or special namespace is needed.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Xuefu
>>>>>>> 
>>>>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> I think it makes sense to start voting at this point.
>>>>>>>> 
>>>>>>>> Option 1: Only 1-part identifiers
>>>>>>>> PROS:
>>>>>>>> - allows shadowing built-in functions
>>>>>>>> CONS:
>>>>>>>> - inconsistent with all the other objects, both permanent & temporary
>>>>>>>> - does not allow shadowing catalog functions
>>>>>>>> 
>>>>>>>> Option 2: Special keyword for built-in function
>>>>>>>> I think this is quite similar to the special catalog/db. The thing I
>>>>> am
>>>>>>>> strongly against in this proposal is the GLOBAL keyword. This
>>> keyword
>>>>>>> has a
>>>>>>>> meaning in rdbms systems and means a function that is present for a
>>>>>>>> lifetime of a session in which it was created, but available in all
>>>>>> other
>>>>>>>> sessions. Therefore I really don't want to use this keyword in a
>>>>>>> different
>>>>>>>> context.
>>>>>>>> 
>>>>>>>> Option 3: Special catalog/db
>>>>>>>> 
>>>>>>>> PROS:
>>>>>>>> - allows shadowing built-in functions
>>>>>>>> - allows shadowing catalog functions
>>>>>>>> - consistent with other objects
>>>>>>>> CONS:
>>>>>>>> - we introduce a special namespace for built-in functions
>>>>>>>> 
>>>>>>>> I don't see a problem with introducing the special namespace. In the
>>>>>> end
>>>>>>> it
>>>>>>>> is very similar to the keyword approach. In this case the catalog/db
>>>>>>>> combination would be the "keyword"
>>>>>>>> 
>>>>>>>> Therefore my votes:
>>>>>>>> Option 1: -0
>>>>>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>>>>>>> keyword)
>>>>>>>> Option 3: +1
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Dawid
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Aljoscha,
>>>>>>>>> 
>>>>>>>>> Thanks for the summary and these are great questions to be
>>>>> answered.
>>>>>>> The
>>>>>>>>> answer to your first question is clear: there is a general
>>>>> agreement
>>>>>> to
>>>>>>>>> override built-in functions with temp functions.
>>>>>>>>> 
>>>>>>>>> However, your second and third questions are sort of related, as a
>>>>>>>> function
>>>>>>>>> reference can be either just function name (like "func") or in the
>>>>>> form
>>>>>>>> of
>>>>>>>>> "cat.db.func". When a reference is just function name, it can mean
>>>>>>>> either a
>>>>>>>>> built-in function or a function defined in the current cat/db. If
>>>>> we
>>>>>>>>> support overriding a built-in function with a temp function, such
>>>>>>>>> overriding can also cover a function in the current cat/db.
>>>>>>>>> 
>>>>>>>>> I think what Timo referred to as "overriding a catalog function"
>>>>> means a
>>>>>>>> temp
>>>>>>>>> function defined as "cat.db.func" overrides a catalog function
>>>>> "func"
>>>>>>> in
>>>>>>>>> cat/db even if cat/db is not current. To support this, temp
>>>>> function
>>>>>>> has
>>>>>>>> to
>>>>>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>>>>>>>> questions
>>>>>>>>> are related. The problem with such support is the ambiguity when
>>>>> user
>>>>>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
>>>>>> ...".
>>>>>>>>> Here "func" can mean a global temp function, or a temp function in
>>>>>>>> current
>>>>>>>>> cat/db. If we assume the former, this creates an inconsistency
>>>>>>>> because
>>>>>>>>> "CREATE FUNCTION func" actually means a function in current cat/db.
>>>>>> If
>>>>>>> we
>>>>>>>>> assume the latter, then there is no way for user to create a global
>>>>>>> temp
>>>>>>>>> function.
>>>>>>>>> 
>>>>>>>>> Giving a special namespace for built-in functions may solve the
>>>>>>> ambiguity
>>>>>>>>> problem above, but it also introduces artificial catalog/database
>>>>>> that
>>>>>>>>> needs special treatment and pollutes the cleanness of  the code. I
>>>>>>> would
>>>>>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>>>>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
>>>>>>>>> 
>>>>>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>>>>>>>> purposes:
>>>>>>>>> 
>>>>>>>>> 1. Support only global, temporary functions without namespace. Such
>>>>>>> temp
>>>>>>>>> functions override built-in functions and catalog functions in
>>>>>> current
>>>>>>>>> cat/db. The resolution order is: temp functions -> built-in
>>>>> functions
>>>>>>> ->
>>>>>>>>> catalog functions. (Partially or fully qualified functions have no
>>>>>>>>> ambiguity!)
>>>>>>>>> 
>>>>>>>>> 2. In addition to #1, support creating and referencing temporary
>>>>>>>> functions
>>>>>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global
>>>>>> temp
>>>>>>>>> functions. The resolution order is: global temp functions ->
>>>>> built-in
>>>>>>>>> functions -> temp functions in current cat/db -> catalog function.
>>>>>>>>> (Resolution for partially or fully qualified function reference is:
>>>>>>> temp
>>>>>>>>> functions -> persistent functions.)
>>>>>>>>> 
>>>>>>>>> 3. In addition to #1, support creating and referencing temporary
>>>>>>>> functions
>>>>>>>>> associated with a cat/db with a special namespace for built-in
>>>>>>> functions
>>>>>>>>> and global temp functions. The resolution is the same as #2, except
>>>>>>> that
>>>>>>>>> the special namespace might be prefixed to a reference to a
>>>>> built-in
>>>>>>>>> function or global temp function. (In absence of the special
>>>>>> namespace,
>>>>>>>> the
>>>>>>>>> resolution order is the same as in #2.)
>>>>>>>>> 
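The resolution orders in #1 and #2 for an unqualified (1-part) reference can be sketched as follows. This is a rough Python illustration, not Flink code: the registry names (`temp`, `global_temp`, `builtin`, `temp_in_db`, `catalog`), the `resolve` helper, and the sample functions are all made up for the example.

```python
# Resolution orders for an unqualified function reference, as listed above.
ORDER_1 = ("temp", "builtin", "catalog")
ORDER_2 = ("global_temp", "builtin", "temp_in_db", "catalog")

# These registries are keyed by (catalog, database, name); the others by name.
NAMESPACED = {"temp_in_db", "catalog"}


def resolve(name, current, order, registries):
    """Return (kind, definition) from the first registry in `order` that
    knows `name`. `current` is the session's (catalog, database) pair."""
    for kind in order:
        registry = registries.get(kind, {})
        key = (*current, name) if kind in NAMESPACED else name
        if key in registry:
            return kind, registry[key]
    raise LookupError(f"function not found: {name}")


# Hypothetical sample data.
registries = {
    "temp": {"abs": "temp abs"},
    "global_temp": {},
    "builtin": {"abs": "built-in abs"},
    "temp_in_db": {
        ("cat", "db", "abs"): "temp abs in cat.db",
        ("cat", "db", "clean"): "temp clean in cat.db",
    },
    "catalog": {("cat", "db", "clean"): "catalog clean"},
}

print(resolve("abs", ("cat", "db"), ORDER_1, registries))
print(resolve("abs", ("cat", "db"), ORDER_2, registries))
print(resolve("clean", ("cat", "db"), ORDER_2, registries))
```

One consequence of the order stated for #2 shows up in the second call: a temp function registered in the current cat/db does not shadow a built-in of the same name for unqualified references; only a global temp function would.
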
>>>>>>>>> My personal preference is #1, given the unknown use case and
>>>>>> introduced
>>>>>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative.
>>>>>>> Thus,
>>>>>>>>> my votes are:
>>>>>>>>> 
>>>>>>>>> +1 for #1
>>>>>>>>> +0 for #2
>>>>>>>>> -1 for #3
>>>>>>>>> 
>>>>>>>>> Everyone, please cast your vote (in above format please!), or let
>>>>> me
>>>>>>> know
>>>>>>>>> if you have more questions or other candidates.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Xuefu
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>>>>>> aljoscha@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I think this discussion and the one for FLIP-64 are very
>>>>> connected.
>>>>>>> To
>>>>>>>>>> resolve the differences, I think we have to think about the basic
>>>>>>>>> principles
>>>>>>>>>> and find consensus there. The basic questions I see are:
>>>>>>>>>> 
>>>>>>>>>> - Do we want to support overriding builtin functions?
>>>>>>>>>> - Do we want to support overriding catalog functions?
>>>>>>>>>> - And then later: should temporary functions be tied to a
>>>>>>>>>> catalog/database?
>>>>>>>>>> 
>>>>>>>>>> I don’t have much to say about these, except that we should
>>>>>> somewhat
>>>>>>>>> stick
>>>>>>>>>> to what the industry does. But I also understand that the
>>>>> industry
>>>>>> is
>>>>>>>>>> already very divided on this.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Aljoscha
>>>>>>>>>> 
>>>>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
>>>>>> are
>>>>>>>>>> close to the truth. It will waste a lot of time if we resume the
>>>>>>> topic
>>>>>>>>> some
>>>>>>>>>> time later.
>>>>>>>>>>> 
>>>>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
>>>>>> “cat.db.fun”
>>>>>>>> way
>>>>>>>>>> to override a catalog function.
>>>>>>>>>>> 
>>>>>>>>>>> I’m not sure about “system.system.fun”, it introduces a
>>>>>> nonexistent
>>>>>>>> cat
>>>>>>>>>> & db? And we still need to do special treatment for the dedicated
>>>>>>>>>> system.system cat & db?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> @Xuefu: I would like to avoid adding too many things
>>>>>>> incrementally.
>>>>>>>>>> Users should be able to override all catalog objects consistently
>>>>>>>>> according
>>>>>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
>>>>>>>> functions
>>>>>>>>>> are treated completely different, we need more code and special
>>>>>>> cases.
>>>>>>>>> From
>>>>>>>>>> an implementation perspective, this topic only affects the lookup
>>>>>>> logic
>>>>>>>>>> which is rather low implementation effort which is why I would
>>>>> like
>>>>>>> to
>>>>>>>>>> clarify the remaining items. As you said, we have a slight
>>>>> consensus
>>>>>>> on
>>>>>>>>>> overriding built-in functions; we should also strive for reaching
>>>>>>>>> consensus
>>>>>>>>>> on the remaining topics.
>>>>>>>>>>>> 
>>>>>>>>>>>> @Dawid: I like your idea as it makes registering catalog
>>>>>> objects
>>>>>>>>>> consistent and the overriding of built-in functions more
>>>>> explicit.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Timo
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>>>>>>>>>>>>> hi, everyone
>>>>>>>>>>>>> I think this flip is very meaningful. it supports functions
>>>>>> that
>>>>>>>> can
>>>>>>>>> be
>>>>>>>>>>>>> shared by different catalogs and dbs, reducing the
>>>>> duplication
>>>>>> of
>>>>>>>>>> functions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Our group based on flink's sql parser module implements
>>>>> create
>>>>>>>>> function
>>>>>>>>>>>>> feature, stores the parsed function metadata and schema into
>>>>>>> mysql,
>>>>>>>>> and
>>>>>>>>>>>>> also customizes the catalog, customizes sql-client to support
>>>>>>>> custom
>>>>>>>>>>>>> schemas and functions. Loaded, but the function is currently
>>>>>>>> global,
>>>>>>>>>> and is
>>>>>>>>>>>>> not subdivided according to catalog and db.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In addition, I very much hope to participate in the
>>>>> development
>>>>>>> of
>>>>>>>>> this
>>>>>>>>>>>>> flip, I have been paying attention to the community, but
>>>>> found
>>>>>> it
>>>>>>>> is
>>>>>>>>>> more
>>>>>>>>>>>>> difficult to join.
>>>>>>>>>>>>> thank you.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Xuefu Z <us...@gmail.com> wrote on Tue, Sep 17, 2019 at 11:19 AM:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It seems to me that there is a general consensus on having
>>>>>> temp
>>>>>>>>>> functions
>>>>>>>>>>>>>> that have no namespaces and overwrite built-in functions.
>>>>> (As
>>>>>> a
>>>>>>>> side
>>>>>>>>>> note
>>>>>>>>>>>>>> for comparability, the current user defined functions are
>>>>> all
>>>>>>>>>> temporary and
>>>>>>>>>>>>>> having no namespaces.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
>>>>>> temp
>>>>>>>>>> functions
>>>>>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
>>>>>>>> However,
>>>>>>>>>> this
>>>>>>>>>>>>>> idea appears orthogonal to the former and can be added
>>>>>>>>> incrementally.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> How about we first implement non-namespaced temp functions
>>>>> now
>>>>>>> and
>>>>>>>>>> leave
>>>>>>>>>>>>>> the door open for namespaced ones for later releases as the
>>>>>>>>>> requirement
>>>>>>>>>>>>>> might become more crystal? This also helps shorten the
>>>>> debate
>>>>>>> and
>>>>>>>>>> allow us
>>>>>>>>>>>>>> to make some progress along this direction.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
>>>>>>>>> temporary
>>>>>>>>>> temp
>>>>>>>>>>>>>> functions that don't have namespaces, my only concern is the
>>>>>>>> special
>>>>>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
>>>>>> evident
>>>>>>> in
>>>>>>>>>> treating
>>>>>>>>>>>>>> the built-in catalog currently.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Xuefu
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>>>>>>>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
>>>>>> about
>>>>>>>> we
>>>>>>>>>> have a
>>>>>>>>>>>>>>> special namespace (catalog + database) for built-in
>>>>> objects?
>>>>>>> This
>>>>>>>>>> catalog
>>>>>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Then users could still override built-in functions, if they
>>>>>>> fully
>>>>>>>>>> qualify
>>>>>>>>>>>>>>> object with the built-in namespace, but by default the
>>>>> common
>>>>>>>> logic
>>>>>>>>>> of
>>>>>>>>>>>>>>> current dB & cat would be used.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>>>>>>>>>>>>>>> registers temporary function in current cat & dB
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>>>>>>>>>>>>> registers temporary function in cat db
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>>>>>>>>>>>>> Overrides built-in function with temporary function
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The built-in/system namespace would not be writable for
>>>>>>> permanent
>>>>>>>>>>>>>> objects.
>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This way I think we can have benefits of both solutions.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>>>>> twalthr@apache.org
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I understand the potential benefit of overriding certain
>>>>>>>> built-in
>>>>>>>>>>>>>>>> functions. I'm open to such a feature if many people
>>>>> agree.
>>>>>>>>>> However, it
>>>>>>>>>>>>>>>> would be great to still support overriding catalog
>>>>> functions
>>>>>>>> with
>>>>>>>>>>>>>>>> temporary functions in order to prototype a query even
>>>>>> though
>>>>>>> a
>>>>>>>>>>>>>>>> catalog/database might not be available currently or
>>>>> should
>>>>>>> not
>>>>>>>> be
>>>>>>>>>>>>>>>> modified yet. How about we support both cases?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>>>>>>>>>>>>>>>> -> creates/overrides a built-in function and never considers current
>>>>>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
>>>>>>> acceptable
>>>>>>>>> for
>>>>>>>>>>>>>>>> functions I guess.
>>>>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>>>>>>>>>>>>>>>> -> creates/overrides a catalog function
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
>>>>>>> (tables,
>>>>>>>>>> views)
>>>>>>>>>>>>>>>> except functions", this might change in the near future.
>>>>>> Take
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
>>>>>>>> example.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>>>>>>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
>>>>>> thus I
>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>> include that as a voting option, and the discussion is
>>>>>> mainly
>>>>>>>>>> between
>>>>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Re > However, it means that temp functions are
>>>>> differently
>>>>>>>>> treated
>>>>>>>>>>>>>> than
>>>>>>>>>>>>>>>>> other db objects.
>>>>>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
>>>>>>>>> functions
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> bit different from other objects - Flink don't have any
>>>>>> other
>>>>>>>>>>>>>> built-in
>>>>>>>>>>>>>>>>> objects (tables, views) except functions.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>> 
>>> 
>>> 


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by JingsongLee <lz...@aliyun.com.INVALID>.
I know Hive and Spark allow a temporary function to shadow a built-in function.
MySQL, Oracle, and SQL Server do not allow shadowing.
Users can use fully qualified names to access functions instead of shadowing.

So I think this is a completely new thing, and the direct way to deal with a
new thing is to add new grammar. So,
+1 for #2, +0 for #3, -1 for #1
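
To make the options being voted on concrete, here is a minimal Python sketch of the resolution order that option #2 describes in the quoted thread below: for an unqualified name, global temp functions -> built-in functions -> temp functions in the current cat/db -> catalog functions; for a qualified name, temp functions -> persistent functions. This is only an illustrative model. `FunctionResolver` and its fields are hypothetical names and not Flink APIs, and resolving a 2-part name as db.func in the current catalog is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# (catalog, database, function): a hypothetical key type for this sketch.
QualifiedName = Tuple[str, str, str]


class FunctionResolver:
    """Toy model of the option #2 lookup order; not a Flink API."""

    def __init__(self, builtins: Dict[str, str],
                 current_catalog: str, current_database: str) -> None:
        self.builtins: Dict[str, str] = dict(builtins)
        self.global_temp: Dict[str, str] = {}            # temp functions without namespace
        self.scoped_temp: Dict[QualifiedName, str] = {}  # temp functions tied to a cat/db
        self.catalog_funcs: Dict[QualifiedName, str] = {}  # persistent catalog functions
        self.current_catalog = current_catalog
        self.current_database = current_database

    def resolve(self, name: str) -> Optional[str]:
        parts = name.split(".")
        if len(parts) == 1:
            # Ambiguous reference: global temp -> built-in -> temp in current
            # cat/db -> catalog function in current cat/db.
            func = parts[0]
            key = (self.current_catalog, self.current_database, func)
            if func in self.global_temp:
                return self.global_temp[func]
            if func in self.builtins:
                return self.builtins[func]
            if key in self.scoped_temp:
                return self.scoped_temp[key]
            return self.catalog_funcs.get(key)
        if len(parts) == 2:
            # Assumption: "db.func" is resolved against the current catalog.
            key = (self.current_catalog, parts[0], parts[1])
        elif len(parts) == 3:
            key = (parts[0], parts[1], parts[2])
        else:
            raise ValueError(f"unsupported function reference: {name!r}")
        # Qualified reference: temp function first, then persistent function.
        if key in self.scoped_temp:
            return self.scoped_temp[key]
        return self.catalog_funcs.get(key)
```

With `scoped_temp` left empty, the same lookup reduces to the order given for option #1 (temp functions -> built-in functions -> catalog functions).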

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <yk...@gmail.com>
Send Time: 2019-09-19 (Thursday) 16:43
To:dev <de...@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

And let me make my vote complete:

-1 for #1
+1 for #2 with different keyword
-0 for #3

Best,
Kurt


On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:

> Looks like I'm the only person who is willing to +1 to #2 for now :-)
> But I would suggest to change the keyword from GLOBAL to
> something like BUILTIN.
>
> I think #2 and #3 are almost the same proposal, just with a different
> format to indicate whether the user wants to override built-in functions.
>
> My biggest reason for choosing it is that I want this behavior to be
> consistent with temporary tables. I will give some examples to show the
> behavior and also make sure I'm not misunderstanding anything here.
>
> For most DBs, when a user creates a temporary table with:
>
> CREATE TEMPORARY TABLE t1
>
> it is actually equivalent to:
>
> CREATE TEMPORARY TABLE `current_db`.t1
>
> If the user changes the current database, they will not be able to access
> t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was the current
> database when this temporary table was created).
>
> Only #2 and #3 follow this behavior, and I would vote for it since it
> makes the behavior consistent across temporary tables and functions.
>
> The reason I'm not voting for #3 is that a special catalog and database
> just looks very hacky to me. It implies that our built-in functions are
> stored in a special catalog and database, which is actually not the case.
> Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
> looks clearer and more straightforward. One can argue that we should avoid
> introducing a new keyword, but it's also very rare that a system can
> overwrite built-in functions. Since we decided to support this, introducing
> a new keyword is not a big deal IMO.
>
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
> wrote:
>
>> Hi,
>>
>> It is quite a long discussion to follow and I hope I didn’t misunderstand
>> anything. From the proposals presented by Xuefu I would vote:
>>
>> -1 for #1 and #2
>> +1 for #3
>>
>> Besides #3 being IMO more general and more consistent, having qualified
>> names (#3) would make it easier for someone to write cross-database/catalog
>> queries (joining multiple data sets/streams), for example using functions
>> that manipulate, clean up, or convert the stored data in different catalogs
>> and that are registered in the respective catalogs.
>>
>> Piotrek
>>
>> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>> >
>> > I agree with Xuefu that inconsistent handling with all the other
>> objects is
>> > not a big problem.
>> >
>> > Regarding option #3, the special "system.system" namespace may confuse
>> > users.
>> > Users need to know the set of built-in function names to know when to
>> use
>> > "system.system" namespace.
>> > What will happen if user registers a non-builtin function name under the
>> > "system.system" namespace?
>> > Besides, I think it doesn't solve the "explode" problem I mentioned at
>> the
>> > beginning of this thread.
>> >
>> > So here is my vote:
>> >
>> > +1 for #1
>> > 0 for #2
>> > -1 for #3
>> >
>> > Best,
>> > Jark
>> >
>> >
>> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
>> >
>> >> @Dawid, Re: we also don't need additional referencing the special catalog
>> >> anywhere.
>> >>
>> >> True. But once we allow such reference, then user can do so in any
>> possible
>> >> place where a function name is expected, for which we have to handle.
>> >> That's a big difference, I think.
>> >>
>> >> Thanks,
>> >> Xuefu
>> >>
>> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>> >> wysakowicz.dawid@gmail.com>
>> >> wrote:
>> >>
>> >>> @Bowen I am not suggesting introducing additional catalog. I think we
>> >> need
>> >>> to get rid of the current built-in catalog.
>> >>>
>> >>> @Xuefu in option #3 we also don't need additional referencing the
>> special
>> >>> catalog anywhere else besides in the CREATE statement. The resolution
>> >>> behaviour is exactly the same in both options.
>> >>>
>> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
>> >>>
>> >>>> Hi Dawid,
>> >>>>
>> >>>> "GLOBAL" is a temporary keyword that was given to the approach. It
>> can
>> >> be
>> >>>> changed to something else for better.
>> >>>>
>> >>>> The difference between this and the #3 approach is that we only need
>> >> the
>> >>>> keyword for this create DDL. For other places (such as function
>> >>>> referencing), no keyword or special namespace is needed.
>> >>>>
>> >>>> Thanks,
>> >>>> Xuefu
>> >>>>
>> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>> >>>> wysakowicz.dawid@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>> I think it makes sense to start voting at this point.
>> >>>>>
>> >>>>> Option 1: Only 1-part identifiers
>> >>>>> PROS:
>> >>>>> - allows shadowing built-in functions
>> >>>>> CONS:
>> >>>>> - inconsistent with all the other objects, both permanent & temporary
>> >>>>> - does not allow shadowing catalog functions
>> >>>>>
>> >>>>> Option 2: Special keyword for built-in function
>> >>>>> I think this is quite similar to the special catalog/db. The thing I
>> >> am
>> >>>>> strongly against in this proposal is the GLOBAL keyword. This
>> keyword
>> >>>> has a
>> >>>>> meaning in rdbms systems and means a function that is present for a
>> >>>>> lifetime of a session in which it was created, but available in all
>> >>> other
>> >>>>> sessions. Therefore I really don't want to use this keyword in a
>> >>>> different
>> >>>>> context.
>> >>>>>
>> >>>>> Option 3: Special catalog/db
>> >>>>>
>> >>>>> PROS:
>> >>>>> - allows shadowing built-in functions
>> >>>>> - allows shadowing catalog functions
>> >>>>> - consistent with other objects
>> >>>>> CONS:
>> >>>>> - we introduce a special namespace for built-in functions
>> >>>>>
>> >>>>> I don't see a problem with introducing the special namespace. In the
>> >>> end
>> >>>> it
>> >>>>> is very similar to the keyword approach. In this case the catalog/db
>> >>>>> combination would be the "keyword"
>> >>>>>
>> >>>>> Therefore my votes:
>> >>>>> Option 1: -0
>> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>> >>>> keyword)
>> >>>>> Option 3: +1
>> >>>>>
>> >>>>> Best,
>> >>>>> Dawid
>> >>>>>
>> >>>>>
>> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi Aljoscha,
>> >>>>>>
>> >>>>>> Thanks for the summary and these are great questions to be
>> >> answered.
>> >>>> The
>> >>>>>> answer to your first question is clear: there is a general
>> >> agreement
>> >>> to
>> >>>>>> override built-in functions with temp functions.
>> >>>>>>
>> >>>>>> However, your second and third questions are sort of related, as a
>> >>>>> function
>> >>>>>> reference can be either just function name (like "func") or in the form of
>> >>>>>> "cat.db.func". When a reference is just function name, it can mean
>> >>>>> either a
>> >>>>>> built-in function or a function defined in the current cat/db. If
>> >> we
>> >>>>>> support overriding a built-in function with a temp function, such
>> >>>>>> overriding can also cover a function in the current cat/db.
>> >>>>>>
>> >>>>>> I think what Timo referred as "overriding a catalog function"
>> >> means a
>> >>>>> temp
>> >>>>>> function defined as "cat.db.func" overrides a catalog function
>> >> "func"
>> >>>> in
>> >>>>>> cat/db even if cat/db is not current. To support this, temp
>> >> function
>> >>>> has
>> >>>>> to
>> >>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>> >>>>> questions
>> >>>>>> are related. The problem with such support is the ambiguity when
>> >> user
>> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
>> >>> ...".
>> >>>>>> Here "func" can means a global temp function, or a temp function in
>> >>>>> current
>> >>>>>> cat/db. If we can assume the former, this creates an inconsistency
>> >>>>> because
>> >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db.
>> >>> If
>> >>>> we
>> >>>>>> assume the latter, then there is no way for user to create a global
>> >>>> temp
>> >>>>>> function.
>> >>>>>>
>> >>>>>> Giving a special namespace for built-in functions may solve the
>> >>>> ambiguity
>> >>>>>> problem above, but it also introduces artificial catalog/database
>> >>> that
>> >>>>>> needs special treatment and pollutes the cleanness of  the code. I
>> >>>> would
>> >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
>> >>>>>>
>> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>> >>>>> purposes:
>> >>>>>>
>> >>>>>> 1. Support only global, temporary functions without namespace. Such
>> >>>> temp
>> >>>>>> functions overrides built-in functions and catalog functions in
>> >>> current
>> >>>>>> cat/db. The resolution order is: temp functions -> built-in
>> >> functions
>> >>>> ->
>> >>>>>> catalog functions. (Partially or fully qualified functions has no
>> >>>>>> ambiguity!)
>> >>>>>>
>> >>>>>> 2. In addition to #1, support creating and referencing temporary
>> >>>>> functions
>> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global
>> >>> temp
>> >>>>>> functions. The resolution order is: global temp functions ->
>> >> built-in
>> >>>>>> functions -> temp functions in current cat/db -> catalog function.
>> >>>>>> (Resolution for partially or fully qualified function reference is:
>> >>>> temp
>> >>>>>> functions -> persistent functions.)
>> >>>>>>
>> >>>>>> 3. In addition to #1, support creating and referencing temporary
>> >>>>> functions
>> >>>>>> associated with a cat/db with a special namespace for built-in
>> >>>> functions
>> >>>>>> and global temp functions. The resolution is the same as #2, except
>> >>>> that
>> >>>>>> the special namespace might be prefixed to a reference to a
>> >> built-in
>> >>>>>> function or global temp function. (In absence of the special
>> >>> namespace,
>> >>>>> the
>> >>>>>> resolution order is the same as in #2.)
>> >>>>>>
>> >>>>>> My personal preference is #1, given the unknown use case and
>> >>> introduced
>> >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative.
>> >>>> Thus,
>> >>>>>> my votes are:
>> >>>>>>
>> >>>>>> +1 for #1
>> >>>>>> +0 for #2
>> >>>>>> -1 for #3
>> >>>>>>
>> >>>>>> Everyone, please cast your vote (in above format please!), or let
>> >> me
>> >>>> know
>> >>>>>> if you have more questions or other candidates.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Xuefu
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>> >>> aljoscha@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I think this discussion and the one for FLIP-64 are very
>> >> connected.
>> >>>> To
>> >>>>>>> resolve the differences, think we have to think about the basic
>> >>>>>> principles
>> >>>>>>> and find consensus there. The basic questions I see are:
>> >>>>>>>
>> >>>>>>> - Do we want to support overriding builtin functions?
>> >>>>>>> - Do we want to support overriding catalog functions?
>> >>>>>>> - And then later: should temporary functions be tied to a
>> >>>>>>> catalog/database?
>> >>>>>>>
>> >>>>>>> I don’t have much to say about these, except that we should
>> >>> somewhat
>> >>>>>> stick
>> >>>>>>> to what the industry does. But I also understand that the
>> >> industry
>> >>> is
>> >>>>>>> already very divided on this.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Aljoscha
>> >>>>>>>
>> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
>> >>> are
>> >>>>>>> close to the truth. It will waste a lot of time if we resume the
>> >>>> topic
>> >>>>>> some
>> >>>>>>> time later.
>> >>>>>>>>
>> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
>> >>> “cat.db.fun”
>> >>>>> way
>> >>>>>>> to override a catalog function.
>> >>>>>>>>
>> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
>> >>> nonexistent
>> >>>>> cat
>> >>>>>>> & db? And we still need to do special treatment for the dedicated
>> >>>>>>> system.system cat & db?
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Jark
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi everyone,
>> >>>>>>>>>
>> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
>> >>>> incrementally.
>> >>>>>>> Users should be able to override all catalog objects consistently
>> >>>>>> according
>> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
>> >>>>> functions
>> >>>>>>> are treated completely different, we need more code and special
>> >>>> cases.
>> >>>>>> From
>> >>>>>>> an implementation perspective, this topic only affects the lookup
>> >>>> logic
>> >>>>>>> which is rather low implementation effort which is why I would
>> >> like
>> >>>> to
>> >>>>>>> clarify the remaining items. As you said, we have a slight
>> >> consensus
>> >>>> on
>> >>>>>>> overriding built-in functions; we should also strive for reaching
>> >>>>>> consensus
>> >>>>>>> on the remaining topics.
>> >>>>>>>>>
>> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog
>> >>> objects
>> >>>>>>> consistent and the overriding of built-in functions more
>> >> explicit.
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Timo
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>> >>>>>>>>>> hi, everyone
>> >>>>>>>>>> I think this flip is very meaningful. it supports functions
>> >>> that
>> >>>>> can
>> >>>>>> be
>> >>>>>>>>>> shared by different catalogs and dbs, reducing the
>> >> duplication
>> >>> of
>> >>>>>>> functions.
>> >>>>>>>>>>
>> >>>>>>>>>> Our group based on flink's sql parser module implements
>> >> create
>> >>>>>> function
>> >>>>>>>>>> feature, stores the parsed function metadata and schema into
>> >>>> mysql,
>> >>>>>> and
>> >>>>>>>>>> also customizes the catalog, customizes sql-client to support
>> >>>>> custom
>> >>>>>>>>>> schemas and functions. Loaded, but the function is currently
>> >>>>> global,
>> >>>>>>> and is
>> >>>>>>>>>> not subdivided according to catalog and db.
>> >>>>>>>>>>
>> >>>>>>>>>> In addition, I very much hope to participate in the
>> >> development
>> >>>> of
>> >>>>>> this
>> >>>>>>>>>> flip, I have been paying attention to the community, but
>> >> found
>> >>> it
>> >>>>> is
>> >>>>>>> more
>> >>>>>>>>>> difficult to join.
>> >>>>>>>>>> thank you.
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Sep 17, 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>> >>>>>>>>>>>
>> >>>>>>>>>>> It seems to me that there is a general consensus on having
>> >>> temp
>> >>>>>>> functions
>> >>>>>>>>>>> that have no namespaces and overwrite built-in functions.
>> >> (As
>> >>> a
>> >>>>> side
>> >>>>>>> note
>> >>>>>>>>>>> for comparability, the current user defined functions are
>> >> all
>> >>>>>>> temporary and
>> >>>>>>>>>>> having no namespaces.)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
>> >>> temp
>> >>>>>>> functions
>> >>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
>> >>>>> However,
>> >>>>>>> this
>> >>>>>>>>>>> idea appears orthogonal to the former and can be added
>> >>>>>> incrementally.
>> >>>>>>>>>>>
>> >>>>>>>>>>> How about we first implement non-namespaced temp functions
>> >> now
>> >>>> and
>> >>>>>>> leave
>> >>>>>>>>>>> the door open for namespaced ones for later releases as the
>> >>>>>>> requirement
>> >>>>>>>>>>> might become more crystal? This also helps shorten the
>> >> debate
>> >>>> and
>> >>>>>>> allow us
>> >>>>>>>>>>> to make some progress along this direction.
>> >>>>>>>>>>>
>> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
>> >>>>>> temporary
>> >>>>>>> temp
>> >>>>>>>>>>> functions that don't have namespaces, my only concern is the
>> >>>>> special
>> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
>> >>> evident
>> >>>> in
>> >>>>>>> treating
>> >>>>>>>>>>> the built-in catalog currently.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Xuefu
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
>> >>> about
>> >>>>> we
>> >>>>>>> have a
>> >>>>>>>>>>>> special namespace (catalog + database) for built-in
>> >> objects?
>> >>>> This
>> >>>>>>> catalog
>> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Then users could still override built-in functions, if they
>> >>>> fully
>> >>>>>>> qualify
>> >>>>>>>>>>>> object with the built-in namespace, but by default the
>> >> common
>> >>>>> logic
>> >>>>>>> of
>> >>>>>>>>>>>> current dB & cat would be used.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>> >>>>>>>>>>>> registers temporary function in current cat & dB
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>> >>>>>>>>>>>> registers temporary function in cat db
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>> >>>>>>>>>>>> Overrides built-in function with temporary function
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The built-in/system namespace would not be writable for
>> >>>> permanent
>> >>>>>>>>>>> objects.
>> >>>>>>>>>>>> WDYT?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best,
>> >>>>>>>>>>>> Dawid
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>> >> twalthr@apache.org
>> >>>>
>> >>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I understand the potential benefit of overriding certain
>> >>>>> built-in
>> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
>> >> agree.
>> >>>>>>> However, it
>> >>>>>>>>>>>>> would be great to still support overriding catalog
>> >> functions
>> >>>>> with
>> >>>>>>>>>>>>> temporary functions in order to prototype a query even
>> >>> though
>> >>>> a
>> >>>>>>>>>>>>> catalog/database might not be available currently or
>> >> should
>> >>>> not
>> >>>>> be
>> >>>>>>>>>>>>> modified yet. How about we support both cases?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>> >>>>>>>>>>>>> -> creates/overrides a built-in function and never considers current
>> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
>> >>>> acceptable
>> >>>>>> for
>> >>>>>>>>>>>>> functions I guess.
>> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>> >>>>>>>>>>>>> -> creates/overrides a catalog function
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
>> >>>> (tables,
>> >>>>>>> views)
>> >>>>>>>>>>>>> except functions", this might change in the near future.
>> >>> Take
>> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
>> >>>>> example.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>> >>>>>>>>>>>>>> Hi Fabian,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
>> >>> thus I
>> >>>>>>> didn't
>> >>>>>>>>>>>>>> include that as a voting option, and the discussion is
>> >>> mainly
>> >>>>>>> between
>> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Re > However, it means that temp functions are
>> >> differently
>> >>>>>> treated
>> >>>>>>>>>>> than
>> >>>>>>>>>>>>>> other db objects.
>> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
>> >>>>>> functions
>> >>>>>>>>>>> are
>> >>>>>>>>>>>> a
>> >>>>>>>>>>>>>> bit different from other objects - Flink don't have any
>> >>> other
>> >>>>>>>>>>> built-in
>> >>>>>>>>>>>>>> objects (tables, views) except functions.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>> Bowen
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Xuefu Zhang
>> >>>>>>>>>>>
>> >>>>>>>>>>> "In Honey We Trust!"
>> >>>>>>>>>>>
>>
>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
And let me make my vote complete:

-1 for #1
+1 for #2 with different keyword
-0 for #3
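
For reference, the temporary-table name binding described in the quoted message below can be sketched in Python. This is a toy model under stated assumptions: `Session` and its methods are hypothetical names (not an API of Flink or of any database), and names are only ever `table` or `db.table`.

```python
from typing import Set, Tuple


class Session:
    """Toy model: an unqualified temp table is bound to the database
    that was current when the table was created."""

    def __init__(self, current_db: str) -> None:
        self.current_db = current_db
        self.temp_tables: Set[Tuple[str, str]] = set()

    def _qualify(self, name: str) -> Tuple[str, str]:
        # "db.table" is taken as given; a bare "table" is expanded with
        # whatever database is current at the time of the call.
        if "." in name:
            db, table = name.split(".", 1)
            return db, table
        return self.current_db, name

    def create_temporary_table(self, name: str) -> None:
        self.temp_tables.add(self._qualify(name))

    def can_access(self, name: str) -> bool:
        return self._qualify(name) in self.temp_tables
```

After `create_temporary_table("t1")` with db1 as the current database, switching the current database makes the bare name `t1` unresolvable in this model, while `db1.t1` still resolves. Options #2 and #3 apply the same binding rule to temporary functions.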

Best,
Kurt


On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <yk...@gmail.com> wrote:

> Looks like I'm the only person who is willing to +1 to #2 for now :-)
> But I would suggest to change the keyword from GLOBAL to
> something like BUILTIN.
>
> I think #2 and #3 are almost the same proposal, just with a different
> format to indicate whether the user wants to override built-in functions.
>
> My biggest reason for choosing it is that I want this behavior to be
> consistent with temporary tables. I will give some examples to show the
> behavior and also make sure I'm not misunderstanding anything here.
>
> For most DBs, when a user creates a temporary table with:
>
> CREATE TEMPORARY TABLE t1
>
> it is actually equivalent to:
>
> CREATE TEMPORARY TABLE `current_db`.t1
>
> If the user changes the current database, they will not be able to access
> t1 without a fully qualified name, i.e. db1.t1 (assuming db1 was the current
> database when this temporary table was created).
>
> Only #2 and #3 follow this behavior, and I would vote for it since it
> makes the behavior consistent across temporary tables and functions.
>
> The reason I'm not voting for #3 is that a special catalog and database
> just looks very hacky to me. It implies that our built-in functions are
> stored in a special catalog and database, which is actually not the case.
> Introducing a dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION
> looks clearer and more straightforward. One can argue that we should avoid
> introducing a new keyword, but it's also very rare that a system can
> overwrite built-in functions. Since we decided to support this, introducing
> a new keyword is not a big deal IMO.
>
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com>
> wrote:
>
>> Hi,
>>
>> It is quite a long discussion to follow and I hope I didn’t misunderstand
>> anything. From the proposals presented by Xuefu I would vote:
>>
>> -1 for #1 and #2
>> +1 for #3
>>
>> Besides #3 being IMO more general and more consistent, having qualified
>> names (#3) would make it easier for someone to write cross-database/catalog
>> queries (joining multiple data sets/streams), for example using functions
>> that manipulate, clean up, or convert the stored data in different catalogs
>> and that are registered in the respective catalogs.
>>
>> Piotrek
>>
>> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
>> >
>> > I agree with Xuefu that inconsistent handling with all the other
>> objects is
>> > not a big problem.
>> >
>> > Regarding option #3, the special "system.system" namespace may confuse
>> > users.
>> > Users need to know the set of built-in function names to know when to
>> use
>> > "system.system" namespace.
>> > What will happen if user registers a non-builtin function name under the
>> > "system.system" namespace?
>> > Besides, I think it doesn't solve the "explode" problem I mentioned at
>> the
>> > beginning of this thread.
>> >
>> > So here is my vote:
>> >
>> > +1 for #1
>> > 0 for #2
>> > -1 for #3
>> >
>> > Best,
>> > Jark
>> >
>> >
>> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
>> >
>> >> @Dawid, Re: we also don't need additional referencing the special catalog
>> >> anywhere.
>> >>
>> >> True. But once we allow such reference, then user can do so in any
>> possible
>> >> place where a function name is expected, for which we have to handle.
>> >> That's a big difference, I think.
>> >>
>> >> Thanks,
>> >> Xuefu
>> >>
>> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>> >> wysakowicz.dawid@gmail.com>
>> >> wrote:
>> >>
>> >>> @Bowen I am not suggesting introducing additional catalog. I think we
>> >> need
>> >>> to get rid of the current built-in catalog.
>> >>>
>> >>> @Xuefu in option #3 we also don't need additional referencing the
>> special
>> >>> catalog anywhere else besides in the CREATE statement. The resolution
>> >>> behaviour is exactly the same in both options.
>> >>>
>> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
>> >>>
>> >>>> Hi Dawid,
>> >>>>
>> >>>> "GLOBAL" is a temporary keyword that was given to the approach. It
>> can
>> >> be
>> >>>> changed to something else for better.
>> >>>>
>> >>>> The difference between this and the #3 approach is that we only need
>> >> the
>> >>>> keyword for this create DDL. For other places (such as function
>> >>>> referencing), no keyword or special namespace is needed.
>> >>>>
>> >>>> Thanks,
>> >>>> Xuefu
>> >>>>
>> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>> >>>> wysakowicz.dawid@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>> I think it makes sense to start voting at this point.
>> >>>>>
>> >>>>> Option 1: Only 1-part identifiers
>> >>>>> PROS:
>> >>>>> - allows shadowing built-in functions
>> >>>>> CONS:
>> >>>>> - inconsistent with all the other objects, both permanent & temporary
>> >>>>> - does not allow shadowing catalog functions
>> >>>>>
>> >>>>> Option 2: Special keyword for built-in function
>> >>>>> I think this is quite similar to the special catalog/db. The thing I
>> >> am
>> >>>>> strongly against in this proposal is the GLOBAL keyword. This
>> keyword
>> >>>> has a
>> >>>>> meaning in rdbms systems and means a function that is present for a
>> >>>>> lifetime of a session in which it was created, but available in all
>> >>> other
>> >>>>> sessions. Therefore I really don't want to use this keyword in a
>> >>>> different
>> >>>>> context.
>> >>>>>
>> >>>>> Option 3: Special catalog/db
>> >>>>>
>> >>>>> PROS:
>> >>>>> - allows shadowing built-in functions
>> >>>>> - allows shadowing catalog functions
>> >>>>> - consistent with other objects
>> >>>>> CONS:
>> >>>>> - we introduce a special namespace for built-in functions
>> >>>>>
>> >>>>> I don't see a problem with introducing the special namespace. In the
>> >>> end
>> >>>> it
>> >>>>> is very similar to the keyword approach. In this case the catalog/db
>> >>>>> combination would be the "keyword"
>> >>>>>
>> >>>>> Therefore my votes:
>> >>>>> Option 1: -0
>> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>> >>>> keyword)
>> >>>>> Option 3: +1
>> >>>>>
>> >>>>> Best,
>> >>>>> Dawid
>> >>>>>
>> >>>>>
>> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi Aljoscha,
>> >>>>>>
>> >>>>>> Thanks for the summary and these are great questions to be
>> >> answered.
>> >>>> The
>> >>>>>> answer to your first question is clear: there is a general
>> >> agreement
>> >>> to
>> >>>>>> override built-in functions with temp functions.
>> >>>>>>
>> >>>>>> However, your second and third questions are sort of related, as a
>> >>>>> function
>> >>>>>> reference can be either just function name (like "func") or in the
>> >>> form
>> >>>>> of
>> >>>>>> "cat.db.func". When a reference is just function name, it can mean
>> >>>>> either a
>> >>>>>> built-in function or a function defined in the current cat/db. If
>> >> we
>> >>>>>> support overriding a built-in function with a temp function, such
>> >>>>>> overriding can also cover a function in the current cat/db.
>> >>>>>>
>> >>>>>> I think what Timo referred as "overriding a catalog function"
>> >> means a
>> >>>>> temp
>> >>>>>> function defined as "cat.db.func" overrides a catalog function
>> >> "func"
>> >>>> in
>> >>>>>> cat/db even if cat/db is not current. To support this, temp
>> >> function
>> >>>> has
>> >>>>> to
>> >>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>> >>>>> questions
>> >>>>>> are related. The problem with such support is the ambiguity when
>> >> user
>> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
>> >>> ...".
>> >>>>>> Here "func" can mean a global temp function, or a temp function in
>> >>>>> current
>> >>>>>> cat/db. If we can assume the former, this creates an inconsistency
>> >>>>> because
>> >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db.
>> >>> If
>> >>>> we
>> >>>>>> assume the latter, then there is no way for user to create a global
>> >>>> temp
>> >>>>>> function.
>> >>>>>>
>> >>>>>> Giving a special namespace for built-in functions may solve the
>> >>>> ambiguity
>> >>>>>> problem above, but it also introduces artificial catalog/database
>> >>> that
>> >>>>>> needs special treatment and pollutes the cleanness of  the code. I
>> >>>> would
>> >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
>> >>>>>>
>> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>> >>>>> purposes:
>> >>>>>>
>> >>>>>> 1. Support only global, temporary functions without namespace. Such
>> >>>> temp
>> >>>>>> functions override built-in functions and catalog functions in
>> >>> current
>> >>>>>> cat/db. The resolution order is: temp functions -> built-in
>> >> functions
>> >>>> ->
>> >>>>>> catalog functions. (Partially or fully qualified functions have no
>> >>>>>> ambiguity!)
>> >>>>>>
>> >>>>>> 2. In addition to #1, support creating and referencing temporary
>> >>>>> functions
>> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global
>> >>> temp
>> >>>>>> functions. The resolution order is: global temp functions ->
>> >> built-in
>> >>>>>> functions -> temp functions in current cat/db -> catalog function.
>> >>>>>> (Resolution for partially or fully qualified function reference is:
>> >>>> temp
>> >>>>>> functions -> persistent functions.)
>> >>>>>>
>> >>>>>> 3. In addition to #1, support creating and referencing temporary
>> >>>>> functions
>> >>>>>> associated with a cat/db with a special namespace for built-in
>> >>>> functions
>> >>>>>> and global temp functions. The resolution is the same as #2, except
>> >>>> that
>> >>>>>> the special namespace might be prefixed to a reference to a
>> >> built-in
>> >>>>>> function or global temp function. (In absence of the special
>> >>> namespace,
>> >>>>> the
>> >>>>>> resolution order is the same as in #2.)
>> >>>>>>
>> >>>>>> My personal preference is #1, given the unknown use case and
>> >>> introduced
>> >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative.
>> >>>> Thus,
>> >>>>>> my votes are:
>> >>>>>>
>> >>>>>> +1 for #1
>> >>>>>> +0 for #2
>> >>>>>> -1 for #3
>> >>>>>>
>> >>>>>> Everyone, please cast your vote (in above format please!), or let
>> >> me
>> >>>> know
>> >>>>>> if you have more questions or other candidates.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Xuefu
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>> >>> aljoscha@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I think this discussion and the one for FLIP-64 are very
>> >> connected.
>> >>>> To
>> >>>>>>> resolve the differences, I think we have to think about the basic
>> >>>>>> principles
>> >>>>>>> and find consensus there. The basic questions I see are:
>> >>>>>>>
>> >>>>>>> - Do we want to support overriding builtin functions?
>> >>>>>>> - Do we want to support overriding catalog functions?
>> >>>>>>> - And then later: should temporary functions be tied to a
>> >>>>>>> catalog/database?
>> >>>>>>>
>> >>>>>>> I don’t have much to say about these, except that we should
>> >>> somewhat
>> >>>>>> stick
>> >>>>>>> to what the industry does. But I also understand that the
>> >> industry
>> >>> is
>> >>>>>>> already very divided on this.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Aljoscha
>> >>>>>>>
>> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
>> >>> are
>> >>>>>>> close to the truth. It will waste a lot of time if we resume the
>> >>>> topic
>> >>>>>> some
>> >>>>>>> time later.
>> >>>>>>>>
>> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
>> >>> “cat.db.fun”
>> >>>>> way
>> >>>>>>> to override a catalog function.
>> >>>>>>>>
>> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
>> >>> nonexistent
>> >>>>> cat
>> >>>>>>> & db? And we still need to do special treatment for the dedicated
>> >>>>>>> system.system cat & db?
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Jark
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi everyone,
>> >>>>>>>>>
>> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
>> >>>> incrementally.
>> >>>>>>> Users should be able to override all catalog objects consistently
>> >>>>>> according
>> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
>> >>>>> functions
>> >>>>>>> are treated completely different, we need more code and special
>> >>>> cases.
>> >>>>>> From
>> >>>>>>> an implementation perspective, this topic only affects the lookup
>> >>>> logic
>> >>>>>>> which is rather low implementation effort which is why I would
>> >> like
>> >>>> to
>> >>>>>>> clarify the remaining items. As you said, we have a slight
>> >> consensus
>> >>>> on
>> >>>>>>> overriding built-in functions; we should also strive for reaching
>> >>>>>> consensus
>> >>>>>>> on the remaining topics.
>> >>>>>>>>>
>> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog
>> >>> objects
>> >>>>>>> consistent and the overriding of built-in functions more
>> >> explicit.
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Timo
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>> >>>>>>>>>> hi, everyone
>> >>>>>>>>>> I think this flip is very meaningful. it supports functions
>> >>> that
>> >>>>> can
>> >>>>>> be
>> >>>>>>>>>> shared by different catalogs and dbs, reducing the
>> >> duplication
>> >>> of
>> >>>>>>> functions.
>> >>>>>>>>>>
>> >>>>>>>>>> Our group based on flink's sql parser module implements
>> >> create
>> >>>>>> function
>> >>>>>>>>>> feature, stores the parsed function metadata and schema into
>> >>>> mysql,
>> >>>>>> and
>> >>>>>>>>>> also customizes the catalog, customizes sql-client to support
>> >>>>> custom
>> >>>>>>>>>> schemas and functions. Loaded, but the function is currently
>> >>>>> global,
>> >>>>>>> and is
>> >>>>>>>>>> not subdivided according to catalog and db.
>> >>>>>>>>>>
>> >>>>>>>>>> In addition, I very much hope to participate in the
>> >> development
>> >>>> of
>> >>>>>> this
>> >>>>>>>>>> flip, I have been paying attention to the community, but
>> >> found
>> >>> it
>> >>>>> is
>> >>>>>>> more
>> >>>>>>>>>> difficult to join.
>> >>>>>>>>>> thank you.
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, 17 Sep 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>> >>>>>>>>>>>
>> >>>>>>>>>>> It seems to me that there is a general consensus on having
>> >>> temp
>> >>>>>>> functions
>> >>>>>>>>>>> that have no namespaces and overwrite built-in functions.
>> >> (As
>> >>> a
>> >>>>> side
>> >>>>>>> note
>> >>>>>>>>>>> for comparability, the current user defined functions are
>> >> all
>> >>>>>>> temporary and
>> >>>>>>>>>>> having no namespaces.)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
>> >>> temp
>> >>>>>>> functions
>> >>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
>> >>>>> However,
>> >>>>>>> this
>> >>>>>>>>>>> idea appears orthogonal to the former and can be added
>> >>>>>> incrementally.
>> >>>>>>>>>>>
>> >>>>>>>>>>> How about we first implement non-namespaced temp functions
>> >> now
>> >>>> and
>> >>>>>>> leave
>> >>>>>>>>>>> the door open for namespaced ones for later releases as the
>> >>>>>>> requirement
>> >>>>>>>>>>> might become more crystal? This also helps shorten the
>> >> debate
>> >>>> and
>> >>>>>>> allow us
>> >>>>>>>>>>> to make some progress along this direction.
>> >>>>>>>>>>>
>> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
>> >>>>>> temporary
>> >>>>>>> temp
>> >>>>>>>>>>> functions that don't have namespaces, my only concern is the
>> >>>>> special
>> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
>> >>> evident
>> >>>> in
>> >>>>>>> treating
>> >>>>>>>>>>> the built-in catalog currently.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Xuefu
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
>> >>> about
>> >>>>> we
>> >>>>>>> have a
>> >>>>>>>>>>>> special namespace (catalog + database) for built-in
>> >> objects?
>> >>>> This
>> >>>>>>> catalog
>> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Then users could still override built-in functions, if they
>> >>>> fully
>> >>>>>>> qualify
>> >>>>>>>>>>>> object with the built-in namespace, but by default the
>> >> common
>> >>>>> logic
>> >>>>>>> of
>> >>>>>>>>>>>> current dB & cat would be used.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>> >>>>>>>>>>>> registers temporary function in current cat & dB
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>> >>>>>>>>>>>> registers temporary function in cat db
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>> >>>>>>>>>>>> Overrides built-in function with temporary function
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The built-in/system namespace would not be writable for
>> >>>> permanent
>> >>>>>>>>>>> objects.
>> >>>>>>>>>>>> WDYT?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best,
>> >>>>>>>>>>>> Dawid
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>> >> twalthr@apache.org
>> >>>>
>> >>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Bowen,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I understand the potential benefit of overriding certain
>> >>>>> built-in
>> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
>> >> agree.
>> >>>>>>> However, it
>> >>>>>>>>>>>>> would be great to still support overriding catalog
>> >> functions
>> >>>>> with
>> >>>>>>>>>>>>> temporary functions in order to prototype a query even
>> >>> though
>> >>>> a
>> >>>>>>>>>>>>> catalog/database might not be available currently or
>> >> should
>> >>>> not
>> >>>>> be
>> >>>>>>>>>>>>> modified yet. How about we support both cases?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>> >>>>>>>>>>>>> -> creates/overrides a built-in function and never
>> >>> considers
>> >>>>>>> current
>> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
>> >>>> acceptable
>> >>>>>> for
>> >>>>>>>>>>>>> functions I guess.
>> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>> >>>>>>>>>>>>> -> creates/overrides a catalog function
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
>> >>>> (tables,
>> >>>>>>> views)
>> >>>>>>>>>>>>> except functions", this might change in the near future.
>> >>> Take
>> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
>> >>>>> example.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>> >>>>>>>>>>>>>> Hi Fabian,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
>> >>> thus I
>> >>>>>>> didn't
>> >>>>>>>>>>>>>> include that as a voting option, and the discussion is
>> >>> mainly
>> >>>>>>> between
>> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Re > However, it means that temp functions are
>> >> differently
>> >>>>>> treated
>> >>>>>>>>>>> than
>> >>>>>>>>>>>>>> other db objects.
>> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
>> >>>>>> functions
>> >>>>>>>>>>> are
>> >>>>>>>>>>>> a
>> >>>>>>>>>>>>>> bit different from other objects - Flink don't have any
>> >>> other
>> >>>>>>>>>>> built-in
>> >>>>>>>>>>>>>> objects (tables, views) except functions.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>> Bowen
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Xuefu Zhang
>> >>>>>>>>>>>
>> >>>>>>>>>>> "In Honey We Trust!"
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Xuefu Zhang
>> >>>>>>
>> >>>>>> "In Honey We Trust!"
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Xuefu Zhang
>> >>>>
>> >>>> "In Honey We Trust!"
>> >>>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Xuefu Zhang
>> >>
>> >> "In Honey We Trust!"
>> >>
>>
>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
Looks like I'm the only person who is willing to +1 #2 for now :-)
But I would suggest changing the keyword from GLOBAL to
something like BUILTIN.

I think #2 and #3 are almost the same proposal, just with a different
format for indicating whether the user wants to override built-in functions.

My biggest reason for choosing it is that I want this behavior to be
consistent with temporary tables. I will give some examples to show the
behavior and also to make sure I'm not misunderstanding anything here.

For most DBs, when a user creates a temporary table with:

CREATE TEMPORARY TABLE t1

It's actually equivalent to:

CREATE TEMPORARY TABLE `current_db`.t1

If the user changes the current database, they will not be able to access t1
without a fully qualified name, i.e. db1.t1 (assuming db1 was the current
database when this temporary table was created).
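
To make that scoping concrete, here is a minimal Python sketch of the
behavior described above. It is only a toy model under stated assumptions:
the Session class and its methods are hypothetical names invented for
illustration, not Flink's catalog API or any particular database's
implementation.

```python
class Session:
    """Toy model of a SQL session (hypothetical, not Flink's API)."""

    def __init__(self, current_db):
        self.current_db = current_db
        self.temp_tables = {}  # (db, table) -> definition

    def _qualify(self, name):
        # "db1.t1" is already qualified; a bare "t1" is resolved
        # against whatever database is current right now.
        if "." in name:
            db, table = name.split(".", 1)
            return db, table
        return self.current_db, name

    def create_temporary_table(self, name, definition):
        # An unqualified name is bound to the database that is
        # current at creation time.
        self.temp_tables[self._qualify(name)] = definition

    def use(self, db):
        self.current_db = db

    def lookup(self, name):
        return self.temp_tables.get(self._qualify(name))


session = Session(current_db="db1")
session.create_temporary_table("t1", "temp table t1")
found_in_db1 = session.lookup("t1")

session.use("db2")
found_unqualified_in_db2 = session.lookup("t1")
found_qualified_in_db2 = session.lookup("db1.t1")
```

After switching to db2, the bare name t1 no longer resolves, while db1.t1
still does. That is the behavior options #2 and #3 would mirror for
temporary functions.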

Only #2 and #3 follow this behavior, and I would vote for it since it
makes the behavior consistent across temporary tables and functions.

The reason I'm not voting for #3 is that a special catalog and database just
looks very hacky to me. It implies that our built-in functions are saved in a
special catalog and database, which is actually not the case. Introducing a
dedicated keyword like CREATE TEMPORARY BUILTIN FUNCTION looks clearer and
more straightforward. One can argue that we should avoid introducing a new
keyword, but it's also very rare that a system can overwrite built-in
functions. Since we decided to support this, introducing a new keyword is not
a big deal IMO.
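
As a rough illustration of how lookup could work under #2 with a BUILTIN
keyword, the Python sketch below follows the resolution order listed in
Xuefu's proposal #2 quoted further down (temp built-in overrides ->
built-in functions -> temp functions in current cat/db -> catalog
functions, and temp -> persistent for qualified references). The class,
its fields, and the treatment of two-part names as "db.func" in the
current catalog are assumptions made for this example; none of it is
Flink's actual FunctionCatalog.

```python
class FunctionCatalogSketch:
    """Toy lookup model for proposal #2 (hypothetical, not Flink's API)."""

    def __init__(self, current_cat, current_db):
        self.current_cat = current_cat
        self.current_db = current_db
        self.temp_builtin = {}  # name -> fn, e.g. CREATE TEMPORARY BUILTIN FUNCTION
        self.builtin = {}       # name -> fn shipped with the system
        self.temp_scoped = {}   # (cat, db, name) -> temporary function
        self.persistent = {}    # (cat, db, name) -> catalog function

    def resolve(self, reference):
        parts = reference.split(".")
        if len(parts) == 1:
            # Ambiguous reference: walk all four tiers in order.
            name = parts[0]
            key = (self.current_cat, self.current_db, name)
            tiers = [
                (self.temp_builtin, name),
                (self.builtin, name),
                (self.temp_scoped, key),
                (self.persistent, key),
            ]
        elif len(parts) in (2, 3):
            # Qualified reference: assume "db.func" means the current catalog.
            if len(parts) == 2:
                parts = [self.current_cat] + parts
            key = tuple(parts)
            tiers = [(self.temp_scoped, key), (self.persistent, key)]
        else:
            raise ValueError("expected at most three name parts: " + reference)
        for table, lookup_key in tiers:
            if lookup_key in table:
                return table[lookup_key]
        return None


catalog = FunctionCatalogSketch("cat1", "db1")
catalog.builtin["abs"] = "built-in abs"
catalog.persistent[("cat1", "db1", "abs")] = "catalog abs"
before_override = catalog.resolve("abs")

catalog.temp_builtin["abs"] = "temp built-in abs"
after_override = catalog.resolve("abs")
qualified = catalog.resolve("cat1.db1.abs")
```

In this model the keyword only matters at creation time (which map the
function lands in); references never need a keyword or a special
namespace, and a qualified reference skips the built-in tiers entirely.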

Best,
Kurt


On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> It is a quite long discussion to follow and I hope I didn’t misunderstand
> anything. From the proposals presented by Xuefu I would vote:
>
> -1 for #1 and #2
> +1 for #3
>
> Besides #3 being IMO more general and more consistent, having qualified
> names (#3) would help/make easier for someone to use cross
> databases/catalogs queries (joining multiple data sets/streams). For
> example with some functions to manipulate/clean up/convert the stored data
> in different catalogs registered in the respective catalogs.
>
> Piotrek
>
> > On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> >
> > I agree with Xuefu that inconsistent handling with all the other objects
> is
> > not a big problem.
> >
> > Regarding to option#3, the special "system.system" namespace may confuse
> > users.
> > Users need to know the set of built-in function names to know when to use
> > "system.system" namespace.
> > What will happen if user registers a non-builtin function name under the
> > "system.system" namespace?
> > Besides, I think it doesn't solve the "explode" problem I mentioned at
> the
> > beginning of this thread.
> >
> > So here is my vote:
> >
> > +1 for #1
> > 0 for #2
> > -1 for #3
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
> >
> >> @Dawid, Re: we also don't need additional referencing the special catalog
> >> anywhere.
> >>
> >> True. But once we allow such reference, then user can do so in any
> possible
> >> place where a function name is expected, for which we have to handle.
> >> That's a big difference, I think.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> >> wysakowicz.dawid@gmail.com>
> >> wrote:
> >>
> >>> @Bowen I am not suggesting introducing additional catalog. I think we
> >> need
> >>> to get rid of the current built-in catalog.
> >>>
> >>> @Xuefu in option #3 we also don't need additional referencing the
> special
> >>> catalog anywhere else besides in the CREATE statement. The resolution
> >>> behaviour is exactly the same in both options.
> >>>
> >>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
> >>>
> >>>> Hi Dawid,
> >>>>
> >>>> "GLOBAL" is a temporary keyword that was given to the approach. It can
> >> be
> >>>> changed to something else for better.
> >>>>
> >>>> The difference between this and the #3 approach is that we only need
> >> the
> >>>> keyword for this create DDL. For other places (such as function
> >>>> referencing), no keyword or special namespace is needed.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> >>>> wysakowicz.dawid@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>> I think it makes sense to start voting at this point.
> >>>>>
> >>>>> Option 1: Only 1-part identifiers
> >>>>> PROS:
> >>>>> - allows shadowing built-in functions
> >>>>> CONS:
> >>>>> - inconsistent with all the other objects, both permanent & temporary
> >>>>> - does not allow shadowing catalog functions
> >>>>>
> >>>>> Option 2: Special keyword for built-in function
> >>>>> I think this is quite similar to the special catalog/db. The thing I
> >> am
> >>>>> strongly against in this proposal is the GLOBAL keyword. This keyword
> >>>> has a
> >>>>> meaning in rdbms systems and means a function that is present for a
> >>>>> lifetime of a session in which it was created, but available in all
> >>> other
> >>>>> sessions. Therefore I really don't want to use this keyword in a
> >>>> different
> >>>>> context.
> >>>>>
> >>>>> Option 3: Special catalog/db
> >>>>>
> >>>>> PROS:
> >>>>> - allows shadowing built-in functions
> >>>>> - allows shadowing catalog functions
> >>>>> - consistent with other objects
> >>>>> CONS:
> >>>>> - we introduce a special namespace for built-in functions
> >>>>>
> >>>>> I don't see a problem with introducing the special namespace. In the
> >>> end
> >>>> it
> >>>>> is very similar to the keyword approach. In this case the catalog/db
> >>>>> combination would be the "keyword"
> >>>>>
> >>>>> Therefore my votes:
> >>>>> Option 1: -0
> >>>>> Option 2: -1 (I might change to +0 if we can come up with a better
> >>>> keyword)
> >>>>> Option 3: +1
> >>>>>
> >>>>> Best,
> >>>>> Dawid
> >>>>>
> >>>>>
> >>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Aljoscha,
> >>>>>>
> >>>>>> Thanks for the summary and these are great questions to be
> >> answered.
> >>>> The
> >>>>>> answer to your first question is clear: there is a general
> >> agreement
> >>> to
> >>>>>> override built-in functions with temp functions.
> >>>>>>
> >>>>>> However, your second and third questions are sort of related, as a
> >>>>> function
> >>>>>> reference can be either just function name (like "func") or in the
> >>> form
> >>>>> of
> >>>>>> "cat.db.func". When a reference is just function name, it can mean
> >>>>> either a
> >>>>>> built-in function or a function defined in the current cat/db. If
> >> we
> >>>>>> support overriding a built-in function with a temp function, such
> >>>>>> overriding can also cover a function in the current cat/db.
> >>>>>>
> >>>>>> I think what Timo referred as "overriding a catalog function"
> >> means a
> >>>>> temp
> >>>>>> function defined as "cat.db.func" overrides a catalog function
> >> "func"
> >>>> in
> >>>>>> cat/db even if cat/db is not current. To support this, temp
> >> function
> >>>> has
> >>>>> to
> >>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
> >>>>> questions
> >>>>>> are related. The problem with such support is the ambiguity when
> >> user
> >>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> >>> ...".
> >>>>>> Here "func" can mean a global temp function, or a temp function in
> >>>>> current
> >>>>>> cat/db. If we can assume the former, this creates an inconsistency
> >>>>> because
> >>>>>> "CREATE FUNCTION func" actually means a function in current cat/db.
> >>> If
> >>>> we
> >>>>>> assume the latter, then there is no way for user to create a global
> >>>> temp
> >>>>>> function.
> >>>>>>
> >>>>>> Giving a special namespace for built-in functions may solve the
> >>>> ambiguity
> >>>>>> problem above, but it also introduces artificial catalog/database
> >>> that
> >>>>>> needs special treatment and pollutes the cleanness of  the code. I
> >>>> would
> >>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE
> >>>>>> [GLOBAL] TEMPORARY FUNCTION func".
> >>>>>>
> >>>>>> Thus, I'd like to summarize a few candidate proposals for voting
> >>>>> purposes:
> >>>>>>
> >>>>>> 1. Support only global, temporary functions without namespace. Such
> >>>> temp
> >>>>>> functions override built-in functions and catalog functions in
> >>> current
> >>>>>> cat/db. The resolution order is: temp functions -> built-in
> >> functions
> >>>> ->
> >>>>>> catalog functions. (Partially or fully qualified functions have no
> >>>>>> ambiguity!)
> >>>>>>
> >>>>>> 2. In addition to #1, support creating and referencing temporary
> >>>>> functions
> >>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global
> >>> temp
> >>>>>> functions. The resolution order is: global temp functions ->
> >> built-in
> >>>>>> functions -> temp functions in current cat/db -> catalog function.
> >>>>>> (Resolution for partially or fully qualified function reference is:
> >>>> temp
> >>>>>> functions -> persistent functions.)
> >>>>>>
> >>>>>> 3. In addition to #1, support creating and referencing temporary
> >>>>> functions
> >>>>>> associated with a cat/db with a special namespace for built-in
> >>>> functions
> >>>>>> and global temp functions. The resolution is the same as #2, except
> >>>> that
> >>>>>> the special namespace might be prefixed to a reference to a
> >> built-in
> >>>>>> function or global temp function. (In absence of the special
> >>> namespace,
> >>>>> the
> >>>>>> resolution order is the same as in #2.)
> >>>>>>
> >>>>>> My personal preference is #1, given the unknown use case and
> >>> introduced
> >>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative.
> >>>> Thus,
> >>>>>> my votes are:
> >>>>>>
> >>>>>> +1 for #1
> >>>>>> +0 for #2
> >>>>>> -1 for #3
> >>>>>>
> >>>>>> Everyone, please cast your vote (in above format please!), or let
> >> me
> >>>> know
> >>>>>> if you have more questions or other candidates.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Xuefu
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> >>> aljoscha@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I think this discussion and the one for FLIP-64 are very
> >> connected.
> >>>> To
> >>>>>>> resolve the differences, I think we have to think about the basic
> >>>>>> principles
> >>>>>>> and find consensus there. The basic questions I see are:
> >>>>>>>
> >>>>>>> - Do we want to support overriding builtin functions?
> >>>>>>> - Do we want to support overriding catalog functions?
> >>>>>>> - And then later: should temporary functions be tied to a
> >>>>>>> catalog/database?
> >>>>>>>
> >>>>>>> I don’t have much to say about these, except that we should
> >>> somewhat
> >>>>>> stick
> >>>>>>> to what the industry does. But I also understand that the
> >> industry
> >>> is
> >>>>>>> already very divided on this.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Aljoscha
> >>>>>>>
> >>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
> >>> are
> >>>>>>> close to the truth. It will waste a lot of time if we resume the
> >>>> topic
> >>>>>> some
> >>>>>>> time later.
> >>>>>>>>
> >>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
> >>> “cat.db.fun”
> >>>>> way
> >>>>>>> to override a catalog function.
> >>>>>>>>
> >>>>>>>> I’m not sure about “system.system.fun”, it introduces a
> >>> nonexistent
> >>>>> cat
> >>>>>>> & db? And we still need to do special treatment for the dedicated
> >>>>>>> system.system cat & db?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> @Xuefu: I would like to avoid adding too many things
> >>>> incrementally.
> >>>>>>> Users should be able to override all catalog objects consistently
> >>>>>> according
> >>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
> >>>>> functions
> >>>>>>> are treated completely different, we need more code and special
> >>>> cases.
> >>>>>> From
> >>>>>>> an implementation perspective, this topic only affects the lookup
> >>>> logic
> >>>>>>> which is rather low implementation effort which is why I would
> >> like
> >>>> to
> >>>>>>> clarify the remaining items. As you said, we have a slight
> >> consensus
> >>>> on
> >>>>>>> overriding built-in functions; we should also strive for reaching
> >>>>>> consensus
> >>>>>>> on the remaining topics.
> >>>>>>>>>
> >>>>>>>>> @Dawid: I like your idea as it ensures registering catalog
> >>> objects
> >>>>>>> consistent and the overriding of built-in functions more
> >> explicit.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 17.09.19 11:59, kai wang wrote:
> >>>>>>>>>> hi, everyone
> >>>>>>>>>> I think this flip is very meaningful. it supports functions
> >>> that
> >>>>> can
> >>>>>> be
> >>>>>>>>>> shared by different catalogs and dbs, reducing the
> >> duplication
> >>> of
> >>>>>>> functions.
> >>>>>>>>>>
> >>>>>>>>>> Our group based on flink's sql parser module implements
> >> create
> >>>>>> function
> >>>>>>>>>> feature, stores the parsed function metadata and schema into
> >>>> mysql,
> >>>>>> and
> >>>>>>>>>> also customizes the catalog, customizes sql-client to support
> >>>>> custom
> >>>>>>>>>> schemas and functions. Loaded, but the function is currently
> >>>>> global,
> >>>>>>> and is
> >>>>>>>>>> not subdivided according to catalog and db.
> >>>>>>>>>>
> >>>>>>>>>> In addition, I very much hope to participate in the
> >> development
> >>>> of
> >>>>>> this
> >>>>>>>>>> flip, I have been paying attention to the community, but
> >> found
> >>> it
> >>>>> is
> >>>>>>> more
> >>>>>>>>>> difficult to join.
> >>>>>>>>>> thank you.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 17 Sep 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
> >>>>>>>>>>>
> >>>>>>>>>>> It seems to me that there is a general consensus on having
> >>> temp
> >>>>>>> functions
> >>>>>>>>>>> that have no namespaces and overwrite built-in functions.
> >> (As
> >>> a
> >>>>> side
> >>>>>>> note
> >>>>>>>>>>> for comparability, the current user defined functions are
> >> all
> >>>>>>> temporary and
> >>>>>>>>>>> having no namespaces.)
> >>>>>>>>>>>
> >>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
> >>> temp
> >>>>>>> functions
> >>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
> >>>>> However,
> >>>>>>> this
> >>>>>>>>>>> idea appears orthogonal to the former and can be added
> >>>>>> incrementally.
> >>>>>>>>>>>
> >>>>>>>>>>> How about we first implement non-namespaced temp functions
> >> now
> >>>> and
> >>>>>>> leave
> >>>>>>>>>>> the door open for namespaced ones for later releases as the
> >>>>>>> requirement
> >>>>>>>>>>> might become more crystal? This also helps shorten the
> >> debate
> >>>> and
> >>>>>>> allow us
> >>>>>>>>>>> to make some progress along this direction.
> >>>>>>>>>>>
> >>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
> >>>>>> temporary
> >>>>>>> temp
> >>>>>>>>>>> functions that don't have namespaces, my only concern is the
> >>>>> special
> >>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
> >>> evident
> >>>> in
> >>>>>>> treating
> >>>>>>>>>>> the built-in catalog currently.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Xuefu
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >>>>>>>>>>> wysakowicz.dawid@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
> >>> about
> >>>>> we
> >>>>>>> have a
> >>>>>>>>>>>> special namespace (catalog + database) for built-in
> >> objects?
> >>>> This
> >>>>>>> catalog
> >>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then users could still override built-in functions, if they
> >>>> fully
> >>>>>>> qualify
> >>>>>>>>>>>> object with the built-in namespace, but by default the
> >> common
> >>>>> logic
> >>>>>>> of
> >>>>>>>>>>>> current dB & cat would be used.
> >>>>>>>>>>>>
> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
> >>>>>>>>>>>> registers temporary function in current cat & dB
> >>>>>>>>>>>>
> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >>>>>>>>>>>> registers temporary function in cat db
> >>>>>>>>>>>>
> >>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >>>>>>>>>>>> Overrides built-in function with temporary function
> >>>>>>>>>>>>
> >>>>>>>>>>>> The built-in/system namespace would not be writable for
> >>>> permanent
> >>>>>>>>>>> objects.
> >>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>
> >>>>>>>>>>>> This way I think we can have benefits of both solutions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Dawid
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> >> twalthr@apache.org
> >>>>
> >>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I understand the potential benefit of overriding certain
> >>>>> built-in
> >>>>>>>>>>>>> functions. I'm open to such a feature if many people
> >> agree.
> >>>>>>> However, it
> >>>>>>>>>>>>> would be great to still support overriding catalog
> >> functions
> >>>>> with
> >>>>>>>>>>>>> temporary functions in order to prototype a query even
> >>> though
> >>>> a
> >>>>>>>>>>>>> catalog/database might not be available currently or
> >> should
> >>>> not
> >>>>> be
> >>>>>>>>>>>>> modified yet. How about we support both cases?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
> >>>>>>>>>>>>> -> creates/overrides a built-in function and never
> >>> considers
> >>>>>>> current
> >>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
> >>>> acceptable
> >>>>>> for
> >>>>>>>>>>>>> functions I guess.
> >>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >>>>>>>>>>>>> -> creates/overrides a catalog function
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
> >>>> (tables,
> >>>>>>> views)
> >>>>>>>>>>>>> except functions", this might change in the near future.
> >>> Take
> >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> >>>>> example.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >>>>>>>>>>>>>> Hi Fabian,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
> >>> thus I
> >>>>>>> didn't
> >>>>>>>>>>>>>> include that as a voting option, and the discussion is
> >>> mainly
> >>>>>>> between
> >>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re > However, it means that temp functions are
> >> differently
> >>>>>> treated
> >>>>>>>>>>> than
> >>>>>>>>>>>>>> other db objects.
> >>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
> >>>>>> functions
> >>>>>>>>>>> are
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>> bit different from other objects - Flink don't have any
> >>> other
> >>>>>>>>>>> built-in
> >>>>>>>>>>>>>> objects (tables, views) except functions.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Xuefu Zhang
> >>>>>>>>>>>
> >>>>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Xuefu Zhang
> >>>>>>
> >>>>>> "In Honey We Trust!"
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Xuefu Zhang
> >>>>
> >>>> "In Honey We Trust!"
> >>>>
> >>>
> >>
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

This is quite a long discussion to follow, and I hope I didn’t misunderstand anything. Of the proposals presented by Xuefu, I would vote:

-1 for #1 and #2 
+1 for #3

Besides #3 being, IMO, more general and more consistent, qualified names (#3) would make it easier to write queries that span databases/catalogs (joining multiple data sets/streams). For example, functions that manipulate, clean up, or convert the stored data could be registered in the same catalogs as the data they operate on.
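
To make the cross-catalog case concrete, here is a minimal sketch of qualified lookup. It assumes the rule from Xuefu's summary that a fully qualified reference resolves temp functions first and persistent functions second. `FunctionRegistry`, `cat_a`, `cat_b`, `db`, and `clean` are hypothetical names for illustration only; this is not Flink's FunctionCatalog API.

```python
from typing import Callable, Dict, Tuple

# (catalog, database, function name): a fully qualified reference.
QualifiedName = Tuple[str, str, str]


class FunctionRegistry:
    """Hypothetical stand-in for a function catalog; not Flink's API."""

    def __init__(self) -> None:
        self.temporary: Dict[QualifiedName, Callable[[str], str]] = {}
        self.persistent: Dict[QualifiedName, Callable[[str], str]] = {}

    def resolve(self, catalog: str, database: str, name: str) -> Callable[[str], str]:
        key = (catalog, database, name)
        # Qualified references: temp functions first, then persistent ones.
        for scope in (self.temporary, self.persistent):
            if key in scope:
                return scope[key]
        raise KeyError(".".join(key))


# Two catalogs each ship their own "clean" function next to their data.
registry = FunctionRegistry()
registry.persistent[("cat_a", "db", "clean")] = str.strip
registry.persistent[("cat_b", "db", "clean")] = str.lower

# A single query can use both without ambiguity.
clean_a = registry.resolve("cat_a", "db", "clean")
clean_b = registry.resolve("cat_b", "db", "clean")
```

With only 1-part names, the two `clean` functions would collide; with qualified names, a temp function registered under `cat_a.db.clean` shadows just that one entry.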

Piotrek 

> On 19 Sep 2019, at 06:35, Jark Wu <im...@gmail.com> wrote:
> 
> I agree with Xuefu that handling functions inconsistently with all the other
> objects is not a big problem.
> 
> Regarding option #3, the special "system.system" namespace may confuse
> users.
> Users need to know the set of built-in function names to know when to use
> the "system.system" namespace.
> What will happen if a user registers a non-built-in function name under the
> "system.system" namespace?
> Besides, I think it doesn't solve the "explode" problem I mentioned at the
> beginning of this thread.
> 
> So here is my vote:
> 
> +1 for #1
> 0 for #2
> -1 for #3
> 
> Best,
> Jark
> 
> 
> On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:
> 
>> @Dawid, Re: we also don't need additional referencing the special catalog
>> anywhere.
>> 
>> True. But once we allow such reference, then user can do so in any possible
>> place where a function name is expected, for which we have to handle.
>> That's a big difference, I think.
>> 
>> Thanks,
>> Xuefu
>> 
>> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
>> wysakowicz.dawid@gmail.com>
>> wrote:
>> 
>>> @Bowen I am not suggesting introducing additional catalog. I think we
>> need
>>> to get rid of the current built-in catalog.
>>> 
>>> @Xuefu in option #3 we also don't need additional referencing the special
>>> catalog anywhere else besides in the CREATE statement. The resolution
>>> behaviour is exactly the same in both options.
>>> 
>>> On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
>>> 
>>>> Hi Dawid,
>>>> 
>>>> "GLOBAL" is a temporary keyword that was given to the approach. It can
>> be
>>>> changed to something else for better.
>>>> 
>>>> The difference between this and the #3 approach is that we only need
>> the
>>>> keyword for this create DDL. For other places (such as function
>>>> referencing), no keyword or special namespace is needed.
>>>> 
>>>> Thanks,
>>>> Xuefu
>>>> 
>>>> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
>>>> wysakowicz.dawid@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> I think it makes sense to start voting at this point.
>>>>> 
>>>>> Option 1: Only 1-part identifiers
>>>>> PROS:
>>>>> - allows shadowing built-in functions
>>>>> CONS:
>>>>> - inconsistent with all the other objects, both permanent & temporary
>>>>> - does not allow shadowing catalog functions
>>>>> 
>>>>> Option 2: Special keyword for built-in function
>>>>> I think this is quite similar to the special catalog/db. The thing I
>> am
>>>>> strongly against in this proposal is the GLOBAL keyword. This keyword
>>>> has a
>>>>> meaning in rdbms systems and means a function that is present for a
>>>>> lifetime of a session in which it was created, but available in all
>>> other
>>>>> sessions. Therefore I really don't want to use this keyword in a
>>>> different
>>>>> context.
>>>>> 
>>>>> Option 3: Special catalog/db
>>>>> 
>>>>> PROS:
>>>>> - allows shadowing built-in functions
>>>>> - allows shadowing catalog functions
>>>>> - consistent with other objects
>>>>> CONS:
>>>>> - we introduce a special namespace for built-in functions
>>>>> 
>>>>> I don't see a problem with introducing the special namespace. In the
>>> end
>>>> it
>>>>> is very similar to the keyword approach. In this case the catalog/db
>>>>> combination would be the "keyword"
>>>>> 
>>>>> Therefore my votes:
>>>>> Option 1: -0
>>>>> Option 2: -1 (I might change to +0 if we can come up with a better
>>>> keyword)
>>>>> Option 3: +1
>>>>> 
>>>>> Best,
>>>>> Dawid
>>>>> 
>>>>> 
>>>>> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Aljoscha,
>>>>>> 
>>>>>> Thanks for the summary and these are great questions to be
>> answered.
>>>> The
>>>>>> answer to your first question is clear: there is a general
>> agreement
>>> to
>>>>>> override built-in functions with temp functions.
>>>>>> 
>>>>>> However, your second and third questions are sort of related, as a
>>>>> function
>>>>>> reference can be either just function name (like "func") or in the
>>> form
>>>>> of
>>>>>> "cat.db.func". When a reference is just function name, it can mean
>>>>> either a
>>>>>> built-in function or a function defined in the current cat/db. If
>> we
>>>>>> support overriding a built-in function with a temp function, such
>>>>>> overriding can also cover a function in the current cat/db.
>>>>>> 
>>>>>> I think what Timo referred as "overriding a catalog function"
>> means a
>>>>> temp
>>>>>> function defined as "cat.db.func" overrides a catalog function
>> "func"
>>>> in
>>>>>> cat/db even if cat/db is not current. To support this, temp
>> function
>>>> has
>>>>> to
>>>>>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>>>>> questions
>>>>>> are related. The problem with such support is the ambiguity when
>> user
>>>>>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
>>> ...".
>>>>>> Here "func" can means a global temp function, or a temp function in
>>>>> current
>>>>>> cat/db. If we can assume the former, this creates an inconsistency
>>>>> because
>>>>>> "CREATE FUNCTION func" actually means a function in current cat/db.
>>> If
>>>> we
>>>>>> assume the latter, then there is no way for user to create a global
>>>> temp
>>>>>> function.
>>>>>> 
>>>>>> Giving a special namespace for built-in functions may solve the
>>>> ambiguity
>>>>>> problem above, but it also introduces artificial catalog/database
>>> that
>>>>>> needs special treatment and pollutes the cleanness of  the code. I
>>>> would
>>>>>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>>>>>> [GLOBAL] TEMPORARY FUNCTION func".
>>>>>> 
>>>>>> Thus, I'd like to summarize a few candidate proposals for voting
>>>>> purposes:
>>>>>> 
>>>>>> 1. Support only global, temporary functions without namespace. Such
>>>> temp
>>>>>> functions overrides built-in functions and catalog functions in
>>> current
>>>>>> cat/db. The resolution order is: temp functions -> built-in
>> functions
>>>> ->
>>>>>> catalog functions. (Partially or fully qualified functions has no
>>>>>> ambiguity!)
>>>>>> 
>>>>>> 2. In addition to #1, support creating and referencing temporary
>>>>> functions
>>>>>> associated with a cat/db with "GLOBAL" qualifier in DDL for global
>>> temp
>>>>>> functions. The resolution order is: global temp functions ->
>> built-in
>>>>>> functions -> temp functions in current cat/db -> catalog function.
>>>>>> (Resolution for partially or fully qualified function reference is:
>>>> temp
>>>>>> functions -> persistent functions.)
>>>>>> 
>>>>>> 3. In addition to #1, support creating and referencing temporary
>>>>> functions
>>>>>> associated with a cat/db with a special namespace for built-in
>>>> functions
>>>>>> and global temp functions. The resolution is the same as #2, except
>>>> that
>>>>>> the special namespace might be prefixed to a reference to a
>> built-in
>>>>>> function or global temp function. (In absence of the special
>>> namespace,
>>>>> the
>>>>>> resolution order is the same as in #2.)
>>>>>> 
>>>>>> My personal preference is #1, given the unknown use case and
>>> introduced
>>>>>> complexity for #2 and #3. However, #2 is an acceptable alternative.
>>>> Thus,
>>>>>> my votes are:
>>>>>> 
>>>>>> +1 for #1
>>>>>> +0 for #2
>>>>>> -1 for #3
>>>>>> 
>>>>>> Everyone, please cast your vote (in above format please!), or let
>> me
>>>> know
>>>>>> if you have more questions or other candidates.
>>>>>> 
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
>>> aljoscha@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I think this discussion and the one for FLIP-64 are very
>> connected.
>>>> To
>>>>>>> resolve the differences, I think we have to think about the basic
>>>>>> principles
>>>>>>> and find consensus there. The basic questions I see are:
>>>>>>> 
>>>>>>> - Do we want to support overriding builtin functions?
>>>>>>> - Do we want to support overriding catalog functions?
>>>>>>> - And then later: should temporary functions be tied to a
>>>>>>> catalog/database?
>>>>>>> 
>>>>>>> I don’t have much to say about these, except that we should
>>> somewhat
>>>>>> stick
>>>>>>> to what the industry does. But I also understand that the
>> industry
>>> is
>>>>>>> already very divided on this.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Aljoscha
>>>>>>> 
>>>>>>>> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> +1 to strive for reaching consensus on the remaining topics. We
>>> are
>>>>>>> close to the truth. It will waste a lot of time if we resume the
>>>> topic
>>>>>> some
>>>>>>> time later.
>>>>>>>> 
>>>>>>>> +1 to “1-part/override” and I’m also fine with Timo’s
>>> “cat.db.fun”
>>>>> way
>>>>>>> to override a catalog function.
>>>>>>>> 
>>>>>>>> I’m not sure about “system.system.fun”, it introduces a
>>> nonexistent
>>>>> cat
>>>>>>> & db? And we still need to do special treatment for the dedicated
>>>>>>> system.system cat & db?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Hi everyone,
>>>>>>>>> 
>>>>>>>>> @Xuefu: I would like to avoid adding too many things
>>>> incrementally.
>>>>>>> Users should be able to override all catalog objects consistently
>>>>>> according
>>>>>>> to FLIP-64 (Support for Temporary Objects in Table module). If
>>>>> functions
>>>>>>> are treated completely different, we need more code and special
>>>> cases.
>>>>>> From
>>>>>>> an implementation perspective, this topic only affects the lookup
>>>> logic
>>>>>>> which is rather low implementation effort which is why I would
>> like
>>>> to
>>>>>>> clarify the remaining items. As you said, we have a slight
>> consensus
>>>> on
>>>>>>> overriding built-in functions; we should also strive for reaching
>>>>>> consensus
>>>>>>> on the remaining topics.
>>>>>>>>> 
>>>>>>>>> @Dawid: I like your idea as it ensures registering catalog
>>> objects
>>>>>>> consistent and the overriding of built-in functions more
>> explicit.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Timo
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 17.09.19 11:59, kai wang wrote:
>>>>>>>>>> hi, everyone
>>>>>>>>>> I think this flip is very meaningful. it supports functions
>>> that
>>>>> can
>>>>>> be
>>>>>>>>>> shared by different catalogs and dbs, reducing the
>> duplication
>>> of
>>>>>>> functions.
>>>>>>>>>> 
>>>>>>>>>> Our group based on flink's sql parser module implements
>> create
>>>>>> function
>>>>>>>>>> feature, stores the parsed function metadata and schema into
>>>> mysql,
>>>>>> and
>>>>>>>>>> also customizes the catalog, customizes sql-client to support
>>>>> custom
>>>>>>>>>> schemas and functions. Loaded, but the function is currently
>>>>> global,
>>>>>>> and is
>>>>>>>>>> not subdivided according to catalog and db.
>>>>>>>>>> 
>>>>>>>>>> In addition, I very much hope to participate in the
>> development
>>>> of
>>>>>> this
>>>>>>>>>> flip, I have been paying attention to the community, but
>> found
>>> it
>>>>> is
>>>>>>> more
>>>>>>>>>> difficult to join.
>>>>>>>>>> thank you.
>>>>>>>>>> 
>>>>>>>>>> On Tue, 17 Sep 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks to Timo and Dawid for sharing thoughts.
>>>>>>>>>>> 
>>>>>>>>>>> It seems to me that there is a general consensus on having
>>> temp
>>>>>>> functions
>>>>>>>>>>> that have no namespaces and overwrite built-in functions.
>> (As
>>> a
>>>>> side
>>>>>>> note
>>>>>>>>>>> for comparability, the current user defined functions are
>> all
>>>>>>> temporary and
>>>>>>>>>>> having no namespaces.)
>>>>>>>>>>> 
>>>>>>>>>>> Nevertheless, I can also see the merit of having namespaced
>>> temp
>>>>>>> functions
>>>>>>>>>>> that can overwrite functions defined in a specific cat/db.
>>>>> However,
>>>>>>> this
>>>>>>>>>>> idea appears orthogonal to the former and can be added
>>>>>> incrementally.
>>>>>>>>>>> 
>>>>>>>>>>> How about we first implement non-namespaced temp functions
>> now
>>>> and
>>>>>>> leave
>>>>>>>>>>> the door open for namespaced ones for later releases as the
>>>>>>> requirement
>>>>>>>>>>> might become more crystal? This also helps shorten the
>> debate
>>>> and
>>>>>>> allow us
>>>>>>>>>>> to make some progress along this direction.
>>>>>>>>>>> 
>>>>>>>>>>> As to Dawid's idea of having a dedicated cat/db to host the
>>>>>> temporary
>>>>>>> temp
>>>>>>>>>>> functions that don't have namespaces, my only concern is the
>>>>> special
>>>>>>>>>>> treatment for a cat/db, which makes code less clean, as
>>> evident
>>>> in
>>>>>>> treating
>>>>>>>>>>> the built-in catalog currently.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Xuefu
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>>>>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Another idea to consider on top of Timo's suggestion. How
>>> about
>>>>> we
>>>>>>> have a
>>>>>>>>>>>> special namespace (catalog + database) for built-in
>> objects?
>>>> This
>>>>>>> catalog
>>>>>>>>>>>> would be invisible for users as Xuefu was suggesting.
>>>>>>>>>>>> 
>>>>>>>>>>>> Then users could still override built-in functions, if they
>>>> fully
>>>>>>> qualify
>>>>>>>>>>>> object with the built-in namespace, but by default the
>> common
>>>>> logic
>>>>>>> of
>>>>>>>>>>>> current dB & cat would be used.
>>>>>>>>>>>> 
>>>>>>>>>>>> CREATE TEMPORARY FUNCTION func ...
>>>>>>>>>>>> registers temporary function in current cat & dB
>>>>>>>>>>>> 
>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>>>>>>>>>> registers temporary function in cat db
>>>>>>>>>>>> 
>>>>>>>>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>>>>>>>>>> Overrides built-in function with temporary function
>>>>>>>>>>>> 
>>>>>>>>>>>> The built-in/system namespace would not be writable for
>>>> permanent
>>>>>>>>>>> objects.
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>> 
>>>>>>>>>>>> This way I think we can have benefits of both solutions.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Dawid
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
>> twalthr@apache.org
>>>> 
>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I understand the potential benefit of overriding certain
>>>>> built-in
>>>>>>>>>>>>> functions. I'm open to such a feature if many people
>> agree.
>>>>>>> However, it
>>>>>>>>>>>>> would be great to still support overriding catalog
>> functions
>>>>> with
>>>>>>>>>>>>> temporary functions in order to prototype a query even
>>> though
>>>> a
>>>>>>>>>>>>> catalog/database might not be available currently or
>> should
>>>> not
>>>>> be
>>>>>>>>>>>>> modified yet. How about we support both cases?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION abs
>>>>>>>>>>>>> -> creates/overrides a built-in function and never
>>> considers
>>>>>>> current
>>>>>>>>>>>>> catalog and database; inconsistent with other DDL but
>>>> acceptable
>>>>>> for
>>>>>>>>>>>>> functions I guess.
>>>>>>>>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>>>>>>>>>>>>> -> creates/overrides a catalog function
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding "Flink don't have any other built-in objects
>>>> (tables,
>>>>>>> views)
>>>>>>>>>>>>> except functions", this might change in the near future.
>>> Take
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
>>>>> example.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>>>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I agree 1-part/no-override is the least favorable
>>> thus I
>>>>>>> didn't
>>>>>>>>>>>>>> include that as a voting option, and the discussion is
>>> mainly
>>>>>>> between
>>>>>>>>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Re > However, it means that temp functions are
>> differently
>>>>>> treated
>>>>>>>>>>> than
>>>>>>>>>>>>>> other db objects.
>>>>>>>>>>>>>> IMO, the treatment difference results from the fact that
>>>>>> functions
>>>>>>>>>>> are
>>>>>>>>>>>> a
>>>>>>>>>>>>>> bit different from other objects - Flink don't have any
>>> other
>>>>>>>>>>> built-in
>>>>>>>>>>>>>> objects (tables, views) except functions.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>> 
>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Xuefu Zhang
>>>>>> 
>>>>>> "In Honey We Trust!"
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Xuefu Zhang
>>>> 
>>>> "In Honey We Trust!"
>>>> 
>>> 
>> 
>> 
>> --
>> Xuefu Zhang
>> 
>> "In Honey We Trust!"
>> 


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Jark Wu <im...@gmail.com>.
I agree with Xuefu that handling functions inconsistently with all the other
objects is not a big problem.

Regarding option #3, the special "system.system" namespace may confuse
users.
Users need to know the set of built-in function names to know when to use
the "system.system" namespace.
What will happen if a user registers a non-built-in function name under the
"system.system" namespace?
Besides, I think it doesn't solve the "explode" problem I mentioned at the
beginning of this thread.

So here is my vote:

+1 for #1
0 for #2
-1 for #3
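
For reference, a minimal sketch of the lookup order behind #1, as described in Xuefu's summary (temp functions -> built-in functions -> catalog functions in the current cat/db). The function and parameter names are hypothetical and only model the proposal; this is not Flink's FunctionCatalog API.

```python
from typing import Callable, Mapping


def resolve_unqualified(
    name: str,
    temp_functions: Mapping[str, Callable],
    builtin_functions: Mapping[str, Callable],
    catalog_functions: Mapping[str, Callable],
) -> Callable:
    """Resolve a 1-part function name using the order proposed in option #1.

    catalog_functions models the functions of the current cat/db only.
    """
    # First match wins: temp functions shadow built-ins, built-ins shadow
    # functions of the current catalog/database.
    for scope in (temp_functions, builtin_functions, catalog_functions):
        if name in scope:
            return scope[name]
    raise KeyError(name)
```

Under this order a temp function named like a built-in replaces it for unqualified references, with no special namespace involved; a catalog function in the current cat/db that shares a built-in's name is only reachable through a qualified reference.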

Best,
Jark


On Thu, 19 Sep 2019 at 08:38, Xuefu Z <us...@gmail.com> wrote:

> @Dawid, Re: we also don't need additional referencing the special catalog
> anywhere.
>
> True. But once we allow such reference, then user can do so in any possible
> place where a function name is expected, for which we have to handle.
> That's a big difference, I think.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> wysakowicz.dawid@gmail.com>
> wrote:
>
> > @Bowen I am not suggesting introducing additional catalog. I think we
> need
> > to get rid of the current built-in catalog.
> >
> > @Xuefu in option #3 we also don't need additional referencing the special
> > catalog anywhere else besides in the CREATE statement. The resolution
> > behaviour is exactly the same in both options.
> >
> > On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:
> >
> > > Hi Dawid,
> > >
> > > "GLOBAL" is a temporary keyword that was given to the approach. It can
> be
> > > changed to something else for better.
> > >
> > > The difference between this and the #3 approach is that we only need
> the
> > > keyword for this create DDL. For other places (such as function
> > > referencing), no keyword or special namespace is needed.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > wysakowicz.dawid@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > I think it makes sense to start voting at this point.
> > > >
> > > > Option 1: Only 1-part identifiers
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > CONS:
> > > > > - inconsistent with all the other objects, both permanent & temporary
> > > > - does not allow shadowing catalog functions
> > > >
> > > > Option 2: Special keyword for built-in function
> > > > I think this is quite similar to the special catalog/db. The thing I
> am
> > > > strongly against in this proposal is the GLOBAL keyword. This keyword
> > > has a
> > > > meaning in rdbms systems and means a function that is present for a
> > > > lifetime of a session in which it was created, but available in all
> > other
> > > > sessions. Therefore I really don't want to use this keyword in a
> > > different
> > > > context.
> > > >
> > > > Option 3: Special catalog/db
> > > >
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > - allows shadowing catalog functions
> > > > - consistent with other objects
> > > > CONS:
> > > > - we introduce a special namespace for built-in functions
> > > >
> > > > I don't see a problem with introducing the special namespace. In the
> > end
> > > it
> > > > is very similar to the keyword approach. In this case the catalog/db
> > > > combination would be the "keyword"
> > > >
> > > > Therefore my votes:
> > > > Option 1: -0
> > > > Option 2: -1 (I might change to +0 if we can come up with a better
> > > keyword)
> > > > Option 3: +1
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > >
> > > > On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > Thanks for the summary and these are great questions to be
> answered.
> > > The
> > > > > answer to your first question is clear: there is a general
> agreement
> > to
> > > > > override built-in functions with temp functions.
> > > > >
> > > > > However, your second and third questions are sort of related, as a
> > > > function
> > > > > reference can be either just function name (like "func") or in the
> > form
> > > > > of
> > > > > "cat.db.func". When a reference is just function name, it can mean
> > > > either a
> > > > > built-in function or a function defined in the current cat/db. If
> we
> > > > > support overriding a built-in function with a temp function, such
> > > > > overriding can also cover a function in the current cat/db.
> > > > >
> > > > > I think what Timo referred as "overriding a catalog function"
> means a
> > > > temp
> > > > > function defined as "cat.db.func" overrides a catalog function
> "func"
> > > in
> > > > > cat/db even if cat/db is not current. To support this, temp
> function
> > > has
> > > > to
> > > > > > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > > > questions
> > > > > are related. The problem with such support is the ambiguity when
> user
> > > > > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> > ...".
> > > > > > Here "func" can mean a global temp function, or a temp function in
> > > > current
> > > > > cat/db. If we can assume the former, this creates an inconsistency
> > > > because
> > > > > "CREATE FUNCTION func" actually means a function in current cat/db.
> > If
> > > we
> > > > > assume the latter, then there is no way for user to create a global
> > > temp
> > > > > function.
> > > > >
> > > > > Giving a special namespace for built-in functions may solve the
> > > ambiguity
> > > > > problem above, but it also introduces artificial catalog/database
> > that
> > > > > needs special treatment and pollutes the cleanness of  the code. I
> > > would
> > > > > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > > > > [GLOBAL] TEMPORARY FUNCTION func".
> > > > >
> > > > > Thus, I'd like to summarize a few candidate proposals for voting
> > > > purposes:
> > > > >
> > > > > 1. Support only global, temporary functions without namespace. Such
> > > temp
> > > > > functions overrides built-in functions and catalog functions in
> > current
> > > > > cat/db. The resolution order is: temp functions -> built-in
> functions
> > > ->
> > > > > catalog functions. (Partially or fully qualified functions has no
> > > > > ambiguity!)
> > > > >
> > > > > 2. In addition to #1, support creating and referencing temporary
> > > > functions
> > > > > associated with a cat/db with "GLOBAL" qualifier in DDL for global
> > temp
> > > > > functions. The resolution order is: global temp functions ->
> built-in
> > > > > functions -> temp functions in current cat/db -> catalog function.
> > > > > (Resolution for partially or fully qualified function reference is:
> > > temp
> > > > > functions -> persistent functions.)
> > > > >
> > > > > 3. In addition to #1, support creating and referencing temporary
> > > > functions
> > > > > associated with a cat/db with a special namespace for built-in
> > > functions
> > > > > and global temp functions. The resolution is the same as #2, except
> > > that
> > > > > the special namespace might be prefixed to a reference to a
> built-in
> > > > > function or global temp function. (In absence of the special
> > namespace,
> > > > the
> > > > > resolution order is the same as in #2.)
> > > > >
> > > > > My personal preference is #1, given the unknown use case and
> > introduced
> > > > > complexity for #2 and #3. However, #2 is an acceptable alternative.
> > > Thus,
> > > > > my votes are:
> > > > >
> > > > > +1 for #1
> > > > > +0 for #2
> > > > > -1 for #3
> > > > >
> > > > > Everyone, please cast your vote (in above format please!), or let
> me
> > > know
> > > > > if you have more questions or other candidates.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > aljoscha@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I think this discussion and the one for FLIP-64 are very
> connected.
> > > To
> > > > > > > resolve the differences, I think we have to think about the basic
> > > > > principles
> > > > > > and find consensus there. The basic questions I see are:
> > > > > >
> > > > > >  - Do we want to support overriding builtin functions?
> > > > > >  - Do we want to support overriding catalog functions?
> > > > > >  - And then later: should temporary functions be tied to a
> > > > > > catalog/database?
> > > > > >
> > > > > > I don’t have much to say about these, except that we should
> > somewhat
> > > > > stick
> > > > > > to what the industry does. But I also understand that the
> industry
> > is
> > > > > > already very divided on this.
> > > > > >
> > > > > > Best,
> > > > > > Aljoscha
> > > > > >
> > > > > > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > +1 to strive for reaching consensus on the remaining topics. We
> > are
> > > > > > close to the truth. It will waste a lot of time if we resume the
> > > topic
> > > > > some
> > > > > > time later.
> > > > > > >
> > > > > > > +1 to “1-part/override” and I’m also fine with Timo’s
> > “cat.db.fun”
> > > > way
> > > > > > to override a catalog function.
> > > > > > >
> > > > > > > I’m not sure about “system.system.fun”, it introduces a
> > nonexistent
> > > > cat
> > > > > > & db? And we still need to do special treatment for the dedicated
> > > > > > system.system cat & db?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > >
> > > > > > > >> On 18 Sep 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> > > > > > >>
> > > > > > >> Hi everyone,
> > > > > > >>
> > > > > > >> @Xuefu: I would like to avoid adding too many things
> > > incrementally.
> > > > > > Users should be able to override all catalog objects consistently
> > > > > according
> > > > > > to FLIP-64 (Support for Temporary Objects in Table module). If
> > > > functions
> > > > > > are treated completely different, we need more code and special
> > > cases.
> > > > > From
> > > > > > an implementation perspective, this topic only affects the lookup
> > > logic
> > > > > > which is rather low implementation effort which is why I would
> like
> > > to
> > > > > > clarify the remaining items. As you said, we have a slight
> consensus
> > > on
> > > > > > overriding built-in functions; we should also strive for reaching
> > > > > consensus
> > > > > > on the remaining topics.
> > > > > > >>
> > > > > > >> @Dawid: I like your idea as it ensures registering catalog
> > objects
> > > > > > consistent and the overriding of built-in functions more
> explicit.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Timo
> > > > > > >>
> > > > > > >>
> > > > > > >> On 17.09.19 11:59, kai wang wrote:
> > > > > > >>> hi, everyone
> > > > > > >>> I think this flip is very meaningful. it supports functions
> > that
> > > > can
> > > > > be
> > > > > > >>> shared by different catalogs and dbs, reducing the
> duplication
> > of
> > > > > > functions.
> > > > > > >>>
> > > > > > >>> Our group based on flink's sql parser module implements
> create
> > > > > function
> > > > > > >>> feature, stores the parsed function metadata and schema into
> > > mysql,
> > > > > and
> > > > > > >>> also customizes the catalog, customizes sql-client to support
> > > > custom
> > > > > > >>> schemas and functions. Loaded, but the function is currently
> > > > global,
> > > > > > and is
> > > > > > >>> not subdivided according to catalog and db.
> > > > > > >>>
> > > > > > >>> In addition, I very much hope to participate in the
> development
> > > of
> > > > > this
> > > > > > >>> flip, I have been paying attention to the community, but
> found
> > it
> > > > is
> > > > > > more
> > > > > > >>> difficult to join.
> > > > > > >>> thank you.
> > > > > > >>>
> > > > > > >>> Xuefu Z <us...@gmail.com> 于2019年9月17日周二 上午11:19写道:
> > > > > > >>>
> > > > > > >>>> Thanks to Tmo and Dawid for sharing thoughts.
> > > > > > >>>>
> > > > > > >>>> It seems to me that there is a general consensus on having
> > temp
> > > > > > functions
> > > > > > >>>> that have no namespaces and overwrite built-in functions.
> (As
> > a
> > > > side
> > > > > > note
> > > > > > >>>> for comparability, the current user defined functions are
> all
> > > > > > temporary and
> > > > > > >>>> having no namespaces.)
> > > > > > >>>>
> > > > > > >>>> Nevertheless, I can also see the merit of having namespaced
> > temp
> > > > > > functions
> > > > > > >>>> that can overwrite functions defined in a specific cat/db.
> > > > However,
> > > > > > this
> > > > > > >>>> idea appears orthogonal to the former and can be added
> > > > > incrementally.
> > > > > > >>>>
> > > > > > >>>> How about we first implement non-namespaced temp functions
> now
> > > and
> > > > > > leave
> > > > > > >>>> the door open for namespaced ones for later releases as the
> > > > > > requirement
> > > > > > >>>> might become more crystal? This also helps shorten the
> debate
> > > and
> > > > > > allow us
> > > > > > >>>> to make some progress along this direction.
> > > > > > >>>>
> > > > > > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> > > > > temporary
> > > > > > temp
> > > > > > >>>> functions that don't have namespaces, my only concern is the
> > > > special
> > > > > > >>>> treatment for a cat/db, which makes code less clean, as
> > evident
> > > in
> > > > > > treating
> > > > > > >>>> the built-in catalog currently.
> > > > > > >>>>
> > > > > > >>>> Thanks,
> > > > > > >>>> Xuefiu
> > > > > > >>>>
> > > > > > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > > > > > >>>> wysakowicz.dawid@gmail.com>
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > >>>>> Hi,
> > > > > > >>>>> Another idea to consider on top of Timo's suggestion. How
> > about
> > > > we
> > > > > > have a
> > > > > > >>>>> special namespace (catalog + database) for built-in
> objects?
> > > This
> > > > > > catalog
> > > > > > >>>>> would be invisible for users as Xuefu was suggesting.
> > > > > > >>>>>
> > > > > > >>>>> Then users could still override built-in functions, if they
> > > fully
> > > > > > qualify
> > > > > > >>>>> object with the built-in namespace, but by default the
> common
> > > > logic
> > > > > > of
> > > > > > >>>>> current dB & cat would be used.
> > > > > > >>>>>
> > > > > > >>>>> CREATE TEMPORARY FUNCTION func ...
> > > > > > >>>>> registers temporary function in current cat & dB
> > > > > > >>>>>
> > > > > > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > > > > >>>>> registers temporary function in cat db
> > > > > > >>>>>
> > > > > > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > > > > > >>>>> Overrides built-in function with temporary function
> > > > > > >>>>>
> > > > > > >>>>> The built-in/system namespace would not be writable for
> > > permanent
> > > > > > >>>> objects.
> > > > > > >>>>> WDYT?
> > > > > > >>>>>
> > > > > > >>>>> This way I think we can have benefits of both solutions.
> > > > > > >>>>>
> > > > > > >>>>> Best,
> > > > > > >>>>> Dawid
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <
> twalthr@apache.org
> > >
> > > > > wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Hi Bowen,
> > > > > > >>>>>>
> > > > > > >>>>>> I understand the potential benefit of overriding certain
> > > > built-in
> > > > > > >>>>>> functions. I'm open to such a feature if many people
> agree.
> > > > > > However, it
> > > > > > >>>>>> would be great to still support overriding catalog
> functions
> > > > with
> > > > > > >>>>>> temporary functions in order to prototype a query even
> > though
> > > a
> > > > > > >>>>>> catalog/database might not be available currently or
> should
> > > not
> > > > be
> > > > > > >>>>>> modified yet. How about we support both cases?
> > > > > > >>>>>>
> > > > > > >>>>>> CREATE TEMPORARY FUNCTION abs
> > > > > > >>>>>> -> creates/overrides a built-in function and never
> > consideres
> > > > > > current
> > > > > > >>>>>> catalog and database; inconsistent with other DDL but
> > > acceptable
> > > > > for
> > > > > > >>>>>> functions I guess.
> > > > > > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > > > > >>>>>> -> creates/overrides a catalog function
> > > > > > >>>>>>
> > > > > > >>>>>> Regarding "Flink don't have any other built-in objects
> > > (tables,
> > > > > > views)
> > > > > > >>>>>> except functions", this might change in the near future.
> > Take
> > > > > > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> > > > example.
> > > > > > >>>>>>
> > > > > > >>>>>> Thanks,
> > > > > > >>>>>> Timo
> > > > > > >>>>>>
> > > > > > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > > > > >>>>>>> Hi Fabian,
> > > > > > >>>>>>>
> > > > > > >>>>>>> Yes, I agree 1-part/no-override is the least favorable
> > thus I
> > > > > > didn't
> > > > > > >>>>>>> include that as a voting option, and the discussion is
> > mainly
> > > > > > between
> > > > > > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Re > However, it means that temp functions are
> differently
> > > > > treated
> > > > > > >>>> than
> > > > > > >>>>>>> other db objects.
> > > > > > >>>>>>> IMO, the treatment difference results from the fact that
> > > > > functions
> > > > > > >>>> are
> > > > > > >>>>> a
> > > > > > >>>>>>> bit different from other objects - Flink don't have any
> > other
> > > > > > >>>> built-in
> > > > > > >>>>>>> objects (tables, views) except functions.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Cheers,
> > > > > > >>>>>>> Bowen
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>> Xuefu Zhang
> > > > > > >>>>
> > > > > > >>>> "In Honey We Trust!"
> > > > > > >>>>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Xuefu Zhang
> > > > >
> > > > > "In Honey We Trust!"
> > > > >
> > > >
> > >
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
@Dawid, Re: "we also don't need additional referencing the special catalog
anywhere":

True. But once we allow such a reference, users can use it in any place
where a function name is expected, and we have to handle all of those places.
That's a big difference, I think.
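
To illustrate the point, here is a minimal Python sketch (a toy model, not
Flink code). It assumes the special namespace is spelled "system.system", as
in the example earlier in this thread, and that a two-part reference uses the
current catalog. The function name `classify_reference` and the returned
labels are hypothetical. It shows that, once the namespace is allowed in
references, every place that accepts a function name needs this extra check.

```python
from typing import Optional, Tuple

# Assumption: namespace spelling taken from the "system.system.func" example
# in this thread; it is not an existing Flink catalog.
SPECIAL_NAMESPACE = ("system", "system")


def classify_reference(
    reference: str, current_catalog: str, current_database: str
) -> Tuple[str, Optional[str], Optional[str], str]:
    """Return (scope, catalog, database, function_name) for a reference."""
    parts = reference.split(".")  # naive split: ignores quoted identifiers
    if len(parts) == 1:
        # Ambiguous reference: handled by the normal resolution order.
        return ("unqualified", None, None, parts[0])
    if len(parts) == 2:
        # Assumption: "db.func" is completed with the current catalog.
        catalog, database, name = current_catalog, parts[0], parts[1]
    elif len(parts) == 3:
        catalog, database, name = parts
    else:
        raise ValueError(f"unsupported function reference: {reference!r}")
    if (catalog, database) == SPECIAL_NAMESPACE:
        # The special case that every reference site would have to handle.
        return ("built-in", None, None, name)
    return ("catalog", catalog, database, name)
```

With a keyword that only appears in the CREATE DDL, the last `if` would not
be needed on the reference path at all.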

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <wy...@gmail.com>
wrote:

> @Bowen I am not suggesting introducing an additional catalog. I think we
> need to get rid of the current built-in catalog.
>
> @Xuefu In option #3 we also don't need any additional referencing of the
> special catalog anywhere else besides in the CREATE statement. The
> resolution behaviour is exactly the same in both options.


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <wy...@gmail.com>.
@Bowen I am not suggesting introducing an additional catalog. I think we
need to get rid of the current built-in catalog.

@Xuefu In option #3 we also don't need any additional referencing of the
special catalog anywhere else besides in the CREATE statement. The
resolution behaviour is exactly the same in both options.
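
To make the shared resolution behaviour concrete, here is a minimal Python
sketch of the lookup order that Xuefu summarized for options #2 and #3 in the
quoted message below. It is a toy model under stated assumptions, not Flink's
FunctionCatalog: the class name `ToyFunctionRegistry`, its fields, and the use
of plain strings as function definitions are all hypothetical, and a two-part
reference is assumed to be completed with the current catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

QualifiedName = Tuple[str, str, str]  # (catalog, database, function name)


@dataclass
class ToyFunctionRegistry:
    """Toy lookup model; strings stand in for function definitions."""

    current_catalog: str
    current_database: str
    global_temp: Dict[str, str] = field(default_factory=dict)
    built_in: Dict[str, str] = field(default_factory=dict)
    namespaced_temp: Dict[QualifiedName, str] = field(default_factory=dict)
    catalog_functions: Dict[QualifiedName, str] = field(default_factory=dict)

    def resolve(self, reference: str) -> Optional[str]:
        parts = reference.split(".")  # naive split: ignores quoted identifiers
        if len(parts) == 1:
            name = parts[0]
            key = (self.current_catalog, self.current_database, name)
            # Ambiguous reference: global temp -> built-in ->
            # temp in current cat/db -> catalog function.
            lookups = (
                self.global_temp.get(name),
                self.built_in.get(name),
                self.namespaced_temp.get(key),
                self.catalog_functions.get(key),
            )
            return next((hit for hit in lookups if hit is not None), None)
        if len(parts) == 2:
            # Assumption: "db.func" is completed with the current catalog.
            key = (self.current_catalog, parts[0], parts[1])
        elif len(parts) == 3:
            key = (parts[0], parts[1], parts[2])
        else:
            raise ValueError(f"unsupported function reference: {reference!r}")
        # Qualified reference: temp function -> persistent function.
        hit = self.namespaced_temp.get(key)
        return hit if hit is not None else self.catalog_functions.get(key)
```

In this model the two options differ only in how an entry gets into
`global_temp` (a keyword in the DDL versus a special namespace in the CREATE
statement); `resolve` itself is the same for both.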

On Thu, 19 Sep 2019, 08:17 Xuefu Z, <us...@gmail.com> wrote:

> Hi Dawid,
>
> "GLOBAL" is a placeholder keyword that was given to the approach. It can be
> changed to something better.
>
> The difference between this and the #3 approach is that we only need the
> keyword in the CREATE DDL. For other places (such as function
> referencing), no keyword or special namespace is needed.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> wysakowicz.dawid@gmail.com>
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> > has a meaning in relational database systems: it denotes a function that
> > is present for the lifetime of the session in which it was created, but
> > available in all other sessions. Therefore I really don't want to use
> > this keyword in a different context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> > it is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword".
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >
> > > Hi Aljoscha,
> > >
> > > Thanks for the summary; these are great questions to be answered. The
> > > answer to your first question is clear: there is a general agreement to
> > > override built-in functions with temp functions.
> > >
> > > However, your second and third questions are related, as a function
> > > reference can be either just a function name (like "func") or in the
> > > form of "cat.db.func". When a reference is just a function name, it can
> > > mean either a built-in function or a function defined in the current
> > > cat/db. If we support overriding a built-in function with a temp
> > > function, such overriding can also cover a function in the current cat/db.
> > >
> > > I think what Timo referred to as "overriding a catalog function" means
> > > that a temp function defined as "cat.db.func" overrides a catalog
> > > function "func" in cat/db even if cat/db is not current. To support this,
> > > a temp function has to be tied to a cat/db. That's why I said above that
> > > the 2nd and 3rd questions are related. The problem with such support is
> > > the ambiguity when a user defines a function w/o namespace, "CREATE
> > > TEMPORARY FUNCTION func ...". Here "func" can mean a global temp
> > > function, or a temp function in the current cat/db. If we assume the
> > > former, this creates an inconsistency because "CREATE FUNCTION func"
> > > actually means a function in the current cat/db. If we assume the latter,
> > > then there is no way for a user to create a global temp function.
> > >
> > > Giving a special namespace to built-in functions may solve the ambiguity
> > > problem above, but it also introduces an artificial catalog/database that
> > > needs special treatment and makes the code less clean. I would rather
> > > introduce a syntax in DDL to solve the problem, like "CREATE [GLOBAL]
> > > TEMPORARY FUNCTION func".
> > >
> > > Thus, I'd like to summarize a few candidate proposals for voting purposes:
> > >
> > > 1. Support only global, temporary functions without namespace. Such temp
> > > functions override built-in functions and catalog functions in the
> > > current cat/db. The resolution order is: temp functions -> built-in
> > > functions -> catalog functions. (Partially or fully qualified function
> > > references have no ambiguity!)
> > >
> > > 2. In addition to #1, support creating and referencing temporary
> > > functions associated with a cat/db, with a "GLOBAL" qualifier in DDL for
> > > global temp functions. The resolution order is: global temp functions ->
> > > built-in functions -> temp functions in current cat/db -> catalog
> > > functions. (Resolution for a partially or fully qualified function
> > > reference is: temp functions -> persistent functions.)
> > >
> > > 3. In addition to #1, support creating and referencing temporary
> > > functions associated with a cat/db, with a special namespace for built-in
> > > functions and global temp functions. The resolution is the same as #2,
> > > except that the special namespace might be prefixed to a reference to a
> > > built-in function or global temp function. (In the absence of the special
> > > namespace, the resolution order is the same as in #2.)
> > >
> > > My personal preference is #1, given the unknown use case and the
> > > complexity introduced by #2 and #3. However, #2 is an acceptable
> > > alternative. Thus, my votes are:
> > >
> > > +1 for #1
> > > +0 for #2
> > > -1 for #3
> > >
> > > Everyone, please cast your vote (in the above format please!), or let me
> > > know if you have more questions or other candidates.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I think this discussion and the one for FLIP-64 are very connected. To
> > > > resolve the differences, I think we have to think about the basic
> > > > principles and find consensus there. The basic questions I see are:
> > > >
> > > >  - Do we want to support overriding builtin functions?
> > > >  - Do we want to support overriding catalog functions?
> > > >  - And then later: should temporary functions be tied to a
> > > >    catalog/database?
> > > >
> > > > I don’t have much to say about these, except that we should somewhat
> > > > stick to what the industry does. But I also understand that the industry
> > > > is already very divided on this.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > +1 to strive for reaching consensus on the remaining topics. We are
> > > > > close to the truth. It will waste a lot of time if we resume the topic
> > > > > some time later.
> > > > >
> > > > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
> > > > > way to override a catalog function.
> > > > >
> > > > > I’m not sure about “system.system.fun”, it introduces a nonexistent
> > > > > cat & db? And we still need to do special treatment for the dedicated
> > > > > system.system cat & db?
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > >> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> > > > >>
> > > > >> Hi everyone,
> > > > >>
> > > > >> @Xuefu: I would like to avoid adding too many things incrementally.
> > > > >> Users should be able to override all catalog objects consistently
> > > > >> according to FLIP-64 (Support for Temporary Objects in Table module).
> > > > >> If functions are treated completely differently, we need more code and
> > > > >> special cases. From an implementation perspective, this topic only
> > > > >> affects the lookup logic, which is rather low implementation effort,
> > > > >> which is why I would like to clarify the remaining items. As you said,
> > > > >> we have a slight consensus on overriding built-in functions; we should
> > > > >> also strive for reaching consensus on the remaining topics.
> > > > >>
> > > > >> @Dawid: I like your idea as it makes registering catalog objects
> > > > >> consistent and the overriding of built-in functions more explicit.
> > > > >>
> > > > >> Thanks,
> > > > >> Timo
> > > > >>
> > > > >>
> > > > >> On 17.09.19 11:59, kai wang wrote:
> > > > >>> Hi everyone,
> > > > >>> I think this FLIP is very meaningful. It supports functions that can
> > > > >>> be shared by different catalogs and dbs, reducing the duplication of
> > > > >>> functions.
> > > > >>>
> > > > >>> Our group has implemented a create function feature based on Flink's
> > > > >>> SQL parser module. It stores the parsed function metadata and schema in
> > > > >>> MySQL. We also customized the catalog and sql-client to support
> > > > >>> custom schemas and functions. Functions can be loaded, but they are
> > > > >>> currently global and not subdivided according to catalog and db.
> > > > >>>
> > > > >>> In addition, I very much hope to participate in the development of
> > > > >>> this FLIP. I have been paying attention to the community, but found it
> > > > >>> rather difficult to join.
> > > > >>> Thank you.
> > > > >>>
> > > > > >>> On Tue, Sep 17, 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
> > > > >>>
> > > > > >>>> Thanks to Timo and Dawid for sharing thoughts.
> > > > >>>>
> > > > >>>> It seems to me that there is a general consensus on having temp
> > > > functions
> > > > >>>> that have no namespaces and overwrite built-in functions. (As a
> > side
> > > > note
> > > > >>>> for comparability, the current user defined functions are all
> > > > temporary and
> > > > >>>> having no namespaces.)
> > > > >>>>
> > > > >>>> Nevertheless, I can also see the merit of having namespaced temp
> > > > functions
> > > > >>>> that can overwrite functions defined in a specific cat/db.
> > However,
> > > > this
> > > > >>>> idea appears orthogonal to the former and can be added
> > > incrementally.
> > > > >>>>
> > > > >>>> How about we first implement non-namespaced temp functions now
> and
> > > > leave
> > > > >>>> the door open for namespaced ones for later releases as the
> > > > requirement
> > > > >>>> might become more crystal? This also helps shorten the debate
> and
> > > > allow us
> > > > >>>> to make some progress along this direction.
> > > > >>>>
> > > > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> > > temporary
> > > > temp
> > > > >>>> functions that don't have namespaces, my only concern is the
> > special
> > > > >>>> treatment for a cat/db, which makes code less clean, as evident
> in
> > > > treating
> > > > >>>> the built-in catalog currently.
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Xuefiu
> > > > >>>>
> > > > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > > > >>>> wysakowicz.dawid@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hi,
> > > > >>>>> Another idea to consider on top of Timo's suggestion. How about
> > we
> > > > have a
> > > > >>>>> special namespace (catalog + database) for built-in objects?
> This
> > > > catalog
> > > > >>>>> would be invisible for users as Xuefu was suggesting.
> > > > >>>>>
> > > > >>>>> Then users could still override built-in functions, if they
> fully
> > > > qualify
> > > > >>>>> object with the built-in namespace, but by default the common
> > logic
> > > > of
> > > > >>>>> current dB & cat would be used.
> > > > >>>>>
> > > > >>>>> CREATE TEMPORARY FUNCTION func ...
> > > > >>>>> registers temporary function in current cat & dB
> > > > >>>>>
> > > > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > > >>>>> registers temporary function in cat db
> > > > >>>>>
> > > > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > > > >>>>> Overrides built-in function with temporary function
> > > > >>>>>
> > > > >>>>> The built-in/system namespace would not be writable for
> permanent
> > > > >>>> objects.
> > > > >>>>> WDYT?
> > > > >>>>>
> > > > >>>>> This way I think we can have benefits of both solutions.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Dawid
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
> > > wrote:
> > > > >>>>>
> > > > >>>>>> Hi Bowen,
> > > > >>>>>>
> > > > >>>>>> I understand the potential benefit of overriding certain
> > built-in
> > > > >>>>>> functions. I'm open to such a feature if many people agree.
> > > > However, it
> > > > >>>>>> would be great to still support overriding catalog functions
> > with
> > > > >>>>>> temporary functions in order to prototype a query even though
> a
> > > > >>>>>> catalog/database might not be available currently or should
> not
> > be
> > > > >>>>>> modified yet. How about we support both cases?
> > > > >>>>>>
> > > > >>>>>> CREATE TEMPORARY FUNCTION abs
> > > > > >>>>>> -> creates/overrides a built-in function and never considers
> > > > current
> > > > >>>>>> catalog and database; inconsistent with other DDL but
> acceptable
> > > for
> > > > >>>>>> functions I guess.
> > > > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > > >>>>>> -> creates/overrides a catalog function
> > > > >>>>>>
> > > > >>>>>> Regarding "Flink don't have any other built-in objects
> (tables,
> > > > views)
> > > > >>>>>> except functions", this might change in the near future. Take
> > > > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> > example.
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> Timo
> > > > >>>>>>
> > > > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > > >>>>>>> Hi Fabian,
> > > > >>>>>>>
> > > > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> > > > didn't
> > > > >>>>>>> include that as a voting option, and the discussion is mainly
> > > > between
> > > > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> > > > >>>>>>>
> > > > >>>>>>> Re > However, it means that temp functions are differently
> > > treated
> > > > >>>> than
> > > > >>>>>>> other db objects.
> > > > >>>>>>> IMO, the treatment difference results from the fact that
> > > functions
> > > > >>>> are
> > > > >>>>> a
> > > > >>>>>>> bit different from other objects - Flink don't have any other
> > > > >>>> built-in
> > > > >>>>>>> objects (tables, views) except functions.
> > > > >>>>>>>
> > > > >>>>>>> Cheers,
> > > > >>>>>>> Bowen
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> Xuefu Zhang
> > > > >>>>
> > > > >>>> "In Honey We Trust!"
> > > > >>>>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Dawid,

"GLOBAL" is just a placeholder keyword for this approach. It can be changed
to something more suitable.

The difference between this and approach #3 is that the keyword is needed
only in the CREATE DDL. Everywhere else (such as function references), no
keyword or special namespace is needed.
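
To make the difference concrete, here is a minimal sketch of the #2 lookup in
Python. It is only an illustration under stated assumptions, not Flink's
FunctionCatalog: the class, its fields, and the "is_global" flag (standing in
for the GLOBAL keyword) are all hypothetical, and treating "db.func" as
relative to the current catalog is an assumption as well. The keyword appears
only at registration time; resolving a reference needs no keyword and no
special namespace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model of proposal #2; none of these names are Flink APIs.
QualifiedName = Tuple[str, str, str]  # (catalog, database, function)


@dataclass
class FunctionRegistry:
    current_catalog: str
    current_database: str
    builtin: Dict[str, str] = field(default_factory=dict)
    global_temp: Dict[str, str] = field(default_factory=dict)
    scoped_temp: Dict[QualifiedName, str] = field(default_factory=dict)
    catalog_functions: Dict[QualifiedName, str] = field(default_factory=dict)

    def _qualify(self, parts: List[str]) -> QualifiedName:
        # Assumption: a 2-part name is resolved against the current catalog.
        if len(parts) == 1:
            return (self.current_catalog, self.current_database, parts[0])
        if len(parts) == 2:
            return (self.current_catalog, parts[0], parts[1])
        if len(parts) == 3:
            return (parts[0], parts[1], parts[2])
        raise ValueError(f"invalid function name: {'.'.join(parts)}")

    def create_temporary_function(self, name: str, impl: str,
                                  is_global: bool = False) -> None:
        """Models CREATE [GLOBAL] TEMPORARY FUNCTION <name>."""
        parts = name.split(".")
        if is_global:
            if len(parts) != 1:
                raise ValueError("a global temp function takes no namespace")
            self.global_temp[name] = impl
        else:
            # Without the keyword, the function is tied to a cat/db.
            self.scoped_temp[self._qualify(parts)] = impl

    def resolve(self, name: str) -> Optional[str]:
        parts = name.split(".")
        if len(parts) == 1:
            # Ambiguous reference: global temp -> built-in -> temp in the
            # current cat/db -> catalog function.
            if name in self.global_temp:
                return self.global_temp[name]
            if name in self.builtin:
                return self.builtin[name]
        # Qualified reference (or fall-through): temp -> persistent.
        key = self._qualify(parts)
        if key in self.scoped_temp:
            return self.scoped_temp[key]
        return self.catalog_functions.get(key)


reg = FunctionRegistry("cat", "db")
reg.builtin["abs"] = "builtin abs"
reg.catalog_functions[("cat", "db", "abs")] = "catalog abs"
reg.create_temporary_function("abs", "scoped temp abs")
print(reg.resolve("abs"))         # built-in still wins for the bare name
print(reg.resolve("cat.db.abs"))  # the temp function shadows the catalog one
```

Only "create_temporary_function" looks at the flag; "resolve" has no notion of
it, which is the point of the keyword approach.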

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <wy...@gmail.com>
wrote:

> Hi,
> I think it makes sense to start voting at this point.
>
> Option 1: Only 1-part identifiers
> PROS:
> - allows shadowing built-in functions
> CONS:
> - inconsistent with all the other objects, both permanent & temporary
> - does not allow shadowing catalog functions
>
> Option 2: Special keyword for built-in function
> I think this is quite similar to the special catalog/db. The thing I am
> strongly against in this proposal is the GLOBAL keyword. This keyword has a
> meaning in rdbms systems and means a function that is present for a
> lifetime of a session in which it was created, but available in all other
> sessions. Therefore I really don't want to use this keyword in a different
> context.
>
> Option 3: Special catalog/db
>
> PROS:
> - allows shadowing built-in functions
> - allows shadowing catalog functions
> - consistent with other objects
> CONS:
> - we introduce a special namespace for built-in functions
>
> I don't see a problem with introducing the special namespace. In the end it
> is very similar to the keyword approach. In this case the catalog/db
> combination would be the "keyword"
>
> Therefore my votes:
> Option 1: -0
> Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> Option 3: +1
>
> Best,
> Dawid
>
>
> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>
> > Hi Aljoscha,
> >
> > Thanks for the summary and these are great questions to be answered. The
> > answer to your first question is clear: there is a general agreement to
> > override built-in functions with temp functions.
> >
> > However, your second and third questions are related, as a function
> > reference can be either just a function name (like "func") or in the form
> > of "cat.db.func". When a reference is just a function name, it can mean
> > either a built-in function or a function defined in the current cat/db. If
> > we support overriding a built-in function with a temp function, such
> > overriding can also cover a function in the current cat/db.
> >
> > I think what Timo referred to as "overriding a catalog function" means a
> > temp function defined as "cat.db.func" overrides a catalog function "func"
> > in cat/db even if cat/db is not current. To support this, a temp function
> > has to be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > questions are related. The problem with such support is the ambiguity when
> > a user defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> > ...". Here "func" can mean a global temp function, or a temp function in
> > the current cat/db. If we assume the former, this creates an inconsistency
> > because "CREATE FUNCTION func" actually means a function in the current
> > cat/db. If we assume the latter, then there is no way for a user to create
> > a global temp function.
> >
> > Giving a special namespace for built-in functions may solve the ambiguity
> > problem above, but it also introduces artificial catalog/database that
> > needs special treatment and pollutes the cleanness of  the code. I would
> > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > [GLOBAL] TEMPORARY FUNCTION func".
> >
> > Thus, I'd like to summarize a few candidate proposals for voting purposes:
> >
> > 1. Support only global, temporary functions without namespace. Such temp
> > functions override built-in functions and catalog functions in the current
> > cat/db. The resolution order is: temp functions -> built-in functions ->
> > catalog functions. (Partially or fully qualified functions have no
> > ambiguity!)
> >
> > 2. In addition to #1, support creating and referencing temporary functions
> > associated with a cat/db, with a "GLOBAL" qualifier in DDL for global temp
> > functions. The resolution order is: global temp functions -> built-in
> > functions -> temp functions in current cat/db -> catalog functions.
> > (Resolution for a partially or fully qualified function reference is: temp
> > functions -> persistent functions.)
> >
> > 3. In addition to #1, support creating and referencing temporary functions
> > associated with a cat/db, with a special namespace for built-in functions
> > and global temp functions. The resolution is the same as #2, except that
> > the special namespace might be prefixed to a reference to a built-in
> > function or global temp function. (In the absence of the special namespace,
> > the resolution order is the same as in #2.)
> >
> > My personal preference is #1, given the unknown use case and introduced
> > complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
> > my votes are:
> >
> > +1 for #1
> > +0 for #2
> > -1 for #3
> >
> > Everyone, please cast your vote (in above format please!), or let me know
> > if you have more questions or other candidates.
> >
> > Thanks,
> > Xuefu
> >
> > On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > I think this discussion and the one for FLIP-64 are very connected. To
> > > resolve the differences, I think we have to think about the basic
> > principles
> > > and find consensus there. The basic questions I see are:
> > >
> > >  - Do we want to support overriding builtin functions?
> > >  - Do we want to support overriding catalog functions?
> > >  - And then later: should temporary functions be tied to a
> > > catalog/database?
> > >
> > > I don’t have much to say about these, except that we should somewhat
> > stick
> > > to what the industry does. But I also understand that the industry is
> > > already very divided on this.
> > >
> > > Best,
> > > Aljoscha
> > >
> > > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > +1 to strive for reaching consensus on the remaining topics. We are
> > > close to the truth. It will waste a lot of time if we resume the topic
> > some
> > > time later.
> > > >
> > > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
> way
> > > to override a catalog function.
> > > >
> > > > I’m not sure about “system.system.fun”, it introduces a nonexistent
> cat
> > > & db? And we still need to do special treatment for the dedicated
> > > system.system cat & db?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> @Xuefu: I would like to avoid adding too many things incrementally.
> > > Users should be able to override all catalog objects consistently
> > according
> > > to FLIP-64 (Support for Temporary Objects in Table module). If
> functions
> > > are treated completely different, we need more code and special cases.
> > From
> > > an implementation perspective, this topic only affects the lookup logic
> > > which is rather low implementation effort which is why I would like to
> > > > clarify the remaining items. As you said, we have a slight consensus on
> > > overriding built-in functions; we should also strive for reaching
> > consensus
> > > on the remaining topics.
> > > >>
> > > >> @Dawid: I like your idea as it ensures registering catalog objects
> > > consistent and the overriding of built-in functions more explicit.
> > > >>
> > > >> Thanks,
> > > >> Timo
> > > >>
> > > >>
> > > >> On 17.09.19 11:59, kai wang wrote:
> > > >>> hi, everyone
> > > >>> I think this flip is very meaningful. it supports functions that
> can
> > be
> > > >>> shared by different catalogs and dbs, reducing the duplication of
> > > functions.
> > > >>>
> > > >>> Our group based on flink's sql parser module implements create
> > function
> > > >>> feature, stores the parsed function metadata and schema into mysql,
> > and
> > > >>> also customizes the catalog, customizes sql-client to support
> custom
> > > >>> schemas and functions. Loaded, but the function is currently
> global,
> > > and is
> > > >>> not subdivided according to catalog and db.
> > > >>>
> > > >>> In addition, I very much hope to participate in the development of
> > this
> > > >>> flip, I have been paying attention to the community, but found it
> is
> > > more
> > > >>> difficult to join.
> > > >>> thank you.
> > > >>>
> > > >>> On Tue, Sep 17, 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
> > > >>>
> > > >>>> Thanks to Timo and Dawid for sharing thoughts.
> > > >>>>
> > > >>>> It seems to me that there is a general consensus on having temp
> > > functions
> > > >>>> that have no namespaces and overwrite built-in functions. (As a
> side
> > > note
> > > >>>> for comparability, the current user defined functions are all
> > > temporary and
> > > >>>> having no namespaces.)
> > > >>>>
> > > >>>> Nevertheless, I can also see the merit of having namespaced temp
> > > functions
> > > >>>> that can overwrite functions defined in a specific cat/db.
> However,
> > > this
> > > >>>> idea appears orthogonal to the former and can be added
> > incrementally.
> > > >>>>
> > > >>>> How about we first implement non-namespaced temp functions now and
> > > leave
> > > >>>> the door open for namespaced ones for later releases as the
> > > requirement
> > > >>>> might become more crystal? This also helps shorten the debate and
> > > allow us
> > > >>>> to make some progress along this direction.
> > > >>>>
> > > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> > temporary
> > > temp
> > > >>>> functions that don't have namespaces, my only concern is the
> special
> > > >>>> treatment for a cat/db, which makes code less clean, as evident in
> > > treating
> > > >>>> the built-in catalog currently.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Xuefiu
> > > >>>>
> > > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > > >>>> wysakowicz.dawid@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>> Another idea to consider on top of Timo's suggestion. How about
> we
> > > have a
> > > >>>>> special namespace (catalog + database) for built-in objects? This
> > > catalog
> > > >>>>> would be invisible for users as Xuefu was suggesting.
> > > >>>>>
> > > >>>>> Then users could still override built-in functions, if they fully
> > > qualify
> > > >>>>> object with the built-in namespace, but by default the common
> logic
> > > of
> > > >>>>> current dB & cat would be used.
> > > >>>>>
> > > >>>>> CREATE TEMPORARY FUNCTION func ...
> > > >>>>> registers temporary function in current cat & dB
> > > >>>>>
> > > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > > >>>>> registers temporary function in cat db
> > > >>>>>
> > > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > > >>>>> Overrides built-in function with temporary function
> > > >>>>>
> > > >>>>> The built-in/system namespace would not be writable for permanent
> > > >>>> objects.
> > > >>>>> WDYT?
> > > >>>>>
> > > >>>>> This way I think we can have benefits of both solutions.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Dawid
> > > >>>>>
> > > >>>>>
> > > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
> > wrote:
> > > >>>>>
> > > >>>>>> Hi Bowen,
> > > >>>>>>
> > > >>>>>> I understand the potential benefit of overriding certain
> built-in
> > > >>>>>> functions. I'm open to such a feature if many people agree.
> > > However, it
> > > >>>>>> would be great to still support overriding catalog functions
> with
> > > >>>>>> temporary functions in order to prototype a query even though a
> > > >>>>>> catalog/database might not be available currently or should not
> be
> > > >>>>>> modified yet. How about we support both cases?
> > > >>>>>>
> > > >>>>>> CREATE TEMPORARY FUNCTION abs
> > > >>>>>> -> creates/overrides a built-in function and never considers
> > > current
> > > >>>>>> catalog and database; inconsistent with other DDL but acceptable
> > for
> > > >>>>>> functions I guess.
> > > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > > >>>>>> -> creates/overrides a catalog function
> > > >>>>>>
> > > >>>>>> Regarding "Flink don't have any other built-in objects (tables,
> > > views)
> > > >>>>>> except functions", this might change in the near future. Take
> > > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> example.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Timo
> > > >>>>>>
> > > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > > >>>>>>> Hi Fabian,
> > > >>>>>>>
> > > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> > > didn't
> > > >>>>>>> include that as a voting option, and the discussion is mainly
> > > between
> > > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> > > >>>>>>>
> > > >>>>>>> Re > However, it means that temp functions are differently
> > treated
> > > >>>> than
> > > >>>>>>> other db objects.
> > > >>>>>>> IMO, the treatment difference results from the fact that
> > functions
> > > >>>> are
> > > >>>>> a
> > > >>>>>>> bit different from other objects - Flink don't have any other
> > > >>>> built-in
> > > >>>>>>> objects (tables, views) except functions.
> > > >>>>>>>
> > > >>>>>>> Cheers,
> > > >>>>>>> Bowen
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>> --
> > > >>>> Xuefu Zhang
> > > >>>>
> > > >>>> "In Honey We Trust!"
> > > >>>>
> > > >>
> > > >
> > >
> > >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi,

For #2, as Xuefu and I discussed offline, the key point is to introduce a
keyword in SQL DDL to distinguish temp functions that override built-in
functions from temp functions that override catalog functions. It can be
something other than "GLOBAL", like "BUILTIN" (e.g. "CREATE BUILTIN TEMP
FUNCTION"), given there's no SQL standard for temp functions. This option is
more about whether we want such a new keyword in SQL for the proposed
functionality, and I personally think it's acceptable.

For #3, besides the drawbacks mentioned by Xuefu and Jark, another con is
that we already have a special catalog and db, "default_catalog" and
"default_database". Though they are not used quite correctly at the moment
(another story...), they are at least physically present. Introducing yet
another virtual special catalog/db, something like "system"."system", that
never physically exists would further confuse users and hurt
understandability and usability.
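
As a rough illustration of what "virtual" means here, the following Python
sketch models #3 under stated assumptions. Everything in it is hypothetical
(the class, its method names, and the reserved "system"."system" pair taken
from the examples in this thread); it is not Flink code. The reserved
namespace is accepted in identifiers and consulted first for bare names, yet
it never shows up among the real catalogs and rejects permanent functions, so
it needs its own code path.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of proposal #3; none of these names are Flink APIs.
SYSTEM_NAMESPACE = ("system", "system")  # reserved, never physically present
QualifiedName = Tuple[str, str, str]     # (catalog, database, function)


class NamespacedRegistry:
    def __init__(self, catalogs: Dict[str, List[str]],
                 current: Tuple[str, str]) -> None:
        self.catalogs = catalogs  # real catalogs -> their databases
        self.current = current    # (current catalog, current database)
        self.builtin: Dict[str, str] = {}
        self.temp: Dict[QualifiedName, str] = {}
        self.persistent: Dict[QualifiedName, str] = {}

    def list_catalogs(self) -> List[str]:
        # The virtual namespace is never listed.
        return sorted(self.catalogs)

    def _qualify(self, name: str) -> QualifiedName:
        parts = name.split(".")
        if len(parts) == 1:
            return (self.current[0], self.current[1], parts[0])
        if len(parts) == 2:
            return (self.current[0], parts[0], parts[1])
        if len(parts) == 3:
            return (parts[0], parts[1], parts[2])
        raise ValueError(f"invalid function name: {name}")

    def create_function(self, name: str, impl: str) -> None:
        key = self._qualify(name)
        if key[:2] == SYSTEM_NAMESPACE:
            raise ValueError("system namespace is read-only for permanent objects")
        self.persistent[key] = impl

    def create_temporary_function(self, name: str, impl: str) -> None:
        # Every temp function gets a 3-part identifier, including overrides
        # of built-ins registered as system.system.<func>.
        self.temp[self._qualify(name)] = impl

    def _lookup_system(self, func: str) -> Optional[str]:
        key = SYSTEM_NAMESPACE + (func,)
        if key in self.temp:
            return self.temp[key]
        return self.builtin.get(func)

    def resolve(self, name: str) -> Optional[str]:
        if "." not in name:
            # Bare name: system namespace first, then the current cat/db.
            hit = self._lookup_system(name)
            if hit is not None:
                return hit
        key = self._qualify(name)
        if key[:2] == SYSTEM_NAMESPACE:
            return self._lookup_system(key[2])
        if key in self.temp:
            return self.temp[key]
        return self.persistent.get(key)


reg = NamespacedRegistry({"default_catalog": ["default_database"]},
                         ("default_catalog", "default_database"))
reg.builtin["abs"] = "builtin abs"
reg.create_temporary_function("system.system.abs", "temp abs")
print(reg.resolve("abs"))   # the temp override of the built-in
print(reg.list_catalogs())  # "system" is absent from the listing
```

The two checks against SYSTEM_NAMESPACE are the special treatment in question:
users can write "system.system.abs", but nothing named "system" exists when
they list catalogs.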

Thus my vote is:

+1 for #1
0 for #2
-1 for #3

Thanks,
Bowen

On Wed, Sep 18, 2019 at 4:55 PM Dawid Wysakowicz <wy...@gmail.com>
wrote:

> Last additional comment on Option 2. The reason why I prefer option 3 is
> that in option 3 all objects internally are identified with 3 parts. This
> makes it easier to handle at different locations e.g. while persisting
> views, as all objects have uniform representation.
>
> On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, <wy...@gmail.com>
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in rdbms systems and means a function that is present for a
> > lifetime of a session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> > it is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >
> >> Hi Aljoscha,
> >>
> >> Thanks for the summary and these are great questions to be answered. The
> >> answer to your first question is clear: there is a general agreement to
> >> override built-in functions with temp functions.
> >>
> >> However, your second and third questions are related, as a function
> >> reference can be either just a function name (like "func") or in the form
> >> of "cat.db.func". When a reference is just a function name, it can mean
> >> either a built-in function or a function defined in the current cat/db. If
> >> we support overriding a built-in function with a temp function, such
> >> overriding can also cover a function in the current cat/db.
> >>
> >> I think what Timo referred to as "overriding a catalog function" means a
> >> temp function defined as "cat.db.func" overrides a catalog function "func"
> >> in cat/db even if cat/db is not current. To support this, a temp function
> >> has to be tied to a cat/db. That's why I said above that the 2nd and 3rd
> >> questions are related. The problem with such support is the ambiguity when
> >> a user defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> >> ...". Here "func" can mean a global temp function, or a temp function in
> >> the current cat/db. If we assume the former, this creates an inconsistency
> >> because "CREATE FUNCTION func" actually means a function in the current
> >> cat/db. If we assume the latter, then there is no way for a user to create
> >> a global temp function.
> >>
> >> Giving a special namespace for built-in functions may solve the
> ambiguity
> >> problem above, but it also introduces artificial catalog/database that
> >> needs special treatment and pollutes the cleanness of  the code. I would
> >> rather introduce a syntax in DDL to solve the problem, like "CREATE
> >> [GLOBAL] TEMPORARY FUNCTION func".
> >>
> >> Thus, I'd like to summarize a few candidate proposals for voting purposes:
> >>
> >> 1. Support only global, temporary functions without namespace. Such temp
> >> functions override built-in functions and catalog functions in the current
> >> cat/db. The resolution order is: temp functions -> built-in functions ->
> >> catalog functions. (Partially or fully qualified functions have no
> >> ambiguity!)
> >>
> >> 2. In addition to #1, support creating and referencing temporary functions
> >> associated with a cat/db, with a "GLOBAL" qualifier in DDL for global temp
> >> functions. The resolution order is: global temp functions -> built-in
> >> functions -> temp functions in current cat/db -> catalog functions.
> >> (Resolution for a partially or fully qualified function reference is: temp
> >> functions -> persistent functions.)
> >>
> >> 3. In addition to #1, support creating and referencing temporary functions
> >> associated with a cat/db, with a special namespace for built-in functions
> >> and global temp functions. The resolution is the same as #2, except that
> >> the special namespace might be prefixed to a reference to a built-in
> >> function or global temp function. (In the absence of the special namespace,
> >> the resolution order is the same as in #2.)
> >>
> >> My personal preference is #1, given the unknown use case and introduced
> >> complexity for #2 and #3. However, #2 is an acceptable alternative.
> Thus,
> >> my votes are:
> >>
> >> +1 for #1
> >> +0 for #2
> >> -1 for #3
> >>
> >> Everyone, please cast your vote (in above format please!), or let me
> know
> >> if you have more questions or other candidates.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I think this discussion and the one for FLIP-64 are very connected. To
> >> > resolve the differences, I think we have to think about the basic
> >> principles
> >> > and find consensus there. The basic questions I see are:
> >> >
> >> >  - Do we want to support overriding builtin functions?
> >> >  - Do we want to support overriding catalog functions?
> >> >  - And then later: should temporary functions be tied to a
> >> > catalog/database?
> >> >
> >> > I don’t have much to say about these, except that we should somewhat
> >> stick
> >> > to what the industry does. But I also understand that the industry is
> >> > already very divided on this.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > +1 to strive for reaching consensus on the remaining topics. We are
> >> > close to the truth. It will waste a lot of time if we resume the topic
> >> some
> >> > time later.
> >> > >
> >> > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
> way
> >> > to override a catalog function.
> >> > >
> >> > > I’m not sure about “system.system.fun”, it introduces a nonexistent
> >> cat
> >> > & db? And we still need to do special treatment for the dedicated
> >> > system.system cat & db?
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > >
> >> > >> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> >> > >>
> >> > >> Hi everyone,
> >> > >>
> >> > >> @Xuefu: I would like to avoid adding too many things incrementally.
> >> > Users should be able to override all catalog objects consistently
> >> according
> >> > to FLIP-64 (Support for Temporary Objects in Table module). If
> functions
> >> > are treated completely different, we need more code and special cases.
> >> From
> >> > an implementation perspective, this topic only affects the lookup
> logic
> >> > which is rather low implementation effort which is why I would like to
> >> > clarify the remaining items. As you said, we have a slight consensus on
> >> > overriding built-in functions; we should also strive for reaching
> >> consensus
> >> > on the remaining topics.
> >> > >>
> >> > >> @Dawid: I like your idea as it ensures registering catalog objects
> >> > consistent and the overriding of built-in functions more explicit.
> >> > >>
> >> > >> Thanks,
> >> > >> Timo
> >> > >>
> >> > >>
> >> > >> On 17.09.19 11:59, kai wang wrote:
> >> > >>> hi, everyone
> >> > >>> I think this flip is very meaningful. it supports functions that
> >> can be
> >> > >>> shared by different catalogs and dbs, reducing the duplication of
> >> > functions.
> >> > >>>
> >> > >>> Our group based on flink's sql parser module implements create
> >> function
> >> > >>> feature, stores the parsed function metadata and schema into
> mysql,
> >> and
> >> > >>> also customizes the catalog, customizes sql-client to support
> custom
> >> > >>> schemas and functions. Loaded, but the function is currently
> global,
> >> > and is
> >> > >>> not subdivided according to catalog and db.
> >> > >>>
> >> > >>> In addition, I very much hope to participate in the development of
> >> this
> >> > >>> flip, I have been paying attention to the community, but found it
> is
> >> > more
> >> > >>> difficult to join.
> >> > >>> thank you.
> >> > >>>
> >> > >>> On Tue, Sep 17, 2019 at 11:19 AM, Xuefu Z <us...@gmail.com> wrote:
> >> > >>>
> >> > >>>> Thanks to Timo and Dawid for sharing thoughts.
> >> > >>>>
> >> > >>>> It seems to me that there is a general consensus on having temp
> >> > functions
> >> > >>>> that have no namespaces and overwrite built-in functions. (As a
> >> side
> >> > note
> >> > >>>> for comparability, the current user defined functions are all
> >> > temporary and
> >> > >>>> having no namespaces.)
> >> > >>>>
> >> > >>>> Nevertheless, I can also see the merit of having namespaced temp
> >> > functions
> >> > >>>> that can overwrite functions defined in a specific cat/db.
> However,
> >> > this
> >> > >>>> idea appears orthogonal to the former and can be added
> >> incrementally.
> >> > >>>>
> >> > >>>> How about we first implement non-namespaced temp functions now
> and
> >> > leave
> >> > >>>> the door open for namespaced ones for later releases as the
> >> > requirement
> >> > >>>> might become more crystal? This also helps shorten the debate and
> >> > allow us
> >> > >>>> to make some progress along this direction.
> >> > >>>>
> >> > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> >> temporary
> >> > temp
> >> > >>>> functions that don't have namespaces, my only concern is the
> >> special
> >> > >>>> treatment for a cat/db, which makes code less clean, as evident
> in
> >> > treating
> >> > >>>> the built-in catalog currently.
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Xuefiu
> >> > >>>>
> >> > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >> > >>>> wysakowicz.dawid@gmail.com>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> Hi,
> >> > >>>>> Another idea to consider on top of Timo's suggestion. How about
> we
> >> > have a
> >> > >>>>> special namespace (catalog + database) for built-in objects?
> This
> >> > catalog
> >> > >>>>> would be invisible for users as Xuefu was suggesting.
> >> > >>>>>
> >> > >>>>> Then users could still override built-in functions, if they
> fully
> >> > qualify
> >> > >>>>> object with the built-in namespace, but by default the common
> >> logic
> >> > of
> >> > >>>>> current dB & cat would be used.
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION func ...
> >> > >>>>> registers temporary function in current cat & dB
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >> > >>>>> registers temporary function in cat db
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >> > >>>>> Overrides built-in function with temporary function
> >> > >>>>>
> >> > >>>>> The built-in/system namespace would not be writable for
> permanent
> >> > >>>> objects.
> >> > >>>>> WDYT?
> >> > >>>>>
> >> > >>>>> This way I think we can have benefits of both solutions.
> >> > >>>>>
> >> > >>>>> Best,
> >> > >>>>> Dawid
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
> >> wrote:
> >> > >>>>>
> >> > >>>>>> Hi Bowen,
> >> > >>>>>>
> >> > >>>>>> I understand the potential benefit of overriding certain
> built-in
> >> > >>>>>> functions. I'm open to such a feature if many people agree.
> >> > However, it
> >> > >>>>>> would be great to still support overriding catalog functions
> with
> >> > >>>>>> temporary functions in order to prototype a query even though a
> >> > >>>>>> catalog/database might not be available currently or should not
> >> be
> >> > >>>>>> modified yet. How about we support both cases?
> >> > >>>>>>
> >> > >>>>>> CREATE TEMPORARY FUNCTION abs
> >> > >>>>>> -> creates/overrides a built-in function and never considers
> >> > current
> >> > >>>>>> catalog and database; inconsistent with other DDL but
> acceptable
> >> for
> >> > >>>>>> functions I guess.
> >> > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >> > >>>>>> -> creates/overrides a catalog function
> >> > >>>>>>
> >> > >>>>>> Regarding "Flink don't have any other built-in objects (tables,
> >> > views)
> >> > >>>>>> except functions", this might change in the near future. Take
> >> > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> example.
> >> > >>>>>>
> >> > >>>>>> Thanks,
> >> > >>>>>> Timo
> >> > >>>>>>
> >> > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >> > >>>>>>> Hi Fabian,
> >> > >>>>>>>
> >> > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> >> > didn't
> >> > >>>>>>> include that as a voting option, and the discussion is mainly
> >> > between
> >> > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> >> > >>>>>>>
> >> > >>>>>>> Re > However, it means that temp functions are differently
> >> treated
> >> > >>>> than
> >> > >>>>>>> other db objects.
> >> > >>>>>>> IMO, the treatment difference results from the fact that
> >> functions
> >> > >>>> are
> >> > >>>>> a
> >> > >>>>>>> bit different from other objects - Flink don't have any other
> >> > >>>> built-in
> >> > >>>>>>> objects (tables, views) except functions.
> >> > >>>>>>>
> >> > >>>>>>> Cheers,
> >> > >>>>>>> Bowen
> >> > >>>>>>>
> >> > >>>>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Xuefu Zhang
> >> > >>>>
> >> > >>>> "In Honey We Trust!"
> >> > >>>>
> >> > >>
> >> > >
> >> >
> >> >
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Re: The reason why I prefer option 3 is that in option 3 all objects
internally are identified with 3 parts.

True, but the problem we have is not about how to differentiate each type of
object internally. Rather, it's about how a user references an object
unambiguously and consistently.
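
To make the user-facing ambiguity concrete, here is a minimal Python sketch of
the lookup order from candidate proposal #1 quoted below (temp functions ->
built-in functions -> catalog functions for an unqualified name). The
`resolve` helper, the dictionaries and the string placeholders are
hypothetical illustrations, not Flink APIs, and the order is only the one
assumed by that proposal.

```python
from typing import Dict, Optional, Tuple

CatalogKey = Tuple[str, str, str]  # (catalog, database, function name)


def resolve(
    reference: str,
    temp: Dict[str, str],
    builtin: Dict[str, str],
    catalog: Dict[CatalogKey, str],
    current_catalog: str,
    current_database: str,
) -> Optional[Tuple[str, str]]:
    """Return (kind, function) for a reference, or None if nothing matches."""
    parts = reference.split(".")
    if len(parts) == 1:
        name = parts[0]
        # Ambiguous case: a bare name may match several kinds of function.
        # Assumed order (proposal #1): temporary -> built-in -> catalog.
        if name in temp:
            return ("temporary", temp[name])
        if name in builtin:
            return ("built-in", builtin[name])
        key: CatalogKey = (current_catalog, current_database, name)
    elif len(parts) == 2:
        # Partially qualified: db.func in the current catalog.
        key = (current_catalog, parts[0], parts[1])
    elif len(parts) == 3:
        # Fully qualified: cat.db.func.
        key = (parts[0], parts[1], parts[2])
    else:
        raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}")
    if key in catalog:
        return ("catalog", catalog[key])
    return None
```

With a built-in "abs" and a catalog function "cat.db.abs" both present, a bare
"abs" resolves to the built-in, while "cat.db.abs" always reaches the catalog
function. Only the bare name depends on the chosen order.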

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 4:55 PM Dawid Wysakowicz <wy...@gmail.com>
wrote:

> Last additional comment on Option 2. The reason why I prefer option 3 is
> that in option 3 all objects internally are identified with 3 parts. This
> makes it easier to handle at different locations e.g. while persisting
> views, as all objects have uniform representation.
>
> On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, <wy...@gmail.com>
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in rdbms systems and means a function that is present for a
> > lifetime of a session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> > it is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
> >
> >> Hi Aljoscha,
> >>
> >> Thanks for the summary and these are great questions to be answered. The
> >> answer to your first question is clear: there is a general agreement to
> >> override built-in functions with temp functions.
> >>
> >> However, your second and third questions are sort of related, as a
> >> function
> >> reference can be either just function name (like "func") or in the form
> of
> >> "cat.db.func". When a reference is just function name, it can mean
> either
> >> a
> >> built-in function or a function defined in the current cat/db. If we
> >> support overriding a built-in function with a temp function, such
> >> overriding can also cover a function in the current cat/db.
> >>
> >> I think what Timo referred as "overriding a catalog function" means a
> temp
> >> function defined as "cat.db.func" overrides a catalog function "func" in
> >> cat/db even if cat/db is not current. To support this, temp function has
> >> to
> >> be tied to a cat/db. That's why I said above that the 2nd and 3rd
> >> questions
> >> are related. The problem with such support is the ambiguity when user
> >> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> >> Here "func" can mean a global temp function, or a temp function in
> >> current
> >> cat/db. If we can assume the former, this creates an inconsistency
> because
> >> "CREATE FUNCTION func" actually means a function in current cat/db. If
> we
> >> assume the latter, then there is no way for user to create a global temp
> >> function.
> >>
> >> Giving a special namespace for built-in functions may solve the
> ambiguity
> >> problem above, but it also introduces artificial catalog/database that
> >> needs special treatment and pollutes the cleanness of  the code. I would
> >> rather introduce a syntax in DDL to solve the problem, like "CREATE
> >> [GLOBAL] TEMPORARY FUNCTION func".
> >>
> >> Thus, I'd like to summarize a few candidate proposals for voting
> purposes:
> >>
> >> 1. Support only global, temporary functions without namespace. Such temp
> >> functions overrides built-in functions and catalog functions in current
> >> cat/db. The resolution order is: temp functions -> built-in functions ->
> >> catalog functions. (Partially or fully qualified functions has no
> >> ambiguity!)
> >>
> >> 2. In addition to #1, support creating and referencing temporary
> functions
> >> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> >> functions. The resolution order is: global temp functions -> built-in
> >> functions -> temp functions in current cat/db -> catalog function.
> >> (Resolution for partially or fully qualified function reference is: temp
> >> functions -> persistent functions.)
> >>
> >> 3. In addition to #1, support creating and referencing temporary
> functions
> >> associated with a cat/db with a special namespace for built-in functions
> >> and global temp functions. The resolution is the same as #2, except that
> >> the special namespace might be prefixed to a reference to a built-in
> >> function or global temp function. (In absence of the special namespace,
> >> the
> >> resolution order is the same as in #2.)
> >>
> >> My personal preference is #1, given the unknown use case and introduced
> >> complexity for #2 and #3. However, #2 is an acceptable alternative.
> Thus,
> >> my votes are:
> >>
> >> +1 for #1
> >> +0 for #2
> >> -1 for #3
> >>
> >> Everyone, please cast your vote (in above format please!), or let me
> know
> >> if you have more questions or other candidates.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I think this discussion and the one for FLIP-64 are very connected. To
> >> > resolve the differences, I think we have to think about the basic
> >> principles
> >> > and find consensus there. The basic questions I see are:
> >> >
> >> >  - Do we want to support overriding builtin functions?
> >> >  - Do we want to support overriding catalog functions?
> >> >  - And then later: should temporary functions be tied to a
> >> > catalog/database?
> >> >
> >> > I don’t have much to say about these, except that we should somewhat
> >> stick
> >> > to what the industry does. But I also understand that the industry is
> >> > already very divided on this.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> >> > >
> >> > > Hi,
> >> > >
> >> > > +1 to strive for reaching consensus on the remaining topics. We are
> >> > close to the truth. It will waste a lot of time if we resume the topic
> >> some
> >> > time later.
> >> > >
> >> > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun”
> way
> >> > to override a catalog function.
> >> > >
> >> > > I’m not sure about “system.system.fun”, it introduces a nonexistent
> >> cat
> >> > & db? And we still need to do special treatment for the dedicated
> >> > system.system cat & db?
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > >
> >> > >> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
> >> > >>
> >> > >> Hi everyone,
> >> > >>
> >> > >> @Xuefu: I would like to avoid adding too many things incrementally.
> >> > Users should be able to override all catalog objects consistently
> >> according
> >> > to FLIP-64 (Support for Temporary Objects in Table module). If
> functions
> >> > are treated completely different, we need more code and special cases.
> >> From
> >> > an implementation perspective, this topic only affects the lookup
> logic
> >> > which is rather low implementation effort which is why I would like to
> >> > clarify the remaining items. As you said, we have a slight consensus on
> >> > overriding built-in functions; we should also strive for reaching
> >> consensus
> >> > on the remaining topics.
> >> > >>
> >> > >> @Dawid: I like your idea as it ensures registering catalog objects
> >> > consistent and the overriding of built-in functions more explicit.
> >> > >>
> >> > >> Thanks,
> >> > >> Timo
> >> > >>
> >> > >>
> >> > >> On 17.09.19 11:59, kai wang wrote:
> >> > >>> hi, everyone
> >> > >>> I think this flip is very meaningful. it supports functions that
> >> can be
> >> > >>> shared by different catalogs and dbs, reducing the duplication of
> >> > functions.
> >> > >>>
> >> > >>> Our group based on flink's sql parser module implements create
> >> function
> >> > >>> feature, stores the parsed function metadata and schema into
> mysql,
> >> and
> >> > >>> also customizes the catalog, customizes sql-client to support
> custom
> >> > >>> schemas and functions. Loaded, but the function is currently
> global,
> >> > and is
> >> > >>> not subdivided according to catalog and db.
> >> > >>>
> >> > >>> In addition, I very much hope to participate in the development of
> >> this
> >> > >>> flip, I have been paying attention to the community, but found it
> is
> >> > more
> >> > >>> difficult to join.
> >> > >>> thank you.
> >> > >>>
> >> > >>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
> >> > >>>
> >> > >>>> Thanks to Timo and Dawid for sharing thoughts.
> >> > >>>>
> >> > >>>> It seems to me that there is a general consensus on having temp
> >> > functions
> >> > >>>> that have no namespaces and overwrite built-in functions. (As a
> >> side
> >> > note
> >> > >>>> for comparability, the current user defined functions are all
> >> > temporary and
> >> > >>>> having no namespaces.)
> >> > >>>>
> >> > >>>> Nevertheless, I can also see the merit of having namespaced temp
> >> > functions
> >> > >>>> that can overwrite functions defined in a specific cat/db.
> However,
> >> > this
> >> > >>>> idea appears orthogonal to the former and can be added
> >> incrementally.
> >> > >>>>
> >> > >>>> How about we first implement non-namespaced temp functions now
> and
> >> > leave
> >> > >>>> the door open for namespaced ones for later releases as the
> >> > requirement
> >> > >>>> might become more crystal? This also helps shorten the debate and
> >> > allow us
> >> > >>>> to make some progress along this direction.
> >> > >>>>
> >> > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> >> temporary
> >> > temp
> >> > >>>> functions that don't have namespaces, my only concern is the
> >> special
> >> > >>>> treatment for a cat/db, which makes code less clean, as evident
> in
> >> > treating
> >> > >>>> the built-in catalog currently.
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Xuefu
> >> > >>>>
> >> > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >> > >>>> wysakowicz.dawid@gmail.com>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> Hi,
> >> > >>>>> Another idea to consider on top of Timo's suggestion. How about
> we
> >> > have a
> >> > >>>>> special namespace (catalog + database) for built-in objects?
> This
> >> > catalog
> >> > >>>>> would be invisible for users as Xuefu was suggesting.
> >> > >>>>>
> >> > >>>>> Then users could still override built-in functions, if they
> fully
> >> > qualify
> >> > >>>>> object with the built-in namespace, but by default the common
> >> logic
> >> > of
> >> > >>>>> current dB & cat would be used.
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION func ...
> >> > >>>>> registers temporary function in current cat & dB
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >> > >>>>> registers temporary function in cat db
> >> > >>>>>
> >> > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >> > >>>>> Overrides built-in function with temporary function
> >> > >>>>>
> >> > >>>>> The built-in/system namespace would not be writable for
> permanent
> >> > >>>> objects.
> >> > >>>>> WDYT?
> >> > >>>>>
> >> > >>>>> This way I think we can have benefits of both solutions.
> >> > >>>>>
> >> > >>>>> Best,
> >> > >>>>> Dawid
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
> >> wrote:
> >> > >>>>>
> >> > >>>>>> Hi Bowen,
> >> > >>>>>>
> >> > >>>>>> I understand the potential benefit of overriding certain
> built-in
> >> > >>>>>> functions. I'm open to such a feature if many people agree.
> >> > However, it
> >> > >>>>>> would be great to still support overriding catalog functions
> with
> >> > >>>>>> temporary functions in order to prototype a query even though a
> >> > >>>>>> catalog/database might not be available currently or should not
> >> be
> >> > >>>>>> modified yet. How about we support both cases?
> >> > >>>>>>
> >> > >>>>>> CREATE TEMPORARY FUNCTION abs
> >> > >>>>>> -> creates/overrides a built-in function and never considers
> >> > current
> >> > >>>>>> catalog and database; inconsistent with other DDL but
> acceptable
> >> for
> >> > >>>>>> functions I guess.
> >> > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >> > >>>>>> -> creates/overrides a catalog function
> >> > >>>>>>
> >> > >>>>>> Regarding "Flink don't have any other built-in objects (tables,
> >> > views)
> >> > >>>>>> except functions", this might change in the near future. Take
> >> > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an
> example.
> >> > >>>>>>
> >> > >>>>>> Thanks,
> >> > >>>>>> Timo
> >> > >>>>>>
> >> > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >> > >>>>>>> Hi Fabian,
> >> > >>>>>>>
> >> > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> >> > didn't
> >> > >>>>>>> include that as a voting option, and the discussion is mainly
> >> > between
> >> > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> >> > >>>>>>>
> >> > >>>>>>> Re > However, it means that temp functions are differently
> >> treated
> >> > >>>> than
> >> > >>>>>>> other db objects.
> >> > >>>>>>> IMO, the treatment difference results from the fact that
> >> functions
> >> > >>>> are
> >> > >>>>> a
> >> > >>>>>>> bit different from other objects - Flink don't have any other
> >> > >>>> built-in
> >> > >>>>>>> objects (tables, views) except functions.
> >> > >>>>>>>
> >> > >>>>>>> Cheers,
> >> > >>>>>>> Bowen
> >> > >>>>>>>
> >> > >>>>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Xuefu Zhang
> >> > >>>>
> >> > >>>> "In Honey We Trust!"
> >> > >>>>
> >> > >>
> >> > >
> >> >
> >> >
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <wy...@gmail.com>.
One last comment on Option 2. The reason I prefer Option 3 is that in
Option 3 all objects are identified internally by 3 parts. This makes them
easier to handle in different places, e.g. when persisting views, as all
objects have a uniform representation.
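
As a rough illustration of the uniform representation, the sketch below
expands 1-, 2- and 3-part references into one 3-part identifier. It is a
minimal Python sketch under stated assumptions: the `ObjectIdentifier` class
and the `qualify` helper are hypothetical names chosen for this example, not
the actual Flink implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectIdentifier:
    """Hypothetical 3-part identifier shared by tables, views and functions."""

    catalog: str
    database: str
    name: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.database}.{self.name}"


def qualify(reference: str, current_catalog: str, current_database: str) -> ObjectIdentifier:
    """Expand a 1-, 2- or 3-part reference using the current catalog/database."""
    parts = reference.split(".")
    if len(parts) == 1:
        return ObjectIdentifier(current_catalog, current_database, parts[0])
    if len(parts) == 2:
        return ObjectIdentifier(current_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return ObjectIdentifier(parts[0], parts[1], parts[2])
    raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}")
```

Once every reference is expanded this way, a persisted view could store only
the 3-part form, so it would no longer depend on which catalog and database
were current when it was defined.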

On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, <wy...@gmail.com>
wrote:

> Hi,
> I think it makes sense to start voting at this point.
>
> Option 1: Only 1-part identifiers
> PROS:
> - allows shadowing built-in functions
> CONS:
> - inconsistent with all the other objects, both permanent & temporary
> - does not allow shadowing catalog functions
>
> Option 2: Special keyword for built-in function
> I think this is quite similar to the special catalog/db. The thing I am
> strongly against in this proposal is the GLOBAL keyword. This keyword has a
> meaning in rdbms systems and means a function that is present for a
> lifetime of a session in which it was created, but available in all other
> sessions. Therefore I really don't want to use this keyword in a different
> context.
>
> Option 3: Special catalog/db
>
> PROS:
> - allows shadowing built-in functions
> - allows shadowing catalog functions
> - consistent with other objects
> CONS:
> - we introduce a special namespace for built-in functions
>
> I don't see a problem with introducing the special namespace. In the end
> it is very similar to the keyword approach. In this case the catalog/db
> combination would be the "keyword"
>
> Therefore my votes:
> Option 1: -0
> Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> Option 3: +1
>
> Best,
> Dawid
>
>
> On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:
>
>> Hi Aljoscha,
>>
>> Thanks for the summary and these are great questions to be answered. The
>> answer to your first question is clear: there is a general agreement to
>> override built-in functions with temp functions.
>>
>> However, your second and third questions are sort of related, as a
>> function
>> reference can be either just function name (like "func") or in the form of
>> "cat.db.func". When a reference is just function name, it can mean either
>> a
>> built-in function or a function defined in the current cat/db. If we
>> support overriding a built-in function with a temp function, such
>> overriding can also cover a function in the current cat/db.
>>
>> I think what Timo referred as "overriding a catalog function" means a temp
>> function defined as "cat.db.func" overrides a catalog function "func" in
>> cat/db even if cat/db is not current. To support this, temp function has
>> to
>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>> questions
>> are related. The problem with such support is the ambiguity when user
>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
>> Here "func" can mean a global temp function, or a temp function in
>> current
>> cat/db. If we can assume the former, this creates an inconsistency because
>> "CREATE FUNCTION func" actually means a function in current cat/db. If we
>> assume the latter, then there is no way for user to create a global temp
>> function.
>>
>> Giving a special namespace for built-in functions may solve the ambiguity
>> problem above, but it also introduces artificial catalog/database that
>> needs special treatment and pollutes the cleanness of  the code. I would
>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>> [GLOBAL] TEMPORARY FUNCTION func".
>>
>> Thus, I'd like to summarize a few candidate proposals for voting purposes:
>>
>> 1. Support only global, temporary functions without namespace. Such temp
>> functions overrides built-in functions and catalog functions in current
>> cat/db. The resolution order is: temp functions -> built-in functions ->
>> catalog functions. (Partially or fully qualified functions has no
>> ambiguity!)
>>
>> 2. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
>> functions. The resolution order is: global temp functions -> built-in
>> functions -> temp functions in current cat/db -> catalog function.
>> (Resolution for partially or fully qualified function reference is: temp
>> functions -> persistent functions.)
>>
>> 3. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with a special namespace for built-in functions
>> and global temp functions. The resolution is the same as #2, except that
>> the special namespace might be prefixed to a reference to a built-in
>> function or global temp function. (In absence of the special namespace,
>> the
>> resolution order is the same as in #2.)
>>
>> My personal preference is #1, given the unknown use case and introduced
>> complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
>> my votes are:
>>
>> +1 for #1
>> +0 for #2
>> -1 for #3
>>
>> Everyone, please cast your vote (in above format please!), or let me know
>> if you have more questions or other candidates.
>>
>> Thanks,
>> Xuefu
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
>> wrote:
>>
>> > Hi,
>> >
>> > I think this discussion and the one for FLIP-64 are very connected. To
>> > resolve the differences, I think we have to think about the basic
>> principles
>> > and find consensus there. The basic questions I see are:
>> >
>> >  - Do we want to support overriding builtin functions?
>> >  - Do we want to support overriding catalog functions?
>> >  - And then later: should temporary functions be tied to a
>> > catalog/database?
>> >
>> > I don’t have much to say about these, except that we should somewhat
>> stick
>> > to what the industry does. But I also understand that the industry is
>> > already very divided on this.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > +1 to strive for reaching consensus on the remaining topics. We are
>> > close to the truth. It will waste a lot of time if we resume the topic
>> some
>> > time later.
>> > >
>> > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way
>> > to override a catalog function.
>> > >
>> > > I’m not sure about “system.system.fun”, it introduces a nonexistent
>> cat
>> > & db? And we still need to do special treatment for the dedicated
>> > system.system cat & db?
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > >
>> > >> On Sep 18, 2019, at 06:54, Timo Walther <tw...@apache.org> wrote:
>> > >>
>> > >> Hi everyone,
>> > >>
>> > >> @Xuefu: I would like to avoid adding too many things incrementally.
>> > Users should be able to override all catalog objects consistently
>> according
>> > to FLIP-64 (Support for Temporary Objects in Table module). If functions
>> > are treated completely different, we need more code and special cases.
>> From
>> > an implementation perspective, this topic only affects the lookup logic
>> > which is rather low implementation effort which is why I would like to
>> > clarify the remaining items. As you said, we have a slight consensus on
>> > overriding built-in functions; we should also strive for reaching
>> consensus
>> > on the remaining topics.
>> > >>
>> > >> @Dawid: I like your idea as it ensures registering catalog objects
>> > consistent and the overriding of built-in functions more explicit.
>> > >>
>> > >> Thanks,
>> > >> Timo
>> > >>
>> > >>
>> > >> On 17.09.19 11:59, kai wang wrote:
>> > >>> hi, everyone
>> > >>> I think this flip is very meaningful. it supports functions that
>> can be
>> > >>> shared by different catalogs and dbs, reducing the duplication of
>> > functions.
>> > >>>
>> > >>> Our group based on flink's sql parser module implements create
>> function
>> > >>> feature, stores the parsed function metadata and schema into mysql,
>> and
>> > >>> also customizes the catalog, customizes sql-client to support custom
>> > >>> schemas and functions. Loaded, but the function is currently global,
>> > and is
>> > >>> not subdivided according to catalog and db.
>> > >>>
>> > >>> In addition, I very much hope to participate in the development of
>> this
>> > >>> flip, I have been paying attention to the community, but found it is
>> > more
>> > >>> difficult to join.
>> > >>> thank you.
>> > >>>
>> > >>> On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:
>> > >>>
>> > >>>> Thanks to Timo and Dawid for sharing thoughts.
>> > >>>>
>> > >>>> It seems to me that there is a general consensus on having temp
>> > functions
>> > >>>> that have no namespaces and overwrite built-in functions. (As a
>> side
>> > note
>> > >>>> for comparability, the current user defined functions are all
>> > temporary and
>> > >>>> having no namespaces.)
>> > >>>>
>> > >>>> Nevertheless, I can also see the merit of having namespaced temp
>> > functions
>> > >>>> that can overwrite functions defined in a specific cat/db. However,
>> > this
>> > >>>> idea appears orthogonal to the former and can be added
>> incrementally.
>> > >>>>
>> > >>>> How about we first implement non-namespaced temp functions now and
>> > leave
>> > >>>> the door open for namespaced ones for later releases as the
>> > requirement
>> > >>>> might become more crystal? This also helps shorten the debate and
>> > allow us
>> > >>>> to make some progress along this direction.
>> > >>>>
>> > >>>> As to Dawid's idea of having a dedicated cat/db to host the
>> temporary
>> > temp
>> > >>>> functions that don't have namespaces, my only concern is the
>> special
>> > >>>> treatment for a cat/db, which makes code less clean, as evident in
>> > treating
>> > >>>> the built-in catalog currently.
>> > >>>>
>> > >>>> Thanks,
>> > >>>> Xuefu
>> > >>>>
>> > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>> > >>>> wysakowicz.dawid@gmail.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Hi,
>> > >>>>> Another idea to consider on top of Timo's suggestion. How about we
>> > have a
>> > >>>>> special namespace (catalog + database) for built-in objects? This
>> > catalog
>> > >>>>> would be invisible for users as Xuefu was suggesting.
>> > >>>>>
>> > >>>>> Then users could still override built-in functions, if they fully
>> > qualify
>> > >>>>> object with the built-in namespace, but by default the common
>> logic
>> > of
>> > >>>>> current dB & cat would be used.
>> > >>>>>
>> > >>>>> CREATE TEMPORARY FUNCTION func ...
>> > >>>>> registers temporary function in current cat & dB
>> > >>>>>
>> > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>> > >>>>> registers temporary function in cat db
>> > >>>>>
>> > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>> > >>>>> Overrides built-in function with temporary function
>> > >>>>>
>> > >>>>> The built-in/system namespace would not be writable for permanent
>> > >>>> objects.
>> > >>>>> WDYT?
>> > >>>>>
>> > >>>>> This way I think we can have benefits of both solutions.
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Dawid
>> > >>>>>
>> > >>>>>
>> > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
>> wrote:
>> > >>>>>
>> > >>>>>> Hi Bowen,
>> > >>>>>>
>> > >>>>>> I understand the potential benefit of overriding certain built-in
>> > >>>>>> functions. I'm open to such a feature if many people agree.
>> > However, it
>> > >>>>>> would be great to still support overriding catalog functions with
>> > >>>>>> temporary functions in order to prototype a query even though a
>> > >>>>>> catalog/database might not be available currently or should not
>> be
>> > >>>>>> modified yet. How about we support both cases?
>> > >>>>>>
>> > >>>>>> CREATE TEMPORARY FUNCTION abs
>> > >>>>>> -> creates/overrides a built-in function and never considers
>> > current
>> > >>>>>> catalog and database; inconsistent with other DDL but acceptable
>> for
>> > >>>>>> functions I guess.
>> > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>> > >>>>>> -> creates/overrides a catalog function
>> > >>>>>>
>> > >>>>>> Regarding "Flink don't have any other built-in objects (tables,
>> > views)
>> > >>>>>> except functions", this might change in the near future. Take
>> > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> Timo
>> > >>>>>>
>> > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
>> > >>>>>>> Hi Fabian,
>> > >>>>>>>
>> > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
>> > didn't
>> > >>>>>>> include that as a voting option, and the discussion is mainly
>> > between
>> > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
>> > >>>>>>>
>> > >>>>>>> Re > However, it means that temp functions are differently
>> treated
>> > >>>> than
>> > >>>>>>> other db objects.
>> > >>>>>>> IMO, the treatment difference results from the fact that
>> functions
>> > >>>> are
>> > >>>>> a
>> > >>>>>>> bit different from other objects - Flink don't have any other
>> > >>>> built-in
>> > >>>>>>> objects (tables, views) except functions.
>> > >>>>>>>
>> > >>>>>>> Cheers,
>> > >>>>>>> Bowen
>> > >>>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>> --
>> > >>>> Xuefu Zhang
>> > >>>>
>> > >>>> "In Honey We Trust!"
>> > >>>>
>> > >>
>> > >
>> >
>> >
>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <wy...@gmail.com>.
Hi,
I think it makes sense to start voting at this point.

Option 1: Only 1-part identifiers
PROS:
- allows shadowing built-in functions
CONS:
- inconsistent with all the other objects, both permanent & temporary
- does not allow shadowing catalog functions

Option 2: Special keyword for built-in function
I think this is quite similar to the special catalog/db. The thing I am
strongly against in this proposal is the GLOBAL keyword. This keyword already
has a meaning in relational databases: it denotes a function that exists for
the lifetime of the session in which it was created, but is available in all
other sessions. Therefore I really don't want to use this keyword in a
different context.

Option 3: Special catalog/db

PROS:
- allows shadowing built-in functions
- allows shadowing catalog functions
- consistent with other objects
CONS:
- we introduce a special namespace for built-in functions

I don't see a problem with introducing the special namespace. In the end it
is very similar to the keyword approach. In this case the catalog/db
combination would be the "keyword".
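
To make Option 3 a bit more concrete, here is a rough Python sketch of how
identifiers in CREATE [TEMPORARY] FUNCTION could map to namespaces. It is only
an illustration under assumptions: the "system.system" name is borrowed from
the earlier example in this thread, the treatment of 2-part names (resolved
against the current catalog) is assumed, and the helpers qualify() and
register() are hypothetical, not Flink APIs.

```python
# Hypothetical special namespace for built-in functions (name taken from the
# "system.system.func" example in this thread; not an actual Flink constant).
SYSTEM_NAMESPACE = ("system", "system")


def qualify(identifier, current_cat, current_db):
    """Expand a 1-, 2-, or 3-part identifier into (catalog, database, name)."""
    parts = identifier.split(".")
    if len(parts) == 1:
        return (current_cat, current_db, parts[0])
    if len(parts) == 2:
        # Assumption: a 2-part name is resolved against the current catalog.
        return (current_cat, parts[0], parts[1])
    if len(parts) == 3:
        return tuple(parts)
    raise ValueError(f"expected at most 3 parts, got: {identifier!r}")


def register(identifier, temporary, current_cat, current_db):
    """Describe what a CREATE [TEMPORARY] FUNCTION statement would do."""
    cat, db, name = qualify(identifier, current_cat, current_db)
    if (cat, db) == SYSTEM_NAMESPACE:
        if not temporary:
            # The built-in namespace is not writable for permanent objects.
            raise PermissionError("system namespace only accepts temporary functions")
        return ("overrides built-in", name)
    kind = "temporary" if temporary else "permanent"
    return (kind, (cat, db, name))
```

With this mapping, an unqualified "CREATE TEMPORARY FUNCTION func" behaves like
every other temporary object (it lands in the current cat/db), and overriding a
built-in function has to be spelled out explicitly via the special namespace.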

Therefore my votes:
Option 1: -0
Option 2: -1 (I might change to +0 if we can come up with a better keyword)
Option 3: +1

Best,
Dawid


On Thu, 19 Sep 2019, 05:12 Xuefu Z, <us...@gmail.com> wrote:

> Hi Aljoscha,
>
> Thanks for the summary and these are great questions to be answered. The
> answer to your first question is clear: there is a general agreement to
> override built-in functions with temp functions.
>
> However, your second and third questions are sort of related, as a function
> reference can be either just function name (like "func") or in the form or
> "cat.db.func". When a reference is just function name, it can mean either a
> built-in function or a function defined in the current cat/db. If we
> support overriding a built-in function with a temp function, such
> overriding can also cover a function in the current cat/db.
>
> I think what Timo referred as "overriding a catalog function" means a temp
> function defined as "cat.db.func" overrides a catalog function "func" in
> cat/db even if cat/db is not current. To support this, temp function has to
> be tied to a cat/db. What's why I said above that the 2nd and 3rd questions
> are related. The problem with such support is the ambiguity when user
> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> Here "func" can means a global temp function, or a temp function in current
> cat/db. If we can assume the former, this creates an inconsistency because
> "CREATE FUNCTION func" actually means a function in current cat/db. If we
> assume the latter, then there is no way for user to create a global temp
> function.
>
> Giving a special namespace for built-in functions may solve the ambiguity
> problem above, but it also introduces artificial catalog/database that
> needs special treatment and pollutes the cleanness of  the code. I would
> rather introduce a syntax in DDL to solve the problem, like "CREATE
> [GLOBAL] TEMPORARY FUNCTION func".
>
> Thus, I'd like to summarize a few candidate proposals for voting purposes:
>
> 1. Support only global, temporary functions without namespace. Such temp
> functions overrides built-in functions and catalog functions in current
> cat/db. The resolution order is: temp functions -> built-in functions ->
> catalog functions. (Partially or fully qualified functions has no
> ambiguity!)
>
> 2. In addition to #1, support creating and referencing temporary functions
> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> functions. The resolution order is: global temp functions -> built-in
> functions -> temp functions in current cat/db -> catalog function.
> (Resolution for partially or fully qualified function reference is: temp
> functions -> persistent functions.)
>
> 3. In addition to #1, support creating and referencing temporary functions
> associated with a cat/db with a special namespace for built-in functions
> and global temp functions. The resolution is the same as #2, except that
> the special namespace might be prefixed to a reference to a built-in
> function or global temp function. (In absence of the special namespace, the
> resolution order is the same as in #2.)
>
> My personal preference is #1, given the unknown use case and introduced
> complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
> my votes are:
>
> +1 for #1
> +0 for #2
> -1 for #3
>
> Everyone, please cast your vote (in above format please!), or let me know
> if you have more questions or other candidates.
>
> Thanks,
> Xuefu
>
>
>
>
>
>
>
> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > Hi,
> >
> > I think this discussion and the one for FLIP-64 are very connected. To
> > resolve the differences, think we have to think about the basic
> principles
> > and find consensus there. The basic questions I see are:
> >
> >  - Do we want to support overriding builtin functions?
> >  - Do we want to support overriding catalog functions?
> >  - And then later: should temporary functions be tied to a
> > catalog/database?
> >
> > I don’t have much to say about these, except that we should somewhat
> stick
> > to what the industry does. But I also understand that the industry is
> > already very divided on this.
> >
> > Best,
> > Aljoscha
> >
> > > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > +1 to strive for reaching consensus on the remaining topics. We are
> > close to the truth. It will waste a lot of time if we resume the topic
> some
> > time later.
> > >
> > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way
> > to override a catalog function.
> > >
> > > I’m not sure about “system.system.fun”, it introduces a nonexistent cat
> > & db? And we still need to do special treatment for the dedicated
> > system.system cat & db?
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >> 在 2019年9月18日,06:54,Timo Walther <tw...@apache.org> 写道:
> > >>
> > >> Hi everyone,
> > >>
> > >> @Xuefu: I would like to avoid adding too many things incrementally.
> > Users should be able to override all catalog objects consistently
> according
> > to FLIP-64 (Support for Temporary Objects in Table module). If functions
> > are treated completely different, we need more code and special cases.
> From
> > an implementation perspective, this topic only affects the lookup logic
> > which is rather low implementation effort which is why I would like to
> > clarify the remaining items. As you said, we have a slight consenus on
> > overriding built-in functions; we should also strive for reaching
> consensus
> > on the remaining topics.
> > >>
> > >> @Dawid: I like your idea as it ensures registering catalog objects
> > consistent and the overriding of built-in functions more explicit.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >>
> > >> On 17.09.19 11:59, kai wang wrote:
> > >>> hi, everyone
> > >>> I think this flip is very meaningful. it supports functions that can
> be
> > >>> shared by different catalogs and dbs, reducing the duplication of
> > functions.
> > >>>
> > >>> Our group based on flink's sql parser module implements create
> function
> > >>> feature, stores the parsed function metadata and schema into mysql,
> and
> > >>> also customizes the catalog, customizes sql-client to support custom
> > >>> schemas and functions. Loaded, but the function is currently global,
> > and is
> > >>> not subdivided according to catalog and db.
> > >>>
> > >>> In addition, I very much hope to participate in the development of
> this
> > >>> flip, I have been paying attention to the community, but found it is
> > more
> > >>> difficult to join.
> > >>> thank you.
> > >>>
> > >>> Xuefu Z <us...@gmail.com> 于2019年9月17日周二 上午11:19写道:
> > >>>
> > >>>> Thanks to Tmo and Dawid for sharing thoughts.
> > >>>>
> > >>>> It seems to me that there is a general consensus on having temp
> > functions
> > >>>> that have no namespaces and overwrite built-in functions. (As a side
> > note
> > >>>> for comparability, the current user defined functions are all
> > temporary and
> > >>>> having no namespaces.)
> > >>>>
> > >>>> Nevertheless, I can also see the merit of having namespaced temp
> > functions
> > >>>> that can overwrite functions defined in a specific cat/db. However,
> > this
> > >>>> idea appears orthogonal to the former and can be added
> incrementally.
> > >>>>
> > >>>> How about we first implement non-namespaced temp functions now and
> > leave
> > >>>> the door open for namespaced ones for later releases as the
> > requirement
> > >>>> might become more crystal? This also helps shorten the debate and
> > allow us
> > >>>> to make some progress along this direction.
> > >>>>
> > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> temporary
> > temp
> > >>>> functions that don't have namespaces, my only concern is the special
> > >>>> treatment for a cat/db, which makes code less clean, as evident in
> > treating
> > >>>> the built-in catalog currently.
> > >>>>
> > >>>> Thanks,
> > >>>> Xuefiu
> > >>>>
> > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> > >>>> wysakowicz.dawid@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>> Another idea to consider on top of Timo's suggestion. How about we
> > have a
> > >>>>> special namespace (catalog + database) for built-in objects? This
> > catalog
> > >>>>> would be invisible for users as Xuefu was suggesting.
> > >>>>>
> > >>>>> Then users could still override built-in functions, if they fully
> > qualify
> > >>>>> object with the built-in namespace, but by default the common logic
> > of
> > >>>>> current dB & cat would be used.
> > >>>>>
> > >>>>> CREATE TEMPORARY FUNCTION func ...
> > >>>>> registers temporary function in current cat & dB
> > >>>>>
> > >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> > >>>>> registers temporary function in cat db
> > >>>>>
> > >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> > >>>>> Overrides built-in function with temporary function
> > >>>>>
> > >>>>> The built-in/system namespace would not be writable for permanent
> > >>>> objects.
> > >>>>> WDYT?
> > >>>>>
> > >>>>> This way I think we can have benefits of both solutions.
> > >>>>>
> > >>>>> Best,
> > >>>>> Dawid
> > >>>>>
> > >>>>>
> > >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org>
> wrote:
> > >>>>>
> > >>>>>> Hi Bowen,
> > >>>>>>
> > >>>>>> I understand the potential benefit of overriding certain built-in
> > >>>>>> functions. I'm open to such a feature if many people agree.
> > However, it
> > >>>>>> would be great to still support overriding catalog functions with
> > >>>>>> temporary functions in order to prototype a query even though a
> > >>>>>> catalog/database might not be available currently or should not be
> > >>>>>> modified yet. How about we support both cases?
> > >>>>>>
> > >>>>>> CREATE TEMPORARY FUNCTION abs
> > >>>>>> -> creates/overrides a built-in function and never consideres
> > current
> > >>>>>> catalog and database; inconsistent with other DDL but acceptable
> for
> > >>>>>> functions I guess.
> > >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> > >>>>>> -> creates/overrides a catalog function
> > >>>>>>
> > >>>>>> Regarding "Flink don't have any other built-in objects (tables,
> > views)
> > >>>>>> except functions", this might change in the near future. Take
> > >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> > >>>>>>> Hi Fabian,
> > >>>>>>>
> > >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> > didn't
> > >>>>>>> include that as a voting option, and the discussion is mainly
> > between
> > >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> > >>>>>>>
> > >>>>>>> Re > However, it means that temp functions are differently
> treated
> > >>>> than
> > >>>>>>> other db objects.
> > >>>>>>> IMO, the treatment difference results from the fact that
> functions
> > >>>> are
> > >>>>> a
> > >>>>>>> bit different from other objects - Flink don't have any other
> > >>>> built-in
> > >>>>>>> objects (tables, views) except functions.
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Bowen
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>> --
> > >>>> Xuefu Zhang
> > >>>>
> > >>>> "In Honey We Trust!"
> > >>>>
> > >>
> > >
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Aljoscha,

Thanks for the summary and these are great questions to be answered. The
answer to your first question is clear: there is a general agreement to
override built-in functions with temp functions.

However, your second and third questions are somewhat related, as a function
reference can be either just a function name (like "func") or of the form
"cat.db.func". When a reference is just a function name, it can mean either a
built-in function or a function defined in the current cat/db. If we
support overriding a built-in function with a temp function, such
overriding can also cover a function in the current cat/db.

I think what Timo referred to as "overriding a catalog function" means that a
temp function defined as "cat.db.func" overrides a catalog function "func" in
cat/db even if cat/db is not current. To support this, a temp function has to
be tied to a cat/db. That's why I said above that the 2nd and 3rd questions
are related. The problem with such support is the ambiguity when a user
defines a function w/o a namespace, "CREATE TEMPORARY FUNCTION func ...".
Here "func" can mean a global temp function, or a temp function in the current
cat/db. If we assume the former, this creates an inconsistency because
"CREATE FUNCTION func" actually means a function in the current cat/db. If we
assume the latter, then there is no way for a user to create a global temp
function.

Giving a special namespace to built-in functions may solve the ambiguity
problem above, but it also introduces an artificial catalog/database that
needs special treatment and makes the code less clean. I would
rather introduce a syntax in DDL to solve the problem, like "CREATE
[GLOBAL] TEMPORARY FUNCTION func".

Thus, I'd like to summarize a few candidate proposals for voting purposes:

1. Support only global, temporary functions without namespace. Such temp
functions override built-in functions and catalog functions in the current
cat/db. The resolution order is: temp functions -> built-in functions ->
catalog functions. (Partially or fully qualified function references have no
ambiguity!)

2. In addition to #1, support creating and referencing temporary functions
associated with a cat/db, with a "GLOBAL" qualifier in DDL for global temp
functions. The resolution order is: global temp functions -> built-in
functions -> temp functions in current cat/db -> catalog functions.
(Resolution for a partially or fully qualified function reference is: temp
functions -> persistent functions.)

3. In addition to #1, support creating and referencing temporary functions
associated with a cat/db, with a special namespace for built-in functions
and global temp functions. The resolution is the same as #2, except that
the special namespace might be prefixed to a reference to a built-in
function or global temp function. (In the absence of the special namespace,
the resolution order is the same as in #2.)
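
To make the difference between the resolution orders in #1 and #2 concrete,
here is a minimal Python sketch. It is only an illustration of the orders
listed above: the class, its field and method names, and the returned labels
are made up for this example and are not Flink APIs.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionRegistry:
    """Toy model of the lookup orders in proposals #1 and #2 (not Flink code)."""

    current: tuple = ("cat", "db")                    # current catalog / database
    builtin: set = field(default_factory=set)         # built-in function names
    global_temp: set = field(default_factory=set)     # temp functions without namespace
    scoped_temp: set = field(default_factory=set)     # (cat, db, name) temp functions
    catalog_funcs: set = field(default_factory=set)   # (cat, db, name) persistent functions

    def resolve_p1(self, name):
        """Proposal #1: temp -> built-in -> catalog function in current cat/db."""
        if name in self.global_temp:
            return "global temp"
        if name in self.builtin:
            return "built-in"
        if (*self.current, name) in self.catalog_funcs:
            return "catalog"
        return None

    def resolve_p2(self, name):
        """Proposal #2: global temp -> built-in -> temp in current cat/db -> catalog."""
        if name in self.global_temp:
            return "global temp"
        if name in self.builtin:
            return "built-in"
        key = (*self.current, name)
        if key in self.scoped_temp:
            return "scoped temp"
        if key in self.catalog_funcs:
            return "catalog"
        return None

    def resolve_qualified(self, cat, db, name):
        """Qualified reference under #2: temp functions -> persistent functions."""
        key = (cat, db, name)
        if key in self.scoped_temp:
            return "scoped temp"
        if key in self.catalog_funcs:
            return "catalog"
        return None
```

One consequence of the order in #2 that the sketch makes visible: a temp
function tied to the current cat/db cannot shadow a built-in function when it
is referenced by bare name, because built-ins are checked first; it only wins
when the reference is qualified.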

My personal preference is #1, given the unknown use case and introduced
complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
my votes are:

+1 for #1
+0 for #2
-1 for #3

Everyone, please cast your vote (in the above format, please!), or let me know
if you have more questions or other candidates.

Thanks,
Xuefu







On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
>
> I think this discussion and the one for FLIP-64 are very connected. To
> resolve the differences, think we have to think about the basic principles
> and find consensus there. The basic questions I see are:
>
>  - Do we want to support overriding builtin functions?
>  - Do we want to support overriding catalog functions?
>  - And then later: should temporary functions be tied to a
> catalog/database?
>
> I don’t have much to say about these, except that we should somewhat stick
> to what the industry does. But I also understand that the industry is
> already very divided on this.
>
> Best,
> Aljoscha
>
> > On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> >
> > Hi,
> >
> > +1 to strive for reaching consensus on the remaining topics. We are
> close to the truth. It will waste a lot of time if we resume the topic some
> time later.
> >
> > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way
> to override a catalog function.
> >
> > I’m not sure about “system.system.fun”, it introduces a nonexistent cat
> & db? And we still need to do special treatment for the dedicated
> system.system cat & db?
> >
> > Best,
> > Jark
> >
> >
> >> 在 2019年9月18日,06:54,Timo Walther <tw...@apache.org> 写道:
> >>
> >> Hi everyone,
> >>
> >> @Xuefu: I would like to avoid adding too many things incrementally.
> Users should be able to override all catalog objects consistently according
> to FLIP-64 (Support for Temporary Objects in Table module). If functions
> are treated completely different, we need more code and special cases. From
> an implementation perspective, this topic only affects the lookup logic
> which is rather low implementation effort which is why I would like to
> clarify the remaining items. As you said, we have a slight consenus on
> overriding built-in functions; we should also strive for reaching consensus
> on the remaining topics.
> >>
> >> @Dawid: I like your idea as it ensures registering catalog objects
> consistent and the overriding of built-in functions more explicit.
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 17.09.19 11:59, kai wang wrote:
> >>> hi, everyone
> >>> I think this flip is very meaningful. it supports functions that can be
> >>> shared by different catalogs and dbs, reducing the duplication of
> functions.
> >>>
> >>> Our group based on flink's sql parser module implements create function
> >>> feature, stores the parsed function metadata and schema into mysql, and
> >>> also customizes the catalog, customizes sql-client to support custom
> >>> schemas and functions. Loaded, but the function is currently global,
> and is
> >>> not subdivided according to catalog and db.
> >>>
> >>> In addition, I very much hope to participate in the development of this
> >>> flip, I have been paying attention to the community, but found it is
> more
> >>> difficult to join.
> >>> thank you.
> >>>
> >>> Xuefu Z <us...@gmail.com> 于2019年9月17日周二 上午11:19写道:
> >>>
> >>>> Thanks to Tmo and Dawid for sharing thoughts.
> >>>>
> >>>> It seems to me that there is a general consensus on having temp
> functions
> >>>> that have no namespaces and overwrite built-in functions. (As a side
> note
> >>>> for comparability, the current user defined functions are all
> temporary and
> >>>> having no namespaces.)
> >>>>
> >>>> Nevertheless, I can also see the merit of having namespaced temp
> functions
> >>>> that can overwrite functions defined in a specific cat/db. However,
> this
> >>>> idea appears orthogonal to the former and can be added incrementally.
> >>>>
> >>>> How about we first implement non-namespaced temp functions now and
> leave
> >>>> the door open for namespaced ones for later releases as the
> requirement
> >>>> might become more crystal? This also helps shorten the debate and
> allow us
> >>>> to make some progress along this direction.
> >>>>
> >>>> As to Dawid's idea of having a dedicated cat/db to host the temporary
> temp
> >>>> functions that don't have namespaces, my only concern is the special
> >>>> treatment for a cat/db, which makes code less clean, as evident in
> treating
> >>>> the built-in catalog currently.
> >>>>
> >>>> Thanks,
> >>>> Xuefiu
> >>>>
> >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >>>> wysakowicz.dawid@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>> Another idea to consider on top of Timo's suggestion. How about we
> have a
> >>>>> special namespace (catalog + database) for built-in objects? This
> catalog
> >>>>> would be invisible for users as Xuefu was suggesting.
> >>>>>
> >>>>> Then users could still override built-in functions, if they fully
> qualify
> >>>>> object with the built-in namespace, but by default the common logic
> of
> >>>>> current dB & cat would be used.
> >>>>>
> >>>>> CREATE TEMPORARY FUNCTION func ...
> >>>>> registers temporary function in current cat & dB
> >>>>>
> >>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
> >>>>> registers temporary function in cat db
> >>>>>
> >>>>> CREATE TEMPORARY FUNCTION system.system.func ...
> >>>>> Overrides built-in function with temporary function
> >>>>>
> >>>>> The built-in/system namespace would not be writable for permanent
> >>>> objects.
> >>>>> WDYT?
> >>>>>
> >>>>> This way I think we can have benefits of both solutions.
> >>>>>
> >>>>> Best,
> >>>>> Dawid
> >>>>>
> >>>>>
> >>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org> wrote:
> >>>>>
> >>>>>> Hi Bowen,
> >>>>>>
> >>>>>> I understand the potential benefit of overriding certain built-in
> >>>>>> functions. I'm open to such a feature if many people agree.
> However, it
> >>>>>> would be great to still support overriding catalog functions with
> >>>>>> temporary functions in order to prototype a query even though a
> >>>>>> catalog/database might not be available currently or should not be
> >>>>>> modified yet. How about we support both cases?
> >>>>>>
> >>>>>> CREATE TEMPORARY FUNCTION abs
> >>>>>> -> creates/overrides a built-in function and never consideres
> current
> >>>>>> catalog and database; inconsistent with other DDL but acceptable for
> >>>>>> functions I guess.
> >>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
> >>>>>> -> creates/overrides a catalog function
> >>>>>>
> >>>>>> Regarding "Flink don't have any other built-in objects (tables,
> views)
> >>>>>> except functions", this might change in the near future. Take
> >>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Timo
> >>>>>>
> >>>>>> On 14.09.19 01:40, Bowen Li wrote:
> >>>>>>> Hi Fabian,
> >>>>>>>
> >>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I
> didn't
> >>>>>>> include that as a voting option, and the discussion is mainly
> between
> >>>>>>> 1-part/override builtin and 3-part/not override builtin.
> >>>>>>>
> >>>>>>> Re > However, it means that temp functions are differently treated
> >>>> than
> >>>>>>> other db objects.
> >>>>>>> IMO, the treatment difference results from the fact that functions
> >>>> are
> >>>>> a
> >>>>>>> bit different from other objects - Flink don't have any other
> >>>> built-in
> >>>>>>> objects (tables, views) except functions.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Bowen
> >>>>>>>
> >>>>>>
> >>>>
> >>>> --
> >>>> Xuefu Zhang
> >>>>
> >>>> "In Honey We Trust!"
> >>>>
> >>
> >
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

I think this discussion and the one for FLIP-64 are closely connected. To resolve the differences, I think we have to think about the basic principles and find consensus there. The basic questions I see are:

 - Do we want to support overriding builtin functions?
 - Do we want to support overriding catalog functions?
 - And then later: should temporary functions be tied to a catalog/database?

I don’t have much to say about these, except that we should somewhat stick to what the industry does. But I also understand that the industry is already very divided on this.

Best,
Aljoscha

> On 18. Sep 2019, at 11:41, Jark Wu <im...@gmail.com> wrote:
> 
> Hi,
> 
> +1 to strive for reaching consensus on the remaining topics. We are close to the truth. It will waste a lot of time if we resume the topic some time later. 
> 
> +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to override a catalog function. 
> 
> I’m not sure about “system.system.fun”, it introduces a nonexistent cat & db? And we still need to do special treatment for the dedicated system.system cat & db? 
> 
> Best,
> Jark
> 
> 
>> 在 2019年9月18日,06:54,Timo Walther <tw...@apache.org> 写道:
>> 
>> Hi everyone,
>> 
>> @Xuefu: I would like to avoid adding too many things incrementally. Users should be able to override all catalog objects consistently according to FLIP-64 (Support for Temporary Objects in Table module). If functions are treated completely different, we need more code and special cases. From an implementation perspective, this topic only affects the lookup logic which is rather low implementation effort which is why I would like to clarify the remaining items. As you said, we have a slight consenus on overriding built-in functions; we should also strive for reaching consensus on the remaining topics.
>> 
>> @Dawid: I like your idea as it ensures registering catalog objects consistent and the overriding of built-in functions more explicit.
>> 
>> Thanks,
>> Timo
>> 
>> 
>> On 17.09.19 11:59, kai wang wrote:
>>> hi, everyone
>>> I think this flip is very meaningful. it supports functions that can be
>>> shared by different catalogs and dbs, reducing the duplication of functions.
>>> 
>>> Our group based on flink's sql parser module implements create function
>>> feature, stores the parsed function metadata and schema into mysql, and
>>> also customizes the catalog, customizes sql-client to support custom
>>> schemas and functions. Loaded, but the function is currently global, and is
>>> not subdivided according to catalog and db.
>>> 
>>> In addition, I very much hope to participate in the development of this
>>> flip, I have been paying attention to the community, but found it is more
>>> difficult to join.
>>> thank you.
>>> 
>>> Xuefu Z <us...@gmail.com> 于2019年9月17日周二 上午11:19写道:
>>> 
>>>> Thanks to Tmo and Dawid for sharing thoughts.
>>>> 
>>>> It seems to me that there is a general consensus on having temp functions
>>>> that have no namespaces and overwrite built-in functions. (As a side note
>>>> for comparability, the current user defined functions are all temporary and
>>>> having no namespaces.)
>>>> 
>>>> Nevertheless, I can also see the merit of having namespaced temp functions
>>>> that can overwrite functions defined in a specific cat/db. However,  this
>>>> idea appears orthogonal to the former and can be added incrementally.
>>>> 
>>>> How about we first implement non-namespaced temp functions now and leave
>>>> the door open for namespaced ones for later releases as the requirement
>>>> might become more crystal? This also helps shorten the debate and allow us
>>>> to make some progress along this direction.
>>>> 
>>>> As to Dawid's idea of having a dedicated cat/db to host the temporary temp
>>>> functions that don't have namespaces, my only concern is the special
>>>> treatment for a cat/db, which makes code less clean, as evident in treating
>>>> the built-in catalog currently.
>>>> 
>>>> Thanks,
>>>> Xuefiu
>>>> 
>>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>>> wysakowicz.dawid@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> Another idea to consider on top of Timo's suggestion. How about we have a
>>>>> special namespace (catalog + database) for built-in objects? This catalog
>>>>> would be invisible for users as Xuefu was suggesting.
>>>>> 
>>>>> Then users could still override built-in functions, if they fully qualify
>>>>> object with the built-in namespace, but by default the common logic of
>>>>> current dB & cat would be used.
>>>>> 
>>>>> CREATE TEMPORARY FUNCTION func ...
>>>>> registers temporary function in current cat & dB
>>>>> 
>>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>>> registers temporary function in cat db
>>>>> 
>>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>>> Overrides built-in function with temporary function
>>>>> 
>>>>> The built-in/system namespace would not be writable for permanent
>>>> objects.
>>>>> WDYT?
>>>>> 
>>>>> This way I think we can have benefits of both solutions.
>>>>> 
>>>>> Best,
>>>>> Dawid
>>>>> 
>>>>> 
>>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org> wrote:
>>>>> 
>>>>>> Hi Bowen,
>>>>>> 
>>>>>> I understand the potential benefit of overriding certain built-in
>>>>>> functions. I'm open to such a feature if many people agree. However, it
>>>>>> would be great to still support overriding catalog functions with
>>>>>> temporary functions in order to prototype a query even though a
>>>>>> catalog/database might not be available currently or should not be
>>>>>> modified yet. How about we support both cases?
>>>>>> 
>>>>>> CREATE TEMPORARY FUNCTION abs
>>>>>> -> creates/overrides a built-in function and never consideres current
>>>>>> catalog and database; inconsistent with other DDL but acceptable for
>>>>>> functions I guess.
>>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>>>>>> -> creates/overrides a catalog function
>>>>>> 
>>>>>> Regarding "Flink don't have any other built-in objects (tables, views)
>>>>>> except functions", this might change in the near future. Take
>>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>>>>>> 
>>>>>> Thanks,
>>>>>> Timo
>>>>>> 
>>>>>> On 14.09.19 01:40, Bowen Li wrote:
>>>>>>> Hi Fabian,
>>>>>>> 
>>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I didn't
>>>>>>> include that as a voting option, and the discussion is mainly between
>>>>>>> 1-part/override builtin and 3-part/not override builtin.
>>>>>>> 
>>>>>>> Re > However, it means that temp functions are differently treated
>>>> than
>>>>>>> other db objects.
>>>>>>> IMO, the treatment difference results from the fact that functions
>>>> are
>>>>> a
>>>>>>> bit different from other objects - Flink don't have any other
>>>> built-in
>>>>>>> objects (tables, views) except functions.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Bowen
>>>>>>> 
>>>>>> 
>>>> 
>>>> --
>>>> Xuefu Zhang
>>>> 
>>>> "In Honey We Trust!"
>>>> 
>> 
> 


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Jark Wu <im...@gmail.com>.
Hi,

+1 to strive for reaching consensus on the remaining topics. We are close to the truth. It will waste a lot of time if we resume the topic some time later. 

+1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to override a catalog function. 

I’m not sure about “system.system.fun”: doesn't it introduce a nonexistent cat & db? And wouldn't we still need special treatment for the dedicated system.system cat & db?

Best,
Jark


> 在 2019年9月18日,06:54,Timo Walther <tw...@apache.org> 写道:
> 
> Hi everyone,
> 
> @Xuefu: I would like to avoid adding too many things incrementally. Users should be able to override all catalog objects consistently according to FLIP-64 (Support for Temporary Objects in Table module). If functions are treated completely differently, we need more code and special cases. From an implementation perspective, this topic only affects the lookup logic, which is rather low implementation effort, which is why I would like to clarify the remaining items. As you said, we have a slight consensus on overriding built-in functions; we should also strive for reaching consensus on the remaining topics.
> 
> @Dawid: I like your idea as it keeps the registration of catalog objects consistent and makes the overriding of built-in functions more explicit.
> 
> Thanks,
> Timo
> 
> 
> On 17.09.19 11:59, kai wang wrote:
>> hi, everyone
>> I think this flip is very meaningful. it supports functions that can be
>> shared by different catalogs and dbs, reducing the duplication of functions.
>> 
>> Our group has implemented a CREATE FUNCTION feature based on Flink's SQL
>> parser module. It stores the parsed function metadata and schema in MySQL;
>> we also customized the catalog and the sql-client to support loading custom
>> schemas and functions. However, the functions are currently global and are
>> not subdivided according to catalog and db.
>> 
>> In addition, I very much hope to participate in the development of this
>> flip, I have been paying attention to the community, but found it is more
>> difficult to join.
>>  thank you.
>> 
>> Xuefu Z <us...@gmail.com> 于2019年9月17日周二 上午11:19写道:
>> 
>>> Thanks to Timo and Dawid for sharing thoughts.
>>> 
>>> It seems to me that there is a general consensus on having temp functions
>>> that have no namespaces and overwrite built-in functions. (As a side note
>>> for comparability, the current user defined functions are all temporary and
>>> having no namespaces.)
>>> 
>>> Nevertheless, I can also see the merit of having namespaced temp functions
>>> that can overwrite functions defined in a specific cat/db. However,  this
>>> idea appears orthogonal to the former and can be added incrementally.
>>> 
>>> How about we first implement non-namespaced temp functions now and leave
>>> the door open for namespaced ones for later releases as the requirement
>>> might become more crystal? This also helps shorten the debate and allow us
>>> to make some progress along this direction.
>>> 
>>> As to Dawid's idea of having a dedicated cat/db to host the temporary temp
>>> functions that don't have namespaces, my only concern is the special
>>> treatment for a cat/db, which makes code less clean, as evident in treating
>>> the built-in catalog currently.
>>> 
>>> Thanks,
>>> Xuefiu
>>> 
>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>> wysakowicz.dawid@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> Another idea to consider on top of Timo's suggestion. How about we have a
>>>> special namespace (catalog + database) for built-in objects? This catalog
>>>> would be invisible for users as Xuefu was suggesting.
>>>> 
>>>> Then users could still override built-in functions, if they fully qualify
>>>> object with the built-in namespace, but by default the common logic of
>>>> current dB & cat would be used.
>>>> 
>>>> CREATE TEMPORARY FUNCTION func ...
>>>> registers temporary function in current cat & dB
>>>> 
>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>> registers temporary function in cat db
>>>> 
>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>> Overrides built-in function with temporary function
>>>> 
>>>> The built-in/system namespace would not be writable for permanent
>>> objects.
>>>> WDYT?
>>>> 
>>>> This way I think we can have benefits of both solutions.
>>>> 
>>>> Best,
>>>> Dawid
>>>> 
>>>> 
>>>> On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org> wrote:
>>>> 
>>>>> Hi Bowen,
>>>>> 
>>>>> I understand the potential benefit of overriding certain built-in
>>>>> functions. I'm open to such a feature if many people agree. However, it
>>>>> would be great to still support overriding catalog functions with
>>>>> temporary functions in order to prototype a query even though a
>>>>> catalog/database might not be available currently or should not be
>>>>> modified yet. How about we support both cases?
>>>>> 
>>>>> CREATE TEMPORARY FUNCTION abs
>>>>> -> creates/overrides a built-in function and never consideres current
>>>>> catalog and database; inconsistent with other DDL but acceptable for
>>>>> functions I guess.
>>>>> CREATE TEMPORARY FUNCTION cat.db.fun
>>>>> -> creates/overrides a catalog function
>>>>> 
>>>>> Regarding "Flink don't have any other built-in objects (tables, views)
>>>>> except functions", this might change in the near future. Take
>>>>> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>>>>> 
>>>>> Thanks,
>>>>> Timo
>>>>> 
>>>>> On 14.09.19 01:40, Bowen Li wrote:
>>>>>> Hi Fabian,
>>>>>> 
>>>>>> Yes, I agree 1-part/no-override is the least favorable thus I didn't
>>>>>> include that as a voting option, and the discussion is mainly between
>>>>>> 1-part/override builtin and 3-part/not override builtin.
>>>>>> 
>>>>>> Re > However, it means that temp functions are differently treated
>>> than
>>>>>> other db objects.
>>>>>> IMO, the treatment difference results from the fact that functions
>>> are
>>>> a
>>>>>> bit different from other objects - Flink don't have any other
>>> built-in
>>>>>> objects (tables, views) except functions.
>>>>>> 
>>>>>> Cheers,
>>>>>> Bowen
>>>>>> 
>>>>> 
>>> 
>>> --
>>> Xuefu Zhang
>>> 
>>> "In Honey We Trust!"
>>> 
> 


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

@Xuefu: I would like to avoid adding too many things incrementally.
Users should be able to override all catalog objects consistently
according to FLIP-64 (Support for Temporary Objects in Table module). If
functions are treated completely differently, we need more code and
special cases. From an implementation perspective, this topic only
affects the lookup logic, which is a rather low implementation effort,
which is why I would like to clarify the remaining items. As you said,
we have a slight consensus on overriding built-in functions; we should
also strive for reaching consensus on the remaining topics.

@Dawid: I like your idea, as it makes registering catalog objects
consistent and the overriding of built-in functions more explicit.

Thanks,
Timo


On 17.09.19 11:59, kai wang wrote:
> Hi everyone,
> I think this FLIP is very meaningful. It supports functions that can be
> shared by different catalogs and dbs, reducing the duplication of functions.
>
> Our group has implemented a CREATE FUNCTION feature based on Flink's SQL
> parser module. It stores the parsed function metadata and schema in MySQL,
> and we have also customized the catalog and the sql-client to support custom
> schemas and functions. The functions can be loaded, but they are currently
> global and are not subdivided by catalog and db.
>
> In addition, I would very much like to participate in the development of
> this FLIP. I have been following the community, but have found it rather
> difficult to join.
> Thank you.


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by kai wang <yi...@gmail.com>.
Hi everyone,
I think this FLIP is very meaningful. It supports functions that can be
shared by different catalogs and dbs, reducing the duplication of functions.

Our group has implemented a CREATE FUNCTION feature based on Flink's SQL
parser module. It stores the parsed function metadata and schema in MySQL,
and we have also customized the catalog and the sql-client to support custom
schemas and functions. The functions can be loaded, but they are currently
global and are not subdivided by catalog and db.

In addition, I would very much like to participate in the development of
this FLIP. I have been following the community, but have found it rather
difficult to join.
Thank you.

On Tue, Sep 17, 2019 at 11:19 AM Xuefu Z <us...@gmail.com> wrote:

> Thanks to Timo and Dawid for sharing thoughts.
>
> It seems to me that there is a general consensus on having temp functions
> that have no namespaces and overwrite built-in functions. (As a side note
> for comparison, the current user-defined functions are all temporary and
> have no namespaces.)
>
> Nevertheless, I can also see the merit of having namespaced temp functions
> that can overwrite functions defined in a specific cat/db. However, this
> idea appears orthogonal to the former and can be added incrementally.
>
> How about we first implement non-namespaced temp functions now and leave
> the door open for namespaced ones in later releases, as the requirement
> might become clearer? This also helps shorten the debate and allows us
> to make some progress along this direction.
>
> As to Dawid's idea of having a dedicated cat/db to host the temp
> functions that don't have namespaces, my only concern is the special
> treatment for a cat/db, which makes code less clean, as evident in treating
> the built-in catalog currently.
>
> Thanks,
> Xuefu

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Thanks to Timo and Dawid for sharing thoughts.

It seems to me that there is a general consensus on having temp functions
that have no namespaces and overwrite built-in functions. (As a side note
for comparison, the current user-defined functions are all temporary and
have no namespaces.)

Nevertheless, I can also see the merit of having namespaced temp functions
that can overwrite functions defined in a specific cat/db. However, this
idea appears orthogonal to the former and can be added incrementally.

How about we first implement non-namespaced temp functions now and leave
the door open for namespaced ones in later releases, as the requirement
might become clearer? This also helps shorten the debate and allows us
to make some progress along this direction.

As to Dawid's idea of having a dedicated cat/db to host the temp
functions that don't have namespaces, my only concern is the special
treatment for a cat/db, which makes code less clean, as evident in treating
the built-in catalog currently.

Thanks,
Xuefu

On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <wy...@gmail.com>
wrote:

> Hi,
> Another idea to consider on top of Timo's suggestion. How about we have a
> special namespace (catalog + database) for built-in objects? This catalog
> would be invisible to users as Xuefu was suggesting.
>
> Then users could still override built-in functions, if they fully qualify
> the object with the built-in namespace, but by default the common logic of
> the current db & cat would be used.
>
> CREATE TEMPORARY FUNCTION func ...
> registers a temporary function in the current cat & db
>
> CREATE TEMPORARY FUNCTION cat.db.func ...
> registers a temporary function in cat.db
>
> CREATE TEMPORARY FUNCTION system.system.func ...
> overrides a built-in function with a temporary function
>
> The built-in/system namespace would not be writable for permanent objects.
> WDYT?
>
> This way I think we can have benefits of both solutions.
>
> Best,
> Dawid


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <wy...@gmail.com>.
Hi,
Another idea to consider on top of Timo's suggestion. How about we have a
special namespace (catalog + database) for built-in objects? This catalog
would be invisible to users as Xuefu was suggesting.

Then users could still override built-in functions, if they fully qualify
the object with the built-in namespace, but by default the common logic of
the current db & cat would be used.

CREATE TEMPORARY FUNCTION func ...
registers a temporary function in the current cat & db

CREATE TEMPORARY FUNCTION cat.db.func ...
registers a temporary function in cat.db

CREATE TEMPORARY FUNCTION system.system.func ...
overrides a built-in function with a temporary function

The built-in/system namespace would not be writable for permanent objects.
WDYT?

This way I think we can have benefits of both solutions.
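
The three CREATE TEMPORARY FUNCTION cases above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
qualify, FunctionRegistry and SYSTEM_NAMESPACE are hypothetical and are not
Flink APIs, and only 1-part and 3-part names are handled because those are
the only forms shown in the examples.

```python
from typing import Callable, Dict, Tuple

# Hypothetical reserved namespace for built-in objects (assumption of this sketch).
SYSTEM_NAMESPACE = ("system", "system")

Identifier = Tuple[str, str, str]  # (catalog, database, function name)


def qualify(name: str, current_cat: str, current_db: str) -> Identifier:
    """Expand a 1-part or 3-part function name into (catalog, database, name)."""
    parts = name.split(".")
    if len(parts) == 1:
        # Unqualified: fall back to the session's current catalog and database.
        return (current_cat, current_db, parts[0])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"expected a 1-part or 3-part name, got {name!r}")


class FunctionRegistry:
    """Toy registry for illustration; not Flink's FunctionCatalog."""

    def __init__(self, current_cat: str, current_db: str) -> None:
        self.current_cat = current_cat
        self.current_db = current_db
        self.temporary: Dict[Identifier, Callable] = {}
        self.permanent: Dict[Identifier, Callable] = {}

    def create_function(self, name: str, impl: Callable, temporary: bool) -> Identifier:
        ident = qualify(name, self.current_cat, self.current_db)
        if not temporary and ident[:2] == SYSTEM_NAMESPACE:
            # The system namespace is not writable for permanent objects.
            raise PermissionError("system namespace is read-only for permanent objects")
        target = self.temporary if temporary else self.permanent
        target[ident] = impl
        return ident

    def overrides_builtin(self, ident: Identifier) -> bool:
        """A temp function overrides a built-in only if it sits in the system namespace."""
        return ident in self.temporary and ident[:2] == SYSTEM_NAMESPACE
```

With a session whose current catalog and database are "mycat" and "mydb",
creating "func" would land in ("mycat", "mydb", "func"), while only
"system.system.func" would count as overriding a built-in.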

Best,
Dawid


On Tue, 17 Sep 2019, 07:24 Timo Walther, <tw...@apache.org> wrote:

> Hi Bowen,
>
> I understand the potential benefit of overriding certain built-in
> functions. I'm open to such a feature if many people agree. However, it
> would be great to still support overriding catalog functions with
> temporary functions in order to prototype a query even though a
> catalog/database might not be available currently or should not be
> modified yet. How about we support both cases?
>
> CREATE TEMPORARY FUNCTION abs
> -> creates/overrides a built-in function and never considers the current
> catalog and database; inconsistent with other DDL but acceptable for
> functions I guess.
> CREATE TEMPORARY FUNCTION cat.db.fun
> -> creates/overrides a catalog function
>
> Regarding "Flink doesn't have any other built-in objects (tables, views)
> except functions", this might change in the near future. Take
> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>
> Thanks,
> Timo

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi Bowen,

I understand the potential benefit of overriding certain built-in 
functions. I'm open to such a feature if many people agree. However, it 
would be great to still support overriding catalog functions with 
temporary functions in order to prototype a query even though a 
catalog/database might not be available currently or should not be 
modified yet. How about we support both cases?

CREATE TEMPORARY FUNCTION abs
-> creates/overrides a built-in function and never considers the current
catalog and database; inconsistent with other DDL but acceptable for
functions I guess.
CREATE TEMPORARY FUNCTION cat.db.fun
-> creates/overrides a catalog function

Regarding "Flink doesn't have any other built-in objects (tables, views)
except functions", this might change in the near future. Take
https://issues.apache.org/jira/browse/FLINK-13900 as an example.
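
To make the "support both cases" idea concrete, here is a minimal lookup
sketch. It assumes that a 1-part temporary function shadows a built-in of
the same name and that a 3-part temporary function shadows the catalog
function with the same full path. The class TempFunctionLookup and its
methods are hypothetical names for illustration, not Flink APIs, and the
sketch deliberately leaves out how an unqualified reference would fall
through to catalog functions in the current catalog/database.

```python
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]


class TempFunctionLookup:
    """Toy lookup showing both override cases; not Flink's FunctionCatalog."""

    def __init__(self, builtins: Dict[str, Callable],
                 catalog_functions: Dict[Path, Callable]) -> None:
        self.builtins = dict(builtins)                    # name -> impl
        self.catalog_functions = dict(catalog_functions)  # (cat, db, name) -> impl
        self.temp_system: Dict[str, Callable] = {}        # 1-part temp functions
        self.temp_catalog: Dict[Path, Callable] = {}      # 3-part temp functions

    def create_temporary_function(self, name: str, impl: Callable) -> None:
        parts = tuple(name.split("."))
        if len(parts) == 1:
            # Never considers the current catalog/database.
            self.temp_system[name] = impl
        elif len(parts) == 3:
            self.temp_catalog[parts] = impl
        else:
            raise ValueError(f"expected a 1-part or 3-part name, got {name!r}")

    def lookup(self, name: str) -> Callable:
        parts = tuple(name.split("."))
        if len(parts) == 1:
            # Temp function first, then the built-in of the same name.
            if name in self.temp_system:
                return self.temp_system[name]
            return self.builtins[name]
        if len(parts) == 3:
            # Temp function first, then the catalog function at the same path.
            if parts in self.temp_catalog:
                return self.temp_catalog[parts]
            return self.catalog_functions[parts]
        raise ValueError(f"expected a 1-part or 3-part name, got {name!r}")
```

In this sketch, registering a temporary "abs" changes what lookup("abs")
returns without touching any catalog, and registering a temporary
"cat.db.fun" shadows the catalog function without modifying the catalog,
which matches the prototyping use case described above.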

Thanks,
Timo

On 14.09.19 01:40, Bowen Li wrote:
> Hi Fabian,
>
> Yes, I agree 1-part/no-override is the least favorable, thus I didn't
> include that as a voting option, and the discussion is mainly between
> 1-part/override builtin and 3-part/not override builtin.
>
> Re > However, it means that temp functions are differently treated than
> other db objects.
> IMO, the treatment difference results from the fact that functions are a
> bit different from other objects - Flink doesn't have any other built-in
> objects (tables, views) except functions.
>
> Cheers,
> Bowen
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi Fabian,

Yes, I agree 1-part/no-override is the least favorable, thus I didn't
include that as a voting option, and the discussion is mainly between
1-part/override builtin and 3-part/not override builtin.

Re > However, it means that temp functions are differently treated than
other db objects.
IMO, the treatment difference results from the fact that functions are a
bit different from other objects - Flink doesn't have any other built-in
objects (tables, views) except functions.

Cheers,
Bowen

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
Hi all,

Thanks Dawid for the additional explanation!

As others summarized there are two questions:

1) Are temporary functions a) top-level functions (1-part address) that are
not associated with a catalog/db, or b) do we treat them like any other
database object with a 3-part address?
2) If we treat them as top-level functions, do we a) allow overriding
built-in functions (by giving preference over built-in functions) or b) not?

From that, we can have three combinations:

1-part/override: It has the (IMO) benefit of allowing to change the default
behavior of Flink SQL by overriding built-in functions. However, it means
that temp functions are differently treated than other db objects.
1-part/no-override: Unambiguous function resolution, special treatment for
temp functions
3-part: No special treatment of temp functions, but also no way to change
the semantics of a built-in function (unambiguous semantics)

In fact, I don't have a strong preference for any of the choices, but
1-part/no-override would be my least favorite (no gains but special
treatment of temp functions).
All have pros and cons and can be justified.
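
As a rough illustration of how the three combinations differ for an
unqualified (ambiguous) function reference, the sketch below encodes a
resolution order per combination. The orders for "1-part/override" and
"3-part" follow the Option 1 and Option 2 summary quoted further down in
this message; the order for "1-part/no-override" (built-ins simply win) is
an assumption of this sketch, since the thread does not spell it out. The
names RESOLUTION_ORDER and resolve are hypothetical.

```python
from typing import Dict, Tuple

# Lookup order per combination for an unqualified function reference.
RESOLUTION_ORDER: Dict[str, Tuple[str, ...]] = {
    "1-part/override": ("temp", "builtin", "catalog"),
    "1-part/no-override": ("builtin", "temp", "catalog"),  # assumption of this sketch
    "3-part": ("builtin", "temp", "catalog"),
}


def resolve(name: str, policy: str,
            temp: Dict[str, str],
            builtin: Dict[str, str],
            catalog: Dict[str, str]) -> Tuple[str, str]:
    """Return (kind, implementation) of the first match in the policy's order.

    For the "3-part" policy, `temp` and `catalog` stand for the temp and
    catalog functions visible in the current catalog/database.
    """
    sources = {"temp": temp, "builtin": builtin, "catalog": catalog}
    for kind in RESOLUTION_ORDER[policy]:
        if name in sources[kind]:
            return kind, sources[kind][name]
    raise KeyError(f"function {name!r} not found")
```

Under "1-part/override", a temp function named like a built-in would win,
whereas under "3-part" the built-in would still be picked for the same
unqualified reference.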

Cheers, Fabian




On Wed, Sep 11, 2019 at 8:52 PM Bowen Li <bo...@gmail.com> wrote:

> Hi,
>
> Thanks @Fabian @Dawid and everyone else for sharing your thoughts!
>
> First, I'd like to take Hive built-in functions out of this FLIP to keep
> our original scope and make it less controversial on a potential modular
> approach. I will remove Hive built-in functions from the google doc.
>
> Then the focus of debate is mainly function resolution order and temp
> function namespace, which are somewhat related. I roughly summarized this
> thread, and currently we are debating on two approaches with preference
> from the following people:
>
> Option 1:
>     Proposal: temp functions will be of 1-part path (function name only),
> and can override built-in functions. The ambiguous function resolution
> order is thus 1) temp functions 2) built-in functions 3) catalog functions
> in the current catalog/database
>     Votes: Xuefu, Bowen, Fabian, Jark
>
> Option 2:
>     Proposal: temp functions will be of 3-part path (with catalog,
> database, and function name), and temp functions cannot override built-in
> functions. The ambiguous function resolution order is thus 1) built-in
> functions 2) temp functions (in 3-part path) 3) catalog functions in the
> current catalog/database
>     Votes:  Dawid, Timo
>
>
> Do you think we need a separate voting thread on the two options in the
> community, or are we able to conclude from the above summary?
>
>
>
> On Wed, Sep 11, 2019 at 8:09 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > Hi Fabian,
> > Thank you for your response.
> > Regarding the temporary function, just wanted to clarify one thing: the
> > 3-part identifier does not mean the user always has to provide the
> > catalog & database explicitly, just as the user does not have to provide
> > them when e.g. creating a permanent table, view etc. It does mean, though,
> > that functions are always stored within a database, the same way as all
> > the permanent objects and other temporary objects (tables, views). If not
> > given explicitly, the current catalog & database would be used, both in
> > the create statement and when using the function.
> >
> > Point taken, though, that your preference would be to support overriding
> > built-in functions.
> >
> > Best,
> > Dawid
> >
> > On Wed, 11 Sep 2019, 21:14 Fabian Hueske, <fh...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to add my opinion on this topic as well ;-)
> > >
> > > In general, I think overriding built-in functions with temp functions
> > > has a couple of benefits but also a few challenges:
> > >
> > > * Users can reimplement the behavior of a built-in function of a
> > > different system, e.g., for backward compatibility after a migration.
> > > * I don't think that "accidental" overrides and surprising semantics
> > > are an issue or dangerous. The user registered the temp function in the
> > > same session and should therefore be aware of the changed semantics.
> > > * I see that not all built-in functions can be overridden, like the
> > > CAST example that Dawid gave. However, I think these should be a small
> > > fraction and such functions could be blacklisted. Sure, that's not super
> > > consistent, but should (IMO) not be a big issue in practice.
> > > * Temp functions should be easy to use. Requiring a 3-part addressing
> > > makes them a lot less user friendly, IMO. Users need to think about
> > > what catalog and db to choose when registering them. Also using a temp
> > > function in a query becomes less convenient. Moreover, I agree with
> > > Bowen's concerns that a 3-part addressing scheme reduces the temporary
> > > appearance of the function.
> > >
> > > From the three possible solutions, my preference order is
> > > 1) 1-part address with override of built-in
> > > 2) 1-part address without override of built-in
> > > 3) 3-part address
> > >
> > > Regarding the issue of external built-in functions, I don't think that
> > > Timo's proposal of modules is fully orthogonal to this discussion.
> > > A Hive function module could be an alternative to offering Hive
> > > functions as part of Hive's catalog.
> > > From a user's point of view, I think that modules would be a "cleaner"
> > > integration ("Why do I need a Hive catalog if all I want to do is
> > > apply a Hive function on a Kafka table?").
> > > However, the module approach clearly has the problem of dealing with
> > > same-named functions in different modules (e.g., a Hive function and a
> > > Flink built-in function).
> > > The catalog approach has the benefit that functions can be addressed
> > > like hiveCat::func (or a similar path).
> > >
> > > I'm not sure what's the best solution here.
> > >
> > > Cheers,
> > > Fabian
> > >
> > >
> > > On Mon, Sep 9, 2019 at 6:30 AM Bowen Li <bowenli86@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > W.r.t temp functions, I feel both options have their benefits and can
> > > > theoretically achieve similar functionalities one way or another. In
> > > > the end, it's more about use cases, users' habits, and trade-offs.
> > > >
> > > > Re> Not always users are in full control of the catalog functions.
> > > > There is also the case where different teams manage the catalog &
> > > > use the catalog.
> > > >
> > > > Temp functions live within a session, and not within a catalog.
> > > > Having 3-part paths may imply temp functions are tied to a catalog in
> > > > two aspects.
> > > > 1) it may indicate each catalog manages its own temp functions, which
> > > > is not true, as we all seem to agree they should reside at a central
> > > > place, either in FunctionCatalog or CatalogManager
> > > > 2) it may indicate there's some access control. When users are
> > > > forbidden to manipulate some objects in the catalog that's managed by
> > > > other teams, but are allowed to manipulate some other objects (temp
> > > > functions in this case) belonging to the catalog in namespaces, users
> > > > may think we introduced extra complexity and confusion with some kind
> > > > of access control into the problem. It doesn't feel intuitive enough
> > > > for end users.
> > > >
> > > > Thus, I'd be in favor of 1-part path for temporary functions, and
> > > > other temp objects.
> > > >
> > > > Thanks,
> > > > Bowen
> > > >
> > > >
> > > >
> > > > On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > > > I agree the consequences of the decision are substantial. Let's see
> > > > > what others think.
> > > > >
> > > > > -- Catalog functions are defined by users, and we suppose they can
> > > > > drop/alter it in any way they want. Thus, overwriting a catalog
> > > > > function doesn't seem to be a strong use case that we should be
> > > > > concerned about. Rather, there are known use cases for overwriting
> > > > > built-in functions.
> > > > >
> > > > > Not always users are in full control of the catalog functions.
> > > > > There is also the case where different teams manage the catalog &
> > > > > use the catalog. As for overriding built-in functions with the
> > > > > 3-part approach, the user can always use an equally named function
> > > > > from a catalog. E.g. to override
> > > > > *    SELECT explode(arr) FROM ...*
> > > > >
> > > > > user can always write:
> > > > >
> > > > > *    SELECT db.explode(arr) FROM ...*
> > > > >
> > > > > Best,
> > > > >
> > > > > Dawid
> > > > > On 06/09/2019 10:54, Xuefu Z wrote:
> > > > >
> > > > > Hi Dawid,
> > > > >
> > > > > Thank you for your summary. While the only difference in the two
> > > > proposals
> > > > > is one- or three-part in naming, the consequence would be
> > substantial.
> > > > >
> > > > > To me, there are two major use cases of temporary functions
> compared
> > to
> > > > > persistent ones:
> > > > > 1. Temporary in nature and auto managed by the session. More often
> > than
> > > > > not, admin doesn't even allow user to create persistent functions.
> > > > > 2. Provide an opportunity to overwriting system built-in functions.
> > > > >
> > > > > Since built-in functions has one-part name, requiring three-part
> name
> > > for
> > > > > temporary functions eliminates the overwriting opportunity.
> > > > >
> > > > > > One-part naming essentially puts all temp functions under a single
> > > > > > namespace and simplifies function resolution; for example, we don't
> > > > > > need to consider the case of a temp function and a persistent
> > > > > > function with the same name under the same database.
> > > > > >
> > > > > > I agree having three parts does have its merits, such as consistency
> > > > > > with other temporary objects (tables) and a minor difference between
> > > > > > temp and catalog functions. However, there is a slight difference
> > > > > > between tables and functions in that there is no built-in table in
> > > > > > SQL, so there is no need to overwrite it.
> > > > >
> > > > > > I'm not sure I fully agree with the benefits you listed as the
> > > > > > advantages of the three-part naming of temp functions.
> > > > > >   -- Allowing overwriting built-in functions is a benefit, and the
> > > > > > solution for disallowing certain overwriting shouldn't be totally
> > > > > > banning it.
> > > > > >   -- Catalog functions are defined by users, and we suppose they can
> > > > > > drop/alter them in any way they want. Thus, overwriting a catalog
> > > > > > function doesn't seem to be a strong use case that we should be
> > > > > > concerned about. Rather, there are known use cases for overwriting
> > > > > > built-in functions.
> > > > > >
> > > > > > Thus, personally I would prefer one-part names for temporary
> > > > > > functions. In the absence of a SQL standard on this, I would
> > > > > > certainly like to get opinions from others to see if a consensus can
> > > > > > eventually be reached.
> > > > >
> > > > > > (To your point on modular approach to support external built-in
> > > > > > functions, we saw the value and are actively looking into it. Thanks
> > > > > > for sharing your opinion on that.)
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > > > On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz
> > > > > > <dwysakowicz@apache.org> wrote:
> > > > >
> > > > >
> > > > > Hi Xuefu,
> > > > >
> > > > > Thank you for your answers.
> > > > >
> > > > > > Let me summarize my understanding. In principle we differ only on
> > > > > > whether a temporary function is identified by a 1-part name only or
> > > > > > by a 3-part name only. I can reconfirm that if the community decides
> > > > > > it prefers the 1-part approach I will commit to that, with the
> > > > > > assumption that we will force ONLY 1-part function names. (We will
> > > > > > parse the identifier and throw an exception if a user tries to
> > > > > > register e.g. db.temp_func).
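A minimal sketch of the check described above: reject anything but a 1-part name when a temporary function is registered. The function name and the naive dot-splitting rule are assumptions for illustration only, not Flink APIs; real SQL identifier parsing would also have to handle quoted identifiers.

```python
def validate_temp_function_name(identifier: str) -> str:
    """Return the name if it is a 1-part identifier, otherwise raise.

    Hypothetical helper: splits on '.' and ignores SQL quoting rules.
    """
    parts = identifier.split(".")
    if len(parts) != 1 or not parts[0]:
        raise ValueError(
            f"Temporary function names must be 1-part, got: {identifier!r}"
        )
    return parts[0]
```

Under this rule `temp_func` is accepted, while `db.temp_func` and `cat.db.temp_func` are rejected at registration time.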
> > > > >
> > > > > > My preference, though, is the 3-part approach:
> > > > > >
> > > > > >    - there are some functions that it makes no sense to override,
> > > > > >    e.g. CAST; moreover, I'm afraid that allowing overriding such
> > > > > >    functions will lead to high inconsistency, similar to what I
> > > > > >    mentioned Spark has
> > > > > >    - you cannot shadow a fully-qualified function. (If a user fully
> > > > > >    qualifies his/her objects in a SQL query, which is often
> > > > > >    considered a good practice)
> > > > > >    - it does not differentiate between functions & temporary
> > > > > >    functions. Temporary functions just differ with regards to their
> > > > > >    life-cycle. The registration & usage is exactly the same.
> > > > >
> > > > > As it can be seen, the proposed concept regarding temp function and
> > > > > function resolution is quite simple.
> > > > >
> > > > > > Both approaches are equally simple. I would even say the 3-part
> > > > > > approach is slightly simpler as it does not have to care about some
> > > > > > special built-in functions such as CAST.
> > > > > >
> > > > > > I don't want to express my opinion on the differentiation between
> > > > > > built-in functions and "external" built-in functions in this thread
> > > > > > as it is rather orthogonal, but I also like the modular approach and
> > > > > > I definitely don't like the special syntax "cat::function". I think
> > > > > > it's better to stick to a standard or at least to proven solutions
> > > > > > from other systems.
> > > > >
> > > > > Best,
> > > > >
> > > > > Dawid
> > > > > On 05/09/2019 10:12, Xuefu Z wrote:
> > > > >
> > > > > > Hi Dawid,
> > > > > >
> > > > > > Thanks for sharing your thoughts and request for clarifications. I
> > > > > > believe that I fully understood your proposal, which does have its
> > > > > > merit. However, it's different from ours. Here are the answers to
> > > > > > your questions:
> > > > >
> > > > > > Re #1: yes, the temp functions in the proposal are global and have
> > > > > > just one-part names, similar to built-in functions. Two- or
> > > > > > three-part names are not allowed.
> > > > > >
> > > > > > Re #2: not applicable as two- or three-part names are disallowed.
> > > > > >
> > > > > > Re #3: same as above. Referencing external built-in functions is
> > > > > > achieved either implicitly (only the built-in functions in the
> > > > > > current catalogs are considered) or via special syntax such as
> > > > > > cat::function. However, we are looking into the modular approach that
> > > > > > Timo suggested with other feedback received from the community.
> > > > >
> > > > > > Re #4: the resolution order goes like the following in our proposal:
> > > > > >
> > > > > > 1. temporary functions
> > > > > > 2. built-in functions (including those augmented by add-on modules)
> > > > > > 3. built-in functions in current catalog (this will not be needed if
> > > > > > the special syntax "cat::function" is required)
> > > > > > 4. functions in current catalog and db.
> > > > >
> > > > > > If we go with the modular approach and make external built-in
> > > > > > functions an add-on module, 2 and 3 above will be combined. In
> > > > > > essence, the resolution order is equivalent in the two approaches.
> > > > > >
> > > > > > By the way, resolution order matters only for simple name
> > > > > > references. For names such as db.function (interpreted as
> > > > > > current_cat/db/function) or cat.db.function, the reference is
> > > > > > unambiguous, so no resolution is needed.
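The resolution order in this reply can be sketched roughly as follows, assuming the modular variant in which steps 2 and 3 are merged. All registries and names here are hypothetical in-memory stand-ins, not Flink classes; the sketch only shows that a simple name walks the ordered list while a qualified name goes straight to the catalog.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registries for illustration; none of these are Flink APIs.
TEMP_FUNCTIONS: Dict[str, str] = {"explode": "temp:explode"}
BUILTIN_FUNCTIONS: Dict[str, str] = {
    "explode": "builtin:explode",
    "sqrt": "builtin:sqrt",
}
CATALOG_FUNCTIONS: Dict[Tuple[str, str, str], str] = {
    ("cat", "db", "explode"): "catalog:cat.db.explode",
    ("cat", "db", "my_udf"): "catalog:cat.db.my_udf",
}


def resolve(name: str, current_cat: str = "cat",
            current_db: str = "db") -> Optional[str]:
    parts = name.split(".")
    if len(parts) == 3:
        # cat.db.function: unambiguous, no resolution order involved
        return CATALOG_FUNCTIONS.get((parts[0], parts[1], parts[2]))
    if len(parts) == 2:
        # db.function: interpreted as current_cat/db/function
        return CATALOG_FUNCTIONS.get((current_cat, parts[0], parts[1]))
    if len(parts) != 1:
        raise ValueError(f"Unsupported function reference: {name!r}")
    # Simple name: 1. temporary, 2. built-in (incl. modules), 4. catalog
    if name in TEMP_FUNCTIONS:
        return TEMP_FUNCTIONS[name]
    if name in BUILTIN_FUNCTIONS:
        return BUILTIN_FUNCTIONS[name]
    return CATALOG_FUNCTIONS.get((current_cat, current_db, name))
```

With these sample registries a plain `explode` hits the temporary function, which shadows the built-in, while `db.explode` still reaches the catalog function.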
> > > > >
> > > > > > As it can be seen, the proposed concept regarding temp function and
> > > > > > function resolution is quite simple. Additionally, the proposed
> > > > > > resolution order allows temp function to shadow a built-in function,
> > > > > > which is important (though not decisive) in our opinion.
> > > > > >
> > > > > > I started liking the modular approach as the resolution order will
> > > > > > only include 1, 2, and 4, which is simpler and more generic. That's
> > > > > > why I suggested we look more into this direction.
> > > > >
> > > > > Please let me know if there are further questions.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz
> > > > > > <dwysakowicz@apache.org> wrote:
> > > > >
> > > > >
> > > > > Hi Xuefu,
> > > > >
> > > > > > Just wanted to summarize my opinion on the one topic (temporary
> > > > > > functions).
> > > > > >
> > > > > > My preference would be to make temporary functions always 3-part
> > > > > > qualified (as a result that would prohibit overriding built-in
> > > > > > functions). Having said that, if the community decides that it's
> > > > > > better to allow overriding built-in functions I am fine with it and
> > > > > > can commit to that decision.
> > > > > >
> > > > > > I wanted to ask if you could clarify a few points for me around that
> > > > > > option.
> > > > >
> > > > > >    1. Would you enforce temporary functions to be always just a
> > > > > >    single name (without db & cat) as hive does, or would you allow
> > > > > >    also 3 or even 2 part identifiers?
> > > > > >    2. Assuming 2/3-part paths. How would you register a function
> > > > > >    from the following statement: CREATE TEMPORARY FUNCTION db.func?
> > > > > >    Would that shadow all functions named 'func' in all databases
> > > > > >    named 'db' in all catalogs? Or would you shadow only function
> > > > > >    'func' in database 'db' in current catalog?
> > > > > >    3. This point is still under discussion, but was mentioned a few
> > > > > >    times, that maybe we want to enable syntax cat.func for "external
> > > > > >    built-in functions". How would that affect the statement from the
> > > > > >    previous point? Would 'db.func' shadow "external built-in
> > > > > >    function" in 'db' catalog or user functions as in point 2? Or
> > > > > >    maybe both?
> > > > > >    4. Lastly in fact to summarize the previous points. Assuming
> > > > > >    2/3-part paths. Would the function resolution be actually as
> > > > > >    follows?:
> > > > > >       1. temporary functions (1-part path)
> > > > > >       2. built-in functions
> > > > > >       3. temporary functions (2-part path)
> > > > > >       4. 2-part catalog functions a.k.a. "external built-in
> > > > > >       functions" (cat + func) - this is still under discussion, if
> > > > > >       we want that in the other focal point
> > > > > >       5. temporary functions (3-part path)
> > > > > >       6. 3-part catalog functions a.k.a. user functions
> > > > >
> > > > > > I would be really grateful if you could answer those questions,
> > > > > > thanks.
> > > > >
> > > > > BTW, Thank you all for a healthy discussion.
> > > > >
> > > > > Best,
> > > > >
> > > > > Dawid
> > > > > On 04/09/2019 23:25, Xuefu Z wrote:
> > > > >
> > > > > > Thanks all for sharing your thoughts. I think we have gathered some
> > > > > > useful initial feedback from this long discussion, with a couple of
> > > > > > focal points sticking out.
> > > > > >
> > > > > > We will go back to do more research and adapt our proposal. Once
> > > > > > it's ready, we will ask for a new round of review. If there is any
> > > > > > disagreement, we will start a new discussion thread on each rather
> > > > > > than having a mega discussion like this.
> > > > >
> > > > > Thanks to everyone for participating.
> > > > >
> > > > > Regards,
> > > > > Xuefu
> > > > >
> > > > >
> > > > > > On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bowenli86@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Let me try to summarize and conclude the long thread so far:
> > > > >
> > > > > > 1. For order of temp function v.s. built-in function:
> > > > > >
> > > > > > I think Dawid's point that temp functions should have fully
> > > > > > qualified paths is a better reasoning to back the newly proposed
> > > > > > order, and I agree we don't need to follow Hive/Spark.
> > > > > >
> > > > > > However, I'd rather not change fundamentals of temporary functions
> > > > > > in this FLIP. It belongs to a bigger story of how temporary objects
> > > > > > should be redefined and handled uniformly - currently temporary
> > > > > > tables and views (those registered from TableEnv#registerTable())
> > > > > > behave differently from what Dawid proposes for temp functions, and
> > > > > > we need a FLIP just to unify their APIs and behaviors.
> > > > >
> > > > > > I agree that backward compatibility is not an issue w.r.t. Jark's
> > > > > > points.
> > > > > >
> > > > > > ***Seems we do have consensus that it's acceptable to prevent users
> > > > > > from registering a temp function with the same name as a built-in
> > > > > > function. To help us move forward, I'd like to propose setting such
> > > > > > a restraint on temp functions in this FLIP to simplify the design
> > > > > > and avoid disputes.*** It will also leave room for improvements in
> > > > > > the future.
> > > > >
> > > > >
> > > > > 2. For Hive built-in function:
> > > > >
> > > > > > Thanks Timo for providing the Presto and Postgres examples. I feel
> > > > > > modular built-in functions can be a good fit for the geo and ml
> > > > > > example as a native Flink extension, but not sure if it fits well
> > > > > > with external integrations. Anyway, I think modular built-in
> > > > > > functions is a bigger story and can be on its own thread too, and
> > > > > > our proposal doesn't prevent Flink from doing that in the future.
> > > > > >
> > > > > > ***Seems we have consensus that users should be able to use built-in
> > > > > > functions of Hive or other external systems in SQL explicitly and
> > > > > > deterministically regardless of Flink built-in functions and the
> > > > > > potential modular built-in functions, via some new syntax like
> > > > > > "mycat::func"? If so, I'd like to propose removing Hive built-in
> > > > > > functions from ambiguous function resolution order, and empower
> > > > > > users with such a syntax. This way we sacrifice a little convenience
> > > > > > for certainty***
> > > > >
> > > > >
> > > > > What do you think?
> > > > >
> > > > > > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz
> > > > > > <dwysakowicz@apache.org> wrote:
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > > > > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think
> > > > > > they are very inconsistent in that manner (spark being way worse on
> > > > > > that).
> > > > >
> > > > > Hive:
> > > > >
> > > > > > You cannot overwrite all the built-in functions. I could overwrite
> > > > > > most of the functions I tried, e.g. length, e, pi, round, rtrim, but
> > > > > > there are functions I cannot overwrite, e.g. CAST, ARRAY. I get:
> > > > > >
> > > > > >
> > > > > > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> > > > > >
> > > > > > What is interesting is that I cannot overwrite *array*, but I can
> > > > > > overwrite *map* or *struct*. Hive behaves reasonably well, though, if
> > > > > > I manage to overwrite a function. When I drop the temporary function
> > > > > > the native function is still available.
> > > > >
> > > > > Spark:
> > > > >
> > > > > Spark's behavior imho is super bad.
> > > > >
> > > > > > Theoretically I could overwrite all functions. I was able e.g. to
> > > > > > overwrite the CAST function. I had to use the CREATE OR REPLACE
> > > > > > TEMPORARY FUNCTION syntax, though. Otherwise I get an exception that
> > > > > > a function already exists. However, when I used the CAST function in
> > > > > > a query it used the native, built-in one.
> > > > > >
> > > > > > When I overwrote the current_date() function, it was used in a
> > > > > > query, but it completely replaces the built-in function and I can no
> > > > > > longer use the native function in any way. I also cannot drop the
> > > > > > temporary function. I get:
> > > > >
> > > > > *    Error in query: Cannot drop native function 'current_date';*
> > > > >
> > > > > > Additional note: neither system allows creating TEMPORARY FUNCTIONS
> > > > > > with a database. Temporary functions are always represented as a
> > > > > > single name.
> > > > > >
> > > > > > In my opinion neither of the systems has consistent behavior.
> > > > > > Generally speaking I think overwriting any system provided functions
> > > > > > is just dangerous.
> > > > >
> > > > > > Regarding Jark's concerns. Such functions would be registered in the
> > > > > > current catalog/database schema, so a user could still use their own
> > > > > > function, but would have to fully qualify the function (because
> > > > > > built-in functions take precedence). Moreover, users would have the
> > > > > > same problem with permanent functions. Imagine a user has a
> > > > > > permanent function 'cat.db.explode'. In 1.9 the user could use just
> > > > > > the 'explode' function as long as the 'cat' & 'db' were the default
> > > > > > catalog & database. If we introduce an 'explode' built-in function
> > > > > > in 1.10, the user has to fully qualify the function.
> > > > >
> > > > > Best,
> > > > >
> > > > > Dawid
> > > > > On 04/09/2019 15:19, Timo Walther wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > > thanks for the healthy discussion. It is already a very long
> > > > > > discussion with a lot of text. So I will just post my opinion to a
> > > > > > couple of statements:
> > > > >
> > > > >
> > > > > > Hive built-in functions are not part of Flink built-in functions,
> > > > > > they are catalog functions
> > > > > >
> > > > > > That is not entirely true. Correct me if I'm wrong but I think Hive
> > > > > > built-in functions are also not catalog functions. They are not
> > > > > > stored in every Hive metastore catalog that is freshly created but
> > > > > > are a set of functions that are listed somewhere and made available.
> > > > >
> > > > >
> > > > > > ambiguous functions reference just shouldn't be resolved to a
> > > > > > different catalog
> > > > > >
> > > > > > I agree. They should not be resolved to a different catalog. That's
> > > > > > why I am suggesting to split the concept of built-in functions and
> > > > > > catalog lookup semantics.
> > > > >
> > > > >
> > > > > > I don't know if any other databases handle built-in functions like
> > > > > > that
> > > > > >
> > > > > > What I called "module" is:
> > > > > > - Extension in Postgres [1]
> > > > > > - Plugin in Presto [2]
> > > > > >
> > > > > > Btw. Presto even mentions example modules that are similar to the
> > > > > > ones that we will introduce in the near future both for ML and
> > > > > > System XYZ compatibility:
> > > > > > "See either the presto-ml module for machine learning functions or
> > > > > > the presto-teradata-functions module for Teradata-compatible
> > > > > > functions, both in the root of the Presto source."
> > > > >
> > > > >
> > > > > > functions should be either built-in already or just libraries
> > > > > > functions, and library functions can be adapted to catalog APIs or
> > > > > > of some other syntax to use
> > > > > >
> > > > > > Regarding "built-in already", of course we can add a lot of
> > > > > > functions as built-ins but we will end up in a dependency hell in
> > > > > > the near future if we don't introduce a pluggable approach. Library
> > > > > > functions is what you also suggest but storing them in a catalog
> > > > > > means to always fully qualify them or modifying the existing catalog
> > > > > > design that was inspired by the standard.
> > > > > >
> > > > > > I don't think "it brings in even more complicated scenarios to the
> > > > > > design", it just does clear separation of concerns. Integrating the
> > > > > > functionality into the current design makes the catalog API more
> > > > > > complicated.
> > > > >
> > > > >
> > > > > > why would users name a temporary function the same as a built-in
> > > > > > function then?
> > > > > >
> > > > > > Because you never know what users do. If they don't, my suggested
> > > > > > resolution order should not be a problem, right?
> > > > >
> > > > >
> > > > > I don't think hive functions deserves be a function module
> > > > >
> > > > > > Our goal is not to create a Hive clone. We need to think forward and
> > > > > > Hive is just one of many systems that we can support. Not every
> > > > > > built-in function behaves and will behave exactly like Hive.
> > > > > >
> > > > > >
> > > > > > regarding temporary functions, there are few systems that support it
> > > > > >
> > > > > > IMHO Spark and Hive are not always the best examples for consistent
> > > > > > design. Systems like Postgres, Presto, or SQL Server should be used
> > > > > > as a reference. I don't think that a user can overwrite a built-in
> > > > > > function there.
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > > > > [2] https://prestodb.github.io/docs/current/develop/functions.html
> > > > >
> > > > >
> > > > > On 04.09.19 13:44, Jark Wu wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > > Regarding #1 temp function <> built-in function and naming.
> > > > > > I'm fine with temp functions preceding built-in functions and being
> > > > > > able to override built-in functions (we already support overriding
> > > > > > built-in functions in 1.9).
> > > > > > If we don't allow the same name as a built-in function, I'm afraid
> > > > > > we will have compatibility issues in the future.
> > > > > > Say users register a user defined function named "explode" in 1.9,
> > > > > > and we support a built-in "explode" function in 1.10.
> > > > > > Then the user's jobs which call the registered "explode" function in
> > > > > > 1.9 will all fail in 1.10 because of the naming conflict.
> > > > > >
> > > > > > Regarding #2 "External" built-in functions.
> > > > > > I think if we store external built-in functions in catalog, then
> > > > > > "hive1::sqrt" is a good way to go.
> > > > > > However, I would prefer to support a discovery mechanism (e.g. SPI)
> > > > > > for built-in functions as Timo suggested above.
> > > > > > This gives us the flexibility to add Hive or MySQL or Geo or
> > > > > > whatever function set as built-in functions in an easy way.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <usxuefu@gmail.com> wrote:
> > > > >
> > > > > > Hi Dawid,
> > > > > >
> > > > > > Thank you for sharing your findings. It seems to me that there is no
> > > > > > SQL standard regarding temporary functions. There are few systems
> > > > > > that support it. Here is what I have found:
> > > > >
> > > > > > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > > > > > 2. Spark: basically follows Hive (
> > > > > > https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > > > > > )
> > > > > > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of
> > > > > > overwriting behavior. (
> > > > > > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> > > > > > )
> > > > >
> > > > > > Because of the lack of a standard, it's perfectly fine for Flink to
> > > > > > define whatever it sees appropriate. Thus, your proposal (no
> > > > > > overwriting and must have DB as holder) is one option. The advantage
> > > > > > is simplicity. The downside is the deviation from Hive, which is
> > > > > > popular and the de facto standard in the big data world.
> > > > > >
> > > > > > However, I don't think we have to follow Hive. More importantly, we
> > > > > > need a consensus. I have no objection if your proposal is generally
> > > > > > agreed upon.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > > > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz
> > > > > > <dwysakowicz@apache.org> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > > Just an opinion on the built-in <> temporary functions resolution
> > > > > > and NAMING issue. I think we should not allow overriding the
> > > > > > built-in functions, as this may pose serious issues and to be honest
> > > > > > is rather not feasible and would require major rework. What happens
> > > > > > if a user wants to override CAST? Calls to that function are
> > > > > > generated at different layers of the stack that unfortunately do not
> > > > > > always go through the Catalog API (at least not yet). Moreover, from
> > > > > > what I've checked no other systems allow overriding the built-in
> > > > > > functions. All the systems I've checked so far register temporary
> > > > > > functions in a database/schema (either a special database for
> > > > > > temporary functions, or just the current database). What I would
> > > > > > suggest is to always register temporary functions with a 3 part
> > > > > > identifier, the same way as tables, views etc. This effectively
> > > > > > means you cannot override built-in functions. With such an approach
> > > > > > it is natural that the temporary functions end up a step lower in
> > > > > > the resolution order:
> > > > > >
> > > > > > 1. built-in functions (1 part, maybe 2? - this is still under
> > > > > > discussion)
> > > > >
> > > > > 2. temporary functions (always 3 part path)
> > > > >
> > > > > 3. catalog functions (always 3 part path)
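A rough sketch of the three-step order listed above, under stated assumptions: built-ins are 1-part only (the 2-part question is left open in the message), a simple or 2-part name is expanded with the current catalog and database, and only a simple name is checked against built-ins. The registries and names are hypothetical and not Flink APIs.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, str, str]

# Hypothetical registries for illustration; none of these are Flink APIs.
BUILTINS: Dict[str, str] = {"cast": "builtin:cast"}
TEMP: Dict[Path, str] = {("cat", "db", "explode"): "temp:cat.db.explode"}
CATALOG: Dict[Path, str] = {
    ("cat", "db", "explode"): "catalog:cat.db.explode",
    ("cat", "db", "cast"): "catalog:cat.db.cast",
}


def resolve(name: str, current_cat: str = "cat",
            current_db: str = "db") -> Optional[str]:
    parts = name.split(".")
    if len(parts) == 1:
        # 1. built-in functions win for simple names
        if parts[0] in BUILTINS:
            return BUILTINS[parts[0]]
        path = (current_cat, current_db, parts[0])
    elif len(parts) == 2:
        path = (current_cat, parts[0], parts[1])
    elif len(parts) == 3:
        path = (parts[0], parts[1], parts[2])
    else:
        raise ValueError(f"Unsupported function reference: {name!r}")
    # 2. temporary functions, then 3. catalog functions, same 3-part path
    if path in TEMP:
        return TEMP[path]
    return CATALOG.get(path)
```

In this sketch a catalog function named `cast` is reachable only through a qualified reference such as `db.cast`, which is the consequence of the order that the thread debates.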
> > > > >
> > > > > > Let me know what you think.
> > > > >
> > > > > Best,
> > > > >
> > > > > Dawid
> > > > >
> > > > > On 04/09/2019 06:13, Bowen Li wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > I agree with Xuefu that the main controversial points are mainly the
> > > > > > two places. My thoughts on them:
> > > > > >
> > > > > > 1) Determinism of referencing Hive built-in functions. We can either
> > > > > > remove Hive built-in functions from ambiguous function resolution
> > > > > > and require users to use special syntax for their qualified names,
> > > > > > or add a config flag to catalog constructor/yaml for turning on and
> > > > > > off Hive built-in functions with the flag set to 'false' by default
> > > > > > and proper doc added to help users make their decisions.
> > > > > >
> > > > > > 2) Flink temp functions v.s. Flink built-in functions in ambiguous
> > > > > > function resolution order. We believe Flink temp functions should
> > > > > > precede Flink built-in functions, and I have presented my reasons.
> > > > > > Just in case we cannot reach an agreement, I propose forbidding
> > > > > > users from registering temp functions with the same name as a
> > > > > > built-in function, like MySQL's approach, for the moment. It won't
> > > > > > raise any performance concern, since built-in functions are all in
> > > > > > memory and thus the cost of a name check will be really trivial.
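The name check proposed in point 2 could look roughly like this. It is only a sketch: the registry, the sample built-in names, and the case-insensitive comparison are assumptions, not Flink behavior.

```python
from typing import Callable, Dict

# Hypothetical built-in names; a set gives an O(1) in-memory membership check.
BUILTIN_NAMES = {"cast", "explode", "sqrt"}
temp_functions: Dict[str, Callable] = {}


def register_temp_function(name: str, impl: Callable) -> None:
    key = name.lower()  # assumption: function names compare case-insensitively
    if key in BUILTIN_NAMES:
        raise ValueError(
            f"Cannot register temp function {name!r}: "
            "a built-in function with the same name exists"
        )
    temp_functions[key] = impl
```

Registering `my_udf` succeeds, while registering `CAST` is rejected before anything is stored.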
> > > > >
> > > > >
> > > > > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <usxuefu@gmail.com> wrote:
> > > > >
> > > > > > From what I have seen, there are a couple of focal disagreements:
> > > > > >
> > > > > > 1. Resolution order: temp function --> flink built-in function -->
> > > > > > catalog function vs flink built-in function --> temp function -->
> > > > > > catalog function.
> > > > > >
> > > > > > 2. "External" built-in functions: how to treat built-in functions in
> > > > > > external system and how users reference them
> > > > > >
> > > > > > For #1, I agree with Bowen that temp function needs to be at the
> > > > > > highest priority because that's how a user might overwrite a
> > > > > > built-in function without referencing a persistent, overwriting
> > > > > > catalog function with a fully qualified name. Putting built-in
> > > > > > functions at the highest priority eliminates that usage.
> > > > >
> > > > > > For #2, I saw a general agreement that referencing "external"
> > > > > > built-in functions such as those in Hive needs to be explicit and
> > > > > > deterministic even though different approaches are proposed. To
> > > > > > limit the scope and simplify the usage, it seems to make sense to me
> > > > > > to introduce special syntax for users to explicitly reference an
> > > > > > external built-in function such as hive1::sqrt or
> > > > > > hive1._built_in.sqrt. This is a DML syntax matching nicely the
> > > > > > Catalog API call hive1.getFunction(ObjectPath functionName) where
> > > > > > the database name is absent for built-in functions available in that
> > > > > > catalog hive1. I understand that Bowen's original proposal was
> > > > > > trying to avoid this, but this could turn out to be a clean and
> > > > > > simple solution.
> > > > > >
> > > > > > (Timo's modular approach is a great way to "expand" Flink's built-in
> > > > > > function set, which seems orthogonal and complementary to this, and
> > > > > > which could be tackled in further future work.)
> > > > >
> > > > > I'd be happy to hear further thoughts on the two points.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > > > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <ykt836@gmail.com> wrote:
> > > > >
> > > > > > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal
> > > > > > is the same as Bowen's. But after thinking about it, I'm currently
> > > > > > leaning towards Timo's suggestion.
> > > > >
> > > > > > The reason is backward compatibility. If we follow Bowen's approach,
> > > > > > let's say we first find a function in Flink's built-in functions,
> > > > > > and then Hive's built-in. For example, `foo` is not supported by
> > > > > > Flink, but Hive has such a built-in function. So the user will have
> > > > > > Hive's behavior for function `foo`. And in the next release, Flink
> > > > > > realizes this is a very popular function and adds it into Flink's
> > > > > > built-in functions, but with different behavior from Hive's. So in
> > > > > > the next release, the behavior changes.
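The compatibility concern can be reproduced with a tiny model of the fallback lookup. This is a sketch only; the function sets and release names are made up for illustration and do not describe actual Flink or Hive built-ins.

```python
from typing import Optional, Set


def resolve_provider(name: str, flink_builtins: Set[str],
                     hive_builtins: Set[str]) -> Optional[str]:
    """Look in Flink's built-ins first, then fall back to Hive's."""
    if name in flink_builtins:
        return "flink"
    if name in hive_builtins:
        return "hive"
    return None


# Hypothetical function sets for two consecutive releases.
HIVE_BUILTINS = {"foo"}
FLINK_RELEASE_N = {"sqrt"}
FLINK_RELEASE_N_PLUS_1 = {"sqrt", "foo"}

provider_before = resolve_provider("foo", FLINK_RELEASE_N, HIVE_BUILTINS)
provider_after = resolve_provider("foo", FLINK_RELEASE_N_PLUS_1, HIVE_BUILTINS)
```

The same query text silently switches from Hive's `foo` to Flink's `foo` after the upgrade, with no change on the user's side.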
> > > > >
> > > > > > With Timo's approach, IIUC the user has to tell the framework
> > > > > > explicitly what kind of built-in functions he would like to use. He
> > > > > > can just tell the framework to abandon Flink's built-in functions,
> > > > > > and use Hive's instead. The user can only choose between them, but
> > > > > > not use them at the same time. I think this approach is more
> > > > > > predictable.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bowenli86@gmail.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for the feedback. Just a kind reminder that the [Proposal]
> > > > > section in the google doc was updated; please take a look first and
> > > > > let me know if you have more questions.
> > > > >
> > > > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> > > > >
> > > > > Hi Timo,
> > > > >
> > > > > Re> 1) We should not have the restriction "hive built-in functions can
> > > > > only be used when current catalog is hive catalog". Switching a catalog
> > > > > should only have implications on the cat.db.object resolution but not
> > > > > functions. It would be quite convenient for users to use Hive built-ins
> > > > > even if they use a Confluent schema registry or just the in-memory
> > > > > catalog.
> > > > >
> > > > > There might be a misunderstanding here.
> > > > >
> > > > > First of all, Hive built-in functions are not part of Flink built-in
> > > > > functions; they are catalog functions. Thus if the current catalog is
> > > > > not a HiveCatalog but, say, a schema registry catalog, an ambiguous
> > > > > function reference just shouldn't be resolved to a different catalog.
> > > > >
> > > > > Second, Hive built-in functions can potentially be referenced across
> > > > > catalogs, but they don't have a db namespace and we currently just
> > > > > don't have a SQL syntax for it. It can be enabled when such a SQL
> > > > > syntax is defined, e.g. "catalog::function", but it's out of scope of
> > > > > this FLIP.
> > > > >
> > > > > 2) I would propose to have separate concepts for catalog and built-in
> > > > > functions. In particular it would be nice to modularize built-in
> > > > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > > > > we add more experimental functions in the future or functions for some
> > > > > special application area (Geo functions, ML functions). A data platform
> > > > > team might not want to make every built-in function available. Or a
> > > > > function module like ML functions is in a different Maven module.
> > > > >
> > > > > I think this is orthogonal to this FLIP, especially since we don't
> > > > > have the "external built-in functions" anymore and currently the
> > > > > built-in function category remains untouched.
> > > > >
> > > > > But just to share some thoughts on the proposal, I'm not sure about it:
> > > > >
> > > > > - I don't know if any other databases handle built-in functions like
> > > > > that. Maybe you can give some examples? IMHO, built-in functions are
> > > > > system info and should be deterministic, not depending on loaded
> > > > > libraries. Geo functions should be either built-in already or just
> > > > > library functions, and library functions can be adapted to catalog
> > > > > APIs or to some other syntax to use
> > > > > - I don't know if all use cases stand, and many can be achieved by
> > > > > other approaches too. E.g. experimental functions can be taken good
> > > > > care of by documentation, annotations, etc.
> > > > > - the proposal basically introduces some concept like a pluggable
> > > > > built-in function catalog, despite the already existing catalog APIs
> > > > > - it brings in even more complicated scenarios to the design. E.g. how
> > > > > do you handle built-in functions in different modules but different
> > > > > names?
> > > > >
> > > > > In short, I'm not sure if it really stands and it looks like overkill
> > > > > to me. I'd rather not go down that route. Related discussion can be on
> > > > > its own thread.
> > > > >
> > > > > 3) Following the suggestion above, we can have a separate discovery
> > > > > mechanism for built-in functions. Instead of just going through a
> > > > > static list like in BuiltInFunctionDefinitions, a platform team should
> > > > > be able to select function modules like
> > > > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > > > HiveFunctions) or via service discovery;
> > > > >
> > > > > Same as above. I'll leave it to its own thread.
> > > > >
> > > > > re > 3) Dawid and I discussed the resolution order again. I agree with
> > > > > Kurt that we should unify built-in function (external or internal)
> > > > > under a common layer. However, the resolution order should be:
> > > > >    1. built-in functions
> > > > >    2. temporary functions
> > > > >    3. regular catalog resolution logic
> > > > > Otherwise a temporary function could cause clashes with Flink's
> > > > > built-in functions. If you take a look at other vendors, like SQL
> > > > > Server, they also do not allow overwriting built-in functions.
> > > > >
> > > > > "I agree with Kurt that we should unify built-in function (external or
> > > > > internal) under a common layer." <- I don't think this is what Kurt
> > > > > means. Kurt and I are in favor of unifying built-in functions of
> > > > > external systems and catalog functions. Was that a typo?
> > > > >
> > > > > Besides, I'm not sure about the resolution order you proposed.
> > > > > Temporary functions have a lifespan of a session and are only visible
> > > > > to the session owner. They are unique to each user, and users create
> > > > > them on purpose to be the highest priority in order to overwrite system
> > > > > info (built-in functions in this case).
> > > > >
> > > > > In your case, why would users name a temporary function the same as a
> > > > > built-in function then? Since using that name in an ambiguous function
> > > > > reference will always be resolved to the built-in function, creating a
> > > > > same-named temp function would be meaningless in the end.
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> > > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > Re> 1. Hive built-in functions are an intermediate solution. So we
> > > > > should not introduce interfaces to influence the framework. To make
> > > > > Flink itself more powerful, we should implement the functions
> > > > > we need to add.
> > > > >
> > > > > Yes, please see the doc.
> > > > >
> > > > > Re> 2. Non-flink built-in functions are easy for users to change their
> > > > > behavior. If we support some flink built-in functions in the future
> > > > > but act differently from non-flink built-in, this will lead to changes
> > > > > in user behavior.
> > > > >
> > > > > There's no such concept as "external built-in functions" any more.
> > > > > Built-in functions of external systems will be treated as special
> > > > > catalog functions.
> > > > >
> > > > > Re> Another question is, does this fallback include all hive built-in
> > > > > functions? As far as I know, some hive functions are somewhat hacky.
> > > > > If possible, can we start with a white list? Once we implement some
> > > > > functions to flink built-in, we can also update the whitelist.
> > > > >
> > > > > Yes, that's something we thought of too. I don't think it's super
> > > > > critical to the scope of this FLIP, thus I'd like to leave it to
> > > > > future efforts as a nice-to-have feature.
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> > > > >
> > > > > Hi Kurt,
> > > > >
> > > > > Re: > What I want to propose is we can merge #3 and #4, make them both
> > > > > under the "catalog" concept, by extending catalog function to give it
> > > > > the ability to have built-in catalog functions. Some benefits I can
> > > > > see from this approach:
> > > > >
> > > > > 1. We don't have to introduce a new concept like external built-in
> > > > > functions. Actually I don't see a full story about how to treat
> > > > > built-in functions, and it seems a little bit disruptive to catalog.
> > > > > As a result, you have to make some restriction like "hive built-in
> > > > > functions can only be used when current catalog is hive catalog".
> > > > >
> > > > > Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> > > > > the doc. I've modified those sections, and they are up to date now.
> > > > >
> > > > > In short, built-in functions of external systems are now defined as a
> > > > > special kind of catalog function in Flink, and handled by Flink as
> > > > > follows:
> > > > > - An external built-in function must be associated with a catalog for
> > > > > the purpose of decoupling flink-table and external systems.
> > > > > - It always resides in front of catalog functions in ambiguous
> > > > > function reference order, just like in its own external system
> > > > > - It is a special catalog function that doesn’t have a schema/database
> > > > > namespace
> > > > > - It goes thru the same instantiation logic as other user defined
> > > > > catalog functions in the external system
> > > > >
> > > > > Please take another look at the doc, and let me know if you have more
> > > > > questions.
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
> > > > >
> > > > > Hi Kurt,
> > > > >
> > > > > it should not affect the functions and operations we currently have in
> > > > > SQL. It just categorizes the available built-in functions. It is kind
> > > > > of an orthogonal concept to the catalog API but built-in functions
> > > > > deserve this special kind of treatment. CatalogFunction still fits
> > > > > perfectly in there because the regular catalog object resolution logic
> > > > > is not affected. So tables and functions are resolved in the same way
> > > > > but with built-in functions that have priority as in the original
> > > > > design.
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 03.09.19 15:26, Kurt Young wrote:
> > > > >
> > > > > This only affects the functions and operations we currently have in
> > > > > SQL and has no effect on tables, right? Looks like this is an
> > > > > orthogonal concept to Catalog?
> > > > > If the answers are both yes, then the catalog function will be a weird
> > > > > concept?
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> > > > >
> > > > > The way you proposed is basically the same as what Calcite does; I
> > > > > think we are on the same page.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > > > >
> > > > > This sounds exactly like the module approach I mentioned, no?
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > > On 03.09.19 13:42, Danny Chan wrote:
> > > > >
> > > > > Thanks Bowen for bringing up this topic, I think it’s a useful
> > > > > refactoring to make our function usage more user friendly.
> > > > >
> > > > > On the topic of how to organize the builtin operators and the
> > > > > operators of Hive, here is a solution from Apache Calcite: the Calcite
> > > > > way is to make every dialect's operators a “Library”, and users can
> > > > > specify which libraries they want to use for a SQL query. The builtin
> > > > > operators always come first as first-class objects and the others are
> > > > > used in the order they appear.
> > > > >
> > > > > Maybe you can take it as a reference.
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > > > > It's critically helpful to improve function usability in SQL.
> > > > >
> > > > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > > >
> > > > > In short, it:
> > > > > - adds support for precise function reference with fully/partially
> > > > > qualified name
> > > > > - redefines function resolution order for ambiguous function reference
> > > > > - adds support for Hive's rich built-in functions (support for Hive
> > > > > user defined functions was already added in 1.9.0)
> > > > > - clarifies the concept of temporary functions
> > > > >
> > > > > Would love to hear your thoughts.
> > > > >
> > > > > Bowen
> > > > >
> > > > > --
> > > > > Xuefu Zhang
> > > > >
> > > > > "In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi,

Thanks @Fabian @Dawid and everyone else for sharing your thoughts!

First, I'd like to take Hive built-in functions out of this FLIP to keep
our original scope and make it less controversial on a potential modular
approach. I will remove Hive built-in functions from the google doc.

Then the focus of debate is mainly function resolution order and temp
function namespace, which are somewhat related. I roughly summarized this
thread, and currently we are debating on two approaches with preference
from the following people:

Option 1:
    Proposal: temp functions will be of 1-part path (function name only),
and can override built-in functions. The ambiguous function resolution
order is thus 1) temp functions 2) built-in functions 3) catalog functions
in the current catalog/database
    Votes: Xuefu, Bowen, Fabian, Jark

Option 2:
    Proposal: temp functions will be of 3-part path (with catalog,
database, and function name), and temp functions cannot override built-in
functions. The ambiguous function resolution order is thus 1) built-in
functions 2) temp functions (in 3-part path) 3) catalog functions in the
current catalog/database
    Votes:  Dawid, Timo
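
To make the difference between the two options concrete, here is a minimal
Python sketch of the two lookup orders. It is only an illustration under
stated assumptions: the FunctionRegistry class, its dictionaries, and the
names default_catalog / default_database are hypothetical and are not
Flink's FunctionCatalog API.

```python
class FunctionRegistry:
    """Toy model of ambiguous function resolution; not Flink's FunctionCatalog."""

    def __init__(self, current_catalog="default_catalog",
                 current_database="default_database"):
        self.current_catalog = current_catalog
        self.current_database = current_database
        self.builtin = {}      # name -> function
        self.temp_1part = {}   # Option 1: name -> function
        self.temp_3part = {}   # Option 2: (catalog, database, name) -> function
        self.catalog = {}      # (catalog, database, name) -> function

    def qualify(self, name):
        # Expand a bare name with the current catalog and database.
        return (self.current_catalog, self.current_database, name)

    def resolve_option1(self, name):
        # Option 1 order: temp functions, built-in functions, catalog functions.
        candidates = [
            (self.temp_1part, name),
            (self.builtin, name),
            (self.catalog, self.qualify(name)),
        ]
        return self._first_match(candidates, name)

    def resolve_option2(self, name):
        # Option 2 order: built-in functions, temp functions (3-part path),
        # catalog functions.
        candidates = [
            (self.builtin, name),
            (self.temp_3part, self.qualify(name)),
            (self.catalog, self.qualify(name)),
        ]
        return self._first_match(candidates, name)

    @staticmethod
    def _first_match(candidates, name):
        # Return the first registered function in priority order.
        for table, key in candidates:
            if key in table:
                return table[key]
        raise LookupError(f"function not found: {name}")


registry = FunctionRegistry()
registry.builtin["explode"] = "built-in explode"
registry.temp_1part["explode"] = "temp explode"
registry.temp_3part[registry.qualify("explode")] = "temp explode"
registry.catalog[registry.qualify("my_udf")] = "catalog my_udf"

print(registry.resolve_option1("explode"))  # temp explode
print(registry.resolve_option2("explode"))  # built-in explode
print(registry.resolve_option1("my_udf"))   # catalog my_udf
```

In this sketch, Option 1 lets the session-scoped explode shadow the built-in
one, while under Option 2 the built-in always wins for a bare name, so a
same-named temp function would only be reachable through a qualified name.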


Do you think we need a separate voting thread on the two options in the
community, or are we able to conclude from the above summary?



On Wed, Sep 11, 2019 at 8:09 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi Fabian,
> Thank you for your response.
> Regarding the temporary function, I just wanted to clarify one thing: the
> 3-part identifier does not mean the user always has to provide the catalog
> & database explicitly, just as the user does not have to provide them when,
> e.g., creating a permanent table, view, etc. It does mean, though, that
> functions are always stored within a database, the same way as all permanent
> objects and other temporary objects (tables, views). If not given explicitly,
> the current catalog & database would be used, both in the create statement
> and when using the function.
>
> Point taken, though, that your preference would be to support overriding
> built-in functions.
>
> Best,
> Dawid
>
> On Wed, 11 Sep 2019, 21:14 Fabian Hueske, <fh...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to add my opinion on this topic as well ;-)
> >
> > In general, I think overriding built-in functions with temp functions has
> > a couple of benefits but also a few challenges:
> >
> > * Users can reimplement the behavior of a built-in function of a different
> > system, e.g., for backward compatibility after a migration.
> > * I don't think that "accidental" overrides and surprising semantics are an
> > issue or dangerous. The user registered the temp function in the same
> > session and should therefore be aware of the changed semantics.
> > * I see that not all built-in functions can be overridden, like the CAST
> > example that Dawid gave. However, I think these should be a small fraction
> > and such functions could be blacklisted. Sure, that's not super consistent,
> > but should (IMO) not be a big issue in practice.
> > * Temp functions should be easy to use. Requiring 3-part addressing makes
> > them a lot less user friendly, IMO. Users need to think about what catalog
> > and db to choose when registering them. Also using a temp function in a
> > query becomes less convenient. Moreover, I agree with Bowen's concerns that
> > a 3-part addressing scheme reduces the temporary appearance of the function.
> >
> > From the three possible solutions, my preference order is
> > 1) 1-part address with override of built-in
> > 2) 1-part address without override of built-in
> > 3) 3-part address
> >
> > Regarding the issue of external built-in functions, I don't think that
> > Timo's proposal of modules is fully orthogonal to this discussion.
> > A Hive function module could be an alternative to offering Hive functions
> > as part of Hive's catalog.
> > From a user's point of view, I think that modules would be a "cleaner"
> > integration ("Why do I need a Hive catalog if all I want to do is apply a
> > Hive function on a Kafka table?").
> > However, the module approach clearly has the problem of dealing with
> > same-named functions in different modules (e.g., a Hive function and a
> > Flink built-in function).
> > The catalog approach has the benefit that functions can be addressed like
> > hiveCat::func (or a similar path).
> >
> > I'm not sure what's the best solution here.
> >
> > Cheers,
> > Fabian
> >
> >
> > On Mon, Sep 9, 2019 at 06:30, Bowen Li <bowenli86@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > W.r.t temp functions, I feel both options have their benefits and can
> > > theoretically achieve similar functionalities one way or another. In the
> > > end, it's more about use cases, user habits, and trade-offs.
> > >
> > > Re> Not always users are in full control of the catalog functions. There
> > > is also the case where different teams manage the catalog & use the
> > > catalog.
> > >
> > > Temp functions live within a session, and not within a catalog. Having
> > > 3-part paths may imply temp functions are tied to a catalog in two
> > > aspects.
> > > 1) it may indicate each catalog manages its temp functions, which is not
> > > true as we all seem to agree they should reside at a central place,
> > > either in FunctionCatalog or CatalogManager
> > > 2) it may indicate there's some access control. When users are forbidden
> > > to manipulate some objects in the catalog that's managed by other teams,
> > > but are allowed to manipulate some other objects (temp functions in this
> > > case) belonging to the catalog in namespaces, users may think we
> > > introduced extra complexity and confusion with some kind of access
> > > control into the problem. It doesn't feel intuitive enough for end users.
> > >
> > > Thus, I'd be in favor of 1-part path for temporary functions, and other
> > > temp objects.
> > >
> > > Thanks,
> > > Bowen
> > >
> > >
> > >
> > > On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > wrote:
> > >
> > > > I agree the consequences of the decision are substantial. Let's see
> > > > what others think.
> > > >
> > > > -- Catalog functions are defined by users, and we suppose they can
> > > > drop/alter it in any way they want. Thus, overwriting a catalog function
> > > > doesn't seem to be a strong use case that we should be concerned about.
> > > > Rather, there are known use cases for overwriting built-in functions.
> > > >
> > > > Users are not always in full control of the catalog functions. There is
> > > > also the case where different teams manage the catalog & use the
> > > > catalog. As for overriding built-in functions, with the 3-part approach
> > > > the user can always use an equally named function from a catalog. E.g.
> > > > to override
> > > >
> > > >     SELECT explode(arr) FROM ...
> > > >
> > > > the user can always write:
> > > >
> > > >     SELECT db.explode(arr) FROM ...
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 06/09/2019 10:54, Xuefu Z wrote:
> > > >
> > > > Hi Dawid,
> > > >
> > > > Thank you for your summary. While the only difference in the two
> > > > proposals is one- or three-part naming, the consequence would be
> > > > substantial.
> > > >
> > > > To me, there are two major use cases of temporary functions compared to
> > > > persistent ones:
> > > > 1. Temporary in nature and auto-managed by the session. More often than
> > > > not, the admin doesn't even allow users to create persistent functions.
> > > > 2. Provide an opportunity to overwrite system built-in functions.
> > > >
> > > > Since built-in functions have one-part names, requiring three-part names
> > > > for temporary functions eliminates the overwriting opportunity.
> > > >
> > > > One-part naming essentially puts all temp functions under a single
> > > > namespace and simplifies function resolution; for example, we don't
> > > > need to consider the case of a temp function and a persistent function
> > > > with the same name under the same database.
> > > >
> > > > I agree having three parts does have its merits, such as consistency
> > > > with other temporary objects (tables) and minor difference between temp
> > > > vs catalog functions. However, there is a slight difference between
> > > > tables and functions in that there is no built-in table in SQL so there
> > > > is no need to overwrite it.
> > > >
> > > > I'm not sure if I fully agree with the benefits you listed as the
> > > > advantages of the three-part naming of temp functions.
> > > >   -- Allowing overwriting built-in functions is a benefit and the
> > > > solution for disallowing certain overwriting shouldn't be totally
> > > > banning it.
> > > >   -- Catalog functions are defined by users, and we suppose they can
> > > > drop/alter it in any way they want. Thus, overwriting a catalog function
> > > > doesn't seem to be a strong use case that we should be concerned about.
> > > > Rather, there are known use cases for overwriting built-in functions.
> > > >
> > > > Thus, personally I would prefer one-part names for temporary functions.
> > > > In the absence of a SQL standard on this, I would certainly like to get
> > > > opinions from others to see if a consensus can eventually be reached.
> > > >
> > > > (To your point on the modular approach to support external built-in
> > > > functions, we saw the value and are actively looking into it. Thanks for
> > > > sharing your opinion on that.)
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > >
> > > > Hi Xuefu,
> > > >
> > > > Thank you for your answers.
> > > >
> > > > Let me summarize my understanding. In principle we differ only in
> > > > regard to whether a temporary function can be only 1-part or only
> > > > 3-part identified. I can reconfirm that if the community decides it
> > > > prefers the 1-part approach I will commit to that, with the assumption
> > > > that we will force ONLY 1-part function names. (We will parse the
> > > > identifier and throw an exception if a user tries to register e.g.
> > > > db.temp_func).
> > > >
> > > > My preference, though, is the 3-part approach:
> > > >
> > > >    - there are some functions that it makes no sense to override, e.g.
> > > >    CAST; moreover I'm afraid that allowing overriding such will lead to
> > > >    high inconsistency, similar to those that I mentioned Spark has
> > > >    - you cannot shadow a fully-qualified function. (If a user fully
> > > >    qualifies his/her objects in a SQL query, which is often considered a
> > > >    good practice)
> > > >    - it does not differentiate between functions & temporary functions.
> > > >    Temporary functions just differ with regards to their life-cycle. The
> > > >    registration & usage is exactly the same.
> > > >
> > > > As it can be seen, the proposed concept regarding temp function and
> > > > function resolution is quite simple.
> > > >
> > > > Both approaches are equally simple. I would even say the 3-part approach
> > > > is slightly simpler as it does not have to care about some special
> > > > built-in functions such as CAST.
> > > >
> > > > I don't want to express my opinion on the differentiation between
> > > > built-in functions and "external" built-in functions in this thread as
> > > > it is rather orthogonal, but I also like the modular approach and I
> > > > definitely don't like the special syntax "cat::function". I think it's
> > > > better to stick to a standard or at least other proven solutions from
> > > > other systems.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 05/09/2019 10:12, Xuefu Z wrote:
> > > >
> > > > Hi Dawid,
> > > >
> > > > Thanks for sharing your thoughts and request for clarifications. I
> > > > believe that I fully understood your proposal, which does have its
> > > > merit. However, it's different from ours. Here are the answers to your
> > > > questions:
> > > >
> > > > Re #1: yes, the temp functions in the proposal are global and have just
> > > > one-part names, similar to built-in functions. Two- or three-part names
> > > > are not allowed.
> > > >
> > > > Re #2: not applicable as two- or three-part names are disallowed.
> > > >
> > > > Re #3: same as above. Referencing external built-in functions is
> > > > achieved either implicitly (only the built-in functions in the current
> > > > catalog are considered) or via special syntax such as cat::function.
> > > > However, we are looking into the modular approach that Timo suggested
> > > > with other feedback received from the community.
> > > >
> > > > Re #4: the resolution order goes like the following in our proposal:
> > > >
> > > > 1. temporary functions
> > > > 2. built-in functions (including those augmented by add-on modules)
> > > > 3. built-in functions in current catalog (this will not be needed if
> > > > the special syntax "cat::function" is required)
> > > > 4. functions in current catalog and db.
> > > >
> > > > If we go with the modular approach and make external built-in functions
> > > > an add-on module, 2 and 3 above will be combined. In essence, the
> > > > resolution order is equivalent in the two approaches.
> > > >
> > > > By the way, resolution order matters only for simple name references.
> > > > For names such as db.function (interpreted as current_cat/db/function)
> > > > or cat.db.function, the reference is unambiguous, so no resolution is
> > > > needed.
> > > >
> > > > As it can be seen, the proposed concept regarding temp function and
> > > > function resolution is quite simple. Additionally, the proposed
> > > > resolution order allows a temp function to shadow a built-in function,
> > > > which is important (though not decisive) in our opinion.
> > > >
> > > > I started liking the modular approach as the resolution order will only
> > > > include 1, 2, and 4, which is simpler and more generic. That's why I
> > > > suggested we look more into this direction.
> > > >
> > > > Please let me know if there are further questions.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > >
> > > > Hi Xuefu,
> > > >
> > > > Just wanted to summarize my opinion on the one topic (temporary
> > > functions).
> > > >
> > > > My preference would be to make temporary functions always 3-part
> > > qualified
> > > > (as a result that would prohibit overriding built-in functions).
> Having
> > > > said that if the community decides that it's better to allow
> overriding
> > > > built-in functions I am fine with it and can commit to that decision.
> > > >
> > > > I wanted to ask if you could clarify a few points for me around that
> > > > option.
> > > >
> > > >    1. Would you enforce temporary functions to be always just a
> single
> > > >    name (without db & cat) as hive does, or would you allow also 3 or
> > > even 2
> > > >    part identifiers?
> > > >    2. Assuming 2/3-part paths. How would you register a function
> from a
> > > >    following statement: CREATE TEMPORARY FUNCTION db.func? Would that
> > > shadow
> > > >    all functions named 'func' in all databases named 'db' in all
> > > catalogs? Or
> > > >    would you shadow only function 'func' in database 'db' in current
> > > catalog?
> > > >    3. This point is still under discussion, but was mentioned a few
> > > >    times, that maybe we want to enable syntax cat.func for "external
> > > built-in
> > > >    functions". How would that affect statement from previous point?
> > Would
> > > >    'db.func' shadow "external built-in function" in 'db' catalog or
> > user
> > > >    functions as in point 2? Or maybe both?
> > > >    4. Lastly in fact to summarize the previous points. Assuming
> > 2/3-part
> > > >    paths. Would the function resolution be actually as follows?:
> > > >       1. temporary functions (1-part path)
> > > >       2. built-in functions
> > > >       3. temporary functions (2-part path)
> > > >       4. 2-part catalog functions a.k.a. "external built-in
> functions"
> > > >       (cat + func) - this is still under discussion, if we want that
> in
> > > the other
> > > >       focal point
> > > >       5. temporary functions (3-part path)
> > > >       6. 3-part catalog functions a.k.a. user functions
> > > >
> > > > I would be really grateful if you could answer those questions for me,
> > > thanks.
> > > >
> > > > BTW, Thank you all for a healthy discussion.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 04/09/2019 23:25, Xuefu Z wrote:
> > > >
> > > > Thank all for the sharing thoughts. I think we have gathered some
> > useful
> > > > initial feedback from this long discussion with a couple of focal
> > points
> > > > sticking out.
> > > >
> > > >  We will go back to do more research and adapt our proposal. Once
> it's
> > > > ready, we will ask for a new round of review. If there is any
> > > disagreement,
> > > > we will start a new discussion thread on each rather than having a
> mega
> > > > discussion like this.
> > > >
> > > > Thanks to everyone for participating.
> > > >
> > > > Regards,
> > > > Xuefu
> > > >
> > > >
> > > > On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > >
> > > > Let me try to summarize and conclude the long thread so far:
> > > >
> > > > 1. For order of temp function v.s. built-in function:
> > > >
> > > > I think Dawid's point that temp function should be of fully qualified
> > > path
> > > > is a better reasoning to back the newly proposed order, and i agree
> we
> > > > don't need to follow Hive/Spark.
> > > >
> > > > However, I'd rather not change fundamentals of temporary functions in
> > > this
> > > > FLIP. It belongs to a bigger story of how temporary objects should be
> > > > redefined and be handled uniformly - currently temporary tables and
> > views
> > > > (those registered from TableEnv#registerTable()) behave differently from
> > > > what Dawid proposes for temp functions, and we need a FLIP to just unify
> > their
> > > > APIs and behaviors.
> > > >
> > > > I agree that backward compatibility is not an issue w.r.t Jark's
> > points.
> > > >
> > > > ***Seems we do have consensus that it's acceptable to prevent users
> > > > registering a temp function in the same name as a built-in function.
> To
> > > > help us move forward, I'd like to propose setting such a restraint on
> > > temp
> > > > functions in this FLIP to simplify the design and avoid disputes.***
> It
> > > > will also leave room for improvement in the future.
> > > >
> > > >
> > > > 2. For Hive built-in function:
> > > >
> > > > Thanks Timo for providing the Presto and Postgres examples. I feel
> > > modular
> > > > built-in functions can be a good fit for the geo and ml example as a
> > > native
> > > > Flink extension, but not sure if it fits well with external
> > integrations.
> > > > Anyway, I think modular built-in functions is a bigger story and can
> be
> > > on
> > > > its own thread too, and our proposal doesn't prevent Flink from doing
> > > that
> > > > in the future.
> > > >
> > > > ***Seems we have consensus that users should be able to use built-in
> > > > functions of Hive or other external systems in SQL explicitly and
> > > > deterministically regardless of Flink built-in functions and the
> > > potential
> > > > modular built-in functions, via some new syntax like "mycat::func"?
> If
> > > so,
> > > > I'd like to propose removing Hive built-in functions from ambiguous
> > > > function resolution order, and empower users with such a syntax. This
> > way
> > > > we sacrifice a little convenience for certainty***
> > > >
> > > >
> > > > What do you think?
> > > >
> > > > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > >
> > > > Hi,
> > > >
> > > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
> > > > are very inconsistent in that manner (spark being way worse on that).
> > > >
> > > > Hive:
> > > >
> > > > You cannot overwrite every built-in function. I could overwrite most of
> > > > the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> > > > functions I cannot overwrite, e.g. CAST and ARRAY, for which I get:
> > > >
> > > >
> > > > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> > > >
> > > > What is interesting is that I cannot overwrite *array*, but I can overwrite
> > > > *map* or *struct*. Hive behaves reasonably well, though, if I manage to
> > > > overwrite a function. When I drop the temporary function the native
> > > > function is still available.
> > > >
> > > > Spark:
> > > >
> > > > Spark's behavior imho is super bad.
> > > >
> > > > Theoretically I could overwrite all functions. For example, I was able to
> > > > overwrite the CAST function, though I had to use the CREATE OR REPLACE
> > > > TEMPORARY FUNCTION syntax. Otherwise I get an exception that a function already
> > > > exists. However when I used the CAST function in a query it used the
> > > > native, built-in one.
> > > >
> > > > When I overwrote current_date() function, it was used in a query, but
> > it
> > > > completely replaces the built-in function and I can no longer use the
> > > > native function in any way. I also cannot drop the temporary function. I
> > > > get:
> > > >
> > > > *    Error in query: Cannot drop native function 'current_date';*
> > > >
> > > > Additional note: neither system allows creating TEMPORARY FUNCTIONS
> > > > with a database. Temporary functions are always represented as a
> single
> > > > name.
> > > >
> > > > In my opinion neither of the systems has consistent behavior.
> > Generally
> > > > speaking I think overwriting any system provided functions is just
> > > > dangerous.
> > > >
> > > > Regarding Jark's concerns. Such functions would be registered in a
> > > >
> > > > current
> > > >
> > > > catalog/database schema, so a user could still use its own function,
> > but
> > > > would have to fully qualify the function (because built-in functions
> > take
> > > > precedence). Moreover users would have the same problem with
> permanent
> > > > functions. Imagine a user has a permanent function 'cat.db.explode'.
> > In
> > > > 1.9 the user could use just the 'explode' function as long as the
> > 'cat' &
> > > > 'db' were the default catalog & database. If we introduce 'explode'
> > > > built-in function in 1.10, the user has to fully qualify the
> function.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 04/09/2019 15:19, Timo Walther wrote:
> > > >
> > > > Hi all,
> > > >
> > > > thanks for the healthy discussion. It is already a very long
> discussion
> > > > with a lot of text. So I will just post my opinion to a couple of
> > > > statements:
> > > >
> > > >
> > > > Hive built-in functions are not part of Flink built-in functions,
> they
> > > >
> > > > are catalog functions
> > > >
> > > > That is not entirely true. Correct me if I'm wrong but I think Hive
> > > > built-in functions are also not catalog functions. They are not
> stored
> > in
> > > > every Hive metastore catalog that is freshly created but are a set of
> > > > functions that are listed somewhere and made available.
> > > >
> > > >
> > > > ambiguous functions reference just shouldn't be resolved to a
> different
> > > >
> > > > catalog
> > > >
> > > > I agree. They should not be resolved to a different catalog. That's
> > why I
> > > > am suggesting to split the concept of built-in functions and catalog
> > > >
> > > > lookup
> > > >
> > > > semantics.
> > > >
> > > >
> > > > I don't know if any other databases handle built-in functions like
> that
> > > >
> > > > What I called "module" is:
> > > > - Extension in Postgres [1]
> > > > - Plugin in Presto [2]
> > > >
> > > > Btw. Presto even mentions example modules that are similar to the
> ones
> > > > that we will introduce in the near future both for ML and System XYZ
> > > > compatibility:
> > > > "See either the presto-ml module for machine learning functions or
> the
> > > > presto-teradata-functions module for Teradata-compatible functions,
> > both
> > > >
> > > > in
> > > >
> > > > the root of the Presto source."
> > > >
> > > >
> > > > functions should be either built-in already or just libraries
> > > >
> > > > functions,
> > > >
> > > > and library functions can be adapted to catalog APIs or of some other
> > > > syntax to use
> > > >
> > > > Regarding "built-in already", of course we can add a lot of functions
> > as
> > > > built-ins but we will end-up in a dependency hell in the near future
> if
> > > >
> > > > we
> > > >
> > > > don't introduce a pluggable approach. Library functions is what you
> > also
> > > > suggest but storing them in a catalog means to always fully qualify
> > them
> > > >
> > > > or
> > > >
> > > > modifying the existing catalog design that was inspired by the
> > standard.
> > > >
> > > > I don't think "it brings in even more complicated scenarios to the
> > > > design", it just does clear separation of concerns. Integrating the
> > > > functionality into the current design makes the catalog API more
> > > > complicated.
> > > >
> > > >
> > > > why would users name a temporary function the same as a built-in
> > > >
> > > > function then?
> > > >
> > > > Because you never know what users do. If they don't, my suggested
> > > > resolution order should not be a problem, right?
> > > >
> > > >
> > > > I don't think hive functions deserves be a function module
> > > >
> > > > Our goal is not to create a Hive clone. We need to think forward and
> > Hive
> > > > is just one of many systems that we can support. Not every built-in
> > > > function behaves and will behave exactly like Hive.
> > > >
> > > >
> > > > regarding temporary functions, there are few systems that support it
> > > >
> > > > IMHO Spark and Hive are not always the best examples for consistent
> > > > design. Systems like Postgres, Presto, or SQL Server should be used
> as
> > a
> > > > reference. I don't think that a user can overwrite a built-in
> function
> > > > there.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > > > [2] https://prestodb.github.io/docs/current/develop/functions.html
> > > >
> > > >
> > > > On 04.09.19 13:44, Jark Wu wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Regarding #1 temp function <> built-in function and naming.
> > > > I'm fine with temp functions should precede built-in function and can
> > > > override built-in functions (we already support to override built-in
> > > > function in 1.9).
> > > > If we don't allow the same name as a built-in function, I'm afraid we
> > > >
> > > > will
> > > >
> > > > have compatibility issues in the future.
> > > > Say users register a user defined function named "explode" in 1.9,
> and
> > we
> > > > support a built-in "explode" function in 1.10.
> > > > Then the user's jobs which call the registered "explode" function in
> > 1.9
> > > > will all fail in 1.10 because of naming conflict.
> > > >
> > > > Regarding #2 "External" built-in functions.
> > > > I think if we store external built-in functions in catalog, then
> > > > "hive1::sqrt" is a good way to go.
> > > > However, I would prefer to support a discovery mechanism (e.g. SPI)
> for
> > > > built-in functions as Timo suggested above.
> > > > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > > > function set as built-in functions in an easy way.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <usxuefu@gmail.com> wrote:
> > > >
> > > > Hi Dawid,
> > > >
> > > > Thank you for sharing your findings. It seems to me that there is no
> > SQL
> > > > standard regarding temporary functions. There are few systems that support
> > > > it. Here is what I have found:
> > > >
> > > > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > > > 2. Spark: basically follows Hive
> > > > (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> > > > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > > > behavior.
> > > > (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
> > > >
> > > > Because of the lack of a standard, it's perfectly fine for Flink to define
> > > > whatever it sees as appropriate. Thus, your proposal (no overwriting and must
> > > > have DB as holder) is one option. The advantage is simplicity. The downside
> > > > is the deviation from Hive, which is popular and the de facto standard in the
> > > > big data world.
> > > >
> > > > However, I don't think we have to follow Hive. More importantly, we
> > need
> > > >
> > > > a
> > > >
> > > > consensus. I have no objection if your proposal is generally agreed
> > upon.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Just an opinion on the built-in <> temporary functions resolution and
> > > > NAMING issue. I think we should not allow overriding the built-in
> > > > functions, as this may pose serious issues and to be honest is rather
> > > > not feasible and would require major rework. What happens if a user
> > > > wants to override CAST? Calls to that function are generated at
> > > > different layers of the stack that unfortunately do not always go
> > > > through the Catalog API (at least yet). Moreover from what I've
> checked
> > > > no other systems allow overriding the built-in functions. All the
> > > > systems I've checked so far register temporary functions in a
> > > > database/schema (either special database for temporary functions, or
> > > > just current database). What I would suggest is to always register
> > > > temporary functions with a 3 part identifier. The same way as tables,
> > > > views etc. This effectively means you cannot override built-in
> > > > functions. With such approach it is natural that the temporary
> > functions
> > > > end up a step lower in the resolution order:
> > > >
> > > > 1. built-in functions (1 part, maybe 2? - this is still under
> > discussion)
> > > >
> > > > 2. temporary functions (always 3 part path)
> > > >
> > > > 3. catalog functions (always 3 part path)
> > > >
> > > > Let me know what you think.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > >
> > > > On 04/09/2019 06:13, Bowen Li wrote:
> > > >
> > > > Hi,
> > > >
> > > > I agree with Xuefu that the controversial points are mainly these two. My
> > > > thoughts on them:
> > > >
> > > > 1) Determinism of referencing Hive built-in functions. We can either
> > > >
> > > > remove
> > > >
> > > > Hive built-in functions from ambiguous function resolution and
> require
> > > > users to use special syntax for their qualified names, or add a
> config
> > > >
> > > > flag
> > > >
> > > > to catalog constructor/yaml for turning on and off Hive built-in
> > > >
> > > > functions
> > > >
> > > > with the flag set to 'false' by default and proper doc added to help
> > > >
> > > > users
> > > >
> > > > make their decisions.
> > > >
> > > > 2) Flink temp functions v.s. Flink built-in functions in ambiguous
> > > >
> > > > function
> > > >
> > > > resolution order. We believe Flink temp functions should precede
> Flink
> > > > built-in functions, and I have presented my reasons. Just in case if
> we
> > > > cannot reach an agreement, I propose forbidding users from registering temp
> > > > functions in the same name as a built-in function, like MySQL's
> > > >
> > > > approach,
> > > >
> > > > for the moment. It won't have any performance concern, since built-in
> > > > functions are all in memory and thus cost of a name check will be
> > > >
> > > > really
> > > >
> > > > trivial.
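The name check mentioned above can be sketched as a set lookup at registration time. `TempFunctionRegistry`, the sample built-in names, and the case-insensitive comparison are assumptions made for illustration; none of them come from Flink's code base:

```python
from typing import Callable, Dict, Iterable, Optional


class TempFunctionRegistry:
    """Toy registry that rejects temp functions named like a built-in."""

    def __init__(self, built_in_names: Iterable[str]) -> None:
        # Built-in names live in memory, so the check is a hash-set lookup.
        self._built_in = frozenset(n.lower() for n in built_in_names)
        self._temp: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        key = name.lower()  # assumption: names compare case-insensitively
        if key in self._built_in:
            raise ValueError(f"'{name}' clashes with a built-in function")
        self._temp[key] = fn

    def lookup(self, name: str) -> Optional[Callable]:
        return self._temp.get(name.lower())


registry = TempFunctionRegistry(["cast", "concat_ws", "md5"])
registry.register("my_upper", str.upper)
```

Registering `CAST` against this registry raises `ValueError`, while `my_upper` is accepted and can be looked up afterwards.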
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <usxuefu@gmail.com> wrote:
> > > >
> > > >  From what I have seen, there are a couple of focal disagreements:
> > > >
> > > > 1. Resolution order: temp function --> flink built-in function -->
> > > >
> > > > catalog
> > > >
> > > > function vs flink built-in function --> temp function -> catalog
> > > >
> > > > function.
> > > >
> > > > 2. "External" built-in functions: how to treat built-in functions in
> > > > external system and how users reference them
> > > >
> > > > For #1, I agree with Bowen that temp function needs to be at the
> > > >
> > > > highest
> > > >
> > > > priority because that's how a user might overwrite a built-in
> function
> > > > without referencing a persistent, overwriting catalog function with a
> > > >
> > > > fully
> > > >
> > > > qualified name. Putting built-in functions at the highest priority
> > > > eliminates that usage.
> > > >
> > > > For #2, I saw a general agreement on referencing "external" built-in
> > > > functions such as those in Hive needs to be explicit and
> deterministic
> > > >
> > > > even
> > > >
> > > > though different approaches are proposed. To limit the scope and simplify the
> > > > usage, it seems to make sense to me to introduce special syntax for users to
> > > >
> > > > explicitly reference an external built-in function such as
> hive1::sqrt
> > > >
> > > > or
> > > >
> > > > hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog
> API
> > > >
> > > > call
> > > >
> > > > hive1.getFunction(ObjectPath functionName) where the database name is
> > > > absent for built-in functions available in that catalog hive1. I
> > > >
> > > > understand
> > > >
> > > > that Bowen's original proposal was trying to avoid this, but this
> > > >
> > > > could
> > > >
> > > > turn out to be a clean and simple solution.
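A rough sketch of how a reference such as hive1::sqrt could be told apart from ordinary 1/2/3-part names. The `parse_function_reference` helper and the kind labels it returns are hypothetical; the `::` separator is only the syntax floated in this thread, not an implemented feature:

```python
from typing import Tuple


def parse_function_reference(ref: str) -> Tuple[str, ...]:
    """Classify a function reference and split it into its parts."""
    if "::" in ref:
        # catalog::function, i.e. a built-in function of an external catalog
        catalog, _, func = ref.partition("::")
        if not catalog or not func or "::" in func or "." in func:
            raise ValueError(f"malformed external built-in reference: {ref!r}")
        return ("external_builtin", catalog, func)
    parts = ref.split(".")
    if len(parts) > 3 or any(not p for p in parts):
        raise ValueError(f"malformed function reference: {ref!r}")
    kinds = {1: "ambiguous", 2: "db_qualified", 3: "fully_qualified"}
    return (kinds[len(parts)], *parts)
```

Only the "ambiguous" kind would need the resolution order; the other three name their target directly.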
> > > >
> > > > (Timo's modular approach is great way to "expand" Flink's built-in
> > > >
> > > > function
> > > >
> > > > set, which seems orthogonal and complementary to this, which could be
> > > > tackled in further future work.)
> > > >
> > > > I'd be happy to hear further thoughts on the two points.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <ykt836@gmail.com> wrote:
> > > >
> > > > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > > > same as Bowen's. But after thinking about it, I'm currently leaning toward
> > > > Timo's suggestion.
> > > >
> > > > The reason is backward compatibility. If we follow Bowen's approach, let's
> > > > say we first look for a function among Flink's built-in functions, and then
> > > > among Hive's built-ins. For example, `foo` is not supported by Flink, but
> > > > Hive has such a built-in function, so the user gets Hive's behavior for
> > > > `foo`. In the next release, Flink realizes this is a very popular function
> > > > and adds it to Flink's built-in functions, but with different behavior from
> > > > Hive's. So in the next
> > > > release, the behavior changes.
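The scenario above can be reproduced with a toy fallback lookup. Everything here is illustrative: `resolve_with_fallback` and the release dictionaries are made up, and `foo` is the placeholder name used in the message:

```python
from typing import Dict, Optional


def resolve_with_fallback(name: str,
                          flink_builtin: Dict[str, str],
                          hive_builtin: Dict[str, str]) -> Optional[str]:
    """Look in Flink's built-ins first, then fall back to Hive's."""
    if name in flink_builtin:
        return flink_builtin[name]
    return hive_builtin.get(name)


hive_builtin = {"foo": "hive.foo"}
flink_release_n = {}                          # Flink has no `foo` yet
flink_release_n_plus_1 = {"foo": "flink.foo"}  # Flink adds its own `foo`

before_upgrade = resolve_with_fallback("foo", flink_release_n, hive_builtin)
after_upgrade = resolve_with_fallback("foo", flink_release_n_plus_1, hive_builtin)
```

The query text is identical in both releases, yet `before_upgrade` points at Hive's implementation and `after_upgrade` at Flink's, which is the silent behavior change being described.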
> > > >
> > > > With Timo's approach, IIUC users have to tell the framework explicitly what
> > > > kind of built-in functions they would like to use. They can just tell the
> > > > framework to abandon Flink's built-in functions and use Hive's instead. Users
> > > > can only choose between them, but not use them at the same time. I think this
> > > > approach is more predictable.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Thanks for the feedback. Just a kind reminder that the [Proposal] section
> > > > in the google doc was updated. Please take a look first and let me know if
> > > > you have more questions.
> > > >
> > > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > > Hi Timo,
> > > >
> > > > Re> 1) We should not have the restriction "hive built-in functions
> > > >
> > > > can
> > > >
> > > > only
> > > >
> > > > be used when current catalog is hive catalog". Switching a catalog
> > > > should only have implications on the cat.db.object resolution but
> > > >
> > > > not
> > > >
> > > > functions. It would be quite convenient for users to use Hive
> > > >
> > > > built-ins
> > > >
> > > > even if they use a Confluent schema registry or just the in-memory
> > > >
> > > > catalog.
> > > >
> > > > There might be a misunderstanding here.
> > > >
> > > > First of all, Hive built-in functions are not part of Flink
> > > >
> > > > built-in
> > > >
> > > > functions, they are catalog functions, thus if the current catalog
> > > >
> > > > is
> > > >
> > > > not a
> > > >
> > > > HiveCatalog but, say, a schema registry catalog, ambiguous
> > > >
> > > > functions
> > > >
> > > > reference just shouldn't be resolved to a different catalog.
> > > >
> > > > Second, Hive built-in functions can potentially be referenced
> > > >
> > > > across
> > > >
> > > > catalog, but it doesn't have db namespace and we currently just
> > > >
> > > > don't
> > > >
> > > > have
> > > >
> > > > a SQL syntax for it. It can be enabled when such a SQL syntax is
> > > >
> > > > defined,
> > > >
> > > > e.g. "catalog::function", but it's out of scope of this FLIP.
> > > >
> > > > 2) I would propose to have separate concepts for catalog and
> > > >
> > > > built-in
> > > >
> > > > functions. In particular it would be nice to modularize built-in
> > > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> > > >
> > > > maybe
> > > >
> > > > we add more experimental functions in the future or function for
> > > >
> > > > some
> > > >
> > > > special application area (Geo functions, ML functions). A data
> > > >
> > > > platform
> > > >
> > > > team might not want to make every built-in function available. Or a
> > > > function module like ML functions is in a different Maven module.
> > > >
> > > > I think this is orthogonal to this FLIP, especially we don't have
> > > >
> > > > the
> > > >
> > > > "external built-in functions" anymore and currently the built-in
> > > >
> > > > function
> > > >
> > > > category remains untouched.
> > > >
> > > > But just to share some thoughts on the proposal, I'm not sure about
> > > >
> > > > it:
> > > >
> > > > - I don't know if any other databases handle built-in functions
> > > >
> > > > like
> > > >
> > > > that.
> > > >
> > > > Maybe you can give some examples? IMHO, built-in functions are
> > > >
> > > > system
> > > >
> > > > info
> > > >
> > > > and should be deterministic, not depending on loaded libraries. Geo
> > > > functions should be either built-in already or just libraries
> > > >
> > > > functions,
> > > >
> > > > and library functions can be adapted to catalog APIs or of some
> > > >
> > > > other
> > > >
> > > > syntax to use
> > > > - I don't know if all use cases stand, and many can be achieved by
> > > >
> > > > other
> > > >
> > > > approaches too. E.g. experimental functions can be taken good care
> > > >
> > > > of
> > > >
> > > > by
> > > >
> > > > documentations, annotations, etc
> > > > - the proposal basically introduces some concept like a pluggable
> > > >
> > > > built-in
> > > >
> > > > function catalog, despite the already existing catalog APIs
> > > > - it brings in even more complicated scenarios to the design. E.g.
> > > >
> > > > how
> > > >
> > > > do
> > > >
> > > > you handle built-in functions in different modules but different
> > > >
> > > > names?
> > > >
> > > > In short, I'm not sure if it really stands and it looks like an
> > > >
> > > > overkill
> > > >
> > > > to me. I'd rather not go to that route. Related discussion can be
> > > >
> > > > on
> > > >
> > > > its
> > > >
> > > > own thread.
> > > >
> > > > 3) Following the suggestion above, we can have a separate discovery
> > > > mechanism for built-in functions. Instead of just going through a
> > > >
> > > > static
> > > >
> > > > list like in BuiltInFunctionDefinitions, a platform team should be
> > > >
> > > > able
> > > >
> > > > to select function modules like
> > > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > > HiveFunctions) or via service discovery;
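One way to picture the module selection suggested above is an ordered list of modules that is searched front to back. This is only a sketch under that assumption: `providing_module`, the tuple layout, and the sample function names are hypothetical, and the thread does not specify how clashes between modules would be handled:

```python
from typing import FrozenSet, List, Optional, Tuple

# (module name, names of the functions that module provides)
Module = Tuple[str, FrozenSet[str]]


def providing_module(name: str, modules: List[Module]) -> Optional[str]:
    """Return the first enabled module, in list order, that defines `name`."""
    for module_name, function_names in modules:
        if name in function_names:
            return module_name
    return None


core = ("core", frozenset({"cast", "concat_ws", "md5"}))
geo = ("geo", frozenset({"st_distance"}))
hive = ("hive", frozenset({"concat_ws", "get_json_object"}))

enabled = [core, geo, hive]  # order chosen by the platform team
```

With `enabled` as above, `concat_ws` comes from `core` because it is listed first, and dropping `hive` from the list makes `get_json_object` unavailable without touching any catalog.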
> > > >
> > > > Same as above. I'll leave it to its own thread.
> > > >
> > > > re > 3) Dawid and I discussed the resolution order again. I agree
> > > >
> > > > with
> > > >
> > > > Kurt
> > > >
> > > > that we should unify built-in function (external or internal)
> > > >
> > > > under a
> > > >
> > > > common layer. However, the resolution order should be:
> > > >    1. built-in functions
> > > >    2. temporary functions
> > > >    3. regular catalog resolution logic
> > > > Otherwise a temporary function could cause clashes with Flink's
> > > >
> > > > built-in
> > > >
> > > > functions. If you take a look at other vendors, like SQL Server
> > > >
> > > > they
> > > >
> > > > also do not allow to overwrite built-in functions.
> > > >
> > > > "I agree with Kurt that we should unify built-in function (external or
> > > > internal) under a common layer." <- I don't think this is what Kurt means.
> > > > Kurt and I are in favor of unifying built-in functions of external systems
> > > > and catalog functions. Was that a typo?
> > > >
> > > > Besides, I'm not sure about the resolution order you proposed.
> > > >
> > > > Temporary
> > > >
> > > > functions have a lifespan over a session and are only visible to
> > > >
> > > > the
> > > >
> > > > session owner, they are unique to each user, and users create them
> > > >
> > > > on
> > > >
> > > > purpose to be the highest priority in order to overwrite system
> > > >
> > > > info
> > > >
> > > > (built-in functions in this case).
> > > >
> > > > In your case, why would users name a temporary function the same
> > > >
> > > > as a
> > > >
> > > > built-in function then? Since using that name in ambiguous function
> > > > reference will always be resolved to built-in functions, creating a
> > > > same-named temp function would be meaningless in the end.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > > Hi Jingsong,
> > > >
> > > > Re> 1.Hive built-in functions is an intermediate solution. So we
> > > >
> > > > should
> > > >
> > > > not introduce interfaces to influence the framework. To make
> > > > Flink itself more powerful, we should implement the functions
> > > > we need to add.
> > > >
> > > > Yes, please see the doc.
> > > >
> > > > Re> 2.Non-flink built-in functions are easy for users to change
> > > >
> > > > their
> > > >
> > > > behavior. If we support some flink built-in functions in the
> > > > future but act differently from non-flink built-in, this will
> > > >
> > > > lead
> > > >
> > > > to
> > > >
> > > > changes in user behavior.
> > > >
> > > > There's no such concept as "external built-in functions" any more.
> > > > Built-in functions of external systems will be treated as special
> > > >
> > > > catalog
> > > >
> > > > functions.
> > > >
> > > > Re> Another question is, does this fallback include all
> > > > hive built-in functions? As far as I know, some hive functions
> > > > are somewhat hacky. If possible, can we start with a whitelist?
> > > > Once we implement some functions as flink built-ins, we can
> > > > also update the whitelist.
> > > >
> > > > Yes, that's something we thought of too. I don't think it's super
> > > > critical to the scope of this FLIP, thus I'd like to leave it to future
> > > > efforts as a nice-to-have feature.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> > > >
> > > > Hi Kurt,
> > > >
> > > > Re: > What I want to propose is we can merge #3 and #4, make them both
> > > > under the "catalog" concept, by extending catalog function to make it
> > > > have the ability to have built-in catalog functions. Some benefits I can
> > > > see from this approach:
> > > >
> > > > 1. We don't have to introduce a new concept like external built-in
> > > > functions. Actually I don't see a full story about how to treat
> > > > built-in functions, and it seems a little bit disruptive to the catalog.
> > > > As a result, you have to make some restriction like "hive built-in
> > > > functions can only be used when current catalog is hive catalog".
> > > >
> > > > Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> > > > the doc. I've modified those sections, and they are up to date now.
> > > >
> > > > In short, built-in functions of external systems are now defined as a
> > > > special kind of catalog function in Flink, and handled by Flink as
> > > > follows:
> > > > - An external built-in function must be associated with a catalog for
> > > > the purpose of decoupling flink-table and external systems.
> > > > - It always resides in front of catalog functions in ambiguous function
> > > > reference order, just like in its own external system
> > > > - It is a special catalog function that doesn’t have a schema/database
> > > > namespace
> > > > - It goes through the same instantiation logic as other user defined
> > > > catalog functions in the external system
> > > >
> > > > Please take another look at the doc, and let me know if you have more
> > > > questions.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
> > > >
> > > > Hi Kurt,
> > > >
> > > > it should not affect the functions and operations we currently have in
> > > > SQL. It just categorizes the available built-in functions. It is kind of
> > > > an orthogonal concept to the catalog API but built-in functions deserve
> > > > this special kind of treatment. CatalogFunction still fits perfectly in
> > > > there because the regular catalog object resolution logic is not
> > > > affected. So tables and functions are resolved in the same way but with
> > > > built-in functions that have priority as in the original design.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 03.09.19 15:26, Kurt Young wrote:
> > > >
> > > > Does this only affect the functions and operations we currently have in
> > > > SQL and have no effect on tables, right? Looks like this is an orthogonal
> > > > concept to Catalog?
> > > > If the answers are both yes, then the catalog function will be a weird
> > > > concept?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> > > >
> > > > What you proposed is basically the same as what Calcite does, I think
> > > > we are on the same page.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > > >
> > > > This sounds exactly as the module approach I mentioned, no?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > On 03.09.19 13:42, Danny Chan wrote:
> > > >
> > > > Thanks Bowen for bringing up this topic, I think it’s a useful
> > > > refactoring to make our function usage more user friendly.
> > > >
> > > > For the topic of how to organize the builtin operators and operators
> > > > of Hive, here is a solution from Apache Calcite: the Calcite way is to
> > > > make every dialect's operators a “Library”; users can specify which
> > > > libraries they want to use for a sql query. The builtin operators always
> > > > come as the first class objects and the others are used in the order
> > > > they appear.
> > > >
> > > > Maybe you can take it as a reference.
> > > >
> > > > [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > > > It's critically helpful to improve function usability in SQL.
> > > >
> > > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > >
> > > > In short, it:
> > > > - adds support for precise function reference with fully/partially
> > > > qualified name
> > > > - redefines function resolution order for ambiguous function reference
> > > > - adds support for Hive's rich built-in functions (support for Hive user
> > > > defined functions was already added in 1.9.0)
> > > > - clarifies the concept of temporary functions
> > > >
> > > > Would love to hear your thoughts.
> > > >
> > > > Bowen
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Fabian,
Thank you for your response.
Regarding temporary functions, I just wanted to clarify one thing: the
3-part identifier does not mean the user always has to provide the catalog
& database explicitly, just as the user does not have to provide them when
creating e.g. a permanent table or view. It does mean, though, that
functions are always stored within a database, the same way as all the
permanent objects and other temporary objects (tables, views). If not given
explicitly, the current catalog & database would be used, both in the create
statement and when using the function.
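
To illustrate the defaulting rule, here is a minimal sketch in Java. The
class and method names (QualifyFunctionName, qualify) are hypothetical and
are not part of Flink's API; the sketch only assumes that missing leading
name parts are filled in from the current catalog and database.

```java
import java.util.Arrays;
import java.util.List;

final class QualifyFunctionName {
    // Hypothetical helper: expands a 1-, 2- or 3-part name into
    // [catalog, database, function], using the session's current
    // catalog and database for any part that was not given.
    static List<String> qualify(String name, String currentCatalog, String currentDatabase) {
        String[] parts = name.split("\\.");
        switch (parts.length) {
            case 1:
                return Arrays.asList(currentCatalog, currentDatabase, parts[0]);
            case 2:
                return Arrays.asList(currentCatalog, parts[0], parts[1]);
            case 3:
                return Arrays.asList(parts);
            default:
                throw new IllegalArgumentException("Expected 1 to 3 name parts: " + name);
        }
    }
}
```

With this rule, a user who writes only the function name still ends up with
a 3-part identifier, so the short form stays available in both the create
statement and the query.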

Point taken, though, that your preference would be to support overriding
built-in functions.

Best,
Dawid

On Wed, 11 Sep 2019, 21:14 Fabian Hueske, <fh...@gmail.com> wrote:

> Hi all,
>
> I'd like to add my opinion on this topic as well ;-)
>
> In general, I think overriding built-in functions with temp functions has a
> couple of benefits but also a few challenges:
>
> * Users can reimplement the behavior of a built-in function of a different
> system, e.g., for backward compatibility after a migration.
> * I don't think that "accidental" overrides and surprising semantics are an
> issue or dangerous. The user registered the temp function in the same
> session and should therefore be aware of the changed semantics.
> * I see that not all built-in functions can be overridden, like the CAST
> example that Dawid gave. However, I think these should be a small fraction
> and such functions could be blacklisted. Sure, that's not super consistent,
> but should (IMO) not be a big issue in practice.
> * Temp functions should be easy to use. Requiring a 3-part addressing makes
> them a lot less user friendly, IMO. Users need to think about what catalog
> and db to choose when registering them. Also using a temp function in a
> query becomes less convenient. Moreover, I agree with Bowen's concerns that
> a 3-part addressing scheme reduces the temporary character of the function.
>
> From the three possible solutions, my preference order is
> 1) 1-part address with override of built-in
> 2) 1-part address without override of built-in
> 3) 3-part address
>
> Regarding the issue of external built-in functions, I don't think that
> Timo's proposal of modules is fully orthogonal to this discussion.
> A Hive function module could be an alternative to offering Hive functions
> as part of Hive's catalog.
> From a user's point of view, I think that modules would be a "cleaner"
> integration ("Why do I need a Hive catalog if all I want to do is apply a
> Hive function on a Kafka table?").
> However, the module approach clearly has the problem of dealing with
> same-named functions in different modules (e.g., a Hive function and a
> Flink built-in function).
> The catalog approach has the benefit that functions can be addressed like
> hiveCat::func (or a similar path).
>
> I'm not sure what's the best solution here.
>
> Cheers,
> Fabian
>
>
> On Mon, Sep 9, 2019 at 6:30 AM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi,
> >
> > W.r.t temp functions, I feel both options have their benefits and can
> > theoretically achieve similar functionalities one way or another. In the
> > end, it's more about use cases, user habits, and trade-offs.
> >
> > Re> Not always users are in full control of the catalog functions. There is
> > also the case where different teams manage the catalog & use the catalog.
> >
> > Temp functions live within a session, and not within a catalog. Having
> > 3-part paths may imply temp functions are tied to a catalog in two
> > aspects.
> > 1) it may indicate each catalog manages its temp functions, which is not
> > true as we all seem to agree they should reside at a central place, either
> > in FunctionCatalog or CatalogManager
> > 2) it may indicate there's some access control. When users are forbidden to
> > manipulate some objects in the catalog that's managed by other teams, but
> > are allowed to manipulate some other objects (temp functions in this case)
> > belonging to the catalog in namespaces, users may think we introduced extra
> > complexity and confusion with some kind of access control into the problem.
> > It doesn't feel intuitive enough for end users.
> >
> > Thus, I'd be in favor of 1-part path for temporary functions, and other
> > temp objects.
> >
> > Thanks,
> > Bowen
> >
> >
> >
> > On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > > I agree the consequences of the decision are substantial. Let's see what
> > > others think.
> > >
> > > -- Catalog functions are defined by users, and we suppose they can
> > > drop/alter it in any way they want. Thus, overwriting a catalog function
> > > doesn't seem to be a strong use case that we should be concerned about.
> > > Rather, there are known use cases for overwriting built-in functions.
> > >
> > > Not always users are in full control of the catalog functions. There is
> > > also the case where different teams manage the catalog & use the catalog.
> > > As for overriding built-in functions, with the 3-part approach the user can
> > > always use an equally named function from a catalog. E.g. to override
> > >
> > > *    SELECT explode(arr) FROM ...*
> > >
> > > user can always write:
> > >
> > > *    SELECT db.explode(arr) FROM ...*
> > >
> > > Best,
> > >
> > > Dawid
> > > On 06/09/2019 10:54, Xuefu Z wrote:
> > >
> > > Hi Dawid,
> > >
> > > Thank you for your summary. While the only difference in the two proposals
> > > is one- or three-part in naming, the consequence would be substantial.
> > >
> > > To me, there are two major use cases of temporary functions compared to
> > > persistent ones:
> > > 1. Temporary in nature and auto managed by the session. More often than
> > > not, admin doesn't even allow user to create persistent functions.
> > > 2. Provide an opportunity to overwrite system built-in functions.
> > >
> > > Since built-in functions have one-part names, requiring three-part names for
> > > temporary functions eliminates the overwriting opportunity.
> > >
> > > One-part naming essentially puts all temp functions under a single
> > > namespace and simplifies function resolution, such as we don't need to
> > > consider the case of a temp function and a persistent function with the
> > > same name under the same database.
> > >
> > > I agree having three parts does have its merits, such as consistency with
> > > other temporary objects (table) and minor difference between temp vs
> > > catalog functions. However, there is a slight difference between tables and
> > > functions in that there is no built-in table in SQL so there is no need to
> > > overwrite it.
> > >
> > > I'm not sure if I fully agree with the benefits you listed as the advantages
> > > of the three-part naming of temp functions.
> > >   -- Allowing overwriting built-in functions is a benefit and the solution
> > > for disallowing certain overwriting shouldn't be totally banning it.
> > >   -- Catalog functions are defined by users, and we suppose they can
> > > drop/alter it in any way they want. Thus, overwriting a catalog function
> > > doesn't seem to be a strong use case that we should be concerned about.
> > > Rather, there are known use cases for overwriting built-in functions.
> > >
> > > Thus, personally I would prefer one-part name for temporary functions. In
> > > lack of SQL standard on this, I certainly like to get opinions from others
> > > to see if a consensus can be eventually reached.
> > >
> > > (To your point on modular approach to support external built-in functions,
> > > we saw the value and are actively looking into it. Thanks for sharing your
> > > opinion on that.)
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > >
> > >
> > > Hi Xuefu,
> > >
> > > Thank you for your answers.
> > >
> > > Let me summarize my understanding. In principle we differ only in regards
> > > to the fact if a temporary function can be only 1-part or only 3-part
> > > identified. I can reconfirm that if the community decides it prefers the
> > > 1-part approach I will commit to that, with the assumption that we will
> > > force ONLY 1-part function names. (We will parse identifier and throw
> > > exception if a user tries to register e.g. db.temp_func).
> > >
> > > My preference is though the 3-part approach:
> > >
> > >    - there are some functions that it makes no sense to override, e.g.
> > >    CAST, moreover I'm afraid that allowing overriding such will lead to high
> > >    inconsistency, similar to those that I mentioned spark has
> > >    - you cannot shadow a fully-qualified function. (If a user fully
> > >    qualifies his/her objects in a SQL query, which is often considered a
> > >    good practice)
> > >    - it does not differentiate between functions & temporary functions.
> > >    Temporary functions just differ with regards to their life-cycle. The
> > >    registration & usage is exactly the same.
> > >
> > > As it can be seen, the proposed concept regarding temp function and
> > > function resolution is quite simple.
> > >
> > > Both approaches are equally simple. I would even say the 3-part approach
> > > is slightly simpler as it does not have to care about some special built-in
> > > functions such as CAST.
> > >
> > > I don't want to express my opinion on the differentiation between built-in
> > > functions and "external" built-in functions in this thread as it is rather
> > > orthogonal, but I also like the modular approach and I definitely don't
> > > like the special syntax "cat::function". I think it's better to stick to a
> > > standard or at least other proved solutions from other systems.
> > >
> > > Best,
> > >
> > > Dawid
> > > On 05/09/2019 10:12, Xuefu Z wrote:
> > >
> > > Hi Dawid,
> > >
> > > Thanks for sharing your thoughts and requests for clarification. I believe
> > > that I fully understood your proposal, which does have its merit. However,
> > > it's different from ours. Here are the answers to your questions:
> > >
> > > Re #1: yes, the temp functions in the proposal are global and have just
> > > one-part names, similar to built-in functions. Two- or three-part names are
> > > not allowed.
> > >
> > > Re #2: not applicable as two- or three-part names are disallowed.
> > >
> > > Re #3: same as above. Referencing external built-in functions is achieved
> > > either implicitly (only the built-in functions in the current catalogs are
> > > considered) or via special syntax such as cat::function. However, we are
> > > looking into the modular approach that Timo suggested with other feedback
> > > received from the community.
> > >
> > > Re #4: the resolution order goes like the following in our proposal:
> > >
> > > 1. temporary functions
> > > 2. built-in functions (including those augmented by add-on modules)
> > > 3. built-in functions in current catalog (this will not be needed if the
> > > special syntax "cat::function" is required)
> > > 4. functions in current catalog and db.
> > >
> > > If we go with the modular approach and make external built-in functions
> > > an add-on module, the 2 and 3 above will be combined. In essence, the
> > > resolution order is equivalent in the two approaches.
> > >
> > > By the way, resolution order matters only for simple name reference. For
> > > names such as db.function (interpreted as current_cat/db/function) or
> > > cat.db.function, the reference is unambiguous, so no resolution is needed.
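
The four-step order for simple (1-part) names can be sketched in Java as
below. This is only an illustration under stated assumptions: the class
FunctionResolutionSketch, its Source enum, and the string stand-ins for
function implementations are all hypothetical and do not reflect Flink's
actual FunctionCatalog code.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class FunctionResolutionSketch {
    // Declaration order encodes the proposed lookup priority.
    enum Source { TEMPORARY, BUILT_IN, CATALOG_BUILT_IN, CATALOG }

    // One name -> implementation map per source (implementations are
    // plain strings here, purely for illustration).
    private final Map<Source, Map<String, String>> functions = new EnumMap<>(Source.class);

    FunctionResolutionSketch() {
        for (Source source : Source.values()) {
            functions.put(source, new HashMap<>());
        }
    }

    void register(Source source, String name, String implementation) {
        functions.get(source).put(name, implementation);
    }

    // Resolves a simple name by probing the sources in priority order;
    // EnumMap iterates its values in enum declaration order.
    Optional<String> resolve(String simpleName) {
        for (Map<String, String> candidates : functions.values()) {
            String implementation = candidates.get(simpleName);
            if (implementation != null) {
                return Optional.of(implementation);
            }
        }
        return Optional.empty();
    }
}
```

Under the modular approach, the BUILT_IN and CATALOG_BUILT_IN sources would
collapse into one, leaving a three-step lookup.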
> > >
> > > As it can be seen, the proposed concept regarding temp function and
> > > function resolution is quite simple. Additionally, the proposed resolution
> > > order allows temp function to shadow a built-in function, which is
> > > important (though not decisive) in our opinion.
> > >
> > > I started liking the modular approach as the resolution order will only
> > > include 1, 2, and 4, which is simpler and more generic. That's why I
> > > suggested we look more into this direction.
> > >
> > > Please let me know if there are further questions.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > >
> > >
> > >
> > > On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > >
> > >
> > > Hi Xuefu,
> > >
> > > Just wanted to summarize my opinion on the one topic (temporary functions).
> > >
> > > My preference would be to make temporary functions always 3-part qualified
> > > (as a result that would prohibit overriding built-in functions). Having
> > > said that if the community decides that it's better to allow overriding
> > > built-in functions I am fine with it and can commit to that decision.
> > >
> > > I wanted to ask if you could clarify a few points for me around that
> > > option.
> > >
> > >    1. Would you enforce temporary functions to be always just a single
> > >    name (without db & cat) as hive does, or would you allow also 3 or even
> > >    2 part identifiers?
> > >    2. Assuming 2/3-part paths. How would you register a function from a
> > >    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
> > >    all functions named 'func' in all databases named 'db' in all catalogs?
> > >    Or would you shadow only function 'func' in database 'db' in current
> > >    catalog?
> > >    3. This point is still under discussion, but was mentioned a few
> > >    times, that maybe we want to enable syntax cat.func for "external
> > >    built-in functions". How would that affect statement from previous
> > >    point? Would 'db.func' shadow "external built-in function" in 'db'
> > >    catalog or user functions as in point 2? Or maybe both?
> > >    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
> > >    paths. Would the function resolution be actually as follows?:
> > >       1. temporary functions (1-part path)
> > >       2. built-in functions
> > >       3. temporary functions (2-part path)
> > >       4. 2-part catalog functions a.k.a. "external built-in functions"
> > >       (cat + func) - this is still under discussion, if we want that in
> > >       the other focal point
> > >       5. temporary functions (3-part path)
> > >       6. 3-part catalog functions a.k.a. user functions
> > >
> > > I would be really grateful if you could answer those questions for me,
> > > thanks.
> > >
> > > BTW, Thank you all for a healthy discussion.
> > >
> > > Best,
> > >
> > > Dawid
> > > On 04/09/2019 23:25, Xuefu Z wrote:
> > >
> > > Thanks all for sharing your thoughts. I think we have gathered some useful
> > > initial feedback from this long discussion with a couple of focal points
> > > sticking out.
> > >
> > > We will go back to do more research and adapt our proposal. Once it's
> > > ready, we will ask for a new round of review. If there is any disagreement,
> > > we will start a new discussion thread on each rather than having a mega
> > > discussion like this.
> > >
> > > Thanks to everyone for participating.
> > >
> > > Regards,
> > > Xuefu
> > >
> > >
> > > On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
> > >
> > >
> > > Let me try to summarize and conclude the long thread so far:
> > >
> > > 1. For order of temp function v.s. built-in function:
> > >
> > > I think Dawid's point that temp function should be of fully qualified path
> > > is a better reasoning to back the newly proposed order, and I agree we
> > > don't need to follow Hive/Spark.
> > >
> > > However, I'd rather not change fundamentals of temporary functions in this
> > > FLIP. It belongs to a bigger story of how temporary objects should be
> > > redefined and be handled uniformly - currently temporary tables and views
> > > (those registered from TableEnv#registerTable()) behave differently from
> > > what Dawid proposes for temp functions, and we need a FLIP to just unify
> > > their APIs and behaviors.
> > >
> > > I agree that backward compatibility is not an issue w.r.t Jark's points.
> > >
> > > ***Seems we do have consensus that it's acceptable to prevent users
> > > registering a temp function in the same name as a built-in function. To
> > > help us move forward, I'd like to propose setting such a restraint on temp
> > > functions in this FLIP to simplify the design and avoid disputes.*** It
> > > will also leave room for improvements in the future.
> > >
> > >
> > > 2. For Hive built-in function:
> > >
> > > Thanks Timo for providing the Presto and Postgres examples. I feel modular
> > > built-in functions can be a good fit for the geo and ml example as a native
> > > Flink extension, but not sure if it fits well with external integrations.
> > > Anyway, I think modular built-in functions is a bigger story and can be on
> > > its own thread too, and our proposal doesn't prevent Flink from doing that
> > > in the future.
> > >
> > > ***Seems we have consensus that users should be able to use built-in
> > > functions of Hive or other external systems in SQL explicitly and
> > > deterministically regardless of Flink built-in functions and the potential
> > > modular built-in functions, via some new syntax like "mycat::func"? If so,
> > > I'd like to propose removing Hive built-in functions from ambiguous
> > > function resolution order, and empower users with such a syntax. This way
> > > we sacrifice a little convenience for certainty***
> > >
> > >
> > > What do you think?
> > >
> > > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > >
> > >
> > > Hi,
> > >
> > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> > > very inconsistent in that manner (spark being way worse on that).
> > >
> > > Hive:
> > >
> > > You cannot overwrite all the built-in functions. I could overwrite most of
> > > the functions I tried e.g. length, e, pi, round, rtrim, but there are
> > > functions I cannot overwrite e.g. CAST, ARRAY. I get:
> > >
> > >
> > > *    ParseException line 1:29 cannot recognize input near 'array' 'AS'*
> > >
> > > What is interesting is that I cannot overwrite *array*, but I can overwrite
> > > *map* or *struct*. Though hive behaves reasonably well if I manage to
> > > overwrite a function. When I drop the temporary function the native
> > > function is still available.
> > >
> > > Spark:
> > >
> > > Spark's behavior imho is super bad.
> > >
> > > Theoretically I could overwrite all functions. I was able e.g. to
> > > overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY
> > > FUNCTION syntax. Otherwise I get an exception that a function already
> > > exists. However when I used the CAST function in a query it used the
> > > native, built-in one.
> > >
> > > When I overwrote current_date() function, it was used in a query, but it
> > > completely replaces the built-in function and I can no longer use the
> > > native function in any way. I cannot also drop the temporary function. I
> > > get:
> > >
> > > *    Error in query: Cannot drop native function 'current_date';*
> > >
> > > Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
> > > with a database. Temporary functions are always represented as a single
> > > name.
> > >
> > > In my opinion neither of the systems has consistent behavior. Generally
> > > speaking I think overwriting any system provided functions is just
> > > dangerous.
> > >
> > > Regarding Jark's concerns. Such functions would be registered in a current
> > > catalog/database schema, so a user could still use their own function, but
> > > would have to fully qualify the function (because built-in functions take
> > > precedence). Moreover users would have the same problem with permanent
> > > functions. Imagine a user has a permanent function 'cat.db.explode'. In
> > > 1.9 the user could use just the 'explode' function as long as the 'cat' &
> > > 'db' were the default catalog & database. If we introduce 'explode'
> > > built-in function in 1.10, the user has to fully qualify the function.
> > >
> > > Best,
> > >
> > > Dawid
> > > On 04/09/2019 15:19, Timo Walther wrote:
> > >
> > > Hi all,
> > >
> > > thanks for the healthy discussion. It is already a very long discussion
> > > with a lot of text. So I will just post my opinion to a couple of
> > > statements:
> > >
> > >
> > > Hive built-in functions are not part of Flink built-in functions, they
> > > are catalog functions
> > >
> > > That is not entirely true. Correct me if I'm wrong but I think Hive
> > > built-in functions are also not catalog functions. They are not stored in
> > > every Hive metastore catalog that is freshly created but are a set of
> > > functions that are listed somewhere and made available.
> > >
> > >
> > > ambiguous functions reference just shouldn't be resolved to a different
> > > catalog
> > >
> > > I agree. They should not be resolved to a different catalog. That's why I
> > > am suggesting to split the concept of built-in functions and catalog lookup
> > > semantics.
> > >
> > >
> > > I don't know if any other databases handle built-in functions like that
> > >
> > > What I called "module" is:
> > > - Extension in Postgres [1]
> > > - Plugin in Presto [2]
> > >
> > > Btw. Presto even mentions example modules that are similar to the ones
> > > that we will introduce in the near future both for ML and System XYZ
> > > compatibility:
> > > "See either the presto-ml module for machine learning functions or the
> > > presto-teradata-functions module for Teradata-compatible functions, both in
> > > the root of the Presto source."
> > >
> > >
> > > functions should be either built-in already or just libraries functions,
> > > and library functions can be adapted to catalog APIs or of some other
> > > syntax to use
> > >
> > > Regarding "built-in already", of course we can add a lot of functions as
> > > built-ins but we will end up in a dependency hell in the near future if we
> > > don't introduce a pluggable approach. Library functions is what you also
> > > suggest but storing them in a catalog means to always fully qualify them or
> > > modifying the existing catalog design that was inspired by the standard.
> > >
> > > I don't think "it brings in even more complicated scenarios to the
> > > design", it just does clear separation of concerns. Integrating the
> > > functionality into the current design makes the catalog API more
> > > complicated.
> > >
> > >
> > > why would users name a temporary function the same as a built-in
> > > function then?
> > >
> > > Because you never know what users do. If they don't, my suggested
> > > resolution order should not be a problem, right?
> > >
> > >
> > > I don't think hive functions deserves be a function module
> > >
> > > Our goal is not to create a Hive clone. We need to think forward and Hive
> > > is just one of many systems that we can support. Not every built-in
> > > function behaves and will behave exactly like Hive.
> > >
> > >
> > > regarding temporary functions, there are few systems that support it
> > >
> > > IMHO Spark and Hive are not always the best examples for consistent
> > > design. Systems like Postgres, Presto, or SQL Server should be used as a
> > > reference. I don't think that a user can overwrite a built-in function
> > > there.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > > [2] https://prestodb.github.io/docs/current/develop/functions.html
> > >
> > >
> > > On 04.09.19 13:44, Jark Wu wrote:
> > >
> > > Hi all,
> > >
> > > Regarding #1 temp function <> built-in function and naming.
> > > I'm fine with temp functions preceding built-in functions and being able
> > > to override them (we already support overriding built-in functions in
> > > 1.9).
> > > If we don't allow the same name as a built-in function, I'm afraid we will
> > > have compatibility issues in the future.
> > > Say users register a user-defined function named "explode" in 1.9, and we
> > > support a built-in "explode" function in 1.10.
> > > Then the user's jobs which call the registered "explode" function in 1.9
> > > will all fail in 1.10 because of the naming conflict.
> > >
> > > Regarding #2 "External" built-in functions.
> > > I think if we store external built-in functions in catalog, then
> > > "hive1::sqrt" is a good way to go.
> > > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > > built-in functions as Timo suggested above.
> > > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > > function set as built-in functions in an easy way.
> > >
> > > Best,
> > > Jark
> > >
> > > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
> > >
> > > Hi Dawid,
> > >
> > > Thank you for sharing your findings. It seems to me that there is no SQL
> > > standard regarding temporary functions. There are few systems that support
> > > it. Here is what I have found:
> > >
> > > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > > 2. Spark: basically follows Hive (
> > > https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > > )
> > > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > > behavior. (
> > > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> > > )
> > >
> > > Because of the lack of a standard, it's perfectly fine for Flink to define
> > > whatever it sees as appropriate. Thus, your proposal (no overwriting and
> > > must have DB as holder) is one option. The advantage is simplicity. The
> > > downside is the deviation from Hive, which is popular and the de facto
> > > standard in the big data world.
> > >
> > > However, I don't think we have to follow Hive. More importantly, we need a
> > > consensus. I have no objection if your proposal is generally agreed upon.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> > > wrote:
> > >
> > > Hi all,
> > >
> > > Just an opinion on the built-in <> temporary functions resolution and
> > > NAMING issue. I think we should not allow overriding the built-in
> > > functions, as this may pose serious issues and, to be honest, is rather
> > > not feasible and would require major rework. What happens if a user
> > > wants to override CAST? Calls to that function are generated at
> > > different layers of the stack that unfortunately do not always go
> > > through the Catalog API (at least not yet). Moreover, from what I've
> > > checked, no other systems allow overriding the built-in functions. All the
> > > systems I've checked so far register temporary functions in a
> > > database/schema (either a special database for temporary functions, or
> > > just the current database). What I would suggest is to always register
> > > temporary functions with a 3-part identifier, the same way as tables,
> > > views, etc. This effectively means you cannot override built-in
> > > functions. With such an approach it is natural that the temporary
> > > functions end up a step lower in the resolution order:
> > >
> > > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> > >
> > > 2. temporary functions (always 3 part path)
> > >
> > > 3. catalog functions (always 3 part path)
> > >
> > > Let me know what you think.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 04/09/2019 06:13, Bowen Li wrote:
> > >
> > > Hi,
> > >
> > > I agree with Xuefu that the controversial points are mainly the two
> > > places. My thoughts on them:
> > >
> > > 1) Determinism of referencing Hive built-in functions. We can either remove
> > > Hive built-in functions from ambiguous function resolution and require
> > > users to use special syntax for their qualified names, or add a config flag
> > > to the catalog constructor/yaml for turning Hive built-in functions on and
> > > off, with the flag set to 'false' by default and proper docs added to help
> > > users make their decisions.
> > >
> > > 2) Flink temp functions vs. Flink built-in functions in ambiguous function
> > > resolution order. We believe Flink temp functions should precede Flink
> > > built-in functions, and I have presented my reasons. In case we
> > > cannot reach an agreement, I propose forbidding users from registering temp
> > > functions with the same name as a built-in function, like MySQL's approach,
> > > for the moment. It won't raise any performance concern, since built-in
> > > functions are all in memory and thus the cost of a name check will be
> > > really trivial.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
> > >
> > > From what I have seen, there are a couple of focal disagreements:
> > >
> > > 1. Resolution order: temp function --> flink built-in function --> catalog
> > > function vs flink built-in function --> temp function --> catalog function.
> > >
> > > 2. "External" built-in functions: how to treat built-in functions in
> > > external systems and how users reference them
> > >
> > > For #1, I agree with Bowen that temp functions need to be at the highest
> > > priority because that's how a user might overwrite a built-in function
> > > without referencing a persistent, overwriting catalog function with a fully
> > > qualified name. Putting built-in functions at the highest priority
> > > eliminates that usage.
> > >
> > > For #2, I saw a general agreement that referencing "external" built-in
> > > functions such as those in Hive needs to be explicit and deterministic,
> > > even though different approaches are proposed. To limit the scope and
> > > simplify the usage, it seems to make sense to me to introduce special
> > > syntax for users to explicitly reference an external built-in function
> > > such as hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax nicely
> > > matching the Catalog API call hive1.getFunction(ObjectPath functionName)
> > > where the database name is absent for built-in functions available in that
> > > catalog hive1. I understand that Bowen's original proposal was trying to
> > > avoid this, but this could turn out to be a clean and simple solution.
> > >
> > > (Timo's modular approach is a great way to "expand" Flink's built-in
> > > function set, which seems orthogonal and complementary to this, and which
> > > could be tackled in future work.)
> > >
> > > I'd be happy to hear further thoughts on the two points.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
> > >
> > > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > > same as Bowen's. But after thinking about it, I'm currently leaning toward
> > > Timo's suggestion.
> > >
> > > The reason is backward compatibility. If we follow Bowen's approach, let's
> > > say we first look for a function in Flink's built-in functions, and then in
> > > Hive's built-ins. For example, `foo` is not supported by Flink, but Hive
> > > has such a built-in function. So the user will get Hive's behavior for
> > > function `foo`. In the next release, Flink realizes this is a very popular
> > > function and adds it to Flink's built-in functions, but with different
> > > behavior from Hive's. So in the next release, the behavior changes.
> > >
> > > With Timo's approach, IIUC the user has to tell the framework explicitly
> > > what kind of built-in functions they would like to use. They can just tell
> > > the framework to abandon Flink's built-in functions and use Hive's instead.
> > > Users can only choose between them, but not use them at the same time. I
> > > think this approach is more predictable.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > Thanks for the feedback. Just a kind reminder that the [Proposal] section
> > > in the google doc was updated; please take a look first and let me know if
> > > you have more questions.
> > >
> > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> > >
> > > Hi Timo,
> > >
> > > Re> 1) We should not have the restriction "hive built-in functions can
> > > only be used when current catalog is hive catalog". Switching a catalog
> > > should only have implications on the cat.db.object resolution but not
> > > functions. It would be quite convenient for users to use Hive built-ins
> > > even if they use a Confluent schema registry or just the in-memory catalog.
> > >
> > > There might be a misunderstanding here.
> > >
> > > First of all, Hive built-in functions are not part of Flink built-in
> > > functions; they are catalog functions. Thus, if the current catalog is not
> > > a HiveCatalog but, say, a schema registry catalog, ambiguous function
> > > references just shouldn't be resolved to a different catalog.
> > >
> > > Second, Hive built-in functions can potentially be referenced across
> > > catalogs, but they don't have a db namespace and we currently just don't
> > > have a SQL syntax for it. It can be enabled when such a SQL syntax is
> > > defined, e.g. "catalog::function", but it's out of scope of this FLIP.
> > >
> > > 2) I would propose to have separate concepts for catalog and built-in
> > > functions. In particular it would be nice to modularize built-in
> > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > > we add more experimental functions in the future or functions for some
> > > special application area (Geo functions, ML functions). A data platform
> > > team might not want to make every built-in function available. Or a
> > > function module like ML functions is in a different Maven module.
> > >
> > > I think this is orthogonal to this FLIP, especially since we don't have the
> > > "external built-in functions" anymore and currently the built-in function
> > > category remains untouched.
> > >
> > > But just to share some thoughts on the proposal, I'm not sure about it:
> > > - I don't know if any other databases handle built-in functions like that.
> > > Maybe you can give some examples? IMHO, built-in functions are system info
> > > and should be deterministic, not depending on loaded libraries. Geo
> > > functions should be either built-in already or just library functions,
> > > and library functions can be adapted to catalog APIs or of some other
> > > syntax to use
> > > - I don't know if all use cases stand, and many can be achieved by other
> > > approaches too. E.g. experimental functions can be taken good care of by
> > > documentation, annotations, etc.
> > > - the proposal basically introduces some concept like a pluggable built-in
> > > function catalog, despite the already existing catalog APIs
> > > - it brings in even more complicated scenarios to the design. E.g. how do
> > > you handle built-in functions in different modules but different names?
> > >
> > > In short, I'm not sure if it really stands and it looks like overkill
> > > to me. I'd rather not go down that route. Related discussion can be on its
> > > own thread.
> > >
> > > 3) Following the suggestion above, we can have a separate discovery
> > > mechanism for built-in functions. Instead of just going through a static
> > > list like in BuiltInFunctionDefinitions, a platform team should be able
> > > to select function modules like
> > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > HiveFunctions) or via service discovery;
> > >
> > > Same as above. I'll leave it to its own thread.
> > >
> > > re > 3) Dawid and I discussed the resolution order again. I agree with
> > > Kurt that we should unify built-in function (external or internal) under a
> > > common layer. However, the resolution order should be:
> > >    1. built-in functions
> > >    2. temporary functions
> > >    3. regular catalog resolution logic
> > > Otherwise a temporary function could cause clashes with Flink's built-in
> > > functions. If you take a look at other vendors, like SQL Server, they
> > > also do not allow to overwrite built-in functions.
> > >
> > > "I agree with Kurt that we should unify built-in function (external or
> > > internal) under a common layer." <- I don't think this is what Kurt means.
> > > Kurt and I are in favor of unifying built-in functions of external systems
> > > and catalog functions. Was that a typo?
> > >
> > > Besides, I'm not sure about the resolution order you proposed. Temporary
> > > functions have the lifespan of a session and are only visible to the
> > > session owner. They are unique to each user, and users create them on
> > > purpose to be the highest priority in order to overwrite system info
> > > (built-in functions in this case).
> > >
> > > In your case, why would users name a temporary function the same as a
> > > built-in function then? Since using that name in an ambiguous function
> > > reference will always be resolved to built-in functions, creating a
> > > same-named temp function would be meaningless in the end.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> > >
> > > Hi Jingsong,
> > >
> > > Re> 1.Hive built-in functions is an intermediate solution. So we should
> > > not introduce interfaces to influence the framework. To make
> > > Flink itself more powerful, we should implement the functions
> > > we need to add.
> > >
> > > Yes, please see the doc.
> > >
> > > Re> 2.Non-flink built-in functions are easy for users to change their
> > > behavior. If we support some flink built-in functions in the
> > > future but act differently from non-flink built-in, this will lead to
> > > changes in user behavior.
> > >
> > > There's no such concept as "external built-in functions" any more.
> > > Built-in functions of external systems will be treated as special catalog
> > > functions.
> > >
> > > Re> Another question is, does this fallback include all
> > > hive built-in functions? As far as I know, some hive functions
> > > have some hacky. If possible, can we start with a white list?
> > > Once we implement some functions to flink built-in, we can
> > > also update the whitelist.
> > >
> > > Yes, that's something we thought of too. I don't think it's super
> > > critical to the scope of this FLIP, thus I'd like to leave it to future
> > > efforts as a nice-to-have feature.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> > >
> > > Hi Kurt,
> > >
> > > Re: > What I want to propose is we can merge #3 and #4, make them both
> > > under "catalog" concept, by extending catalog function to make it have
> > > ability to have built-in catalog functions. Some benefits I can see from
> > > this approach:
> > > 1. We don't have to introduce new concept like external built-in functions.
> > > Actually I don't see a full story about how to treat a built-in functions,
> > > and it seems a little bit disrupt with catalog. As a result, you have to
> > > make some restriction like "hive built-in functions can only be used when
> > > current catalog is hive catalog".
> > >
> > > Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> > > the doc. I've modified those sections, and they are up to date now.
> > >
> > > In short, built-in functions of external systems are now defined as a
> > > special kind of catalog function in Flink, and handled by Flink as
> > > follows:
> > > - An external built-in function must be associated with a catalog for
> > > the purpose of decoupling flink-table and external systems.
> > > - It always resides in front of catalog functions in ambiguous function
> > > reference order, just like in its own external system
> > > - It is a special catalog function that doesn't have a schema/database
> > > namespace
> > > - It goes through the same instantiation logic as other user defined
> > > catalog functions in the external system
> > >
> > > Please take another look at the doc, and let me know if you have more
> > > questions.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
> > >
> > > Hi Kurt,
> > >
> > > it should not affect the functions and operations we currently have in
> > > SQL. It just categorizes the available built-in functions. It is kind of
> > > an orthogonal concept to the catalog API but built-in functions deserve
> > > this special kind of treatment. CatalogFunction still fits perfectly in
> > > there because the regular catalog object resolution logic is not
> > > affected. So tables and functions are resolved in the same way but with
> > > built-in functions that have priority as in the original design.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 03.09.19 15:26, Kurt Young wrote:
> > >
> > > Does this only affect the functions and operations we currently have in
> > > SQL and have no effect on tables, right? Looks like this is an orthogonal
> > > concept to Catalog?
> > > If the answer to both is yes, then the catalog function will be a weird
> > > concept?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> > >
> > > The way you proposed is basically the same as what Calcite does; I think
> > > we are on the same page.
> > >
> > > Best,
> > > Danny Chan
> > > On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > >
> > > This sounds exactly as the module approach I mentioned, no?
> > >
> > > Regards,
> > > Timo
> > >
> > > On 03.09.19 13:42, Danny Chan wrote:
> > >
> > > Thanks Bowen for bringing up this topic. I think it's a useful
> > > refactoring to make our function usage more user friendly.
> > >
> > > For the topic of how to organize the builtin operators and operators
> > > of Hive, here is a solution from Apache Calcite. The Calcite way is to
> > > make every dialect's operators a "Library"; users can specify which
> > > libraries they want to use for a SQL query. The builtin operators always
> > > come as the first-class objects and the others are used in the order they
> > > appear.
> > >
> > > Maybe you can take it as a reference.
> > >
> > > [1]
> > > https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > >
> > > Best,
> > > Danny Chan
> > > On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > >
> > > Hi folks,
> > >
> > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > > It's critically helpful to improve function usability in SQL.
> > >
> > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > >
> > > In short, it:
> > > - adds support for precise function reference with fully/partially
> > > qualified name
> > > - redefines function resolution order for ambiguous function reference
> > > - adds support for Hive's rich built-in functions (support for Hive user
> > > defined functions was already added in 1.9.0)
> > > - clarifies the concept of temporary functions
> > >
> > > Would love to hear your thoughts.
> > >
> > > Bowen
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Fabian Hueske <fh...@gmail.com>.
Hi all,

I'd like to add my opinion on this topic as well ;-)

In general, I think overriding built-in functions with temp functions has a
couple of benefits but also a few challenges:

* Users can reimplement the behavior of the built-in functions of a different
system, e.g., for backward compatibility after a migration.
* I don't think that "accidental" overrides and surprising semantics are an
issue or dangerous. The user registered the temp function in the same
session and should therefore be aware of the changed semantics.
* I see that not all built-in functions can be overridden, like the CAST
example that Dawid gave. However, I think these should be a small fraction
and such functions could be blacklisted. Sure, that's not super consistent,
but should (IMO) not be a big issue in practice.
* Temp functions should be easy to use. Requiring 3-part addressing makes
them a lot less user friendly, IMO. Users need to think about what catalog
and db to choose when registering them. Also, using a temp function in a
query becomes less convenient. Moreover, I agree with Bowen's concerns that
a 3-part addressing scheme makes the function appear less temporary.

Of the three possible solutions, my preference order is
1) 1-part address with override of built-in
2) 1-part address without override of built-in
3) 3-part address
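
A minimal sketch of option 1, assuming a hypothetical in-memory registry.
FunctionRegistry, NON_OVERRIDABLE, register_temp and resolve are illustrative
names, not Flink APIs. It only shows how a 1-part temp function could take
precedence over a built-in while a small blacklist (CAST here) stays protected:

```python
# Hypothetical sketch; none of these names exist in Flink.
NON_OVERRIDABLE = {"cast"}  # assumed blacklist of built-ins that cannot be shadowed


class FunctionRegistry:
    def __init__(self, built_ins):
        self.built_ins = {k.lower(): v for k, v in built_ins.items()}
        self.temp = {}  # session-scoped temp functions, keyed by 1-part name

    def register_temp(self, name, impl):
        key = name.lower()
        if "." in key:
            # Option 1 allows only 1-part names, so db.func is rejected.
            raise ValueError("temp functions take a 1-part name")
        if key in NON_OVERRIDABLE:
            raise ValueError(f"{name} cannot be overridden")
        self.temp[key] = impl

    def resolve(self, name):
        key = name.lower()
        # Temp functions are consulted first, then built-ins.
        if key in self.temp:
            return self.temp[key]
        if key in self.built_ins:
            return self.built_ins[key]
        raise KeyError(name)
```

Option 2 would differ only in rejecting every name already present in
built_ins at registration time, instead of just the blacklisted ones.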

Regarding the issue of external built-in functions, I don't think that
Timo's proposal of modules is fully orthogonal to this discussion.
A Hive function module could be an alternative to offering Hive functions
as part of Hive's catalog.
From a user's point of view, I think that modules would be a "cleaner"
integration ("Why do I need a Hive catalog if all I want to do is apply a
Hive function on a Kafka table?").
However, the module approach clearly has the problem of dealing with
same-named functions in different modules (e.g., a Hive function and a
Flink built-in function).
The catalog approach has the benefit that functions can be addressed like
hiveCat::func (or a similar path).
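
To make the trade-off concrete, here is a rough sketch under two assumptions:
modules are consulted in a fixed, user-chosen order, and a catalog-qualified
reference uses the "::" separator mentioned above. The resolve function and
the data layout are hypothetical and do not reflect an existing Flink API:

```python
def resolve(name, modules, catalogs):
    """Resolve a function reference.

    modules:  ordered list of (module_name, {function_name: impl}) pairs
    catalogs: {catalog_name: {function_name: impl}}
    Returns a (source, impl) tuple.
    """
    if "::" in name:
        # Explicit catalog-qualified reference: no ambiguity possible.
        catalog, func = name.split("::", 1)
        return catalog, catalogs[catalog][func.lower()]
    # Unqualified reference: the first module that defines the name wins,
    # so a same-named function in a later module is silently shadowed.
    for module_name, functions in modules:
        if name.lower() in functions:
            return module_name, functions[name.lower()]
    raise KeyError(name)
```

With modules ordered as core then hive, an unqualified sqrt resolves to the
core module even if hive also defines sqrt, which is exactly the same-name
problem described above; hiveCat::sqrt sidesteps it at the cost of requiring
a catalog.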

I'm not sure what's the best solution here.

Cheers,
Fabian


Am Mo., 9. Sept. 2019 um 06:30 Uhr schrieb Bowen Li <bo...@gmail.com>:

> Hi,
>
> W.r.t. temp functions, I feel both options have their benefits and can
> theoretically achieve similar functionality one way or another. In the
> end, it's more about use cases, user habits, and trade-offs.
>
> Re> Not always users are in full control of the catalog functions. There is
> also the case where different teams manage the catalog & use the catalog.
>
> Temp functions live within a session, not within a catalog. Having
> 3-part paths may imply that temp functions are tied to a catalog in two
> respects.
> 1) it may indicate that each catalog manages its own temp functions, which
> is not true, as we all seem to agree they should reside in a central place,
> either in FunctionCatalog or CatalogManager
> 2) it may indicate there's some access control. When users are forbidden to
> manipulate some objects in a catalog that's managed by other teams, but
> are allowed to manipulate some other objects (temp functions in this case)
> belonging to the catalog in namespaces, users may think we introduced extra
> complexity and confusion with some kind of access control into the problem.
> It doesn't feel intuitive enough for end users.
>
> Thus, I'd be in favor of 1-part paths for temporary functions and other
> temp objects.
>
> Thanks,
> Bowen
>
>
>
> On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > I agree the consequences of the decision are substantial. Let's see what
> > others think.
> >
> > -- Catalog functions are defined by users, and we suppose they can
> > drop/alter it in any way they want. Thus, overwriting a catalog function
> > doesn't seem to be a strong use case that we should be concerned about.
> > Rather, there are known use case for overwriting built-in functions.
> >
> > Users are not always in full control of the catalog functions. There is
> > also the case where different teams manage the catalog & use the catalog.
> > As for overriding built-in functions: with the 3-part approach a user can
> > always use an equally named function from a catalog. E.g. to override
> >
> > *    SELECT explode(arr) FROM ...*
> >
> > user can always write:
> >
> > *    SELECT db.explode(arr) FROM ...*
> >
> > Best,
> >
> > Dawid
> > On 06/09/2019 10:54, Xuefu Z wrote:
> >
> > Hi Dawid,
> >
> > Thank you for your summary. While the only difference in the two proposals
> > is one- or three-part in naming, the consequence would be substantial.
> >
> > To me, there are two major use cases of temporary functions compared to
> > persistent ones:
> > 1. Temporary in nature and auto managed by the session. More often than
> > not, admin doesn't even allow user to create persistent functions.
> > 2. Provide an opportunity to overwrite system built-in functions.
> >
> > Since built-in functions have one-part names, requiring three-part names
> > for temporary functions eliminates the overwriting opportunity.
> >
> > One-part naming essentially puts all temp functions under a single
> > namespace and simplifies function resolution, such as we don't need to
> > consider the case of a temp function and a persistent function with the
> > same name under the same database.
> >
> > I agree having three parts does have its merits, such as consistency with
> > other temporary objects (table) and minor difference between temp vs
> > catalog functions. However, there is a slight difference between tables and
> > functions in that there is no built-in table in SQL so there is no need to
> > overwrite it.
> >
> > I'm not sure if I fully agree with the benefits you listed as the
> > advantages of the three-part naming of temp functions.
> >   -- Allowing overwriting built-in functions is a benefit and the solution
> > for disallowing certain overwriting shouldn't be totally banning it.
> >   -- Catalog functions are defined by users, and we suppose they can
> > drop/alter it in any way they want. Thus, overwriting a catalog function
> > doesn't seem to be a strong use case that we should be concerned about.
> > Rather, there are known use case for overwriting built-in functions.
> >
> > Thus, personally I would prefer one-part names for temporary functions. In
> > the absence of a SQL standard on this, I'd certainly like to get opinions
> > from others to see if a consensus can eventually be reached.
> >
> > (To your point on modular approach to support external built-in functions,
> > we saw the value and are actively looking into it. Thanks for sharing your
> > opinion on that.)
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> >
> > Hi Xuefu,
> >
> > Thank you for your answers.
> >
> > Let me summarize my understanding. In principle we differ only on
> > whether a temporary function is identified by a 1-part or a 3-part
> > name. I can reconfirm that if the community decides it prefers the
> > 1-part approach I will commit to that, with the assumption that we will
> > enforce ONLY 1-part function names. (We will parse the identifier and throw
> > an exception if a user tries to register e.g. db.temp_func).
> >
> > My preference, though, is the 3-part approach:
> >
> >    - there are some functions that it makes no sense to override, e.g.
> >    CAST; moreover, I'm afraid that allowing such overrides will lead to
> >    high inconsistency, similar to what I mentioned Spark has
> >    - you cannot shadow a fully-qualified function. (If a user fully
> >    qualifies his/her objects in a SQL query, which is often considered
> >    good practice)
> >    - it does not differentiate between functions & temporary functions.
> >    Temporary functions differ only with regard to their life-cycle. The
> >    registration & usage is exactly the same.
> >
> > As it can be seen, the proposed concept regarding temp function and
> > function resolution is quite simple.
> >
> > Both approaches are equally simple. I would even say the 3-part approach
> > is slightly simpler as it does not have to care about some special
> > built-in functions such as CAST.
> >
> > I don't want to express my opinion on the differentiation between built-in
> > functions and "external" built-in functions in this thread as it is rather
> > orthogonal, but I also like the modular approach and I definitely don't
> > like the special syntax "cat::function". I think it's better to stick to a
> > standard or at least to proven solutions from other systems.
> >
> > Best,
> >
> > Dawid
> > On 05/09/2019 10:12, Xuefu Z wrote:
> >
> > Hi Dawid,
> >
> > Thanks for sharing your thoughts and your request for clarifications. I
> > believe that I fully understand your proposal, which does have its merits.
> > However, it's different from ours. Here are the answers to your questions:
> >
> > Re #1: yes, the temp functions in the proposal are global and have just
> > one-part names, similar to built-in functions. Two- or three-part names are
> > not allowed.
> >
> > Re #2: not applicable as two- or three-part names are disallowed.
> >
> > Re #3: same as above. Referencing external built-in functions is achieved
> > either implicitly (only the built-in functions in the current catalog are
> > considered) or via special syntax such as cat::function. However, we are
> > looking into the modular approach that Timo suggested, along with other
> > feedback received from the community.
> >
> > Re #4: the resolution order goes like the following in our proposal:
> >
> > 1. temporary functions
> > 2. built-in functions (including those augmented by add-on modules)
> > 3. built-in functions in current catalog (this will not be needed if the
> > special syntax "cat::function" is required)
> > 4. functions in current catalog and db.
> >
> > If we go with the modular approach and make external built-in functions an
> > add-on module, items 2 and 3 above will be combined. In essence, the
> > resolution order is equivalent in the two approaches.
> >
> > By the way, resolution order matters only for simple name references. For
> > names such as db.function (interpreted as current_cat/db/function) or
> > cat.db.function, the reference is unambiguous, so no resolution is needed.
> >
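The resolution order described above can be sketched as a small lookup routine. This is a simplified model for illustration only, not Flink code: the function name `resolve`, the dictionaries, and the example entries are all hypothetical, and the optional step 3 (built-in functions of the current catalog) is left out.

```python
# Illustrative model only; none of these names are Flink APIs.
def resolve(name, temp_funcs, builtin_funcs, catalog_funcs,
            current_cat="cat", current_db="db"):
    """Return what a function reference resolves to, or None if nothing matches."""
    parts = name.split(".")
    if len(parts) == 3:
        # cat.db.function: unambiguous, goes straight to the catalog.
        return catalog_funcs.get(tuple(parts))
    if len(parts) == 2:
        # db.function: interpreted as current_cat/db/function.
        db, func = parts
        return catalog_funcs.get((current_cat, db, func))
    if len(parts) != 1:
        raise ValueError("unsupported function reference: " + name)
    # Simple name: temporary functions first, then built-in functions ...
    for scope in (temp_funcs, builtin_funcs):
        if name in scope:
            return scope[name]
    # ... and finally functions in the current catalog and database.
    return catalog_funcs.get((current_cat, current_db, name))


# Hypothetical example data.
temp_funcs = {"sqrt": "temp sqrt"}
builtin_funcs = {"sqrt": "built-in sqrt", "concat": "built-in concat"}
catalog_funcs = {
    ("cat", "db", "sqrt"): "catalog sqrt",
    ("cat", "db", "my_udf"): "catalog my_udf",
}
```

In this model a simple reference to sqrt hits the temporary function, while db.sqrt and cat.db.sqrt bypass the ordered lookup and reach the catalog function directly.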
> > As it can be seen, the proposed concept regarding temp function and
> > function resolution is quite simple. Additionally, the proposed resolution
> > order allows temp function to shadow a built-in function, which is
> > important (though not decisive) in our opinion.
> >
> > I started liking the modular approach as the resolution order will only
> > include 1, 2, and 4, which is simpler and more generic. That's why I
> > suggested we look more into this direction.
> >
> > Please let me know if there are further questions.
> >
> > Thanks,
> > Xuefu
> >
> >
> >
> >
> > On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> >
> > Hi Xuefu,
> >
> > Just wanted to summarize my opinion on the one topic (temporary functions).
> >
> > My preference would be to make temporary functions always 3-part qualified
> > (as a result, that would prohibit overriding built-in functions). Having
> > said that, if the community decides that it's better to allow overriding
> > built-in functions, I am fine with it and can commit to that decision.
> >
> > I wanted to ask if you could clarify a few points for me around that
> > option.
> >
> >    1. Would you enforce temporary functions to always be just a single
> >    name (without db & cat) as Hive does, or would you also allow 3- or
> >    even 2-part identifiers?
> >    2. Assuming 2/3-part paths: how would you register a function from the
> >    following statement: CREATE TEMPORARY FUNCTION db.func? Would that
> >    shadow all functions named 'func' in all databases named 'db' in all
> >    catalogs? Or would you shadow only function 'func' in database 'db' in
> >    the current catalog?
> >    3. This point is still under discussion, but it was mentioned a few
> >    times that maybe we want to enable the syntax cat.func for "external
> >    built-in functions". How would that affect the statement from the
> >    previous point? Would 'db.func' shadow an "external built-in function"
> >    in the 'db' catalog, or user functions as in point 2? Or maybe both?
> >    4. Lastly, to summarize the previous points: assuming 2/3-part
> >    paths, would the function resolution actually be as follows?
> >       1. temporary functions (1-part path)
> >       2. built-in functions
> >       3. temporary functions (2-part path)
> >       4. 2-part catalog functions a.k.a. "external built-in functions"
> >       (cat + func) - whether we want that is still under discussion in
> >       the other focal point
> >       5. temporary functions (3-part path)
> >       6. 3-part catalog functions a.k.a. user functions
> > I would be really grateful if you could clarify those questions for me,
> > thanks.
> >
> > BTW, Thank you all for a healthy discussion.
> >
> > Best,
> >
> > Dawid
> > On 04/09/2019 23:25, Xuefu Z wrote:
> >
> > Thanks all for sharing your thoughts. I think we have gathered some useful
> > initial feedback from this long discussion, with a couple of focal points
> > sticking out.
> >
> >  We will go back to do more research and adapt our proposal. Once it's
> > ready, we will ask for a new round of review. If there is any disagreement,
> > we will start a new discussion thread on each point rather than having a
> > mega discussion like this.
> >
> > Thanks to everyone for participating.
> >
> > Regards,
> > Xuefu
> >
> >
> > On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
> >
> >
> > Let me try to summarize and conclude the long thread so far:
> >
> > 1. For order of temp function v.s. built-in function:
> >
> > I think Dawid's point that temp functions should have fully qualified paths
> > is better reasoning to back the newly proposed order, and I agree we
> > don't need to follow Hive/Spark.
> >
> > However, I'd rather not change the fundamentals of temporary functions in
> > this FLIP. It belongs to a bigger story of how temporary objects should be
> > redefined and handled uniformly - currently temporary tables and views
> > (those registered from TableEnv#registerTable()) behave differently from
> > what Dawid proposes for temp functions, and we need a FLIP just to unify
> > their APIs and behaviors.
> >
> > I agree that backward compatibility is not an issue w.r.t Jark's points.
> >
> > ***It seems we do have consensus that it's acceptable to prevent users from
> > registering a temp function with the same name as a built-in function. To
> > help us move forward, I'd like to propose setting such a restriction on temp
> > functions in this FLIP to simplify the design and avoid disputes.*** It
> > will also leave room for improvements in the future.
> >
> >
> > 2. For Hive built-in function:
> >
> > Thanks Timo for providing the Presto and Postgres examples. I feel modular
> > built-in functions can be a good fit for the geo and ML examples as a native
> > Flink extension, but I'm not sure they fit well with external integrations.
> > Anyway, I think modular built-in functions are a bigger story and can be
> > discussed on their own thread too, and our proposal doesn't prevent Flink
> > from doing that in the future.
> >
> > ***It seems we have consensus that users should be able to use built-in
> > functions of Hive or other external systems in SQL explicitly and
> > deterministically, regardless of Flink built-in functions and the potential
> > modular built-in functions, via some new syntax like "mycat::func"? If so,
> > I'd like to propose removing Hive built-in functions from the ambiguous
> > function resolution order, and empowering users with such a syntax. This way
> > we sacrifice a little convenience for certainty.***
> >
> >
> > What do you think?
> >
> > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > Hi,
> >
> > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS: I've just
> > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> > very inconsistent in that regard (Spark being way worse).
> >
> > Hive:
> >
> > You cannot overwrite all the built-in functions. I could overwrite most of
> > the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> > functions I cannot overwrite, e.g. CAST, ARRAY. I get:
> >
> >
> > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> >
> > What is interesting is that I cannot overwrite *array*, but I can overwrite
> > *map* or *struct*. Hive does behave reasonably well, though, if I manage to
> > overwrite a function. When I drop the temporary function the native
> > function is still available.
> >
> > Spark:
> >
> > Spark's behavior imho is super bad.
> >
> > Theoretically I could overwrite all functions. I was able, e.g., to
> > overwrite the CAST function, though I had to use the CREATE OR REPLACE
> > TEMPORARY FUNCTION syntax; otherwise I get an exception that a function
> > already exists. However, when I used the CAST function in a query it used
> > the native, built-in one.
> >
> > When I overwrote the current_date() function, it was used in a query, but it
> > completely replaced the built-in function and I could no longer use the
> > native function in any way. I also cannot drop the temporary function. I
> > get:
> >
> > *    Error in query: Cannot drop native function 'current_date';*
> >
> > As an additional note, neither system allows creating TEMPORARY FUNCTIONS
> > with a database. Temporary functions are always represented as a single
> > name.
> >
> > In my opinion neither of the systems has consistent behavior. Generally
> > speaking, I think overwriting any system-provided function is just
> > dangerous.
> >
> > Regarding Jark's concerns: such functions would be registered in the current
> > catalog/database schema, so a user could still use their own function, but
> > would have to fully qualify it (because built-in functions take
> > precedence). Moreover, users would have the same problem with permanent
> > functions. Imagine a user has a permanent function 'cat.db.explode'. In
> > 1.9 the user could use just the 'explode' function as long as 'cat' &
> > 'db' were the default catalog & database. If we introduce an 'explode'
> > built-in function in 1.10, the user has to fully qualify the function.
> >
> > Best,
> >
> > Dawid
> > On 04/09/2019 15:19, Timo Walther wrote:
> >
> > Hi all,
> >
> > thanks for the healthy discussion. It is already a very long discussion
> > with a lot of text. So I will just post my opinion to a couple of
> > statements:
> >
> >
> > Hive built-in functions are not part of Flink built-in functions, they
> > are catalog functions
> >
> > That is not entirely true. Correct me if I'm wrong but I think Hive
> > built-in functions are also not catalog functions. They are not stored in
> > every Hive metastore catalog that is freshly created but are a set of
> > functions that are listed somewhere and made available.
> >
> >
> > ambiguous functions reference just shouldn't be resolved to a different
> > catalog
> >
> > I agree. They should not be resolved to a different catalog. That's why I
> > am suggesting to split the concept of built-in functions and catalog lookup
> > semantics.
> >
> >
> > I don't know if any other databases handle built-in functions like that
> >
> > What I called "module" is:
> > - Extension in Postgres [1]
> > - Plugin in Presto [2]
> >
> > Btw. Presto even mentions example modules that are similar to the ones
> > that we will introduce in the near future both for ML and System XYZ
> > compatibility:
> > "See either the presto-ml module for machine learning functions or the
> > presto-teradata-functions module for Teradata-compatible functions, both in
> > the root of the Presto source."
> >
> >
> > functions should be either built-in already or just libraries functions,
> > and library functions can be adapted to catalog APIs or of some other
> > syntax to use
> >
> > Regarding "built-in already": of course we can add a lot of functions as
> > built-ins, but we will end up in dependency hell in the near future if we
> > don't introduce a pluggable approach. Library functions are what you also
> > suggest, but storing them in a catalog means either always fully qualifying
> > them or modifying the existing catalog design that was inspired by the
> > standard.
> >
> > I don't think "it brings in even more complicated scenarios to the
> > design"; it just provides a clear separation of concerns. Integrating the
> > functionality into the current design makes the catalog API more
> > complicated.
> >
> >
> > why would users name a temporary function the same as a built-in
> > function then?
> >
> > Because you never know what users do. If they don't, my suggested
> > resolution order should not be a problem, right?
> >
> >
> > I don't think hive functions deserves be a function module
> >
> > Our goal is not to create a Hive clone. We need to think forward and Hive
> > is just one of many systems that we can support. Not every built-in
> > function behaves and will behave exactly like Hive.
> >
> >
> > regarding temporary functions, there are few systems that support it
> >
> > IMHO Spark and Hive are not always the best examples for consistent
> > design. Systems like Postgres, Presto, or SQL Server should be used as a
> > reference. I don't think that a user can overwrite a built-in function
> > there.
> >
> > Regards,
> > Timo
> >
> > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > [2] https://prestodb.github.io/docs/current/develop/functions.html
> >
> >
> > On 04.09.19 13:44, Jark Wu wrote:
> >
> > Hi all,
> >
> > Regarding #1 temp function <> built-in function and naming.
> > I'm fine with temp functions preceding built-in functions and being able to
> > override them (we already support overriding built-in functions in 1.9).
> > If we don't allow the same name as a built-in function, I'm afraid we will
> > have compatibility issues in the future.
> > Say users register a user-defined function named "explode" in 1.9, and we
> > support a built-in "explode" function in 1.10.
> > Then the users' jobs which call the registered "explode" function in 1.9
> > will all fail in 1.10 because of the naming conflict.
> >
> > Regarding #2 "External" built-in functions.
> > I think if we store external built-in functions in catalog, then
> > "hive1::sqrt" is a good way to go.
> > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > built-in functions as Timo suggested above.
> > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > function set as built-in functions in an easy way.
> >
> > Best,
> > Jark
> >
> > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
> >
> > Hi Dawid,
> >
> > Thank you for sharing your findings. It seems to me that there is no SQL
> > standard regarding temporary functions. There are few systems that support
> > them. Here is what I have found:
> >
> > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > 2. Spark: basically follows Hive (
> > https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > )
> > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > behavior. (
> > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> > )
> >
> > Because of the lack of a standard, it's perfectly fine for Flink to define
> > whatever it sees as appropriate. Thus, your proposal (no overwriting and
> > must have DB as holder) is one option. The advantage is simplicity. The
> > downside is the deviation from Hive, which is popular and the de facto
> > standard in the big data world.
> >
> > However, I don't think we have to follow Hive. More importantly, we need a
> > consensus. I have no objection if your proposal is generally agreed upon.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > Hi all,
> >
> > Just an opinion on the built-in <> temporary functions resolution and
> > NAMING issue. I think we should not allow overriding the built-in
> > functions, as this may pose serious issues and to be honest is rather
> > not feasible and would require major rework. What happens if a user
> > wants to override CAST? Calls to that function are generated at
> > different layers of the stack that unfortunately do not always go
> > through the Catalog API (at least not yet). Moreover, from what I've checked
> > no other systems allow overriding the built-in functions. All the
> > systems I've checked so far register temporary functions in a
> > database/schema (either a special database for temporary functions, or
> > just the current database). What I would suggest is to always register
> > temporary functions with a 3-part identifier, the same way as tables,
> > views etc. This effectively means you cannot override built-in
> > functions. With such an approach it is natural that the temporary functions
> > end up a step lower in the resolution order:
> >
> > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> >
> > 2. temporary functions (always 3 part path)
> >
> > 3. catalog functions (always 3 part path)
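As a rough illustration of the order listed above (built-in functions first, then temporary and catalog functions, both under 3-part paths), here is a minimal sketch. It is a toy model and not Flink code: `resolve_builtin_first`, the dictionaries, and the example entries are hypothetical, and 2-part references are rejected because the thread leaves them undecided. The release labels reuse the 1.9/1.10 "explode" scenario discussed elsewhere in this thread.

```python
# Toy model of a built-in-first resolution order; not a Flink API.
def resolve_builtin_first(name, builtin_funcs, temp_funcs, catalog_funcs,
                          current_cat, current_db):
    parts = tuple(name.split("."))
    if len(parts) == 1:
        # 1. Built-in functions win for simple names.
        if name in builtin_funcs:
            return builtin_funcs[name]
        # Otherwise expand the simple name against the current catalog/database.
        path = (current_cat, current_db, name)
    elif len(parts) == 3:
        path = parts
    else:
        raise ValueError("2-part references are left undecided in this sketch")
    # 2. Temporary functions, always stored under a 3-part path.
    if path in temp_funcs:
        return temp_funcs[path]
    # 3. Catalog functions, also under a 3-part path.
    return catalog_funcs.get(path)


# Hypothetical data: a user's temporary function named "explode".
temp_funcs = {("cat", "db", "explode"): "user temp explode"}
catalog_funcs = {}
builtins_1_9 = {"cast": "built-in cast"}
builtins_1_10 = {"cast": "built-in cast", "explode": "built-in explode"}
```

With `builtins_1_9`, the simple name explode reaches the user's temporary function; with `builtins_1_10` the same simple name resolves to the built-in, and only the fully qualified cat.db.explode still reaches the user's function.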
> >
> > Let me know what you think.
> >
> > Best,
> >
> > Dawid
> >
> > On 04/09/2019 06:13, Bowen Li wrote:
> >
> > Hi,
> >
> > I agree with Xuefu that the controversy is mainly in those two places.
> > My thoughts on them:
> >
> > 1) Determinism of referencing Hive built-in functions. We can either remove
> > Hive built-in functions from ambiguous function resolution and require
> > users to use special syntax for their qualified names, or add a config flag
> > to the catalog constructor/yaml for turning Hive built-in functions on and
> > off, with the flag set to 'false' by default and proper docs added to help
> > users make their decisions.
> >
> > 2) Flink temp functions vs. Flink built-in functions in the ambiguous
> > function resolution order. We believe Flink temp functions should precede
> > Flink built-in functions, and I have presented my reasons. In case we
> > cannot reach an agreement, I propose forbidding users from registering temp
> > functions with the same name as a built-in function, like MySQL's approach,
> > for the moment. It won't raise any performance concern, since built-in
> > functions are all in memory and thus the cost of a name check will be really
> > trivial.
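The name check proposed above amounts to an in-memory set lookup at registration time. A minimal sketch follows, assuming case-insensitive function names (an assumption of this sketch, not something stated in the thread); the class `TempFunctionRegistry` and its methods are hypothetical and do not correspond to any Flink API.

```python
# Hypothetical registry that rejects temp functions clashing with built-ins.
class TempFunctionRegistry:
    def __init__(self, builtin_names):
        # Built-in names live in memory, so the check is a cheap set lookup.
        # Lower-casing models case-insensitive names (an assumption here).
        self._builtin_names = {n.lower() for n in builtin_names}
        self._temp_funcs = {}

    def register(self, name, func):
        key = name.lower()
        if key in self._builtin_names:
            raise ValueError(
                "temporary function '%s' clashes with a built-in function" % name)
        self._temp_funcs[key] = func

    def lookup(self, name):
        return self._temp_funcs.get(name.lower())


registry = TempFunctionRegistry(["CAST", "CONCAT_WS", "MD5"])
registry.register("my_func", lambda x: x)
```

Registering "cast" or "CAST" on this registry raises a ValueError, while "my_func" is accepted.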
> >
> >
> > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
> >
> >  From what I have seen, there are a couple of focal disagreements:
> >
> > 1. Resolution order: temp function --> flink built-in function --> catalog
> > function vs flink built-in function --> temp function --> catalog function.
> >
> > 2. "External" built-in functions: how to treat built-in functions in
> > external systems and how users reference them
> >
> > For #1, I agree with Bowen that temp functions need to be at the highest
> > priority because that's how a user might overwrite a built-in function
> > without referencing a persistent, overwriting catalog function with a fully
> > qualified name. Putting built-in functions at the highest priority
> > eliminates that usage.
> >
> > For #2, I saw a general agreement that referencing "external" built-in
> > functions such as those in Hive needs to be explicit and deterministic, even
> > though different approaches are proposed. To limit the scope and simplify
> > the usage, it seems to make sense to me to introduce special syntax for
> > users to explicitly reference an external built-in function, such as
> > hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that nicely
> > matches the Catalog API call hive1.getFunction(ObjectPath functionName),
> > where the database name is absent for built-in functions available in that
> > catalog hive1. I understand that Bowen's original proposal was trying to
> > avoid this, but this could turn out to be a clean and simple solution.
> >
> > (Timo's modular approach is a great way to "expand" Flink's built-in
> > function set, which seems orthogonal and complementary to this, and could
> > be tackled in future work.)
> >
> > I'd be happy to hear further thoughts on the two points.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
> >
> > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > same as Bowen's. But after thinking about it, I'm currently leaning toward
> > Timo's suggestion.
> >
> > The reason is backward compatibility. If we follow Bowen's approach, let's
> > say we first look for a function in Flink's built-in functions, and then in
> > Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
> > such a built-in function. So the user will get Hive's behavior for function
> > `foo`. In the next release, Flink realizes this is a very popular function
> > and adds it to Flink's built-in functions, but with different behavior from
> > Hive's. So in the next release, the behavior changes.
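The backward-compatibility concern above can be made concrete with a small model of a Flink-then-Hive fallback lookup. Everything here is hypothetical: `foo` is the placeholder name used in the message, the two lambda bodies are invented solely to give the two implementations different results, and `resolve_with_fallback` is not a Flink API.

```python
# Toy model: look in Flink's built-ins first, then fall back to Hive's.
def resolve_with_fallback(name, flink_builtins, hive_builtins):
    if name in flink_builtins:
        return "flink", flink_builtins[name]
    if name in hive_builtins:
        return "hive", hive_builtins[name]
    raise KeyError(name)


# Invented behaviors, chosen only so that the two versions disagree.
hive_builtins = {"foo": lambda x: x + 1}
flink_release_n = {}                               # foo not yet in Flink
flink_release_n_plus_1 = {"foo": lambda x: x * 2}  # foo added, different semantics

source_before, foo_before = resolve_with_fallback("foo", flink_release_n, hive_builtins)
source_after, foo_after = resolve_with_fallback("foo", flink_release_n_plus_1, hive_builtins)
```

The same query text calls Hive's `foo` in the first release and Flink's `foo` in the next, so its result changes without the user changing anything.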
> >
> > With Timo's approach, IIUC, users have to tell the framework explicitly what
> > kind of built-in functions they would like to use. They can just tell the
> > framework to abandon Flink's built-in functions and use Hive's instead.
> > Users can only choose between them, but not use them at the same time. I
> > think this approach is more predictable.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Thanks for the feedback. Just a kind reminder that the [Proposal] section
> > in the google doc was updated; please take a look first and let me know if
> > you have more questions.
> >
> > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> >
> > Hi Timo,
> >
> > Re> 1) We should not have the restriction "hive built-in functions can only
> > be used when current catalog is hive catalog". Switching a catalog
> > should only have implications on the cat.db.object resolution but not
> > functions. It would be quite convenient for users to use Hive built-ins
> > even if they use a Confluent schema registry or just the in-memory catalog.
> >
> > There might be a misunderstanding here.
> >
> > First of all, Hive built-in functions are not part of Flink built-in
> > functions; they are catalog functions. Thus, if the current catalog is not a
> > HiveCatalog but, say, a schema registry catalog, an ambiguous function
> > reference just shouldn't be resolved to a different catalog.
> >
> > Second, Hive built-in functions can potentially be referenced across
> > catalogs, but they don't have a db namespace and we currently just don't
> > have a SQL syntax for it. It can be enabled when such a SQL syntax is
> > defined, e.g. "catalog::function", but that's out of scope of this FLIP.
> >
> > 2) I would propose to have separate concepts for catalog and built-in
> > functions. In particular it would be nice to modularize built-in
> > functions. Some built-in functions are very crucial (like AS, CAST,
> > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > we add more experimental functions in the future or function for some
> > special application area (Geo functions, ML functions). A data platform
> > team might not want to make every built-in function available. Or a
> > function module like ML functions is in a different Maven module.
> >
> > I think this is orthogonal to this FLIP, especially since we don't have the
> > "external built-in functions" anymore and currently the built-in function
> > category remains untouched.
> >
> > But just to share some thoughts on the proposal, I'm not sure about it:
> >
> > - I don't know if any other databases handle built-in functions like that.
> > Maybe you can give some examples? IMHO, built-in functions are system info
> > and should be deterministic, not depending on loaded libraries. Geo
> > functions should be either built-in already or just library functions,
> > and library functions can be adapted to catalog APIs or some other
> > syntax to use
> > - I don't know if all the use cases stand, and many can be achieved by other
> > approaches too. E.g. experimental functions can be taken good care of by
> > documentation, annotations, etc
> > - the proposal basically introduces some concept like a pluggable built-in
> > function catalog, despite the already existing catalog APIs
> > - it brings in even more complicated scenarios to the design. E.g. how do
> > you handle built-in functions in different modules but different names?
> >
> > In short, I'm not sure if it really stands and it looks like overkill
> > to me. I'd rather not go down that route. Related discussion can be on its
> > own thread.
> >
> > 3) Following the suggestion above, we can have a separate discovery
> > mechanism for built-in functions. Instead of just going through a static
> > list like in BuiltInFunctionDefinitions, a platform team should be able
> > to select function modules like
> > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > HiveFunctions) or via service discovery;
> >
> > Same as above. I'll leave it to its own thread.
> >
> > re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> > that we should unify built-in function (external or internal) under a
> > common layer. However, the resolution order should be:
> >    1. built-in functions
> >    2. temporary functions
> >    3. regular catalog resolution logic
> > Otherwise a temporary function could cause clashes with Flink's built-in
> > functions. If you take a look at other vendors, like SQL Server, they
> > also do not allow to overwrite built-in functions.
> >
> > "I agree with Kurt that we should unify built-in function (external or
> > internal) under a common layer." <- I don't think this is what Kurt means.
> > Kurt and I are in favor of unifying built-in functions of external systems
> > and catalog functions. Was that a typo?
> >
> > Besides, I'm not sure about the resolution order you proposed. Temporary
> > functions have a lifespan of a session and are only visible to the
> > session owner; they are unique to each user, and users create them on
> > purpose to be the highest priority in order to overwrite system info
> > (built-in functions in this case).
> >
> > In your case, why would users name a temporary function the same as a
> > built-in function then? Since using that name in an ambiguous function
> > reference will always be resolved to the built-in function, creating a
> > same-named temp function would be meaningless in the end.
> >
> >
> > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> >
> > Hi Jingsong,
> >
> > Re> 1.Hive built-in functions is an intermediate solution. So we should
> > not introduce interfaces to influence the framework. To make
> > Flink itself more powerful, we should implement the functions
> > we need to add.
> >
> > Yes, please see the doc.
> >
> > Re> 2.Non-flink built-in functions are easy for users to change their
> > behavior. If we support some flink built-in functions in the
> > future but act differently from non-flink built-in, this will lead to
> > changes in user behavior.
> >
> > There's no such concept as "external built-in functions" any more.
> > Built-in functions of external systems will be treated as special catalog
> > functions.
> >
> > Re> Another question is, does this fallback include all
> >
> > hive built-in functions? As far as I know, some hive functions
> > have some hacky. If possible, can we start with a white list?
> > Once we implement some functions to flink built-in, we can
> > also update the whitelist.
> >
> > Yes, that's something we thought of too. I don't think it's super
> > critical to the scope of this FLIP, thus I'd like to leave it to
> >
> > future
> >
> > efforts as a nice-to-have feature.
> >
> >
> > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> >
> > Hi Kurt,
> >
> > Re: > What I want to propose is we can merge #3 and #4, make them both
> > under "catalog" concept, by extending catalog function to make it have
> > ability to have built-in catalog functions. Some benefits I can see from
> > this approach:
> > 1. We don't have to introduce new concept like external built-in
> > functions. Actually I don't see a full story about how to treat such
> > built-in functions, and it seems to conflict a little with the catalog
> > concept. As a result, you have to make some restriction like "hive
> > built-in functions can only be used when current catalog is hive catalog".
> >
> > Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> > the doc. I've modified those sections, and they are up to date now.
> >
> > In short, built-in functions of external systems are now defined as a
> > special kind of catalog function in Flink, and handled by Flink as
> > follows:
> > - An external built-in function must be associated with a catalog for
> > the purpose of decoupling flink-table and external systems.
> > - It always resides in front of catalog functions in ambiguous function
> > reference order, just like in its own external system
> > - It is a special catalog function that doesn’t have a schema/database
> > namespace
> > - It goes thru the same instantiation logic as other user defined
> > catalog functions in the external system
> >
> > Please take another look at the doc, and let me know if you have more
> > questions.
> >
> >
> > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
> >
> > Hi Kurt,
> >
> > it should not affect the functions and operations we currently have in
> > SQL. It just categorizes the available built-in functions. It is kind of
> > an orthogonal concept to the catalog API but built-in functions deserve
> > this special kind of treatment. CatalogFunction still fits perfectly in
> > there because the regular catalog object resolution logic is not
> > affected. So tables and functions are resolved in the same way but with
> > built-in functions that have priority as in the original design.
> >
> > Regards,
> > Timo
> >
> >
> > On 03.09.19 15:26, Kurt Young wrote:
> >
> > Does this only affect the functions and operations we currently have in
> > SQL, with no effect on tables? It looks like this is a concept orthogonal
> > to Catalog?
> > If the answer to both is yes, then the catalog function will be a weird
> > concept?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> >
> > The way you proposed is basically the same as what Calcite does; I think
> > we are on the same page.
> >
> > Best,
> > Danny Chan
> > On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> >
> > This sounds exactly like the module approach I mentioned, no?
> >
> > Regards,
> > Timo
> >
> > On 03.09.19 13:42, Danny Chan wrote:
> >
> > Thanks Bowen for bringing up this topic, I think it’s a useful
> > refactoring to make our function usage more user friendly.
> >
> > For the topic of how to organize the builtin operators and operators
> > of Hive, here is a solution from Apache Calcite. The Calcite way is to
> > make every dialect's operators a “Library”; users can specify which
> > libraries they want to use for a SQL query. The builtin operators always
> > come as the first-class objects and the others are used in the order
> > they appear.
> >
> > Maybe you can take it as a reference.
> >
> > [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >
> > Best,
> > Danny Chan
> > On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> >
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > It's critically helpful to improve function usability in SQL.
> >
> > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with fully/partially
> > qualified name
> > - redefines function resolution order for ambiguous function reference
> > - adds support for Hive's rich built-in functions (support for Hive user
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi,

W.r.t temp functions, I feel both options have their benefits and can
theoretically achieve similar functionality one way or another. In the
end, it's more about use cases, user habits, and trade-offs.

Re> Not always users are in full control of the catalog functions. There is
also the case where different teams manage the catalog & use the catalog.

Temp functions live within a session, and not within a catalog. Having
3-part paths may imply that temp functions are tied to a catalog in two
aspects.
1) it may indicate each catalog manages its own temp functions, which is not
true as we all seem to agree they should reside at a central place, either in
FunctionCatalog or CatalogManager
2) it may indicate there's some access control. When users are forbidden to
manipulate some objects in the catalog that's managed by other teams, but
are allowed to manipulate some other objects (temp functions in this case)
belonging to the catalog in namespaces, users may think we introduced extra
complexity and confusion with some kind of access control into the problem.
It doesn't feel intuitive enough for end users.

Thus, I'd be in favor of 1-part paths for temporary functions and other
temp objects.
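
To make the 1-part option a bit more concrete, here is a rough Python sketch
of how a simple-name reference could be resolved when temp functions live in
a single session-wide namespace. It is only an illustration under
assumptions: the names FunctionResolver, register_temp, drop_temp and resolve
are made up for this mail and are not Flink APIs, and the order used (temp
functions, then built-in functions, then functions in the current catalog and
db) is the one discussed in this thread, not a final decision.

```python
class FunctionResolver:
    """Toy model of function resolution; hypothetical, not a Flink API."""

    def __init__(self, builtins, catalog_functions, current_catalog, current_db):
        self.temp = {}                                    # 1-part name -> function
        self.builtins = dict(builtins)                    # 1-part name -> function
        self.catalog_functions = dict(catalog_functions)  # (cat, db, name) -> function
        self.current_catalog = current_catalog
        self.current_db = current_db

    def register_temp(self, name, function):
        # Temp functions are session-scoped and only accept a 1-part name.
        if "." in name:
            raise ValueError("temporary functions must use a 1-part name: " + name)
        self.temp[name] = function

    def drop_temp(self, name):
        # Dropping a temp function makes any shadowed function visible again.
        self.temp.pop(name, None)

    def resolve(self, reference):
        parts = reference.split(".")
        if len(parts) == 3:
            # cat.db.func is unambiguous, no resolution order involved.
            return self.catalog_functions[tuple(parts)]
        if len(parts) == 2:
            # db.func is interpreted as current_cat.db.func.
            return self.catalog_functions[(self.current_catalog, *parts)]
        name = parts[0]
        if name in self.temp:        # 1. temporary functions
            return self.temp[name]
        if name in self.builtins:    # 2. built-in functions
            return self.builtins[name]
        # 3. functions in the current catalog and db
        return self.catalog_functions[(self.current_catalog, self.current_db, name)]
```

With this shape, registering a temp function "explode" shadows the built-in
"explode" for simple-name references only; "db.explode" and "cat.db.explode"
still reach the catalog function, and dropping the temp function brings the
built-in one back.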

Thanks,
Bowen



On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> I agree the consequences of the decision are substantial. Let's see what
> others think.
>
> -- Catalog functions are defined by users, and we suppose they can
> drop/alter it in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use case for overwriting built-in functions.
>
> Not always users are in full control of the catalog functions. There is
> also the case where different teams manage the catalog & use the catalog.
> As for overriding built-in functions with 3-part approach user can always
> use an equally named function from a catalog. E.g. to override
>
> *    SELECT explode(arr) FROM ...*
>
> user can always write:
>
> *    SELECT db.explode(arr) FROM ...*
>
> Best,
>
> Dawid
> On 06/09/2019 10:54, Xuefu Z wrote:
>
> Hi Dawid,
>
> Thank you for your summary. While the only difference in the two proposals
> is one- or three-part in naming, the consequence would be substantial.
>
> To me, there are two major use cases of temporary functions compared to
> persistent ones:
> 1. Temporary in nature and auto managed by the session. More often than
> not, admin doesn't even allow user to create persistent functions.
> 2. Provide an opportunity to overwrite system built-in functions.
>
> Since built-in functions have one-part names, requiring three-part names for
> temporary functions eliminates the overwriting opportunity.
>
> One-part naming essentially puts all temp functions under a single
> namespace and simplifies function resolution, such as we don't need to
> consider the case of a temp function and a persistent function with the
> same name under the same database.
>
> I agree having three parts does have its merits, such as consistency with
> other temporary objects (table) and minor difference between temp vs
> catalog functions. However, there is a slight difference between tables and
> functions in that there is no built-in table in SQL so there is no need to
> overwrite it.
>
> I'm not sure if I fully agree with the benefits you listed as the advantages
> of the three-part naming of temp functions.
>   -- Allowing overwriting built-in functions is a benefit and the solution
> for disallowing certain overwriting shouldn't be totally banning it.
>   -- Catalog functions are defined by users, and we suppose they can
> drop/alter it in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use case for overwriting built-in functions.
>
> Thus, personally I would prefer one-part names for temporary functions. In
> the absence of a SQL standard on this, I'd certainly like to get opinions
> from others to see if a consensus can eventually be reached.
>
> (To your point on modular approach to support external built-in functions,
> we saw the value and are actively looking into it. Thanks for sharing your
> opinion on that.)
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
>
> Hi Xuefu,
>
> Thank you for your answers.
>
> Let me summarize my understanding. In principle we differ only on whether a
> temporary function should be identified by a 1-part or a 3-part name. I can
> reconfirm that if the community decides it prefers the
> 1-part approach I will commit to that, with the assumption that we will
> force ONLY 1-part function names. (We will parse the identifier and throw an
> exception if a user tries to register e.g. db.temp_func).
>
> My preference, though, is the 3-part approach:
>
>    - there are some functions that it makes no sense to override, e.g.
>    CAST, moreover I'm afraid that allowing overriding such will lead to high
>    inconsistency, similar to those that I mentioned spark has
>    - you cannot shadow a fully-qualified function. (If a user fully
>    qualifies his/her objects in a SQL query, which is often considered a good
>    practice)
>    - it does not differentiate between functions & temporary functions.
>    Temporary functions just differ with regards to their life-cycle. The
>    registration & usage is exactly the same.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple.
>
> Both approaches are equally simple. I would even say the 3-part approach
> is slightly simpler as it does not have to care about some special built-in
> functions such as CAST.
>
> I don't want to express my opinion on the differentiation between built-in
> functions and "external" built-in functions in this thread as it is rather
> orthogonal, but I also like the modular approach and I definitely don't
> like the special syntax "cat::function". I think it's better to stick to a
> standard or at least other proved solutions from other systems.
>
> Best,
>
> Dawid
> On 05/09/2019 10:12, Xuefu Z wrote:
>
> Hi Dawid,
>
> Thanks for sharing your thoughts and requesting clarifications. I believe
> that I fully understand your proposal, which does have its merits. However,
> it's different from ours. Here are the answers to your questions:
>
> Re #1: yes, the temp functions in the proposal are global and have just
> one-part names, similar to built-in functions. Two- or three-part names are
> not allowed.
>
> Re #2: not applicable as two- or three-part names are disallowed.
>
> Re #3: same as above. Referencing external built-in functions is achieved
> either implicitly (only the built-in functions in the current catalogs are
> considered) or via special syntax such as cat::function. However, we are
> looking into the modular approach that Timo suggested along with other
> feedback received from the community.
>
> Re #4: the resolution order goes like the following in our proposal:
>
> 1. temporary functions
> 2. built-in functions (including those augmented by add-on modules)
> 3. built-in functions in current catalog (this will not be needed if the
> special syntax "cat::function" is required)
> 4. functions in current catalog and db.
>
> If we go with the modular approach and make external built-in functions as
> an add-on module, the 2 and 3 above will be combined. In essence, the
> resolution order is equivalent in the two approaches.
>
> By the way, resolution order matters only for simple name reference. For
> names such as db.function (interpreted as current_cat/db/function) or
> cat.db.function, the reference is unambiguous, so no resolution is needed.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple. Additionally, the proposed resolution
> order allows temp function to shadow a built-in function, which is
> important (though not decisive) in our opinion.
>
> I started liking the modular approach as the resolution order will only
> include 1, 2, and 4, which is simpler and more generic. That's why I
> suggested we look more into this direction.
>
> Please let me know if there are further questions.
>
> Thanks,
> Xuefu
>
>
>
>
> On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
>
> Hi Xuefu,
>
> Just wanted to summarize my opinion on the one topic (temporary functions).
>
> My preference would be to make temporary functions always 3-part qualified
> (as a result that would prohibit overriding built-in functions). Having
> said that if the community decides that it's better to allow overriding
> built-in functions I am fine with it and can commit to that decision.
>
> I wanted to ask if you could clarify a few points for me around that
> option.
>
>    1. Would you enforce temporary functions to be always just a single
>    name (without db & cat) as hive does, or would you allow also 3 or even 2
>    part identifiers?
>    2. Assuming 2/3-part paths. How would you register a function from a
>    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>    all functions named 'func' in all databases named 'db' in all catalogs? Or
>    would you shadow only function 'func' in database 'db' in current catalog?
>    3. This point is still under discussion, but was mentioned a few
>    times, that maybe we want to enable syntax cat.func for "external built-in
>    functions". How would that affect statement from previous point? Would
>    'db.func' shadow "external built-in function" in 'db' catalog or user
>    functions as in point 2? Or maybe both?
>    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>    paths. Would the function resolution be actually as follows?:
>       1. temporary functions (1-part path)
>       2. built-in functions
>       3. temporary functions (2-part path)
>       4. 2-part catalog functions a.k.a. "external built-in functions"
>       (cat + func) - this is still under discussion, if we want that in the other
>       focal point
>       5. temporary functions (3-part path)
>       6. 3-part catalog functions a.k.a. user functions
>
> I would be really grateful if you could clarify those points for me, thanks.
>
> BTW, Thank you all for a healthy discussion.
>
> Best,
>
> Dawid
> On 04/09/2019 23:25, Xuefu Z wrote:
>
> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>
>
> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp function should be of fully qualified path
> is a better reasoning to back the newly proposed order, and I agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change fundamentals of temporary functions in this
> FLIP. It belongs to a bigger story of how temporary objects should be
> redefined and be handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave differently from what
> Dawid proposes for temp functions, and we need a FLIP to just unify their
> APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t Jark's points.
>
> ***Seems we do have consensus that it's acceptable to prevent users from
> registering a temp function with the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restraint on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave room for improvements in the future.
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ml example as a native
> Flink extension, but not sure if it fits well with external integrations.
> Anyway, I think modular built-in functions is a bigger story and can be on
> its own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from ambiguous
> function resolution order, and empower users with such a syntax. This way
> we sacrifice a little convenience for certainty***
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org> wrote:
>
>
> Hi,
>
> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> very inconsistent in that manner (spark being way worse on that).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most of
> the functions I tried e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite e.g. CAST, ARRAY. I get:
>
>
> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*, but I can overwrite
> *map* or *struct*. Hive does behave reasonably well, though, if I manage to
> overwrite a function. When I drop the temporary function the native
> function is still available.
>
> Spark:
>
> Spark's behavior imho is super bad.
>
> Theoretically I could overwrite all functions. I was able e.g. to
> overwrite the CAST function. I had to use the CREATE OR REPLACE TEMPORARY
> FUNCTION syntax, though. Otherwise I get an exception that a function already
> exists. However when I used the CAST function in a query it used the
> native, built-in one.
>
> When I overwrote the current_date() function, it was used in a query, but it
> completely replaces the built-in function and I can no longer use the
> native function in any way. I also cannot drop the temporary function. I
> get:
>
> *    Error in query: Cannot drop native function 'current_date';*
>
> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
> with a database. Temporary functions are always represented as a single
> name.
>
> In my opinion neither of the systems have consistent behavior. Generally
> speaking I think overwriting any system provided functions is just
> dangerous.
>
> Regarding Jark's concerns. Such functions would be registered in a current
> catalog/database schema, so a user could still use their own function, but
> would have to fully qualify the function (because built-in functions take
> precedence). Moreover users would have the same problem with permanent
> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> 1.9 the user could use just the 'explode' function as long as the 'cat' &
> 'db' were the default catalog & database. If we introduce 'explode'
> built-in function in 1.10, the user has to fully qualify the function.
>
> Best,
>
> Dawid
> On 04/09/2019 15:19, Timo Walther wrote:
>
> Hi all,
>
> thanks for the healthy discussion. It is already a very long discussion
> with a lot of text. So I will just post my opinion to a couple of
> statements:
>
>
> Hive built-in functions are not part of Flink built-in functions, they
> are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored in
> every Hive metastore catalog that is freshly created but are a set of
> functions that are listed somewhere and made available.
>
>
> ambiguous functions reference just shouldn't be resolved to a different
> catalog
>
> I agree. They should not be resolved to a different catalog. That's why I
> am suggesting to split the concept of built-in functions and catalog lookup
> semantics.
>
>
> I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions, both in
> the root of the Presto source."
>
>
> functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions as
> built-ins but we will end up in a dependency hell in the near future if we
> don't introduce a pluggable approach. Library functions is what you also
> suggest but storing them in a catalog means to always fully qualify them or
> modifying the existing catalog design that was inspired by the standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design", it just does clear separation of concerns. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
>
> why would users name a temporary function the same as a built-in
> function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
>
> I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and Hive
> is just one of many systems that we can support. Not every built-in
> function behaves and will behave exactly like Hive.
>
>
> regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as a
> reference. I don't think that a user can overwrite a built-in function
> there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions preceding built-in functions and being able to
> override built-in functions (we already support overriding built-in
> functions in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the user's jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of the naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>
> Hi Dawid,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> it. Here is what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive
> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior.
> (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of the lack of a standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity. The downside
> is the deviation from Hive, which is popular and the de facto standard in
> the big data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately do not always go
> through the Catalog API (at least not yet). Moreover from what I've checked
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either special database for temporary functions, or
> just current database). What I would suggest is to always register
> temporary functions with a 3 part identifier. The same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the main controversial points are mainly the two
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to catalog constructor/yaml for turning on and off Hive built-in functions
> with the flag set to 'false' by default and proper doc added to help users
> make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't raise any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be really
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function --> catalog function.
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement that referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic even
> though different approaches are proposed. To limit the scope and simplify
> the usage, it seems to make sense to me to introduce special syntax for users
> to explicitly reference an external built-in function such as hive1::sqrt or
> hive1._built_in.sqrt. This is a DML syntax matching nicely the Catalog API
> call hive1.getFunction(ObjectPath functionName) where the database name is
> absent for built-in functions available in that catalog hive1. I understand
> that Bowen's original proposal was trying to avoid this, but this could
> turn out to be a clean and simple solution.
>
> (Timo's modular approach is a great way to "expand" Flink's built-in function
> set, which seems orthogonal and complementary to this, and which could be
> tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same as Bowen's. But after thinking about it, I'm currently leaning towards
> Timo's suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we first look for a function in Flink's built-in functions, and then in
> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
> such a built-in function, so the user will get Hive's behavior for function
> `foo`. In the next release, Flink realizes this is a very popular function
> and adds it to Flink's built-in functions, but with different behavior from
> Hive's. So in the next release, the behavior changes.
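
The compatibility concern above can be made concrete with a small sketch. The fallback lookup and both `foo` implementations are made up for illustration; neither Flink nor Hive ships a function with these semantics.

```python
def resolve(name, flink_built_ins, hive_built_ins):
    """Fallback lookup: Flink built-ins first, then Hive built-ins."""
    if name in flink_built_ins:
        return "flink", flink_built_ins[name]
    if name in hive_built_ins:
        return "hive", hive_built_ins[name]
    raise KeyError(name)


# Hypothetical semantics, chosen only so the two versions differ.
hive = {"foo": lambda x: x + 1}
flink_release_n = {}                               # `foo` not yet built into Flink
flink_release_n_plus_1 = {"foo": lambda x: x * 2}  # added later, behaves differently

origin_before, foo_before = resolve("foo", flink_release_n, hive)
origin_after, foo_after = resolve("foo", flink_release_n_plus_1, hive)
```

The same query silently switches from the Hive implementation to the Flink one after the upgrade, without the user changing anything.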
>
> With Timo's approach, IIUC the user has to tell the framework explicitly what
> kind of built-in functions they would like to use. They can just tell the
> framework to abandon Flink's built-in functions and use Hive's instead. The
> user can only choose between them, but not use them at the same time. I think
> this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated. Please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions. Thus, if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous function
> references just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalogs, but they don't have a db namespace and we currently just don't have
> a SQL syntax for that. It can be enabled when such a SQL syntax is defined,
> e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or function for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially since we don't have the
> "external built-in functions" anymore and currently the built-in function
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about it:
>
> - I don't know if any other databases handle built-in functions like that.
> Maybe you can give some examples? IMHO, built-in functions are system info
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just library functions,
> and library functions can be adapted to catalog APIs or to some other
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by other
> approaches too. E.g. experimental functions can be taken good care of by
> documentation, annotations, etc.
> - the proposal basically introduces some concept like a pluggable built-in
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g. how do
> you handle built-in functions in different modules but different names?
>
> In short, I'm not sure if it really stands and it looks like overkill
> to me. I'd rather not go down that route. The related discussion can be on
> its own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
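
One possible reading of the setFunctionModules(...) idea is sketched below. The thread does not define how modules are searched, so the "first configured module wins" rule, the class names, and the sample functions are all assumptions for illustration only.

```python
import math


class FunctionModule:
    """A named bundle of built-in functions (hypothetical model)."""

    def __init__(self, name, functions):
        self.name = name
        self.functions = dict(functions)


class ModuleManager:
    """Resolves a function name against modules in their configured order."""

    def __init__(self):
        self._modules = []

    def set_function_modules(self, *modules):
        self._modules = list(modules)

    def resolve(self, function_name):
        # Assumption: the first module that defines the name wins.
        for module in self._modules:
            if function_name in module.functions:
                return module.name, module.functions[function_name]
        return None


core = FunctionModule("core", {"sqrt": math.sqrt})
geo = FunctionModule("geo", {"st_area": lambda w, h: w * h})
hive = FunctionModule("hive", {
    "sqrt": lambda x: "hive-sqrt",
    "nvl": lambda a, b: b if a is None else a,
})

manager = ModuleManager()
manager.set_function_modules(core, geo, hive)
```

With this ordering, a platform team that leaves out the hive module simply never exposes its functions, and a name defined in both core and hive resolves to core.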
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>    1. built-in functions
>    2. temporary functions
>    3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server they
> also do not allow to overwrite built-in functions.
>
> "I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer." <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Was that a typo?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have the lifespan of a session and are only visible to the
> session owner. They are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system info
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same as a
> built-in function then? Since using that name in an ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Jingsong,
>
> Re> 1. Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
>
> Yes, please see the doc.
>
> Re> 2. Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will lead to
> changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special catalog
> functions.
>
> Re> Another question is, does this fallback include all
> hive built-in functions? As far as I know, some hive functions
> are somewhat hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super
> critical to the scope of this FLIP, thus I'd like to leave it to future
> efforts as a nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them both under
> the "catalog" concept, by extending catalog function to give it the ability
> to have built-in catalog functions. Some benefits I can see from this
> approach:
> 1. We don't have to introduce a new concept like external built-in functions.
> Actually I don't see a full story about how to treat such built-in
> functions, and it seems to clash a little with the catalog. As a result, you
> have to make some restriction like "hive built-in functions can only be used
> when current catalog is hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> the doc. I've modified those sections, and they are up to date now.
>
> In short, built-in functions of external systems are now defined as a
> special kind of catalog function in Flink, and handled by Flink as
> follows:
> - An external built-in function must be associated with a catalog for
> the purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous function
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a schema/database
> namespace
> - It goes through the same instantiation logic as other user defined
> catalog functions in the external system
>
> Please take another look at the doc, and let me know if you have more
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>
> Hi Kurt,
>
> it should not affect the functions and operations we currently have in
> SQL. It just categorizes the available built-in functions. It is kind of
> an orthogonal concept to the catalog API but built-in functions deserve
> this special kind of treatment. CatalogFunction still fits perfectly in
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way but with
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
>
> So this only affects the functions and operations we currently have in SQL
> and has no effect on tables, right? It looks like this is a concept
> orthogonal to Catalog?
> If both answers are yes, then the catalog function will be a weird
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
>
> What you proposed is basically the same as what Calcite does; I think
> we are on the same page.
>
> Best,
> Danny Chan
> On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>
> This sounds exactly like the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
>
> Thanks Bowen for bringing up this topic. I think it’s a useful
> refactoring to make our function usage more user friendly.
>
> For the topic of how to organize the built-in operators and the operators
> of Hive, here is a solution from Apache Calcite: the Calcite way is to make
> the operators of every dialect a “Library”, and users can specify which
> libraries they want to use for a SQL query. The built-in operators always
> come first as first-class objects, and the others are used in the order they
> appear. Maybe you can use it as a reference.
>
> [1]
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
I agree the consequences of the decision are substantial. Let's see what
others think.

-- Catalog functions are defined by users, and we suppose they can
drop/alter it in any way they want. Thus, overwriting a catalog function
doesn't seem to be a strong use case that we should be concerned about.
Rather, there are known use case for overwriting built-in functions.

Users are not always in full control of the catalog functions. There is
also the case where different teams manage the catalog & use the
catalog. As for overriding built-in functions, with the 3-part approach a user
can always use an equally named function from a catalog. E.g. to override

    SELECT explode(arr) FROM ...

a user can always write:

    SELECT db.explode(arr) FROM ...
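
As a rough sketch of that behavior (a simplified model with made-up names, not Flink's actual resolution code): an unqualified name hits the built-in first, while a partially or fully qualified name goes straight to the catalog function.

```python
def resolve(identifier, built_ins, catalog_functions, current_catalog, current_db):
    """Resolve a function reference with built-ins taking precedence for 1-part names."""
    parts = identifier.split(".")  # naive split; quoted identifiers are ignored here
    if len(parts) == 1:
        if parts[0] in built_ins:
            return ("built-in", parts[0])
        parts = [current_catalog, current_db] + parts
    elif len(parts) == 2:
        # db.func is interpreted relative to the current catalog
        parts = [current_catalog] + parts
    if tuple(parts) in catalog_functions:
        return ("catalog", ".".join(parts))
    raise LookupError(f"function not found: {identifier}")
```

In this model, `explode` resolves to the built-in, whereas `db.explode` resolves to the catalog function `cat.db.explode` when `cat` is the current catalog.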

Best,

Dawid

On 06/09/2019 10:54, Xuefu Z wrote:
> Hi Dawid,
>
> Thank you for your summary. While the only difference in the two proposals
> is one- or three-part in naming, the consequence would be substantial.
>
> To me, there are two major use cases of temporary functions compared to
> persistent ones:
> 1. Temporary in nature and auto managed by the session. More often than
> not, admins don't even allow users to create persistent functions.
> 2. Provide an opportunity to overwrite system built-in functions.
>
> Since built-in functions have one-part names, requiring three-part names for
> temporary functions eliminates the overwriting opportunity.
>
> One-part naming essentially puts all temp functions under a single
> namespace and simplifies function resolution, such as we don't need to
> consider the case of a temp function and a persistent function with the
> same name under the same database.
>
> I agree having three-parts does have its merits, such as consistency with
> other temporary objects (table) and minor difference between temp vs
> catalog functions. However, there is a slight difference between tables and
> function in that there is no built-in table in SQL so there is no need to
> overwrite it.
>
> I'm not sure if I fully agree with the benefits you listed as the advantages
> of the three-part naming of temp functions.
>   -- Allowing overwriting built-in functions is a benefit and the solution
> for disallowing certain overwriting shouldn't be totally banning it.
>   -- Catalog functions are defined by users, and we suppose they can
> drop/alter them in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use cases for overwriting built-in functions.
>
> Thus, personally I would prefer one-part names for temporary functions. In
> the absence of a SQL standard on this, I would certainly like to get opinions
> from others to see if a consensus can eventually be reached.
>
> (To your point on modular approach to support external built-in functions,
> we saw the value and are actively looking into it. Thanks for sharing your
> opinion on that.)
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>> Hi Xuefu,
>>
>> Thank you for your answers.
>>
>> Let me summarize my understanding. In principle we differ only in regards
>> to the fact if a temporary function can be only 1-part or only 3-part
>> identified. I can reconfirm that if the community decides it prefers the
>> 1-part approach I will commit to that, with the assumption that we will
>> force ONLY 1-part function names. (We will parse identifier and throw
>> exception if a user tries to register e.g. db.temp_func).
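
A minimal sketch of the "ONLY 1-part names" check mentioned above, assuming a naive dot split (real SQL identifier parsing with quoting is more involved, and the function name here is hypothetical):

```python
def validate_temp_function_name(identifier):
    """Accept only 1-part names such as temp_func; reject db.temp_func and longer paths."""
    parts = identifier.split(".")  # assumption: no quoted identifiers containing dots
    if len(parts) != 1 or not parts[0]:
        raise ValueError(
            f"temporary function name must be a single identifier, got '{identifier}'"
        )
    return parts[0]
```

Under this rule, registering `temp_func` succeeds, while `db.temp_func` or `cat.db.temp_func` fails at parse time rather than being silently interpreted.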
>>
>> My preference is though the 3-part approach:
>>
>>    - there are some functions that it makes no sense to override, e.g.
>>    CAST, moreover I'm afraid that allowing overriding such will lead to high
>>    inconsistency, similar to those that I mentioned spark has
>>    - you cannot shadow a fully-qualified function. (If a user fully
>>    qualifies his/her objects in a SQL query, which is often considered a good
>>    practice)
>>    - it does not differentiate between functions & temporary functions.
>>    Temporary functions just differ with regards to their life-cycle. The
>>    registration & usage is exactly the same.
>>
>> As it can be seen, the proposed concept regarding temp function and
>> function resolution is quite simple.
>>
>> Both approaches are equally simple. I would even say the 3-part approach
>> is slightly simpler as it does not have to care about some special built-in
>> functions such as CAST.
>>
>> I don't want to express my opinion on the differentiation between built-in
>> functions and "external" built-in functions in this thread as it is rather
>> orthogonal, but I also like the modular approach and I definitely don't
>> like the special syntax "cat::function". I think it's better to stick to a
>> standard or at least other proved solutions from other systems.
>>
>> Best,
>>
>> Dawid
>> On 05/09/2019 10:12, Xuefu Z wrote:
>>
>> Hi Dawid,
>>
>> Thanks for sharing your thoughts and requests for clarification. I believe
>> that I fully understood your proposal, which does have its merits. However,
>> it's different from ours. Here are the answers to your questions:
>>
>> Re #1: yes, the temp functions in the proposal are global and have just
>> one-part names, similar to built-in functions. Two- or three-part names are
>> not allowed.
>>
>> Re #2: not applicable as two- or three-part names are disallowed.
>>
>> Re #3: same as above. Referencing external built-in functions is achieved
>> either implicitly (only the built-in functions in the current catalogs are
>> considered) or via special syntax such as cat::function. However, we are
>> looking into the modular approach that Timo suggested with other feedback
>> received from the community.
>>
>> Re #4: the resolution order goes like the following in our proposal:
>>
>> 1. temporary functions
>> 2. built-in functions (including those augmented by add-on modules)
>> 3. built-in functions in current catalog (this will not be needed if the
>> special syntax "cat::function" is required)
>> 4. functions in current catalog and db.
>>
>> If we go with the modular approach and make external built-in functions as
>> an add-on module, the 2 and 3 above will be combined. In essence, the
>> resolution order is equivalent in the two approaches.
>>
>> By the way, resolution order matters only for simple name references. For
>> names such as db.function (interpreted as current_cat/db/function) or
>> cat.db.function, the reference is unambiguous, so no resolution is needed.
>>
>> As it can be seen, the proposed concept regarding temp function and
>> function resolution is quite simple. Additionally, the proposed resolution
>> order allows temp function to shadow a built-in function, which is
>> important (though not decisive) in our opinion.
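
The four-step order for simple names, plus the unambiguous handling of qualified names, could be modeled roughly like this. The data layout (a dict per catalog with "built_ins" and "databases") and the returned labels are assumptions for illustration, not part of the proposal.

```python
def resolve(name, temp_functions, built_ins, catalogs, current_cat, current_db):
    """Sketch of: temp -> built-in -> catalog built-in -> current cat/db function."""
    parts = name.split(".")  # naive split; quoted identifiers are ignored here
    if len(parts) > 3:
        raise ValueError(f"too many name parts: {name}")
    if len(parts) > 1:
        # Qualified names are unambiguous: db.fn means current_cat.db.fn.
        cat, db, fn = ([current_cat] + parts) if len(parts) == 2 else parts
        if cat in catalogs and fn in catalogs[cat]["databases"].get(db, ()):
            return f"catalog function {cat}.{db}.{fn}"
        raise LookupError(name)
    if name in temp_functions:                               # 1. temporary functions
        return f"temporary function {name}"
    if name in built_ins:                                    # 2. built-in functions
        return f"built-in function {name}"
    if name in catalogs[current_cat]["built_ins"]:           # 3. catalog built-ins
        return f"built-in function of catalog {current_cat}: {name}"
    if name in catalogs[current_cat]["databases"].get(current_db, ()):  # 4. cat/db
        return f"catalog function {current_cat}.{current_db}.{name}"
    raise LookupError(name)
```

With the modular approach, step 3 would fold into step 2, leaving only steps 1, 2, and 4.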
>>
>> I started liking the modular approach as the resolution order will only
>> include 1, 2, and 4, which is simpler and more generic. That's why I
>> suggested we look more into this direction.
>>
>> Please let me know if there are further questions.
>>
>> Thanks,
>> Xuefu
>>
>>
>>
>>
>> On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>>
>>
>> Hi Xuefu,
>>
>> Just wanted to summarize my opinion on the one topic (temporary functions).
>>
>> My preference would be to make temporary functions always 3-part qualified
>> (as a result that would prohibit overriding built-in functions). Having
>> said that if the community decides that it's better to allow overriding
>> built-in functions I am fine with it and can commit to that decision.
>>
>> I wanted to ask if you could clarify a few points for me around that
>> option.
>>
>>    1. Would you enforce temporary functions to be always just a single
>>    name (without db & cat) as hive does, or would you allow also 3 or even 2
>>    part identifiers?
>>    2. Assuming 2/3-part paths. How would you register a function from a
>>    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>>    all functions named 'func' in all databases named 'db' in all catalogs? Or
>>    would you shadow only function 'func' in database 'db' in current catalog?
>>    3. This point is still under discussion, but was mentioned a few
>>    times, that maybe we want to enable syntax cat.func for "external built-in
>>    functions". How would that affect statement from previous point? Would
>>    'db.func' shadow "external built-in function" in 'db' catalog or user
>>    functions as in point 2? Or maybe both?
>>    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>>    paths. Would the function resolution be actually as follows?:
>>       1. temporary functions (1-part path)
>>       2. built-in functions
>>       3. temporary functions (2-part path)
>>       4. 2-part catalog functions a.k.a. "external built-in functions"
>>       (cat + func) - this is still under discussion, if we want that in the other
>>       focal point
>>       5. temporary functions (3-part path)
>>       6. 3-part catalog functions a.k.a. user functions
>>
>> I would be really grateful if you could clarify those questions for me, thanks.
>>
>> BTW, Thank you all for a healthy discussion.
>>
>> Best,
>>
>> Dawid
>> On 04/09/2019 23:25, Xuefu Z wrote:
>>
>> Thanks all for sharing your thoughts. I think we have gathered some useful
>> initial feedback from this long discussion with a couple of focal points
>> sticking out.
>>
>>  We will go back to do more research and adapt our proposal. Once it's
>> ready, we will ask for a new round of review. If there is any disagreement,
>> we will start a new discussion thread on each rather than having a mega
>> discussion like this.
>>
>> Thanks to everyone for participating.
>>
>> Regards,
>> Xuefu
>>
>>
>> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>>
>>
>> Let me try to summarize and conclude the long thread so far:
>>
>> 1. For order of temp function v.s. built-in function:
>>
>> I think Dawid's point that temp function should be of fully qualified path
>> is a better reasoning to back the newly proposed order, and i agree we
>> don't need to follow Hive/Spark.
>>
>> However, I'd rather not change fundamentals of temporary functions in this
>> FLIP. It belongs to a bigger story of how temporary objects should be
>> redefined and be handled uniformly - currently temporary tables and views
>> (those registered from TableEnv#registerTable()) behave differently from what
>> Dawid proposes for temp functions, and we need a FLIP to just unify their
>> APIs and behaviors.
>>
>> I agree that backward compatibility is not an issue w.r.t Jark's points.
>>
>> ***Seems we do have consensus that it's acceptable to prevent users from
>> registering a temp function with the same name as a built-in function. To
>> help us move forward, I'd like to propose setting such a restraint on temp
>> functions in this FLIP to simplify the design and avoid disputes.*** It
>> will also leave room for improvements in the future.
>>
>>
>> 2. For Hive built-in function:
>>
>> Thanks Timo for providing the Presto and Postgres examples. I feel modular
>> built-in functions can be a good fit for the geo and ml example as a native
>> Flink extension, but not sure if it fits well with external integrations.
>> Anyway, I think modular built-in functions is a bigger story and can be on
>> its own thread too, and our proposal doesn't prevent Flink from doing that
>> in the future.
>>
>> ***Seems we have consensus that users should be able to use built-in
>> functions of Hive or other external systems in SQL explicitly and
>> deterministically regardless of Flink built-in functions and the potential
>> modular built-in functions, via some new syntax like "mycat::func"? If so,
>> I'd like to propose removing Hive built-in functions from ambiguous
>> function resolution order, and empower users with such a syntax. This way
>> we sacrifice a little convenience for certainty***
>>
>>
>> What do you think?
>>
>> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org> wrote:
>>
>>
>> Hi,
>>
>> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
>> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
>> very inconsistent in that manner (spark being way worse on that).
>>
>> Hive:
>>
>> You cannot overwrite all the built-in functions. I could overwrite most of
>> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
>> functions I cannot overwrite, e.g. CAST, ARRAY. I get:
>>
>>
>> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>>
>> What is interesting is that I cannot overwrite *array*, but I can overwrite
>> *map* or *struct*. Hive behaves reasonably well, though, if I manage to
>> overwrite a function. When I drop the temporary function the native
>> function is still available.
>>
>> Spark:
>>
>> Spark's behavior imho is super bad.
>>
>> Theoretically I could overwrite all functions. I was able e.g. to
>> overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY
>> FUNCTION syntax. Otherwise I get an exception that a function already
>> exists. However when I used the CAST function in a query it used the
>> native, built-in one.
>>
>> When I overwrote current_date() function, it was used in a query, but it
>> completely replaces the built-in function and I can no longer use the
>> native function in any way. I cannot also drop the temporary function. I
>> get:
>>
>> *    Error in query: Cannot drop native function 'current_date';*
>>
>> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
>> with a database. Temporary functions are always represented as a single
>> name.
>>
>> In my opinion neither of the systems have consistent behavior. Generally
>> speaking I think overwriting any system provided functions is just
>> dangerous.
>>
>> Regarding Jark's concerns. Such functions would be registered in the current
>> catalog/database schema, so a user could still use their own function, but
>> would have to fully qualify the function (because built-in functions take
>> precedence). Moreover users would have the same problem with permanent
>> functions. Imagine a user has a permanent function 'cat.db.explode'. In
>> 1.9 the user could use just the 'explode' function as long as the 'cat' &
>> 'db' were the default catalog & database. If we introduce an 'explode'
>> built-in function in 1.10, the user has to fully qualify the function.
>>
>> Best,
>>
>> Dawid
>> On 04/09/2019 15:19, Timo Walther wrote:
>>
>> Hi all,
>>
>> thanks for the healthy discussion. It is already a very long discussion
>> with a lot of text. So I will just post my opinion to a couple of
>> statements:
>>
>>
>> Hive built-in functions are not part of Flink built-in functions, they
>> are catalog functions
>>
>> That is not entirely true. Correct me if I'm wrong but I think Hive
>> built-in functions are also not catalog functions. They are not stored in
>> every Hive metastore catalog that is freshly created but are a set of
>> functions that are listed somewhere and made available.
>>
>>
>> ambiguous functions reference just shouldn't be resolved to a different
>> catalog
>>
>> I agree. They should not be resolved to a different catalog. That's why I
>> am suggesting to split the concept of built-in functions and catalog lookup
>> semantics.
>>
>>
>> I don't know if any other databases handle built-in functions like that
>>
>> What I called "module" is:
>> - Extension in Postgres [1]
>> - Plugin in Presto [2]
>>
>> Btw. Presto even mentions example modules that are similar to the ones
>> that we will introduce in the near future both for ML and System XYZ
>> compatibility:
>> "See either the presto-ml module for machine learning functions or the
>> presto-teradata-functions module for Teradata-compatible functions, both in
>> the root of the Presto source."
>>
>>
>> functions should be either built-in already or just libraries functions,
>> and library functions can be adapted to catalog APIs or of some other
>> syntax to use
>>
>> Regarding "built-in already", of course we can add a lot of functions as
>> built-ins but we will end up in a dependency hell in the near future if we
>> don't introduce a pluggable approach. Library functions is what you also
>> suggest but storing them in a catalog means to always fully qualify them or
>> modifying the existing catalog design that was inspired by the standard.
>>
>> I don't think "it brings in even more complicated scenarios to the
>> design", it just does clear separation of concerns. Integrating the
>> functionality into the current design makes the catalog API more
>> complicated.
>>
>>
>> why would users name a temporary function the same as a built-in
>>
>> function then?
>>
>> Because you never know what users do. If they don't, my suggested
>> resolution order should not be a problem, right?
>>
>>
>> I don't think hive functions deserves be a function module
>>
>> Our goal is not to create a Hive clone. We need to think forward and Hive
>> is just one of many systems that we can support. Not every built-in
>> function behaves and will behave exactly like Hive.
>>
>>
>> regarding temporary functions, there are few systems that support it
>>
>> IMHO Spark and Hive are not always the best examples for consistent
>> design. Systems like Postgres, Presto, or SQL Server should be used as a
>> reference. I don't think that a user can overwrite a built-in function
>> there.
>>
>> Regards,
>> Timo
>>
>> [1] https://www.postgresql.org/docs/10/extend-extensions.html
>> [2] https://prestodb.github.io/docs/current/develop/functions.html
>>
>>
>> On 04.09.19 13:44, Jark Wu wrote:
>>
>> Hi all,
>>
>> Regarding #1 temp function <> built-in function and naming.
>> I'm fine with temp functions preceding built-in functions and being able
>> to override them (we already support overriding built-in functions in 1.9).
>> If we don't allow the same name as a built-in function, I'm afraid we will
>> have compatibility issues in the future.
>> Say users register a user-defined function named "explode" in 1.9, and we
>> add a built-in "explode" function in 1.10.
>> Then the user's jobs which call the registered "explode" function in 1.9
>> will all fail in 1.10 because of the naming conflict.
>>
>> Regarding #2 "External" built-in functions.
>> I think if we store external built-in functions in catalog, then
>> "hive1::sqrt" is a good way to go.
>> However, I would prefer to support a discovery mechanism (e.g. SPI) for
>> built-in functions as Timo suggested above.
>> This gives us the flexibility to add Hive or MySQL or Geo or whatever
>> function set as built-in functions in an easy way.
>>
>> Best,
>> Jark
>>
>> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>>
>> Hi Dawid,
>>
>> Thank you for sharing your findings. It seems to me that there is no SQL
>> standard regarding temporary functions. There are few systems that support
>> them. Here is what I have found:
>>
>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
>> 2. Spark: basically follows Hive
>> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
>> behavior. (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>>
>> Because of the lack of a standard, it's perfectly fine for Flink to define
>> whatever it sees as appropriate. Thus, your proposal (no overwriting and
>> must have a DB as holder) is one option. The advantage is simplicity. The
>> downside is the deviation from Hive, which is popular and the de facto
>> standard in the big data world.
>>
>> However, I don't think we have to follow Hive. More importantly, we need a
>> consensus. I have no objection if your proposal is generally agreed upon.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
>> wrote:
>>
>> Hi all,
>>
>> Just an opinion on the built-in <> temporary functions resolution and
>> NAMING issue. I think we should not allow overriding the built-in
>> functions, as this may pose serious issues and to be honest is rather
>> not feasible and would require major rework. What happens if a user
>> wants to override CAST? Calls to that function are generated at
>> different layers of the stack that unfortunately do not always go
>> through the Catalog API (at least not yet). Moreover, from what I've checked,
>> no other systems allow overriding the built-in functions. All the
>> systems I've checked so far register temporary functions in a
>> database/schema (either special database for temporary functions, or
>> just current database). What I would suggest is to always register
>> temporary functions with a 3-part identifier, the same way as tables,
>> views etc. This effectively means you cannot override built-in
>> functions. With such an approach it is natural that the temporary functions
>> end up a step lower in the resolution order:
>>
>> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>>
>> 2. temporary functions (always 3 part path)
>>
>> 3. catalog functions (always 3 part path)
>>
>> Let me know what you think.
>>
>> Best,
>>
>> Dawid
>>
>> On 04/09/2019 06:13, Bowen Li wrote:
>>
>> Hi,
>>
>> I agree with Xuefu that the controversial points are mainly these two.
>> My thoughts on them:
>>
>> 1) Determinism of referencing Hive built-in functions. We can either remove
>> Hive built-in functions from ambiguous function resolution and require
>> users to use special syntax for their qualified names, or add a config flag
>> to the catalog constructor/yaml for turning Hive built-in functions on and
>> off, with the flag set to 'false' by default and proper docs added to help
>> users make their decisions.
>>
>> 2) Flink temp functions vs. Flink built-in functions in ambiguous function
>> resolution order. We believe Flink temp functions should precede Flink
>> built-in functions, and I have presented my reasons. In case we cannot
>> reach an agreement, I propose forbidding users from registering temp
>> functions with the same name as a built-in function, like MySQL's approach,
>> for the moment. It won't raise any performance concern, since built-in
>> functions are all in memory and thus the cost of a name check will be
>> really trivial.
>>
>>
>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>>
>> From what I have seen, there are a couple of focal disagreements:
>>
>> 1. Resolution order: temp function --> flink built-in function --> catalog
>> function vs flink built-in function --> temp function --> catalog function.
>> 2. "External" built-in functions: how to treat built-in functions in
>> external systems and how users reference them
>>
>> For #1, I agree with Bowen that temp functions need to be at the highest
>> priority because that's how a user might overwrite a built-in function
>> without referencing a persistent, overwriting catalog function with a fully
>> qualified name. Putting built-in functions at the highest priority
>> eliminates that usage.
>>
>> For #2, I saw a general agreement that referencing "external" built-in
>> functions such as those in Hive needs to be explicit and deterministic, even
>> though different approaches are proposed. To limit the scope and simplify the
>> usage, it seems to make sense to me to introduce special syntax for users to
>> explicitly reference an external built-in function, such as hive1::sqrt or
>> hive1._built_in.sqrt. This is a DML syntax nicely matching the Catalog API
>> call hive1.getFunction(ObjectPath functionName), where the database name is
>> absent for built-in functions available in that catalog hive1. I understand
>> that Bowen's original proposal was trying to avoid this, but this could
>> turn out to be a clean and simple solution.
>>
>> (Timo's modular approach is a great way to "expand" Flink's built-in function
>> set, which seems orthogonal and complementary to this, and could be
>> tackled in future work.)
>>
>> I'd be happy to hear further thoughts on the two points.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>>
>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
>> same as Bowen's. But after thinking about it, I'm currently leaning towards
>> Timo's suggestion.
>>
>> The reason is backward compatibility. If we follow Bowen's approach, let's
>> say we first look up a function in Flink's built-in functions, and then in
>> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive
>> has such a built-in function, so users will get Hive's behavior for
>> function `foo`. In the next release, Flink realizes this is a very popular
>> function and adds it to Flink's built-in functions, but with different
>> behavior from Hive's. So in the next release, the behavior changes.
>>
>> With Timo's approach, IIUC users have to tell the framework explicitly what
>> kind of built-in functions they would like to use. They can just tell the
>> framework to abandon Flink's built-in functions and use Hive's instead.
>> Users can only choose between them, but not use them at the same time. I
>> think this approach is more predictable.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Thanks for the feedback. Just a kind reminder that the [Proposal] section
>> in the google doc was updated; please take a look first and let me know if
>> you have more questions.
>>
>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>>
>> Hi Timo,
>>
>> Re> 1) We should not have the restriction "hive built-in functions can only
>> be used when current catalog is hive catalog". Switching a catalog
>> should only have implications on the cat.db.object resolution but not
>> functions. It would be quite convenient for users to use Hive built-ins
>> even if they use a Confluent schema registry or just the in-memory catalog.
>>
>> There might be a misunderstanding here.
>>
>> First of all, Hive built-in functions are not part of Flink built-in
>> functions; they are catalog functions. Thus, if the current catalog is not a
>> HiveCatalog but, say, a schema registry catalog, ambiguous function
>> references just shouldn't be resolved to a different catalog.
>>
>> Second, Hive built-in functions can potentially be referenced across
>> catalogs, but they don't have a db namespace and we currently just don't have
>> a SQL syntax for that. It can be enabled when such a SQL syntax is defined,
>> e.g. "catalog::function", but it's out of scope of this FLIP.
>>
>> 2) I would propose to have separate concepts for catalog and built-in
>> functions. In particular it would be nice to modularize built-in
>> functions. Some built-in functions are very crucial (like AS, CAST,
>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
>> we add more experimental functions in the future or function for some
>> special application area (Geo functions, ML functions). A data platform
>> team might not want to make every built-in function available. Or a
>> function module like ML functions is in a different Maven module.
>>
>> I think this is orthogonal to this FLIP, especially since we don't have the
>> "external built-in functions" anymore and currently the built-in function
>> category remains untouched.
>>
>> But just to share some thoughts on the proposal, I'm not sure about it:
>>
>> - I don't know if any other databases handle built-in functions like that.
>> Maybe you can give some examples? IMHO, built-in functions are system info
>> and should be deterministic, not depending on loaded libraries. Geo
>> functions should be either built-in already or just libraries functions,
>> and library functions can be adapted to catalog APIs or of some other
>> syntax to use
>> - I don't know if all use cases stand, and many can be achieved by other
>> approaches too. E.g. experimental functions can be taken good care of by
>> documentation, annotations, etc.
>> - the proposal basically introduces some concept like a pluggable built-in
>> function catalog, despite the already existing catalog APIs
>> - it brings in even more complicated scenarios to the design. E.g. how do
>> you handle built-in functions in different modules but different names?
>>
>> In short, I'm not sure if it really stands and it looks like overkill
>> to me. I'd rather not go down that route. Related discussion can be on its
>> own thread.
>>
>> 3) Following the suggestion above, we can have a separate discovery
>> mechanism for built-in functions. Instead of just going through a static
>> list like in BuiltInFunctionDefinitions, a platform team should be able
>> to select function modules like
>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
>> HiveFunctions) or via service discovery;
>>
>> Same as above. I'll leave it to its own thread.
>>
>> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
>> that we should unify built-in function (external or internal) under a
>> common layer. However, the resolution order should be:
>>    1. built-in functions
>>    2. temporary functions
>>    3. regular catalog resolution logic
>> Otherwise a temporary function could cause clashes with Flink's built-in
>> functions. If you take a look at other vendors, like SQL Server, they
>> also do not allow to overwrite built-in functions.
>>
>> "I agree with Kurt that we should unify built-in function (external or
>> internal) under a common layer." <- I don't think this is what Kurt means.
>> Kurt and I are in favor of unifying built-in functions of external systems
>> and catalog functions. Was that a typo?
>>
>> Besides, I'm not sure about the resolution order you proposed. Temporary
>> functions have the lifespan of a session and are only visible to the
>> session owner. They are unique to each user, and users create them on
>> purpose to be the highest priority in order to overwrite system info
>> (built-in functions in this case).
>>
>> In your case, why would users name a temporary function the same as a
>> built-in function then? Since using that name in an ambiguous function
>> reference will always be resolved to built-in functions, creating a
>> same-named temp function would be meaningless in the end.
>>
>>
>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>>
>> Hi Jingsong,
>>
>> Re> 1.Hive built-in functions is an intermediate solution. So we should
>> not introduce interfaces to influence the framework. To make
>> Flink itself more powerful, we should implement the functions
>> we need to add.
>>
>> Yes, please see the doc.
>>
>> Re> 2.Non-flink built-in functions are easy for users to change their
>> behavior. If we support some flink built-in functions in the
>> future but act differently from non-flink built-in, this will lead to
>> changes in user behavior.
>>
>> There's no such concept as "external built-in functions" any more.
>> Built-in functions of external systems will be treated as special catalog
>> functions.
>>
>> Re> Another question is, does this fallback include all
>> hive built-in functions? As far as I know, some hive functions
>> have some hacky. If possible, can we start with a white list?
>> Once we implement some functions to flink built-in, we can
>> also update the whitelist.
>>
>> Yes, that's something we thought of too. I don't think it's super
>> critical to the scope of this FLIP, thus I'd like to leave it to future
>> efforts as a nice-to-have feature.
>>
>>
>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>>
>> Hi Kurt,
>>
>> Re: > What I want to propose is we can merge #3 and #4, make them both under
>> "catalog" concept, by extending catalog function to make it have ability to
>> have built-in catalog functions. Some benefits I can see from this approach:
>> 1. We don't have to introduce new concept like external built-in functions.
>> Actually I don't see a full story about how to treat a built-in functions,
>> and it seems a little bit disrupt with catalog. As a result, you have to
>> make some restriction like "hive built-in functions can only be used when
>> current catalog is hive catalog".
>>
>> Yes, I've unified #3 and #4 but it seems I didn't update some parts of
>> the doc. I've modified those sections, and they are up to date now.
>>
>> In short, built-in functions of external systems are now defined as a
>> special kind of catalog function in Flink, and handled by Flink as
>> follows:
>> - An external built-in function must be associated with a catalog for
>> the purpose of decoupling flink-table and external systems.
>> - It always resides in front of catalog functions in ambiguous function
>> reference order, just like in its own external system
>> - It is a special catalog function that doesn’t have a schema/database
>> namespace
>> - It goes through the same instantiation logic as other user-defined
>> catalog functions in the external system
>>
>> Please take another look at the doc, and let me know if you have more
>> questions.
>>
>>
>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>>
>> Hi Kurt,
>>
>> it should not affect the functions and operations we currently have in
>> SQL. It just categorizes the available built-in functions. It is kind of
>> an orthogonal concept to the catalog API, but built-in functions deserve
>> this special kind of treatment. CatalogFunction still fits perfectly in
>> there because the regular catalog object resolution logic is not
>> affected. So tables and functions are resolved in the same way, but with
>> built-in functions that have priority as in the original design.
>>
>> Regards,
>> Timo
>>
>>
>> On 03.09.19 15:26, Kurt Young wrote:
>>
>> This only affects the functions and operations we currently have in SQL
>> and has no effect on tables, right? Looks like this is an orthogonal
>> concept to Catalog?
>> If the answers are both yes, then the catalog function will be a weird
>> concept?
>>
>> Best,
>> Kurt
>>
>>
>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
>>
>> The way you proposed is basically the same as what Calcite does; I think
>> we are on the same page.
>>
>> Best,
>> Danny Chan
>> On Sep 3, 2019, at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>>
>> This sounds exactly as the module approach I mentioned, no?
>>
>> Regards,
>> Timo
>>
>> On 03.09.19 13:42, Danny Chan wrote:
>>
>> Thanks Bowen for bringing up this topic. I think it's a useful
>> refactoring to make our function usage more user friendly.
>>
>> For the topic of how to organize the built-in operators and the operators
>> of Hive, here is a solution from Apache Calcite: the Calcite way is to make
>> every dialect's operators a "Library", and users can specify which libraries
>> they want to use for a SQL query. The built-in operators always come as
>> first-class objects and the others are used in the order they appear.
>> Maybe you can take it as a reference.
>>
>> [1]
>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>
>> Best,
>> Danny Chan
>> On Aug 28, 2019, at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>>
>> Hi folks,
>>
>> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
>> It's critically helpful to improve function usability in SQL.
>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>
>> In short, it:
>> - adds support for precise function reference with fully/partially
>> qualified name
>> - redefines function resolution order for ambiguous function reference
>> - adds support for Hive's rich built-in functions (support for Hive user
>> defined functions was already added in 1.9.0)
>> - clarifies the concept of temporary functions
>>
>> Would love to hear your thoughts.
>>
>> Bowen
>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Dawid,

Thank you for your summary. While the only difference in the two proposals
is one- or three-part in naming, the consequence would be substantial.

To me, there are two major use cases of temporary functions compared to
persistent ones:
1. Temporary in nature and auto managed by the session. More often than
not, admins don't even allow users to create persistent functions.
2. Provide an opportunity to overwrite system built-in functions.

Since built-in functions have one-part names, requiring three-part names for
temporary functions eliminates the overwriting opportunity.

One-part naming essentially puts all temp functions under a single
namespace and simplifies function resolution; for example, we don't need to
consider the case of a temp function and a persistent function with the
same name under the same database.
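
To illustrate, here is a minimal Python sketch of resolution with one-part temp
function names. It is only an illustration under stated assumptions, not
Flink's FunctionCatalog: the registries, the sample entries, and the `resolve`
helper are all hypothetical. A simple name is looked up in temp functions,
then built-ins, then the current catalog/db, while qualified names go straight
to the catalog.

```python
# Hypothetical registries for illustration only; not Flink's FunctionCatalog API.
BUILT_IN = {"sqrt": "builtin:sqrt", "length": "builtin:length"}
CATALOG = {
    ("cat1", "db1", "length"): "catalog:cat1.db1.length",
    ("cat1", "db1", "my_udf"): "catalog:cat1.db1.my_udf",
}


def resolve(name, temp, built_in=BUILT_IN, catalog=CATALOG, current=("cat1", "db1")):
    """Resolve a function reference; returns None when nothing matches."""
    parts = name.split(".")
    if len(parts) == 1:
        # Ambiguous reference: temp functions first, then built-ins,
        # then functions in the current catalog and database.
        for registry in (temp, built_in):
            if name in registry:
                return registry[name]
        return catalog.get((current[0], current[1], name))
    if len(parts) == 2:
        # db.function is read as current_catalog.db.function.
        return catalog.get((current[0], parts[0], parts[1]))
    if len(parts) == 3:
        # cat.db.function is already unambiguous.
        return catalog.get(tuple(parts))
    raise ValueError("expected at most three name parts: " + name)


temp_functions = {"length": "temp:length"}  # one-part names only
print(resolve("length", temp_functions))      # the temp function shadows the built-in
print(resolve("db1.length", temp_functions))  # a qualified name bypasses temp functions
```

With three-part temp names, the first branch would never consult temp
functions, which is why overwriting a built-in would no longer be possible.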

I agree that three-part names do have their merits, such as consistency with
other temporary objects (tables) and only a minor difference between temp and
catalog functions. However, there is a slight difference between tables and
functions in that there is no built-in table in SQL, so there is no need to
overwrite one.

I'm not sure I fully agree with the benefits you listed as advantages of
the three-part naming of temp functions.
  -- Allowing overwriting built-in functions is a benefit, and the solution
for disallowing certain overwriting shouldn't be to ban it totally.
  -- Catalog functions are defined by users, and we assume they can
drop/alter them in any way they want. Thus, overwriting a catalog function
doesn't seem to be a strong use case that we should be concerned about.
Rather, there are known use cases for overwriting built-in functions.
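
One way to disallow only certain overwriting, sketched in Python under stated
assumptions: the `TempFunctionRegistry` class and its `protected` list are
hypothetical and not part of any Flink API. CAST is used as the protected
example only because it was named earlier in this thread as a function that
makes no sense to override.

```python
class TempFunctionRegistry:
    """Hypothetical session-scoped registry; names are illustrative, not Flink API."""

    def __init__(self, built_ins, protected=("cast",)):
        self._built_ins = dict(built_ins)
        # Assumption: a small deny-list of built-ins that may never be shadowed.
        self._protected = {p.lower() for p in protected}
        self._temp = {}

    def register(self, name, impl):
        key = name.lower()
        if "." in key:
            raise ValueError("temporary function names must be one-part: " + name)
        if key in self._protected:
            raise ValueError("built-in function cannot be overwritten: " + name)
        self._temp[key] = impl

    def drop(self, name):
        # Dropping a temp function makes the shadowed built-in visible again.
        self._temp.pop(name.lower(), None)

    def lookup(self, name):
        key = name.lower()
        if key in self._temp:
            return self._temp[key]
        return self._built_ins.get(key)


registry = TempFunctionRegistry({"cast": "builtin:cast", "length": "builtin:length"})
registry.register("length", "temp:length")
print(registry.lookup("length"))  # the temp function wins while registered
registry.drop("length")
print(registry.lookup("length"))  # the built-in is available again
```

Attempting `registry.register("CAST", ...)` raises a ValueError in this sketch,
while every other built-in remains overridable for the session.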

Thus, personally I would prefer one-part names for temporary functions. In
the absence of a SQL standard on this, I would certainly like to get opinions
from others to see if a consensus can eventually be reached.

(To your point on modular approach to support external built-in functions,
we saw the value and are actively looking into it. Thanks for sharing your
opinion on that.)

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi Xuefu,
>
> Thank you for your answers.
>
> Let me summarize my understanding. In principle we differ only on whether
> a temporary function should be identified by a 1-part or a 3-part name.
> I can reconfirm that if the community decides it prefers the
> 1-part approach I will commit to that, with the assumption that we will
> enforce ONLY 1-part function names. (We will parse the identifier and throw
> an exception if a user tries to register e.g. db.temp_func.)
>
> My preference, though, is the 3-part approach:
>
>    - there are some functions that it makes no sense to override, e.g.
>    CAST; moreover, I'm afraid that allowing overriding them will lead to high
>    inconsistency, similar to what I mentioned Spark has
>    - you cannot shadow a fully-qualified function. (If a user fully
>    qualifies his/her objects in a SQL query, which is often considered a good
>    practice)
>    - it does not differentiate between functions & temporary functions.
>    Temporary functions just differ with regards to their life-cycle. The
>    registration & usage is exactly the same.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple.
>
> Both approaches are equally simple. I would even say the 3-part approach
> is slightly simpler as it does not have to care about some special built-in
> functions such as CAST.
>
> I don't want to express my opinion on the differentiation between built-in
> functions and "external" built-in functions in this thread as it is rather
> orthogonal, but I also like the modular approach and I definitely don't
> like the special syntax "cat::function". I think it's better to stick to a
> standard or at least to proven solutions from other systems.
>
> Best,
>
> Dawid
> On 05/09/2019 10:12, Xuefu Z wrote:
>
> Hi Dawid,
>
> Thanks for sharing your thoughts and your request for clarification. I believe
> that I fully understood your proposal, which does have its merits. However,
> it's different from ours. Here are the answers to your questions:
>
> Re #1: yes, the temp functions in the proposal are global and have just
> one-part names, similar to built-in functions. Two- or three-part names are
> not allowed.
>
> Re #2: not applicable as two- or three-part names are disallowed.
>
> Re #3: same as above. Referencing external built-in functions is achieved
> either implicitly (only the built-in functions in the current catalog are
> considered) or via special syntax such as cat::function. However, we are
> looking into the modular approach that Timo suggested, along with other
> feedback received from the community.
>
> Re #4: the resolution order goes like the following in our proposal:
>
> 1. temporary functions
> 2. built-in functions (including those augmented by add-on modules)
> 3. built-in functions in current catalog (this will not be needed if the
> special syntax "cat::function" is required)
> 4. functions in current catalog and db.
>
> If we go with the modular approach and make external built-in functions
> an add-on module, 2 and 3 above will be combined. In essence, the
> resolution order is equivalent in the two approaches.
>
> By the way, resolution order matters only for simple name references. For
> names such as db.function (interpreted as current_cat/db/function) or
> cat.db.function, the reference is unambiguous, so no resolution is needed.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple. Additionally, the proposed resolution
> order allows temp function to shadow a built-in function, which is
> important (though not decisive) in our opinion.
>
> I started liking the modular approach as the resolution order will only
> include 1, 2, and 4, which is simpler and more generic. That's why I
> suggested we look more into this direction.
>
> Please let me know if there are further questions.
>
> Thanks,
> Xuefu
>
>
>
>
> On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>
> Hi Xuefu,
>
> Just wanted to summarize my opinion on the one topic (temporary functions).
>
> My preference would be to make temporary functions always 3-part qualified
> (as a result that would prohibit overriding built-in functions). Having
> said that if the community decides that it's better to allow overriding
> built-in functions I am fine with it and can commit to that decision.
>
> I wanted to ask if you could clarify a few points for me around that
> option.
>
>    1. Would you enforce temporary functions to be always just a single
>    name (without db & cat) as hive does, or would you allow also 3 or even 2
>    part identifiers?
>    2. Assuming 2/3-part paths. How would you register a function from the
>    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>    all functions named 'func' in all databases named 'db' in all catalogs? Or
>    would you shadow only function 'func' in database 'db' in current catalog?
>    3. This point is still under discussion, but was mentioned a few
>    times, that maybe we want to enable syntax cat.func for "external built-in
>    functions". How would that affect statement from previous point? Would
>    'db.func' shadow "external built-in function" in 'db' catalog or user
>    functions as in point 2? Or maybe both?
>    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>    paths. Would the function resolution be actually as follows?:
>       1. temporary functions (1-part path)
>       2. built-in functions
>       3. temporary functions (2-part path)
>       4. 2-part catalog functions a.k.a. "external built-in functions"
>       (cat + func) - this is still under discussion, if we want that in the other
>       focal point
>       5. temporary functions (3-part path)
>       6. 3-part catalog functions a.k.a. user functions
>
> I would be really grateful if you could answer those questions, thanks.
>
> BTW, Thank you all for a healthy discussion.
>
> Best,
>
> Dawid
> On 04/09/2019 23:25, Xuefu Z wrote:
>
> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion, with a couple of focal points
> standing out.
>
> We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>
>
> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp functions should have fully qualified paths
> is better reasoning to back the newly proposed order, and I agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change the fundamentals of temporary functions in
> this FLIP. That belongs to a bigger story of how temporary objects should be
> redefined and handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave differently from what
> Dawid proposes for temp functions, and we need a FLIP just to unify their
> APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t Jark's points.
>
> ***Seems we do have consensus that it's acceptable to prevent users from
> registering a temp function with the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restriction on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave room for improvements in the future.
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ml example as a native
> Flink extension, but not sure if it fits well with external integrations.
> Anyway, I think modular built-in functions is a bigger story and can be on
> its own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from ambiguous
> function resolution order, and empower users with such a syntax. This way
> we sacrifice a little convenience for certainty***
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org> wrote:
>
>
> Hi,
>
> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> very inconsistent in that manner (spark being way worse on that).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most of
> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite, e.g. CAST, ARRAY. I get:
>
>
> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*, but I can overwrite
> *map* or *struct*. Hive does behave reasonably well, though, if I manage to
> overwrite a function: when I drop the temporary function, the native
> function is still available.
>
> Spark:
>
> Spark's behavior imho is super bad.
>
> Theoretically I could overwrite all functions. I was able, e.g., to
> overwrite the CAST function, though I had to use the CREATE OR REPLACE
> TEMPORARY FUNCTION syntax; otherwise I get an exception that a function
> already exists. However, when I used the CAST function in a query it used
> the native, built-in one.
>
> When I overwrote the current_date() function, it was used in a query, but it
> completely replaces the built-in function and I can no longer use the
> native function in any way. I also cannot drop the temporary function. I
> get:
>
> *    Error in query: Cannot drop native function 'current_date';*
>
> Additional note: neither system allows creating TEMPORARY FUNCTIONS
> with a database qualifier. Temporary functions are always represented as a
> single name.
>
> In my opinion neither of the systems has consistent behavior. Generally
> speaking I think overwriting any system-provided functions is just
> dangerous.
>
> Regarding Jark's concerns. Such functions would be registered in a current
> catalog/database schema, so a user could still use their own function, but
> would have to fully qualify the function (because built-in functions take
> precedence). Moreover, users would have the same problem with permanent
> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> 1.9 the user could use just the 'explode' function as long as 'cat' &
> 'db' were the default catalog & database. If we introduce an 'explode'
> built-in function in 1.10, the user has to fully qualify the function.
>
> Best,
>
> Dawid
> On 04/09/2019 15:19, Timo Walther wrote:
>
> Hi all,
>
> thanks for the healthy discussion. It is already a very long discussion
> with a lot of text. So I will just post my opinion to a couple of
> statements:
>
>
> Hive built-in functions are not part of Flink built-in functions, they
>
> are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored in
> every Hive metastore catalog that is freshly created but are a set of
> functions that are listed somewhere and made available.
>
>
> ambiguous functions reference just shouldn't be resolved to a different
>
> catalog
>
> I agree. They should not be resolved to a different catalog. That's why I
> am suggesting to split the concept of built-in functions and catalog lookup
> semantics.
>
>
> I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions, both
>
> in
>
> the root of the Presto source."
>
>
> functions should be either built-in already or just libraries
>
> functions,
>
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions as
> built-ins but we will end up in a dependency hell in the near future if we
> don't introduce a pluggable approach. Library functions is what you also
> suggest, but storing them in a catalog means either always fully qualifying
> them or modifying the existing catalog design that was inspired by the
> standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design", it just does clear separation of concerns. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
>
> why would users name a temporary function the same as a built-in
>
> function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
>
> I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and Hive
> is just one of many systems that we can support. Not every built-in
> function behaves and will behave exactly like Hive.
>
>
> regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as a
> reference. I don't think that a user can overwrite a built-in function
> there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions preceding built-in functions and being able
> to override them (we already support overriding built-in functions in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user-defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the users' jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of the naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>
> Hi Dawid,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> it. Here is what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive
> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior. (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of the lack of a standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity. The downside
> is the deviation from Hive, which is popular and the de facto standard in
> the big data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately do not always go
> through the Catalog API (at least yet). Moreover from what I've checked
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either special database for temporary functions, or
> just current database). What I would suggest is to always register
> temporary functions with a 3 part identifier. The same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what do you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the main controversial points are mainly the two
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to catalog constructor/yaml for turning on and off Hive built-in functions
> with the flag set to 'false' by default and proper doc added to help users
> make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't have any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be really
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function --> catalog function.
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement that referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic even
> though different approaches are proposed. To limit the scope and simplify
> the usage, it seems to make sense to me to introduce special syntax for
> users to explicitly reference an external built-in function such as
> hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax nicely matching
> the Catalog API call hive1.getFunction(ObjectPath functionName) where the
> database name is absent for built-in functions available in that catalog
> hive1. I understand that Bowen's original proposal was trying to avoid
> this, but this could turn out to be a clean and simple solution.
>
> (Timo's modular approach is a great way to "expand" Flink's built-in
> function set, which seems orthogonal and complementary to this, and which
> could be tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same as Bowen's. But after thinking about it, I'm currently leaning towards
> Timo's suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we first look for a function in Flink's built-in functions, and then in
> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
> such a built-in function. So the user will get Hive's behavior for function
> `foo`. And in the next release, Flink realizes this is a very popular
> function and adds it to Flink's built-in functions, but with different
> behavior than Hive's. So in the next release, the behavior changes.
>
> With Timo's approach, IIUC the user has to tell the framework explicitly
> what kind of built-in functions they would like to use. They can just tell
> the framework to abandon Flink's built-in functions and use Hive's instead.
> The user can only choose between them, but not use them at the same time. I
> think this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated, please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous function
> references just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalogs, but they don't have a db namespace and we currently just don't
> have a SQL syntax for it. It can be enabled when such a SQL syntax is
> defined, e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and
>
> built-in
>
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>
> maybe
>
> we add more experimental functions in the future or function for
>
> some
>
> special application area (Geo functions, ML functions). A data
>
> platform
>
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially we don't have
>
> the
>
> "external built-in functions" anymore and currently the built-in
>
> function
>
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about
>
> it:
>
> - I don't know if any other databases handle built-in functions
>
> like
>
> that.
>
> Maybe you can give some examples? IMHO, built-in functions are
>
> system
>
> info
>
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just libraries
>
> functions,
>
> and library functions can be adapted to catalog APIs or of some
>
> other
>
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by
>
> other
>
> approaches too. E.g. experimental functions can be taken good care
>
> of
>
> by
>
> documentations, annotations, etc
> - the proposal basically introduces some concept like a pluggable
>
> built-in
>
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g.
>
> how
>
> do
>
> you handle built-in functions in different modules but different
>
> names?
>
> In short, I'm not sure if it really stands and it looks like an
>
> overkill
>
> to me. I'd rather not go to that route. Related discussion can be
>
> on
>
> its
>
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a
>
> static
>
> list like in BuiltInFunctionDefinitions, a platform team should be
>
> able
>
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>    1. built-in functions
>    2. temporary functions
>    3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server, they
> also do not allow overwriting built-in functions.
>
> "I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer." <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Was that a typo?
>
> Besides, I'm not sure about the resolution order you proposed.
>
> Temporary
>
> functions have a lifespan over a session and are only visible to
>
> the
>
> session owner, they are unique to each user, and users create them
>
> on
>
> purpose to be the highest priority in order to overwrite system
>
> info
>
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same
>
> as a
>
> built-in function then? Since using that name in ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Jingsong,
>
> Re> 1.Hive built-in functions is an intermediate solution. So we
>
> should
>
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
>
> Yes, please see the doc.
>
> Re> 2.Non-flink built-in functions are easy for users to change
>
> their
>
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will
>
> lead
>
> to
>
> changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special
>
> catalog
>
> functions.
>
> Re> Another question is, does this fallback include all
>
> hive built-in functions? As far as I know, some hive functions
> have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super
> critical to the scope of this FLIP, thus I'd like to leave it to
>
> future
>
> efforts as a nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them
>
> both
>
> under
>
> "catalog" concept, by extending catalog function to make it have
>
> ability to
>
> have built-in catalog functions. Some benefits I can see from
>
> this
>
> approach:
>
> 1. We don't have to introduce new concept like external built-in
>
> functions.
>
> Actually I don't see a full story about how to treat a built-in
>
> functions, and it
>
> seems a little bit disrupt with catalog. As a result, you have
>
> to
>
> make
>
> some restriction
>
> like "hive built-in functions can only be used when current
>
> catalog
>
> is
>
> hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some
>
> part
>
> of
>
> the doc. I've modified those sections, and they are up to date
>
> now.
>
> In short, now built-in function of external systems are defined
>
> as
>
> a
>
> special kind of catalog function in Flink, and handled by Flink
>
> as
>
> following:
> - An external built-in function must be associated with a catalog
>
> for
>
> the purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous
>
> function
>
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a
>
> schema/database
>
> namespace
> - It goes thru the same instantiation logic as other user defined
> catalog functions in the external system
>
> Please take another look at the doc, and let me know if you have
>
> more
>
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>
> Hi Kurt,
>
> it should not affect the functions and operations we currently
>
> have
>
> in
>
> SQL. It just categorizes the available built-in functions. It is
>
> kind
>
> of
> an orthogonal concept to the catalog API but built-in functions
>
> deserve
>
> this special kind of treatment. CatalogFunction still fits
>
> perfectly
>
> in
>
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way
>
> but
>
> with
>
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
>
> Does this only affect the functions and operations we currently
>
> have
>
> in SQL
>
> and
> have no effect on tables, right? Looks like this is an
>
> orthogonal
>
> concept
>
> with Catalog?
> If the answer are both yes, then the catalog function will be a
>
> weird
>
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
>
> The way you proposed are basically the same as what Calcite
>
> does, I
>
> think
>
> we are in the same line.
>
> Best,
> Danny Chan
> On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>
> This sounds exactly as the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
>
> Thanks Bowen for bring up this topic, I think it’s a useful
>
> refactoring to make our function usage more user friendly.
>
> For the topic of how to organize the builtin operators and
>
> operators
>
> of Hive, here is a solution from Apache Calcite, the Calcite
>
> way
>
> is
>
> to make
>
> every dialect operators a “Library”, user can specify which
>
> libraries they
>
> want to use for a sql query. The builtin operators always
>
> comes
>
> as
>
> the
>
> first class objects and the others are used from the order
>
> they
>
> appears.
>
> Maybe you can take a reference.
>
> [1]
>
>
>
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's
>
> FunctionCatalog.
>
> It's critically helpful to improve function usability in
>
> SQL.
>
>
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with
>
> fully/partially
>
> qualified name
> - redefines function resolution order for ambiguous
>
> function
>
> reference
>
> - adds support for Hive's rich built-in functions (support
>
> for
>
> Hive
>
> user
>
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Xuefu,

Thank you for your answers.

Let me summarize my understanding. In principle we differ only on
whether a temporary function is identified by a 1-part name only or by a
3-part name only. I can reconfirm that if the community decides it
prefers the 1-part approach I will commit to that, with the assumption
that we will enforce ONLY 1-part function names. (We will parse the
identifier and throw an exception if a user tries to register e.g.
db.temp_func.)
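
To make the 1-part option concrete, here is a minimal Python sketch of the check described above: parse the identifier and reject anything that is not a single name. The names register_temp_function and InvalidTempFunctionName, and the plain dict used as a registry, are hypothetical and only for illustration. This is not Flink's API, and quoting/escaping of identifiers is ignored.

```python
class InvalidTempFunctionName(ValueError):
    """Hypothetical error for temp function names that are not 1-part."""


def parse_identifier(name):
    """Split a dotted identifier into parts (no quoting/escaping support)."""
    parts = name.split(".")
    if not all(parts):
        raise InvalidTempFunctionName("malformed identifier: %r" % name)
    return parts


def register_temp_function(registry, name, impl):
    """Register impl under name, accepting ONLY 1-part names."""
    parts = parse_identifier(name)
    if len(parts) != 1:
        raise InvalidTempFunctionName(
            "temporary functions must use a 1-part name, got %r" % name
        )
    registry[parts[0]] = impl


registry = {}
register_temp_function(registry, "temp_func", len)  # accepted
try:
    register_temp_function(registry, "db.temp_func", len)  # rejected
except InvalidTempFunctionName as err:
    print(err)
```

The point of the sketch is only that the restriction is enforced at registration time, so a 2- or 3-part temporary function never reaches the resolution logic.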

My preference, though, is the 3-part approach:

  * there are some functions that it makes no sense to override, e.g.
    CAST; moreover, I'm afraid that allowing overriding of such functions
    will lead to high inconsistency, similar to what I mentioned Spark has
  * you cannot shadow a fully-qualified function (relevant if a user fully
    qualifies his/her objects in a SQL query, which is often considered
    a good practice)
  * it does not differentiate between functions & temporary functions.
    Temporary functions just differ with regards to their life-cycle.
    The registration & usage is exactly the same.
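
As a rough illustration of the 3-part option, the following Python sketch resolves a reference in the order I proposed earlier in the thread: built-in functions, then temporary functions, then catalog functions, with both of the latter keyed by a full (catalog, database, name) path. The function name resolve and the dict-based registries are hypothetical; treating built-ins as reachable by 1-part names only is an assumption (that point is still under discussion), and a 2-part name is interpreted as current_catalog/db/function.

```python
def resolve(name, builtin_funcs, temp_functions, catalog_functions,
            current_catalog, current_db):
    """Resolve a function reference: built-in, then temporary, then catalog.

    temp_functions and catalog_functions are keyed by (catalog, db, name).
    """
    parts = name.split(".")
    if len(parts) == 1:
        # Only a 1-part reference can hit a built-in, and built-ins win.
        if name in builtin_funcs:
            return "built-in", builtin_funcs[name]
        path = (current_catalog, current_db, parts[0])
    elif len(parts) == 2:
        path = (current_catalog, parts[0], parts[1])
    elif len(parts) == 3:
        path = tuple(parts)
    else:
        raise ValueError("expected a 1-, 2- or 3-part name, got %r" % name)

    if path in temp_functions:
        return "temporary", temp_functions[path]
    if path in catalog_functions:
        return "catalog", catalog_functions[path]
    raise LookupError("function not found: %r" % name)


# Illustrative registries; the values stand in for function implementations.
builtin_funcs = {"cast": "flink-cast"}
temp = {("cat", "db", "cast"): "temp-cast",
        ("cat", "db", "explode"): "temp-explode"}
catalog = {("cat", "db", "explode"): "perm-explode",
           ("cat", "db", "other"): "perm-other"}

print(resolve("cast", builtin_funcs, temp, catalog, "cat", "db"))
print(resolve("cat.db.cast", builtin_funcs, temp, catalog, "cat", "db"))
```

In this sketch a temporary function registered as cat.db.cast never replaces the built-in CAST: the bare name still resolves to the built-in, and the temporary one is reachable only through its qualified path.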

> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple.

Both approaches are equally simple. I would even say the 3-part approach
is slightly simpler as it does not have to care about some special
built-in functions such as CAST.

I don't want to express my opinion on the differentiation between
built-in functions and "external" built-in functions in this thread as
it is rather orthogonal, but I also like the modular approach and I
definitely don't like the special syntax "cat::function". I think it's
better to stick to a standard or at least to proven solutions from
other systems.

Best,

Dawid

On 05/09/2019 10:12, Xuefu Z wrote:
> Hi Dawid,
>
> Thanks for sharing your thoughts and request for clarifications. I believe
> that I fully understood your proposal, which does have its merits. However,
> it's different from ours. Here are the answers to your questions:
>
> Re #1: yes, the temp functions in the proposal are global and have just
> one-part names, similar to built-in functions. Two- or three-part names are
> not allowed.
>
> Re #2: not applicable as two- or three-part names are disallowed.
>
> Re #3: same as above. Referencing external built-in functions is achieved
> either implicitly (only the built-in functions in the current catalogs are
> considered) or via special syntax such as cat::function. However, we are
> looking into the modular approach that Timo suggested with other feedback
> received from the community.
>
> Re #4: the resolution order goes like the following in our proposal:
>
> 1. temporary functions
> 2. built-in functions (including those augmented by add-on modules)
> 3. built-in functions in current catalog (this will not be needed if the
> special syntax "cat::function" is required)
> 4. functions in current catalog and db.
>
> If we go with the modular approach and make external built-in functions as
> an add-on module, the 2 and 3 above will be combined. In essence, the
> resolution order is equivalent in the two approaches.
>
> By the way, resolution order matters only for simple name reference. For
> names such as db.function (interpreted as current_cat/db/function) or
> cat.db.function, the reference is unambiguous, so no resolution is needed.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple. Additionally, the proposed resolution
> order allows temp function to shadow a built-in function, which is
> important (though not decisive) in our opinion.
>
> I started liking the modular approach as the resolution order will only
> include 1, 2, and 4, which is simpler and more generic. That's why I
> suggested we look more into this direction.
>
> Please let me know if there are further questions.
>
> Thanks,
> Xuefu
>
>
>
>
> On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>> Hi Xuefu,
>>
>> Just wanted to summarize my opinion on the one topic (temporary functions).
>>
>> My preference would be to make temporary functions always 3-part qualified
>> (as a result that would prohibit overriding built-in functions). Having
>> said that if the community decides that it's better to allow overriding
>> built-in functions I am fine with it and can commit to that decision.
>>
>> I wanted to ask if you could clarify a few points for me around that
>> option.
>>
>>    1. Would you enforce temporary functions to be always just a single
>>    name (without db & cat) as hive does, or would you allow also 3 or even 2
>>    part identifiers?
>>    2. Assuming 2/3-part paths. How would you register a function from a
>>    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>>    all functions named 'func' in all databases named 'db' in all catalogs? Or
>>    would you shadow only function 'func' in database 'db' in current catalog?
>>    3. This point is still under discussion, but was mentioned a few
>>    times, that maybe we want to enable syntax cat.func for "external built-in
>>    functions". How would that affect statement from previous point? Would
>>    'db.func' shadow "external built-in function" in 'db' catalog or user
>>    functions as in point 2? Or maybe both?
>>    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>>    paths. Would the function resolution be actually as follows?:
>>       1. temporary functions (1-part path)
>>       2. built-in functions
>>       3. temporary functions (2-part path)
>>       4. 2-part catalog functions a.k.a. "external built-in functions"
>>       (cat + func) - this is still under discussion, if we want that in the other
>>       focal point
>>       5. temporary functions (3-part path)
>>       6. 3-part catalog functions a.k.a. user functions
>>
>> I would be really grateful if you could answer those questions for me, thanks.
>>
>> BTW, Thank you all for a healthy discussion.
>>
>> Best,
>>
>> Dawid
>> On 04/09/2019 23:25, Xuefu Z wrote:
>>
>> Thanks all for sharing your thoughts. I think we have gathered some useful
>> initial feedback from this long discussion with a couple of focal points
>> sticking out.
>>
>>  We will go back to do more research and adapt our proposal. Once it's
>> ready, we will ask for a new round of review. If there is any disagreement,
>> we will start a new discussion thread on each rather than having a mega
>> discussion like this.
>>
>> Thanks to everyone for participating.
>>
>> Regards,
>> Xuefu
>>
>>
>> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> <bo...@gmail.com> wrote:
>>
>>
>> Let me try to summarize and conclude the long thread so far:
>>
>> 1. For order of temp function v.s. built-in function:
>>
>> I think Dawid's point that temp function should be of fully qualified path
>> is a better reasoning to back the newly proposed order, and i agree we
>> don't need to follow Hive/Spark.
>>
>> However, I'd rather not change fundamentals of temporary functions in this
>> FLIP. It belongs to a bigger story of how temporary objects should be
>> redefined and be handled uniformly - currently temporary tables and views
>> (those registered from TableEnv#registerTable()) behave differently from what
>> Dawid proposes for temp functions, and we need a FLIP just to unify their
>> APIs and behaviors.
>>
>> I agree that backward compatibility is not an issue w.r.t Jark's points.
>>
>> ***Seems we do have consensus that it's acceptable to prevent users
>> registering a temp function in the same name as a built-in function. To
>> help us move forward, I'd like to propose setting such a restraint on temp
>> functions in this FLIP to simplify the design and avoid disputes.*** It
>> will also leave rooms for improvements in the future.
>>
>>
>> 2. For Hive built-in function:
>>
>> Thanks Timo for providing the Presto and Postgres examples. I feel modular
>> built-in functions can be a good fit for the geo and ml example as a native
>> Flink extension, but not sure if it fits well with external integrations.
>> Anyway, I think modular built-in functions is a bigger story and can be on
>> its own thread too, and our proposal doesn't prevent Flink from doing that
>> in the future.
>>
>> ***Seems we have consensus that users should be able to use built-in
>> functions of Hive or other external systems in SQL explicitly and
>> deterministically regardless of Flink built-in functions and the potential
>> modular built-in functions, via some new syntax like "mycat::func"? If so,
>> I'd like to propose removing Hive built-in functions from ambiguous
>> function resolution order, and empower users with such a syntax. This way
>> we sacrifice a little convenience for certainty***
>>
>>
>> What do you think?
>>
>> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org> <dw...@apache.org>
>> wrote:
>>
>>
>> Hi,
>>
>> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
>> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
>> very inconsistent in that manner (spark being way worse on that).
>>
>> Hive:
>>
>> You cannot overwrite all the built-in functions. I could overwrite most of
>> the functions I tried e.g. length, e, pi, round, rtrim, but there are
>> functions I cannot overwrite e.g. CAST, ARRAY I get:
>>
>>
>> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>>
>> What is interesting is that I cannot overwrite *array*, but I can overwrite
>> *map* or *struct*. Though hive behaves reasonably well if I manage to
>> overwrite a function. When I drop the temporary function the native
>> function is still available.
>>
>> Spark:
>>
>> Spark's behavior imho is super bad.
>>
>> Theoretically I could overwrite all functions. I was able e.g. to
>> overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY
>> FUNCTION syntax. Otherwise I get an exception that a function already
>> exists. However when I used the CAST function in a query it used the
>> native, built-in one.
>>
>> When I overwrote current_date() function, it was used in a query, but it
>> completely replaces the built-in function and I can no longer use the
>> native function in any way. I cannot also drop the temporary function. I
>> get:
>>
>> *    Error in query: Cannot drop native function 'current_date';*
>>
>> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
>> with a database. Temporary functions are always represented as a single
>> name.
>>
>> In my opinion neither of the systems have consistent behavior. Generally
>> speaking I think overwriting any system provided functions is just
>> dangerous.
>>
>> Regarding Jark's concerns. Such functions would be registered in a current
>> catalog/database schema, so a user could still use their own function, but
>> would have to fully qualify the function (because built-in functions take
>> precedence). Moreover users would have the same problem with permanent
>> functions. Imagine a user has a permanent function 'cat.db.explode'. In
>> 1.9 the user could use just the 'explode' function as long as the 'cat' &
>> 'db' were the default catalog & database. If we introduce an 'explode'
>> built-in function in 1.10, the user has to fully qualify the function.
>>
>> Best,
>>
>> Dawid
>> On 04/09/2019 15:19, Timo Walther wrote:
>>
>> Hi all,
>>
>> thanks for the healthy discussion. It is already a very long discussion
>> with a lot of text. So I will just post my opinion to a couple of
>> statements:
>>
>>
>> Hive built-in functions are not part of Flink built-in functions, they
>> are catalog functions
>>
>> That is not entirely true. Correct me if I'm wrong but I think Hive
>> built-in functions are also not catalog functions. They are not stored in
>> every Hive metastore catalog that is freshly created but are a set of
>> functions that are listed somewhere and made available.
>>
>>
>> ambiguous functions reference just shouldn't be resolved to a different
>> catalog
>>
>> I agree. They should not be resolved to a different catalog. That's why I
>> am suggesting to split the concept of built-in functions and catalog lookup
>> semantics.
>>
>>
>> I don't know if any other databases handle built-in functions like that
>>
>> What I called "module" is:
>> - Extension in Postgres [1]
>> - Plugin in Presto [2]
>>
>> Btw. Presto even mentions example modules that are similar to the ones
>> that we will introduce in the near future both for ML and System XYZ
>> compatibility:
>> "See either the presto-ml module for machine learning functions or the
>> presto-teradata-functions module for Teradata-compatible functions, both in
>> the root of the Presto source."
>>
>>
>> functions should be either built-in already or just libraries functions,
>> and library functions can be adapted to catalog APIs or of some other
>> syntax to use
>>
>> Regarding "built-in already", of course we can add a lot of functions as
>> built-ins but we will end up in a dependency hell in the near future if we
>> don't introduce a pluggable approach. Library functions is what you also
>> suggest but storing them in a catalog means to always fully qualify them or
>> modifying the existing catalog design that was inspired by the standard.
>>
>> I don't think "it brings in even more complicated scenarios to the
>> design", it just does clear separation of concerns. Integrating the
>> functionality into the current design makes the catalog API more
>> complicated.
>>
>>
>> why would users name a temporary function the same as a built-in
>> function then?
>>
>> Because you never know what users do. If they don't, my suggested
>> resolution order should not be a problem, right?
>>
>>
>> I don't think hive functions deserves be a function module
>>
>> Our goal is not to create a Hive clone. We need to think forward and Hive
>> is just one of many systems that we can support. Not every built-in
>> function behaves and will behave exactly like Hive.
>>
>>
>> regarding temporary functions, there are few systems that support it
>>
>> IMHO Spark and Hive are not always the best examples for consistent
>> design. Systems like Postgres, Presto, or SQL Server should be used as a
>> reference. I don't think that a user can overwrite a built-in function
>> there.
>>
>> Regards,
>> Timo
>>
>> [1] https://www.postgresql.org/docs/10/extend-extensions.html
>> [2] https://prestodb.github.io/docs/current/develop/functions.html
>>
>>
>> On 04.09.19 13:44, Jark Wu wrote:
>>
>> Hi all,
>>
>> Regarding #1 temp function <> built-in function and naming.
>> I'm fine with temp functions should precede built-in function and can
>> override built-in functions (we already support to override built-in
>> function in 1.9).
>> If we don't allow the same name as a built-in function, I'm afraid we will
>> have compatibility issues in the future.
>> Say users register a user defined function named "explode" in 1.9, and we
>> support a built-in "explode" function in 1.10.
>> Then the user's jobs which call the registered "explode" function in 1.9
>> will all fail in 1.10 because of naming conflict.
>>
>> Regarding #2 "External" built-in functions.
>> I think if we store external built-in functions in catalog, then
>> "hive1::sqrt" is a good way to go.
>> However, I would prefer to support a discovery mechanism (e.g. SPI) for
>> built-in functions as Timo suggested above.
>> This gives us the flexibility to add Hive or MySQL or Geo or whatever
>> function set as built-in functions in an easy way.
>>
>> Best,
>> Jark
>>
>> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> <us...@gmail.com> <us...@gmail.com> wrote:
>>
>> Hi Dawid,
>>
>> Thank you for sharing your findings. It seems to me that there is no SQL
>> standard regarding temporary functions. There are few systems that support
>> it. Here is what I have found:
>>
>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
>> 2. Spark: basically follows Hive
>> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
>> behavior.
>> (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>>
>> Because of the lack of a standard, it's perfectly fine for Flink to define
>> whatever it sees appropriate. Thus, your proposal (no overwriting and must
>> have DB as holder) is one option. The advantage is simplicity. The
>> downside is the deviation from Hive, which is popular and the de facto
>> standard in the big data world.
>>
>> However, I don't think we have to follow Hive. More importantly, we need a
>> consensus. I have no objection if your proposal is generally agreed upon.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dwysakowicz@apache.org
>> <dw...@apache.org> <dw...@apache.org>
>> wrote:
>>
>> Hi all,
>>
>> Just an opinion on the built-in <> temporary functions resolution and
>> NAMING issue. I think we should not allow overriding the built-in
>> functions, as this may pose serious issues and to be honest is rather
>> not feasible and would require major rework. What happens if a user
>> wants to override CAST? Calls to that function are generated at
>> different layers of the stack that unfortunately does not always go
>> through the Catalog API (at least yet). Moreover from what I've checked
>> no other systems allow overriding the built-in functions. All the
>> systems I've checked so far register temporary functions in a
>> database/schema (either special database for temporary functions, or
>> just current database). What I would suggest is to always register
>> temporary functions with a 3 part identifier. The same way as tables,
>> views etc. This effectively means you cannot override built-in
>> functions. With such approach it is natural that the temporary functions
>> end up a step lower in the resolution order:
>>
>> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>>
>> 2. temporary functions (always 3 part path)
>>
>> 3. catalog functions (always 3 part path)
>>
>> Let me know what do you think.
>>
>> Best,
>>
>> Dawid
>>
>> On 04/09/2019 06:13, Bowen Li wrote:
>>
>> Hi,
>>
>> I agree with Xuefu that the main controversial points are mainly the two
>> places. My thoughts on them:
>>
>> 1) Determinism of referencing Hive built-in functions. We can either remove
>> Hive built-in functions from ambiguous function resolution and require
>> users to use special syntax for their qualified names, or add a config flag
>> to catalog constructor/yaml for turning on and off Hive built-in functions
>> with the flag set to 'false' by default and proper doc added to help users
>> make their decisions.
>>
>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
>> resolution order. We believe Flink temp functions should precede Flink
>> built-in functions, and I have presented my reasons. Just in case we
>> cannot reach an agreement, I propose forbidding users from registering temp
>> functions with the same name as a built-in function, like MySQL's approach,
>> for the moment. It won't have any performance concern, since built-in
>> functions are all in memory and thus the cost of a name check will be really
>> trivial.
>>
>>
>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> <us...@gmail.com> <us...@gmail.com> wrote:
>>
>> From what I have seen, there are a couple of focal disagreements:
>>
>> 1. Resolution order: temp function --> flink built-in function --> catalog
>> function vs flink built-in function --> temp function --> catalog function.
>> 2. "External" built-in functions: how to treat built-in functions in
>> external system and how users reference them
>>
>> For #1, I agree with Bowen that temp function needs to be at the highest
>> priority because that's how a user might overwrite a built-in function
>> without referencing a persistent, overwriting catalog function with a fully
>> qualified name. Putting built-in functions at the highest priority
>> eliminates that usage.
>>
>> For #2, I saw a general agreement that referencing "external" built-in
>> functions such as those in Hive needs to be explicit and deterministic even
>> though different approaches are proposed. To limit the scope and simplify the
>> usage, it seems to make sense to me to introduce special syntax for users to
>> explicitly reference an external built-in function such as hive1::sqrt or
>> hive1._built_in.sqrt. This is a DML syntax matching nicely the Catalog API call
>> hive1.getFunction(ObjectPath functionName) where the database name is
>> absent for built-in functions available in that catalog hive1. I understand
>> that Bowen's original proposal was trying to avoid this, but this could
>> turn out to be a clean and simple solution.
>>
>> (Timo's modular approach is a great way to "expand" Flink's built-in function
>> set, which seems orthogonal and complementary to this, and which could be
>> tackled in further future work.)
>>
>> I'd be happy to hear further thoughts on the two points.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> <yk...@gmail.com> <yk...@gmail.com> wrote:
>>
>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
>> same as Bowen's. But after thinking about it, I'm currently leaning towards
>> Timo's suggestion.
>>
>> The reason is backward compatibility. If we follow Bowen's approach, let's
>> say we first look up a function in Flink's built-in functions, and then in
>> hive's built-in functions. For example, `foo` is not supported by Flink, but
>> hive has such a built-in function. So the user will have hive's behavior for
>> function `foo`. And in the next release, Flink realizes this is a very popular
>> function and adds it to Flink's built-in functions, but with different
>> behavior from hive's. So in the next release, the behavior changes.
>>
>> With Timo's approach, IIUC the user has to tell the framework explicitly what
>> kind of built-in functions they would like to use. They can just tell the
>> framework to abandon Flink's built-in functions, and use hive's instead. The
>> user can only choose between them, but not use them at the same time. I think
>> this approach is more predictable.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> <bo...@gmail.com> <bo...@gmail.com> wrote:
>>
>> Hi all,
>>
>> Thanks for the feedback. Just a kindly reminder that the [Proposal]
>>
>> section
>>
>> in the google doc was updated, please take a look first and let me
>>
>> know
>>
>> if
>>
>> you have more questions.
>>
>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> <bo...@gmail.com> <bo...@gmail.com>
>>
>> wrote:
>>
>> Hi Timo,
>>
>> Re> 1) We should not have the restriction "hive built-in functions can only
>> be used when current catalog is hive catalog". Switching a catalog
>> should only have implications on the cat.db.object resolution but not
>> functions. It would be quite convenient for users to use Hive built-ins
>> even if they use a Confluent schema registry or just the in-memory catalog.
>>
>> There might be a misunderstanding here.
>>
>> First of all, Hive built-in functions are not part of Flink
>>
>> built-in
>>
>> functions, they are catalog functions, thus if the current catalog
>>
>> is
>>
>> not a
>>
>> HiveCatalog but, say, a schema registry catalog, ambiguous
>>
>> functions
>>
>> reference just shouldn't be resolved to a different catalog.
>>
>> Second, Hive built-in functions can potentially be referenced
>>
>> across
>>
>> catalog, but it doesn't have db namespace and we currently just
>>
>> don't
>>
>> have
>>
>> a SQL syntax for it. It can be enabled when such a SQL syntax is
>>
>> defined,
>>
>> e.g. "catalog::function", but it's out of scope of this FLIP.
>>
>> 2) I would propose to have separate concepts for catalog and
>>
>> built-in
>>
>> functions. In particular it would be nice to modularize built-in
>> functions. Some built-in functions are very crucial (like AS, CAST,
>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>>
>> maybe
>>
>> we add more experimental functions in the future or function for
>>
>> some
>>
>> special application area (Geo functions, ML functions). A data
>>
>> platform
>>
>> team might not want to make every built-in function available. Or a
>> function module like ML functions is in a different Maven module.
>>
>> I think this is orthogonal to this FLIP, especially we don't have
>>
>> the
>>
>> "external built-in functions" anymore and currently the built-in
>>
>> function
>>
>> category remains untouched.
>>
>> But just to share some thoughts on the proposal, I'm not sure about
>>
>> it:
>>
>> - I don't know if any other databases handle built-in functions
>>
>> like
>>
>> that.
>>
>> Maybe you can give some examples? IMHO, built-in functions are
>>
>> system
>>
>> info
>>
>> and should be deterministic, not depending on loaded libraries. Geo
>> functions should be either built-in already or just libraries
>>
>> functions,
>>
>> and library functions can be adapted to catalog APIs or of some
>>
>> other
>>
>> syntax to use
>> - I don't know if all use cases stand, and many can be achieved by
>>
>> other
>>
>> approaches too. E.g. experimental functions can be taken good care
>>
>> of
>>
>> by
>>
>> documentations, annotations, etc
>> - the proposal basically introduces some concept like a pluggable
>>
>> built-in
>>
>> function catalog, despite the already existing catalog APIs
>> - it brings in even more complicated scenarios to the design. E.g.
>>
>> how
>>
>> do
>>
>> you handle built-in functions in different modules but different
>>
>> names?
>>
>> In short, I'm not sure if it really stands and it looks like an
>>
>> overkill
>>
>> to me. I'd rather not go to that route. Related discussion can be
>>
>> on
>>
>> its
>>
>> own thread.
>>
>> 3) Following the suggestion above, we can have a separate discovery
>> mechanism for built-in functions. Instead of just going through a
>>
>> static
>>
>> list like in BuiltInFunctionDefinitions, a platform team should be
>>
>> able
>>
>> to select function modules like
>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
>> HiveFunctions) or via service discovery;
>>
>> Same as above. I'll leave it to its own thread.
>>
>> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
>> that we should unify built-in function (external or internal) under a
>> common layer. However, the resolution order should be:
>>    1. built-in functions
>>    2. temporary functions
>>    3. regular catalog resolution logic
>> Otherwise a temporary function could cause clashes with Flink's built-in
>> functions. If you take a look at other vendors, like SQL Server, they
>> also do not allow overwriting built-in functions.
>>
>> ”I agree with Kurt that we should unify built-in function (external
>>
>> or
>>
>> internal) under a common layer.“ <- I don't think this is what Kurt
>>
>> means.
>>
>> Kurt and I are in favor of unifying built-in functions of external
>>
>> systems
>>
>> and catalog functions. Did you type a mistake?
>>
>> Besides, I'm not sure about the resolution order you proposed.
>>
>> Temporary
>>
>> functions have a lifespan over a session and are only visible to
>>
>> the
>>
>> session owner, they are unique to each user, and users create them
>>
>> on
>>
>> purpose to be the highest priority in order to overwrite system
>>
>> info
>>
>> (built-in functions in this case).
>>
>> In your case, why would users name a temporary function the same
>>
>> as a
>>
>> built-in function then? Since using that name in ambiguous function
>> reference will always be resolved to built-in functions, creating a
>> same-named temp function would be meaningless in the end.
>>
>>
>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> <bo...@gmail.com> <bo...@gmail.com>
>>
>> wrote:
>>
>> Hi Jingsong,
>>
>> Re> 1.Hive built-in functions is an intermediate solution. So we
>>
>> should
>>
>> not introduce interfaces to influence the framework. To make
>> Flink itself more powerful, we should implement the functions
>> we need to add.
>>
>> Yes, please see the doc.
>>
>> Re> 2.Non-flink built-in functions are easy for users to change
>>
>> their
>>
>> behavior. If we support some flink built-in functions in the
>> future but act differently from non-flink built-in, this will
>>
>> lead
>>
>> to
>>
>> changes in user behavior.
>>
>> There's no such concept as "external built-in functions" any more.
>> Built-in functions of external systems will be treated as special
>>
>> catalog
>>
>> functions.
>>
>> Re> Another question is, does this fallback include all
>>
>> hive built-in functions? As far as I know, some hive functions
>> have some hacky. If possible, can we start with a white list?
>> Once we implement some functions to flink built-in, we can
>> also update the whitelist.
>>
>> Yes, that's something we thought of too. I don't think it's super
>> critical to the scope of this FLIP, thus I'd like to leave it to
>>
>> future
>>
>> efforts as a nice-to-have feature.
>>
>>
>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> <bo...@gmail.com> <bo...@gmail.com>
>>
>> wrote:
>>
>> Hi Kurt,
>>
>> Re: > What I want to propose is we can merge #3 and #4, make them
>>
>> both
>>
>> under
>>
>> "catalog" concept, by extending catalog function to make it have
>>
>> ability to
>>
>> have built-in catalog functions. Some benefits I can see from
>>
>> this
>>
>> approach:
>>
>> 1. We don't have to introduce new concept like external built-in
>>
>> functions.
>>
>> Actually I don't see a full story about how to treat a built-in
>>
>> functions, and it
>>
>> seems a little bit disrupt with catalog. As a result, you have
>>
>> to
>>
>> make
>>
>> some restriction
>>
>> like "hive built-in functions can only be used when current
>>
>> catalog
>>
>> is
>>
>> hive catalog".
>>
>> Yes, I've unified #3 and #4 but it seems I didn't update some
>>
>> part
>>
>> of
>>
>> the doc. I've modified those sections, and they are up to date
>>
>> now.
>>
>> In short, now built-in function of external systems are defined
>>
>> as
>>
>> a
>>
>> special kind of catalog function in Flink, and handled by Flink
>>
>> as
>>
>> following:
>> - An external built-in function must be associated with a catalog
>>
>> for
>>
>> the purpose of decoupling flink-table and external systems.
>> - It always resides in front of catalog functions in ambiguous
>>
>> function
>>
>> reference order, just like in its own external system
>> - It is a special catalog function that doesn’t have a
>>
>> schema/database
>>
>> namespace
>> - It goes thru the same instantiation logic as other user defined
>> catalog functions in the external system
>>
>> Please take another look at the doc, and let me know if you have
>>
>> more
>>
>> questions.
>>
>>
>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> <tw...@apache.org> <tw...@apache.org>
>>
>> wrote:
>>
>> Hi Kurt,
>>
>> it should not affect the functions and operations we currently
>>
>> have
>>
>> in
>>
>> SQL. It just categorizes the available built-in functions. It is
>>
>> kind
>>
>> of
>> an orthogonal concept to the catalog API but built-in functions
>>
>> deserve
>>
>> this special kind of treatment. CatalogFunction still fits
>>
>> perfectly
>>
>> in
>>
>> there because the regular catalog object resolution logic is not
>> affected. So tables and functions are resolved in the same way
>>
>> but
>>
>> with
>>
>> built-in functions that have priority as in the original design.
>>
>> Regards,
>> Timo
>>
>>
>> On 03.09.19 15:26, Kurt Young wrote:
>>
>> Does this only affect the functions and operations we currently
>>
>> have
>>
>> in SQL
>>
>> and
>> have no effect on tables, right? Looks like this is an
>>
>> orthogonal
>>
>> concept
>>
>> with Catalog?
>> If the answer are both yes, then the catalog function will be a
>>
>> weird
>>
>> concept?
>>
>> Best,
>> Kurt
>>
>>
>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
>> yuzhao.cyz@gmail.com
>>
>> wrote:
>>
>> The way you proposed are basically the same as what Calcite
>>
>> does, I
>>
>> think
>>
>> we are in the same line.
>>
>> Best,
>> Danny Chan
>> 在 2019年9月3日 +0800 PM7:57,Timo Walther <twalthr@apache.org
>>
>> ,写道:
>>
>> This sounds exactly as the module approach I mentioned, no?
>>
>> Regards,
>> Timo
>>
>> On 03.09.19 13:42, Danny Chan wrote:
>>
>> Thanks Bowen for bring up this topic, I think it’s a useful
>>
>> refactoring to make our function usage more user friendly.
>>
>> For the topic of how to organize the builtin operators and
>>
>> operators
>>
>> of Hive, here is a solution from Apache Calcite, the Calcite
>>
>> way
>>
>> is
>>
>> to make
>>
>> every dialect operators a “Library”, user can specify which
>>
>> libraries they
>>
>> want to use for a sql query. The builtin operators always
>>
>> comes
>>
>> as
>>
>> the
>>
>> first class objects and the others are used from the order
>>
>> they
>>
>> appears.
>>
>> Maybe you can take a reference.
>>
>> [1]
>>
>>
>>
>>
>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>
>> Best,
>> Danny Chan
>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bowenli86@gmail.com
>>
>> ,写道:
>>
>> Hi folks,
>>
>> I'd like to kick off a discussion on reworking Flink's
>>
>> FunctionCatalog.
>>
>> It's critically helpful to improve function usability in
>>
>> SQL.
>>
>>
>>
>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>
>> In short, it:
>> - adds support for precise function reference with
>>
>> fully/partially
>>
>> qualified name
>> - redefines function resolution order for ambiguous
>>
>> function
>>
>> reference
>>
>> - adds support for Hive's rich built-in functions (support
>>
>> for
>>
>> Hive
>>
>> user
>>
>> defined functions was already added in 1.9.0)
>> - clarifies the concept of temporary functions
>>
>> Would love to hear your thoughts.
>>
>> Bowen
>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>
>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Dawid,

Thanks for sharing your thoughts and requesting clarifications. I believe
that I fully understand your proposal, which does have its merits. However,
it's different from ours. Here are the answers to your questions:

Re #1: yes, the temp functions in the proposal are global and have just
one-part names, similar to built-in functions. Two- or three-part names are
not allowed.

Re #2: not applicable as two- or three-part names are disallowed.

Re #3: same as above. Referencing external built-in functions is achieved
either implicitly (only the built-in functions in the current catalog are
considered) or via special syntax such as cat::function. However, we are
looking into the modular approach that Timo suggested, along with other
feedback received from the community.

Re #4: the resolution order goes like the following in our proposal:

1. temporary functions
2. built-in functions (including those augmented by add-on modules)
3. built-in functions in current catalog (this will not be needed if the
special syntax "cat::function" is required)
4. functions in current catalog and db.

If we go with the modular approach and make external built-in functions
an add-on module, steps 2 and 3 above will be combined. In essence, the
resolution order is equivalent in the two approaches.

By the way, resolution order matters only for simple name references. For
names such as db.function (interpreted as current_cat/db/function) or
cat.db.function, the reference is unambiguous, so no resolution is needed.
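
To make the order above concrete, here is a minimal sketch in Python (not Flink code). The function `resolve_function`, the dict-based registries, and all function names in the example data are hypothetical and exist only for illustration. It assumes that a simple name walks steps 1 to 4 in order, while a two- or three-part name goes straight to the catalog.

```python
def resolve_function(name, temp, builtin, catalog_builtin, catalogs,
                     current_cat, current_db):
    """Resolve a function reference to its implementation.

    Hypothetical model: every registry is a plain dict.
      temp            : {func: impl}               temporary functions
      builtin         : {func: impl}               Flink built-in functions
      catalog_builtin : {cat: {func: impl}}        built-ins of an external catalog
      catalogs        : {cat: {db: {func: impl}}}  regular catalog functions
    """
    parts = name.split(".")
    if len(parts) == 3:              # cat.db.function: unambiguous
        cat, db, func = parts
        return catalogs[cat][db][func]
    if len(parts) == 2:              # db.function -> current_cat/db/function
        db, func = parts
        return catalogs[current_cat][db][func]

    # Simple name: walk the four steps in order, first match wins.
    steps = (
        temp,                                                # 1. temporary functions
        builtin,                                             # 2. built-in functions
        catalog_builtin.get(current_cat, {}),                # 3. built-ins of current catalog
        catalogs.get(current_cat, {}).get(current_db, {}),   # 4. current catalog and db
    )
    for registry in steps:
        if name in registry:
            return registry[name]
    raise KeyError(f"function not found: {name}")


# Example data; all names are made up for illustration.
temp = {"sqrt": "temp.sqrt"}
builtin = {"sqrt": "flink.sqrt", "concat": "flink.concat"}
catalog_builtin = {"hive1": {"concat": "hive1::concat", "nvl": "hive1::nvl"}}
catalogs = {"hive1": {"db1": {"nvl": "hive1.db1.nvl", "my_udf": "hive1.db1.my_udf"}}}


def lookup(name):
    return resolve_function(name, temp, builtin, catalog_builtin, catalogs,
                            current_cat="hive1", current_db="db1")


print(lookup("sqrt"))     # the temp function shadows the built-in
print(lookup("concat"))   # the Flink built-in wins over the catalog's built-in
print(lookup("nvl"))      # the catalog's built-in wins over the catalog function
print(lookup("db1.nvl"))  # a qualified name bypasses the ambiguous order
```

In this model the only way to reach a shadowed catalog function is to qualify it, which is the trade-off discussed earlier in the thread.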

As can be seen, the proposed concept regarding temp functions and
function resolution is quite simple. Additionally, the proposed resolution
order allows a temp function to shadow a built-in function, which is
important (though not decisive) in our opinion.

I have started to like the modular approach, as the resolution order will
then only include 1, 2, and 4, which is simpler and more generic. That's why I
suggested we look more into this direction.
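
A rough sketch of that shorter order, again in Python and purely illustrative. The function `resolve_simple_name`, the module names, and the rule that earlier-loaded modules take precedence are assumptions; the thread has not settled how modules would be ordered.

```python
def resolve_simple_name(func, temp, modules, current_db_functions):
    """Resolve a simple (one-part) name when built-ins come from modules.

    Hypothetical model:
      temp                 : {func: impl}  temporary functions
      modules              : list of (module_name, {func: impl}) in load order
      current_db_functions : {func: impl}  functions in current catalog and db
    """
    if func in temp:                          # 1. temporary functions
        return temp[func]
    for _module_name, functions in modules:   # 2. built-ins from core and add-on modules
        if func in functions:
            return functions[func]
    if func in current_db_functions:          # 4. functions in current catalog and db
        return current_db_functions[func]
    raise KeyError(f"function not found: {func}")


# Example data; module and function names are made up for illustration.
loaded_modules = [
    ("core", {"sqrt": "core.sqrt"}),
    ("hive", {"sqrt": "hive.sqrt", "nvl": "hive.nvl"}),
]
session_temp = {}
db_functions = {"nvl": "db.nvl", "my_udf": "db.my_udf"}

print(resolve_simple_name("sqrt", session_temp, loaded_modules, db_functions))
print(resolve_simple_name("nvl", session_temp, loaded_modules, db_functions))
print(resolve_simple_name("my_udf", session_temp, loaded_modules, db_functions))
```

Step 3 disappears here because the external system's built-ins are just another entry in the module list.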

Please let me know if there are further questions.

Thanks,
Xuefu




On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi Xuefu,
>
> Just wanted to summarize my opinion on the one topic (temporary functions).
>
> My preference would be to make temporary functions always 3-part qualified
> (as a result that would prohibit overriding built-in functions). Having
> said that if the community decides that it's better to allow overriding
> built-in functions I am fine with it and can commit to that decision.
>
> I wanted to ask if you could clarify a few points for me around that
> option.
>
>    1. Would you enforce temporary functions to be always just a single
>    name (without db & cat) as hive does, or would you allow also 3 or even 2
>    part identifiers?
>    2. Assuming 2/3-part paths. How would you register a function from a
>    following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>    all functions named 'func' in all databases named 'db' in all catalogs? Or
>    would you shadow only function 'func' in database 'db' in current catalog?
>    3. This point is still under discussion, but was mentioned a few
>    times, that maybe we want to enable syntax cat.func for "external built-in
>    functions". How would that affect statement from previous point? Would
>    'db.func' shadow "external built-in function" in 'db' catalog or user
>    functions as in point 2? Or maybe both?
>    4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>    paths. Would the function resolution be actually as follows?:
>       1. temporary functions (1-part path)
>       2. built-in functions
>       3. temporary functions (2-part path)
>       4. 2-part catalog functions a.k.a. "external built-in functions"
>       (cat + func) - this is still under discussion, if we want that in the other
>       focal point
>       5. temporary functions (3-part path)
>       6. 3-part catalog functions a.k.a. user functions
>
> I would be really grateful if you could answer those questions for me, thanks.
>
> BTW, Thank you all for a healthy discussion.
>
> Best,
>
> Dawid
> On 04/09/2019 23:25, Xuefu Z wrote:
>
> Thanks, all, for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>
>
> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp functions should have a fully qualified path
> is better reasoning to back the newly proposed order, and I agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change the fundamentals of temporary functions in this
> FLIP. It belongs to a bigger story of how temporary objects should be
> redefined and handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave differently from what
> Dawid proposes for temp functions, and we need a FLIP just to unify their
> APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t Jark's points.
>
> ***It seems we do have consensus that it's acceptable to prevent users from
> registering a temp function with the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restriction on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave room for improvements in the future.
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ML examples as a native
> Flink extension, but I'm not sure they fit well with external integrations.
> Anyway, I think modular built-in functions are a bigger story and can be on
> their own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from ambiguous
> function resolution order, and empower users with such a syntax. This way
> we sacrifice a little convenience for certainty***
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>
> Hi,
>
> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> very inconsistent in that manner (spark being way worse on that).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most of
> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite, e.g. CAST, ARRAY. I get:
>
>
> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*, but I can overwrite
> *map* or *struct*. Hive does behave reasonably well, though, if I manage to
> overwrite a function. When I drop the temporary function the native
> function is still available.
>
> Spark:
>
> Spark's behavior imho is super bad.
>
> Theoretically I could overwrite all functions. I was able, e.g., to
> overwrite the CAST function, though I had to use the CREATE OR REPLACE
> TEMPORARY FUNCTION syntax. Otherwise I get an exception that a function
> already exists. However, when I used the CAST function in a query it used the
> native, built-in one.
>
> When I overwrote the current_date() function, it was used in a query, but it
> completely replaced the built-in function and I could no longer use the
> native function in any way. I also cannot drop the temporary function. I
> get:
>
> *    Error in query: Cannot drop native function 'current_date';*
>
> Additional note: neither system allows creating TEMPORARY FUNCTIONS
> with a database. Temporary functions are always represented as a single
> name.
>
> In my opinion neither of the systems has consistent behavior. Generally
> speaking, I think overwriting any system-provided functions is just
> dangerous.
>
> Regarding Jark's concerns. Such functions would be registered in the current
> catalog/database schema, so a user could still use their own function, but
> would have to fully qualify the function (because built-in functions take
> precedence). Moreover, users would have the same problem with permanent
> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> 1.9 the user could use just the 'explode' function as long as 'cat' &
> 'db' were the default catalog & database. If we introduce an 'explode'
> built-in function in 1.10, the user has to fully qualify the function.
>
> Best,
>
> Dawid
> On 04/09/2019 15:19, Timo Walther wrote:
>
> Hi all,
>
> thanks for the healthy discussion. It is already a very long discussion
> with a lot of text. So I will just post my opinion to a couple of
> statements:
>
>
> Hive built-in functions are not part of Flink built-in functions, they
> are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored in
> every Hive metastore catalog that is freshly created but are a set of
> functions that are listed somewhere and made available.
>
>
> ambiguous functions reference just shouldn't be resolved to a different
> catalog
>
> I agree. They should not be resolved to a different catalog. That's why I
> am suggesting that we split the concept of built-in functions and catalog
> lookup semantics.
>
>
> I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions, both in
> the root of the Presto source."
>
>
> functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions as
> built-ins but we will end up in dependency hell in the near future if we
> don't introduce a pluggable approach. Library functions are what you also
> suggest, but storing them in a catalog means always fully qualifying them or
> modifying the existing catalog design that was inspired by the standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design"; it just separates concerns clearly. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
>
> why would users name a temporary function the same as a built-in
> function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
>
> I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and Hive
> is just one of many systems that we can support. Not every built-in
> function behaves and will behave exactly like Hive.
>
>
> regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as a
> reference. I don't think that a user can overwrite a built-in function
> there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions preceding built-in functions and being able to
> override them (we already support overriding built-in functions in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user-defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the users' jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of the naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>
> Hi Dawid,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> them. Here is what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive
> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior. (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of the lack of a standard, it's perfectly fine for Flink to define
> whatever it sees as appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity. The downside
> is the deviation from Hive, which is popular and the de facto standard in the
> big data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately do not always go
> through the Catalog API (at least not yet). Moreover, from what I've checked,
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either a special database for temporary functions, or
> just the current database). What I would suggest is to always register
> temporary functions with a 3-part identifier, the same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such an approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the main controversial points are those two
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to the catalog constructor/yaml for turning Hive built-in functions on and
> off, with the flag set to 'false' by default and proper docs added to help
> users make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't raise any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be really
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function --> catalog function.
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp functions need to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw general agreement that referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic, even
> though different approaches are proposed. To limit the scope and simplify the
> usage, it seems to make sense to introduce special syntax for users to
> explicitly reference an external built-in function, such as hive1::sqrt or
> hive1._built_in.sqrt. This is a DML syntax that nicely matches the Catalog API
> call hive1.getFunction(ObjectPath functionName), where the database name is
> absent for built-in functions available in that catalog hive1. I understand
> that Bowen's original proposal was trying to avoid this, but this could
> turn out to be a clean and simple solution.
>
> (Timo's modular approach is a great way to "expand" Flink's built-in function
> set, which seems orthogonal and complementary to this, and which could be
> tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same as Bowen's. But after thinking about it, I'm currently leaning toward
> Timo's suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we first look for a function in Flink's built-in functions, and then in
> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
> such a built-in function. So the user will get Hive's behavior for function
> `foo`. And in the next release, Flink realizes this is a very popular
> function and adds it to Flink's built-in functions, but with different
> behavior from Hive's. So in the next release, the behavior changes.
>
> With Timo's approach, IIUC the user has to tell the framework explicitly what
> kind of built-in functions they would like to use. They can just tell the
> framework to abandon Flink's built-in functions and use Hive's instead. The
> user can only choose between them, but not use them at the same time. I think
> this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated, please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous function
> references just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalogs, but they don't have a db namespace and we currently just don't have
> a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
> e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or functions for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially since we don't have the
> "external built-in functions" anymore and currently the built-in function
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about it:
> - I don't know if any other databases handle built-in functions like that.
> Maybe you can give some examples? IMHO, built-in functions are system info
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just library functions,
> and library functions can be adapted to catalog APIs or some other
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by other
> approaches too. E.g. experimental functions can be taken good care of by
> documentation, annotations, etc.
> - the proposal basically introduces some concept like a pluggable built-in
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g. how do
> you handle built-in functions in different modules but different names?
>
> In short, I'm not sure if it really stands and it looks like overkill
> to me. I'd rather not go that route. Related discussion can be on its
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>    1. built-in functions
>    2. temporary functions
>    3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server they
> also do not allow to overwrite built-in functions.
>
> "I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer." <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Was that a typo?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have a lifespan over a session and are only visible to the
> session owner, they are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system info
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same as a
> built-in function then? Since using that name in an ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Jingsong,
>
> Re> 1. Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
>
> Yes, please see the doc.
>
> Re> 2. Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will lead to
> changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special catalog
> functions.
>
> Re> Another question is, does this fallback include all
> hive built-in functions? As far as I know, some hive functions
> have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super
> critical to the scope of this FLIP, thus I'd like to leave it to future
> efforts as a nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them both under
> "catalog" concept, by extending catalog function to make it have ability to
> have built-in catalog functions. Some benefits I can see from this approach:
> 1. We don't have to introduce new concept like external built-in functions.
> Actually I don't see a full story about how to treat a built-in functions,
> and it seems a little bit disrupt with catalog. As a result, you have to make
> some restriction like "hive built-in functions can only be used when current
> catalog is hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> the doc. I've modified those sections, and they are up to date now.
>
> In short, built-in functions of external systems are now defined as a
> special kind of catalog function in Flink, and handled by Flink as
> follows:
> - An external built-in function must be associated with a catalog for
> the purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous function
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a schema/database
> namespace
> - It goes through the same instantiation logic as other user-defined
> catalog functions in the external system
>
> Please take another look at the doc, and let me know if you have more
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>
> Hi Kurt,
>
> it should not affect the functions and operations we currently have in
> SQL. It just categorizes the available built-in functions. It is kind of
> an orthogonal concept to the catalog API but built-in functions deserve
> this special kind of treatment. CatalogFunction still fits perfectly in
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way but with
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
>
> Does this only affect the functions and operations we currently have in SQL
> and have no effect on tables, right? Looks like this is an orthogonal concept
> to Catalog?
> If the answers are both yes, then the catalog function will be a weird
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
>
> The way you proposed is basically the same as what Calcite does, I think
> we are on the same page.
>
> Best,
> Danny Chan
> On Sep 3, 2019 at 7:57 PM (+0800), Timo Walther <twalthr@apache.org> wrote:
>
> This sounds exactly like the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
>
> Thanks Bowen for bringing up this topic. I think it’s a useful
> refactoring to make our function usage more user friendly.
>
> For the topic of how to organize the builtin operators and operators
> of Hive, here is a solution from Apache Calcite: the Calcite way is to make
> every dialect's operators a “Library”; users can specify which libraries they
> want to use for a SQL query. The builtin operators always come as the
> first-class objects and the others are used in the order they appear.
> Maybe you can use it as a reference.
>
> [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> On Aug 28, 2019 at 2:50 AM (+0800), Bowen Li <bowenli86@gmail.com> wrote:
>
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>
>
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Xuefu,

Just wanted to summarize my opinion on the one topic (temporary functions).

My preference would be to make temporary functions always 3-part
qualified (as a result that would prohibit overriding built-in
functions). Having said that if the community decides that it's better
to allow overriding built-in functions I am fine with it and can commit
to that decision.

I wanted to ask if you could clarify a few points for me around that option.

 1. Would you enforce temporary functions to be always just a single
    name (without db & cat) as hive does, or would you allow also 3 or
    even 2 part identifiers?
 2. Assuming 2/3-part paths. How would you register a function from a
    following statement: CREATE TEMPORARY FUNCTION db.func? Would that
    shadow all functions named 'func' in all databases named 'db' in all
    catalogs? Or would you shadow only function 'func' in database 'db'
    in current catalog?
 3. This point is still under discussion, but was mentioned a few times,
    that maybe we want to enable syntax cat.func for "external built-in
    functions". How would that affect statement from previous point?
    Would 'db.func' shadow "external built-in function" in 'db' catalog
    or user functions as in point 2? Or maybe both?
 4. Lastly in fact to summarize the previous points. Assuming 2/3-part
    paths. Would the function resolution be actually as follows?:
     1. temporary functions (1-part path)
     2. built-in functions
     3. temporary functions (2-part path)
     4. 2-part catalog functions a.k.a. "external built-in functions"
        (cat + func) - this is still under discussion, if we want that
        in the other focal point
     5. temporary functions (3-part path)
     6. 3-part catalog functions a.k.a. user functions
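
To make the order in point 4 concrete, here is a minimal Java sketch of how
an unqualified name could be looked up under that order. Everything in it is
hypothetical: the class FunctionResolutionSketch, its Kind enum and the
register/resolve methods are for illustration only and are not Flink's
FunctionCatalog API. It also assumes one particular reading of the open
questions above, namely that a 2-part temporary path means
<current database>.<func> and that a 2-part "external built-in" path means
<current catalog>.<func>.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of the six-step lookup; not Flink's FunctionCatalog. */
final class FunctionResolutionSketch {
    enum Kind { TEMPORARY, BUILT_IN, EXTERNAL_BUILT_IN, CATALOG }

    // One registry per kind, keyed by the dotted path used at registration time.
    private final Map<Kind, Map<String, String>> registry = new HashMap<>();
    private final String currentCatalog;
    private final String currentDatabase;

    FunctionResolutionSketch(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
        for (Kind kind : Kind.values()) {
            registry.put(kind, new HashMap<>());
        }
    }

    void register(Kind kind, String path, String implementation) {
        registry.get(kind).put(path, implementation);
    }

    /** Resolves an unqualified function name by trying the six steps in order. */
    Optional<String> resolve(String name) {
        // Assumption: 2-part temporary paths are database-qualified,
        // 2-part external built-in paths are catalog-qualified.
        String dbPath = currentDatabase + "." + name;
        String catPath = currentCatalog + "." + name;
        String fullPath = currentCatalog + "." + currentDatabase + "." + name;

        Kind[] kinds = {
            Kind.TEMPORARY,          // 1. temporary functions (1-part path)
            Kind.BUILT_IN,           // 2. built-in functions
            Kind.TEMPORARY,          // 3. temporary functions (2-part path)
            Kind.EXTERNAL_BUILT_IN,  // 4. "external built-in functions" (cat + func)
            Kind.TEMPORARY,          // 5. temporary functions (3-part path)
            Kind.CATALOG             // 6. 3-part catalog functions
        };
        String[] paths = {name, name, dbPath, catPath, fullPath, fullPath};

        for (int i = 0; i < kinds.length; i++) {
            String hit = registry.get(kinds[i]).get(paths[i]);
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FunctionResolutionSketch functions = new FunctionResolutionSketch("cat", "db");
        functions.register(Kind.BUILT_IN, "explode", "built-in explode");
        functions.register(Kind.CATALOG, "cat.db.explode", "user explode");
        System.out.println(functions.resolve("explode").get()); // built-in explode

        functions.register(Kind.TEMPORARY, "explode", "temporary explode");
        System.out.println(functions.resolve("explode").get()); // temporary explode
    }
}
```

With the registrations in main, the first lookup returns the built-in and the
second returns the temporary function, because step 1 precedes step 2. Under
the alternative where temporary functions are always 3-part qualified, steps
1 and 3 would simply be dropped from the arrays.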

I would be really grateful if you could answer those questions for me, thanks.

BTW, Thank you all for a healthy discussion.

Best,

Dawid

On 04/09/2019 23:25, Xuefu Z wrote:
> Thanks, all, for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>
>> Let me try to summarize and conclude the long thread so far:
>>
>> 1. For order of temp function v.s. built-in function:
>>
>> I think Dawid's point that temp functions should have a fully qualified path
>> is better reasoning to back the newly proposed order, and I agree we
>> don't need to follow Hive/Spark.
>>
>> However, I'd rather not change the fundamentals of temporary functions in this
>> FLIP. It belongs to a bigger story of how temporary objects should be
>> redefined and handled uniformly - currently temporary tables and views
>> (those registered from TableEnv#registerTable()) behave differently from what
>> Dawid proposes for temp functions, and we need a FLIP just to unify their
>> APIs and behaviors.
>>
>> I agree that backward compatibility is not an issue w.r.t Jark's points.
>>
>> ***It seems we do have consensus that it's acceptable to prevent users from
>> registering a temp function with the same name as a built-in function. To
>> help us move forward, I'd like to propose setting such a restriction on temp
>> functions in this FLIP to simplify the design and avoid disputes.*** It
>> will also leave room for improvements in the future.
>>
>>
>> 2. For Hive built-in function:
>>
>> Thanks Timo for providing the Presto and Postgres examples. I feel modular
>> built-in functions can be a good fit for the geo and ML examples as a native
>> Flink extension, but I'm not sure they fit well with external integrations.
>> Anyway, I think modular built-in functions are a bigger story and can be on
>> their own thread too, and our proposal doesn't prevent Flink from doing that
>> in the future.
>>
>> ***Seems we have consensus that users should be able to use built-in
>> functions of Hive or other external systems in SQL explicitly and
>> deterministically regardless of Flink built-in functions and the potential
>> modular built-in functions, via some new syntax like "mycat::func"? If so,
>> I'd like to propose removing Hive built-in functions from ambiguous
>> function resolution order, and empower users with such a syntax. This way
>> we sacrifice a little convenience for certainty***
>>
>>
>> What do you think?
>>
>> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
> >>> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> >>> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> >>> very inconsistent in that manner (spark being way worse on that).
>>>
>>> Hive:
>>>
> >>> You cannot overwrite all the built-in functions. I could overwrite most of
> >>> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> >>> functions I cannot overwrite, e.g. CAST and ARRAY. I get:
>>>
>>>
>>> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>>>
> >>> What is interesting is that I cannot overwrite *array*, but I can overwrite
> >>> *map* or *struct*. Hive behaves reasonably well, though, if I manage to
> >>> overwrite a function: when I drop the temporary function, the native
> >>> function is still available.
>>>
>>> Spark:
>>>
>>> Spark's behavior imho is super bad.
>>>
> >>> Theoretically I could overwrite all functions. For example, I was able to
> >>> overwrite the CAST function, though I had to use the CREATE OR REPLACE
> >>> TEMPORARY FUNCTION syntax; otherwise I get an exception that the function
> >>> already exists. However, when I used the CAST function in a query, it used
> >>> the native, built-in one.
>>>
> >>> When I overwrote the current_date() function, it was used in a query, but
> >>> it completely replaces the built-in function and I can no longer use the
> >>> native function in any way. I also cannot drop the temporary function. I
> >>> get:
>>>
>>> *    Error in query: Cannot drop native function 'current_date';*
>>>
>>> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
>>> with a database. Temporary functions are always represented as a single
>>> name.
>>>
> >>> In my opinion neither of the systems has consistent behavior. Generally
> >>> speaking I think overwriting any system-provided function is just
> >>> dangerous.
>>>
> >>> Regarding Jark's concerns. Such functions would be registered in the current
> >>> catalog/database schema, so a user could still use their own function, but
> >>> would have to fully qualify the function (because built-in functions take
> >>> precedence). Moreover users would have the same problem with permanent
> >>> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> >>> 1.9 the user could use just the 'explode' function as long as 'cat' &
> >>> 'db' were the default catalog & database. If we introduce an 'explode'
> >>> built-in function in 1.10, the user has to fully qualify the function.
>>>
>>> Best,
>>>
>>> Dawid
>>> On 04/09/2019 15:19, Timo Walther wrote:
>>>
>>> Hi all,
>>>
>>> thanks for the healthy discussion. It is already a very long discussion
>>> with a lot of text. So I will just post my opinion to a couple of
>>> statements:
>>>
>>>> Hive built-in functions are not part of Flink built-in functions, they
>>> are catalog functions
>>>
>>> That is not entirely true. Correct me if I'm wrong but I think Hive
>>> built-in functions are also not catalog functions. They are not stored in
>>> every Hive metastore catalog that is freshly created but are a set of
>>> functions that are listed somewhere and made available.
>>>
>>>> ambiguous functions reference just shouldn't be resolved to a different
>>> catalog
>>>
> >>> I agree. They should not be resolved to a different catalog. That's why I
> >>> am suggesting to split the concept of built-in functions and catalog lookup
> >>> semantics.
>>>
>>>> I don't know if any other databases handle built-in functions like that
>>> What I called "module" is:
>>> - Extension in Postgres [1]
>>> - Plugin in Presto [2]
>>>
> >>> Btw. Presto even mentions example modules that are similar to the ones
> >>> that we will introduce in the near future both for ML and System XYZ
> >>> compatibility:
> >>> "See either the presto-ml module for machine learning functions or the
> >>> presto-teradata-functions module for Teradata-compatible functions, both in
> >>> the root of the Presto source."
>>>
> >>>> functions should be either built-in already or just libraries functions,
> >>> and library functions can be adapted to catalog APIs or of some other
> >>> syntax to use
> >>>
> >>> Regarding "built-in already", of course we can add a lot of functions as
> >>> built-ins but we will end up in a dependency hell in the near future if we
> >>> don't introduce a pluggable approach. Library functions are what you also
> >>> suggest, but storing them in a catalog means either always fully qualifying
> >>> them or modifying the existing catalog design, which was inspired by the
> >>> standard.
>>>
>>> I don't think "it brings in even more complicated scenarios to the
>>> design", it just does clear separation of concerns. Integrating the
>>> functionality into the current design makes the catalog API more
>>> complicated.
>>>
>>>> why would users name a temporary function the same as a built-in
>>> function then?
>>>
>>> Because you never know what users do. If they don't, my suggested
>>> resolution order should not be a problem, right?
>>>
>>>> I don't think hive functions deserves be a function module
>>> Our goal is not to create a Hive clone. We need to think forward and Hive
>>> is just one of many systems that we can support. Not every built-in
>>> function behaves and will behave exactly like Hive.
>>>
>>>> regarding temporary functions, there are few systems that support it
>>> IMHO Spark and Hive are not always the best examples for consistent
>>> design. Systems like Postgres, Presto, or SQL Server should be used as a
>>> reference. I don't think that a user can overwrite a built-in function
>>> there.
>>>
>>> Regards,
>>> Timo
>>>
>>> [1] https://www.postgresql.org/docs/10/extend-extensions.html
>>> [2] https://prestodb.github.io/docs/current/develop/functions.html
>>>
>>>
>>> On 04.09.19 13:44, Jark Wu wrote:
>>>
>>> Hi all,
>>>
>>> Regarding #1 temp function <> built-in function and naming.
> >>> I'm fine with temp functions preceding built-in functions and being able
> >>> to override them (we already support overriding built-in functions in
> >>> 1.9).
> >>> If we don't allow the same name as a built-in function, I'm afraid we will
> >>> have compatibility issues in the future.
> >>> Say users register a user defined function named "explode" in 1.9, and we
> >>> support a built-in "explode" function in 1.10.
> >>> Then the user's jobs which call the registered "explode" function in 1.9
> >>> will all fail in 1.10 because of the naming conflict.
>>>
>>> Regarding #2 "External" built-in functions.
>>> I think if we store external built-in functions in catalog, then
>>> "hive1::sqrt" is a good way to go.
>>> However, I would prefer to support a discovery mechanism (e.g. SPI) for
>>> built-in functions as Timo suggested above.
>>> This gives us the flexibility to add Hive or MySQL or Geo or whatever
>>> function set as built-in functions in an easy way.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com>
>>> <us...@gmail.com> wrote:
>>>
> >>> Hi Dawid,
> >>>
> >>> Thank you for sharing your findings. It seems to me that there is no SQL
> >>> standard regarding temporary functions. There are few systems that support
> >>> it. Here is what I have found:
>>>
> >>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> >>> 2. Spark: basically follows Hive
> >>> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> >>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> >>> behavior.
> >>> (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
> >>>
> >>> Because of the lack of a standard, it's perfectly fine for Flink to define
> >>> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> >>> have DB as holder) is one option. The advantage is simplicity. The downside
> >>> is the deviation from Hive, which is popular and the de facto standard in
> >>> the big data world.
> >>>
> >>> However, I don't think we have to follow Hive. More importantly, we need a
> >>> consensus. I have no objection if your proposal is generally agreed upon.
>>>
>>> Thanks,
>>> Xuefu
>>>
> >>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> >>> wrote:
>>>
>>> Hi all,
>>>
>>> Just an opinion on the built-in <> temporary functions resolution and
>>> NAMING issue. I think we should not allow overriding the built-in
>>> functions, as this may pose serious issues and to be honest is rather
>>> not feasible and would require major rework. What happens if a user
> >>> wants to override CAST? Calls to that function are generated at
> >>> different layers of the stack that unfortunately do not always go
> >>> through the Catalog API (at least not yet). Moreover from what I've checked
> >>> no other systems allow overriding the built-in functions. All the
> >>> systems I've checked so far register temporary functions in a
> >>> database/schema (either a special database for temporary functions, or
> >>> just the current database). What I would suggest is to always register
> >>> temporary functions with a 3-part identifier, the same way as tables,
> >>> views etc. This effectively means you cannot override built-in
> >>> functions. With such an approach it is natural that the temporary functions
> >>> end up a step lower in the resolution order:
>>>
>>> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>>>
>>> 2. temporary functions (always 3 part path)
>>>
>>> 3. catalog functions (always 3 part path)
>>>
> >>> Let me know what you think.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> On 04/09/2019 06:13, Bowen Li wrote:
>>>
>>> Hi,
>>>
> >>> I agree with Xuefu that the controversial points are mainly these two.
> >>> My thoughts on them:
> >>>
> >>> 1) Determinism of referencing Hive built-in functions. We can either remove
> >>> Hive built-in functions from ambiguous function resolution and require
> >>> users to use special syntax for their qualified names, or add a config flag
> >>> to the catalog constructor/yaml for turning Hive built-in functions on and
> >>> off, with the flag set to 'false' by default and proper doc added to help
> >>> users make their decisions.
> >>>
> >>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> >>> resolution order. We believe Flink temp functions should precede Flink
> >>> built-in functions, and I have presented my reasons. Just in case we
> >>> cannot reach an agreement, I propose forbidding users from registering temp
> >>> functions with the same name as a built-in function, like MySQL's approach,
> >>> for the moment. It won't raise any performance concern, since built-in
> >>> functions are all in memory and thus the cost of a name check will be
> >>> really trivial.
>>>
>>>
>>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com>
>>> <us...@gmail.com> wrote:
>>>
> >>> From what I have seen, there are a couple of focal disagreements:
> >>>
> >>> 1. Resolution order: temp function --> flink built-in function --> catalog
> >>> function vs flink built-in function --> temp function --> catalog function.
> >>>
> >>> 2. "External" built-in functions: how to treat built-in functions in
> >>> external systems and how users reference them
> >>>
> >>> For #1, I agree with Bowen that temp functions need to be at the highest
> >>> priority because that's how a user might overwrite a built-in function
> >>> without referencing a persistent, overwriting catalog function with a fully
> >>> qualified name. Putting built-in functions at the highest priority
> >>> eliminates that usage.
> >>>
> >>> For #2, I saw a general agreement that referencing "external" built-in
> >>> functions such as those in Hive needs to be explicit and deterministic, even
> >>> though different approaches are proposed. To limit the scope and simplify
> >>> the usage, it seems to make sense to me to introduce special syntax for users
> >>> to explicitly reference an external built-in function such as hive1::sqrt or
> >>> hive1._built_in.sqrt. This is a DML syntax nicely matching the Catalog API
> >>> call hive1.getFunction(ObjectPath functionName) where the database name is
> >>> absent for built-in functions available in that catalog hive1. I understand
> >>> that Bowen's original proposal was trying to avoid this, but this could
> >>> turn out to be a clean and simple solution.
> >>>
> >>> (Timo's modular approach is a great way to "expand" Flink's built-in
> >>> function set, which seems orthogonal and complementary to this, and could be
> >>> tackled in further future work.)
>>>
>>> I'd be happy to hear further thoughts on the two points.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com>
>>> <yk...@gmail.com> wrote:
>>>
> >>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> >>> same as Bowen's. But after thinking about it, I'm currently leaning toward
> >>> Timo's suggestion.
> >>>
> >>> The reason is backward compatibility. If we follow Bowen's approach, let's
> >>> say we first look for a function in Flink's built-in functions, and then in
> >>> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
> >>> such a built-in function. So the user will get Hive's behavior for function
> >>> `foo`. In the next release, Flink realizes this is a very popular function
> >>> and adds it to Flink's built-in functions, but with different behavior from
> >>> Hive's. So in the next release, the behavior changes.
> >>>
> >>> With Timo's approach, IIUC the user has to tell the framework explicitly
> >>> what kind of built-in functions they would like to use. They can just tell
> >>> the framework to abandon Flink's built-in functions and use Hive's instead.
> >>> Users can only choose between them, but not use them at the same time. I
> >>> think this approach is more predictable.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com>
>>> <bo...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
> >>> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> >>> in the google doc was updated; please take a look first and let me know if
> >>> you have more questions.
>>>
>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
>>> <bo...@gmail.com>
>>>
>>> wrote:
>>>
>>> Hi Timo,
>>>
> >>> Re> 1) We should not have the restriction "hive built-in functions can only
> >>> be used when current catalog is hive catalog". Switching a catalog
> >>> should only have implications on the cat.db.object resolution but not
> >>> functions. It would be quite convenient for users to use Hive built-ins
> >>> even if they use a Confluent schema registry or just the in-memory catalog.
> >>>
> >>> There might be a misunderstanding here.
> >>>
> >>> First of all, Hive built-in functions are not part of Flink built-in
> >>> functions, they are catalog functions, thus if the current catalog is not a
> >>> HiveCatalog but, say, a schema registry catalog, ambiguous functions
> >>> reference just shouldn't be resolved to a different catalog.
> >>>
> >>> Second, Hive built-in functions can potentially be referenced across
> >>> catalogs, but they don't have a db namespace and we currently just don't
> >>> have a SQL syntax for it. It can be enabled when such a SQL syntax is
> >>> defined, e.g. "catalog::function", but it's out of scope of this FLIP.
>>>
> >>> 2) I would propose to have separate concepts for catalog and built-in
> >>> functions. In particular it would be nice to modularize built-in
> >>> functions. Some built-in functions are very crucial (like AS, CAST,
> >>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> >>> we add more experimental functions in the future or function for some
> >>> special application area (Geo functions, ML functions). A data platform
> >>> team might not want to make every built-in function available. Or a
> >>> function module like ML functions is in a different Maven module.
> >>>
> >>> I think this is orthogonal to this FLIP, especially since we don't have the
> >>> "external built-in functions" anymore and currently the built-in function
> >>> category remains untouched.
> >>>
> >>> But just to share some thoughts on the proposal, I'm not sure about it:
> >>> - I don't know if any other databases handle built-in functions like that.
> >>> Maybe you can give some examples? IMHO, built-in functions are system info
> >>> and should be deterministic, not depending on loaded libraries. Geo
> >>> functions should be either built-in already or just libraries functions,
> >>> and library functions can be adapted to catalog APIs or of some other
> >>> syntax to use
> >>> - I don't know if all use cases stand, and many can be achieved by other
> >>> approaches too. E.g. experimental functions can be taken good care of by
> >>> documentation, annotations, etc
> >>> - the proposal basically introduces some concept like a pluggable built-in
> >>> function catalog, despite the already existing catalog APIs
> >>> - it brings in even more complicated scenarios to the design. E.g. how do
> >>> you handle built-in functions in different modules but different names?
> >>>
> >>> In short, I'm not sure if it really stands and it looks like overkill
> >>> to me. I'd rather not go down that route. Related discussion can be on its
> >>> own thread.
>>>
>>> 3) Following the suggestion above, we can have a separate discovery
> >>> mechanism for built-in functions. Instead of just going through a static
> >>> list like in BuiltInFunctionDefinitions, a platform team should be able
> >>> to select function modules like
> >>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> >>> HiveFunctions) or via service discovery;
>>>
>>> Same as above. I'll leave it to its own thread.
>>>
> >>> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> >>> that we should unify built-in function (external or internal) under a
> >>> common layer. However, the resolution order should be:
> >>>    1. built-in functions
> >>>    2. temporary functions
> >>>    3. regular catalog resolution logic
> >>> Otherwise a temporary function could cause clashes with Flink's built-in
> >>> functions. If you take a look at other vendors, like SQL Server they
> >>> also do not allow to overwrite built-in functions.
> >>>
> >>> "I agree with Kurt that we should unify built-in function (external or
> >>> internal) under a common layer." <- I don't think this is what Kurt means.
> >>> Kurt and I are in favor of unifying built-in functions of external systems
> >>> and catalog functions. Did you make a typo?
> >>>
> >>> Besides, I'm not sure about the resolution order you proposed. Temporary
> >>> functions have a lifespan over a session and are only visible to the
> >>> session owner, they are unique to each user, and users create them on
> >>> purpose to be the highest priority in order to overwrite system info
> >>> (built-in functions in this case).
> >>>
> >>> In your case, why would users name a temporary function the same as a
> >>> built-in function then? Since using that name in ambiguous function
> >>> reference will always be resolved to built-in functions, creating a
> >>> same-named temp function would be meaningless in the end.
>>>
>>>
>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
>>> <bo...@gmail.com>
>>>
>>> wrote:
>>>
>>> Hi Jingsong,
>>>
> >>> Re> 1.Hive built-in functions is an intermediate solution. So we should
> >>> not introduce interfaces to influence the framework. To make
> >>> Flink itself more powerful, we should implement the functions
> >>> we need to add.
> >>>
> >>> Yes, please see the doc.
> >>>
> >>> Re> 2.Non-flink built-in functions are easy for users to change their
> >>> behavior. If we support some flink built-in functions in the
> >>> future but act differently from non-flink built-in, this will lead to
> >>> changes in user behavior.
> >>>
> >>> There's no such concept as "external built-in functions" any more.
> >>> Built-in functions of external systems will be treated as special catalog
> >>> functions.
> >>>
> >>> Re> Another question is, does this fallback include all
> >>> hive built-in functions? As far as I know, some hive functions
> >>> are somewhat hacky. If possible, can we start with a white list?
> >>> Once we implement some functions to flink built-in, we can
> >>> also update the whitelist.
> >>>
> >>> Yes, that's something we thought of too. I don't think it's super
> >>> critical to the scope of this FLIP, thus I'd like to leave it to future
> >>> efforts as a nice-to-have feature.
>>>
>>>
>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
>>> <bo...@gmail.com>
>>>
>>> wrote:
>>>
>>> Hi Kurt,
>>>
> >>> Re: > What I want to propose is we can merge #3 and #4, make them both under
> >>> "catalog" concept, by extending catalog function to make it have ability to
> >>> have built-in catalog functions. Some benefits I can see from this
> >>> approach:
> >>>
> >>> 1. We don't have to introduce new concept like external built-in functions.
> >>> Actually I don't see a full story about how to treat a built-in functions,
> >>> and it seems a little bit disrupt with catalog. As a result, you have to
> >>> make some restriction like "hive built-in functions can only be used when
> >>> current catalog is hive catalog".
> >>>
> >>> Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> >>> the doc. I've modified those sections, and they are up to date now.
> >>>
> >>> In short, built-in functions of external systems are now defined as a
> >>> special kind of catalog function in Flink, and handled by Flink as
> >>> follows:
> >>> - An external built-in function must be associated with a catalog for
> >>> the purpose of decoupling flink-table and external systems.
> >>> - It always resides in front of catalog functions in ambiguous function
> >>> reference order, just like in its own external system
> >>> - It is a special catalog function that doesn't have a schema/database
> >>> namespace
> >>> - It goes thru the same instantiation logic as other user defined
> >>> catalog functions in the external system
> >>>
> >>> Please take another look at the doc, and let me know if you have more
> >>> questions.
>>>
>>>
>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
>>> <tw...@apache.org>
>>>
>>> wrote:
>>>
>>> Hi Kurt,
>>>
> >>> it should not affect the functions and operations we currently have in
> >>> SQL. It just categorizes the available built-in functions. It is kind of
> >>> an orthogonal concept to the catalog API but built-in functions deserve
> >>> this special kind of treatment. CatalogFunction still fits perfectly in
> >>> there because the regular catalog object resolution logic is not
> >>> affected. So tables and functions are resolved in the same way but with
> >>> built-in functions that have priority as in the original design.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 03.09.19 15:26, Kurt Young wrote:
>>>
> >>> Does this only affect the functions and operations we currently have in
> >>> SQL, and have no effect on tables? It looks like this is a concept
> >>> orthogonal to Catalog?
> >>> If the answers are both yes, then the catalog function will be a weird
> >>> concept?
>>>
>>> Best,
>>> Kurt
>>>
>>>
> >>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> >>>
> >>> The way you proposed is basically the same as what Calcite does; I think
> >>> we are on the same page.
>>>
>>> Best,
>>> Danny Chan
> >>> On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> >>>
> >>> This sounds exactly like the module approach I mentioned, no?
>>>
>>> Regards,
>>> Timo
>>>
>>> On 03.09.19 13:42, Danny Chan wrote:
>>>
> >>> Thanks Bowen for bringing up this topic, I think it's a useful
> >>> refactoring to make our function usage more user friendly.
> >>>
> >>> For the topic of how to organize the builtin operators and operators
> >>> of Hive, here is a solution from Apache Calcite: the Calcite way is to make
> >>> every dialect's operators a "Library", and users can specify which libraries
> >>> they want to use for a SQL query. The builtin operators always come as
> >>> first-class objects and the others are used in the order they appear.
> >>> Maybe you can take it as a reference.
> >>>
> >>> [1]
> >>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>> Best,
>>> Danny Chan
> >>> On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> >>> It's critically helpful to improve function usability in SQL.
> >>>
> >>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>
> >>> In short, it:
> >>> - adds support for precise function reference with fully/partially
> >>> qualified name
> >>> - redefines function resolution order for ambiguous function reference
> >>> - adds support for Hive's rich built-in functions (support for Hive user
> >>> defined functions was already added in 1.9.0)
> >>> - clarifies the concept of temporary functions
>>>
>>> Would love to hear your thoughts.
>>>
>>> Bowen
>>>
>>> --
>>> Xuefu Zhang
>>>
>>> "In Honey We Trust!"
>>>
>>>
>>> --
>>> Xuefu Zhang
>>>
>>> "In Honey We Trust!"
>>>
>>>
>>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Dawid,

Thanks for sharing the findings about temporary functions. Because of
strong inconsistency observed in Spark, we can probably ignore it for now.
For Hive, I understand one may not be able to overwrite everything, but the
capability is being offered.

Whether we offer this capability is to be determined, but I don't see the
"danger" you mentioned. It's the user's action, which only impacts the user's
current session. Other users are not impacted. (That's one of the benefits
of temporary objects.) We cannot always prevent users from making mistakes.
If we think overwriting is useful to the user, then we can allow it. To
make it more consistent, we can also restrict it further by blacklisting
built-in functions that may not be overwritten.
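
To make the blacklist idea concrete, here is a minimal Python sketch of
session-scoped temporary functions that may shadow built-ins except for a
blacklisted set. It is an illustration only, not Flink code: Session,
BUILT_INS, NON_OVERRIDABLE and the string "implementations" are all
hypothetical names, and the blacklist entries are just examples.

```python
# Hypothetical sketch: none of these names are Flink APIs.
BUILT_INS = {
    "cast": "builtin:cast",
    "array": "builtin:array",
    "length": "builtin:length",
}

# Illustrative blacklist of built-ins that may never be overwritten.
NON_OVERRIDABLE = {"cast", "array"}


class Session:
    """Holds temporary functions that are visible to one session only."""

    def __init__(self):
        self.temp_functions = {}

    def register_temp_function(self, name, impl):
        key = name.lower()
        if key in NON_OVERRIDABLE:
            raise ValueError(f"built-in function '{key}' may not be overwritten")
        self.temp_functions[key] = impl

    def drop_temp_function(self, name):
        # Dropping a temp function makes the built-in visible again.
        self.temp_functions.pop(name.lower(), None)

    def resolve(self, name):
        # Temp functions take precedence over built-ins in this sketch.
        key = name.lower()
        if key in self.temp_functions:
            return self.temp_functions[key]
        if key in BUILT_INS:
            return BUILT_INS[key]
        raise KeyError(key)
```

In this sketch an overwrite in one Session never leaks into another, dropping
the temp function restores the built-in, and attempting to register "cast"
raises an error instead of silently being ignored.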

In my past experience, I did see users needing to overwrite a built-in
function in Hive. Without this capability, a user has to create a permanent
function and modify all the queries referencing the function to use its
fully-qualified name. This is equivalent to creating a new user-defined
function. This can work, but the usability is bad.

(Jark's "explode" example is actually important because forcing users to
modify their queries after an upgrade is a bad experience.)
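
The upgrade scenario can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: resolve() and the dictionaries are
hypothetical, and a built-in "explode" arriving in 1.10 is the hypothetical
case from Jark's example, not a statement about any actual release.

```python
# Hypothetical sketch of unqualified-name resolution; not Flink code.

def resolve(name, built_ins, user_functions, user_first):
    """Return what an unqualified function name resolves to.

    user_first=True  -> user (temp) functions shadow built-ins.
    user_first=False -> built-ins shadow user functions.
    """
    layers = [user_functions, built_ins] if user_first else [built_ins, user_functions]
    for layer in layers:
        if name in layer:
            return layer[name]
    raise KeyError(name)


# The user registered their own "explode" while running on 1.9.
user_functions = {"explode": "user:explode"}

# Assumption for illustration: 1.10 adds a built-in "explode".
built_ins_1_9 = {"sqrt": "builtin:sqrt"}
built_ins_1_10 = {"sqrt": "builtin:sqrt", "explode": "builtin:explode"}

before_upgrade = resolve("explode", built_ins_1_9, user_functions, user_first=False)
after_built_in_first = resolve("explode", built_ins_1_10, user_functions, user_first=False)
after_user_first = resolve("explode", built_ins_1_10, user_functions, user_first=True)
```

With built-ins first, the same unqualified call resolves to a different
function after the upgrade unless the query is rewritten with a
fully-qualified name; with user functions first, it keeps resolving to the
user's function.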

In short, we can theoretically disallow function overwriting since there is
no standard. However, I don't see strong reasons for doing so, especially
since such a capability is useful to some users.

Thanks,
Xuefu







On Wed, Sep 4, 2019 at 10:02 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi,
>
> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> very inconsistent in that manner (spark being way worse on that).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most of
> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite, e.g. CAST and ARRAY. I get:
>
>
> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*, but I can overwrite
> *map* or *struct*. Hive behaves reasonably well, though, if I manage to
> overwrite a function: when I drop the temporary function, the native
> function is still available.
>
> Spark:
>
> Spark's behavior imho is super bad.
>
> Theoretically I could overwrite all functions. For example, I was able to
> overwrite the CAST function, though I had to use the CREATE OR REPLACE
> TEMPORARY FUNCTION syntax; otherwise I get an exception that the function
> already exists. However, when I used the CAST function in a query, it used
> the native, built-in one.
>
> When I overwrote the current_date() function, it was used in a query, but
> it completely replaces the built-in function and I can no longer use the
> native function in any way. I also cannot drop the temporary function. I
> get:
>
> *    Error in query: Cannot drop native function 'current_date';*
>
> Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
> with a database. Temporary functions are always represented as a single
> name.
>
> In my opinion neither of the systems has consistent behavior. Generally
> speaking I think overwriting any system-provided function is just
> dangerous.
>
> Regarding Jark's concerns. Such functions would be registered in the current
> catalog/database schema, so a user could still use their own function, but
> would have to fully qualify the function (because built-in functions take
> precedence). Moreover users would have the same problem with permanent
> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> 1.9 the user could use just the 'explode' function as long as 'cat' &
> 'db' were the default catalog & database. If we introduce an 'explode'
> built-in function in 1.10, the user has to fully qualify the function.
>
> Best,
>
> Dawid
> On 04/09/2019 15:19, Timo Walther wrote:
>
> Hi all,
>
> thanks for the healthy discussion. It is already a very long discussion
> with a lot of text. So I will just post my opinion to a couple of
> statements:
>
> > Hive built-in functions are not part of Flink built-in functions, they
> are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored in
> every Hive metastore catalog that is freshly created but are a set of
> functions that are listed somewhere and made available.
>
> > ambiguous functions reference just shouldn't be resolved to a different
> catalog
>
> I agree. They should not be resolved to a different catalog. That's why I
> am suggesting to split the concept of built-in functions and catalog lookup
> semantics.
>
> > I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions, both in
> the root of the Presto source."
>
> > functions should be either built-in already or just libraries functions,
> > and library functions can be adapted to catalog APIs or of some other
> > syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions as
> built-ins but we will end-up in a dependency hell in the near future if we
> don't introduce a pluggable approach. Library functions is what you also
> suggest but storing them in a catalog means to always fully qualify them or
> modifying the existing catalog design that was inspired by the standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design", it just does clear separation of concerns. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
> > why would users name a temporary function the same as a built-in
> > function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
> > I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and Hive
> is just one of many systems that we can support. Not every built-in
> function behaves and will behave exactly like Hive.
>
> > regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as a
> reference. I don't think that a user can overwrite a built-in function
> there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions should precede built-in function and can
> override built-in functions (we already support to override built-in
> function in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the user's jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>
> Hi Dawid,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> them. Here is what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive
> (https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html)
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior.
> (http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of the lack of a standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity. The downside
> is the deviation from Hive, which is popular and the de facto standard in
> the big data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately do not always go
> through the Catalog API (at least not yet). Moreover from what I've checked
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either special database for temporary functions, or
> just current database). What I would suggest is to always register
> temporary functions with a 3 part identifier. The same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the controversial points are mainly these two.
> My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to catalog constructor/yaml for turning on and off Hive built-in functions
> with the flag set to 'false' by default and proper doc added to help users
> make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't have any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be really
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function --> catalog function.
>
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement that referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic even
> though different approaches are proposed. To limit the scope and simplify
> the usage, it seems to make sense to me to introduce special syntax for the
> user to explicitly reference an external built-in function such as
> hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax matching nicely
> Catalog API call hive1.getFunction(ObjectPath functionName) where the
> database name is absent for built-in functions available in that catalog
> hive1. I understand that Bowen's original proposal was trying to avoid
> this, but this could turn out to be a clean and simple solution.
>
> (Timo's modular approach is a great way to "expand" Flink's built-in
> function set, which seems orthogonal and complementary to this, and could
> be tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same as Bowen's. But after thinking about it, I'm currently leaning towards
> Timo's suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we first look for a function in Flink's built-in functions, and then in
> hive's built-in. For example, `foo` is not supported by Flink, but hive has
> such a built-in function. So the user will get hive's behavior for function
> `foo`. And in the next release, Flink realizes this is a very popular
> function and adds it to Flink's built-in functions, but with different
> behavior from hive's. So in the next release, the behavior changes.
>
> With Timo's approach, IIUC the user has to tell the framework explicitly
> what kind of built-in functions he would like to use. He can just tell the
> framework to abandon Flink's built-in functions, and use hive's instead.
> The user can only choose between them, but not use them at the same time.
> I think this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated, please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous functions
> reference just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalog, but it doesn't have db namespace and we currently just don't have
> a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
> e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or function for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially we don't have the
> "external built-in functions" anymore and currently the built-in function
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about it:
> - I don't know if any other databases handle built-in functions like that.
> Maybe you can give some examples? IMHO, built-in functions are system info
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by other
> approaches too. E.g. experimental functions can be taken good care of by
> documentations, annotations, etc
> - the proposal basically introduces some concept like a pluggable built-in
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g. how do
> you handle built-in functions in different modules but different names?
>
> In short, I'm not sure if it really stands and it looks like an overkill
> to me. I'd rather not go to that route. Related discussion can be on its
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>    1. built-in functions
>    2. temporary functions
>    3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server they
> also do not allow to overwrite built-in functions.
>
> "I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer." <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Was that a typo?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have a lifespan over a session and are only visible to the
> session owner, they are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system info
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same as a
> built-in function then? Since using that name in ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Jingsong,
>
> Re> 1. Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
>
> Yes, please see the doc.
>
> Re> 2. Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will lead to
> changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special catalog
> functions.
>
> Re> Another question is, does this fallback include all
> hive built-in functions? As far as I know, some hive functions
> are somewhat hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super
> critical to the scope of this FLIP, thus I'd like to leave it to future
> efforts as a nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>
> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them both
> under "catalog" concept, by extending catalog function to make it have
> ability to have built-in catalog functions. Some benefits I can see from
> this approach:
> 1. We don't have to introduce new concept like external built-in functions.
> Actually I don't see a full story about how to treat a built-in functions,
> and it seems a little bit disrupt with catalog. As a result, you have to
> make some restriction like "hive built-in functions can only be used when
> current catalog is hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some part of
> the doc. I've modified those sections, and they are up to date now.
>
> In short, now built-in function of external systems are defined as a
> special kind of catalog function in Flink, and handled by Flink as
> following:
> - An external built-in function must be associated with a catalog for
> the purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous function
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a schema/database
> namespace
> - It goes thru the same instantiation logic as other user defined
> catalog functions in the external system
>
> Please take another look at the doc, and let me know if you have more
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>
> Hi Kurt,
>
> it should not affect the functions and operations we currently have in
> SQL. It just categorizes the available built-in functions. It is kind of
> an orthogonal concept to the catalog API but built-in functions deserve
> this special kind of treatment. CatalogFunction still fits perfectly in
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way but with
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
>
> Does this only affect the functions and operations we currently have in SQL
> and have no effect on tables, right? Looks like this is an orthogonal
> concept with Catalog?
> If the answer are both yes, then the catalog function will be a weird
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
>
> The way you proposed are basically the same as what Calcite does, I think
> we are in the same line.
>
> Best,
> Danny Chan
> On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>
> This sounds exactly as the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
>
> Thanks Bowen for bring up this topic, I think it’s a useful refactoring to
> make our function usage more user friendly.
>
> For the topic of how to organize the builtin operators and operators
> of Hive, here is a solution from Apache Calcite, the Calcite way is to make
> every dialect operators a “Library”, user can specify which libraries they
> want to use for a sql query. The builtin operators always comes as the
> first class objects and the others are used from the order they appears.
> Maybe you can take a reference.
>
> [1]
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Yeah, sorry I prematurely concluded the discussions here, thinking they
were not converging. However, I did feel we needed to do more research and
restart with individual topics.

Please continue voicing your comments/suggestions while we revise and
clarify our proposal.
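
As a reference point for that revision, the two points from Bowen's summary
(reject a temp function whose name clashes with a built-in, and reach
external built-ins only through an explicit syntax such as "mycat::func")
could be sketched roughly as below. This is only an illustration in Python,
not Flink code: FunctionRegistry, the sample function sets, and the returned
tuples are all made-up names, and fully qualified cat.db.func references are
left out.

```python
# Hypothetical names throughout; this is not Flink's FunctionCatalog API.
BUILT_IN = {"sqrt", "cast", "concat_ws"}            # sample Flink built-ins
EXTERNAL_BUILT_IN = {"hive1": {"sqrt", "explode"}}  # per-catalog external built-ins
CATALOG_FUNCTIONS = {("hive1", "db1", "my_udf")}    # persistent catalog functions


class FunctionRegistry:
    def __init__(self, current_catalog, current_database):
        self.current_catalog = current_catalog
        self.current_database = current_database
        self.temp_functions = set()

    def register_temp_function(self, name):
        # Point 1: a temp function may not reuse a built-in name.
        key = name.lower()
        if key in BUILT_IN:
            raise ValueError(f"'{name}' clashes with a built-in function")
        self.temp_functions.add(key)

    def resolve(self, reference):
        ref = reference.lower()
        # Point 2: "catalog::function" explicitly names an external built-in.
        if "::" in ref:
            catalog, name = ref.split("::", 1)
            if name in EXTERNAL_BUILT_IN.get(catalog, set()):
                return ("external built-in", catalog, name)
            raise LookupError(reference)
        # Ambiguous reference: external built-ins are never consulted here.
        # Temp and built-in names cannot clash, so their relative order
        # does not change the result.
        if ref in self.temp_functions:
            return ("temporary", ref)
        if ref in BUILT_IN:
            return ("built-in", ref)
        path = (self.current_catalog, self.current_database, ref)
        if path in CATALOG_FUNCTIONS:
            return ("catalog",) + path
        raise LookupError(reference)
```

With this shape, an unqualified "sqrt" always means Flink's built-in, and
"hive1::sqrt" is the only way to reach Hive's version, so a call written
with the explicit syntax keeps pointing at the same function.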

Thanks,
Xuefu

On Thu, Sep 5, 2019 at 12:04 PM Bowen Li <bo...@gmail.com> wrote:

> Maybe Xuefu missed my email. Please let me know what your thoughts are on
> the summary, if there's still major controversy, I can take time to
> reevaluate that part.
>
>
> On Wed, Sep 4, 2019 at 2:25 PM Xuefu Z <us...@gmail.com> wrote:
>
> > Thanks all for sharing your thoughts. I think we have gathered some useful
> > initial feedback from this long discussion with a couple of focal points
> > sticking out.
> >
> > We will go back to do more research and adapt our proposal. Once it's
> > ready, we will ask for a new round of review. If there is any
> > disagreement, we will start a new discussion thread on each rather than
> > having a mega discussion like this.
> >
> > Thanks to everyone for participating.
> >
> > Regards,
> > Xuefu
> >
> >
> > On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
> >
> > > Let me try to summarize and conclude the long thread so far:
> > >
> > > 1. For order of temp function v.s. built-in function:
> > >
> > > I think Dawid's point that temp function should be of fully qualified
> > > path is a better reasoning to back the newly proposed order, and I agree
> > > we don't need to follow Hive/Spark.
> > >
> > > However, I'd rather not change fundamentals of temporary functions in
> > > this FLIP. It belongs to a bigger story of how temporary objects should
> > > be redefined and be handled uniformly - currently temporary tables and
> > > views (those registered from TableEnv#registerTable()) behave different
> > > than what Dawid propose for temp functions, and we need a FLIP to just
> > > unify their APIs and behaviors.
> > >
> > > I agree that backward compatibility is not an issue w.r.t Jark's points.
> > >
> > > ***Seems we do have consensus that it's acceptable to prevent users
> > > registering a temp function in the same name as a built-in function. To
> > > help us move forward, I'd like to propose setting such a restraint on
> > > temp functions in this FLIP to simplify the design and avoid disputes.***
> > > It will also leave rooms for improvements in the future.
> > >
> > >
> > > 2. For Hive built-in function:
> > >
> > > Thanks Timo for providing the Presto and Postgres examples. I feel
> > > modular built-in functions can be a good fit for the geo and ml example
> > > as a native Flink extension, but not sure if it fits well with external
> > > integrations. Anyway, I think modular built-in functions is a bigger
> > > story and can be on its own thread too, and our proposal doesn't prevent
> > > Flink from doing that in the future.
> > >
> > > ***Seems we have consensus that users should be able to use built-in
> > > functions of Hive or other external systems in SQL explicitly and
> > > deterministically regardless of Flink built-in functions and the
> > > potential modular built-in functions, via some new syntax like
> > > "mycat::func"? If so, I'd like to propose removing Hive built-in
> > > functions from ambiguous function resolution order, and empower users
> > > with such a syntax. This way we sacrifice a little convenience for
> > > certainty***
> > >
> > >
> > > What do you think?
> > >
> > > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
> > > > are very inconsistent in that manner (spark being way worse on that).
> > > >
> > > > Hive:
> > > >
> > > > You cannot overwrite all the built-in functions. I could overwrite most
> > > > of the functions I tried e.g. length, e, pi, round, rtrim, but there are
> > > > functions I cannot overwrite e.g. CAST, ARRAY I get:
> > > >
> > > >
> > > > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> > > >
> > > > What is interesting is that I cannot overwrite *array*, but I can
> > > > overwrite *map* or *struct*. Though hive behaves reasonably well if I
> > > > manage to overwrite a function. When I drop the temporary function the
> > > > native function is still available.
> > > >
> > > > Spark:
> > > >
> > > > Spark's behavior imho is super bad.
> > > >
> > > > Theoretically I could overwrite all functions. I was able e.g. to
> > > > overwrite CAST function. I had to use though CREATE OR REPLACE TEMPORARY
> > > > FUNCTION syntax. Otherwise I get an exception that a function already
> > > > exists. However when I used the CAST function in a query it used the
> > > > native, built-in one.
> > > >
> > > > When I overwrote current_date() function, it was used in a query, but it
> > > > completely replaces the built-in function and I can no longer use the
> > > > native function in any way. I cannot also drop the temporary function. I
> > > > get:
> > > >
> > > > *    Error in query: Cannot drop native function 'current_date';*
> > > >
> > > > Additional note, both systems do not allow creating TEMPORARY FUNCTIONS
> > > > with a database. Temporary functions are always represented as a single
> > > > name.
> > > >
> > > > In my opinion neither of the systems have consistent behavior. Generally
> > > > speaking I think overwriting any system provided functions is just
> > > > dangerous.
> > > >
> > > > Regarding Jark's concerns. Such functions would be registered in a
> > > > current catalog/database schema, so a user could still use its own
> > > > function, but would have to fully qualify the function (because built-in
> > > > functions take precedence). Moreover users would have the same problem
> > > > with permanent functions. Imagine a user have a permanent function
> > > > 'cat.db.explode'. In 1.9 the user could use just the 'explode' function
> > > > as long as the 'cat' & 'db' were the default catalog & database. If we
> > > > introduce 'explode' built-in function in 1.10, the user has to fully
> > > > qualify the function.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 04/09/2019 15:19, Timo Walther wrote:
> > > >
> > > > Hi all,
> > > >
> > > > thanks for the healthy discussion. It is already a very long discussion
> > > > with a lot of text. So I will just post my opinion to a couple of
> > > > statements:
> > > >
> > > > > Hive built-in functions are not part of Flink built-in functions,
> > > > > they are catalog functions
> > > >
> > > > That is not entirely true. Correct me if I'm wrong but I think Hive
> > > > built-in functions are also not catalog functions. They are not stored
> > > > in every Hive metastore catalog that is freshly created but are a set of
> > > > functions that are listed somewhere and made available.
> > > >
> > > > > ambiguous functions reference just shouldn't be resolved to a
> > > > > different catalog
> > > >
> > > > I agree. They should not be resolved to a different catalog. That's
> > > > why I am suggesting to split the concept of built-in functions and
> > > > catalog lookup semantics.
> > > >
> > > > > I don't know if any other databases handle built-in functions like
> > > > > that
> > > >
> > > > What I called "module" is:
> > > > - Extension in Postgres [1]
> > > > - Plugin in Presto [2]
> > > >
> > > > Btw. Presto even mentions example modules that are similar to the ones
> > > > that we will introduce in the near future both for ML and System XYZ
> > > > compatibility:
> > > > "See either the presto-ml module for machine learning functions or the
> > > > presto-teradata-functions module for Teradata-compatible functions,
> > > > both in the root of the Presto source."
> > > >
> > > > > functions should be either built-in already or just libraries
> > > > > functions, and library functions can be adapted to catalog APIs or of
> > > > > some other syntax to use
> > > >
> > > > Regarding "built-in already", of course we can add a lot of functions
> > > > as built-ins but we will end-up in a dependency hell in the near future
> > > > if we don't introduce a pluggable approach. Library functions is what
> > > > you also suggest but storing them in a catalog means to always fully
> > > > qualify them or modifying the existing catalog design that was inspired
> > > > by the standard.
> > > >
> > > > I don't think "it brings in even more complicated scenarios to the
> > > > design", it just does clear separation of concerns. Integrating the
> > > > functionality into the current design makes the catalog API more
> > > > complicated.
> > > >
> > > > > why would users name a temporary function the same as a built-in
> > > > > function then?
> > > >
> > > > Because you never know what users do. If they don't, my suggested
> > > > resolution order should not be a problem, right?
> > > >
> > > > > I don't think hive functions deserves be a function module
> > > >
> > > > Our goal is not to create a Hive clone. We need to think forward and
> > > > Hive is just one of many systems that we can support. Not every built-in
> > > > function behaves and will behave exactly like Hive.
> > > >
> > > > > regarding temporary functions, there are few systems that support it
> > > >
> > > > IMHO Spark and Hive are not always the best examples for consistent
> > > > design. Systems like Postgres, Presto, or SQL Server should be used as
> > > > a reference. I don't think that a user can overwrite a built-in
> > > > function there.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > > > [2] https://prestodb.github.io/docs/current/develop/functions.html
> > > >
> > > >
> > > > On 04.09.19 13:44, Jark Wu wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Regarding #1 temp function <> built-in function and naming.
> > > > I'm fine with temp functions should precede built-in function and can
> > > > override built-in functions (we already support to override built-in
> > > > function in 1.9).
> > > > If we don't allow the same name as a built-in function, I'm afraid we
> > > > will have compatibility issues in the future.
> > > > Say users register a user defined function named "explode" in 1.9, and
> > > > we support a built-in "explode" function in 1.10.
> > > > Then the user's jobs which call the registered "explode" function in
> > > > 1.9 will all fail in 1.10 because of naming conflict.
> > > >
> > > > Regarding #2 "External" built-in functions.
> > > > I think if we store external built-in functions in catalog, then
> > > > "hive1::sqrt" is a good way to go.
> > > > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > > > built-in functions as Timo suggested above.
> > > > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > > > function set as built-in functions in an easy way.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com>
> > > > <us...@gmail.com> wrote:
> > > >
> > > > Hi Dawid,
> > > >
> > > > Thank you for sharing your findings. It seems to me that there is no
> > > > SQL standard regarding temporary functions. There are few systems that
> > > > support it. Here is what I have found:
> > > >
> > > > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > > > 2. Spark: basically follows Hive (
> > > >
> > > >
> > > >
> > >
> >
> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > > > )
> > > > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of
> > overwriting
> > > > behavior. (
> > > >
> > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> > > )
> > > >
> > > > Because of the lack of a standard, it's perfectly fine for Flink to
> > > > define whatever it sees as appropriate. Thus, your proposal (no
> > > > overwriting, and temp functions must have a DB as holder) is one
> > > > option. The advantage is simplicity. The downside is the deviation
> > > > from Hive, which is popular and the de facto standard in the big data
> > > > world.
> > > >
> > > > However, I don't think we have to follow Hive. More importantly, we
> > need
> > > a
> > > > consensus. I have no objection if your proposal is generally agreed
> > upon.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <
> > dwysakowicz@apache.org
> > > >
> > > > <dw...@apache.org>
> > > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Just an opinion on the built-in <> temporary functions resolution and
> > > > NAMING issue. I think we should not allow overriding the built-in
> > > > functions, as this may pose serious issues and to be honest is rather
> > > > not feasible and would require major rework. What happens if a user
> > > > wants to override CAST? Calls to that function are generated at
> > > > different layers of the stack that unfortunately do not always go
> > > > through the Catalog API (at least not yet). Moreover, from what I've
> > > > checked, no other systems allow overriding the built-in functions. All
> > > > the systems I've checked so far register temporary functions in a
> > > > database/schema (either a special database for temporary functions, or
> > > > just the current database). What I would suggest is to always register
> > > > temporary functions with a 3-part identifier, the same way as tables,
> > > > views etc. This effectively means you cannot override built-in
> > > > functions. With such an approach it is natural that the temporary
> > > > functions end up a step lower in the resolution order:
> > > >
> > > > 1. built-in functions (1 part, maybe 2? - this is still under
> > > > discussion)
> > > >
> > > > 2. temporary functions (always 3 part path)
> > > >
> > > > 3. catalog functions (always 3 part path)
> > > >
> > > > Let me know what you think.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > >
> > > > On 04/09/2019 06:13, Bowen Li wrote:
> > > >
> > > > Hi,
> > > >
> > > > I agree with Xuefu that the controversial points are mainly the two
> > > > places. My thoughts on them:
> > > >
> > > > 1) Determinism of referencing Hive built-in functions. We can either
> > > > remove Hive built-in functions from ambiguous function resolution and
> > > > require users to use special syntax for their qualified names, or add
> > > > a config flag to the catalog constructor/yaml for turning Hive built-in
> > > > functions on and off, with the flag set to 'false' by default and
> > > > proper docs added to help users make their decisions.
> > > >
> > > > 2) Flink temp functions vs. Flink built-in functions in the ambiguous
> > > > function resolution order. We believe Flink temp functions should
> > > > precede Flink built-in functions, and I have presented my reasons. In
> > > > case we cannot reach an agreement, I propose forbidding users from
> > > > registering temp functions with the same name as a built-in function,
> > > > like MySQL's approach, for the moment. It won't raise any performance
> > > > concern, since built-in functions are all in memory and thus the cost
> > > > of a name check will be trivial.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com>
> > > > <us...@gmail.com> wrote:
> > > >
> > > >  From what I have seen, there are a couple of focal disagreements:
> > > >
> > > > 1. Resolution order: temp function --> flink built-in function -->
> > > > catalog function vs flink built-in function --> temp function -->
> > > > catalog function.
> > > >
> > > > 2. "External" built-in functions: how to treat built-in functions in
> > > > external system and how users reference them
> > > >
> > > > For #1, I agree with Bowen that temp function needs to be at the
> > > > highest priority because that's how a user might overwrite a built-in
> > > > function without referencing a persistent, overwriting catalog
> > > > function with a fully qualified name. Putting built-in functions at
> > > > the highest priority eliminates that usage.
> > > >
> > > > For #2, I saw a general agreement that referencing "external" built-in
> > > > functions such as those in Hive needs to be explicit and deterministic,
> > > > even though different approaches are proposed. To limit the scope and
> > > > simplify the usage, it seems to make sense to me to introduce special
> > > > syntax for users to explicitly reference an external built-in function,
> > > > such as hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that
> > > > nicely matches the Catalog API call
> > > > hive1.getFunction(ObjectPath functionName), where the database name is
> > > > absent for built-in functions available in that catalog hive1. I
> > > > understand that Bowen's original proposal was trying to avoid this, but
> > > > this could turn out to be a clean and simple solution.
> > > >
> > > > (Timo's modular approach is great way to "expand" Flink's built-in
> > > >
> > > > function
> > > >
> > > > set, which seems orthogonal and complementary to this, which could be
> > > > tackled in further future work.)
> > > >
> > > > I'd be happy to hear further thoughts on the two points.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com>
> > > > <yk...@gmail.com> wrote:
> > > >
> > > > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is
> > > > the same as Bowen's. But after thinking about it, I'm currently leaning
> > > > toward Timo's suggestion.
> > > >
> > > > The reason is backward compatibility. If we follow Bowen's approach,
> > > > let's say we first look up a function in Flink's built-in functions,
> > > > and then in Hive's built-ins. For example, `foo` is not supported by
> > > > Flink, but Hive has such a built-in function, so users get Hive's
> > > > behavior for function `foo`. Then, in the next release, Flink realizes
> > > > this is a very popular function and adds it to Flink's built-in
> > > > functions, but with different behavior from Hive's. So in the next
> > > > release, the behavior changes.
> > > >
> > > > With Timo's approach, IIUC users have to tell the framework explicitly
> > > > what kind of built-in functions they would like to use. They can just
> > > > tell the framework to abandon Flink's built-in functions and use
> > > > Hive's instead. Users can only choose between them, but not use them
> > > > at the same time. I think this approach is more predictable.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com>
> > > > <bo...@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Thanks for the feedback. Just a kind reminder that the [Proposal]
> > > > section in the google doc was updated; please take a look first and
> > > > let me know if you have more questions.
> > > >
> > > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
> > > > <bo...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > > Hi Timo,
> > > >
> > > > Re> 1) We should not have the restriction "hive built-in functions
> > > > can only be used when current catalog is hive catalog". Switching a
> > > > catalog should only have implications on the cat.db.object resolution
> > > > but not functions. It would be quite convenient for users to use Hive
> > > > built-ins even if they use a Confluent schema registry or just the
> > > > in-memory catalog.
> > > >
> > > > There might be a misunderstanding here.
> > > >
> > > > First of all, Hive built-in functions are not part of Flink built-in
> > > > functions, they are catalog functions, thus if the current catalog is
> > > > not a HiveCatalog but, say, a schema registry catalog, ambiguous
> > > > function references just shouldn't be resolved to a different catalog.
> > > >
> > > > Second, Hive built-in functions can potentially be referenced across
> > > > catalogs, but they don't have a db namespace and we currently just
> > > > don't have a SQL syntax for it. It can be enabled when such a SQL
> > > > syntax is defined, e.g. "catalog::function", but it's out of scope of
> > > > this FLIP.
> > > >
> > > > 2) I would propose to have separate concepts for catalog and
> > > >
> > > > built-in
> > > >
> > > > functions. In particular it would be nice to modularize built-in
> > > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> > > >
> > > > maybe
> > > >
> > > > we add more experimental functions in the future or function for
> > > >
> > > > some
> > > >
> > > > special application area (Geo functions, ML functions). A data
> > > >
> > > > platform
> > > >
> > > > team might not want to make every built-in function available. Or a
> > > > function module like ML functions is in a different Maven module.
> > > >
> > > > I think this is orthogonal to this FLIP, especially we don't have
> > > >
> > > > the
> > > >
> > > > "external built-in functions" anymore and currently the built-in
> > > >
> > > > function
> > > >
> > > > category remains untouched.
> > > >
> > > > But just to share some thoughts on the proposal, I'm not sure about
> > > >
> > > > it:
> > > >
> > > > - I don't know if any other databases handle built-in functions
> > > >
> > > > like
> > > >
> > > > that.
> > > >
> > > > Maybe you can give some examples? IMHO, built-in functions are
> > > >
> > > > system
> > > >
> > > > info
> > > >
> > > > and should be deterministic, not depending on loaded libraries. Geo
> > > > functions should be either built-in already or just libraries
> > > >
> > > > functions,
> > > >
> > > > and library functions can be adapted to catalog APIs or of some
> > > >
> > > > other
> > > >
> > > > syntax to use
> > > > - I don't know if all use cases stand, and many can be achieved by
> > > >
> > > > other
> > > >
> > > > approaches too. E.g. experimental functions can be taken good care
> > > >
> > > > of
> > > >
> > > > by
> > > >
> > > > documentations, annotations, etc
> > > > - the proposal basically introduces some concept like a pluggable
> > > >
> > > > built-in
> > > >
> > > > function catalog, despite the already existing catalog APIs
> > > > - it brings in even more complicated scenarios to the design. E.g.
> > > >
> > > > how
> > > >
> > > > do
> > > >
> > > > you handle built-in functions in different modules but different
> > > >
> > > > names?
> > > >
> > > > In short, I'm not sure if it really stands and it looks like an
> > > >
> > > > overkill
> > > >
> > > > to me. I'd rather not go to that route. Related discussion can be
> > > >
> > > > on
> > > >
> > > > its
> > > >
> > > > own thread.
> > > >
> > > > 3) Following the suggestion above, we can have a separate discovery
> > > > mechanism for built-in functions. Instead of just going through a
> > > >
> > > > static
> > > >
> > > > list like in BuiltInFunctionDefinitions, a platform team should be
> > > >
> > > > able
> > > >
> > > > to select function modules like
> > > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > > HiveFunctions) or via service discovery;
> > > >
> > > > Same as above. I'll leave it to its own thread.
> > > >
> > > > re > 3) Dawid and I discussed the resolution order again. I agree
> > > > with Kurt that we should unify built-in function (external or
> > > > internal) under a common layer. However, the resolution order should
> > > > be:
> > > >    1. built-in functions
> > > >    2. temporary functions
> > > >    3. regular catalog resolution logic
> > > > Otherwise a temporary function could cause clashes with Flink's
> > > > built-in functions. If you take a look at other vendors, like SQL
> > > > Server, they also do not allow overwriting built-in functions.
> > > >
> > > > "I agree with Kurt that we should unify built-in function (external
> > > > or internal) under a common layer." <- I don't think this is what
> > > > Kurt means. Kurt and I are in favor of unifying built-in functions of
> > > > external systems and catalog functions. Was that a typo?
> > > >
> > > > Besides, I'm not sure about the resolution order you proposed.
> > > > Temporary functions have a lifespan of a session and are only visible
> > > > to the session owner. They are unique to each user, and users create
> > > > them on purpose to be the highest priority in order to overwrite
> > > > system info (built-in functions in this case).
> > > >
> > > > In your case, why would users name a temporary function the same as a
> > > > built-in function then? Since using that name in an ambiguous function
> > > > reference will always be resolved to the built-in function, creating a
> > > > same-named temp function would be meaningless in the end.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
> > > > <bo...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > > Hi Jingsong,
> > > >
> > > > Re> 1.Hive built-in functions is an intermediate solution. So we
> > > >
> > > > should
> > > >
> > > > not introduce interfaces to influence the framework. To make
> > > > Flink itself more powerful, we should implement the functions
> > > > we need to add.
> > > >
> > > > Yes, please see the doc.
> > > >
> > > > Re> 2.Non-flink built-in functions are easy for users to change
> > > >
> > > > their
> > > >
> > > > behavior. If we support some flink built-in functions in the
> > > > future but act differently from non-flink built-in, this will
> > > >
> > > > lead
> > > >
> > > > to
> > > >
> > > > changes in user behavior.
> > > >
> > > > There's no such concept as "external built-in functions" any more.
> > > > Built-in functions of external systems will be treated as special
> > > >
> > > > catalog
> > > >
> > > > functions.
> > > >
> > > > Re> Another question is, does this fallback include all
> > > > hive built-in functions? As far as I know, some hive functions
> > > > are somewhat hacky. If possible, can we start with a white list?
> > > > Once we implement some functions to flink built-in, we can
> > > > also update the whitelist.
> > > >
> > > > Yes, that's something we thought of too. I don't think it's super
> > > > critical to the scope of this FLIP, thus I'd like to leave it to
> > > >
> > > > future
> > > >
> > > > efforts as a nice-to-have feature.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> > > > <bo...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > > Hi Kurt,
> > > >
> > > > Re: > What I want to propose is we can merge #3 and #4, make them
> > > >
> > > > both
> > > >
> > > > under
> > > >
> > > > "catalog" concept, by extending catalog function to make it have
> > > >
> > > > ability to
> > > >
> > > > have built-in catalog functions. Some benefits I can see from
> > > >
> > > > this
> > > >
> > > > approach:
> > > >
> > > > 1. We don't have to introduce new concept like external built-in
> > > >
> > > > functions.
> > > >
> > > > Actually I don't see a full story about how to treat a built-in
> > > >
> > > > functions, and it
> > > >
> > > > seems a little bit disrupt with catalog. As a result, you have
> > > >
> > > > to
> > > >
> > > > make
> > > >
> > > > some restriction
> > > >
> > > > like "hive built-in functions can only be used when current
> > > >
> > > > catalog
> > > >
> > > > is
> > > >
> > > > hive catalog".
> > > >
> > > > Yes, I've unified #3 and #4 but it seems I didn't update some
> > > >
> > > > part
> > > >
> > > > of
> > > >
> > > > the doc. I've modified those sections, and they are up to date
> > > >
> > > > now.
> > > >
> > > > In short, now built-in function of external systems are defined
> > > >
> > > > as
> > > >
> > > > a
> > > >
> > > > special kind of catalog function in Flink, and handled by Flink
> > > >
> > > > as
> > > >
> > > > following:
> > > > - An external built-in function must be associated with a catalog
> > > >
> > > > for
> > > >
> > > > the purpose of decoupling flink-table and external systems.
> > > > - It always resides in front of catalog functions in ambiguous
> > > >
> > > > function
> > > >
> > > > reference order, just like in its own external system
> > > > - It is a special catalog function that doesn’t have a
> > > >
> > > > schema/database
> > > >
> > > > namespace
> > > > - It goes thru the same instantiation logic as other user defined
> > > > catalog functions in the external system
> > > >
> > > > Please take another look at the doc, and let me know if you have
> > > >
> > > > more
> > > >
> > > > questions.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > > > <tw...@apache.org>
> > > >
> > > > wrote:
> > > >
> > > > Hi Kurt,
> > > >
> > > > it should not affect the functions and operations we currently
> > > >
> > > > have
> > > >
> > > > in
> > > >
> > > > SQL. It just categorizes the available built-in functions. It is
> > > >
> > > > kind
> > > >
> > > > of
> > > > an orthogonal concept to the catalog API but built-in functions
> > > >
> > > > deserve
> > > >
> > > > this special kind of treatment. CatalogFunction still fits
> > > >
> > > > perfectly
> > > >
> > > > in
> > > >
> > > > there because the regular catalog object resolution logic is not
> > > > affected. So tables and functions are resolved in the same way
> > > >
> > > > but
> > > >
> > > > with
> > > >
> > > > built-in functions that have priority as in the original design.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 03.09.19 15:26, Kurt Young wrote:
> > > >
> > > > Does this only affect the functions and operations we currently
> > > >
> > > > have
> > > >
> > > > in SQL
> > > >
> > > > and
> > > > have no effect on tables, right? Looks like this is an
> > > >
> > > > orthogonal
> > > >
> > > > concept
> > > >
> > > > with Catalog?
> > > > If the answer are both yes, then the catalog function will be a
> > > >
> > > > weird
> > > >
> > > > concept?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
> > > >
> > > > yuzhao.cyz@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > The way you proposed are basically the same as what Calcite
> > > >
> > > > does, I
> > > >
> > > > think
> > > >
> > > > we are in the same line.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > > >
> > > > This sounds exactly as the module approach I mentioned, no?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > On 03.09.19 13:42, Danny Chan wrote:
> > > >
> > > > Thanks Bowen for bringing up this topic. I think it’s a useful
> > > > refactoring to make our function usage more user friendly.
> > > >
> > > > For the topic of how to organize the built-in operators and the
> > > > operators of Hive, here is a solution from Apache Calcite: the Calcite
> > > > way is to make every dialect's operators a “Library”, and users can
> > > > specify which libraries they want to use for a SQL query. The built-in
> > > > operators always come first as first-class objects, and the others
> > > > are used in the order they appear.
> > > >
> > > > Maybe you can take it as a reference.
> > > >
> > > > [1]
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > I'd like to kick off a discussion on reworking Flink's
> > > >
> > > > FunctionCatalog.
> > > >
> > > > It's critically helpful to improve function usability in
> > > >
> > > > SQL.
> > > >
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > >
> > > > In short, it:
> > > > - adds support for precise function reference with
> > > >
> > > > fully/partially
> > > >
> > > > qualified name
> > > > - redefines function resolution order for ambiguous
> > > >
> > > > function
> > > >
> > > > reference
> > > >
> > > > - adds support for Hive's rich built-in functions (support
> > > >
> > > > for
> > > >
> > > > Hive
> > > >
> > > > user
> > > >
> > > > defined functions was already added in 1.9.0)
> > > > - clarifies the concept of temporary functions
> > > >
> > > > Would love to hear your thoughts.
> > > >
> > > > Bowen
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > > >
> > > >
> > >
> >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Maybe Xuefu missed my email. Please let me know what your thoughts are on
the summary. If there's still major controversy, I can take time to
reevaluate that part.
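
To make point 1 of the summary concrete, here is a rough sketch of how I
read it. It is only an illustration under my own assumptions: the class and
method names are hypothetical, function names are matched
case-insensitively, and none of this is the actual FunctionCatalog code.
Registering a temp function whose name matches a built-in is rejected, so
the relative order of built-in and temp functions no longer matters for
ambiguous references.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the proposed rules; not Flink's FunctionCatalog. */
final class FunctionResolutionSketch {
    private final Set<String> builtIns = new HashSet<>();
    private final Map<String, String> tempFunctions = new HashMap<>();
    // Keyed by "catalog.database.name"; values stand in for implementations.
    private final Map<String, String> catalogFunctions = new HashMap<>();
    private final String currentCatalog;
    private final String currentDatabase;

    FunctionResolutionSketch(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    // Assumption: function names are case-insensitive.
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    void addBuiltIn(String name) {
        builtIns.add(normalize(name));
    }

    /** Rejects a temp function whose name collides with a built-in function. */
    void registerTempFunction(String name, String impl) {
        String key = normalize(name);
        if (builtIns.contains(key)) {
            throw new IllegalArgumentException(
                "Temp function '" + name + "' clashes with a built-in function");
        }
        tempFunctions.put(key, impl);
    }

    void registerCatalogFunction(String catalog, String database, String name, String impl) {
        catalogFunctions.put(normalize(catalog + "." + database + "." + name), impl);
    }

    /** Resolves an unqualified (ambiguous) function reference. */
    Optional<String> resolve(String name) {
        String key = normalize(name);
        if (builtIns.contains(key)) {
            return Optional.of("built-in:" + key);
        }
        if (tempFunctions.containsKey(key)) {
            return Optional.of("temp:" + tempFunctions.get(key));
        }
        // Fall back to the current catalog and database only.
        String qualified = normalize(currentCatalog + "." + currentDatabase + "." + name);
        return Optional.ofNullable(catalogFunctions.get(qualified))
            .map(impl -> "catalog:" + impl);
    }
}
```

Note that built-in functions of external systems such as Hive do not appear
in resolve() at all, which matches point 2 of the summary.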

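For point 2 of the summary, a small sketch of how a reference like
"mycat::func" could be told apart from a qualified or an ambiguous
reference. The "::" syntax is only the example floated in this thread, not
something that is settled, and the class, enum, and field names below are
hypothetical.

```java
/** Hypothetical sketch of classifying function references; not an existing Flink API. */
final class FunctionReferenceSketch {
    enum Kind { EXTERNAL_BUILT_IN, QUALIFIED, AMBIGUOUS }

    final Kind kind;
    final String catalog;   // null when not given
    final String database;  // null when not given
    final String name;

    private FunctionReferenceSketch(Kind kind, String catalog, String database, String name) {
        this.kind = kind;
        this.catalog = catalog;
        this.database = database;
        this.name = name;
    }

    static FunctionReferenceSketch parse(String ref) {
        int sep = ref.indexOf("::");
        if (sep >= 0) {
            // "mycat::func": a built-in function of the external system behind mycat.
            String catalog = ref.substring(0, sep);
            String name = ref.substring(sep + 2);
            requireIdentifier(catalog, ref);
            requireIdentifier(name, ref);
            return new FunctionReferenceSketch(Kind.EXTERNAL_BUILT_IN, catalog, null, name);
        }
        String[] parts = ref.split("\\.", -1);
        for (String part : parts) {
            requireIdentifier(part, ref);
        }
        switch (parts.length) {
            case 1: // "func": goes through the ambiguous resolution order
                return new FunctionReferenceSketch(Kind.AMBIGUOUS, null, null, parts[0]);
            case 2: // "db.func": partially qualified
                return new FunctionReferenceSketch(Kind.QUALIFIED, null, parts[0], parts[1]);
            case 3: // "cat.db.func": fully qualified
                return new FunctionReferenceSketch(Kind.QUALIFIED, parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException("Too many name parts: " + ref);
        }
    }

    private static void requireIdentifier(String part, String ref) {
        if (part.isEmpty() || part.contains(".") || part.contains(":")) {
            throw new IllegalArgumentException("Malformed function reference: " + ref);
        }
    }
}
```

With something like this, "hive1::sqrt" never competes with Flink's own
"sqrt", which is the certainty we would be trading a little convenience for.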

On Wed, Sep 4, 2019 at 2:25 PM Xuefu Z <us...@gmail.com> wrote:

> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:
>
> > Let me try to summarize and conclude the long thread so far:
> >
> > 1. For order of temp function v.s. built-in function:
> >
> > I think Dawid's point that temp functions should have a fully qualified
> > path is better reasoning to back the newly proposed order, and I agree
> > we don't need to follow Hive/Spark.
> >
> > However, I'd rather not change the fundamentals of temporary functions
> > in this FLIP. It belongs to a bigger story of how temporary objects
> > should be redefined and handled uniformly - currently temporary tables
> > and views (those registered via TableEnv#registerTable()) behave
> > differently from what Dawid proposes for temp functions, and we need a
> > FLIP just to unify their APIs and behaviors.
> >
> > I agree that backward compatibility is not an issue w.r.t Jark's points.
> >
> > ***Seems we do have consensus that it's acceptable to prevent users from
> > registering a temp function with the same name as a built-in function. To
> > help us move forward, I'd like to propose setting such a restriction on
> > temp functions in this FLIP to simplify the design and avoid disputes.***
> > It will also leave room for improvements in the future.
> >
> >
> > 2. For Hive built-in function:
> >
> > Thanks Timo for providing the Presto and Postgres examples. I feel
> > modular built-in functions can be a good fit for the geo and ml example
> > as a native Flink extension, but I'm not sure if it fits well with
> > external integrations. Anyway, I think modular built-in functions is a
> > bigger story and can be on its own thread too, and our proposal doesn't
> > prevent Flink from doing that in the future.
> >
> > ***Seems we have consensus that users should be able to use built-in
> > functions of Hive or other external systems in SQL explicitly and
> > deterministically, regardless of Flink built-in functions and the
> > potential modular built-in functions, via some new syntax like
> > "mycat::func"? If so, I'd like to propose removing Hive built-in
> > functions from the ambiguous function resolution order, and empowering
> > users with such a syntax. This way we sacrifice a little convenience for
> > certainty.***
> >
> >
> > What do you think?
> >
> > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS: I've just
> > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
> > > are very inconsistent in that manner (Spark being way worse on that).
> > >
> > > Hive:
> > >
> > > You cannot overwrite all the built-in functions. I could overwrite most
> > > of the functions I tried, e.g. length, e, pi, round, rtrim, but there
> > > are functions I cannot overwrite, e.g. CAST and ARRAY. I get:
> > >
> > >
> > > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> > >
> > > What is interesting is that I cannot overwrite *array*, but I can
> > > overwrite *map* or *struct*. Hive behaves reasonably well, though, if I
> > > manage to overwrite a function: when I drop the temporary function, the
> > > native function is still available.
> > >
> > > Spark:
> > >
> > > Spark's behavior imho is super bad.
> > >
> > > Theoretically I could overwrite all functions. I was able, e.g., to
> > > overwrite the CAST function, though I had to use the CREATE OR REPLACE
> > > TEMPORARY FUNCTION syntax; otherwise I get an exception that the
> > > function already exists. However, when I used the CAST function in a
> > > query, it used the native, built-in one.
> > >
> > > When I overwrote the current_date() function, it was used in a query,
> > > but it completely replaces the built-in function and I can no longer
> > > use the native function in any way. I also cannot drop the temporary
> > > function. I get:
> > >
> > > *    Error in query: Cannot drop native function 'current_date';*
> > >
> > > Additional note: neither system allows creating TEMPORARY FUNCTIONS
> > > with a database. Temporary functions are always represented as a single
> > > name.
> > >
> > > In my opinion neither of the systems has consistent behavior. Generally
> > > speaking, I think overwriting any system-provided functions is just
> > > dangerous.
> > >
> > > Regarding Jark's concerns. Such functions would be registered in the
> > > current catalog/database schema, so a user could still use their own
> > > function, but would have to fully qualify it (because built-in
> > > functions take precedence). Moreover, users would have the same problem
> > > with permanent functions. Imagine a user has a permanent function
> > > 'cat.db.explode'. In 1.9 the user could use just the 'explode' function
> > > as long as 'cat' & 'db' were the default catalog & database. If we
> > > introduce an 'explode' built-in function in 1.10, the user has to fully
> > > qualify the function.
> > >
> > > Best,
> > >
> > > Dawid
> > > On 04/09/2019 15:19, Timo Walther wrote:
> > >
> > > Hi all,
> > >
> > > thanks for the healthy discussion. It is already a very long discussion
> > > with a lot of text. So I will just post my opinion to a couple of
> > > statements:
> > >
> > > > Hive built-in functions are not part of Flink built-in functions,
> they
> > > are catalog functions
> > >
> > > That is not entirely true. Correct me if I'm wrong but I think Hive
> > > built-in functions are also not catalog functions. They are not stored
> in
> > > every Hive metastore catalog that is freshly created but are a set of
> > > functions that are listed somewhere and made available.
> > >
> > > > ambiguous functions reference just shouldn't be resolved to a
> different
> > > catalog
> > >
> > > I agree. They should not be resolved to a different catalog. That's
> why I
> > > am suggesting to split the concept of built-in functions and catalog
> > lookup
> > > semantics.
> > >
> > > > I don't know if any other databases handle built-in functions like
> that
> > >
> > > What I called "module" is:
> > > - Extension in Postgres [1]
> > > - Plugin in Presto [2]
> > >
> > > Btw. Presto even mentions example modules that are similar to the ones
> > > that we will introduce in the near future both for ML and System XYZ
> > > compatibility:
> > > "See either the presto-ml module for machine learning functions or the
> > > presto-teradata-functions module for Teradata-compatible functions,
> both
> > in
> > > the root of the Presto source."
> > >
> > > > functions should be either built-in already or just libraries
> > functions,
> > > and library functions can be adapted to catalog APIs or of some other
> > > syntax to use
> > >
> > > Regarding "built-in already", of course we can add a lot of functions
> > > as built-ins but we will end up in a dependency hell in the near future
> > > if we don't introduce a pluggable approach. Library functions are what
> > > you also suggest, but storing them in a catalog means either always
> > > fully qualifying them or modifying the existing catalog design that was
> > > inspired by the standard.
> > >
> > > I don't think "it brings in even more complicated scenarios to the
> > > design", it just provides a clear separation of concerns. Integrating
> > > the functionality into the current design makes the catalog API more
> > > complicated.
> > >
> > > > why would users name a temporary function the same as a built-in
> > > function then?
> > >
> > > Because you never know what users do. If they don't, my suggested
> > > resolution order should not be a problem, right?
> > >
> > > > I don't think hive functions deserves be a function module
> > >
> > > Our goal is not to create a Hive clone. We need to think forward and
> Hive
> > > is just one of many systems that we can support. Not every built-in
> > > function behaves and will behave exactly like Hive.
> > >
> > > > regarding temporary functions, there are few systems that support it
> > >
> > > > IMHO Spark and Hive are not always the best examples for consistent
> > > > design. Systems like Postgres, Presto, or SQL Server should be used as
> > > > a reference. I don't think that a user can overwrite a built-in function
> > > > there.
> > >
> > > Regards,
> > > Timo
> > >
> > > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > > [2] https://prestodb.github.io/docs/current/develop/functions.html
> > >
> > >
> > > On 04.09.19 13:44, Jark Wu wrote:
> > >
> > > Hi all,
> > >
> > > Regarding #1 temp function <> built-in function and naming.
> > > > I'm fine with temp functions preceding built-in functions and being
> > > > able to override them (we already support overriding built-in
> > > > functions in 1.9).
> > > > If we don't allow the same name as a built-in function, I'm afraid we
> > > > will have compatibility issues in the future.
> > > > Say users register a user-defined function named "explode" in 1.9, and
> > > > we support a built-in "explode" function in 1.10.
> > > > Then the user's jobs which call the registered "explode" function in
> > > > 1.9 will all fail in 1.10 because of the naming conflict.
> > >
> > > Regarding #2 "External" built-in functions.
> > > I think if we store external built-in functions in catalog, then
> > > "hive1::sqrt" is a good way to go.
> > > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > > built-in functions as Timo suggested above.
> > > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > > function set as built-in functions in an easy way.
> > >
> > > Best,
> > > Jark
> > >
> > > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com>
> > > <us...@gmail.com> wrote:
> > >
> > > > Hi Dawid,
> > > >
> > > > Thank you for sharing your findings. It seems to me that there is no SQL
> > > > standard regarding temporary functions. There are few systems that
> > > > support them. Here is what I have found:
> > > >
> > > > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > > > 2. Spark: basically follows Hive (
> > > > https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > > > )
> > > > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > > > behavior. (
> > > > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html
> > > > )
> > > >
> > > > Because of the lack of a standard, it's perfectly fine for Flink to define
> > > > whatever it sees as appropriate. Thus, your proposal (no overwriting and
> > > > must have DB as holder) is one option. The advantage is simplicity. The
> > > > downside is the deviation from Hive, which is popular and the de facto
> > > > standard in the big data world.
> > >
> > > > However, I don't think we have to follow Hive. More importantly, we need
> > > > a consensus. I have no objection if your proposal is generally agreed
> > > > upon.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> > > > wrote:
> > >
> > > Hi all,
> > >
> > > > Just an opinion on the built-in <> temporary functions resolution and
> > > > NAMING issue. I think we should not allow overriding the built-in
> > > > functions, as this may pose serious issues and to be honest is rather
> > > > not feasible and would require major rework. What happens if a user
> > > > wants to override CAST? Calls to that function are generated at
> > > > different layers of the stack that unfortunately do not always go
> > > > through the Catalog API (at least not yet). Moreover, from what I've
> > > > checked, no other systems allow overriding the built-in functions. All the
> > > > systems I've checked so far register temporary functions in a
> > > > database/schema (either a special database for temporary functions, or
> > > > just the current database). What I would suggest is to always register
> > > > temporary functions with a 3 part identifier. The same way as tables,
> > > > views etc. This effectively means you cannot override built-in
> > > > functions. With such an approach it is natural that the temporary
> > > > functions end up a step lower in the resolution order:
> > > >
> > > > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> > > >
> > > > 2. temporary functions (always 3 part path)
> > > >
> > > > 3. catalog functions (always 3 part path)
> > > >
> > > > Let me know what you think.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 04/09/2019 06:13, Bowen Li wrote:
> > >
> > > Hi,
> > >
> > > > I agree with Xuefu that the main controversial points are mainly the two
> > > > places. My thoughts on them:
> > > >
> > > > 1) Determinism of referencing Hive built-in functions. We can either remove
> > > > Hive built-in functions from ambiguous function resolution and require
> > > > users to use special syntax for their qualified names, or add a config flag
> > > > to catalog constructor/yaml for turning on and off Hive built-in functions
> > > > with the flag set to 'false' by default and proper doc added to help users
> > > > make their decisions.
> > > >
> > > > 2) Flink temp functions vs. Flink built-in functions in ambiguous function
> > > > resolution order. We believe Flink temp functions should precede Flink
> > > > built-in functions, and I have presented my reasons. Just in case we
> > > > cannot reach an agreement, I propose forbidding users from registering temp
> > > > functions with the same name as a built-in function, like MySQL's approach,
> > > > for the moment. It won't raise any performance concern, since built-in
> > > > functions are all in memory and thus the cost of a name check will be
> > > > really trivial.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com>
> > > <us...@gmail.com> wrote:
> > >
> > > > From what I have seen, there are a couple of focal disagreements:
> > > >
> > > > 1. Resolution order: temp function --> flink built-in function --> catalog
> > > > function vs flink built-in function --> temp function --> catalog function.
> > > >
> > > > 2. "External" built-in functions: how to treat built-in functions in
> > > > external systems and how users reference them
> > > >
> > > > For #1, I agree with Bowen that temp functions need to be at the highest
> > > > priority because that's how a user might overwrite a built-in function
> > > > without referencing a persistent, overwriting catalog function with a fully
> > > > qualified name. Putting built-in functions at the highest priority
> > > > eliminates that usage.
> > > >
> > > > For #2, I saw a general agreement that referencing "external" built-in
> > > > functions such as those in Hive needs to be explicit and deterministic,
> > > > even though different approaches are proposed. To limit the scope and
> > > > simplify the usage, it makes sense to me to introduce special syntax for
> > > > users to explicitly reference an external built-in function, such as
> > > > hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that nicely
> > > > matches the Catalog API call hive1.getFunction(ObjectPath functionName),
> > > > where the database name is absent for built-in functions available in that
> > > > catalog hive1. I understand that Bowen's original proposal was trying to
> > > > avoid this, but this could turn out to be a clean and simple solution.
> > > >
> > > > (Timo's modular approach is a great way to "expand" Flink's built-in
> > > > function set, which seems orthogonal and complementary to this, and could
> > > > be tackled in future work.)
> > >
> > > I'd be happy to hear further thoughts on the two points.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com>
> > > <yk...@gmail.com> wrote:
> > >
> > > > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > > > same as Bowen's. But after thinking about it, I'm currently leaning
> > > > towards Timo's suggestion.
> > > >
> > > > The reason is backward compatibility. If we follow Bowen's approach, let's
> > > > say we first look for a function in Flink's built-in functions, and then
> > > > in Hive's built-ins. For example, `foo` is not supported by Flink, but
> > > > Hive has such a built-in function. So the user will get Hive's behavior
> > > > for function `foo`. And in the next release, Flink realizes this is a very
> > > > popular function and adds it to Flink's built-in functions, but with
> > > > different behavior from Hive's. So in the next release, the behavior
> > > > changes.
> > > >
> > > > With Timo's approach, IIUC the user has to tell the framework explicitly
> > > > what kind of built-in functions he would like to use. He can just tell the
> > > > framework to abandon Flink's built-in functions, and use Hive's instead.
> > > > The user can only choose between them, but not use them at the same time.
> > > > I think this approach is more predictable.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com>
> > > <bo...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > > Thanks for the feedback. Just a kind reminder that the [Proposal] section
> > > > in the Google doc was updated. Please take a look first and let me know if
> > > > you have more questions.
> > >
> > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
> > > <bo...@gmail.com>
> > >
> > > wrote:
> > >
> > > Hi Timo,
> > >
> > > > Re> 1) We should not have the restriction "hive built-in functions can
> > > > only be used when current catalog is hive catalog". Switching a catalog
> > > > should only have implications on the cat.db.object resolution but not
> > > > functions. It would be quite convenient for users to use Hive built-ins
> > > > even if they use a Confluent schema registry or just the in-memory
> > > > catalog.
> > > >
> > > > There might be a misunderstanding here.
> > > >
> > > > First of all, Hive built-in functions are not part of Flink built-in
> > > > functions, they are catalog functions. Thus, if the current catalog is
> > > > not a HiveCatalog but, say, a schema registry catalog, ambiguous functions
> > > > reference just shouldn't be resolved to a different catalog.
> > > >
> > > > Second, Hive built-in functions can potentially be referenced across
> > > > catalogs, but they don't have a db namespace and we currently just don't
> > > > have a SQL syntax for that. It can be enabled when such a SQL syntax is
> > > > defined, e.g. "catalog::function", but it's out of scope of this FLIP.
> > >
> > > > 2) I would propose to have separate concepts for catalog and built-in
> > > > functions. In particular it would be nice to modularize built-in
> > > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > > > we add more experimental functions in the future or functions for some
> > > > special application area (Geo functions, ML functions). A data platform
> > > > team might not want to make every built-in function available. Or a
> > > > function module like ML functions is in a different Maven module.
> > > >
> > > > I think this is orthogonal to this FLIP, especially since we don't have
> > > > the "external built-in functions" anymore and currently the built-in
> > > > function category remains untouched.
> > > >
> > > > But just to share some thoughts on the proposal, I'm not sure about it:
> > > >
> > > > - I don't know if any other databases handle built-in functions like that.
> > > > Maybe you can give some examples? IMHO, built-in functions are system info
> > > > and should be deterministic, not depending on loaded libraries. Geo
> > > > functions should be either built-in already or just libraries functions,
> > > > and library functions can be adapted to catalog APIs or of some other
> > > > syntax to use
> > > > - I don't know if all use cases stand, and many can be achieved by other
> > > > approaches too. E.g. experimental functions can be taken good care of by
> > > > documentation, annotations, etc
> > > > - the proposal basically introduces some concept like a pluggable built-in
> > > > function catalog, despite the already existing catalog APIs
> > > > - it brings in even more complicated scenarios to the design. E.g. how do
> > > > you handle built-in functions in different modules but different names?
> > > >
> > > > In short, I'm not sure if it really stands and it looks like overkill to
> > > > me. I'd rather not go down that route. Related discussion can be on its
> > > > own thread.
> > >
> > > > 3) Following the suggestion above, we can have a separate discovery
> > > > mechanism for built-in functions. Instead of just going through a static
> > > > list like in BuiltInFunctionDefinitions, a platform team should be able
> > > > to select function modules like
> > > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > > HiveFunctions) or via service discovery;
> > > >
> > > > Same as above. I'll leave it to its own thread.
> > >
> > > > re > 3) Dawid and I discussed the resolution order again. I agree with
> > > > Kurt that we should unify built-in function (external or internal) under a
> > > > common layer. However, the resolution order should be:
> > > >    1. built-in functions
> > > >    2. temporary functions
> > > >    3. regular catalog resolution logic
> > > > Otherwise a temporary function could cause clashes with Flink's built-in
> > > > functions. If you take a look at other vendors, like SQL Server, they
> > > > also do not allow overwriting built-in functions.
> > > >
> > > > "I agree with Kurt that we should unify built-in function (external or
> > > > internal) under a common layer." <- I don't think this is what Kurt means.
> > > > Kurt and I are in favor of unifying built-in functions of external systems
> > > > and catalog functions. Was that a typo?
> > > >
> > > > Besides, I'm not sure about the resolution order you proposed. Temporary
> > > > functions have a lifespan of a session and are only visible to the
> > > > session owner. They are unique to each user, and users create them on
> > > > purpose to be the highest priority in order to overwrite system info
> > > > (built-in functions in this case).
> > > >
> > > > In your case, why would users name a temporary function the same as a
> > > > built-in function then? Since using that name in an ambiguous function
> > > > reference will always be resolved to built-in functions, creating a
> > > > same-named temp function would be meaningless in the end.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
> > > <bo...@gmail.com>
> > >
> > > wrote:
> > >
> > > Hi Jingsong,
> > >
> > > > Re> 1. Hive built-in functions are an intermediate solution. So we should
> > > > not introduce interfaces to influence the framework. To make
> > > > Flink itself more powerful, we should implement the functions
> > > > we need to add.
> > > >
> > > > Yes, please see the doc.
> > > >
> > > > Re> 2. Non-flink built-in functions make it easy for users to change
> > > > their behavior. If we support some flink built-in functions in the
> > > > future but they act differently from non-flink built-ins, this will lead
> > > > to changes in user behavior.
> > > >
> > > > There's no such concept as "external built-in functions" any more.
> > > > Built-in functions of external systems will be treated as special catalog
> > > > functions.
> > > >
> > > > Re> Another question is, does this fallback include all
> > > > hive built-in functions? As far as I know, some hive functions
> > > > are somewhat hacky. If possible, can we start with a whitelist?
> > > > Once we implement some functions as flink built-ins, we can
> > > > also update the whitelist.
> > > >
> > > > Yes, that's something we thought of too. I don't think it's super
> > > > critical to the scope of this FLIP, thus I'd like to leave it to future
> > > > efforts as a nice-to-have feature.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> > > <bo...@gmail.com>
> > >
> > > wrote:
> > >
> > > Hi Kurt,
> > >
> > > > Re: > What I want to propose is we can merge #3 and #4, make them both
> > > > under the "catalog" concept, by extending catalog function to make it have
> > > > the ability to have built-in catalog functions. Some benefits I can see
> > > > from this approach:
> > > > 1. We don't have to introduce a new concept like external built-in
> > > > functions. Actually I don't see a full story about how to treat built-in
> > > > functions, and it seems a little bit disruptive to the catalog. As a
> > > > result, you have to make some restriction like "hive built-in functions
> > > > can only be used when current catalog is hive catalog".
> > > >
> > > > Yes, I've unified #3 and #4 but it seems I didn't update some parts of
> > > > the doc. I've modified those sections, and they are up to date now.
> > > >
> > > > In short, built-in functions of external systems are now defined as a
> > > > special kind of catalog function in Flink, and handled by Flink as
> > > > follows:
> > > > - An external built-in function must be associated with a catalog for
> > > > the purpose of decoupling flink-table and external systems.
> > > > - It always resides in front of catalog functions in ambiguous function
> > > > reference order, just like in its own external system
> > > > - It is a special catalog function that doesn’t have a schema/database
> > > > namespace
> > > > - It goes thru the same instantiation logic as other user defined
> > > > catalog functions in the external system
> > > >
> > > > Please take another look at the doc, and let me know if you have more
> > > > questions.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > > <tw...@apache.org>
> > >
> > > wrote:
> > >
> > > Hi Kurt,
> > >
> > > > it should not affect the functions and operations we currently have in
> > > > SQL. It just categorizes the available built-in functions. It is kind of
> > > > an orthogonal concept to the catalog API but built-in functions deserve
> > > > this special kind of treatment. CatalogFunction still fits perfectly in
> > > > there because the regular catalog object resolution logic is not
> > > > affected. So tables and functions are resolved in the same way but with
> > > > built-in functions that have priority as in the original design.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 03.09.19 15:26, Kurt Young wrote:
> > >
> > > > This only affects the functions and operations we currently have in SQL
> > > > and has no effect on tables, right? Looks like this is a concept
> > > > orthogonal to Catalog?
> > > > If the answers are both yes, then won't the catalog function be a weird
> > > > concept?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com> wrote:
> > > >
> > > > The way you proposed is basically the same as what Calcite does, I think
> > > > we are on the same page.
> > >
> > > Best,
> > > Danny Chan
> > > > On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > >
> > > This sounds exactly as the module approach I mentioned, no?
> > >
> > > Regards,
> > > Timo
> > >
> > > On 03.09.19 13:42, Danny Chan wrote:
> > >
> > > > Thanks Bowen for bringing up this topic. I think it’s a useful refactoring
> > > > to make our function usage more user friendly.
> > > >
> > > > For the topic of how to organize the built-in operators and the operators
> > > > of Hive, here is a solution from Apache Calcite: the Calcite way is to
> > > > make every dialect's operators a “Library”, and users can specify which
> > > > libraries they want to use for a SQL query. The built-in operators always
> > > > come first as first-class objects, and the others are used in the order
> > > > in which they appear.
> > > >
> > > > Maybe you can take it as a reference.
> > > >
> > > > [1]
> > > > https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > >
> > > Best,
> > > Danny Chan
> > > > On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > >
> > > Hi folks,
> > >
> > > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > > > It's critically helpful to improve function usability in SQL.
> > > >
> > > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > >
> > > > In short, it:
> > > > - adds support for precise function reference with fully/partially
> > > > qualified name
> > > > - redefines function resolution order for ambiguous function reference
> > > > - adds support for Hive's rich built-in functions (support for Hive user
> > > > defined functions was already added in 1.9.0)
> > > > - clarifies the concept of temporary functions
> > >
> > > Would love to hear your thoughts.
> > >
> > > Bowen
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> > >
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Thanks all for sharing your thoughts. I think we have gathered some useful
initial feedback from this long discussion, with a couple of focal points
sticking out.

We will go back to do more research and adapt our proposal. Once it's
ready, we will ask for a new round of review. If there is any disagreement,
we will start a new discussion thread on each point rather than having a mega
discussion like this.

Thanks to everyone for participating.

Regards,
Xuefu


On Thu, Sep 5, 2019 at 2:52 AM Bowen Li <bo...@gmail.com> wrote:

> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp functions should have a fully qualified path
> is better reasoning to back the newly proposed order, and I agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change the fundamentals of temporary functions in this
> FLIP. It belongs to a bigger story of how temporary objects should be
> redefined and handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave differently from what
> Dawid proposes for temp functions, and we need a separate FLIP just to unify
> their APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t. Jark's points.
>
> ***Seems we do have consensus that it's acceptable to prevent users from
> registering a temp function with the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restraint on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave room for improvements in the future.
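
To illustrate the restraint proposed above, here is a minimal Java sketch of a
name check at registration time. It is only an illustration under stated
assumptions: TempFunctionRegistry, registerTempFunction and hasTempFunction are
hypothetical names, not Flink's FunctionCatalog API, and case-insensitive name
matching is an assumption as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; this is not Flink's actual FunctionCatalog API.
final class TempFunctionRegistry {
    private final Set<String> builtInNames = new HashSet<>();
    private final Map<String, Object> tempFunctions = new HashMap<>();

    TempFunctionRegistry(Set<String> builtIns) {
        for (String name : builtIns) {
            builtInNames.add(normalize(name));
        }
    }

    // Assumption: function names are compared case-insensitively.
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Rejects a temp function whose name collides with a built-in function.
    void registerTempFunction(String name, Object function) {
        String key = normalize(name);
        if (builtInNames.contains(key)) {
            throw new IllegalArgumentException(
                "Temp function '" + name + "' clashes with a built-in function");
        }
        tempFunctions.put(key, function);
    }

    boolean hasTempFunction(String name) {
        return tempFunctions.containsKey(normalize(name));
    }
}
```

The check is a single in-memory set lookup, which matches the earlier remark in
the thread that the cost of a name check should be trivial.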
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ml example as a native
> Flink extension, but not sure if it fits well with external integrations.
> Anyway, I think modular built-in functions is a bigger story and can be on
> its own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from ambiguous
> function resolution order, and empower users with such a syntax. This way
> we sacrifice a little convenience for certainty***
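
As a rough illustration of the "mycat::func" idea, the Java sketch below splits
a reference into an optional catalog and a function name. The syntax is only a
proposal in this thread, and FunctionReference, parse and isExplicit are
hypothetical names introduced for the example.

```java
// Hypothetical sketch of the proposed "catalog::function" reference syntax.
final class FunctionReference {
    final String catalog; // null when the reference is unqualified (ambiguous)
    final String name;

    private FunctionReference(String catalog, String name) {
        this.catalog = catalog;
        this.name = name;
    }

    // Splits "hive1::sqrt" into catalog "hive1" and name "sqrt";
    // a reference without "::" is kept as an ambiguous function name.
    static FunctionReference parse(String reference) {
        int idx = reference.indexOf("::");
        if (idx < 0) {
            return new FunctionReference(null, reference);
        }
        String catalog = reference.substring(0, idx);
        String name = reference.substring(idx + 2);
        if (catalog.isEmpty() || name.isEmpty()) {
            throw new IllegalArgumentException(
                "Malformed function reference: " + reference);
        }
        return new FunctionReference(catalog, name);
    }

    // True when the user named the catalog explicitly.
    boolean isExplicit() {
        return catalog != null;
    }
}
```

An explicit reference would bypass the ambiguous resolution order entirely,
which is the certainty the proposal trades a little convenience for.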
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > Hi,
> >
> > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> > very inconsistent in that manner (spark being way worse on that).
> >
> > Hive:
> >
> > You cannot overwrite all the built-in functions. I could overwrite most of
> > the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> > functions I cannot overwrite, e.g. CAST, ARRAY. I get:
> >
> >
> > *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
> >
> > What is interesting is that I cannot overwrite *array*, but I can overwrite
> > *map* or *struct*. Hive behaves reasonably well, though, if I manage to
> > overwrite a function. When I drop the temporary function the native
> > function is still available.
> >
> > Spark:
> >
> > Spark's behavior imho is super bad.
> >
> > Theoretically I could overwrite all functions. I was able, e.g., to
> > overwrite the CAST function. I had to use the CREATE OR REPLACE TEMPORARY
> > FUNCTION syntax, though. Otherwise I get an exception that a function
> > already exists. However, when I used the CAST function in a query it used
> > the native, built-in one.
> >
> > When I overwrote the current_date() function, it was used in a query, but it
> > completely replaced the built-in function and I could no longer use the
> > native function in any way. I also cannot drop the temporary function. I
> > get:
> >
> > *    Error in query: Cannot drop native function 'current_date';*
> >
> > Additional note: neither system allows creating TEMPORARY FUNCTIONS
> > with a database. Temporary functions are always represented as a single
> > name.
> >
> > In my opinion neither of the systems has consistent behavior. Generally
> > speaking I think overwriting any system provided functions is just
> > dangerous.
> >
> > Regarding Jark's concerns. Such functions would be registered in a current
> > catalog/database schema, so a user could still use their own function, but
> > would have to fully qualify the function (because built-in functions take
> > precedence). Moreover users would have the same problem with permanent
> > functions. Imagine a user has a permanent function 'cat.db.explode'. In
> > 1.9 the user could use just the 'explode' function as long as the 'cat' &
> > 'db' were the default catalog & database. If we introduce 'explode'
> > built-in function in 1.10, the user has to fully qualify the function.
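
The 'explode' scenario above can be sketched in Java as follows. This is an
illustration only, assuming built-in functions take precedence for unqualified
names and that a dotted name is a fully qualified 'cat.db.name' path;
FunctionResolver and its methods are hypothetical, not Flink code.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver; names and structure are assumptions, not Flink code.
final class FunctionResolver {
    private final Set<String> builtIns = new HashSet<>();
    private final Set<String> catalogFunctions = new HashSet<>(); // "cat.db.name"
    private final String currentCatalog;
    private final String currentDatabase;

    FunctionResolver(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    void addBuiltIn(String name) {
        builtIns.add(name);
    }

    void addCatalogFunction(String catalog, String database, String name) {
        catalogFunctions.add(catalog + "." + database + "." + name);
    }

    // Returns a label describing where the reference was resolved, if anywhere.
    Optional<String> resolve(String reference) {
        if (reference.indexOf('.') >= 0) {
            // Qualified path: built-in functions are not consulted.
            return catalogFunctions.contains(reference)
                ? Optional.of("catalog:" + reference)
                : Optional.empty();
        }
        // Unqualified name: built-in functions take precedence.
        if (builtIns.contains(reference)) {
            return Optional.of("built-in:" + reference);
        }
        String qualified = currentCatalog + "." + currentDatabase + "." + reference;
        return catalogFunctions.contains(qualified)
            ? Optional.of("catalog:" + qualified)
            : Optional.empty();
    }
}
```

With only 'cat.db.explode' registered, resolve("explode") yields the catalog
function; once a built-in 'explode' is added, the same call yields the built-in
and the user has to write 'cat.db.explode' to reach the original function.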
> >
> > Best,
> >
> > Dawid
> > On 04/09/2019 15:19, Timo Walther wrote:
> >
> > Hi all,
> >
> > thanks for the healthy discussion. It is already a very long discussion
> > with a lot of text. So I will just post my opinion to a couple of
> > statements:
> >
> > > Hive built-in functions are not part of Flink built-in functions, they
> > are catalog functions
> >
> > That is not entirely true. Correct me if I'm wrong but I think Hive
> > built-in functions are also not catalog functions. They are not stored in
> > every Hive metastore catalog that is freshly created but are a set of
> > functions that are listed somewhere and made available.
> >
> > > ambiguous functions reference just shouldn't be resolved to a different catalog
> >
> > I agree. They should not be resolved to a different catalog. That's why I
> > am suggesting to split the concept of built-in functions and catalog
> > lookup semantics.
> >
> > > I don't know if any other databases handle built-in functions like that
> >
> > What I called "module" is:
> > - Extension in Postgres [1]
> > - Plugin in Presto [2]
> >
> > Btw. Presto even mentions example modules that are similar to the ones
> > that we will introduce in the near future both for ML and System XYZ
> > compatibility:
> > "See either the presto-ml module for machine learning functions or the
> > presto-teradata-functions module for Teradata-compatible functions, both
> > in the root of the Presto source."
> >
> > > functions should be either built-in already or just libraries functions,
> > > and library functions can be adapted to catalog APIs or of some other
> > > syntax to use
> >
> > Regarding "built-in already", of course we can add a lot of functions as
> > built-ins but we will end-up in a dependency hell in the near future if
> > we don't introduce a pluggable approach. Library functions is what you also
> > suggest but storing them in a catalog means to always fully qualify them
> > or modifying the existing catalog design that was inspired by the standard.
> >
> > I don't think "it brings in even more complicated scenarios to the
> > design", it just does clear separation of concerns. Integrating the
> > functionality into the current design makes the catalog API more
> > complicated.
> >
> > > why would users name a temporary function the same as a built-in
> > function then?
> >
> > Because you never know what users do. If they don't, my suggested
> > resolution order should not be a problem, right?
> >
> > > I don't think hive functions deserves be a function module
> >
> > Our goal is not to create a Hive clone. We need to think forward and Hive
> > is just one of many systems that we can support. Not every built-in
> > function behaves and will behave exactly like Hive.
> >
> > > regarding temporary functions, there are few systems that support it
> >
> > IMHO Spark and Hive are not always the best examples for consistent
> > design. Systems like Postgres, Presto, or SQL Server should be used as a
> > reference. I don't think that a user can overwrite a built-in function
> > there.
> >
> > Regards,
> > Timo
> >
> > [1] https://www.postgresql.org/docs/10/extend-extensions.html
> > [2] https://prestodb.github.io/docs/current/develop/functions.html
> >
> >
> > On 04.09.19 13:44, Jark Wu wrote:
> >
> > Hi all,
> >
> > Regarding #1 temp function <> built-in function and naming.
> > I'm fine with temp functions preceding built-in functions and being able
> > to override them (we already support overriding built-in functions in 1.9).
> > If we don't allow the same name as a built-in function, I'm afraid we will
> > have compatibility issues in the future.
> > Say users register a user defined function named "explode" in 1.9, and we
> > support a built-in "explode" function in 1.10.
> > Then the user's jobs which call the registered "explode" function in 1.9
> > will all fail in 1.10 because of naming conflict.
> >
> > Regarding #2 "External" built-in functions.
> > I think if we store external built-in functions in catalog, then
> > "hive1::sqrt" is a good way to go.
> > However, I would prefer to support a discovery mechanism (e.g. SPI) for
> > built-in functions as Timo suggested above.
> > This gives us the flexibility to add Hive or MySQL or Geo or whatever
> > function set as built-in functions in an easy way.
> >
> > Best,
> > Jark
> >
> > On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com>
> > <us...@gmail.com> wrote:
> >
> > Hi David,
> >
> > Thank you for sharing your findings. It seems to me that there is no SQL
> > standard regarding temporary functions. There are few systems that support
> > it. Here are what I have found:
> >
> > 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> > 2. Spark: basically follows Hive (
> > https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> > )
> > 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> > behavior. (
> > http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
> >
> > Because of lack of standard, it's perfectly fine for Flink to define
> > whatever it sees appropriate. Thus, your proposal (no overwriting and must
> > have DB as holder) is one option. The advantage is simplicity. The downside
> > is the deviation from Hive, which is popular and de facto standard in big
> > data world.
> >
> > However, I don't think we have to follow Hive. More importantly, we need a
> > consensus. I have no objection if your proposal is generally agreed upon.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > Hi all,
> >
> > Just an opinion on the built-in <> temporary functions resolution and
> > NAMING issue. I think we should not allow overriding the built-in
> > functions, as this may pose serious issues and to be honest is rather
> > not feasible and would require major rework. What happens if a user
> > wants to override CAST? Calls to that function are generated at
> > different layers of the stack that unfortunately does not always go
> > through the Catalog API (at least yet). Moreover from what I've checked
> > no other systems allow overriding the built-in functions. All the
> > systems I've checked so far register temporary functions in a
> > database/schema (either special database for temporary functions, or
> > just current database). What I would suggest is to always register
> > temporary functions with a 3 part identifier. The same way as tables,
> > views etc. This effectively means you cannot override built-in
> > functions. With such approach it is natural that the temporary functions
> > end up a step lower in the resolution order:
> >
> > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> >
> > 2. temporary functions (always 3 part path)
> >
> > 3. catalog functions (always 3 part path)
> >
> > Let me know what do you think.
> >
> > Best,
> >
> > Dawid
> >
> > On 04/09/2019 06:13, Bowen Li wrote:
> >
> > Hi,
> >
> > I agree with Xuefu that the main controversial points are mainly the two
> > places. My thoughts on them:
> >
> > 1) Determinism of referencing Hive built-in functions. We can either remove
> > Hive built-in functions from ambiguous function resolution and require
> > users to use special syntax for their qualified names, or add a config flag
> > to catalog constructor/yaml for turning on and off Hive built-in functions
> > with the flag set to 'false' by default and proper doc added to help users
> > make their decisions.
> >
> > 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> > resolution order. We believe Flink temp functions should precede Flink
> > built-in functions, and I have presented my reasons. Just in case if we
> > cannot reach an agreement, I propose forbid users registering temp
> > functions in the same name as a built-in function, like MySQL's approach,
> > for the moment. It won't have any performance concern, since built-in
> > functions are all in memory and thus cost of a name check will be really
> > trivial.
> >
> >
> > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com>
> > <us...@gmail.com> wrote:
> >
> > From what I have seen, there are a couple of focal disagreements:
> >
> > 1. Resolution order: temp function --> flink built-in function --> catalog
> > function vs flink built-in function --> temp function --> catalog function.
> >
> > 2. "External" built-in functions: how to treat built-in functions in
> > external system and how users reference them
> >
> > For #1, I agree with Bowen that temp function needs to be at the highest
> > priority because that's how a user might overwrite a built-in function
> > without referencing a persistent, overwriting catalog function with a fully
> > qualified name. Putting built-in functions at the highest priority
> > eliminates that usage.
> >
> > For #2, I saw a general agreement on referencing "external" built-in
> > functions such as those in Hive needs to be explicit and deterministic even
> > though different approaches are proposed. To limit the scope and simplify
> > the usage, it seems making sense to me to introduce special syntax for user
> > to explicitly reference an external built-in function such as hive1::sqrt or
> > hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API call
> > hive1.getFunction(ObjectPath functionName) where the database name is
> > absent for built-in functions available in that catalog hive1. I understand
> > that Bowen's original proposal was trying to avoid this, but this could
> > turn out to be a clean and simple solution.
> >
> > (Timo's modular approach is great way to "expand" Flink's built-in function
> > set, which seems orthogonal and complementary to this, which could be
> > tackled in further future work.)
> >
> > I'd be happy to hear further thoughts on the two points.
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com>
> > <yk...@gmail.com> wrote:
> >
> > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is
> >
> > the
> >
> > same
> > as Bowen's. But after thinking about it, I'm currently lean to Timo's
> > suggestion.
> >
> > The reason is backward compatibility. If we follow Bowen's approach,
> >
> > let's
> >
> > say we
> > first find function in Flink's built-in functions, and then hive's
> > built-in. For example, `foo`
> > is not supported by Flink, but hive has such built-in function. So
> >
> > user
> >
> > will have hive's
> > behavior for function `foo`. And in next release, Flink realize this
> >
> > is a
> >
> > very popular function
> > and add it into Flink's built-in functions, but with different
> >
> > behavior
> >
> > as
> >
> > hive's. So in next
> > release, the behavior changes.
> >
> > With Timo's approach, IIUC user have to tell the framework explicitly
> >
> > what
> >
> > kind of
> > built-in functions he would like to use. He can just tell framework
> >
> > to
> >
> > abandon Flink's built-in
> > functions, and use hive's instead. User can only choose between them,
> >
> > but
> >
> > not use
> > them at the same time. I think this approach is more predictable.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com>
> > <bo...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Thanks for the feedback. Just a kindly reminder that the [Proposal]
> >
> > section
> >
> > in the google doc was updated, please take a look first and let me
> >
> > know
> >
> > if
> >
> > you have more questions.
> >
> > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
> > <bo...@gmail.com>
> >
> > wrote:
> >
> > Hi Timo,
> >
> > Re> 1) We should not have the restriction "hive built-in functions
> >
> > can
> >
> > only
> >
> > be used when current catalog is hive catalog". Switching a catalog
> > should only have implications on the cat.db.object resolution but
> >
> > not
> >
> > functions. It would be quite convinient for users to use Hive
> >
> > built-ins
> >
> > even if they use a Confluent schema registry or just the in-memory
> >
> > catalog.
> >
> > There might be a misunderstanding here.
> >
> > First of all, Hive built-in functions are not part of Flink
> >
> > built-in
> >
> > functions, they are catalog functions, thus if the current catalog
> >
> > is
> >
> > not a
> >
> > HiveCatalog but, say, a schema registry catalog, ambiguous
> >
> > functions
> >
> > reference just shouldn't be resolved to a different catalog.
> >
> > Second, Hive built-in functions can potentially be referenced
> >
> > across
> >
> > catalog, but it doesn't have db namespace and we currently just
> >
> > don't
> >
> > have
> >
> > a SQL syntax for it. It can be enabled when such a SQL syntax is
> >
> > defined,
> >
> > e.g. "catalog::function", but it's out of scope of this FLIP.
> >
> > 2) I would propose to have separate concepts for catalog and
> >
> > built-in
> >
> > functions. In particular it would be nice to modularize built-in
> > functions. Some built-in functions are very crucial (like AS, CAST,
> > MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> >
> > maybe
> >
> > we add more experimental functions in the future or function for
> >
> > some
> >
> > special application area (Geo functions, ML functions). A data
> >
> > platform
> >
> > team might not want to make every built-in function available. Or a
> > function module like ML functions is in a different Maven module.
> >
> > I think this is orthogonal to this FLIP, especially we don't have
> >
> > the
> >
> > "external built-in functions" anymore and currently the built-in
> >
> > function
> >
> > category remains untouched.
> >
> > But just to share some thoughts on the proposal, I'm not sure about
> >
> > it:
> >
> > - I don't know if any other databases handle built-in functions
> >
> > like
> >
> > that.
> >
> > Maybe you can give some examples? IMHO, built-in functions are
> >
> > system
> >
> > info
> >
> > and should be deterministic, not depending on loaded libraries. Geo
> > functions should be either built-in already or just libraries
> >
> > functions,
> >
> > and library functions can be adapted to catalog APIs or of some
> >
> > other
> >
> > syntax to use
> > - I don't know if all use cases stand, and many can be achieved by
> >
> > other
> >
> > approaches too. E.g. experimental functions can be taken good care
> >
> > of
> >
> > by
> >
> > documentations, annotations, etc
> > - the proposal basically introduces some concept like a pluggable
> >
> > built-in
> >
> > function catalog, despite the already existing catalog APIs
> > - it brings in even more complicated scenarios to the design. E.g.
> >
> > how
> >
> > do
> >
> > you handle built-in functions in different modules but different
> >
> > names?
> >
> > In short, I'm not sure if it really stands and it looks like an
> >
> > overkill
> >
> > to me. I'd rather not go to that route. Related discussion can be
> >
> > on
> >
> > its
> >
> > own thread.
> >
> > 3) Following the suggestion above, we can have a separate discovery
> > mechanism for built-in functions. Instead of just going through a
> >
> > static
> >
> > list like in BuiltInFunctionDefinitions, a platform team should be
> >
> > able
> >
> > to select function modules like
> > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > HiveFunctions) or via service discovery;
> >
> > Same as above. I'll leave it to its own thread.
> >
> > re > 3) Dawid and I discussed the resulution order again. I agree
> >
> > with
> >
> > Kurt
> >
> > that we should unify built-in function (external or internal)
> >
> > under a
> >
> > common layer. However, the resolution order should be:
> >    1. built-in functions
> >    2. temporary functions
> >    3. regular catalog resolution logic
> > Otherwise a temporary function could cause clashes with Flink's
> >
> > built-in
> >
> > functions. If you take a look at other vendors, like SQL Server
> >
> > they
> >
> > also do not allow to overwrite built-in functions.
> >
> > ”I agree with Kurt that we should unify built-in function (external
> >
> > or
> >
> > internal) under a common layer.“ <- I don't think this is what Kurt
> >
> > means.
> >
> > Kurt and I are in favor of unifying built-in functions of external
> >
> > systems
> >
> > and catalog functions. Did you type a mistake?
> >
> > Besides, I'm not sure about the resolution order you proposed.
> >
> > Temporary
> >
> > functions have a lifespan over a session and are only visible to
> >
> > the
> >
> > session owner, they are unique to each user, and users create them
> >
> > on
> >
> > purpose to be the highest priority in order to overwrite system
> >
> > info
> >
> > (built-in functions in this case).
> >
> > In your case, why would users name a temporary function the same
> >
> > as a
> >
> > built-in function then? Since using that name in ambiguous function
> > reference will always be resolved to built-in functions, creating a
> > same-named temp function would be meaningless in the end.
> >
> >
> > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
> > <bo...@gmail.com>
> >
> > wrote:
> >
> > Hi Jingsong,
> >
> > Re> 1.Hive built-in functions is an intermediate solution. So we
> >
> > should
> >
> > not introduce interfaces to influence the framework. To make
> > Flink itself more powerful, we should implement the functions
> > we need to add.
> >
> > Yes, please see the doc.
> >
> > Re> 2.Non-flink built-in functions are easy for users to change
> >
> > their
> >
> > behavior. If we support some flink built-in functions in the
> > future but act differently from non-flink built-in, this will
> >
> > lead
> >
> > to
> >
> > changes in user behavior.
> >
> > There's no such concept as "external built-in functions" any more.
> > Built-in functions of external systems will be treated as special
> >
> > catalog
> >
> > functions.
> >
> > Re> Another question is, does this fallback include all
> >
> > hive built-in functions? As far as I know, some hive functions
> > have some hacky. If possible, can we start with a white list?
> > Once we implement some functions to flink built-in, we can
> > also update the whitelist.
> >
> > Yes, that's something we thought of too. I don't think it's super
> > critical to the scope of this FLIP, thus I'd like to leave it to
> >
> > future
> >
> > efforts as a nice-to-have feature.
> >
> >
> > On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> > <bo...@gmail.com>
> >
> > wrote:
> >
> > Hi Kurt,
> >
> > Re: > What I want to propose is we can merge #3 and #4, make them
> >
> > both
> >
> > under
> >
> > "catalog" concept, by extending catalog function to make it have
> >
> > ability to
> >
> > have built-in catalog functions. Some benefits I can see from
> >
> > this
> >
> > approach:
> >
> > 1. We don't have to introduce new concept like external built-in
> >
> > functions.
> >
> > Actually I don't see a full story about how to treat a built-in
> >
> > functions, and it
> >
> > seems a little bit disrupt with catalog. As a result, you have
> >
> > to
> >
> > make
> >
> > some restriction
> >
> > like "hive built-in functions can only be used when current
> >
> > catalog
> >
> > is
> >
> > hive catalog".
> >
> > Yes, I've unified #3 and #4 but it seems I didn't update some
> >
> > part
> >
> > of
> >
> > the doc. I've modified those sections, and they are up to date
> >
> > now.
> >
> > In short, now built-in function of external systems are defined
> >
> > as
> >
> > a
> >
> > special kind of catalog function in Flink, and handled by Flink
> >
> > as
> >
> > following:
> > - An external built-in function must be associated with a catalog
> >
> > for
> >
> > the purpose of decoupling flink-table and external systems.
> > - It always resides in front of catalog functions in ambiguous
> >
> > function
> >
> > reference order, just like in its own external system
> > - It is a special catalog function that doesn’t have a
> >
> > schema/database
> >
> > namespace
> > - It goes thru the same instantiation logic as other user defined
> > catalog functions in the external system
> >
> > Please take another look at the doc, and let me know if you have
> >
> > more
> >
> > questions.
> >
> >
> > On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > <tw...@apache.org>
> >
> > wrote:
> >
> > Hi Kurt,
> >
> > it should not affect the functions and operations we currently
> >
> > have
> >
> > in
> >
> > SQL. It just categorizes the available built-in functions. It is
> >
> > kind
> >
> > of
> > an orthogonal concept to the catalog API but built-in functions
> >
> > deserve
> >
> > this special kind of treatment. CatalogFunction still fits
> >
> > perfectly
> >
> > in
> >
> > there because the regular catalog object resolution logic is not
> > affected. So tables and functions are resolved in the same way
> >
> > but
> >
> > with
> >
> > built-in functions that have priority as in the original design.
> >
> > Regards,
> > Timo
> >
> >
> > On 03.09.19 15:26, Kurt Young wrote:
> >
> > Does this only affect the functions and operations we currently
> >
> > have
> >
> > in SQL
> >
> > and
> > have no effect on tables, right? Looks like this is an
> >
> > orthogonal
> >
> > concept
> >
> > with Catalog?
> > If the answer are both yes, then the catalog function will be a
> >
> > weird
> >
> > concept?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
> >
> > yuzhao.cyz@gmail.com
> >
> > wrote:
> >
> > The way you proposed are basically the same as what Calcite
> >
> > does, I
> >
> > think
> >
> > we are in the same line.
> >
> > Best,
> > Danny Chan
> > 在 2019年9月3日 +0800 PM7:57,Timo Walther <twalthr@apache.org
> >
> > ,写道:
> >
> > This sounds exactly as the module approach I mentioned, no?
> >
> > Regards,
> > Timo
> >
> > On 03.09.19 13:42, Danny Chan wrote:
> >
> > Thanks Bowen for bring up this topic, I think it’s a useful
> >
> > refactoring to make our function usage more user friendly.
> >
> > For the topic of how to organize the builtin operators and
> >
> > operators
> >
> > of Hive, here is a solution from Apache Calcite, the Calcite
> >
> > way
> >
> > is
> >
> > to make
> >
> > every dialect operators a “Library”, user can specify which
> >
> > libraries they
> >
> > want to use for a sql query. The builtin operators always
> >
> > comes
> >
> > as
> >
> > the
> >
> > first class objects and the others are used from the order
> >
> > they
> >
> > appears.
> >
> > Maybe you can take a reference.
> >
> > [1]
> >
> >
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >
> > Best,
> > Danny Chan
> > 在 2019年8月28日 +0800 AM2:50,Bowen Li <bowenli86@gmail.com
> >
> > ,写道:
> >
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's
> >
> > FunctionCatalog.
> >
> > It's critically helpful to improve function usability in
> >
> > SQL.
> >
> >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with
> >
> > fully/partially
> >
> > qualified name
> > - redefines function resolution order for ambiguous
> >
> > function
> >
> > reference
> >
> > - adds support for Hive's rich built-in functions (support
> >
> > for
> >
> > Hive
> >
> > user
> >
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Let me try to summarize and conclude the long thread so far:

1. For the order of temp functions vs. built-in functions:

I think Dawid's point that temp functions should have a fully qualified path
is better reasoning to back the newly proposed order, and I agree we
don't need to follow Hive/Spark.

However, I'd rather not change the fundamentals of temporary functions in
this FLIP. That belongs to a bigger story of how temporary objects should be
redefined and handled uniformly. Currently, temporary tables and views
(those registered via TableEnv#registerTable()) behave differently from what
Dawid proposes for temp functions, and we need a separate FLIP to unify
their APIs and behaviors.

I agree that backward compatibility is not an issue w.r.t. Jark's points.

***It seems we do have consensus that it's acceptable to prevent users from
registering a temp function with the same name as a built-in function. To
help us move forward, I'd like to propose setting such a restriction on temp
functions in this FLIP to simplify the design and avoid disputes.*** It
will also leave room for improvements in the future.
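
To make the proposed restriction concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions: the class, the error
type, the tiny built-in list, and the case-insensitive name comparison are
all hypothetical and are not Flink's actual FunctionCatalog API.

```python
# Hypothetical sketch of the proposed restriction; none of these names
# come from Flink itself.
BUILT_IN_FUNCTIONS = {"cast", "concat_ws", "md5", "sqrt"}  # illustrative subset


class FunctionNameConflictError(ValueError):
    """Raised when a temp function would share its name with a built-in."""


class TempFunctionRegistry:
    def __init__(self, built_ins):
        # Assumption: function names are compared case-insensitively.
        self._built_ins = {name.lower() for name in built_ins}
        self._temp_functions = {}

    def register_temp_function(self, name, func):
        key = name.lower()
        # The built-in names live in memory, so this check is one set lookup.
        if key in self._built_ins:
            raise FunctionNameConflictError(
                f"'{name}' is a built-in function and cannot be used "
                "as a temporary function name"
            )
        self._temp_functions[key] = func

    def lookup(self, name):
        # Returns None when no temp function with that name exists.
        return self._temp_functions.get(name.lower())
```

With such a check in place, a temp function can never shadow a built-in, so
the relative order of the two in ambiguous resolution stops mattering.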


2. For Hive built-in functions:

Thanks Timo for providing the Presto and Postgres examples. I feel modular
built-in functions can be a good fit for the geo and ML examples as a native
Flink extension, but I'm not sure they fit well with external integrations.
Anyway, I think modular built-in functions are a bigger story and can be on
their own thread too, and our proposal doesn't prevent Flink from doing that
in the future.

***It seems we have consensus that users should be able to use built-in
functions of Hive or other external systems in SQL explicitly and
deterministically, regardless of Flink built-in functions and the potential
modular built-in functions, via some new syntax like "mycat::func"? If so,
I'd like to propose removing Hive built-in functions from the ambiguous
function resolution order, and empowering users with such a syntax. This way
we sacrifice a little convenience for certainty.***
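
As a rough illustration of how such references could be resolved, here is a
small Python sketch. Everything in it is an assumption made for the example:
the FunctionResolver class, the dictionaries it takes, and the exact
"catalog::function" parsing are hypothetical, and temp functions are left
out on purpose since this FLIP doesn't change them.

```python
# Hypothetical sketch; not Flink's API. Functions are represented by plain
# values so the resolution path is easy to see.
class FunctionResolver:
    def __init__(self, flink_built_ins, catalog_built_ins, catalog_functions,
                 current_catalog, current_database):
        self.flink_built_ins = flink_built_ins      # name -> function
        self.catalog_built_ins = catalog_built_ins  # catalog -> {name -> function}
        self.catalog_functions = catalog_functions  # (cat, db, name) -> function
        self.current_catalog = current_catalog
        self.current_database = current_database

    def resolve(self, reference):
        # Explicit reference to an external system's built-in: "mycat::func".
        if "::" in reference:
            catalog, name = reference.split("::", 1)
            try:
                return self.catalog_built_ins[catalog][name]
            except KeyError:
                raise LookupError(f"no built-in function {reference}") from None

        parts = reference.split(".")
        if len(parts) == 3:    # fully qualified: cat.db.func
            key = tuple(parts)
        elif len(parts) == 2:  # partially qualified: db.func
            key = (self.current_catalog, parts[0], parts[1])
        elif len(parts) == 1:  # ambiguous: Flink built-ins take precedence
            if reference in self.flink_built_ins:
                return self.flink_built_ins[reference]
            key = (self.current_catalog, self.current_database, reference)
        else:
            raise LookupError(f"malformed function reference {reference}")

        if key not in self.catalog_functions:
            raise LookupError(f"function {reference} not found")
        return self.catalog_functions[key]
```

Note that an unqualified name never falls through to an external system's
built-ins in this sketch; those are reachable only through the explicit
"mycat::func" form, which is where the certainty comes from.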


What do you think?

On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi,
>
> Regarding Hive & Spark support for TEMPORARY FUNCTIONS: I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they are
> very inconsistent in that respect (Spark being way worse).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most of
> the functions I tried, e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite, e.g. CAST and ARRAY. I get:
>
>
> *    ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*, but I can overwrite
> *map* or *struct*. Hive does behave reasonably well if I manage to
> overwrite a function, though: when I drop the temporary function, the native
> function is still available.
>
> Spark:
>
> Spark's behavior imho is super bad.
>
> Theoretically I could overwrite all functions. For example, I was able to
> overwrite the CAST function, though I had to use the CREATE OR REPLACE
> TEMPORARY FUNCTION syntax. Otherwise I get an exception that the function
> already exists. However, when I used the CAST function in a query it used
> the native, built-in one.
>
> When I overwrote the current_date() function, it was used in a query, but it
> completely replaced the built-in function and I could no longer use the
> native function in any way. I also cannot drop the temporary function. I
> get:
>
> *    Error in query: Cannot drop native function 'current_date';*
>
> As an additional note, neither system allows creating TEMPORARY FUNCTIONS
> with a database. Temporary functions are always represented as a single
> name.
>
> In my opinion neither of the systems has consistent behavior. Generally
> speaking, I think overwriting any system-provided functions is just
> dangerous.
>
> Regarding Jark's concerns: such functions would be registered in the current
> catalog/database schema, so users could still use their own function, but
> would have to fully qualify it (because built-in functions take
> precedence). Moreover, users would have the same problem with permanent
> functions. Imagine a user has a permanent function 'cat.db.explode'. In
> 1.9 the user could use just the 'explode' function as long as 'cat' &
> 'db' were the default catalog & database. If we introduce a built-in
> 'explode' function in 1.10, the user has to fully qualify the function.
>
> Best,
>
> Dawid
> On 04/09/2019 15:19, Timo Walther wrote:
>
> Hi all,
>
> thanks for the healthy discussion. It is already a very long discussion
> with a lot of text. So I will just post my opinion to a couple of
> statements:
>
> > Hive built-in functions are not part of Flink built-in functions, they
> are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored in
> every Hive metastore catalog that is freshly created but are a set of
> functions that are listed somewhere and made available.
>
> > ambiguous functions reference just shouldn't be resolved to a different
> catalog
>
> I agree. They should not be resolved to a different catalog. That's why I
> am suggesting to split the concept of built-in functions and catalog lookup
> semantics.
>
> > I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions, both in
> the root of the Presto source."
>
> > functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions as
> built-ins but we will end-up in a dependency hell in the near future if we
> don't introduce a pluggable approach. Library functions is what you also
> suggest but storing them in a catalog means to always fully qualify them or
> modifying the existing catalog design that was inspired by the standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design", it just does clear separation of concerns. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
> > why would users name a temporary function the same as a built-in
> function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
> > I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and Hive
> is just one of many systems that we can support. Not every built-in
> function behaves and will behave exactly like Hive.
>
> > regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as a
> reference. I don't think that a user can overwrite a built-in function
> there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions preceding built-in functions and being able
> to override them (we already support overriding built-in functions in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the user's jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com>
> <us...@gmail.com> wrote:
>
> Hi David,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> it. Here are what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive (
>
>
> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> )
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior. (
> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of lack of standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity. The downside
> is the deviation from Hive, which is popular and de facto standard in big
> data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> <dw...@apache.org>
> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately does not always go
> through the Catalog API (at least yet). Moreover from what I've checked
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either special database for temporary functions, or
> just current database). What I would suggest is to always register
> temporary functions with a 3 part identifier. The same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what do you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the main controversial points are mainly the two
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to catalog constructor/yaml for turning on and off Hive built-in functions
> with the flag set to 'false' by default and proper doc added to help users
> make their decisions.
>
> 2) Flink temp functions vs. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't raise any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be
> really trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
>  From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function --> catalog function.
>
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement that referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic,
> even though different approaches are proposed. To limit the scope and
> simplify the usage, it seems to make sense to me to introduce special
> syntax for users to explicitly reference an external built-in function,
> such as hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that
> nicely matches the Catalog API call
> hive1.getFunction(ObjectPath functionName), where the database name is
> absent for built-in functions available in that catalog hive1. I understand
> that Bowen's original proposal was trying to avoid this, but this could
> turn out to be a clean and simple solution.
>
> (Timo's modular approach is a great way to "expand" Flink's built-in
> function set, which seems orthogonal and complementary to this, and which
> could be tackled in future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same as Bowen's. But after thinking about it, I'm currently leaning towards
> Timo's suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we first look up a function in Flink's built-in functions, and then in
> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive
> has such a built-in function, so the user gets Hive's behavior for function
> `foo`. In the next release, Flink realizes this is a very popular function
> and adds it to Flink's built-in functions, but with different behavior from
> Hive's. So in the next release, the behavior changes.
>
> With Timo's approach, IIUC the user has to tell the framework explicitly
> what kind of built-in functions they would like to use. They can just tell
> the framework to abandon Flink's built-in functions and use Hive's instead.
> The user can only choose between them, but not use them at the same time.
> I think this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for the feedback. Just a kind reminder that the [Proposal] section
> in the google doc was updated, please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
> <bo...@gmail.com>
>
> wrote:
>
> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions
>
> can
>
> only
>
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but
>
> not
>
> functions. It would be quite convinient for users to use Hive
>
> built-ins
>
> even if they use a Confluent schema registry or just the in-memory
>
> catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous function
> references just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalogs, but they don't have a db namespace and we currently just don't
> have a SQL syntax for that. It can be enabled when such a SQL syntax is
> defined, e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and
>
> built-in
>
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>
> maybe
>
> we add more experimental functions in the future or function for
>
> some
>
> special application area (Geo functions, ML functions). A data
>
> platform
>
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially we don't have
>
> the
>
> "external built-in functions" anymore and currently the built-in
>
> function
>
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about
>
> it:
>
> - I don't know if any other databases handle built-in functions
>
> like
>
> that.
>
> Maybe you can give some examples? IMHO, built-in functions are
>
> system
>
> info
>
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just libraries
>
> functions,
>
> and library functions can be adapted to catalog APIs or of some
>
> other
>
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by
>
> other
>
> approaches too. E.g. experimental functions can be taken good care
>
> of
>
> by
>
> documentations, annotations, etc
> - the proposal basically introduces some concept like a pluggable
>
> built-in
>
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g.
>
> how
>
> do
>
> you handle built-in functions in different modules but different
>
> names?
>
> In short, I'm not sure if it really stands and it looks like an
>
> overkill
>
> to me. I'd rather not go to that route. Related discussion can be
>
> on
>
> its
>
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a
>
> static
>
> list like in BuiltInFunctionDefinitions, a platform team should be
>
> able
>
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resulution order again. I agree
>
> with
>
> Kurt
>
> that we should unify built-in function (external or internal)
>
> under a
>
> common layer. However, the resolution order should be:
>    1. built-in functions
>    2. temporary functions
>    3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's
>
> built-in
>
> functions. If you take a look at other vendors, like SQL Server
>
> they
>
> also do not allow to overwrite built-in functions.
>
> "I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer." <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Was that a typo?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have a lifespan over a session and are only visible to the
> session owner, they are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system info
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same as a
> built-in function then? Since using that name in an ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
> <bo...@gmail.com>
>
> wrote:
>
> Hi Jingsong,
>
> Re> 1.Hive built-in functions is an intermediate solution. So we
>
> should
>
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
>
> Yes, please see the doc.
>
> Re> 2.Non-flink built-in functions are easy for users to change
>
> their
>
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will
>
> lead
>
> to
>
> changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special
>
> catalog
>
> functions.
>
> Re> Another question is, does this fallback include all
>
> hive built-in functions? As far as I know, some hive functions
> have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super
> critical to the scope of this FLIP, thus I'd like to leave it to
>
> future
>
> efforts as a nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> <bo...@gmail.com>
>
> wrote:
>
> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them
>
> both
>
> under
>
> "catalog" concept, by extending catalog function to make it have
>
> ability to
>
> have built-in catalog functions. Some benefits I can see from
>
> this
>
> approach:
>
> 1. We don't have to introduce new concept like external built-in
>
> functions.
>
> Actually I don't see a full story about how to treat a built-in
>
> functions, and it
>
> seems a little bit disrupt with catalog. As a result, you have
>
> to
>
> make
>
> some restriction
>
> like "hive built-in functions can only be used when current
>
> catalog
>
> is
>
> hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some
>
> part
>
> of
>
> the doc. I've modified those sections, and they are up to date
>
> now.
>
> In short, now built-in function of external systems are defined
>
> as
>
> a
>
> special kind of catalog function in Flink, and handled by Flink
>
> as
>
> following:
> - An external built-in function must be associated with a catalog
>
> for
>
> the purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous
>
> function
>
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a
>
> schema/database
>
> namespace
> - It goes thru the same instantiation logic as other user defined
> catalog functions in the external system
>
> Please take another look at the doc, and let me know if you have
>
> more
>
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> <tw...@apache.org>
>
> wrote:
>
> Hi Kurt,
>
> it should not affect the functions and operations we currently
>
> have
>
> in
>
> SQL. It just categorizes the available built-in functions. It is
>
> kind
>
> of
> an orthogonal concept to the catalog API but built-in functions
>
> deserve
>
> this special kind of treatment. CatalogFunction still fits
>
> perfectly
>
> in
>
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way
>
> but
>
> with
>
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
>
> Does this only affect the functions and operations we currently
>
> have
>
> in SQL
>
> and
> have no effect on tables, right? Looks like this is an
>
> orthogonal
>
> concept
>
> with Catalog?
> If the answer are both yes, then the catalog function will be a
>
> weird
>
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
>
> yuzhao.cyz@gmail.com
>
> wrote:
>
> The way you proposed are basically the same as what Calcite
>
> does, I
>
> think
>
> we are in the same line.
>
> Best,
> Danny Chan
> 在 2019年9月3日 +0800 PM7:57,Timo Walther <twalthr@apache.org
>
> ,写道:
>
> This sounds exactly as the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
>
> Thanks Bowen for bring up this topic, I think it’s a useful
>
> refactoring to make our function usage more user friendly.
>
> For the topic of how to organize the builtin operators and
>
> operators
>
> of Hive, here is a solution from Apache Calcite, the Calcite
>
> way
>
> is
>
> to make
>
> every dialect operators a “Library”, user can specify which
>
> libraries they
>
> want to use for a sql query. The builtin operators always
>
> comes
>
> as
>
> the
>
> first class objects and the others are used from the order
>
> they
>
> appears.
>
> Maybe you can take a reference.
>
> [1]
>
>
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bowenli86@gmail.com
>
> ,写道:
>
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's
>
> FunctionCatalog.
>
> It's critically helpful to improve function usability in
>
> SQL.
>
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with
>
> fully/partially
>
> qualified name
> - redefines function resolution order for ambiguous
>
> function
>
> reference
>
> - adds support for Hive's rich built-in functions (support
>
> for
>
> Hive
>
> user
>
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi,

Regarding Hive and Spark support for TEMPORARY FUNCTIONS: I've just
performed some experiments (hive-2.3.2 & spark 2.4.4) and I think both
are very inconsistent in this respect (Spark being considerably worse).

Hive:

Not all built-in functions can be overwritten. I could overwrite most
of the functions I tried, e.g. length, e, pi, round, rtrim, but there are
functions I cannot overwrite, e.g. CAST and ARRAY. I get:

    ParseException line 1:29 cannot recognize input near 'array' 'AS'

What is interesting is that I cannot overwrite array, but I can
overwrite map or struct. Hive does behave reasonably well, though, if I
manage to overwrite a function: when I drop the temporary function, the
native function is still available.

Spark:

Spark's behavior imho is super bad.

Theoretically I could overwrite all functions. For example, I was able to
overwrite the CAST function, though I had to use the CREATE OR REPLACE
TEMPORARY FUNCTION syntax; otherwise I get an exception that the function
already exists. However, when I used the CAST function in a query, the
native, built-in one was used.

When I overwrote the current_date() function, my version was used in a
query, but it completely replaces the built-in function and I can no
longer use the native function in any way. I also cannot drop the
temporary function. I get:

    Error in query: Cannot drop native function 'current_date';

As an additional note, neither system allows creating TEMPORARY FUNCTIONS
with a database qualifier. Temporary functions are always represented as a
single name.

In my opinion neither of the systems behaves consistently. Generally
speaking, I think overwriting any system-provided function is just
dangerous.

Regarding Jark's concerns: such functions would be registered in the
current catalog/database, so users could still use their own
function, but would have to fully qualify it (because built-in
functions take precedence). Moreover, users would have the same problem
with permanent functions. Imagine a user has a permanent function
'cat.db.explode'. In 1.9 the user could use just 'explode'
as long as 'cat' & 'db' were the default catalog & database. If we
introduce a built-in 'explode' function in 1.10, the user has to fully
qualify the function.
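
The 'explode' scenario can be sketched with a toy Python model of the
resolution order built-in -> temporary -> catalog function. This is only an
illustration under my own assumptions: FunctionResolver, make_resolver and
the string results are hypothetical and are not Flink's FunctionCatalog
API. As a simplification, only 1-part and 3-part names are handled.

```python
from typing import Dict, Optional, Set, Tuple

Identifier = Tuple[str, str, str]  # (catalog, database, function)


class FunctionResolver:
    """Toy model of: built-in -> temporary -> catalog functions."""

    def __init__(self, built_ins: Set[str], catalog: str, database: str) -> None:
        self.built_ins = set(built_ins)
        self.catalog = catalog      # current (default) catalog
        self.database = database    # current (default) database
        self.temporary: Dict[Identifier, str] = {}
        self.permanent: Dict[Identifier, str] = {}

    def resolve(self, name: str) -> Optional[str]:
        parts = tuple(name.split("."))
        if len(parts) == 1:
            # Unqualified names hit the built-ins first.
            if name in self.built_ins:
                return "built-in:" + name
            ident = (self.catalog, self.database, name)
        elif len(parts) == 3:
            # Fully qualified names never resolve to a built-in.
            ident = parts
        else:
            return None  # 2-part names are left out of this sketch
        if ident in self.temporary:
            return self.temporary[ident]
        return self.permanent.get(ident)


def make_resolver(built_ins: Set[str]) -> FunctionResolver:
    resolver = FunctionResolver(built_ins, "cat", "db")
    resolver.permanent[("cat", "db", "explode")] = "user:cat.db.explode"
    return resolver


release_1_9 = make_resolver(set())          # no built-in 'explode' yet
release_1_10 = make_resolver({"explode"})   # built-in 'explode' added
```

In this model the unqualified 'explode' resolves to the user's function in
the first resolver and to the built-in in the second, while
'cat.db.explode' reaches the user's function in both.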

Best,

Dawid

On 04/09/2019 15:19, Timo Walther wrote:
> Hi all,
>
> thanks for the healthy discussion. It is already a very long
> discussion with a lot of text. So I will just post my opinion to a
> couple of statements:
>
> > Hive built-in functions are not part of Flink built-in functions,
> they are catalog functions
>
> That is not entirely true. Correct me if I'm wrong but I think Hive
> built-in functions are also not catalog functions. They are not stored
> in every Hive metastore catalog that is freshly created but are a set
> of functions that are listed somewhere and made available.
>
> > ambiguous functions reference just shouldn't be resolved to a
> different catalog
>
> I agree. They should not be resolved to a different catalog. That's
> why I am suggesting to split the concept of built-in functions and
> catalog lookup semantics.
>
> > I don't know if any other databases handle built-in functions like that
>
> What I called "module" is:
> - Extension in Postgres [1]
> - Plugin in Presto [2]
>
> Btw. Presto even mentions example modules that are similar to the ones
> that we will introduce in the near future both for ML and System XYZ
> compatibility:
> "See either the presto-ml module for machine learning functions or the
> presto-teradata-functions module for Teradata-compatible functions,
> both in the root of the Presto source."
>
> > functions should be either built-in already or just libraries
> functions, and library functions can be adapted to catalog APIs or of
> some other syntax to use
>
> Regarding "built-in already", of course we can add a lot of functions
> as built-ins but we will end-up in a dependency hell in the near
> future if we don't introduce a pluggable approach. Library functions
> is what you also suggest but storing them in a catalog means to always
> fully qualify them or modifying the existing catalog design that was
> inspired by the standard.
>
> I don't think "it brings in even more complicated scenarios to the
> design", it just does clear separation of concerns. Integrating the
> functionality into the current design makes the catalog API more
> complicated.
>
> > why would users name a temporary function the same as a built-in
> function then?
>
> Because you never know what users do. If they don't, my suggested
> resolution order should not be a problem, right?
>
> > I don't think hive functions deserves be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and
> Hive is just one of many systems that we can support. Not every
> built-in function behaves and will behave exactly like Hive.
>
> > regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as
> a reference. I don't think that a user can overwrite a built-in
> function there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>> Hi all,
>>
>> Regarding #1 temp function <> built-in function and naming.
>> I'm fine with temp functions should precede built-in function and can
>> override built-in functions (we already support to override built-in
>> function in 1.9).
>> If we don't allow the same name as a built-in function, I'm afraid we
>> will
>> have compatibility issues in the future.
>> Say users register a user defined function named "explode" in 1.9,
>> and we
>> support a built-in "explode" function in 1.10.
>> Then the user's jobs which call the registered "explode" function in 1.9
>> will all fail in 1.10 because of naming conflict.
>>
>> Regarding #2 "External" built-in functions.
>> I think if we store external built-in functions in catalog, then
>> "hive1::sqrt" is a good way to go.
>> However, I would prefer to support a discovery mechanism (e.g. SPI) for
>> built-in functions as Timo suggested above.
>> This gives us the flexibility to add Hive or MySQL or Geo or whatever
>> function set as built-in functions in an easy way.
>>
>> Best,
>> Jark
>>
>> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>>
>>> Hi David,
>>>
>>> Thank you for sharing your findings. It seems to me that there is no
>>> SQL
>>> standard regarding temporary functions. There are few systems that
>>> support
>>> it. Here are what I have found:
>>>
>>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
>>> 2. Spark: basically follows Hive (
>>>
>>> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
>>>
>>> )
>>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of
>>> overwriting
>>> behavior. (
>>> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>>>
>>>
>>> Because of lack of standard, it's perfectly fine for Flink to define
>>> whatever it sees appropriate. Thus, your proposal (no overwriting
>>> and must
>>> have DB as holder) is one option. The advantage is simplicity, The
>>> downside
>>> is the deviation from Hive, which is popular and de facto standard
>>> in big
>>> data world.
>>>
>>> However, I don't think we have to follow Hive. More importantly, we
>>> need a
>>> consensus. I have no objection if your proposal is generally agreed
>>> upon.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz
>>> <dw...@apache.org>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just an opinion on the built-in <> temporary functions resolution and
>>>> NAMING issue. I think we should not allow overriding the built-in
>>>> functions, as this may pose serious issues and to be honest is rather
>>>> not feasible and would require major rework. What happens if a user
>>>> wants to override CAST? Calls to that function are generated at
>>>> different layers of the stack that unfortunately does not always go
>>>> through the Catalog API (at least yet). Moreover from what I've
>>>> checked
>>>> no other systems allow overriding the built-in functions. All the
>>>> systems I've checked so far register temporary functions in a
>>>> database/schema (either special database for temporary functions, or
>>>> just current database). What I would suggest is to always register
>>>> temporary functions with a 3 part identifier. The same way as tables,
>>>> views etc. This effectively means you cannot override built-in
>>>> functions. With such approach it is natural that the temporary
>>>> functions
>>>> end up a step lower in the resolution order:
>>>>
>>>> 1. built-in functions (1 part, maybe 2? - this is still under
>>>> discussion)
>>>>
>>>> 2. temporary functions (always 3 part path)
>>>>
>>>> 3. catalog functions (always 3 part path)
>>>>
>>>> Let me know what do you think.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> On 04/09/2019 06:13, Bowen Li wrote:
>>>>> Hi,
>>>>>
>>>>> I agree with Xuefu that the main controversial points are mainly the
>>> two
>>>>> places. My thoughts on them:
>>>>>
>>>>> 1) Determinism of referencing Hive built-in functions. We can either
>>>> remove
>>>>> Hive built-in functions from ambiguous function resolution and
>>>>> require
>>>>> users to use special syntax for their qualified names, or add a
>>>>> config
>>>> flag
>>>>> to catalog constructor/yaml for turning on and off Hive built-in
>>>> functions
>>>>> with the flag set to 'false' by default and proper doc added to help
>>>> users
>>>>> make their decisions.
>>>>>
>>>>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous
>>>> function
>>>>> resolution order. We believe Flink temp functions should precede
>>>>> Flink
>>>>> built-in functions, and I have presented my reasons. Just in case
>>>>> if we
>>>>> cannot reach an agreement, I propose forbid users registering temp
>>>>> functions in the same name as a built-in function, like MySQL's
>>> approach,
>>>>> for the moment. It won't have any performance concern, since built-in
>>>>> functions are all in memory and thus cost of a name check will be
>>> really
>>>>> trivial.
>>>>>
>>>>>
>>>>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>>>>>
>>>>>>  From what I have seen, there are a couple of focal disagreements:
>>>>>>
>>>>>> 1. Resolution order: temp function --> flink built-in function -->
>>>> catalog
>>>>>> function vs flink built-in function --> temp function -> catalog
>>>> function.
>>>>>> 2. "External" built-in functions: how to treat built-in functions in
>>>>>> external system and how users reference them
>>>>>>
>>>>>> For #1, I agree with Bowen that temp function needs to be at the
>>> highest
>>>>>> priority because that's how a user might overwrite a built-in
>>>>>> function
>>>>>> without referencing a persistent, overwriting catalog function
>>>>>> with a
>>>> fully
>>>>>> qualified name. Putting built-in functions at the highest priority
>>>>>> eliminates that usage.
>>>>>>
>>>>>> For #2, I saw a general agreement on referencing "external" built-in
>>>>>> functions such as those in Hive needs to be explicit and
>>>>>> deterministic
>>>> even
>>>>>> though different approaches are proposed. To limit the scope and
>>> simply
>>>> the
>>>>>> usage, it seems making sense to me to introduce special syntax for
>>>> user  to
>>>>>> explicitly reference an external built-in function such as
>>>>>> hive1::sqrt
>>>> or
>>>>>> hive1._built_in.sqrt. This is a DML syntax matching nicely
>>>>>> Catalog API
>>>> call
>>>>>> hive1.getFunction(ObjectPath functionName) where the database
>>>>>> name is
>>>>>> absent for bulit-in functions available in that catalog hive1. I
>>>> understand
>>>>>> that Bowen's original proposal was trying to avoid this, but this
>>> could
>>>>>> turn out to be a clean and simple solution.
>>>>>>
>>>>>> (Timo's modular approach is great way to "expand" Flink's built-in
>>>> function
>>>>>> set, which seems orthogonal and complementary to this, which
>>>>>> could be
>>>>>> tackled in further future work.)
>>>>>>
>>>>>> I'd be happy to hear further thoughts on the two points.
>>>>>>
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>>
>>>>>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Timo & Bowen for the feedback. Bowen was right, my
>>>>>>> proposal is
>>>> the
>>>>>>> same
>>>>>>> as Bowen's. But after thinking about it, I'm currently lean to
>>>>>>> Timo's
>>>>>>> suggestion.
>>>>>>>
>>>>>>> The reason is backward compatibility. If we follow Bowen's
>>>>>>> approach,
>>>>>> let's
>>>>>>> say we
>>>>>>> first find function in Flink's built-in functions, and then hive's
>>>>>>> built-in. For example, `foo`
>>>>>>> is not supported by Flink, but hive has such built-in function. So
>>> user
>>>>>>> will have hive's
>>>>>>> behavior for function `foo`. And in next release, Flink realize
>>>>>>> this
>>>> is a
>>>>>>> very popular function
>>>>>>> and add it into Flink's built-in functions, but with different
>>> behavior
>>>>>> as
>>>>>>> hive's. So in next
>>>>>>> release, the behavior changes.
>>>>>>>
>>>>>>> With Timo's approach, IIUC user have to tell the framework
>>>>>>> explicitly
>>>>>> what
>>>>>>> kind of
>>>>>>> built-in functions he would like to use. He can just tell framework
>>> to
>>>>>>> abandon Flink's built-in
>>>>>>> functions, and use hive's instead. User can only choose between
>>>>>>> them,
>>>> but
>>>>>>> not use
>>>>>>> them at the same time. I think this approach is more predictable.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kurt
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Thanks for the feedback. Just a kindly reminder that the
>>>>>>>> [Proposal]
>>>>>>> section
>>>>>>>> in the google doc was updated, please take a look first and let me
>>>> know
>>>>>>> if
>>>>>>>> you have more questions.
>>>>>>>>
>>>>>>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
>>> wrote:
>>>>>>>>> Hi Timo,
>>>>>>>>>
>>>>>>>>> Re> 1) We should not have the restriction "hive built-in
>>>>>>>>> functions
>>>>>> can
>>>>>>>>> only
>>>>>>>>>> be used when current catalog is hive catalog". Switching a
>>>>>>>>>> catalog
>>>>>>>>>> should only have implications on the cat.db.object resolution
>>>>>>>>>> but
>>>>>> not
>>>>>>>>>> functions. It would be quite convinient for users to use Hive
>>>>>>> built-ins
>>>>>>>>>> even if they use a Confluent schema registry or just the
>>>>>>>>>> in-memory
>>>>>>>>> catalog.
>>>>>>>>>
>>>>>>>>> There might be a misunderstanding here.
>>>>>>>>>
>>>>>>>>> First of all, Hive built-in functions are not part of Flink
>>> built-in
>>>>>>>>> functions, they are catalog functions, thus if the current
>>>>>>>>> catalog
>>> is
>>>>>>>> not a
>>>>>>>>> HiveCatalog but, say, a schema registry catalog, ambiguous
>>> functions
>>>>>>>>> reference just shouldn't be resolved to a different catalog.
>>>>>>>>>
>>>>>>>>> Second, Hive built-in functions can potentially be referenced
>>> across
>>>>>>>>> catalog, but it doesn't have db namespace and we currently just
>>> don't
>>>>>>>> have
>>>>>>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is
>>>>>>> defined,
>>>>>>>>> e.g. "catalog::function", but it's out of scope of this FLIP.
>>>>>>>>>
>>>>>>>>> 2) I would propose to have separate concepts for catalog and
>>> built-in
>>>>>>>>> functions. In particular it would be nice to modularize built-in
>>>>>>>>> functions. Some built-in functions are very crucial (like AS,
>>>>>>>>> CAST,
>>>>>>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>>>>>> maybe
>>>>>>>>> we add more experimental functions in the future or function for
>>> some
>>>>>>>>> special application area (Geo functions, ML functions). A data
>>>>>> platform
>>>>>>>>> team might not want to make every built-in function available.
>>>>>>>>> Or a
>>>>>>>>> function module like ML functions is in a different Maven module.
>>>>>>>>>
>>>>>>>>> I think this is orthogonal to this FLIP, especially we don't have
>>> the
>>>>>>>>> "external built-in functions" anymore and currently the built-in
>>>>>>> function
>>>>>>>>> category remains untouched.
>>>>>>>>>
>>>>>>>>> But just to share some thoughts on the proposal, I'm not sure
>>>>>>>>> about
>>>>>> it:
>>>>>>>>> - I don't know if any other databases handle built-in functions
>>> like
>>>>>>>> that.
>>>>>>>>> Maybe you can give some examples? IMHO, built-in functions are
>>> system
>>>>>>>> info
>>>>>>>>> and should be deterministic, not depending on loaded
>>>>>>>>> libraries. Geo
>>>>>>>>> functions should be either built-in already or just libraries
>>>>>>> functions,
>>>>>>>>> and library functions can be adapted to catalog APIs or of some
>>> other
>>>>>>>>> syntax to use
>>>>>>>>> - I don't know if all use cases stand, and many can be
>>>>>>>>> achieved by
>>>>>>> other
>>>>>>>>> approaches too. E.g. experimental functions can be taken good
>>>>>>>>> care
>>> of
>>>>>>> by
>>>>>>>>> documentations, annotations, etc
>>>>>>>>> - the proposal basically introduces some concept like a pluggable
>>>>>>>> built-in
>>>>>>>>> function catalog, despite the already existing catalog APIs
>>>>>>>>> - it brings in even more complicated scenarios to the design.
>>>>>>>>> E.g.
>>>>>> how
>>>>>>> do
>>>>>>>>> you handle built-in functions in different modules but different
>>>>>> names?
>>>>>>>>> In short, I'm not sure if it really stands and it looks like an
>>>>>>> overkill
>>>>>>>>> to me. I'd rather not go to that route. Related discussion can be
>>> on
>>>>>>> its
>>>>>>>>> own thread.
>>>>>>>>>
>>>>>>>>> 3) Following the suggestion above, we can have a separate
>>>>>>>>> discovery
>>>>>>>>> mechanism for built-in functions. Instead of just going through a
>>>>>>> static
>>>>>>>>> list like in BuiltInFunctionDefinitions, a platform team
>>>>>>>>> should be
>>>>>> able
>>>>>>>>> to select function modules like
>>>>>>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
>>>>>>>>> HiveFunctions) or via service discovery;
>>>>>>>>>
>>>>>>>>> Same as above. I'll leave it to its own thread.
>>>>>>>>>
>>>>>>>>> re > 3) Dawid and I discussed the resolution order again. I agree
>>>>>> with
>>>>>>>>> Kurt
>>>>>>>>>> that we should unify built-in function (external or internal)
>>>>>> under a
>>>>>>>>>> common layer. However, the resolution order should be:
>>>>>>>>>>    1. built-in functions
>>>>>>>>>>    2. temporary functions
>>>>>>>>>>    3. regular catalog resolution logic
>>>>>>>>>> Otherwise a temporary function could cause clashes with Flink's
>>>>>>>> built-in
>>>>>>>>>> functions. If you take a look at other vendors, like SQL Server
>>>>>> they
>>>>>>>>>> also do not allow to overwrite built-in functions.
>>>>>>>>> ”I agree with Kurt that we should unify built-in function
>>>>>>>>> (external
>>>>>> or
>>>>>>>>> internal) under a common layer.“ <- I don't think this is what
>>>>>>>>> Kurt
>>>>>>>> means.
>>>>>>>>> Kurt and I are in favor of unifying built-in functions of
>>>>>>>>> external
>>>>>>>> systems
>>>>>>>>> and catalog functions. Did you type a mistake?
>>>>>>>>>
>>>>>>>>> Besides, I'm not sure about the resolution order you proposed.
>>>>>>> Temporary
>>>>>>>>> functions have a lifespan over a session and are only visible to
>>> the
>>>>>>>>> session owner, they are unique to each user, and users create
>>>>>>>>> them
>>> on
>>>>>>>>> purpose to be the highest priority in order to overwrite system
>>> info
>>>>>>>>> (built-in functions in this case).
>>>>>>>>>
>>>>>>>>> In your case, why would users name a temporary function the same
>>> as a
>>>>>>>>> built-in function then? Since using that name in ambiguous
>>>>>>>>> function
>>>>>>>>> reference will always be resolved to built-in functions,
>>>>>>>>> creating a
>>>>>>>>> same-named temp function would be meaningless in the end.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
>>> wrote:
>>>>>>>>>> Hi Jingsong,
>>>>>>>>>>
>>>>>>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we
>>>>>>> should
>>>>>>>>>>> not introduce interfaces to influence the framework. To make
>>>>>>>>>>> Flink itself more powerful, we should implement the functions
>>>>>>>>>>> we need to add.
>>>>>>>>>> Yes, please see the doc.
>>>>>>>>>>
>>>>>>>>>> Re> 2.Non-flink built-in functions are easy for users to change
>>>>>> their
>>>>>>>>>>> behavior. If we support some flink built-in functions in the
>>>>>>>>>>> future but act differently from non-flink built-in, this will
>>> lead
>>>>>>> to
>>>>>>>>>>> changes in user behavior.
>>>>>>>>>> There's no such concept as "external built-in functions" any
>>>>>>>>>> more.
>>>>>>>>>> Built-in functions of external systems will be treated as
>>>>>>>>>> special
>>>>>>>> catalog
>>>>>>>>>> functions.
>>>>>>>>>>
>>>>>>>>>> Re> Another question is, does this fallback include all
>>>>>>>>>>> hive built-in functions? As far as I know, some hive functions
>>>>>>>>>>> have some hacky. If possible, can we start with a white list?
>>>>>>>>>>> Once we implement some functions to flink built-in, we can
>>>>>>>>>>> also update the whitelist.
>>>>>>>>>> Yes, that's something we thought of too. I don't think it's
>>>>>>>>>> super
>>>>>>>>>> critical to the scope of this FLIP, thus I'd like to leave it to
>>>>>>> future
>>>>>>>>>> efforts as a nice-to-have feature.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>> Hi Kurt,
>>>>>>>>>>>
>>>>>>>>>>> Re: > What I want to propose is we can merge #3 and #4, make
>>>>>>>>>>> them
>>>>>>> both
>>>>>>>>>>> under
>>>>>>>>>>>> "catalog" concept, by extending catalog function to make it
>>>>>>>>>>>> have
>>>>>>>>>>> ability to
>>>>>>>>>>>> have built-in catalog functions. Some benefits I can see from
>>> this
>>>>>>>>>>> approach:
>>>>>>>>>>>> 1. We don't have to introduce new concept like external
>>>>>>>>>>>> built-in
>>>>>>>>>>> functions.
>>>>>>>>>>>> Actually I don't see a full story about how to treat a
>>>>>>>>>>>> built-in
>>>>>>>>>>> functions, and it
>>>>>>>>>>>> seems a little bit disrupt with catalog. As a result, you have
>>> to
>>>>>>> make
>>>>>>>>>>> some restriction
>>>>>>>>>>>> like "hive built-in functions can only be used when current
>>>>>> catalog
>>>>>>> is
>>>>>>>>>>> hive catalog".
>>>>>>>>>>>
>>>>>>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some
>>> part
>>>>>> of
>>>>>>>>>>> the doc. I've modified those sections, and they are up to date
>>> now.
>>>>>>>>>>> In short, now built-in function of external systems are defined
>>> as
>>>>>> a
>>>>>>>>>>> special kind of catalog function in Flink, and handled by Flink
>>> as
>>>>>>>>>>> following:
>>>>>>>>>>> - An external built-in function must be associated with a
>>>>>>>>>>> catalog
>>>>>> for
>>>>>>>>>>> the purpose of decoupling flink-table and external systems.
>>>>>>>>>>> - It always resides in front of catalog functions in ambiguous
>>>>>>> function
>>>>>>>>>>> reference order, just like in its own external system
>>>>>>>>>>> - It is a special catalog function that doesn’t have a
>>>>>>> schema/database
>>>>>>>>>>> namespace
>>>>>>>>>>> - It goes thru the same instantiation logic as other user
>>>>>>>>>>> defined
>>>>>>>>>>> catalog functions in the external system
>>>>>>>>>>>
>>>>>>>>>>> Please take another look at the doc, and let me know if you
>>>>>>>>>>> have
>>>>>> more
>>>>>>>>>>> questions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther
>>>>>>>>>>> <tw...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> Hi Kurt,
>>>>>>>>>>>>
>>>>>>>>>>>> it should not affect the functions and operations we currently
>>>>>> have
>>>>>>> in
>>>>>>>>>>>> SQL. It just categorizes the available built-in functions.
>>>>>>>>>>>> It is
>>>>>>> kind
>>>>>>>>>>>> of
>>>>>>>>>>>> an orthogonal concept to the catalog API but built-in
>>>>>>>>>>>> functions
>>>>>>>> deserve
>>>>>>>>>>>> this special kind of treatment. CatalogFunction still fits
>>>>>> perfectly
>>>>>>>> in
>>>>>>>>>>>> there because the regular catalog object resolution logic
>>>>>>>>>>>> is not
>>>>>>>>>>>> affected. So tables and functions are resolved in the same way
>>> but
>>>>>>>> with
>>>>>>>>>>>> built-in functions that have priority as in the original
>>>>>>>>>>>> design.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03.09.19 15:26, Kurt Young wrote:
>>>>>>>>>>>>> Does this only affect the functions and operations we
>>>>>>>>>>>>> currently
>>>>>>> have
>>>>>>>>>>>> in SQL
>>>>>>>>>>>>> and
>>>>>>>>>>>>> have no effect on tables, right? Looks like this is an
>>>>>> orthogonal
>>>>>>>>>>>> concept
>>>>>>>>>>>>> with Catalog?
>>>>>>>>>>>>> If the answer are both yes, then the catalog function will
>>>>>>>>>>>>> be a
>>>>>>>> weird
>>>>>>>>>>>>> concept?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
>>> yuzhao.cyz@gmail.com
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> The way you proposed are basically the same as what Calcite
>>>>>>> does, I
>>>>>>>>>>>> think
>>>>>>>>>>>>>> we are in the same line.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>>>> On Sep 3, 2019, at 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>>>>>>>>>>>>>>> This sounds exactly as the module approach I mentioned, no?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote:
>>>>>>>>>>>>>>>> Thanks Bowen for bring up this topic, I think it’s a
>>>>>>>>>>>>>>>> useful
>>>>>>>>>>>>>> refactoring to make our function usage more user friendly.
>>>>>>>>>>>>>>>> For the topic of how to organize the builtin operators and
>>>>>>>>>>>> operators
>>>>>>>>>>>>>> of Hive, here is a solution from Apache Calcite, the Calcite
>>>>>> way
>>>>>>> is
>>>>>>>>>>>> to make
>>>>>>>>>>>>>> every dialect operators a “Library”, user can specify which
>>>>>>>>>>>> libraries they
>>>>>>>>>>>>>> want to use for a sql query. The builtin operators always
>>> comes
>>>>>>> as
>>>>>>>>>>>> the
>>>>>>>>>>>>>> first class objects and the others are used from the order
>>> they
>>>>>>>>>>>> appears.
>>>>>>>>>>>>>> Maybe you can take a reference.
>>>>>>>>>>>>>>>> [1]
>>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>>>>>> On Aug 28, 2019, at 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's
>>>>>>>>>>>>>> FunctionCatalog.
>>>>>>>>>>>>>>>>> It's critically helpful to improve function usability in
>>>>>> SQL.
>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>>
>>>>>>>>>>>>>>>>> In short, it:
>>>>>>>>>>>>>>>>> - adds support for precise function reference with
>>>>>>>> fully/partially
>>>>>>>>>>>>>>>>> qualified name
>>>>>>>>>>>>>>>>> - redefines function resolution order for ambiguous
>>> function
>>>>>>>>>>>>>> reference
>>>>>>>>>>>>>>>>> - adds support for Hive's rich built-in functions
>>>>>>>>>>>>>>>>> (support
>>>>>> for
>>>>>>>>>>>> Hive
>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>> defined functions was already added in 1.9.0)
>>>>>>>>>>>>>>>>> - clarifies the concept of temporary functions
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Would love to hear your thoughts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bowen
>>>>>> -- 
>>>>>> Xuefu Zhang
>>>>>>
>>>>>> "In Honey We Trust!"
>>>>>>
>>>>
>>> -- 
>>> Xuefu Zhang
>>>
>>> "In Honey We Trust!"
>>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi all,

thanks for the healthy discussion. It is already a very long thread
with a lot of text, so I will just give my opinion on a couple of
statements:

 > Hive built-in functions are not part of Flink built-in functions, 
they are catalog functions

That is not entirely true. Correct me if I'm wrong but I think Hive 
built-in functions are also not catalog functions. They are not stored 
in every Hive metastore catalog that is freshly created but are a set of 
functions that are listed somewhere and made available.

 > ambiguous functions reference just shouldn't be resolved to a 
different catalog

I agree. They should not be resolved to a different catalog. That's why
I am suggesting that we separate the concept of built-in functions from
catalog lookup semantics.

 > I don't know if any other databases handle built-in functions like that

What I called "module" is:
- Extension in Postgres [1]
- Plugin in Presto [2]

Btw. Presto even mentions example modules that are similar to the ones 
that we will introduce in the near future both for ML and System XYZ 
compatibility:
"See either the presto-ml module for machine learning functions or the 
presto-teradata-functions module for Teradata-compatible functions, both 
in the root of the Presto source."

 > functions should be either built-in already or just libraries 
functions, and library functions can be adapted to catalog APIs or of 
some other syntax to use

Regarding "built-in already": of course we can add a lot of functions as
built-ins, but we will end up in dependency hell in the near future if
we don't introduce a pluggable approach. Library functions are what you
also suggest, but storing them in a catalog means either always fully
qualifying them or modifying the existing catalog design that was
inspired by the standard.

I don't think "it brings in even more complicated scenarios to the
design"; it just provides a clear separation of concerns. Integrating
the functionality into the current design makes the catalog API more
complicated.
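
To make the separation concrete, here is a minimal sketch of what a
pluggable set of built-in function modules could look like. Everything in
it is an assumption for illustration only: FunctionModule, module(),
coreModule(), geoModule() and resolveBuiltIn() are hypothetical names, not
an existing Flink API, and the function "implementations" are placeholder
strings. The point is only that unqualified built-in lookup walks an
ordered list of enabled modules and never touches the catalog.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class FunctionModuleSketch {

    /** Hypothetical abstraction: a named, self-contained set of built-in functions. */
    public interface FunctionModule {
        String name();
        Optional<String> lookup(String functionName);
    }

    /** Map-backed module; the values are placeholder labels, not real implementations. */
    public static FunctionModule module(final String moduleName, final Map<String, String> functions) {
        return new FunctionModule() {
            @Override public String name() { return moduleName; }
            @Override public Optional<String> lookup(String functionName) {
                // Function names are matched case-insensitively in this sketch.
                return Optional.ofNullable(functions.get(functionName.toLowerCase(Locale.ROOT)));
            }
        };
    }

    public static FunctionModule coreModule() {
        Map<String, String> f = new HashMap<>();
        f.put("cast", "CAST");
        f.put("md5", "MD5");
        return module("core", f);
    }

    public static FunctionModule geoModule() {
        return module("geo", Collections.singletonMap("st_distance", "ST_DISTANCE"));
    }

    /** Walks the enabled modules in order; the first module that knows the name wins. */
    public static Optional<String> resolveBuiltIn(List<FunctionModule> enabled, String functionName) {
        for (FunctionModule m : enabled) {
            Optional<String> hit = m.lookup(functionName);
            if (hit.isPresent()) {
                return Optional.of(m.name() + ":" + hit.get());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<FunctionModule> coreOnly = Collections.singletonList(coreModule());
        List<FunctionModule> coreAndGeo = Arrays.asList(coreModule(), geoModule());
        System.out.println(resolveBuiltIn(coreOnly, "ST_DISTANCE"));   // Optional.empty
        System.out.println(resolveBuiltIn(coreAndGeo, "ST_DISTANCE")); // Optional[geo:ST_DISTANCE]
        System.out.println(resolveBuiltIn(coreAndGeo, "md5"));         // Optional[core:MD5]
    }
}
```

A platform team that does not enable the geo module simply never sees its
functions, and the catalog API stays unchanged either way.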

 > why would users name a temporary function the same as a built-in 
function then?

Because you never know what users do. If they don't, my suggested 
resolution order should not be a problem, right?
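
As a rough illustration of that argument, the sketch below resolves an
unqualified name under both orders discussed in this thread. The class,
the enum and the function sets are made up for the example (nothing here
is Flink's actual FunctionCatalog code). It only shows that the two orders
differ exactly when a temporary function shares a name with a built-in.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResolutionOrderSketch {

    public enum Kind { BUILT_IN, TEMPORARY, CATALOG }

    /** Order suggested in this mail: built-in, then temporary, then catalog. */
    public static final List<Kind> BUILT_IN_FIRST =
            Arrays.asList(Kind.BUILT_IN, Kind.TEMPORARY, Kind.CATALOG);

    /** Alternative order from the thread: temporary, then built-in, then catalog. */
    public static final List<Kind> TEMPORARY_FIRST =
            Arrays.asList(Kind.TEMPORARY, Kind.BUILT_IN, Kind.CATALOG);

    /** Returns the first kind, in the given order, that contains the name; null if none does. */
    public static Kind resolve(String name, List<Kind> order,
                               Set<String> builtIns, Set<String> temporary, Set<String> catalog) {
        for (Kind kind : order) {
            Set<String> candidates;
            switch (kind) {
                case BUILT_IN:  candidates = builtIns;  break;
                case TEMPORARY: candidates = temporary; break;
                default:        candidates = catalog;   break;
            }
            if (candidates.contains(name)) {
                return kind;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> builtIns = new HashSet<>(Arrays.asList("cast", "concat_ws"));
        Set<String> temporary = new HashSet<>(Arrays.asList("my_udf", "cast")); // "cast" clashes
        Set<String> catalog = Collections.singleton("my_catalog_fn");

        // No clash: both orders agree.
        System.out.println(resolve("my_udf", BUILT_IN_FIRST, builtIns, temporary, catalog));  // TEMPORARY
        System.out.println(resolve("my_udf", TEMPORARY_FIRST, builtIns, temporary, catalog)); // TEMPORARY

        // Clash: the order decides which definition is used.
        System.out.println(resolve("cast", BUILT_IN_FIRST, builtIns, temporary, catalog));    // BUILT_IN
        System.out.println(resolve("cast", TEMPORARY_FIRST, builtIns, temporary, catalog));   // TEMPORARY
    }
}
```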

 > I don't think hive functions deserves be a function module

Our goal is not to create a Hive clone. We need to think forward, and
Hive is just one of many systems that we can support. Not every built-in
function behaves, or will behave, exactly like Hive's.

 > regarding temporary functions, there are few systems that support it

IMHO Spark and Hive are not always the best examples for consistent 
design. Systems like Postgres, Presto, or SQL Server should be used as a 
reference. I don't think that a user can overwrite a built-in function 
there.

Regards,
Timo

[1] https://www.postgresql.org/docs/10/extend-extensions.html
[2] https://prestodb.github.io/docs/current/develop/functions.html


On 04.09.19 13:44, Jark Wu wrote:
> Hi all,
>
> Regarding #1 temp function <> built-in function and naming.
> I'm fine with temp functions should precede built-in function and can
> override built-in functions (we already support to override built-in
> function in 1.9).
> If we don't allow the same name as a built-in function, I'm afraid we will
> have compatibility issues in the future.
> Say users register a user defined function named "explode" in 1.9, and we
> support a built-in "explode" function in 1.10.
> Then the user's jobs which call the registered "explode" function in 1.9
> will all fail in 1.10 because of naming conflict.
>
> Regarding #2 "External" built-in functions.
> I think if we store external built-in functions in catalog, then
> "hive1::sqrt" is a good way to go.
> However, I would prefer to support a discovery mechanism (e.g. SPI) for
> built-in functions as Timo suggested above.
> This gives us the flexibility to add Hive or MySQL or Geo or whatever
> function set as built-in functions in an easy way.
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:
>
>> Hi Dawid,
>>
>> Thank you for sharing your findings. It seems to me that there is no SQL
>> standard regarding temporary functions. There are few systems that support
>> it. Here are what I have found:
>>
>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
>> 2. Spark: basically follows Hive (
>>
>> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
>> )
>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
>> behavior. (
>> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>>
>> Because of lack of standard, it's perfectly fine for Flink to define
>> whatever it sees appropriate. Thus, your proposal (no overwriting and must
>> have DB as holder) is one option. The advantage is simplicity. The downside
>> is the deviation from Hive, which is popular and the de facto standard in the
>> big data world.
>>
>> However, I don't think we have to follow Hive. More importantly, we need a
>> consensus. I have no objection if your proposal is generally agreed upon.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
>> wrote:
>>
>>> Hi all,
>>>
>>> Just an opinion on the built-in <> temporary functions resolution and
>>> NAMING issue. I think we should not allow overriding the built-in
>>> functions, as this may pose serious issues and to be honest is rather
>>> not feasible and would require major rework. What happens if a user
>>> wants to override CAST? Calls to that function are generated at
>>> different layers of the stack that unfortunately does not always go
>>> through the Catalog API (at least yet). Moreover from what I've checked
>>> no other systems allow overriding the built-in functions. All the
>>> systems I've checked so far register temporary functions in a
>>> database/schema (either special database for temporary functions, or
>>> just current database). What I would suggest is to always register
>>> temporary functions with a 3 part identifier. The same way as tables,
>>> views etc. This effectively means you cannot override built-in
>>> functions. With such approach it is natural that the temporary functions
>>> end up a step lower in the resolution order:
>>>
>>> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>>>
>>> 2. temporary functions (always 3 part path)
>>>
>>> 3. catalog functions (always 3 part path)
>>>
>>> Let me know what do you think.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> On 04/09/2019 06:13, Bowen Li wrote:
>>>> Hi,
>>>>
>>>> I agree with Xuefu that the main controversial points are mainly the
>> two
>>>> places. My thoughts on them:
>>>>
>>>> 1) Determinism of referencing Hive built-in functions. We can either
>>> remove
>>>> Hive built-in functions from ambiguous function resolution and require
>>>> users to use special syntax for their qualified names, or add a config
>>> flag
>>>> to catalog constructor/yaml for turning on and off Hive built-in
>>> functions
>>>> with the flag set to 'false' by default and proper doc added to help
>>> users
>>>> make their decisions.
>>>>
>>>> 2) Flink temp functions v.s. Flink built-in functions in ambiguous
>>> function
>>>> resolution order. We believe Flink temp functions should precede Flink
>>>> built-in functions, and I have presented my reasons. Just in case if we
>>>> cannot reach an agreement, I propose forbidding users from registering temp
>>>> functions with the same name as a built-in function, like MySQL's
>> approach,
>>>> for the moment. It won't have any performance concern, since built-in
>>>> functions are all in memory and thus cost of a name check will be
>> really
>>>> trivial.
>>>>
>>>>
>>>> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>>>>
>>>>>  From what I have seen, there are a couple of focal disagreements:
>>>>>
>>>>> 1. Resolution order: temp function --> flink built-in function -->
>>> catalog
>>>>> function vs flink built-in function --> temp function -> catalog
>>> function.
>>>>> 2. "External" built-in functions: how to treat built-in functions in
>>>>> external system and how users reference them
>>>>>
>>>>> For #1, I agree with Bowen that temp function needs to be at the
>> highest
>>>>> priority because that's how a user might overwrite a built-in function
>>>>> without referencing a persistent, overwriting catalog function with a
>>> fully
>>>>> qualified name. Putting built-in functions at the highest priority
>>>>> eliminates that usage.
>>>>>
>>>>> For #2, I saw a general agreement on referencing "external" built-in
>>>>> functions such as those in Hive needs to be explicit and deterministic
>>> even
>>>>> though different approaches are proposed. To limit the scope and
>> simply
>>> the
>>>>> usage, it seems making sense to me to introduce special syntax for
>>> user  to
>>>>> explicitly reference an external built-in function such as hive1::sqrt
>>> or
>>>>> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API
>>> call
>>>>> hive1.getFunction(ObjectPath functionName) where the database name is
>>>>> absent for built-in functions available in that catalog hive1. I
>>> understand
>>>>> that Bowen's original proposal was trying to avoid this, but this
>> could
>>>>> turn out to be a clean and simple solution.
>>>>>
>>>>> (Timo's modular approach is great way to "expand" Flink's built-in
>>> function
>>>>> set, which seems orthogonal and complementary to this, which could be
>>>>> tackled in further future work.)
>>>>>
>>>>> I'd be happy to hear further thoughts on the two points.
>>>>>
>>>>> Thanks,
>>>>> Xuefu
>>>>>
>>>>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is
>>> the
>>>>>> same
>>>>>> as Bowen's. But after thinking about it, I'm currently lean to Timo's
>>>>>> suggestion.
>>>>>>
>>>>>> The reason is backward compatibility. If we follow Bowen's approach,
>>>>> let's
>>>>>> say we
>>>>>> first find function in Flink's built-in functions, and then hive's
>>>>>> built-in. For example, `foo`
>>>>>> is not supported by Flink, but hive has such built-in function. So
>> user
>>>>>> will have hive's
>>>>>> behavior for function `foo`. And in next release, Flink realize this
>>> is a
>>>>>> very popular function
>>>>>> and add it into Flink's built-in functions, but with different
>> behavior
>>>>> as
>>>>>> hive's. So in next
>>>>>> release, the behavior changes.
>>>>>>
>>>>>> With Timo's approach, IIUC user have to tell the framework explicitly
>>>>> what
>>>>>> kind of
>>>>>> built-in functions he would like to use. He can just tell framework
>> to
>>>>>> abandon Flink's built-in
>>>>>> functions, and use hive's instead. User can only choose between them,
>>> but
>>>>>> not use
>>>>>> them at the same time. I think this approach is more predictable.
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Thanks for the feedback. Just a kindly reminder that the [Proposal]
>>>>>> section
>>>>>>> in the google doc was updated, please take a look first and let me
>>> know
>>>>>> if
>>>>>>> you have more questions.
>>>>>>>
>>>>>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
>> wrote:
>>>>>>>> Hi Timo,
>>>>>>>>
>>>>>>>> Re> 1) We should not have the restriction "hive built-in functions
>>>>> can
>>>>>>>> only
>>>>>>>>> be used when current catalog is hive catalog". Switching a catalog
>>>>>>>>> should only have implications on the cat.db.object resolution but
>>>>> not
>>>>>>>>> functions. It would be quite convenient for users to use Hive
>>>>>> built-ins
>>>>>>>>> even if they use a Confluent schema registry or just the in-memory
>>>>>>>> catalog.
>>>>>>>>
>>>>>>>> There might be a misunderstanding here.
>>>>>>>>
>>>>>>>> First of all, Hive built-in functions are not part of Flink
>> built-in
>>>>>>>> functions, they are catalog functions, thus if the current catalog
>> is
>>>>>>> not a
>>>>>>>> HiveCatalog but, say, a schema registry catalog, ambiguous
>> functions
>>>>>>>> reference just shouldn't be resolved to a different catalog.
>>>>>>>>
>>>>>>>> Second, Hive built-in functions can potentially be referenced
>> across
>>>>>>>> catalog, but it doesn't have db namespace and we currently just
>> don't
>>>>>>> have
>>>>>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is
>>>>>> defined,
>>>>>>>> e.g. "catalog::function", but it's out of scope of this FLIP.
>>>>>>>>
>>>>>>>> 2) I would propose to have separate concepts for catalog and
>> built-in
>>>>>>>> functions. In particular it would be nice to modularize built-in
>>>>>>>> functions. Some built-in functions are very crucial (like AS, CAST,
>>>>>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>>>>> maybe
>>>>>>>> we add more experimental functions in the future or function for
>> some
>>>>>>>> special application area (Geo functions, ML functions). A data
>>>>> platform
>>>>>>>> team might not want to make every built-in function available. Or a
>>>>>>>> function module like ML functions is in a different Maven module.
>>>>>>>>
>>>>>>>> I think this is orthogonal to this FLIP, especially we don't have
>> the
>>>>>>>> "external built-in functions" anymore and currently the built-in
>>>>>> function
>>>>>>>> category remains untouched.
>>>>>>>>
>>>>>>>> But just to share some thoughts on the proposal, I'm not sure about
>>>>> it:
>>>>>>>> - I don't know if any other databases handle built-in functions
>> like
>>>>>>> that.
>>>>>>>> Maybe you can give some examples? IMHO, built-in functions are
>> system
>>>>>>> info
>>>>>>>> and should be deterministic, not depending on loaded libraries. Geo
>>>>>>>> functions should be either built-in already or just libraries
>>>>>> functions,
>>>>>>>> and library functions can be adapted to catalog APIs or of some
>> other
>>>>>>>> syntax to use
>>>>>>>> - I don't know if all use cases stand, and many can be achieved by
>>>>>> other
>>>>>>>> approaches too. E.g. experimental functions can be taken good care
>> of
>>>>>> by
>>>>>>>> documentations, annotations, etc
>>>>>>>> - the proposal basically introduces some concept like a pluggable
>>>>>>> built-in
>>>>>>>> function catalog, despite the already existing catalog APIs
>>>>>>>> - it brings in even more complicated scenarios to the design. E.g.
>>>>> how
>>>>>> do
>>>>>>>> you handle built-in functions in different modules but different
>>>>> names?
>>>>>>>> In short, I'm not sure if it really stands and it looks like an
>>>>>> overkill
>>>>>>>> to me. I'd rather not go to that route. Related discussion can be
>> on
>>>>>> its
>>>>>>>> own thread.
>>>>>>>>
>>>>>>>> 3) Following the suggestion above, we can have a separate discovery
>>>>>>>> mechanism for built-in functions. Instead of just going through a
>>>>>> static
>>>>>>>> list like in BuiltInFunctionDefinitions, a platform team should be
>>>>> able
>>>>>>>> to select function modules like
>>>>>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
>>>>>>>> HiveFunctions) or via service discovery;
>>>>>>>>
>>>>>>>> Same as above. I'll leave it to its own thread.
>>>>>>>>
>>>>>>>> re > 3) Dawid and I discussed the resolution order again. I agree
>>>>> with
>>>>>>>> Kurt
>>>>>>>>> that we should unify built-in function (external or internal)
>>>>> under a
>>>>>>>>> common layer. However, the resolution order should be:
>>>>>>>>>    1. built-in functions
>>>>>>>>>    2. temporary functions
>>>>>>>>>    3. regular catalog resolution logic
>>>>>>>>> Otherwise a temporary function could cause clashes with Flink's
>>>>>>> built-in
>>>>>>>>> functions. If you take a look at other vendors, like SQL Server
>>>>> they
>>>>>>>>> also do not allow to overwrite built-in functions.
>>>>>>>> ”I agree with Kurt that we should unify built-in function (external
>>>>> or
>>>>>>>> internal) under a common layer.“ <- I don't think this is what Kurt
>>>>>>> means.
>>>>>>>> Kurt and I are in favor of unifying built-in functions of external
>>>>>>> systems
>>>>>>>> and catalog functions. Did you type a mistake?
>>>>>>>>
>>>>>>>> Besides, I'm not sure about the resolution order you proposed.
>>>>>> Temporary
>>>>>>>> functions have a lifespan over a session and are only visible to
>> the
>>>>>>>> session owner, they are unique to each user, and users create them
>> on
>>>>>>>> purpose to be the highest priority in order to overwrite system
>> info
>>>>>>>> (built-in functions in this case).
>>>>>>>>
>>>>>>>> In your case, why would users name a temporary function the same
>> as a
>>>>>>>> built-in function then? Since using that name in ambiguous function
>>>>>>>> reference will always be resolved to built-in functions, creating a
>>>>>>>> same-named temp function would be meaningless in the end.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
>> wrote:
>>>>>>>>> Hi Jingsong,
>>>>>>>>>
>>>>>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we
>>>>>> should
>>>>>>>>>> not introduce interfaces to influence the framework. To make
>>>>>>>>>> Flink itself more powerful, we should implement the functions
>>>>>>>>>> we need to add.
>>>>>>>>> Yes, please see the doc.
>>>>>>>>>
>>>>>>>>> Re> 2.Non-flink built-in functions are easy for users to change
>>>>> their
>>>>>>>>>> behavior. If we support some flink built-in functions in the
>>>>>>>>>> future but act differently from non-flink built-in, this will
>> lead
>>>>>> to
>>>>>>>>>> changes in user behavior.
>>>>>>>>> There's no such concept as "external built-in functions" any more.
>>>>>>>>> Built-in functions of external systems will be treated as special
>>>>>>> catalog
>>>>>>>>> functions.
>>>>>>>>>
>>>>>>>>> Re> Another question is, does this fallback include all
>>>>>>>>>> hive built-in functions? As far as I know, some hive functions
>>>>>>>>>> have some hacky. If possible, can we start with a white list?
>>>>>>>>>> Once we implement some functions to flink built-in, we can
>>>>>>>>>> also update the whitelist.
>>>>>>>>> Yes, that's something we thought of too. I don't think it's super
>>>>>>>>> critical to the scope of this FLIP, thus I'd like to leave it to
>>>>>> future
>>>>>>>>> efforts as a nice-to-have feature.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
>>>>> wrote:
>>>>>>>>>> Hi Kurt,
>>>>>>>>>>
>>>>>>>>>> Re: > What I want to propose is we can merge #3 and #4, make them
>>>>>> both
>>>>>>>>>> under
>>>>>>>>>>> "catalog" concept, by extending catalog function to make it have
>>>>>>>>>> ability to
>>>>>>>>>>> have built-in catalog functions. Some benefits I can see from
>> this
>>>>>>>>>> approach:
>>>>>>>>>>> 1. We don't have to introduce new concept like external built-in
>>>>>>>>>> functions.
>>>>>>>>>>> Actually I don't see a full story about how to treat a built-in
>>>>>>>>>> functions, and it
>>>>>>>>>>> seems a little bit disrupt with catalog. As a result, you have
>> to
>>>>>> make
>>>>>>>>>> some restriction
>>>>>>>>>>> like "hive built-in functions can only be used when current
>>>>> catalog
>>>>>> is
>>>>>>>>>> hive catalog".
>>>>>>>>>>
>>>>>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some
>> part
>>>>> of
>>>>>>>>>> the doc. I've modified those sections, and they are up to date
>> now.
>>>>>>>>>> In short, now built-in function of external systems are defined
>> as
>>>>> a
>>>>>>>>>> special kind of catalog function in Flink, and handled by Flink
>> as
>>>>>>>>>> following:
>>>>>>>>>> - An external built-in function must be associated with a catalog
>>>>> for
>>>>>>>>>> the purpose of decoupling flink-table and external systems.
>>>>>>>>>> - It always resides in front of catalog functions in ambiguous
>>>>>> function
>>>>>>>>>> reference order, just like in its own external system
>>>>>>>>>> - It is a special catalog function that doesn’t have a
>>>>>> schema/database
>>>>>>>>>> namespace
>>>>>>>>>> - It goes thru the same instantiation logic as other user defined
>>>>>>>>>> catalog functions in the external system
>>>>>>>>>>
>>>>>>>>>> Please take another look at the doc, and let me know if you have
>>>>> more
>>>>>>>>>> questions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> Hi Kurt,
>>>>>>>>>>>
>>>>>>>>>>> it should not affect the functions and operations we currently
>>>>> have
>>>>>> in
>>>>>>>>>>> SQL. It just categorizes the available built-in functions. It is
>>>>>> kind
>>>>>>>>>>> of
>>>>>>>>>>> an orthogonal concept to the catalog API but built-in functions
>>>>>>> deserve
>>>>>>>>>>> this special kind of treatment. CatalogFunction still fits
>>>>> perfectly
>>>>>>> in
>>>>>>>>>>> there because the regular catalog object resolution logic is not
>>>>>>>>>>> affected. So tables and functions are resolved in the same way
>> but
>>>>>>> with
>>>>>>>>>>> built-in functions that have priority as in the original design.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03.09.19 15:26, Kurt Young wrote:
>>>>>>>>>>>> Does this only affect the functions and operations we currently
>>>>>> have
>>>>>>>>>>> in SQL
>>>>>>>>>>>> and
>>>>>>>>>>>> have no effect on tables, right? Looks like this is an
>>>>> orthogonal
>>>>>>>>>>> concept
>>>>>>>>>>>> with Catalog?
>>>>>>>>>>>> If the answer are both yes, then the catalog function will be a
>>>>>>> weird
>>>>>>>>>>>> concept?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Kurt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
>> yuzhao.cyz@gmail.com
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> The way you proposed are basically the same as what Calcite
>>>>>> does, I
>>>>>>>>>>> think
>>>>>>>>>>>>> we are in the same line.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>>> On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
>>>>>>>>>>>>>> This sounds exactly as the module approach I mentioned, no?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote:
>>>>>>>>>>>>>>> Thanks Bowen for bring up this topic, I think it’s a useful
>>>>>>>>>>>>> refactoring to make our function usage more user friendly.
>>>>>>>>>>>>>>> For the topic of how to organize the builtin operators and
>>>>>>>>>>> operators
>>>>>>>>>>>>> of Hive, here is a solution from Apache Calcite, the Calcite
>>>>> way
>>>>>> is
>>>>>>>>>>> to make
>>>>>>>>>>>>> every dialect operators a “Library”, user can specify which
>>>>>>>>>>> libraries they
>>>>>>>>>>>>> want to use for a sql query. The builtin operators always
>> comes
>>>>>> as
>>>>>>>>>>> the
>>>>>>>>>>>>> first class objects and the others are used from the order
>> they
>>>>>>>>>>> appears.
>>>>>>>>>>>>> Maybe you can take a reference.
>>>>>>>>>>>>>>> [1]
>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>>>>> On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's
>>>>>>>>>>>>> FunctionCatalog.
>>>>>>>>>>>>>>>> It's critically helpful to improve function usability in
>>>>> SQL.
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>>>>>>>>>>>>>>> In short, it:
>>>>>>>>>>>>>>>> - adds support for precise function reference with
>>>>>>> fully/partially
>>>>>>>>>>>>>>>> qualified name
>>>>>>>>>>>>>>>> - redefines function resolution order for ambiguous
>> function
>>>>>>>>>>>>> reference
>>>>>>>>>>>>>>>> - adds support for Hive's rich built-in functions (support
>>>>> for
>>>>>>>>>>> Hive
>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>> defined functions was already added in 1.9.0)
>>>>>>>>>>>>>>>> - clarifies the concept of temporary functions
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would love to hear your thoughts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bowen
>>>>> --
>>>>> Xuefu Zhang
>>>>>
>>>>> "In Honey We Trust!"
>>>>>
>>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Jark Wu <im...@gmail.com>.
Hi all,

Regarding #1, temp functions vs. built-in functions and naming:
I'm fine with temp functions preceding built-in functions and being able to
override them (we already support overriding built-in functions in 1.9).
If we don't allow a temp function to share a name with a built-in function,
I'm afraid we will have compatibility issues in the future.
Say users register a user-defined function named "explode" in 1.9, and we
add a built-in "explode" function in 1.10.
Then all of the users' jobs that call the registered "explode" function in 1.9
will fail in 1.10 because of the naming conflict.
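
To make the compatibility concern concrete, here is a minimal sketch in Java.
The class ResolutionOrderSketch and its method names are hypothetical and are
not Flink's FunctionCatalog API. It only models a lookup in which temp
functions are consulted before built-in ones, so a user's "explode" registered
in 1.9 keeps winning after a built-in "explode" appears in 1.10.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of "temp functions precede built-in functions". */
final class ResolutionOrderSketch {
    // Values are plain strings standing in for function implementations.
    private final Map<String, String> tempFunctions = new HashMap<>();
    private final Map<String, String> builtInFunctions = new HashMap<>();

    void registerTemp(String name, String impl) {
        tempFunctions.put(normalize(name), impl);
    }

    void registerBuiltIn(String name, String impl) {
        builtInFunctions.put(normalize(name), impl);
    }

    /** Looks in temp functions first, then falls back to built-in functions. */
    Optional<String> resolve(String name) {
        String key = normalize(name);
        String temp = tempFunctions.get(key);
        if (temp != null) {
            return Optional.of(temp);
        }
        return Optional.ofNullable(builtInFunctions.get(key));
    }

    // Assumption: function names are matched case-insensitively.
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ResolutionOrderSketch catalog = new ResolutionOrderSketch();

        // "1.9": the user registers their own explode.
        catalog.registerTemp("explode", "user-defined explode");
        System.out.println(catalog.resolve("explode").orElse("<not found>"));

        // "1.10": a built-in with the same name is added; the user's function still wins.
        catalog.registerBuiltIn("explode", "built-in explode");
        System.out.println(catalog.resolve("explode").orElse("<not found>"));
    }
}
```

With the opposite order the same job would silently switch to the built-in
after the upgrade, and with same-name registration forbidden it would fail.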

Regarding #2, "external" built-in functions:
I think if we store external built-in functions in a catalog, then
"hive1::sqrt" is a good way to go.
However, I would prefer to support a discovery mechanism (e.g. SPI) for
built-in functions, as Timo suggested above.
This gives us the flexibility to add Hive, MySQL, Geo, or any other
function set as built-in functions in an easy way.
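
A rough sketch of what such a discovery mechanism could look like, using
Java's standard java.util.ServiceLoader. The FunctionModule interface, the
SimpleModule class, and the module names are all hypothetical and only
illustrate the idea; nothing here is an existing Flink interface. Providers
would normally be listed in a META-INF/services file on the classpath, which
this self-contained example does not ship, so the SPI lookup in main may well
find nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;

/** Hypothetical SPI: each function set (core, Hive, Geo, ...) ships one implementation. */
interface FunctionModule {
    String name();

    Set<String> functionNames();
}

/** Trivial module implementation, used here only for illustration. */
final class SimpleModule implements FunctionModule {
    private final String name;
    private final Set<String> functions;

    SimpleModule(String name, String... functions) {
        this.name = name;
        this.functions = new LinkedHashSet<>(Arrays.asList(functions));
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Set<String> functionNames() {
        return functions;
    }
}

final class ModuleDiscoverySketch {
    /** Copies discovered modules into a list, preserving discovery order. */
    static List<FunctionModule> collect(Iterable<FunctionModule> discovered) {
        List<FunctionModule> modules = new ArrayList<>();
        for (FunctionModule module : discovered) {
            modules.add(module);
        }
        return modules;
    }

    /** Returns the name of the first module, in list order, that provides the function. */
    static Optional<String> owningModule(List<FunctionModule> modules, String function) {
        for (FunctionModule module : modules) {
            if (module.functionNames().contains(function)) {
                return Optional.of(module.name());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // SPI path: finds whatever providers are registered on the classpath.
        List<FunctionModule> viaSpi = collect(ServiceLoader.load(FunctionModule.class));
        System.out.println("modules found via SPI: " + viaSpi.size());

        // Explicit path: a platform team picks the modules and their order.
        List<FunctionModule> explicit = Arrays.asList(
                new SimpleModule("core", "sqrt", "concat_ws"),
                new SimpleModule("hive", "sqrt", "explode"));
        System.out.println(owningModule(explicit, "explode").orElse("<none>"));
    }
}
```

The order of the module list decides which module wins when two modules
provide the same name, which is one way to keep resolution deterministic.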

Best,
Jark

On Wed, 4 Sep 2019 at 17:47, Xuefu Z <us...@gmail.com> wrote:

> Hi Dawid,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> it. Here are what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive (
>
> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> )
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior. (
> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because there is no standard, it's perfectly fine for Flink to define
> whatever it sees as appropriate. Thus, your proposal (no overwriting, and a
> DB as the required holder) is one option. The advantage is simplicity. The
> downside is the deviation from Hive, which is popular and the de facto
> standard in the big data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > Hi all,
> >
> > Just an opinion on the built-in <> temporary functions resolution and
> > NAMING issue. I think we should not allow overriding the built-in
> > functions, as this may pose serious issues and to be honest is rather
> > not feasible and would require major rework. What happens if a user
> > wants to override CAST? Calls to that function are generated at
> > different layers of the stack that unfortunately do not always go
> > through the Catalog API (at least yet). Moreover from what I've checked
> > no other systems allow overriding the built-in functions. All the
> > systems I've checked so far register temporary functions in a
> > database/schema (either special database for temporary functions, or
> > just current database). What I would suggest is to always register
> > temporary functions with a 3 part identifier. The same way as tables,
> > views etc. This effectively means you cannot override built-in
> > functions. With such approach it is natural that the temporary functions
> > end up a step lower in the resolution order:
> >
> > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> >
> > 2. temporary functions (always 3 part path)
> >
> > 3. catalog functions (always 3 part path)
> >
> > Let me know what you think.
> >
> > Best,
> >
> > Dawid
> >
> > On 04/09/2019 06:13, Bowen Li wrote:
> > > Hi,
> > >
> > > I agree with Xuefu that the main controversial points are mainly the
> two
> > > places. My thoughts on them:
> > >
> > > 1) Determinism of referencing Hive built-in functions. We can either
> > remove
> > > Hive built-in functions from ambiguous function resolution and require
> > > users to use special syntax for their qualified names, or add a config
> > flag
> > > to catalog constructor/yaml for turning on and off Hive built-in
> > functions
> > > with the flag set to 'false' by default and proper doc added to help
> > users
> > > make their decisions.
> > >
> > > 2) Flink temp functions v.s. Flink built-in functions in ambiguous
> > function
> > > resolution order. We believe Flink temp functions should precede Flink
> > > built-in functions, and I have presented my reasons. Just in case if we
> > > cannot reach an agreement, I propose forbid users registering temp
> > > functions in the same name as a built-in function, like MySQL's
> approach,
> > > for the moment. It won't have any performance concern, since built-in
> > > functions are all in memory and thus cost of a name check will be
> really
> > > trivial.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
> > >
> > >> From what I have seen, there are a couple of focal disagreements:
> > >>
> > >> 1. Resolution order: temp function --> flink built-in function -->
> > catalog
> > >> function vs flink built-in function --> temp function -> catalog
> > function.
> > >> 2. "External" built-in functions: how to treat built-in functions in
> > >> external system and how users reference them
> > >>
> > >> For #1, I agree with Bowen that temp function needs to be at the
> highest
> > >> priority because that's how a user might overwrite a built-in function
> > >> without referencing a persistent, overwriting catalog function with a
> > fully
> > >> qualified name. Putting built-in functions at the highest priority
> > >> eliminates that usage.
> > >>
> > >> For #2, I saw a general agreement on referencing "external" built-in
> > >> functions such as those in Hive needs to be explicit and deterministic
> > even
> > >> though different approaches are proposed. To limit the scope and
> simply
> > the
> > >> usage, it seems making sense to me to introduce special syntax for
> > user  to
> > >> explicitly reference an external built-in function such as hive1::sqrt
> > or
> > >> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API
> > call
> > >> hive1.getFunction(ObjectPath functionName) where the database name is
> > >> absent for built-in functions available in that catalog hive1. I
> > understand
> > >> that Bowen's original proposal was trying to avoid this, but this
> could
> > >> turn out to be a clean and simple solution.
> > >>
> > >> (Timo's modular approach is great way to "expand" Flink's built-in
> > function
> > >> set, which seems orthogonal and complementary to this, which could be
> > >> tackled in further future work.)
> > >>
> > >> I'd be happy to hear further thoughts on the two points.
> > >>
> > >> Thanks,
> > >> Xuefu
> > >>
> > >> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
> > >>
> > >>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is
> > the
> > >>> same
> > >>> as Bowen's. But after thinking about it, I'm currently lean to Timo's
> > >>> suggestion.
> > >>>
> > >>> The reason is backward compatibility. If we follow Bowen's approach,
> > >> let's
> > >>> say we
> > >>> first find function in Flink's built-in functions, and then hive's
> > >>> built-in. For example, `foo`
> > >>> is not supported by Flink, but hive has such built-in function. So
> user
> > >>> will have hive's
> > >>> behavior for function `foo`. And in next release, Flink realize this
> > is a
> > >>> very popular function
> > >>> and add it into Flink's built-in functions, but with different
> behavior
> > >> as
> > >>> hive's. So in next
> > >>> release, the behavior changes.
> > >>>
> > >>> With Timo's approach, IIUC user have to tell the framework explicitly
> > >> what
> > >>> kind of
> > >>> built-in functions he would like to use. He can just tell framework
> to
> > >>> abandon Flink's built-in
> > >>> functions, and use hive's instead. User can only choose between them,
> > but
> > >>> not use
> > >>> them at the same time. I think this approach is more predictable.
> > >>>
> > >>> Best,
> > >>> Kurt
> > >>>
> > >>>
> > >>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> Thanks for the feedback. Just a kindly reminder that the [Proposal]
> > >>> section
> > >>>> in the google doc was updated, please take a look first and let me
> > know
> > >>> if
> > >>>> you have more questions.
> > >>>>
> > >>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com>
> wrote:
> > >>>>
> > >>>>> Hi Timo,
> > >>>>>
> > >>>>> Re> 1) We should not have the restriction "hive built-in functions
> > >> can
> > >>>>> only
> > >>>>>> be used when current catalog is hive catalog". Switching a catalog
> > >>>>>> should only have implications on the cat.db.object resolution but
> > >> not
> > >>>>>> functions. It would be quite convenient for users to use Hive
> > >>> built-ins
> > >>>>>> even if they use a Confluent schema registry or just the in-memory
> > >>>>> catalog.
> > >>>>>
> > >>>>> There might be a misunderstanding here.
> > >>>>>
> > >>>>> First of all, Hive built-in functions are not part of Flink
> built-in
> > >>>>> functions, they are catalog functions, thus if the current catalog
> is
> > >>>> not a
> > >>>>> HiveCatalog but, say, a schema registry catalog, ambiguous
> functions
> > >>>>> reference just shouldn't be resolved to a different catalog.
> > >>>>>
> > >>>>> Second, Hive built-in functions can potentially be referenced
> across
> > >>>>> catalog, but it doesn't have db namespace and we currently just
> don't
> > >>>> have
> > >>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is
> > >>> defined,
> > >>>>> e.g. "catalog::function", but it's out of scope of this FLIP.
> > >>>>>
> > >>>>> 2) I would propose to have separate concepts for catalog and
> built-in
> > >>>>> functions. In particular it would be nice to modularize built-in
> > >>>>> functions. Some built-in functions are very crucial (like AS, CAST,
> > >>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> > >> maybe
> > >>>>> we add more experimental functions in the future or function for
> some
> > >>>>> special application area (Geo functions, ML functions). A data
> > >> platform
> > >>>>> team might not want to make every built-in function available. Or a
> > >>>>> function module like ML functions is in a different Maven module.
> > >>>>>
> > >>>>> I think this is orthogonal to this FLIP, especially we don't have
> the
> > >>>>> "external built-in functions" anymore and currently the built-in
> > >>> function
> > >>>>> category remains untouched.
> > >>>>>
> > >>>>> But just to share some thoughts on the proposal, I'm not sure about
> > >> it:
> > >>>>> - I don't know if any other databases handle built-in functions
> like
> > >>>> that.
> > >>>>> Maybe you can give some examples? IMHO, built-in functions are
> system
> > >>>> info
> > >>>>> and should be deterministic, not depending on loaded libraries. Geo
> > >>>>> functions should be either built-in already or just libraries
> > >>> functions,
> > >>>>> and library functions can be adapted to catalog APIs or of some
> other
> > >>>>> syntax to use
> > >>>>> - I don't know if all use cases stand, and many can be achieved by
> > >>> other
> > >>>>> approaches too. E.g. experimental functions can be taken good care
> of
> > >>> by
> > >>>>> documentations, annotations, etc
> > >>>>> - the proposal basically introduces some concept like a pluggable
> > >>>> built-in
> > >>>>> function catalog, despite the already existing catalog APIs
> > >>>>> - it brings in even more complicated scenarios to the design. E.g.
> > >> how
> > >>> do
> > >>>>> you handle built-in functions in different modules but different
> > >> names?
> > >>>>> In short, I'm not sure if it really stands and it looks like an
> > >>> overkill
> > >>>>> to me. I'd rather not go to that route. Related discussion can be
> on
> > >>> its
> > >>>>> own thread.
> > >>>>>
> > >>>>> 3) Following the suggestion above, we can have a separate discovery
> > >>>>> mechanism for built-in functions. Instead of just going through a
> > >>> static
> > >>>>> list like in BuiltInFunctionDefinitions, a platform team should be
> > >> able
> > >>>>> to select function modules like
> > >>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > >>>>> HiveFunctions) or via service discovery;
> > >>>>>
> > >>>>> Same as above. I'll leave it to its own thread.
> > >>>>>
> > >>>>> re > 3) Dawid and I discussed the resolution order again. I agree
> > >> with
> > >>>>> Kurt
> > >>>>>> that we should unify built-in function (external or internal)
> > >> under a
> > >>>>>> common layer. However, the resolution order should be:
> > >>>>>>   1. built-in functions
> > >>>>>>   2. temporary functions
> > >>>>>>   3. regular catalog resolution logic
> > >>>>>> Otherwise a temporary function could cause clashes with Flink's
> > >>>> built-in
> > >>>>>> functions. If you take a look at other vendors, like SQL Server
> > >> they
> > >>>>>> also do not allow to overwrite built-in functions.
> > >>>>> ”I agree with Kurt that we should unify built-in function (external
> > >> or
> > >>>>> internal) under a common layer.“ <- I don't think this is what Kurt
> > >>>> means.
> > >>>>> Kurt and I are in favor of unifying built-in functions of external
> > >>>> systems
> > >>>>> and catalog functions. Did you type a mistake?
> > >>>>>
> > >>>>> Besides, I'm not sure about the resolution order you proposed.
> > >>> Temporary
> > >>>>> functions have a lifespan over a session and are only visible to
> the
> > >>>>> session owner, they are unique to each user, and users create them
> on
> > >>>>> purpose to be the highest priority in order to overwrite system
> info
> > >>>>> (built-in functions in this case).
> > >>>>>
> > >>>>> In your case, why would users name a temporary function the same
> as a
> > >>>>> built-in function then? Since using that name in ambiguous function
> > >>>>> reference will always be resolved to built-in functions, creating a
> > >>>>> same-named temp function would be meaningless in the end.
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> Hi Jingsong,
> > >>>>>>
> > >>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we
> > >>> should
> > >>>>>>> not introduce interfaces to influence the framework. To make
> > >>>>>>> Flink itself more powerful, we should implement the functions
> > >>>>>>> we need to add.
> > >>>>>> Yes, please see the doc.
> > >>>>>>
> > >>>>>> Re> 2.Non-flink built-in functions are easy for users to change
> > >> their
> > >>>>>>> behavior. If we support some flink built-in functions in the
> > >>>>>>> future but act differently from non-flink built-in, this will
> lead
> > >>> to
> > >>>>>>> changes in user behavior.
> > >>>>>> There's no such concept as "external built-in functions" any more.
> > >>>>>> Built-in functions of external systems will be treated as special
> > >>>> catalog
> > >>>>>> functions.
> > >>>>>>
> > >>>>>> Re> Another question is, does this fallback include all
> > >>>>>>> hive built-in functions? As far as I know, some hive functions
> > >>>>>>> have some hacky. If possible, can we start with a white list?
> > >>>>>>> Once we implement some functions to flink built-in, we can
> > >>>>>>> also update the whitelist.
> > >>>>>> Yes, that's something we thought of too. I don't think it's super
> > >>>>>> critical to the scope of this FLIP, thus I'd like to leave it to
> > >>> future
> > >>>>>> efforts as a nice-to-have feature.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> > >> wrote:
> > >>>>>>> Hi Kurt,
> > >>>>>>>
> > >>>>>>> Re: > What I want to propose is we can merge #3 and #4, make them
> > >>> both
> > >>>>>>> under
> > >>>>>>>> "catalog" concept, by extending catalog function to make it have
> > >>>>>>> ability to
> > >>>>>>>> have built-in catalog functions. Some benefits I can see from
> this
> > >>>>>>> approach:
> > >>>>>>>> 1. We don't have to introduce new concept like external built-in
> > >>>>>>> functions.
> > >>>>>>>> Actually I don't see a full story about how to treat a built-in
> > >>>>>>> functions, and it
> > >>>>>>>> seems a little bit disrupt with catalog. As a result, you have
> to
> > >>> make
> > >>>>>>> some restriction
> > >>>>>>>> like "hive built-in functions can only be used when current
> > >> catalog
> > >>> is
> > >>>>>>> hive catalog".
> > >>>>>>>
> > >>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some
> part
> > >> of
> > >>>>>>> the doc. I've modified those sections, and they are up to date
> now.
> > >>>>>>>
> > >>>>>>> In short, now built-in function of external systems are defined
> as
> > >> a
> > >>>>>>> special kind of catalog function in Flink, and handled by Flink
> as
> > >>>>>>> following:
> > >>>>>>> - An external built-in function must be associated with a catalog
> > >> for
> > >>>>>>> the purpose of decoupling flink-table and external systems.
> > >>>>>>> - It always resides in front of catalog functions in ambiguous
> > >>> function
> > >>>>>>> reference order, just like in its own external system
> > >>>>>>> - It is a special catalog function that doesn’t have a
> > >>> schema/database
> > >>>>>>> namespace
> > >>>>>>> - It goes thru the same instantiation logic as other user defined
> > >>>>>>> catalog functions in the external system
> > >>>>>>>
> > >>>>>>> Please take another look at the doc, and let me know if you have
> > >> more
> > >>>>>>> questions.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > >>>> wrote:
> > >>>>>>>> Hi Kurt,
> > >>>>>>>>
> > >>>>>>>> it should not affect the functions and operations we currently
> > >> have
> > >>> in
> > >>>>>>>> SQL. It just categorizes the available built-in functions. It is
> > >>> kind
> > >>>>>>>> of
> > >>>>>>>> an orthogonal concept to the catalog API but built-in functions
> > >>>> deserve
> > >>>>>>>> this special kind of treatment. CatalogFunction still fits
> > >> perfectly
> > >>>> in
> > >>>>>>>> there because the regular catalog object resolution logic is not
> > >>>>>>>> affected. So tables and functions are resolved in the same way
> but
> > >>>> with
> > >>>>>>>> built-in functions that have priority as in the original design.
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Timo
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 03.09.19 15:26, Kurt Young wrote:
> > >>>>>>>>> Does this only affect the functions and operations we currently
> > >>> have
> > >>>>>>>> in SQL
> > >>>>>>>>> and
> > >>>>>>>>> have no effect on tables, right? Looks like this is an
> > >> orthogonal
> > >>>>>>>> concept
> > >>>>>>>>> with Catalog?
> > >>>>>>>>> If the answer are both yes, then the catalog function will be a
> > >>>> weird
> > >>>>>>>>> concept?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Kurt
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <
> yuzhao.cyz@gmail.com
> > >>>>>>>> wrote:
> > >>>>>>>>>> The way you proposed are basically the same as what Calcite
> > >>> does, I
> > >>>>>>>> think
> > >>>>>>>>>> we are in the same line.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Danny Chan
> > >>>>>>>>>> On Sep 3, 2019, 7:57 PM +0800, Timo Walther <twalthr@apache.org> wrote:
> > >>>>>>>>>>> This sounds exactly as the module approach I mentioned, no?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Timo
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote:
> > >>>>>>>>>>>> Thanks Bowen for bring up this topic, I think it’s a useful
> > >>>>>>>>>> refactoring to make our function usage more user friendly.
> > >>>>>>>>>>>> For the topic of how to organize the builtin operators and
> > >>>>>>>> operators
> > >>>>>>>>>> of Hive, here is a solution from Apache Calcite, the Calcite
> > >> way
> > >>> is
> > >>>>>>>> to make
> > >>>>>>>>>> every dialect operators a “Library”, user can specify which
> > >>>>>>>> libraries they
> > >>>>>>>>>> want to use for a sql query. The builtin operators always
> comes
> > >>> as
> > >>>>>>>> the
> > >>>>>>>>>> first class objects and the others are used from the order
> they
> > >>>>>>>> appears.
> > >>>>>>>>>> Maybe you can take a reference.
> > >>>>>>>>>>>> [1]
> > >>
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Danny Chan
> > >>>>>>>>>>>> On Aug 28, 2019, 2:50 AM +0800, Bowen Li <bowenli86@gmail.com> wrote:
> > >>>>>>>>>>>>> Hi folks,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's
> > >>>>>>>>>> FunctionCatalog.
> > >>>>>>>>>>>>> It's critically helpful to improve function usability in
> > >> SQL.
> > >>>>>>>>>>>>>
> > >>
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > >>>>>>>>>>>>> In short, it:
> > >>>>>>>>>>>>> - adds support for precise function reference with
> > >>>> fully/partially
> > >>>>>>>>>>>>> qualified name
> > >>>>>>>>>>>>> - redefines function resolution order for ambiguous
> function
> > >>>>>>>>>> reference
> > >>>>>>>>>>>>> - adds support for Hive's rich built-in functions (support
> > >> for
> > >>>>>>>> Hive
> > >>>>>>>>>> user
> > >>>>>>>>>>>>> defined functions was already added in 1.9.0)
> > >>>>>>>>>>>>> - clarifies the concept of temporary functions
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Would love to hear your thoughts.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Bowen
> > >>>>>>>>
> > >>
> > >> --
> > >> Xuefu Zhang
> > >>
> > >> "In Honey We Trust!"
> > >>
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
Hi Dawid,

Thank you for sharing your findings. It seems to me that there is no SQL
standard regarding temporary functions. A few systems support them.
Here is what I have found:

1. Hive: no DB qualifier allowed. Can overwrite built-in.
2. Spark: basically follows Hive (
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
)
3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
behavior. (
http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)

Because there is no standard, it's perfectly fine for Flink to define
whatever it sees as appropriate. Thus, your proposal (no overwriting, and a
DB as the required holder) is one option. The advantage is simplicity. The
downside is the deviation from Hive, which is popular and the de facto
standard in the big data world.
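
For reference, the proposal's resolution order (built-in functions first, then
temporary functions, then catalog functions, with the last two always
identified by a 3-part path) can be sketched as below. This is only an
illustration under assumptions: the class ThreePartResolutionSketch and its
methods are hypothetical, and expanding a 1- or 2-part reference with the
current catalog and database, as is done for tables, is my reading of the
proposal rather than something it spells out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model: built-ins use 1-part names; temp and catalog functions use 3-part paths. */
final class ThreePartResolutionSketch {
    private final Map<String, String> builtIns = new HashMap<>();
    private final Map<String, String> tempFunctions = new HashMap<>();
    private final Map<String, String> catalogFunctions = new HashMap<>();
    private final String currentCatalog;
    private final String currentDatabase;

    ThreePartResolutionSketch(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    void addBuiltIn(String name, String impl) {
        builtIns.put(name, impl);
    }

    void addTemp(String catalog, String database, String name, String impl) {
        tempFunctions.put(key(catalog, database, name), impl);
    }

    void addCatalogFunction(String catalog, String database, String name, String impl) {
        catalogFunctions.put(key(catalog, database, name), impl);
    }

    /** Resolution order: 1. built-in, 2. temporary, 3. catalog function. */
    Optional<String> resolve(String reference) {
        String[] parts = reference.split("\\.");
        // Only a bare, 1-part name can refer to a built-in function.
        if (parts.length == 1 && builtIns.containsKey(parts[0])) {
            return Optional.of(builtIns.get(parts[0]));
        }
        String fullPath = qualify(parts);
        String temp = tempFunctions.get(fullPath);
        if (temp != null) {
            return Optional.of(temp);
        }
        return Optional.ofNullable(catalogFunctions.get(fullPath));
    }

    // Assumption: short references are completed with the current catalog/database.
    private String qualify(String[] parts) {
        switch (parts.length) {
            case 1:
                return key(currentCatalog, currentDatabase, parts[0]);
            case 2:
                return key(currentCatalog, parts[0], parts[1]);
            case 3:
                return key(parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException("Expected 1 to 3 name parts");
        }
    }

    private static String key(String catalog, String database, String name) {
        return catalog + "." + database + "." + name;
    }

    public static void main(String[] args) {
        ThreePartResolutionSketch resolver = new ThreePartResolutionSketch("cat", "db");
        resolver.addBuiltIn("cast", "built-in cast");
        resolver.addTemp("cat", "db", "cast", "temporary cast");

        // A bare name resolves to the built-in; the temp function needs its full path.
        System.out.println(resolver.resolve("cast").orElse("<not found>"));
        System.out.println(resolver.resolve("cat.db.cast").orElse("<not found>"));
    }
}
```

In this model a temporary function can share a name with a built-in, but it is
reachable only through its qualified path, so it never shadows the built-in.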

However, I don't think we have to follow Hive. More importantly, we need a
consensus. I have no objection if your proposal is generally agreed upon.

Thanks,
Xuefu

On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and, to be honest, is rather
> infeasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately do not always go
> through the Catalog API (at least not yet). Moreover, from what I've
> checked, no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either a special database for temporary functions, or
> just the current database). What I would suggest is to always register
> temporary functions with a 3-part identifier, the same way as tables,
> views, etc. This effectively means you cannot override built-in
> functions. With such an approach, it is natural that the temporary
> functions end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
> > Hi,
> >
> > I agree with Xuefu that those two are the main points of controversy.
> > My thoughts on them:
> >
> > 1) Determinism of referencing Hive built-in functions. We can either
> > remove Hive built-in functions from ambiguous function resolution and
> > require users to use special syntax for their qualified names, or add a
> > config flag to the catalog constructor/yaml for turning Hive built-in
> > functions on and off, with the flag set to 'false' by default and proper
> > docs added to help users make their decisions.
> >
> > 2) Flink temp functions vs. Flink built-in functions in ambiguous function
> > resolution order. We believe Flink temp functions should precede Flink
> > built-in functions, and I have presented my reasons. In case we cannot
> > reach an agreement, I propose forbidding users from registering temp
> > functions with the same name as a built-in function, like MySQL's approach,
> > for the moment. It won't raise any performance concern, since built-in
> > functions are all in memory and thus the cost of a name check will be
> > trivial.
> >
> >
> > On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
> >
> >> From what I have seen, there are a couple of focal disagreements:
> >>
> >> 1. Resolution order: temp function --> Flink built-in function -->
> >> catalog function vs. Flink built-in function --> temp function -->
> >> catalog function.
> >> 2. "External" built-in functions: how to treat built-in functions in
> >> external systems and how users reference them
> >>
> >> For #1, I agree with Bowen that temp functions need to have the highest
> >> priority because that's how a user can overwrite a built-in function
> >> without referencing a persistent, overwriting catalog function by its
> >> fully qualified name. Putting built-in functions at the highest priority
> >> eliminates that usage.
> >>
> >> For #2, I saw a general agreement that referencing "external" built-in
> >> functions, such as those in Hive, needs to be explicit and deterministic,
> >> even though different approaches are proposed. To limit the scope and
> >> simplify the usage, it seems to make sense to me to introduce special
> >> syntax for users to explicitly reference an external built-in function,
> >> such as hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that
> >> nicely matches the Catalog API call
> >> hive1.getFunction(ObjectPath functionName), where the database name is
> >> absent for built-in functions available in that catalog hive1. I
> >> understand that Bowen's original proposal was trying to avoid this, but
> >> this could turn out to be a clean and simple solution.
> >>
> >> (Timo's modular approach is a great way to "expand" Flink's built-in
> >> function set. It seems orthogonal and complementary to this, and could be
> >> tackled in future work.)
> >>
> >> I'd be happy to hear further thoughts on the two points.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
> >>
> >>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> >>> same as Bowen's. But after thinking about it, I'm currently leaning
> >>> toward Timo's suggestion.
> >>>
> >>> The reason is backward compatibility. If we follow Bowen's approach,
> >>> let's say we first look for a function in Flink's built-in functions, and
> >>> then in Hive's built-ins. For example, `foo` is not supported by Flink,
> >>> but Hive has such a built-in function, so users will get Hive's behavior
> >>> for function `foo`. In the next release, Flink realizes this is a very
> >>> popular function and adds it to Flink's built-in functions, but with
> >>> different behavior from Hive's. So in the next release, the behavior
> >>> changes.
> >>>
> >>> With Timo's approach, IIUC, users have to tell the framework explicitly
> >>> what kind of built-in functions they would like to use. They can tell the
> >>> framework to abandon Flink's built-in functions and use Hive's instead.
> >>> Users can only choose between them, but not use them at the same time. I
> >>> think this approach is more predictable.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Thanks for the feedback. Just a kindly reminder that the [Proposal]
> >>> section
> >>>> in the google doc was updated, please take a look first and let me
> know
> >>> if
> >>>> you have more questions.
> >>>>
> >>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> >>>>
> >>>>> Hi Timo,
> >>>>>
> >>>>> Re> 1) We should not have the restriction "hive built-in functions
> >> can
> >>>>> only
> >>>>>> be used when current catalog is hive catalog". Switching a catalog
> >>>>>> should only have implications on the cat.db.object resolution but
> >> not
> >>>>>> functions. It would be quite convinient for users to use Hive
> >>> built-ins
> >>>>>> even if they use a Confluent schema registry or just the in-memory
> >>>>> catalog.
> >>>>>
> >>>>> There might be a misunderstanding here.
> >>>>>
> >>>>> First of all, Hive built-in functions are not part of Flink built-in
> >>>>> functions, they are catalog functions, thus if the current catalog is
> >>>> not a
> >>>>> HiveCatalog but, say, a schema registry catalog, ambiguous functions
> >>>>> reference just shouldn't be resolved to a different catalog.
> >>>>>
> >>>>> Second, Hive built-in functions can potentially be referenced across
> >>>>> catalog, but it doesn't have db namespace and we currently just don't
> >>>> have
> >>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is
> >>> defined,
> >>>>> e.g. "catalog::function", but it's out of scope of this FLIP.
> >>>>>
> >>>>> 2) I would propose to have separate concepts for catalog and built-in
> >>>>> functions. In particular it would be nice to modularize built-in
> >>>>> functions. Some built-in functions are very crucial (like AS, CAST,
> >>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> >> maybe
> >>>>> we add more experimental functions in the future or function for some
> >>>>> special application area (Geo functions, ML functions). A data
> >> platform
> >>>>> team might not want to make every built-in function available. Or a
> >>>>> function module like ML functions is in a different Maven module.
> >>>>>
> >>>>> I think this is orthogonal to this FLIP, especially we don't have the
> >>>>> "external built-in functions" anymore and currently the built-in
> >>> function
> >>>>> category remains untouched.
> >>>>>
> >>>>> But just to share some thoughts on the proposal, I'm not sure about
> >> it:
> >>>>> - I don't know if any other databases handle built-in functions like
> >>>> that.
> >>>>> Maybe you can give some examples? IMHO, built-in functions are system
> >>>> info
> >>>>> and should be deterministic, not depending on loaded libraries. Geo
> >>>>> functions should be either built-in already or just libraries
> >>> functions,
> >>>>> and library functions can be adapted to catalog APIs or of some other
> >>>>> syntax to use
> >>>>> - I don't know if all use cases stand, and many can be achieved by
> >>> other
> >>>>> approaches too. E.g. experimental functions can be taken good care of
> >>> by
> >>>>> documentations, annotations, etc
> >>>>> - the proposal basically introduces some concept like a pluggable
> >>>> built-in
> >>>>> function catalog, despite the already existing catalog APIs
> >>>>> - it brings in even more complicated scenarios to the design. E.g.
> >> how
> >>> do
> >>>>> you handle built-in functions in different modules but different
> >> names?
> >>>>> In short, I'm not sure if it really stands and it looks like an
> >>> overkill
> >>>>> to me. I'd rather not go to that route. Related discussion can be on
> >>> its
> >>>>> own thread.
> >>>>>
> >>>>> 3) Following the suggestion above, we can have a separate discovery
> >>>>> mechanism for built-in functions. Instead of just going through a
> >>> static
> >>>>> list like in BuiltInFunctionDefinitions, a platform team should be
> >> able
> >>>>> to select function modules like
> >>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> >>>>> HiveFunctions) or via service discovery;
> >>>>>
> >>>>> Same as above. I'll leave it to its own thread.
> >>>>>
> >>>>> re > 3) Dawid and I discussed the resulution order again. I agree
> >> with
> >>>>> Kurt
> >>>>>> that we should unify built-in function (external or internal)
> >> under a
> >>>>>> common layer. However, the resolution order should be:
> >>>>>>   1. built-in functions
> >>>>>>   2. temporary functions
> >>>>>>   3. regular catalog resolution logic
> >>>>>> Otherwise a temporary function could cause clashes with Flink's
> >>>> built-in
> >>>>>> functions. If you take a look at other vendors, like SQL Server
> >> they
> >>>>>> also do not allow to overwrite built-in functions.
> >>>>> ”I agree with Kurt that we should unify built-in function (external
> >> or
> >>>>> internal) under a common layer.“ <- I don't think this is what Kurt
> >>>> means.
> >>>>> Kurt and I are in favor of unifying built-in functions of external
> >>>> systems
> >>>>> and catalog functions. Did you type a mistake?
> >>>>>
> >>>>> Besides, I'm not sure about the resolution order you proposed.
> >>> Temporary
> >>>>> functions have a lifespan over a session and are only visible to the
> >>>>> session owner, they are unique to each user, and users create them on
> >>>>> purpose to be the highest priority in order to overwrite system info
> >>>>> (built-in functions in this case).
> >>>>>
> >>>>> In your case, why would users name a temporary function the same as a
> >>>>> built-in function then? Since using that name in ambiguous function
> >>>>> reference will always be resolved to built-in functions, creating a
> >>>>> same-named temp function would be meaningless in the end.
> >>>>>
> >>>>>
> >>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Jingsong,
> >>>>>>
> >>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we
> >>> should
> >>>>>>> not introduce interfaces to influence the framework. To make
> >>>>>>> Flink itself more powerful, we should implement the functions
> >>>>>>> we need to add.
> >>>>>> Yes, please see the doc.
> >>>>>>
> >>>>>> Re> 2.Non-flink built-in functions are easy for users to change
> >> their
> >>>>>>> behavior. If we support some flink built-in functions in the
> >>>>>>> future but act differently from non-flink built-in, this will lead
> >>> to
> >>>>>>> changes in user behavior.
> >>>>>> There's no such concept as "external built-in functions" any more.
> >>>>>> Built-in functions of external systems will be treated as special
> >>>> catalog
> >>>>>> functions.
> >>>>>>
> >>>>>> Re> Another question is, does this fallback include all
> >>>>>>> hive built-in functions? As far as I know, some hive functions
> >>>>>>> have some hacky. If possible, can we start with a white list?
> >>>>>>> Once we implement some functions to flink built-in, we can
> >>>>>>> also update the whitelist.
> >>>>>> Yes, that's something we thought of too. I don't think it's super
> >>>>>> critical to the scope of this FLIP, thus I'd like to leave it to
> >>> future
> >>>>>> efforts as a nice-to-have feature.
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> >> wrote:
> >>>>>>> Hi Kurt,
> >>>>>>>
> >>>>>>> Re: > What I want to propose is we can merge #3 and #4, make them
> >>> both
> >>>>>>> under
> >>>>>>>> "catalog" concept, by extending catalog function to make it have
> >>>>>>> ability to
> >>>>>>>> have built-in catalog functions. Some benefits I can see from this
> >>>>>>> approach:
> >>>>>>>> 1. We don't have to introduce new concept like external built-in
> >>>>>>> functions.
> >>>>>>>> Actually I don't see a full story about how to treat a built-in
> >>>>>>> functions, and it
> >>>>>>>> seems a little bit disrupt with catalog. As a result, you have to
> >>> make
> >>>>>>> some restriction
> >>>>>>>> like "hive built-in functions can only be used when current
> >> catalog
> >>> is
> >>>>>>> hive catalog".
> >>>>>>>
> >>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some part
> >> of
> >>>>>>> the doc. I've modified those sections, and they are up to date now.
> >>>>>>>
> >>>>>>> In short, now built-in function of external systems are defined as
> >> a
> >>>>>>> special kind of catalog function in Flink, and handled by Flink as
> >>>>>>> following:
> >>>>>>> - An external built-in function must be associated with a catalog
> >> for
> >>>>>>> the purpose of decoupling flink-table and external systems.
> >>>>>>> - It always resides in front of catalog functions in ambiguous
> >>> function
> >>>>>>> reference order, just like in its own external system
> >>>>>>> - It is a special catalog function that doesn’t have a
> >>> schema/database
> >>>>>>> namespace
> >>>>>>> - It goes thru the same instantiation logic as other user defined
> >>>>>>> catalog functions in the external system
> >>>>>>>
> >>>>>>> Please take another look at the doc, and let me know if you have
> >> more
> >>>>>>> questions.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> >>>> wrote:
> >>>>>>>> Hi Kurt,
> >>>>>>>>
> >>>>>>>> it should not affect the functions and operations we currently
> >> have
> >>> in
> >>>>>>>> SQL. It just categorizes the available built-in functions. It is
> >>> kind
> >>>>>>>> of
> >>>>>>>> an orthogonal concept to the catalog API but built-in functions
> >>>> deserve
> >>>>>>>> this special kind of treatment. CatalogFunction still fits
> >> perfectly
> >>>> in
> >>>>>>>> there because the regular catalog object resolution logic is not
> >>>>>>>> affected. So tables and functions are resolved in the same way but
> >>>> with
> >>>>>>>> built-in functions that have priority as in the original design.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Timo
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 03.09.19 15:26, Kurt Young wrote:
> >>>>>>>>> Does this only affect the functions and operations we currently
> >>> have
> >>>>>>>> in SQL
> >>>>>>>>> and
> >>>>>>>>> have no effect on tables, right? Looks like this is an
> >> orthogonal
> >>>>>>>> concept
> >>>>>>>>> with Catalog?
> >>>>>>>>> If the answer are both yes, then the catalog function will be a
> >>>> weird
> >>>>>>>>> concept?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Kurt
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com
> >>>>>>>> wrote:
> >>>>>>>>>> The way you proposed are basically the same as what Calcite
> >>> does, I
> >>>>>>>> think
> >>>>>>>>>> we are in the same line.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Danny Chan
> >>>>>>>>>> On Sep 3, 2019, at 7:57 PM +0800, Timo Walther <tw...@apache.org> wrote:
> >>>>>>>>>>> This sounds exactly as the module approach I mentioned, no?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Timo
> >>>>>>>>>>>
> >>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote:
> >>>>>>>>>>>> Thanks Bowen for bringing up this topic. I think it's a useful
> >>>>>>>>>>>> refactoring to make our function usage more user friendly.
> >>>>>>>>>>>> For the topic of how to organize the built-in operators and the
> >>>>>>>>>>>> operators of Hive, here is a solution from Apache Calcite: the Calcite
> >>>>>>>>>>>> way is to make every dialect's operators a "Library", and users can
> >>>>>>>>>>>> specify which libraries they want to use for a SQL query. The built-in
> >>>>>>>>>>>> operators always come first as first-class objects, and the others are
> >>>>>>>>>>>> used in the order they appear. Maybe you can take it as a reference.
> >>>>>>>>>>>> [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>> On Aug 28, 2019, at 2:50 AM +0800, Bowen Li <bo...@gmail.com> wrote:
> >>>>>>>>>>>>> Hi folks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's
> >>>>>>>>>> FunctionCatalog.
> >>>>>>>>>>>>> It's critically helpful to improve function usability in
> >> SQL.
> >>>>>>>>>>>>>
> >>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>>>>>>>>>>> In short, it:
> >>>>>>>>>>>>> - adds support for precise function reference with
> >>>> fully/partially
> >>>>>>>>>>>>> qualified name
> >>>>>>>>>>>>> - redefines function resolution order for ambiguous function
> >>>>>>>>>> reference
> >>>>>>>>>>>>> - adds support for Hive's rich built-in functions (support
> >> for
> >>>>>>>> Hive
> >>>>>>>>>> user
> >>>>>>>>>>>>> defined functions was already added in 1.9.0)
> >>>>>>>>>>>>> - clarifies the concept of temporary functions
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Would love to hear your thoughts.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bowen
> >>>>>>>>
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi all,

Just an opinion on the built-in vs. temporary function resolution and
naming issue. I think we should not allow overriding built-in
functions, as this may pose serious issues and, to be honest, is rather
infeasible and would require major rework. What happens if a user
wants to override CAST? Calls to that function are generated at
different layers of the stack that unfortunately do not always go
through the Catalog API (at least not yet). Moreover, none of the
systems I've checked allows overriding built-in functions. All the
systems I've checked so far register temporary functions in a
database/schema (either a special database for temporary functions or
just the current database). What I would suggest is to always register
temporary functions with a 3-part identifier, the same way as tables,
views, etc. This effectively means you cannot override built-in
functions. With such an approach it is natural that temporary functions
end up a step lower in the resolution order:

1. built-in functions (1 part, maybe 2? - this is still under discussion)

2. temporary functions (always 3 part path)

3. catalog functions (always 3 part path)
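
The order above can be sketched roughly as follows. This is only a minimal Python illustration, not Flink code: the three registries, the `resolve` helper, and the sample function names are all hypothetical, and it assumes built-in functions are referenced by a 1-part name only.

```python
from typing import Optional

# Hypothetical registries for illustration; none of these names exist in Flink.
BUILT_IN = {"cast", "concat_ws"}                                     # 1-part names
TEMPORARY = {("cat1", "db1", "my_udf")}                              # always 3-part
CATALOG = {("cat1", "db1", "my_udf"), ("cat1", "db1", "geo_dist")}   # always 3-part


def resolve(name: str, current_catalog: str, current_db: str) -> Optional[str]:
    """Return the kind of function a reference resolves to, or None."""
    parts = tuple(p.lower() for p in name.split("."))
    if len(parts) == 1:
        # 1. Built-in functions are checked first for unqualified names.
        if parts[0] in BUILT_IN:
            return "built-in"
        # Otherwise expand against the current catalog and database.
        parts = (current_catalog, current_db, parts[0])
    elif len(parts) == 2:
        parts = (current_catalog,) + parts
    elif len(parts) != 3:
        raise ValueError(f"invalid function reference: {name!r}")
    # 2. Temporary functions, keyed by the full 3-part path.
    if parts in TEMPORARY:
        return "temporary"
    # 3. Regular catalog functions, keyed by the full 3-part path.
    if parts in CATALOG:
        return "catalog"
    return None
```

With this shape, a temporary function can shadow a catalog function that has the same 3-part path, but it can never shadow a built-in, because built-ins are matched before the name is expanded.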

Let me know what you think.

Best,

Dawid

On 04/09/2019 06:13, Bowen Li wrote:
> Hi,
>
> I agree with Xuefu that the controversy comes down mainly to those two
> points. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use a special syntax for their qualified names, or add a config flag
> to the catalog constructor/yaml for turning Hive built-in functions on and
> off, with the flag set to 'false' by default and proper docs added to help
> users make their decisions.
>
> 2) Flink temp functions vs. Flink built-in functions in the ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. In case we cannot
> reach an agreement, I propose that, for the moment, we forbid users from
> registering temp functions with the same name as a built-in function, like
> MySQL's approach. It won't raise any performance concern, since built-in
> functions are all in memory and thus the cost of a name check will be
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:
>
>> From what I have seen, there are a couple of focal disagreements:
>>
>> 1. Resolution order: temp function --> Flink built-in function --> catalog
>> function vs. Flink built-in function --> temp function --> catalog function.
>> 2. "External" built-in functions: how to treat built-in functions in
>> external system and how users reference them
>>
>> For #1, I agree with Bowen that temp function needs to be at the highest
>> priority because that's how a user might overwrite a built-in function
>> without referencing a persistent, overwriting catalog function with a fully
>> qualified name. Putting built-in functions at the highest priority
>> eliminates that usage.
>>
>> For #2, I saw general agreement that referencing "external" built-in
>> functions, such as those in Hive, needs to be explicit and deterministic, even
>> though different approaches have been proposed. To limit the scope and simplify
>> usage, it seems to make sense to me to introduce special syntax for users to
>> explicitly reference an external built-in function, such as hive1::sqrt or
>> hive1._built_in.sqrt. This is a DML syntax that nicely matches the Catalog API
>> call hive1.getFunction(ObjectPath functionName), where the database name is
>> absent for built-in functions available in that catalog hive1. I understand
>> that Bowen's original proposal was trying to avoid this, but this could
>> turn out to be a clean and simple solution.
>>
>> (Timo's modular approach is a great way to "expand" Flink's built-in function
>> set, which seems orthogonal and complementary to this and could be
>> tackled in future work.)
>>
>> I'd be happy to hear further thoughts on the two points.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>>
>>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
>>> same as Bowen's. But after thinking about it, I'm currently leaning towards
>>> Timo's suggestion.
>>>
>>> The reason is backward compatibility. If we follow Bowen's approach, let's
>>> say we first look for a function in Flink's built-in functions, and then in
>>> Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
>>> such a built-in function. So users will get Hive's behavior for function
>>> `foo`. In the next release, Flink realizes this is a very popular function
>>> and adds it to Flink's built-in functions, but with different behavior from
>>> Hive's. So in the next release, the behavior changes.
>>>
>>> With Timo's approach, IIUC users have to tell the framework explicitly what
>>> kind of built-in functions they would like to use. They can just tell the
>>> framework to abandon Flink's built-in functions and use Hive's instead. Users
>>> can only choose between them, but not use them at the same time. I think
>>> this approach is more predictable.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Thanks for the feedback. Just a kindly reminder that the [Proposal]
>>> section
>>>> in the google doc was updated, please take a look first and let me know
>>> if
>>>> you have more questions.
>>>>
>>>> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>>>>
>>>>> Hi Timo,
>>>>>
>>>>> Re:
>>>>>> 1) We should not have the restriction "hive built-in functions can only
>>>>>> be used when current catalog is hive catalog". Switching a catalog
>>>>>> should only have implications on the cat.db.object resolution but not
>>>>>> functions. It would be quite convenient for users to use Hive built-ins
>>>>>> even if they use a Confluent schema registry or just the in-memory
>>>>>> catalog.
>>>>>
>>>>> There might be a misunderstanding here.
>>>>>
>>>>> First of all, Hive built-in functions are not part of Flink built-in
>>>>> functions, they are catalog functions, thus if the current catalog is
>>>> not a
>>>>> HiveCatalog but, say, a schema registry catalog, ambiguous functions
>>>>> reference just shouldn't be resolved to a different catalog.
>>>>>
>>>>> Second, Hive built-in functions can potentially be referenced across
>>>>> catalog, but it doesn't have db namespace and we currently just don't
>>>> have
>>>>> a SQL syntax for it. It can be enabled when such a SQL syntax is
>>> defined,
>>>>> e.g. "catalog::function", but it's out of scope of this FLIP.
>>>>>
>>>>> 2) I would propose to have separate concepts for catalog and built-in
>>>>> functions. In particular it would be nice to modularize built-in
>>>>> functions. Some built-in functions are very crucial (like AS, CAST,
>>>>> MINUS), others are more optional but stable (MD5, CONCAT_WS), and
>> maybe
>>>>> we add more experimental functions in the future or function for some
>>>>> special application area (Geo functions, ML functions). A data
>> platform
>>>>> team might not want to make every built-in function available. Or a
>>>>> function module like ML functions is in a different Maven module.
>>>>>
>>>>> I think this is orthogonal to this FLIP, especially we don't have the
>>>>> "external built-in functions" anymore and currently the built-in
>>> function
>>>>> category remains untouched.
>>>>>
>>>>> But just to share some thoughts on the proposal, I'm not sure about
>> it:
>>>>> - I don't know if any other databases handle built-in functions like
>>>> that.
>>>>> Maybe you can give some examples? IMHO, built-in functions are system
>>>> info
>>>>> and should be deterministic, not depending on loaded libraries. Geo
>>>>> functions should be either built-in already or just libraries
>>> functions,
>>>>> and library functions can be adapted to catalog APIs or of some other
>>>>> syntax to use
>>>>> - I don't know if all use cases stand, and many can be achieved by
>>> other
>>>>> approaches too. E.g. experimental functions can be taken good care of
>>> by
>>>>> documentations, annotations, etc
>>>>> - the proposal basically introduces some concept like a pluggable
>>>> built-in
>>>>> function catalog, despite the already existing catalog APIs
>>>>> - it brings in even more complicated scenarios to the design. E.g.
>> how
>>> do
>>>>> you handle built-in functions in different modules but different
>> names?
>>>>> In short, I'm not sure if it really stands and it looks like an
>>> overkill
>>>>> to me. I'd rather not go to that route. Related discussion can be on
>>> its
>>>>> own thread.
>>>>>
>>>>> 3) Following the suggestion above, we can have a separate discovery
>>>>> mechanism for built-in functions. Instead of just going through a
>>> static
>>>>> list like in BuiltInFunctionDefinitions, a platform team should be
>> able
>>>>> to select function modules like
>>>>> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
>>>>> HiveFunctions) or via service discovery;
>>>>>
>>>>> Same as above. I'll leave it to its own thread.
>>>>>
>>>>> Re:
>>>>>> 3) Dawid and I discussed the resolution order again. I agree with Kurt
>>>>>> that we should unify built-in functions (external or internal) under a
>>>>>> common layer. However, the resolution order should be:
>>>>>>   1. built-in functions
>>>>>>   2. temporary functions
>>>>>>   3. regular catalog resolution logic
>>>>>> Otherwise a temporary function could cause clashes with Flink's built-in
>>>>>> functions. If you take a look at other vendors, like SQL Server, they
>>>>>> also do not allow overwriting built-in functions.
>>>>>
>>>>> "I agree with Kurt that we should unify built-in function (external or
>>>>> internal) under a common layer." <- I don't think this is what Kurt means.
>>>>> Kurt and I are in favor of unifying built-in functions of external systems
>>>>> and catalog functions. Was that a typo?
>>>>>
>>>>> Besides, I'm not sure about the resolution order you proposed.
>>> Temporary
>>>>> functions have a lifespan over a session and are only visible to the
>>>>> session owner, they are unique to each user, and users create them on
>>>>> purpose to be the highest priority in order to overwrite system info
>>>>> (built-in functions in this case).
>>>>>
>>>>> In your case, why would users name a temporary function the same as a
>>>>> built-in function then? Since using that name in ambiguous function
>>>>> reference will always be resolved to built-in functions, creating a
>>>>> same-named temp function would be meaningless in the end.
>>>>>
>>>>>
>>>>> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Jingsong,
>>>>>>
>>>>>> Re> 1.Hive built-in functions is an intermediate solution. So we
>>> should
>>>>>>> not introduce interfaces to influence the framework. To make
>>>>>>> Flink itself more powerful, we should implement the functions
>>>>>>> we need to add.
>>>>>> Yes, please see the doc.
>>>>>>
>>>>>> Re> 2.Non-flink built-in functions are easy for users to change
>> their
>>>>>>> behavior. If we support some flink built-in functions in the
>>>>>>> future but act differently from non-flink built-in, this will lead
>>> to
>>>>>>> changes in user behavior.
>>>>>> There's no such concept as "external built-in functions" any more.
>>>>>> Built-in functions of external systems will be treated as special
>>>> catalog
>>>>>> functions.
>>>>>>
>>>>>> Re> Another question is, does this fallback include all
>>>>>>> hive built-in functions? As far as I know, some hive functions
>>>>>>> have some hacky. If possible, can we start with a white list?
>>>>>>> Once we implement some functions to flink built-in, we can
>>>>>>> also update the whitelist.
>>>>>> Yes, that's something we thought of too. I don't think it's super
>>>>>> critical to the scope of this FLIP, thus I'd like to leave it to
>>> future
>>>>>> efforts as a nice-to-have feature.
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
>> wrote:
>>>>>>> Hi Kurt,
>>>>>>>
>>>>>>> Re: > What I want to propose is we can merge #3 and #4, make them
>>> both
>>>>>>> under
>>>>>>>> "catalog" concept, by extending catalog function to make it have
>>>>>>> ability to
>>>>>>>> have built-in catalog functions. Some benefits I can see from this
>>>>>>> approach:
>>>>>>>> 1. We don't have to introduce new concept like external built-in
>>>>>>> functions.
>>>>>>>> Actually I don't see a full story about how to treat a built-in
>>>>>>> functions, and it
>>>>>>>> seems a little bit disrupt with catalog. As a result, you have to
>>> make
>>>>>>> some restriction
>>>>>>>> like "hive built-in functions can only be used when current
>> catalog
>>> is
>>>>>>> hive catalog".
>>>>>>>
>>>>>>> Yes, I've unified #3 and #4 but it seems I didn't update some part
>> of
>>>>>>> the doc. I've modified those sections, and they are up to date now.
>>>>>>>
>>>>>>> In short, now built-in function of external systems are defined as
>> a
>>>>>>> special kind of catalog function in Flink, and handled by Flink as
>>>>>>> following:
>>>>>>> - An external built-in function must be associated with a catalog
>> for
>>>>>>> the purpose of decoupling flink-table and external systems.
>>>>>>> - It always resides in front of catalog functions in ambiguous
>>> function
>>>>>>> reference order, just like in its own external system
>>>>>>> - It is a special catalog function that doesn’t have a
>>> schema/database
>>>>>>> namespace
>>>>>>> - It goes thru the same instantiation logic as other user defined
>>>>>>> catalog functions in the external system
>>>>>>>
>>>>>>> Please take another look at the doc, and let me know if you have
>> more
>>>>>>> questions.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
>>>> wrote:
>>>>>>>> Hi Kurt,
>>>>>>>>
>>>>>>>> it should not affect the functions and operations we currently
>> have
>>> in
>>>>>>>> SQL. It just categorizes the available built-in functions. It is
>>> kind
>>>>>>>> of
>>>>>>>> an orthogonal concept to the catalog API but built-in functions
>>>> deserve
>>>>>>>> this special kind of treatment. CatalogFunction still fits
>> perfectly
>>>> in
>>>>>>>> there because the regular catalog object resolution logic is not
>>>>>>>> affected. So tables and functions are resolved in the same way but
>>>> with
>>>>>>>> built-in functions that have priority as in the original design.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03.09.19 15:26, Kurt Young wrote:
>>>>>>>>> Does this only affect the functions and operations we currently
>>> have
>>>>>>>> in SQL
>>>>>>>>> and
>>>>>>>>> have no effect on tables, right? Looks like this is an
>> orthogonal
>>>>>>>> concept
>>>>>>>>> with Catalog?
>>>>>>>>> If the answer are both yes, then the catalog function will be a
>>>> weird
>>>>>>>>> concept?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kurt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com
>>>>>>>> wrote:
>>>>>>>>>> The way you proposed are basically the same as what Calcite
>>> does, I
>>>>>>>> think
>>>>>>>>>> we are in the same line.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> On Sep 3, 2019, at 7:57 PM +0800, Timo Walther <tw...@apache.org> wrote:
>>>>>>>>>>> This sounds exactly as the module approach I mentioned, no?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>> On 03.09.19 13:42, Danny Chan wrote:
>>>>>>>>>>>> Thanks Bowen for bringing up this topic. I think it's a useful
>>>>>>>>>>>> refactoring to make our function usage more user friendly.
>>>>>>>>>>>> For the topic of how to organize the built-in operators and the
>>>>>>>>>>>> operators of Hive, here is a solution from Apache Calcite: the Calcite
>>>>>>>>>>>> way is to make every dialect's operators a "Library", and users can
>>>>>>>>>>>> specify which libraries they want to use for a SQL query. The built-in
>>>>>>>>>>>> operators always come first as first-class objects, and the others are
>>>>>>>>>>>> used in the order they appear. Maybe you can take it as a reference.
>>>>>>>>>>>> [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>> On Aug 28, 2019, at 2:50 AM +0800, Bowen Li <bo...@gmail.com> wrote:
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to kick off a discussion on reworking Flink's
>>>>>>>>>> FunctionCatalog.
>>>>>>>>>>>>> It's critically helpful to improve function usability in
>> SQL.
>>>>>>>>>>>>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>>>>>>>>>>>> In short, it:
>>>>>>>>>>>>> - adds support for precise function reference with
>>>> fully/partially
>>>>>>>>>>>>> qualified name
>>>>>>>>>>>>> - redefines function resolution order for ambiguous function
>>>>>>>>>> reference
>>>>>>>>>>>>> - adds support for Hive's rich built-in functions (support
>> for
>>>>>>>> Hive
>>>>>>>>>> user
>>>>>>>>>>>>> defined functions was already added in 1.9.0)
>>>>>>>>>>>>> - clarifies the concept of temporary functions
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would love to hear your thoughts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bowen
>>>>>>>>
>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi,

I agree with Xuefu that the controversy comes down mainly to those two
points. My thoughts on them:

1) Determinism of referencing Hive built-in functions. We can either remove
Hive built-in functions from ambiguous function resolution and require
users to use a special syntax for their qualified names, or add a config flag
to the catalog constructor/yaml for turning Hive built-in functions on and
off, with the flag set to 'false' by default and proper docs added to help
users make their decisions.

2) Flink temp functions vs. Flink built-in functions in the ambiguous function
resolution order. We believe Flink temp functions should precede Flink
built-in functions, and I have presented my reasons. In case we cannot
reach an agreement, I propose that, for the moment, we forbid users from
registering temp functions with the same name as a built-in function, like
MySQL's approach. It won't raise any performance concern, since built-in
functions are all in memory and thus the cost of a name check will be
trivial.
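
To make the fallback in 2) concrete, here is a minimal Python sketch of the name check at registration time. It is an illustration only, not Flink's API: `TempFunctionRegistry`, `BUILT_IN_NAMES`, and the sample function names are hypothetical, and case-insensitive matching is an assumption.

```python
# Hypothetical set of built-in function names, held in memory.
BUILT_IN_NAMES = frozenset({"cast", "concat_ws", "md5"})


class TempFunctionRegistry:
    """Session-scoped temp functions that may not reuse a built-in name."""

    def __init__(self, built_in_names=BUILT_IN_NAMES):
        self._built_in = {n.lower() for n in built_in_names}
        self._temp = {}

    def register(self, name, fn):
        key = name.lower()
        # The check is a single in-memory set lookup, so its cost is negligible.
        if key in self._built_in:
            raise ValueError(f"'{name}' clashes with a built-in function")
        self._temp[key] = fn

    def lookup(self, name):
        # Returns None when no temp function with that name is registered.
        return self._temp.get(name.lower())
```

Because the clash is rejected up front, the relative order of temp functions and built-in functions in ambiguous resolution no longer matters for same-named functions.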


On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z <us...@gmail.com> wrote:

> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> Flink built-in function --> catalog
> function vs. Flink built-in function --> temp function --> catalog function.
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement on referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic even
> though different approaches are proposed. To limit the scope and simply the
> usage, it seems making sense to me to introduce special syntax for user  to
> explicitly reference an external built-in function such as hive1::sqrt or
> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API call
> hive1.getFunction(ObjectPath functionName) where the database name is
> absent for bulit-in functions available in that catalog hive1. I understand
> that Bowen's original proposal was trying to avoid this, but this could
> turn out to be a clean and simple solution.
>
> (Timo's modular approach is great way to "expand" Flink's built-in function
> set, which seems orthogonal and complementary to this, which could be
> tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:
>
> > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > same
> > as Bowen's. But after thinking about it, I'm currently lean to Timo's
> > suggestion.
> >
> > The reason is backward compatibility. If we follow Bowen's approach,
> let's
> > say we
> > first find function in Flink's built-in functions, and then hive's
> > built-in. For example, `foo`
> > is not supported by Flink, but hive has such built-in function. So user
> > will have hive's
> > behavior for function `foo`. And in next release, Flink realize this is a
> > very popular function
> > and add it into Flink's built-in functions, but with different behavior
> as
> > hive's. So in next
> > release, the behavior changes.
> >
> > With Timo's approach, IIUC user have to tell the framework explicitly
> what
> > kind of
> > built-in functions he would like to use. He can just tell framework to
> > abandon Flink's built-in
> > functions, and use hive's instead. User can only choose between them, but
> > not use
> > them at the same time. I think this approach is more predictable.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the feedback. Just a kindly reminder that the [Proposal]
> > section
> > > in the google doc was updated, please take a look first and let me know
> > if
> > > you have more questions.
> > >
> > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> > >
> > > > Hi Timo,
> > > >
> > > > Re> 1) We should not have the restriction "hive built-in functions
> can
> > > > only
> > > > > be used when current catalog is hive catalog". Switching a catalog
> > > > > should only have implications on the cat.db.object resolution but
> not
> > > > > functions. It would be quite convinient for users to use Hive
> > built-ins
> > > > > even if they use a Confluent schema registry or just the in-memory
> > > > catalog.
> > > >
> > > > There might be a misunderstanding here.
> > > >
> > > > First of all, Hive built-in functions are not part of Flink built-in
> > > > functions, they are catalog functions, thus if the current catalog is
> > > not a
> > > > HiveCatalog but, say, a schema registry catalog, ambiguous functions
> > > > reference just shouldn't be resolved to a different catalog.
> > > >
> > > > Second, Hive built-in functions can potentially be referenced across
> > > > catalog, but it doesn't have db namespace and we currently just don't
> > > have
> > > > a SQL syntax for it. It can be enabled when such a SQL syntax is
> > defined,
> > > > e.g. "catalog::function", but it's out of scope of this FLIP.
> > > >
> > > > 2) I would propose to have separate concepts for catalog and built-in
> > > > functions. In particular it would be nice to modularize built-in
> > > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and
> maybe
> > > > we add more experimental functions in the future or function for some
> > > > special application area (Geo functions, ML functions). A data
> platform
> > > > team might not want to make every built-in function available. Or a
> > > > function module like ML functions is in a different Maven module.
> > > >
> > > > I think this is orthogonal to this FLIP, especially we don't have the
> > > > "external built-in functions" anymore and currently the built-in
> > function
> > > > category remains untouched.
> > > >
> > > > But just to share some thoughts on the proposal, I'm not sure about
> it:
> > > > - I don't know if any other databases handle built-in functions like
> > > that.
> > > > Maybe you can give some examples? IMHO, built-in functions are system
> > > info
> > > > and should be deterministic, not depending on loaded libraries. Geo
> > > > functions should be either built-in already or just libraries
> > functions,
> > > > and library functions can be adapted to catalog APIs or of some other
> > > > syntax to use
> > > > - I don't know if all use cases stand, and many can be achieved by
> > other
> > > > approaches too. E.g. experimental functions can be taken good care of
> > by
> > > > documentations, annotations, etc
> > > > - the proposal basically introduces some concept like a pluggable
> > > built-in
> > > > function catalog, despite the already existing catalog APIs
> > > > - it brings in even more complicated scenarios to the design. E.g.
> how
> > do
> > > > you handle built-in functions in different modules but different
> names?
> > > >
> > > > In short, I'm not sure if it really stands and it looks like an
> > overkill
> > > > to me. I'd rather not go to that route. Related discussion can be on
> > its
> > > > own thread.
> > > >
> > > > 3) Following the suggestion above, we can have a separate discovery
> > > > mechanism for built-in functions. Instead of just going through a
> > static
> > > > list like in BuiltInFunctionDefinitions, a platform team should be
> able
> > > > to select function modules like
> > > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > > HiveFunctions) or via service discovery;
> > > >
> > > > Same as above. I'll leave it to its own thread.
> > > >
> > > > re > 3) Dawid and I discussed the resulution order again. I agree
> with
> > > > Kurt
> > > > > that we should unify built-in function (external or internal)
> under a
> > > > > common layer. However, the resolution order should be:
> > > > >   1. built-in functions
> > > > >   2. temporary functions
> > > > >   3. regular catalog resolution logic
> > > > > Otherwise a temporary function could cause clashes with Flink's
> > > built-in
> > > > > functions. If you take a look at other vendors, like SQL Server
> they
> > > > > also do not allow to overwrite built-in functions.
> > > >
> > > > ”I agree with Kurt that we should unify built-in function (external
> or
> > > > internal) under a common layer.“ <- I don't think this is what Kurt
> > > means.
> > > > Kurt and I are in favor of unifying built-in functions of external
> > > systems
> > > > and catalog functions. Did you type a mistake?
> > > >
> > > > Besides, I'm not sure about the resolution order you proposed.
> > Temporary
> > > > functions have a lifespan over a session and are only visible to the
> > > > session owner, they are unique to each user, and users create them on
> > > > purpose to be the highest priority in order to overwrite system info
> > > > (built-in functions in this case).
> > > >
> > > > In your case, why would users name a temporary function the same as a
> > > > built-in function then? Since using that name in ambiguous function
> > > > reference will always be resolved to built-in functions, creating a
> > > > same-named temp function would be meaningless in the end.
> > > >
> > > >
> > > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> > > >
> > > >> Hi Jingsong,
> > > >>
> > > >> Re> 1.Hive built-in functions is an intermediate solution. So we
> > should
> > > >> > not introduce interfaces to influence the framework. To make
> > > >> > Flink itself more powerful, we should implement the functions
> > > >> > we need to add.
> > > >>
> > > >> Yes, please see the doc.
> > > >>
> > > >> Re> 2.Non-flink built-in functions are easy for users to change
> their
> > > >> > behavior. If we support some flink built-in functions in the
> > > >> > future but act differently from non-flink built-in, this will lead
> > to
> > > >> > changes in user behavior.
> > > >>
> > > >> There's no such concept as "external built-in functions" any more.
> > > >> Built-in functions of external systems will be treated as special
> > > catalog
> > > >> functions.
> > > >>
> > > >> Re> Another question is, does this fallback include all
> > > >> > hive built-in functions? As far as I know, some hive functions
> > > >> > have some hacky. If possible, can we start with a white list?
> > > >> > Once we implement some functions to flink built-in, we can
> > > >> > also update the whitelist.
> > > >>
> > > >> Yes, that's something we thought of too. I don't think it's super
> > > >> critical to the scope of this FLIP, thus I'd like to leave it to
> > future
> > > >> efforts as a nice-to-have feature.
> > > >>
> > > >>
> > > >> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com>
> wrote:
> > > >>
> > > >>> Hi Kurt,
> > > >>>
> > > >>> Re: > What I want to propose is we can merge #3 and #4, make them
> > both
> > > >>> under
> > > >>> >"catalog" concept, by extending catalog function to make it have
> > > >>> ability to
> > > >>> >have built-in catalog functions. Some benefits I can see from this
> > > >>> approach:
> > > >>> >1. We don't have to introduce new concept like external built-in
> > > >>> functions.
> > > >>> >Actually I don't see a full story about how to treat a built-in
> > > >>> functions, and it
> > > >>> >seems a little bit disrupt with catalog. As a result, you have to
> > make
> > > >>> some restriction
> > > >>> >like "hive built-in functions can only be used when current
> catalog
> > is
> > > >>> hive catalog".
> > > >>>
> > > >>> Yes, I've unified #3 and #4 but it seems I didn't update some part
> of
> > > >>> the doc. I've modified those sections, and they are up to date now.
> > > >>>
> > > >>> In short, now built-in function of external systems are defined as
> a
> > > >>> special kind of catalog function in Flink, and handled by Flink as
> > > >>> following:
> > > >>> - An external built-in function must be associated with a catalog
> for
> > > >>> the purpose of decoupling flink-table and external systems.
> > > >>> - It always resides in front of catalog functions in ambiguous
> > function
> > > >>> reference order, just like in its own external system
> > > >>> - It is a special catalog function that doesn’t have a
> > schema/database
> > > >>> namespace
> > > >>> - It goes thru the same instantiation logic as other user defined
> > > >>> catalog functions in the external system
> > > >>>
> > > >>> Please take another look at the doc, and let me know if you have
> more
> > > >>> questions.
> > > >>>
> > > >>>
> > > >>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > > wrote:
> > > >>>
> > > >>>> Hi Kurt,
> > > >>>>
> > > >>>> it should not affect the functions and operations we currently
> have
> > in
> > > >>>> SQL. It just categorizes the available built-in functions. It is
> > kind
> > > >>>> of
> > > >>>> an orthogonal concept to the catalog API but built-in functions
> > > deserve
> > > >>>> this special kind of treatment. CatalogFunction still fits
> perfectly
> > > in
> > > >>>> there because the regular catalog object resolution logic is not
> > > >>>> affected. So tables and functions are resolved in the same way but
> > > with
> > > >>>> built-in functions that have priority as in the original design.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Timo
> > > >>>>
> > > >>>>
> > > >>>> On 03.09.19 15:26, Kurt Young wrote:
> > > >>>> > Does this only affect the functions and operations we currently
> > have
> > > >>>> in SQL
> > > >>>> > and
> > > >>>> > have no effect on tables, right? Looks like this is an
> orthogonal
> > > >>>> concept
> > > >>>> > with Catalog?
> > > >>>> > If the answer are both yes, then the catalog function will be a
> > > weird
> > > >>>> > concept?
> > > >>>> >
> > > >>>> > Best,
> > > >>>> > Kurt
> > > >>>> >
> > > >>>> >
> > > >>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yuzhao.cyz@gmail.com
> >
> > > >>>> wrote:
> > > >>>> >
> > > >>>> >> The way you proposed are basically the same as what Calcite
> > does, I
> > > >>>> think
> > > >>>> >> we are in the same line.
> > > >>>> >>
> > > >>>> >> Best,
> > > >>>> >> Danny Chan
> > > >>>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
> > > >>>> >>> This sounds exactly as the module approach I mentioned, no?
> > > >>>> >>>
> > > >>>> >>> Regards,
> > > >>>> >>> Timo
> > > >>>> >>>
> > > >>>> >>> On 03.09.19 13:42, Danny Chan wrote:
> > > >>>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
> > > >>>> >> refactoring to make our function usage more user friendly.
> > > >>>> >>>> For the topic of how to organize the builtin operators and
> > > >>>> operators
> > > >>>> >> of Hive, here is a solution from Apache Calcite, the Calcite
> way
> > is
> > > >>>> to make
> > > >>>> >> every dialect operators a “Library”, user can specify which
> > > >>>> libraries they
> > > >>>> >> want to use for a sql query. The builtin operators always comes
> > as
> > > >>>> the
> > > >>>> >> first class objects and the others are used from the order they
> > > >>>> appears.
> > > >>>> >> Maybe you can take a reference.
> > > >>>> >>>> [1]
> > > >>>> >>
> > > >>>>
> > >
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > > >>>> >>>> Best,
> > > >>>> >>>> Danny Chan
> > > >>>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> > > >>>> >>>>> Hi folks,
> > > >>>> >>>>>
> > > >>>> >>>>> I'd like to kick off a discussion on reworking Flink's
> > > >>>> >> FunctionCatalog.
> > > >>>> >>>>> It's critically helpful to improve function usability in
> SQL.
> > > >>>> >>>>>
> > > >>>> >>>>>
> > > >>>> >>
> > > >>>>
> > >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > >>>> >>>>> In short, it:
> > > >>>> >>>>> - adds support for precise function reference with
> > > fully/partially
> > > >>>> >>>>> qualified name
> > > >>>> >>>>> - redefines function resolution order for ambiguous function
> > > >>>> >> reference
> > > >>>> >>>>> - adds support for Hive's rich built-in functions (support
> for
> > > >>>> Hive
> > > >>>> >> user
> > > >>>> >>>>> defined functions was already added in 1.9.0)
> > > >>>> >>>>> - clarifies the concept of temporary functions
> > > >>>> >>>>>
> > > >>>> >>>>> Would love to hear your thoughts.
> > > >>>> >>>>>
> > > >>>> >>>>> Bowen
> > > >>>> >>>
> > > >>>>
> > > >>>>
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Xuefu Z <us...@gmail.com>.
From what I have seen, there are a couple of focal disagreements:

1. Resolution order: temp function --> flink built-in function --> catalog
function vs flink built-in function --> temp function --> catalog function.
2. "External" built-in functions: how to treat built-in functions in
external systems and how users reference them

For #1, I agree with Bowen that temp functions need to have the highest
priority, because that's how a user can override a built-in function
without referencing a persistent, overriding catalog function by its fully
qualified name. Putting built-in functions at the highest priority
eliminates that usage.
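
The difference between the two orders can be sketched as a lookup over an
ordered list of layers. This is a simplified illustration, not Flink code:
ResolutionOrderSketch and its resolve method are hypothetical, and each layer
is reduced to a map from function name to a label saying where it was found.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of ambiguous function resolution over ordered layers.
class ResolutionOrderSketch {
    // Returns the label from the first layer that knows the name, if any.
    static Optional<String> resolve(String name, List<Map<String, String>> layers) {
        for (Map<String, String> layer : layers) {
            String hit = layer.get(name);
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        return Optional.empty();
    }
}
```

If both the temp layer and the built-in layer define "concat", the order
[temp, built-in, catalog] resolves it to the temp function, while
[built-in, temp, catalog] resolves it to the built-in one, which leaves the
same-named temp function unreachable by its simple name.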

For #2, I saw a general agreement that referencing "external" built-in
functions such as those in Hive needs to be explicit and deterministic,
even though different approaches have been proposed. To limit the scope and
simplify the usage, it makes sense to me to introduce a special syntax for
users to explicitly reference an external built-in function, such as
hive1::sqrt or hive1._built_in.sqrt. This is a DML syntax that nicely
matches the Catalog API call hive1.getFunction(ObjectPath functionName),
where the database name is absent for built-in functions available in that
catalog hive1. I understand that Bowen's original proposal was trying to
avoid this, but this could turn out to be a clean and simple solution.
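
As a rough sketch of how the catalog::function form could be split into its
parts: QualifiedFunctionRef below is a hypothetical helper, not part of Flink's
SQL parser or Catalog API, and treating a name without "::" as an unqualified
reference is an assumption. A real implementation would live in the SQL
grammar rather than in string handling.

```java
// Hypothetical sketch; not part of Flink's parser or Catalog API.
final class QualifiedFunctionRef {
    final String catalog;   // null for an unqualified (ambiguous) reference
    final String function;

    private QualifiedFunctionRef(String catalog, String function) {
        this.catalog = catalog;
        this.function = function;
    }

    // Splits "hive1::sqrt" into catalog "hive1" and function "sqrt".
    static QualifiedFunctionRef parse(String ref) {
        int sep = ref.indexOf("::");
        if (sep < 0) {
            return new QualifiedFunctionRef(null, ref);
        }
        // Reject an empty catalog ("::sqrt") or an empty function ("hive1::").
        if (sep == 0 || sep + 2 >= ref.length()) {
            throw new IllegalArgumentException("Malformed function reference: " + ref);
        }
        return new QualifiedFunctionRef(ref.substring(0, sep), ref.substring(sep + 2));
    }
}
```

Because there is no database part, the parsed reference carries only a catalog
and a function name, which mirrors the point that the database name is absent
for a catalog's built-in functions.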

(Timo's modular approach is a great way to "expand" Flink's built-in
function set. It seems orthogonal and complementary to this and could be
tackled in future work.)

I'd be happy to hear further thoughts on the two points.

Thanks,
Xuefu

On Tue, Sep 3, 2019 at 7:11 PM Kurt Young <yk...@gmail.com> wrote:

> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> same
> as Bowen's. But after thinking about it, I'm currently lean to Timo's
> suggestion.
>
> The reason is backward compatibility. If we follow Bowen's approach, let's
> say we
> first find function in Flink's built-in functions, and then hive's
> built-in. For example, `foo`
> is not supported by Flink, but hive has such built-in function. So user
> will have hive's
> behavior for function `foo`. And in next release, Flink realize this is a
> very popular function
> and add it into Flink's built-in functions, but with different behavior as
> hive's. So in next
> release, the behavior changes.
>
> With Timo's approach, IIUC user have to tell the framework explicitly what
> kind of
> built-in functions he would like to use. He can just tell framework to
> abandon Flink's built-in
> functions, and use hive's instead. User can only choose between them, but
> not use
> them at the same time. I think this approach is more predictable.
>
> Best,
> Kurt
>
>
> On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi all,
> >
> > Thanks for the feedback. Just a kindly reminder that the [Proposal]
> section
> > in the google doc was updated, please take a look first and let me know
> if
> > you have more questions.
> >
> > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
> >
> > > Hi Timo,
> > >
> > > Re> 1) We should not have the restriction "hive built-in functions can
> > > only
> > > > be used when current catalog is hive catalog". Switching a catalog
> > > > should only have implications on the cat.db.object resolution but not
> > > > functions. It would be quite convinient for users to use Hive
> built-ins
> > > > even if they use a Confluent schema registry or just the in-memory
> > > catalog.
> > >
> > > There might be a misunderstanding here.
> > >
> > > First of all, Hive built-in functions are not part of Flink built-in
> > > functions, they are catalog functions, thus if the current catalog is
> > not a
> > > HiveCatalog but, say, a schema registry catalog, ambiguous functions
> > > reference just shouldn't be resolved to a different catalog.
> > >
> > > Second, Hive built-in functions can potentially be referenced across
> > > catalog, but it doesn't have db namespace and we currently just don't
> > have
> > > a SQL syntax for it. It can be enabled when such a SQL syntax is
> defined,
> > > e.g. "catalog::function", but it's out of scope of this FLIP.
> > >
> > > 2) I would propose to have separate concepts for catalog and built-in
> > > functions. In particular it would be nice to modularize built-in
> > > functions. Some built-in functions are very crucial (like AS, CAST,
> > > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > > we add more experimental functions in the future or function for some
> > > special application area (Geo functions, ML functions). A data platform
> > > team might not want to make every built-in function available. Or a
> > > function module like ML functions is in a different Maven module.
> > >
> > > I think this is orthogonal to this FLIP, especially we don't have the
> > > "external built-in functions" anymore and currently the built-in
> function
> > > category remains untouched.
> > >
> > > But just to share some thoughts on the proposal, I'm not sure about it:
> > > - I don't know if any other databases handle built-in functions like
> > that.
> > > Maybe you can give some examples? IMHO, built-in functions are system
> > info
> > > and should be deterministic, not depending on loaded libraries. Geo
> > > functions should be either built-in already or just libraries
> functions,
> > > and library functions can be adapted to catalog APIs or of some other
> > > syntax to use
> > > - I don't know if all use cases stand, and many can be achieved by
> other
> > > approaches too. E.g. experimental functions can be taken good care of
> by
> > > documentations, annotations, etc
> > > - the proposal basically introduces some concept like a pluggable
> > built-in
> > > function catalog, despite the already existing catalog APIs
> > > - it brings in even more complicated scenarios to the design. E.g. how
> do
> > > you handle built-in functions in different modules but different names?
> > >
> > > In short, I'm not sure if it really stands and it looks like an
> overkill
> > > to me. I'd rather not go to that route. Related discussion can be on
> its
> > > own thread.
> > >
> > > 3) Following the suggestion above, we can have a separate discovery
> > > mechanism for built-in functions. Instead of just going through a
> static
> > > list like in BuiltInFunctionDefinitions, a platform team should be able
> > > to select function modules like
> > > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > > HiveFunctions) or via service discovery;
> > >
> > > Same as above. I'll leave it to its own thread.
> > >
> > > re > 3) Dawid and I discussed the resulution order again. I agree with
> > > Kurt
> > > > that we should unify built-in function (external or internal) under a
> > > > common layer. However, the resolution order should be:
> > > >   1. built-in functions
> > > >   2. temporary functions
> > > >   3. regular catalog resolution logic
> > > > Otherwise a temporary function could cause clashes with Flink's
> > built-in
> > > > functions. If you take a look at other vendors, like SQL Server they
> > > > also do not allow to overwrite built-in functions.
> > >
> > > ”I agree with Kurt that we should unify built-in function (external or
> > > internal) under a common layer.“ <- I don't think this is what Kurt
> > means.
> > > Kurt and I are in favor of unifying built-in functions of external
> > systems
> > > and catalog functions. Did you type a mistake?
> > >
> > > Besides, I'm not sure about the resolution order you proposed.
> Temporary
> > > functions have a lifespan over a session and are only visible to the
> > > session owner, they are unique to each user, and users create them on
> > > purpose to be the highest priority in order to overwrite system info
> > > (built-in functions in this case).
> > >
> > > In your case, why would users name a temporary function the same as a
> > > built-in function then? Since using that name in ambiguous function
> > > reference will always be resolved to built-in functions, creating a
> > > same-named temp function would be meaningless in the end.
> > >
> > >
> > > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> > >
> > >> Hi Jingsong,
> > >>
> > >> Re> 1.Hive built-in functions is an intermediate solution. So we
> should
> > >> > not introduce interfaces to influence the framework. To make
> > >> > Flink itself more powerful, we should implement the functions
> > >> > we need to add.
> > >>
> > >> Yes, please see the doc.
> > >>
> > >> Re> 2.Non-flink built-in functions are easy for users to change their
> > >> > behavior. If we support some flink built-in functions in the
> > >> > future but act differently from non-flink built-in, this will lead
> to
> > >> > changes in user behavior.
> > >>
> > >> There's no such concept as "external built-in functions" any more.
> > >> Built-in functions of external systems will be treated as special
> > catalog
> > >> functions.
> > >>
> > >> Re> Another question is, does this fallback include all
> > >> > hive built-in functions? As far as I know, some hive functions
> > >> > have some hacky. If possible, can we start with a white list?
> > >> > Once we implement some functions to flink built-in, we can
> > >> > also update the whitelist.
> > >>
> > >> Yes, that's something we thought of too. I don't think it's super
> > >> critical to the scope of this FLIP, thus I'd like to leave it to
> future
> > >> efforts as a nice-to-have feature.
> > >>
> > >>
> > >> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> > >>
> > >>> Hi Kurt,
> > >>>
> > >>> Re: > What I want to propose is we can merge #3 and #4, make them
> both
> > >>> under
> > >>> >"catalog" concept, by extending catalog function to make it have
> > >>> ability to
> > >>> >have built-in catalog functions. Some benefits I can see from this
> > >>> approach:
> > >>> >1. We don't have to introduce new concept like external built-in
> > >>> functions.
> > >>> >Actually I don't see a full story about how to treat a built-in
> > >>> functions, and it
> > >>> >seems a little bit disrupt with catalog. As a result, you have to
> make
> > >>> some restriction
> > >>> >like "hive built-in functions can only be used when current catalog
> is
> > >>> hive catalog".
> > >>>
> > >>> Yes, I've unified #3 and #4 but it seems I didn't update some part of
> > >>> the doc. I've modified those sections, and they are up to date now.
> > >>>
> > >>> In short, now built-in function of external systems are defined as a
> > >>> special kind of catalog function in Flink, and handled by Flink as
> > >>> following:
> > >>> - An external built-in function must be associated with a catalog for
> > >>> the purpose of decoupling flink-table and external systems.
> > >>> - It always resides in front of catalog functions in ambiguous
> function
> > >>> reference order, just like in its own external system
> > >>> - It is a special catalog function that doesn’t have a
> schema/database
> > >>> namespace
> > >>> - It goes thru the same instantiation logic as other user defined
> > >>> catalog functions in the external system
> > >>>
> > >>> Please take another look at the doc, and let me know if you have more
> > >>> questions.
> > >>>
> > >>>
> > >>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> > wrote:
> > >>>
> > >>>> Hi Kurt,
> > >>>>
> > >>>> it should not affect the functions and operations we currently have
> in
> > >>>> SQL. It just categorizes the available built-in functions. It is
> kind
> > >>>> of
> > >>>> an orthogonal concept to the catalog API but built-in functions
> > deserve
> > >>>> this special kind of treatment. CatalogFunction still fits perfectly
> > in
> > >>>> there because the regular catalog object resolution logic is not
> > >>>> affected. So tables and functions are resolved in the same way but
> > with
> > >>>> built-in functions that have priority as in the original design.
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 03.09.19 15:26, Kurt Young wrote:
> > >>>> > Does this only affect the functions and operations we currently
> have
> > >>>> in SQL
> > >>>> > and
> > >>>> > have no effect on tables, right? Looks like this is an orthogonal
> > >>>> concept
> > >>>> > with Catalog?
> > >>>> > If the answer are both yes, then the catalog function will be a
> > weird
> > >>>> > concept?
> > >>>> >
> > >>>> > Best,
> > >>>> > Kurt
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com>
> > >>>> wrote:
> > >>>> >
> > >>>> >> The way you proposed are basically the same as what Calcite
> does, I
> > >>>> think
> > >>>> >> we are in the same line.
> > >>>> >>
> > >>>> >> Best,
> > >>>> >> Danny Chan
> > >>>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
> > >>>> >>> This sounds exactly as the module approach I mentioned, no?
> > >>>> >>>
> > >>>> >>> Regards,
> > >>>> >>> Timo
> > >>>> >>>
> > >>>> >>> On 03.09.19 13:42, Danny Chan wrote:
> > >>>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
> > >>>> >> refactoring to make our function usage more user friendly.
> > >>>> >>>> For the topic of how to organize the builtin operators and
> > >>>> operators
> > >>>> >> of Hive, here is a solution from Apache Calcite, the Calcite way
> is
> > >>>> to make
> > >>>> >> every dialect operators a “Library”, user can specify which
> > >>>> libraries they
> > >>>> >> want to use for a sql query. The builtin operators always comes
> as
> > >>>> the
> > >>>> >> first class objects and the others are used from the order they
> > >>>> appears.
> > >>>> >> Maybe you can take a reference.
> > >>>> >>>> [1]
> > >>>> >>
> > >>>>
> >
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > >>>> >>>> Best,
> > >>>> >>>> Danny Chan
> > >>>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> > >>>> >>>>> Hi folks,
> > >>>> >>>>>
> > >>>> >>>>> I'd like to kick off a discussion on reworking Flink's
> > >>>> >> FunctionCatalog.
> > >>>> >>>>> It's critically helpful to improve function usability in SQL.
> > >>>> >>>>>
> > >>>> >>>>>
> > >>>> >>
> > >>>>
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > >>>> >>>>> In short, it:
> > >>>> >>>>> - adds support for precise function reference with
> > fully/partially
> > >>>> >>>>> qualified name
> > >>>> >>>>> - redefines function resolution order for ambiguous function
> > >>>> >> reference
> > >>>> >>>>> - adds support for Hive's rich built-in functions (support for
> > >>>> Hive
> > >>>> >> user
> > >>>> >>>>> defined functions was already added in 1.9.0)
> > >>>> >>>>> - clarifies the concept of temporary functions
> > >>>> >>>>>
> > >>>> >>>>> Would love to hear your thoughts.
> > >>>> >>>>>
> > >>>> >>>>> Bowen
> > >>>> >>>
> > >>>>
> > >>>>
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
same as Bowen's. But after thinking about it, I'm currently leaning towards
Timo's suggestion.

The reason is backward compatibility. If we follow Bowen's approach, let's
say we first look up a function in Flink's built-in functions, and then in
Hive's built-ins. For example, `foo` is not supported by Flink, but Hive has
such a built-in function, so users get Hive's behavior for `foo`. Then in
the next release, Flink realizes this is a very popular function and adds
it to Flink's built-in functions, but with behavior different from Hive's.
So in the next release, the behavior changes.
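
The scenario can be sketched as follows. This is a simplified illustration,
not Flink code: FallbackLookup is a hypothetical name, and the two sets stand
in for the built-in function lists of Flink and Hive in a given release.

```java
import java.util.Set;

// Hypothetical sketch of "Flink built-ins first, then Hive built-ins".
class FallbackLookup {
    // Returns which system's implementation an unqualified name resolves to.
    static String resolve(String name, Set<String> flinkBuiltIns, Set<String> hiveBuiltIns) {
        if (flinkBuiltIns.contains(name)) {
            return "flink";
        }
        if (hiveBuiltIns.contains(name)) {
            return "hive";
        }
        throw new IllegalArgumentException("Unknown function: " + name);
    }
}
```

With Flink built-ins {concat} and Hive built-ins {concat, foo}, `foo` resolves
to Hive. Once a later release adds `foo` to the Flink set, the same query
resolves to Flink's implementation without the user changing anything.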

With Timo's approach, IIUC, users have to tell the framework explicitly
what kind of built-in functions they would like to use. They can just tell
the framework to abandon Flink's built-in functions and use Hive's instead.
Users can only choose between them, not use them at the same time. I think
this approach is more predictable.

Best,
Kurt


On Wed, Sep 4, 2019 at 8:00 AM Bowen Li <bo...@gmail.com> wrote:

> Hi all,
>
> Thanks for the feedback. Just a kindly reminder that the [Proposal] section
> in the google doc was updated, please take a look first and let me know if
> you have more questions.
>
> On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi Timo,
> >
> > Re> 1) We should not have the restriction "hive built-in functions can
> > only
> > > be used when current catalog is hive catalog". Switching a catalog
> > > should only have implications on the cat.db.object resolution but not
> > > functions. It would be quite convinient for users to use Hive built-ins
> > > even if they use a Confluent schema registry or just the in-memory
> > catalog.
> >
> > There might be a misunderstanding here.
> >
> > First of all, Hive built-in functions are not part of Flink built-in
> > functions, they are catalog functions, thus if the current catalog is
> not a
> > HiveCatalog but, say, a schema registry catalog, ambiguous functions
> > reference just shouldn't be resolved to a different catalog.
> >
> > Second, Hive built-in functions can potentially be referenced across
> > catalog, but it doesn't have db namespace and we currently just don't
> have
> > a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
> > e.g. "catalog::function", but it's out of scope of this FLIP.
> >
> > 2) I would propose to have separate concepts for catalog and built-in
> > functions. In particular it would be nice to modularize built-in
> > functions. Some built-in functions are very crucial (like AS, CAST,
> > MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> > we add more experimental functions in the future or function for some
> > special application area (Geo functions, ML functions). A data platform
> > team might not want to make every built-in function available. Or a
> > function module like ML functions is in a different Maven module.
> >
> > I think this is orthogonal to this FLIP, especially we don't have the
> > "external built-in functions" anymore and currently the built-in function
> > category remains untouched.
> >
> > But just to share some thoughts on the proposal, I'm not sure about it:
> > - I don't know if any other databases handle built-in functions like
> that.
> > Maybe you can give some examples? IMHO, built-in functions are system
> info
> > and should be deterministic, not depending on loaded libraries. Geo
> > functions should be either built-in already or just libraries functions,
> > and library functions can be adapted to catalog APIs or of some other
> > syntax to use
> > - I don't know if all use cases stand, and many can be achieved by other
> > approaches too. E.g. experimental functions can be taken good care of by
> > documentations, annotations, etc
> > - the proposal basically introduces some concept like a pluggable
> built-in
> > function catalog, despite the already existing catalog APIs
> > - it brings in even more complicated scenarios to the design. E.g. how do
> > you handle built-in functions in different modules but different names?
> >
> > In short, I'm not sure if it really stands and it looks like an overkill
> > to me. I'd rather not go to that route. Related discussion can be on its
> > own thread.
> >
> > 3) Following the suggestion above, we can have a separate discovery
> > mechanism for built-in functions. Instead of just going through a static
> > list like in BuiltInFunctionDefinitions, a platform team should be able
> > to select function modules like
> > catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> > HiveFunctions) or via service discovery;
> >
> > Same as above. I'll leave it to its own thread.
> >
> > re > 3) Dawid and I discussed the resulution order again. I agree with
> > Kurt
> > > that we should unify built-in function (external or internal) under a
> > > common layer. However, the resolution order should be:
> > >   1. built-in functions
> > >   2. temporary functions
> > >   3. regular catalog resolution logic
> > > Otherwise a temporary function could cause clashes with Flink's
> built-in
> > > functions. If you take a look at other vendors, like SQL Server they
> > > also do not allow to overwrite built-in functions.
> >
> > ”I agree with Kurt that we should unify built-in function (external or
> > internal) under a common layer.“ <- I don't think this is what Kurt
> means.
> > Kurt and I are in favor of unifying built-in functions of external
> systems
> > and catalog functions. Did you type a mistake?
> >
> > Besides, I'm not sure about the resolution order you proposed. Temporary
> > functions have a lifespan over a session and are only visible to the
> > session owner, they are unique to each user, and users create them on
> > purpose to be the highest priority in order to overwrite system info
> > (built-in functions in this case).
> >
> > In your case, why would users name a temporary function the same as a
> > built-in function then? Since using that name in ambiguous function
> > reference will always be resolved to built-in functions, creating a
> > same-named temp function would be meaningless in the end.
> >
> >
> > On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
> >
> >> Hi Jingsong,
> >>
> >> Re> 1.Hive built-in functions is an intermediate solution. So we should
> >> > not introduce interfaces to influence the framework. To make
> >> > Flink itself more powerful, we should implement the functions
> >> > we need to add.
> >>
> >> Yes, please see the doc.
> >>
> >> Re> 2.Non-flink built-in functions are easy for users to change their
> >> > behavior. If we support some flink built-in functions in the
> >> > future but act differently from non-flink built-in, this will lead to
> >> > changes in user behavior.
> >>
> >> There's no such concept as "external built-in functions" any more.
> >> Built-in functions of external systems will be treated as special
> catalog
> >> functions.
> >>
> >> Re> Another question is, does this fallback include all
> >> > hive built-in functions? As far as I know, some hive functions
> >> > have some hacky. If possible, can we start with a white list?
> >> > Once we implement some functions to flink built-in, we can
> >> > also update the whitelist.
> >>
> >> Yes, that's something we thought of too. I don't think it's super
> >> critical to the scope of this FLIP, thus I'd like to leave it to future
> >> efforts as a nice-to-have feature.
> >>
> >>
> >> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> Re: > What I want to propose is we can merge #3 and #4, make them both
> >>> under
> >>> >"catalog" concept, by extending catalog function to make it have
> >>> ability to
> >>> >have built-in catalog functions. Some benefits I can see from this
> >>> approach:
> >>> >1. We don't have to introduce new concept like external built-in
> >>> functions.
> >>> >Actually I don't see a full story about how to treat a built-in
> >>> functions, and it
> >>> >seems a little bit disrupt with catalog. As a result, you have to make
> >>> some restriction
> >>> >like "hive built-in functions can only be used when current catalog is
> >>> hive catalog".
> >>>
> >>> Yes, I've unified #3 and #4 but it seems I didn't update some part of
> >>> the doc. I've modified those sections, and they are up to date now.
> >>>
> >>> In short, now built-in function of external systems are defined as a
> >>> special kind of catalog function in Flink, and handled by Flink as
> >>> following:
> >>> - An external built-in function must be associated with a catalog for
> >>> the purpose of decoupling flink-table and external systems.
> >>> - It always resides in front of catalog functions in ambiguous function
> >>> reference order, just like in its own external system
> >>> - It is a special catalog function that doesn’t have a schema/database
> >>> namespace
> >>> - It goes thru the same instantiation logic as other user defined
> >>> catalog functions in the external system
> >>>
> >>> Please take another look at the doc, and let me know if you have more
> >>> questions.
> >>>
> >>>
> >>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org>
> wrote:
> >>>
> >>>> Hi Kurt,
> >>>>
> >>>> it should not affect the functions and operations we currently have in
> >>>> SQL. It just categorizes the available built-in functions. It is kind
> >>>> of
> >>>> an orthogonal concept to the catalog API but built-in functions
> deserve
> >>>> this special kind of treatment. CatalogFunction still fits perfectly
> in
> >>>> there because the regular catalog object resolution logic is not
> >>>> affected. So tables and functions are resolved in the same way but
> with
> >>>> built-in functions that have priority as in the original design.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 03.09.19 15:26, Kurt Young wrote:
> >>>> > Does this only affect the functions and operations we currently have
> >>>> in SQL
> >>>> > and
> >>>> > have no effect on tables, right? Looks like this is an orthogonal
> >>>> concept
> >>>> > with Catalog?
> >>>> > If the answer are both yes, then the catalog function will be a
> weird
> >>>> > concept?
> >>>> >
> >>>> > Best,
> >>>> > Kurt
> >>>> >
> >>>> >
> >>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com>
> >>>> wrote:
> >>>> >
> >>>> >> The way you proposed are basically the same as what Calcite does, I
> >>>> think
> >>>> >> we are in the same line.
> >>>> >>
> >>>> >> Best,
> >>>> >> Danny Chan
> >>>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
> >>>> >>> This sounds exactly as the module approach I mentioned, no?
> >>>> >>>
> >>>> >>> Regards,
> >>>> >>> Timo
> >>>> >>>
> >>>> >>> On 03.09.19 13:42, Danny Chan wrote:
> >>>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
> >>>> >> refactoring to make our function usage more user friendly.
> >>>> >>>> For the topic of how to organize the builtin operators and
> >>>> operators
> >>>> >> of Hive, here is a solution from Apache Calcite, the Calcite way is
> >>>> to make
> >>>> >> every dialect operators a “Library”, user can specify which
> >>>> libraries they
> >>>> >> want to use for a sql query. The builtin operators always comes as
> >>>> the
> >>>> >> first class objects and the others are used from the order they
> >>>> appears.
> >>>> >> Maybe you can take a reference.
> >>>> >>>> [1]
> >>>> >>
> >>>>
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >>>> >>>> Best,
> >>>> >>>> Danny Chan
> >>>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> >>>> >>>>> Hi folks,
> >>>> >>>>>
> >>>> >>>>> I'd like to kick off a discussion on reworking Flink's
> >>>> >> FunctionCatalog.
> >>>> >>>>> It's critically helpful to improve function usability in SQL.
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>
> >>>>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>> >>>>> In short, it:
> >>>> >>>>> - adds support for precise function reference with
> fully/partially
> >>>> >>>>> qualified name
> >>>> >>>>> - redefines function resolution order for ambiguous function
> >>>> >> reference
> >>>> >>>>> - adds support for Hive's rich built-in functions (support for
> >>>> Hive
> >>>> >> user
> >>>> >>>>> defined functions was already added in 1.9.0)
> >>>> >>>>> - clarifies the concept of temporary functions
> >>>> >>>>>
> >>>> >>>>> Would love to hear your thoughts.
> >>>> >>>>>
> >>>> >>>>> Bowen
> >>>> >>>
> >>>>
> >>>>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi all,

Thanks for the feedback. Just a kind reminder that the [Proposal] section
in the Google doc has been updated. Please take a look first and let me know
if you have more questions.

On Tue, Sep 3, 2019 at 4:57 PM Bowen Li <bo...@gmail.com> wrote:

> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can
> only
> > be used when current catalog is hive catalog". Switching a catalog
> > should only have implications on the cat.db.object resolution but not
> > functions. It would be quite convinient for users to use Hive built-ins
> > even if they use a Confluent schema registry or just the in-memory
> catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous functions
> reference just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalog, but it doesn't have db namespace and we currently just don't have
> a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
> e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or function for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially we don't have the
> "external built-in functions" anymore and currently the built-in function
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about it:
> - I don't know if any other databases handle built-in functions like that.
> Maybe you can give some examples? IMHO, built-in functions are system info
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by other
> approaches too. E.g. experimental functions can be taken good care of by
> documentations, annotations, etc
> - the proposal basically introduces some concept like a pluggable built-in
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g. how do
> you handle built-in functions in different modules but different names?
>
> In short, I'm not sure if it really stands and it looks like an overkill
> to me. I'd rather not go to that route. Related discussion can be on its
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resulution order again. I agree with
> Kurt
> > that we should unify built-in function (external or internal) under a
> > common layer. However, the resolution order should be:
> >   1. built-in functions
> >   2. temporary functions
> >   3. regular catalog resolution logic
> > Otherwise a temporary function could cause clashes with Flink's built-in
> > functions. If you take a look at other vendors, like SQL Server they
> > also do not allow to overwrite built-in functions.
>
> ”I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer.“ <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Did you type a mistake?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have a lifespan over a session and are only visible to the
> session owner, they are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system info
> (built-in functions in this case).
>
> In your case, why would users name a temporary function the same as a
> built-in function then? Since using that name in ambiguous function
> reference will always be resolved to built-in functions, creating a
> same-named temp function would be meaningless in the end.
>
>
> On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:
>
>> Hi Jingsong,
>>
>> Re> 1.Hive built-in functions is an intermediate solution. So we should
>> > not introduce interfaces to influence the framework. To make
>> > Flink itself more powerful, we should implement the functions
>> > we need to add.
>>
>> Yes, please see the doc.
>>
>> Re> 2.Non-flink built-in functions are easy for users to change their
>> > behavior. If we support some flink built-in functions in the
>> > future but act differently from non-flink built-in, this will lead to
>> > changes in user behavior.
>>
>> There's no such concept as "external built-in functions" any more.
>> Built-in functions of external systems will be treated as special catalog
>> functions.
>>
>> Re> Another question is, does this fallback include all
>> > hive built-in functions? As far as I know, some hive functions
>> > have some hacky. If possible, can we start with a white list?
>> > Once we implement some functions to flink built-in, we can
>> > also update the whitelist.
>>
>> Yes, that's something we thought of too. I don't think it's super
>> critical to the scope of this FLIP, thus I'd like to leave it to future
>> efforts as a nice-to-have feature.
>>
>>
>> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>>
>>> Hi Kurt,
>>>
>>> Re: > What I want to propose is we can merge #3 and #4, make them both
>>> under
>>> >"catalog" concept, by extending catalog function to make it have
>>> ability to
>>> >have built-in catalog functions. Some benefits I can see from this
>>> approach:
>>> >1. We don't have to introduce new concept like external built-in
>>> functions.
>>> >Actually I don't see a full story about how to treat a built-in
>>> functions, and it
>>> >seems a little bit disrupt with catalog. As a result, you have to make
>>> some restriction
>>> >like "hive built-in functions can only be used when current catalog is
>>> hive catalog".
>>>
>>> Yes, I've unified #3 and #4 but it seems I didn't update some part of
>>> the doc. I've modified those sections, and they are up to date now.
>>>
>>> In short, now built-in function of external systems are defined as a
>>> special kind of catalog function in Flink, and handled by Flink as
>>> following:
>>> - An external built-in function must be associated with a catalog for
>>> the purpose of decoupling flink-table and external systems.
>>> - It always resides in front of catalog functions in ambiguous function
>>> reference order, just like in its own external system
>>> - It is a special catalog function that doesn’t have a schema/database
>>> namespace
>>> - It goes thru the same instantiation logic as other user defined
>>> catalog functions in the external system
>>>
>>> Please take another look at the doc, and let me know if you have more
>>> questions.
>>>
>>>
>>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>>>
>>>> Hi Kurt,
>>>>
>>>> it should not affect the functions and operations we currently have in
>>>> SQL. It just categorizes the available built-in functions. It is kind
>>>> of
>>>> an orthogonal concept to the catalog API but built-in functions deserve
>>>> this special kind of treatment. CatalogFunction still fits perfectly in
>>>> there because the regular catalog object resolution logic is not
>>>> affected. So tables and functions are resolved in the same way but with
>>>> built-in functions that have priority as in the original design.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 03.09.19 15:26, Kurt Young wrote:
>>>> > Does this only affect the functions and operations we currently have
>>>> in SQL
>>>> > and
>>>> > have no effect on tables, right? Looks like this is an orthogonal
>>>> concept
>>>> > with Catalog?
>>>> > If the answer are both yes, then the catalog function will be a weird
>>>> > concept?
>>>> >
>>>> > Best,
>>>> > Kurt
>>>> >
>>>> >
>>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> The way you proposed are basically the same as what Calcite does, I
>>>> think
>>>> >> we are in the same line.
>>>> >>
>>>> >> Best,
>>>> >> Danny Chan
>>>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
>>>> >>> This sounds exactly as the module approach I mentioned, no?
>>>> >>>
>>>> >>> Regards,
>>>> >>> Timo
>>>> >>>
>>>> >>> On 03.09.19 13:42, Danny Chan wrote:
>>>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
>>>> >> refactoring to make our function usage more user friendly.
>>>> >>>> For the topic of how to organize the builtin operators and
>>>> operators
>>>> >> of Hive, here is a solution from Apache Calcite, the Calcite way is
>>>> to make
>>>> >> every dialect operators a “Library”, user can specify which
>>>> libraries they
>>>> >> want to use for a sql query. The builtin operators always comes as
>>>> the
>>>> >> first class objects and the others are used from the order they
>>>> appears.
>>>> >> Maybe you can take a reference.
>>>> >>>> [1]
>>>> >>
>>>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>>> >>>> Best,
>>>> >>>> Danny Chan
>>>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
>>>> >>>>> Hi folks,
>>>> >>>>>
>>>> >>>>> I'd like to kick off a discussion on reworking Flink's
>>>> >> FunctionCatalog.
>>>> >>>>> It's critically helpful to improve function usability in SQL.
>>>> >>>>>
>>>> >>>>>
>>>> >>
>>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>>> >>>>> In short, it:
>>>> >>>>> - adds support for precise function reference with fully/partially
>>>> >>>>> qualified name
>>>> >>>>> - redefines function resolution order for ambiguous function
>>>> >> reference
>>>> >>>>> - adds support for Hive's rich built-in functions (support for
>>>> Hive
>>>> >> user
>>>> >>>>> defined functions was already added in 1.9.0)
>>>> >>>>> - clarifies the concept of temporary functions
>>>> >>>>>
>>>> >>>>> Would love to hear your thoughts.
>>>> >>>>>
>>>> >>>>> Bowen
>>>> >>>
>>>>
>>>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi Timo,

Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory
> catalog.

There might be a misunderstanding here.

First of all, Hive built-in functions are not part of Flink's built-in
functions; they are catalog functions. Thus, if the current catalog is not a
HiveCatalog but, say, a schema registry catalog, an ambiguous function
reference just shouldn't be resolved to a different catalog.

Second, Hive built-in functions can potentially be referenced across
catalogs, but they don't have a db namespace and we currently just don't have
a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
e.g. "catalog::function", but it's out of scope of this FLIP.
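
As a rough illustration of the first point, here is a sketch in plain Python
(not Flink API; the catalog names, function names, and data layout are made up,
and the temporary-function layer is left out for brevity). An unqualified name
is looked up in Flink's built-ins and then only inside the current catalog:

```python
# Illustrative data only; none of these names are Flink APIs.
FLINK_BUILTINS = {"cast", "concat_ws"}

CATALOGS = {
    "hive": {
        "builtins": {"hive_only_fn"},              # no database namespace
        "databases": {"default": {"my_hive_udf"}},
    },
    "registry": {
        "builtins": set(),
        "databases": {"default": {"my_registry_udf"}},
    },
}


def resolve_ambiguous(name, current_catalog, current_db):
    """Resolve an unqualified function name without leaving the current catalog."""
    if name in FLINK_BUILTINS:
        return ("flink", name)
    catalog = CATALOGS[current_catalog]
    # The external system's built-ins come before its regular catalog functions.
    if name in catalog["builtins"]:
        return (current_catalog, name)
    if name in catalog["databases"].get(current_db, set()):
        return (current_catalog, current_db, name)
    return None  # other catalogs are never searched
```

Under these assumptions, `hive_only_fn` resolves while the current catalog is
`hive`, and stays unresolved while the current catalog is `registry`.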

Re> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or function for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.

I think this is orthogonal to this FLIP, especially since we no longer have
the "external built-in functions" concept and the built-in function category
currently remains untouched.

But just to share some thoughts on the proposal, I'm not sure about it:
- I don't know if any other databases handle built-in functions like that.
Maybe you can give some examples? IMHO, built-in functions are system info
and should be deterministic, not dependent on loaded libraries. Geo
functions should either be built in already or just be library functions,
and library functions can be adapted to the catalog APIs or exposed through
some other syntax
- I don't know if all the use cases hold up, and many can be achieved by
other approaches too. E.g. experimental functions can be handled well by
documentation, annotations, etc.
- the proposal basically introduces a concept like a pluggable built-in
function catalog, despite the already existing catalog APIs
- it brings even more complicated scenarios into the design. E.g. how do
you handle built-in functions in different modules but different names?

In short, I'm not sure it really holds up, and it looks like overkill to
me. I'd rather not go down that route. Related discussion can go in its own
thread.

Re> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;

Same as above. I'll leave it to its own thread.

re > 3) Dawid and I discussed the resolution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>   1. built-in functions
>   2. temporary functions
>   3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server they
> also do not allow to overwrite built-in functions.

"I agree with Kurt that we should unify built-in function (external or
internal) under a common layer." <- I don't think this is what Kurt means.
Kurt and I are in favor of unifying built-in functions of external systems
with catalog functions. Was that a typo?

Besides, I'm not sure about the resolution order you proposed. Temporary
functions live for the duration of a session and are only visible to the
session owner. They are unique to each user, and users create them
deliberately with the highest priority in order to override system info
(built-in functions in this case).

In your case, why would users give a temporary function the same name as a
built-in function at all? Since that name in an ambiguous function reference
would always resolve to the built-in function, creating a same-named temp
function would be meaningless in the end.
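
The difference between the two orders can be shown with a small sketch in
plain Python. Nothing here is Flink API; the layer names and the sample
function names are assumptions made only for illustration.

```python
def resolve(name, layers):
    """Return the name of the first layer, in order, that defines `name`."""
    for layer_name, functions in layers:
        if name in functions:
            return layer_name
    return None


builtins = {"concat"}
temporary = {"concat"}      # a user deliberately shadows the built-in
catalog = {"my_udf"}

# Order argued for here: temporary functions take the highest priority.
temp_first = [("temporary", temporary), ("built-in", builtins), ("catalog", catalog)]

# Order proposed by Timo and Dawid: built-ins can never be overridden.
builtin_first = [("built-in", builtins), ("temporary", temporary), ("catalog", catalog)]

shadowed = resolve("concat", temp_first)        # the temporary function wins
unreachable = resolve("concat", builtin_first)  # the temporary function is never hit
```

With the built-in-first order, the same-named temporary function can never be
reached through an ambiguous reference, which is the point made above.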


On Tue, Sep 3, 2019 at 1:44 PM Bowen Li <bo...@gmail.com> wrote:

> Hi Jingsong,
>
> Re> 1.Hive built-in functions is an intermediate solution. So we should
> > not introduce interfaces to influence the framework. To make
> > Flink itself more powerful, we should implement the functions
> > we need to add.
>
> Yes, please see the doc.
>
> Re> 2.Non-flink built-in functions are easy for users to change their
> > behavior. If we support some flink built-in functions in the
> > future but act differently from non-flink built-in, this will lead to
> > changes in user behavior.
>
> There's no such concept as "external built-in functions" any more.
> Built-in functions of external systems will be treated as special catalog
> functions.
>
> Re> Another question is, does this fallback include all
> > hive built-in functions? As far as I know, some hive functions
> > have some hacky. If possible, can we start with a white list?
> > Once we implement some functions to flink built-in, we can
> > also update the whitelist.
>
> Yes, that's something we thought of too. I don't think it's super critical
> to the scope of this FLIP, thus I'd like to leave it to future efforts as a
> nice-to-have feature.
>
>
> On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:
>
>> Hi Kurt,
>>
>> Re: > What I want to propose is we can merge #3 and #4, make them both
>> under
>> >"catalog" concept, by extending catalog function to make it have ability
>> to
>> >have built-in catalog functions. Some benefits I can see from this
>> approach:
>> >1. We don't have to introduce new concept like external built-in
>> functions.
>> >Actually I don't see a full story about how to treat a built-in
>> functions, and it
>> >seems a little bit disrupt with catalog. As a result, you have to make
>> some restriction
>> >like "hive built-in functions can only be used when current catalog is
>> hive catalog".
>>
>> Yes, I've unified #3 and #4 but it seems I didn't update some part of the
>> doc. I've modified those sections, and they are up to date now.
>>
>> In short, now built-in function of external systems are defined as a
>> special kind of catalog function in Flink, and handled by Flink as
>> following:
>> - An external built-in function must be associated with a catalog for the
>> purpose of decoupling flink-table and external systems.
>> - It always resides in front of catalog functions in ambiguous function
>> reference order, just like in its own external system
>> - It is a special catalog function that doesn’t have a schema/database
>> namespace
>> - It goes thru the same instantiation logic as other user defined catalog
>> functions in the external system
>>
>> Please take another look at the doc, and let me know if you have more
>> questions.
>>
>>
>> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>>
>>> Hi Kurt,
>>>
>>> it should not affect the functions and operations we currently have in
>>> SQL. It just categorizes the available built-in functions. It is kind of
>>> an orthogonal concept to the catalog API but built-in functions deserve
>>> this special kind of treatment. CatalogFunction still fits perfectly in
>>> there because the regular catalog object resolution logic is not
>>> affected. So tables and functions are resolved in the same way but with
>>> built-in functions that have priority as in the original design.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 03.09.19 15:26, Kurt Young wrote:
>>> > Does this only affect the functions and operations we currently have
>>> in SQL
>>> > and
>>> > have no effect on tables, right? Looks like this is an orthogonal
>>> concept
>>> > with Catalog?
>>> > If the answer are both yes, then the catalog function will be a weird
>>> > concept?
>>> >
>>> > Best,
>>> > Kurt
>>> >
>>> >
>>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com>
>>> wrote:
>>> >
>>> >> The way you proposed are basically the same as what Calcite does, I
>>> think
>>> >> we are in the same line.
>>> >>
>>> >> Best,
>>> >> Danny Chan
>>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
>>> >>> This sounds exactly as the module approach I mentioned, no?
>>> >>>
>>> >>> Regards,
>>> >>> Timo
>>> >>>
>>> >>> On 03.09.19 13:42, Danny Chan wrote:
>>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
>>> >> refactoring to make our function usage more user friendly.
>>> >>>> For the topic of how to organize the builtin operators and operators
>>> >> of Hive, here is a solution from Apache Calcite, the Calcite way is
>>> to make
>>> >> every dialect operators a “Library”, user can specify which libraries
>>> they
>>> >> want to use for a sql query. The builtin operators always comes as the
>>> >> first class objects and the others are used from the order they
>>> appears.
>>> >> Maybe you can take a reference.
>>> >>>> [1]
>>> >>
>>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>> >>>> Best,
>>> >>>> Danny Chan
>>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
>>> >>>>> Hi folks,
>>> >>>>>
>>> >>>>> I'd like to kick off a discussion on reworking Flink's
>>> >> FunctionCatalog.
>>> >>>>> It's critically helpful to improve function usability in SQL.
>>> >>>>>
>>> >>>>>
>>> >>
>>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>> >>>>> In short, it:
>>> >>>>> - adds support for precise function reference with fully/partially
>>> >>>>> qualified name
>>> >>>>> - redefines function resolution order for ambiguous function
>>> >> reference
>>> >>>>> - adds support for Hive's rich built-in functions (support for Hive
>>> >> user
>>> >>>>> defined functions was already added in 1.9.0)
>>> >>>>> - clarifies the concept of temporary functions
>>> >>>>>
>>> >>>>> Would love to hear your thoughts.
>>> >>>>>
>>> >>>>> Bowen
>>> >>>
>>>
>>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi Jingsong,

Re> 1.Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.

Yes, please see the doc.

Re> 2.Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will lead to
> changes in user behavior.

There's no such concept as "external built-in functions" any more. Built-in
functions of external systems will be treated as special catalog functions.

Re> Another question is, does this fallback include all
> hive built-in functions? As far as I know, some hive functions
> have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.

Yes, that's something we thought of too. I don't think it's super critical
to the scope of this FLIP, thus I'd like to leave it to future efforts as a
nice-to-have feature.


On Tue, Sep 3, 2019 at 1:37 PM Bowen Li <bo...@gmail.com> wrote:

> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them both
> under
> >"catalog" concept, by extending catalog function to make it have ability
> to
> >have built-in catalog functions. Some benefits I can see from this
> approach:
> >1. We don't have to introduce new concept like external built-in
> functions.
> >Actually I don't see a full story about how to treat a built-in
> functions, and it
> >seems a little bit disrupt with catalog. As a result, you have to make
> some restriction
> >like "hive built-in functions can only be used when current catalog is
> hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some part of the
> doc. I've modified those sections, and they are up to date now.
>
> In short, now built-in function of external systems are defined as a
> special kind of catalog function in Flink, and handled by Flink as
> following:
> - An external built-in function must be associated with a catalog for the
> purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous function
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a schema/database
> namespace
> - It goes thru the same instantiation logic as other user defined catalog
> functions in the external system
>
> Please take another look at the doc, and let me know if you have more
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:
>
>> Hi Kurt,
>>
>> it should not affect the functions and operations we currently have in
>> SQL. It just categorizes the available built-in functions. It is kind of
>> an orthogonal concept to the catalog API but built-in functions deserve
>> this special kind of treatment. CatalogFunction still fits perfectly in
>> there because the regular catalog object resolution logic is not
>> affected. So tables and functions are resolved in the same way but with
>> built-in functions that have priority as in the original design.
>>
>> Regards,
>> Timo
>>
>>
>> On 03.09.19 15:26, Kurt Young wrote:
>> > Does this only affect the functions and operations we currently have in
>> SQL
>> > and
>> > have no effect on tables, right? Looks like this is an orthogonal
>> concept
>> > with Catalog?
>> > If the answer are both yes, then the catalog function will be a weird
>> > concept?
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com> wrote:
>> >
>> >> The way you proposed are basically the same as what Calcite does, I
>> think
>> >> we are in the same line.
>> >>
>> >> Best,
>> >> Danny Chan
>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
>> >>> This sounds exactly as the module approach I mentioned, no?
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>> On 03.09.19 13:42, Danny Chan wrote:
>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
>> >> refactoring to make our function usage more user friendly.
>> >>>> For the topic of how to organize the builtin operators and operators
>> >> of Hive, here is a solution from Apache Calcite, the Calcite way is to
>> make
>> >> every dialect operators a “Library”, user can specify which libraries
>> they
>> >> want to use for a sql query. The builtin operators always comes as the
>> >> first class objects and the others are used from the order they
>> appears.
>> >> Maybe you can take a reference.
>> >>>> [1]
>> >>
>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>> >>>> Best,
>> >>>> Danny Chan
>> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
>> >>>>> Hi folks,
>> >>>>>
>> >>>>> I'd like to kick off a discussion on reworking Flink's
>> >> FunctionCatalog.
>> >>>>> It's critically helpful to improve function usability in SQL.
>> >>>>>
>> >>>>>
>> >>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>> >>>>> In short, it:
>> >>>>> - adds support for precise function reference with fully/partially
>> >>>>> qualified name
>> >>>>> - redefines function resolution order for ambiguous function
>> >> reference
>> >>>>> - adds support for Hive's rich built-in functions (support for Hive
>> >> user
>> >>>>> defined functions was already added in 1.9.0)
>> >>>>> - clarifies the concept of temporary functions
>> >>>>>
>> >>>>> Would love to hear your thoughts.
>> >>>>>
>> >>>>> Bowen
>> >>>
>>
>>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Hi Kurt,

Re: > What I want to propose is we can merge #3 and #4, make them both under
> "catalog" concept, by extending catalog function to make it have ability to
> have built-in catalog functions. Some benefits I can see from this approach:
> 1. We don't have to introduce new concept like external built-in functions.
> Actually I don't see a full story about how to treat a built-in functions,
> and it seems a little bit disrupt with catalog. As a result, you have to make
> some restriction like "hive built-in functions can only be used when current
> catalog is hive catalog".

Yes, I've unified #3 and #4, but it seems I hadn't updated some parts of the
doc. I've modified those sections, and they are up to date now.

In short, built-in functions of external systems are now defined as a
special kind of catalog function in Flink, and are handled by Flink as
follows:
- An external built-in function must be associated with a catalog, for the
purpose of decoupling flink-table from external systems.
- It always comes before catalog functions in the resolution order for
ambiguous function references, just like in its own external system.
- It is a special catalog function that doesn’t have a schema/database
namespace.
- It goes through the same instantiation logic as other user-defined catalog
functions in the external system.
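
To make the rules above concrete, here is a minimal Java sketch. It is not
Flink's actual Catalog API: SketchCatalog, its two maps, and resolve() are
hypothetical names made up for illustration. The sketch only models two of
the rules: a catalog's external built-in functions have no database
namespace, and they are looked up before that catalog's regular functions
when a name is ambiguous.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model for illustration only, not the Flink Catalog interface.
class SketchCatalog {
    // External built-ins: keyed by bare function name, no database namespace.
    private final Map<String, String> builtInFunctions = new HashMap<>();
    // Regular catalog functions: keyed by "database.function".
    private final Map<String, String> catalogFunctions = new HashMap<>();

    void addBuiltIn(String name, String impl) {
        builtInFunctions.put(name, impl);
    }

    void addCatalogFunction(String database, String name, String impl) {
        catalogFunctions.put(database + "." + name, impl);
    }

    // Resolves an unqualified (ambiguous) name: built-ins of the external
    // system first, then functions registered in the current database.
    Optional<String> resolve(String currentDatabase, String name) {
        String builtIn = builtInFunctions.get(name);
        if (builtIn != null) {
            return Optional.of(builtIn);
        }
        return Optional.ofNullable(catalogFunctions.get(currentDatabase + "." + name));
    }
}

class ExternalBuiltInDemo {
    public static void main(String[] args) {
        SketchCatalog hive = new SketchCatalog();
        hive.addBuiltIn("get_json_object", "hive built-in");
        hive.addCatalogFunction("db1", "get_json_object", "user function in db1");
        hive.addCatalogFunction("db1", "my_udf", "user function in db1");

        // The built-in shadows the same-named catalog function.
        System.out.println(hive.resolve("db1", "get_json_object").orElse("not found"));
        // Names without a built-in fall through to the current database.
        System.out.println(hive.resolve("db1", "my_udf").orElse("not found"));
    }
}
```

Keeping the built-ins in a separate map keyed by bare name is one simple way
to express "no schema/database namespace"; a real implementation would
return function definitions rather than strings.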

Please take another look at the doc, and let me know if you have more
questions.


On Tue, Sep 3, 2019 at 7:28 AM Timo Walther <tw...@apache.org> wrote:

> Hi Kurt,
>
> it should not affect the functions and operations we currently have in
> SQL. It just categorizes the available built-in functions. It is kind of
> an orthogonal concept to the catalog API but built-in functions deserve
> this special kind of treatment. CatalogFunction still fits perfectly in
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way but with
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
> > Does this only affect the functions and operations we currently have in
> SQL
> > and
> > have no effect on tables, right? Looks like this is an orthogonal concept
> > with Catalog?
> > If the answer are both yes, then the catalog function will be a weird
> > concept?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com> wrote:
> >
> >> The way you proposed are basically the same as what Calcite does, I
> think
> >> we are in the same line.
> >>
> >> Best,
> >> Danny Chan
> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
> >>> This sounds exactly as the module approach I mentioned, no?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>> On 03.09.19 13:42, Danny Chan wrote:
> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
> >> refactoring to make our function usage more user friendly.
> >>>> For the topic of how to organize the builtin operators and operators
> >> of Hive, here is a solution from Apache Calcite, the Calcite way is to
> make
> >> every dialect operators a “Library”, user can specify which libraries
> they
> >> want to use for a sql query. The builtin operators always comes as the
> >> first class objects and the others are used from the order they appears.
> >> Maybe you can take a reference.
> >>>> [1]
> >>
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >>>> Best,
> >>>> Danny Chan
> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> >>>>> Hi folks,
> >>>>>
> >>>>> I'd like to kick off a discussion on reworking Flink's
> >> FunctionCatalog.
> >>>>> It's critically helpful to improve function usability in SQL.
> >>>>>
> >>>>>
> >>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>>> In short, it:
> >>>>> - adds support for precise function reference with fully/partially
> >>>>> qualified name
> >>>>> - redefines function resolution order for ambiguous function
> >> reference
> >>>>> - adds support for Hive's rich built-in functions (support for Hive
> >> user
> >>>>> defined functions was already added in 1.9.0)
> >>>>> - clarifies the concept of temporary functions
> >>>>>
> >>>>> Would love to hear your thoughts.
> >>>>>
> >>>>> Bowen
> >>>
>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi Kurt,

it should not affect the functions and operations we currently have in 
SQL. It just categorizes the available built-in functions. It is kind of 
an orthogonal concept to the catalog API but built-in functions deserve 
this special kind of treatment. CatalogFunction still fits perfectly in 
there because the regular catalog object resolution logic is not 
affected. So tables and functions are resolved in the same way but with 
built-in functions that have priority as in the original design.

Regards,
Timo


On 03.09.19 15:26, Kurt Young wrote:
> Does this only affect the functions and operations we currently have in SQL
> and
> have no effect on tables, right? Looks like this is an orthogonal concept
> with Catalog?
> If the answer are both yes, then the catalog function will be a weird
> concept?
>
> Best,
> Kurt
>
>
> On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com> wrote:
>
>> The way you proposed are basically the same as what Calcite does, I think
>> we are in the same line.
>>
>> Best,
>> Danny Chan
>> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
>>> This sounds exactly as the module approach I mentioned, no?
>>>
>>> Regards,
>>> Timo
>>>
>>> On 03.09.19 13:42, Danny Chan wrote:
>>>> Thanks Bowen for bring up this topic, I think it’s a useful
>> refactoring to make our function usage more user friendly.
>>>> For the topic of how to organize the builtin operators and operators
>> of Hive, here is a solution from Apache Calcite, the Calcite way is to make
>> every dialect operators a “Library”, user can specify which libraries they
>> want to use for a sql query. The builtin operators always comes as the
>> first class objects and the others are used from the order they appears.
>> Maybe you can take a reference.
>>>> [1]
>> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>>>> Best,
>>>> Danny Chan
>>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
>>>>> Hi folks,
>>>>>
>>>>> I'd like to kick off a discussion on reworking Flink's
>> FunctionCatalog.
>>>>> It's critically helpful to improve function usability in SQL.
>>>>>
>>>>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>>>> In short, it:
>>>>> - adds support for precise function reference with fully/partially
>>>>> qualified name
>>>>> - redefines function resolution order for ambiguous function
>> reference
>>>>> - adds support for Hive's rich built-in functions (support for Hive
>> user
>>>>> defined functions was already added in 1.9.0)
>>>>> - clarifies the concept of temporary functions
>>>>>
>>>>> Would love to hear your thoughts.
>>>>>
>>>>> Bowen
>>>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
So this only affects the functions and operations we currently have in SQL
and has no effect on tables, right? And it looks like this is a concept
orthogonal to Catalog?
If the answer to both is yes, then won't the catalog function be a weird
concept?

Best,
Kurt


On Tue, Sep 3, 2019 at 8:10 PM Danny Chan <yu...@gmail.com> wrote:

> The way you proposed are basically the same as what Calcite does, I think
> we are in the same line.
>
> Best,
> Danny Chan
> 在 2019年9月3日 +0800 PM7:57,Timo Walther <tw...@apache.org>,写道:
> > This sounds exactly as the module approach I mentioned, no?
> >
> > Regards,
> > Timo
> >
> > On 03.09.19 13:42, Danny Chan wrote:
> > > Thanks Bowen for bring up this topic, I think it’s a useful
> refactoring to make our function usage more user friendly.
> > >
> > > For the topic of how to organize the builtin operators and operators
> of Hive, here is a solution from Apache Calcite, the Calcite way is to make
> every dialect operators a “Library”, user can specify which libraries they
> want to use for a sql query. The builtin operators always comes as the
> first class objects and the others are used from the order they appears.
> Maybe you can take a reference.
> > >
> > > [1]
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> > > > Hi folks,
> > > >
> > > > I'd like to kick off a discussion on reworking Flink's
> FunctionCatalog.
> > > > It's critically helpful to improve function usability in SQL.
> > > >
> > > >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > > >
> > > > In short, it:
> > > > - adds support for precise function reference with fully/partially
> > > > qualified name
> > > > - redefines function resolution order for ambiguous function
> reference
> > > > - adds support for Hive's rich built-in functions (support for Hive
> user
> > > > defined functions was already added in 1.9.0)
> > > > - clarifies the concept of temporary functions
> > > >
> > > > Would love to hear your thoughts.
> > > >
> > > > Bowen
> >
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Danny Chan <yu...@gmail.com>.
What you proposed is basically the same as what Calcite does; I think we are on the same page.

Best,
Danny Chan
On Sep 3, 2019 at 7:57 PM +0800, Timo Walther <tw...@apache.org> wrote:
> This sounds exactly as the module approach I mentioned, no?
>
> Regards,
> Timo
>
> On 03.09.19 13:42, Danny Chan wrote:
> > Thanks Bowen for bring up this topic, I think it’s a useful refactoring to make our function usage more user friendly.
> >
> > For the topic of how to organize the builtin operators and operators of Hive, here is a solution from Apache Calcite, the Calcite way is to make every dialect operators a “Library”, user can specify which libraries they want to use for a sql query. The builtin operators always comes as the first class objects and the others are used from the order they appears. Maybe you can take a reference.
> >
> > [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >
> > Best,
> > Danny Chan
> > 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
> > > Hi folks,
> > >
> > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > > It's critically helpful to improve function usability in SQL.
> > >
> > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> > >
> > > In short, it:
> > > - adds support for precise function reference with fully/partially
> > > qualified name
> > > - redefines function resolution order for ambiguous function reference
> > > - adds support for Hive's rich built-in functions (support for Hive user
> > > defined functions was already added in 1.9.0)
> > > - clarifies the concept of temporary functions
> > >
> > > Would love to hear your thoughts.
> > >
> > > Bowen
>
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
This sounds exactly like the module approach I mentioned, no?

Regards,
Timo

On 03.09.19 13:42, Danny Chan wrote:
> Thanks Bowen for bring up this topic, I think it’s a useful refactoring to make our function usage more user friendly.
>
> For the topic of how to organize the builtin operators and operators of Hive, here is a solution from Apache Calcite, the Calcite way is to make every dialect operators a “Library”, user can specify which libraries they want to use for a sql query. The builtin operators always comes as the first class objects and the others are used from the order they appears. Maybe you can take a reference.
>
> [1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
>
> Best,
> Danny Chan
> 在 2019年8月28日 +0800 AM2:50,Bowen Li <bo...@gmail.com>,写道:
>> Hi folks,
>>
>> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
>> It's critically helpful to improve function usability in SQL.
>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>
>> In short, it:
>> - adds support for precise function reference with fully/partially
>> qualified name
>> - redefines function resolution order for ambiguous function reference
>> - adds support for Hive's rich built-in functions (support for Hive user
>> defined functions was already added in 1.9.0)
>> - clarifies the concept of temporary functions
>>
>> Would love to hear your thoughts.
>>
>> Bowen



Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Danny Chan <yu...@gmail.com>.
Thanks Bowen for bringing up this topic. I think it’s a useful refactoring to make our function usage more user friendly.

On the topic of how to organize the built-in operators and the operators of Hive, here is a solution from Apache Calcite: Calcite makes each dialect's operators a “Library”, and the user can specify which libraries they want to use for a SQL query. The built-in operators always come first, as first-class objects, and the others are used in the order in which they appear. Maybe you can take it as a reference.

[1] https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28

Best,
Danny Chan
On Aug 28, 2019 at 2:50 AM +0800, Bowen Li <bo...@gmail.com> wrote:
> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Timo Walther <tw...@apache.org>.
Hi Bowen,

thanks for your proposal. Here are some thoughts:

1) We should not have the restriction "hive built-in functions can only 
be used when current catalog is hive catalog". Switching a catalog 
should only have implications on the cat.db.object resolution but not 
functions. It would be quite convenient for users to use Hive built-ins
even if they use a Confluent schema registry or just the in-memory catalog.

2) I would propose to have separate concepts for catalog and built-in 
functions. In particular it would be nice to modularize built-in 
functions. Some built-in functions are very crucial (like AS, CAST, 
MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe 
we add more experimental functions in the future or functions for some
special application area (Geo functions, ML functions). A data platform
team might not want to make every built-in function available. Or a 
function module like ML functions is in a different Maven module.

3) Following the suggestion above, we can have a separate discovery 
mechanism for built-in functions. Instead of just going through a static 
list like in BuiltInFunctionDefinitions, a platform team should be able 
to select function modules like 
catalogManager.setFunctionModules(CoreFunctions, GeoFunctions, 
HiveFunctions) or via service discovery;

4) Dawid and I discussed the resolution order again. I agree with Kurt
that we should unify built-in functions (external or internal) under a
common layer. However, the resolution order should be:
   1. built-in functions
   2. temporary functions
   3. regular catalog resolution logic
Otherwise a temporary function could cause clashes with Flink's built-in 
functions. If you take a look at other vendors, like SQL Server, they
also do not allow overwriting built-in functions.
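
As a rough illustration of the module idea and the resolution order above,
here is a minimal Java sketch. Everything in it is hypothetical:
FunctionModule and ModularFunctionResolver are made-up names, only
setFunctionModules echoes the call suggested above, and checking the
selected modules in the order they were passed is an assumption of the
sketch, not something this proposal specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical: a named group of built-in functions (core, geo, Hive, ...).
class FunctionModule {
    final String name;
    final Map<String, String> functions = new HashMap<>();

    FunctionModule(String name, String... functionNames) {
        this.name = name;
        for (String function : functionNames) {
            functions.put(function, name + "." + function);
        }
    }
}

// Hypothetical resolver: built-in modules, then temporary, then catalog.
class ModularFunctionResolver {
    private final List<FunctionModule> modules = new ArrayList<>();
    private final Map<String, String> temporaryFunctions = new HashMap<>();
    private final Map<String, String> catalogFunctions = new HashMap<>();

    void setFunctionModules(FunctionModule... selected) {
        modules.clear();
        modules.addAll(Arrays.asList(selected));
    }

    void registerTemporary(String name) {
        temporaryFunctions.put(name, "temporary." + name);
    }

    void registerCatalogFunction(String name) {
        catalogFunctions.put(name, "catalog." + name);
    }

    Optional<String> resolve(String name) {
        // 1. Built-in functions, module by module (order is an assumption).
        for (FunctionModule module : modules) {
            String hit = module.functions.get(name);
            if (hit != null) {
                return Optional.of(hit);
            }
        }
        // 2. Temporary functions.
        String temporary = temporaryFunctions.get(name);
        if (temporary != null) {
            return Optional.of(temporary);
        }
        // 3. Regular catalog resolution, flattened to a single map here.
        return Optional.ofNullable(catalogFunctions.get(name));
    }
}

class ModuleResolutionDemo {
    public static void main(String[] args) {
        ModularFunctionResolver resolver = new ModularFunctionResolver();
        resolver.setFunctionModules(
                new FunctionModule("core", "CAST", "MD5"),
                new FunctionModule("hive", "MD5", "get_json_object"));
        resolver.registerTemporary("CAST");   // cannot shadow the built-in
        resolver.registerTemporary("my_temp");
        resolver.registerCatalogFunction("my_udf");

        for (String name : new String[] {"CAST", "MD5", "get_json_object", "my_temp", "my_udf"}) {
            System.out.println(name + " -> " + resolver.resolve(name).orElse("not found"));
        }
    }
}
```

With this order, registering a temporary function called CAST has no
effect on queries, which is the clash the proposed order is meant to avoid.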

Regards,
Timo


On 03.09.19 10:35, JingsongLee wrote:
> Thanks Bowen:
>
> +1 for this. And +1 to Kurt's suggestion. My other points are:
>
> 1.Hive built-in functions is an intermediate solution. So we should
>   not introduce interfaces to influence the framework. To make
>   Flink itself more powerful, we should implement the functions
>   we need to add.
>
> 2.Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
>   future but act differently from non-flink built-in, this will lead to
>   changes in user behavior.
>
> 3.Fallback to Non-flink built-in functions is a bad choice to
>   performance. Without flink internal codegen and data format,
>   and bring data format conversion, the performance is not so
>   good.
>
> We need to support more complete hive jobs now, we need to
>   have this fallback strategy. But it's not worth adding this
>   concept at the catalog interface level, and it's not worth
>   encouraging other catalogs to do so.
>
> Another question is, does this fallback include all
>   hive built-in functions? As far as I know, some hive functions
>   have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Kurt Young <yk...@gmail.com>
> Send Time:2019年9月3日(星期二) 15:41
> To:dev <de...@flink.apache.org>
> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
>
> Thanks Bowen for driving this.
>
> +1 for the general idea. It makes the function resolved behavior more
> clear and deterministic. Besides, the user can use all hive built-in
> functions, which is a great feature.
>
> I only have one comment, but maybe it may touch your design so I think
> it would make sense to reply this mail instead of comment on google doc.
> Regarding to the classfication of functions, you currently have 4 types
> of functions, which are:
> 1. temporary functions
> 2. Flink built-in functions
> 3. Hive built-in functions (or generalized as external built-in functions)
> 4. catalog functions
>
> What I want to propose is we can merge #3 and #4, make them both under
> "catalog" concept, by extending catalog function to make it have ability to
> have built-in catalog functions. Some benefits I can see from this approach:
> 1. We don't have to introduce new concept like external built-in functions.
> Actually
> I don't see a full story about how to treat a built-in functions, and it
> seems a little
> bit disrupt with catalog. As a result, you have to make some restriction
> like "hive
> built-in functions can only be used when current catalog is hive catalog".
>
> 2. It makes us easier to adopt another system's built-in functions to
> Flink, such as
> MySQL. If we don't treat uniformly with  "external built-in functions" and
> "external
> catalog function", things like user set current catalog to hive but want to
> use MySQL's
> built-in function will happen.
>
> One more thing, follow this approach, it's clear for your question about
> how to support
> external built-in functions, which is "add a  getBuiltInFunction to current
> Catalog API".
>
> What do you think?
>
> Best,
> Kurt
>
>
> On Fri, Aug 30, 2019 at 7:14 AM Bowen Li <bo...@gmail.com> wrote:
>
>> Thanks everyone for the feedback.
>>
>> I have updated the document accordingly. Here're the summary of changes:
>>
>> - clarify the concept of temporary functions, to facilitate deciding
>> function resolution order
>> - provide two options to support Hive built-in functions, with the 2nd one
>> being preferred
>> - add detailed prototype code for FunctionCatalog#lookupFunction(name)
>> - move the section of ”rename existing FunctionCatalog APIs in favor of
>> temporary functions“ out of the scope of the FLIP
>> - add another reasonable limitation for function resolution, to not
>> consider resolving overloaded functions - those with the same name but
>> different params. (It's still valid to have a single function with
>> overloaded eval() methods)
>>
>> Please take another look.
>>
>> Thanks,
>> Bowen
>>
>> On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <bo...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
>>> It's critically helpful to improve function usability in SQL.
>>>
>>>
>>>
>> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>>> In short, it:
>>> - adds support for precise function reference with fully/partially
>>> qualified name
>>> - redefines function resolution order for ambiguous function reference
>>> - adds support for Hive's rich built-in functions (support for Hive user
>>> defined functions was already added in 1.9.0)
>>> - clarifies the concept of temporary functions
>>>
>>> Would love to hear your thoughts.
>>>
>>> Bowen
>>>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by JingsongLee <lz...@aliyun.com.INVALID>.
Thanks Bowen:

+1 for this. And +1 to Kurt's suggestion. My other points are:

1. Hive built-in functions are an intermediate solution, so we should
not introduce interfaces that influence the framework. To make
Flink itself more powerful, we should implement the functions
we need ourselves.

2. Non-Flink built-in functions make it easy for user-visible behavior
to change. If we later support a function as a Flink built-in that
acts differently from the non-Flink built-in, this will lead to
changes in user behavior.

3. Falling back to non-Flink built-in functions is a bad choice for
performance. Without Flink's internal codegen and data format,
and with the extra data format conversion, the performance is not so
good.

We need to support more complete Hive jobs now, so we need to
have this fallback strategy. But it's not worth adding this
concept at the catalog interface level, and it's not worth
encouraging other catalogs to do so.

Another question: does this fallback include all
Hive built-in functions? As far as I know, some Hive functions
are somewhat hacky. If possible, can we start with a whitelist?
Once we implement some functions as Flink built-ins, we can
also update the whitelist.
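
A minimal Java sketch of what such a whitelist could look like follows.
All names in it (WhitelistedHiveFallback, promoteToFlinkBuiltIn, the
"flink:" and "hive:" result strings) are hypothetical, and reading "update
the whitelist" as "remove the function once Flink implements it" is an
assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a whitelist-guarded fallback to Hive built-ins.
class WhitelistedHiveFallback {
    private final Set<String> flinkBuiltIns = new HashSet<>();
    private final Set<String> hiveBuiltIns = new HashSet<>();
    private final Set<String> whitelist = new HashSet<>();

    void addFlinkBuiltIn(String name) {
        flinkBuiltIns.add(name);
    }

    void addHiveBuiltIn(String name, boolean whitelisted) {
        hiveBuiltIns.add(name);
        if (whitelisted) {
            whitelist.add(name);
        }
    }

    // Flink's own built-ins win; Hive built-ins are used only if whitelisted.
    Optional<String> resolve(String name) {
        if (flinkBuiltIns.contains(name)) {
            return Optional.of("flink:" + name);
        }
        if (hiveBuiltIns.contains(name) && whitelist.contains(name)) {
            return Optional.of("hive:" + name);
        }
        return Optional.empty();
    }

    // Assumed meaning of "update the whitelist": once Flink implements the
    // function natively, the Hive fallback entry is no longer needed.
    void promoteToFlinkBuiltIn(String name) {
        flinkBuiltIns.add(name);
        whitelist.remove(name);
    }
}

class WhitelistDemo {
    public static void main(String[] args) {
        WhitelistedHiveFallback functions = new WhitelistedHiveFallback();
        functions.addFlinkBuiltIn("MD5");
        functions.addHiveBuiltIn("get_json_object", true);
        functions.addHiveBuiltIn("reflect", false); // not whitelisted in this example

        System.out.println(functions.resolve("get_json_object").orElse("not available"));
        System.out.println(functions.resolve("reflect").orElse("not available"));

        functions.promoteToFlinkBuiltIn("get_json_object");
        System.out.println(functions.resolve("get_json_object").orElse("not available"));
    }
}
```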

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <yk...@gmail.com>
Send Time: Sep 3, 2019 (Tuesday) 15:41
To:dev <de...@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Thanks Bowen for driving this.

+1 for the general idea. It makes the function resolved behavior more
clear and deterministic. Besides, the user can use all hive built-in
functions, which is a great feature.

I only have one comment, but maybe it may touch your design so I think
it would make sense to reply this mail instead of comment on google doc.
Regarding to the classfication of functions, you currently have 4 types
of functions, which are:
1. temporary functions
2. Flink built-in functions
3. Hive built-in functions (or generalized as external built-in functions)
4. catalog functions

What I want to propose is we can merge #3 and #4, make them both under
"catalog" concept, by extending catalog function to make it have ability to
have built-in catalog functions. Some benefits I can see from this approach:
1. We don't have to introduce new concept like external built-in functions.
Actually
I don't see a full story about how to treat a built-in functions, and it
seems a little
bit disrupt with catalog. As a result, you have to make some restriction
like "hive
built-in functions can only be used when current catalog is hive catalog".

2. It makes us easier to adopt another system's built-in functions to
Flink, such as
MySQL. If we don't treat uniformly with  "external built-in functions" and
"external
catalog function", things like user set current catalog to hive but want to
use MySQL's
built-in function will happen.

One more thing, follow this approach, it's clear for your question about
how to support
external built-in functions, which is "add a  getBuiltInFunction to current
Catalog API".

What do you think?

Best,
Kurt


On Fri, Aug 30, 2019 at 7:14 AM Bowen Li <bo...@gmail.com> wrote:

> Thanks everyone for the feedback.
>
> I have updated the document accordingly. Here're the summary of changes:
>
> - clarify the concept of temporary functions, to facilitate deciding
> function resolution order
> - provide two options to support Hive built-in functions, with the 2nd one
> being preferred
> - add detailed prototype code for FunctionCatalog#lookupFunction(name)
> - move the section of ”rename existing FunctionCatalog APIs in favor of
> temporary functions“ out of the scope of the FLIP
> - add another reasonable limitation for function resolution, to not
> consider resolving overloaded functions - those with the same name but
> different params. (It's still valid to have a single function with
> overloaded eval() methods)
>
> Please take another look.
>
> Thanks,
> Bowen
>
> On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > It's critically helpful to improve function usability in SQL.
> >
> >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with fully/partially
> > qualified name
> > - redefines function resolution order for ambiguous function reference
> > - adds support for Hive's rich built-in functions (support for Hive user
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Kurt Young <yk...@gmail.com>.
Thanks Bowen for driving this.

+1 for the general idea. It makes function resolution behavior clearer
and more deterministic. Besides, users can use all Hive built-in
functions, which is a great feature.

I only have one comment, but it may touch your design, so I think it
makes more sense to reply to this mail than to comment on the Google doc.
Regarding the classification of functions, you currently have 4 types
of functions, which are:
1. temporary functions
2. Flink built-in functions
3. Hive built-in functions (or generalized as external built-in functions)
4. catalog functions

What I want to propose is that we merge #3 and #4 and put them both under
the "catalog" concept, by extending catalogs so that they can also
provide built-in catalog functions. Some benefits I can see from this approach:
1. We don't have to introduce a new concept like external built-in functions.
Actually, I don't see a full story for how to treat such built-in functions,
and the concept seems to clash a bit with catalogs. As a result, you have to
add restrictions like "hive built-in functions can only be used when the
current catalog is hive catalog".

2. It makes it easier for us to bring another system's built-in functions,
such as MySQL's, into Flink. If we don't treat "external built-in functions"
and "external catalog functions" uniformly, we will run into cases where a
user sets the current catalog to hive but wants to use MySQL's built-in
functions.

One more thing: with this approach, the answer to your question about how
to support external built-in functions becomes clear, namely "add a
getBuiltInFunction to the current Catalog API".
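
To make the idea concrete, here is a rough Java sketch of what a catalog
that also exposes built-in functions could look like. It is only an
illustration under stated assumptions: SimpleCatalog, InMemoryCatalog and
lookup are made-up names and not Flink's actual Catalog interface,
functions are represented by plain strings, and checking a catalog's own
functions before its built-in ones is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CatalogBuiltInSketch {

    /** Hypothetical, simplified catalog; NOT Flink's real Catalog interface. */
    public interface SimpleCatalog {
        /** Functions that users registered in the catalog. */
        Optional<String> getFunction(String name);

        /** Proposed addition: built-in functions the external system ships with. */
        Optional<String> getBuiltInFunction(String name);
    }

    /** In-memory implementation used only for this illustration. */
    public static final class InMemoryCatalog implements SimpleCatalog {
        private final Map<String, String> catalogFunctions = new HashMap<>();
        private final Map<String, String> builtInFunctions = new HashMap<>();

        public InMemoryCatalog withFunction(String name, String implementation) {
            catalogFunctions.put(name, implementation);
            return this;
        }

        public InMemoryCatalog withBuiltIn(String name, String implementation) {
            builtInFunctions.put(name, implementation);
            return this;
        }

        @Override
        public Optional<String> getFunction(String name) {
            return Optional.ofNullable(catalogFunctions.get(name));
        }

        @Override
        public Optional<String> getBuiltInFunction(String name) {
            return Optional.ofNullable(builtInFunctions.get(name));
        }
    }

    /**
     * Looks a function up in the named catalog. Trying catalog functions first
     * and built-in functions second is an assumption of this sketch.
     */
    public static Optional<String> lookup(
            Map<String, SimpleCatalog> catalogs, String catalogName, String functionName) {
        SimpleCatalog catalog = catalogs.get(catalogName);
        if (catalog == null) {
            return Optional.empty();
        }
        Optional<String> catalogFunction = catalog.getFunction(functionName);
        return catalogFunction.isPresent()
                ? catalogFunction
                : catalog.getBuiltInFunction(functionName);
    }

    public static void main(String[] args) {
        Map<String, SimpleCatalog> catalogs = new HashMap<>();
        catalogs.put("hive", new InMemoryCatalog()
                .withBuiltIn("get_json_object", "hive built-in get_json_object")
                .withFunction("my_udf", "hive catalog function my_udf"));
        catalogs.put("mysql", new InMemoryCatalog()
                .withBuiltIn("inet_aton", "mysql built-in inet_aton"));

        // Built-in functions are reached through whichever catalog owns them,
        // so no separate "external built-in" concept is needed here.
        System.out.println(lookup(catalogs, "hive", "get_json_object"));
        System.out.println(lookup(catalogs, "mysql", "inet_aton"));
        System.out.println(lookup(catalogs, "hive", "inet_aton"));
    }
}
```

In this sketch a MySQL built-in is found by naming the mysql catalog, even
if the current catalog were hive, which is the uniform treatment argued
for in point 2.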

What do you think?

Best,
Kurt


On Fri, Aug 30, 2019 at 7:14 AM Bowen Li <bo...@gmail.com> wrote:

> Thanks everyone for the feedback.
>
> I have updated the document accordingly. Here is a summary of the changes:
>
> - clarify the concept of temporary functions, to facilitate deciding
> function resolution order
> - provide two options to support Hive built-in functions, with the 2nd one
> being preferred
> - add detailed prototype code for FunctionCatalog#lookupFunction(name)
> - move the section "rename existing FunctionCatalog APIs in favor of
> temporary functions" out of the scope of the FLIP
> - add another reasonable limitation for function resolution, to not
> consider resolving overloaded functions - those with the same name but
> different params. (It's still valid to have a single function with
> overloaded eval() methods)
>
> Please take another look.
>
> Thanks,
> Bowen
>
> On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <bo...@gmail.com> wrote:
>
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > It's critically helpful to improve function usability in SQL.
> >
> >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with fully/partially
> > qualified name
> > - redefines function resolution order for ambiguous function reference
> > - adds support for Hive's rich built-in functions (support for Hive user
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Posted by Bowen Li <bo...@gmail.com>.
Thanks everyone for the feedback.

I have updated the document accordingly. Here is a summary of the changes:

- clarify the concept of temporary functions, to facilitate deciding
function resolution order
- provide two options to support Hive built-in functions, with the 2nd one
being preferred
- add detailed prototype code for FunctionCatalog#lookupFunction(name)
- move the section "rename existing FunctionCatalog APIs in favor of
temporary functions" out of the scope of the FLIP
- add another reasonable limitation for function resolution, to not
consider resolving overloaded functions - those with the same name but
different params. (It's still valid to have a single function with
overloaded eval() methods)
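
For readers who want a feel for what name-based resolution of an ambiguous
reference could look like, here is a small Java sketch. It is not the
prototype code from the document. FunctionSource, lookupFunction and
exampleSources are hypothetical names, functions are plain strings, and the
order used below (temporary, Flink built-in, Hive built-in, catalog) is
assumed purely for illustration; the actual order is whatever the FLIP
defines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FunctionResolutionSketch {

    /** One category of functions, keyed by function name only (hypothetical type). */
    public static final class FunctionSource {
        private final String category;
        private final Map<String, String> functions = new HashMap<>();

        public FunctionSource(String category) {
            this.category = category;
        }

        public FunctionSource register(String name, String implementation) {
            // The key is the name alone, so two functions that differ only in
            // their parameters cannot coexist within one source.
            functions.put(name, implementation);
            return this;
        }

        public Optional<String> find(String name) {
            return Optional.ofNullable(functions.get(name))
                    .map(implementation -> category + ": " + implementation);
        }
    }

    /** Returns the first match, scanning the sources in the order given. */
    public static Optional<String> lookupFunction(List<FunctionSource> sourcesInOrder, String name) {
        for (FunctionSource source : sourcesInOrder) {
            Optional<String> match = source.find(name);
            if (match.isPresent()) {
                return match;
            }
        }
        return Optional.empty();
    }

    /** Example data; the order of this list is an assumption of the sketch. */
    public static List<FunctionSource> exampleSources() {
        List<FunctionSource> sources = new ArrayList<>();
        sources.add(new FunctionSource("temporary").register("concat", "session override"));
        sources.add(new FunctionSource("flink built-in")
                .register("concat", "Flink concat")
                .register("upper", "Flink upper"));
        sources.add(new FunctionSource("hive built-in")
                .register("get_json_object", "Hive get_json_object"));
        sources.add(new FunctionSource("catalog").register("my_udf", "user-defined function"));
        return sources;
    }

    public static void main(String[] args) {
        List<FunctionSource> sources = exampleSources();
        // "concat" exists in two sources; the earlier source in the list wins.
        System.out.println(lookupFunction(sources, "concat"));
        System.out.println(lookupFunction(sources, "upper"));
        System.out.println(lookupFunction(sources, "my_udf"));
        System.out.println(lookupFunction(sources, "does_not_exist"));
    }
}
```

Because the lookup takes only a name, it returns a single function per
name; any overloading would live inside that one function (for example as
overloaded eval() methods), in line with the limitation in the last
bullet.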

Please take another look.

Thanks,
Bowen

On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <bo...@gmail.com> wrote:

> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>