You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Dawid Wysakowicz <dw...@apache.org> on 2019/10/01 12:29:01 UTC

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Hi,

I think we're really close to a very good design for the functions.
Thank you to all involved in the discussion and especially Bowen for
driving the FLIP.

The last part I'm missing in the FLIP is a bit more elaborate
description of the FunctionIdentifier. I gave it a bit more thought and
discussed it with Timo and I think the purpose of the FunctionIdentifier
is to unambigiously reference a function. That said it should be
effectively either identifier of a built-in function or a catalog
function. In order to do that I would suggest a slight change to the class:

class FunctionIdentifier {

  public FunctionIdentifier ofBuiltinFunction(String name) {

        return new FunctionIdentifier(name, null);

  }

  public FunctionIdentifier ofCatalogFunction(ObjectIdentifier identifier){

        return new FunctionIdentifier(null, identifier);

  }

  public Optional<String> getBuiltinFunctionIdentifier() {

       return Optional.ofNullable(name);

  }

  public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {

       return Optional.ofNullable(identifier);

  }

  private FunctionIdentifier(String name, ObjectIdentifier identifier) {

     this.name = name;

     this.identifier = identifier

  }

}

Moreover as we are changing the way functions are referenced we need to
update classes like UnresolvedCallExpression and CallExpression. As
those are PublicEvolving classes we should cover those changes in the
FLIP. Lastly, as those classes need a serializable representation, we
need a serializable representation for FunctionIdentifier as well.
Therefore we need to add a method to the FunctionIdentifier class:

    public String asSerializableString() {...}

similar to the method in the ObjectIdentifier.

Let me know what do you think.

Best,

Dawid

On 30/09/2019 23:23, Bowen Li wrote:
> Hi all,
>
> I've updated the FLIP wiki with the following changes:
>
> - Lifespan of temp functions are not tied to those of catalogs and
> databases. Users can create temp functions even though catalogs/dbs in
> their fully qualified names don't even exist.
> - some new SQL commands
>     - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
> functions, and names of temp and catalog functions in the current catalog
> and db
>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
> functions, and fully qualified names of temp and catalog functions in all
> catalogs and dbs
>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
> functions in all catalog and db
>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
> functions
>
> Let me know if you have any questions.
>
> Seems we have resolved all concerns. If there's no more ones, I'd like to
> close the vote by this time tomorrow.
>
> Cheers,
> Bowen
>
> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <bo...@gmail.com> wrote:
>
>> Hi,
>>
>> I think above are some valid points, and we can adopt the suggestions.
>>
>> To elaborate a bit on the new SQL syntax, it would imply that, unlike
>> "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
>> FUNCTIONS" would return functions' fully qualified names with catalog and
>> db names.
>>
>>
>>
>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <tw...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> I support Fabian's arguments. In my opinion, temporary objects should
>>> just be an additional layer on top of the regular catalog/database
>>> lookup logic. Thus, a temporary table or function has always highest
>>> precedence and should be stable within the local session. Otherwise it
>>> could magically disappear while someone else is performing modifications
>>> in the catalog.
>>>
>>> Furthermore, this feature is very useful for prototyping as users can
>>> simply express that a catalog/database is present even through they
>>> might not have access to it currently.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 30.09.19 14:57, Fabian Hueske wrote:
>>>> Hi all,
>>>>
>>>> Sorry for the late reply.
>>>>
>>>> I think it might lead to confusing situations if temporary functions (or
>>>> any temporary db objects for that matter) are bound to the life cycle
>>> of an
>>>> (external) db/catalog.
>>>> Imaging a situation where you create a temp function in a db in an
>>> external
>>>> catalog and use it but at some point it does not work anymore because
>>> some
>>>> other dropped the database from the external catalog.
>>>> Shouldn't temporary objects be only controlled by the owner of a
>>> session?
>>>> I agree that creating temp objects in non-existing db/catalogs sounds a
>>> bit
>>>> strange, but IMO the opposite (the db/catalog must exist for a temp
>>>> function to be created/exist) can have significant implications like the
>>>> one I described.
>>>> I think it would be quite easy for users to understand that temporary
>>>> objects are solely owned by them (and their session).
>>>> The problem of listing temporary objects could be solved by adding a ALL
>>>> [TEMPORARY] clause:
>>>>
>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
>>>> catalog/database including temporary functions.
>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>>> regardless
>>>> of the catalog/database.
>>>>
>>>> Best,
>>>> Fabian
>>>>
>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>>> bowenli86@gmail.com>:
>>>>> @Dawid, do you have any other concerns? If not, I hope we can close the
>>>>> voting.
>>>>>
>>>>>
>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <li...@gmail.com> wrote:
>>>>>
>>>>>> I'm not sure how much benefit #1 can bring us. If users just want to
>>> try
>>>>>> out temporary functions, they can create temporary system functions
>>> which
>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
>>> temporary
>>>>>> catalog function is to let users override permanent catalog functions.
>>>>>> Therefore a temporary function in a non-existing catalog won't serve
>>> that
>>>>>> purpose. Besides, each session is provided with a default catalog and
>>> DB.
>>>>>> So even if users simply want to create some catalog functions they can
>>>>>> forget about after the session, wouldn't the default catalog/DB be
>>> enough
>>>>>> for such experiments?
>>>>>>
>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <bo...@gmail.com> wrote:
>>>>>>
>>>>>>> Re 1) As described in the FLIP, a temp function lookup will first
>>> make
>>>>>> sure
>>>>>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
>>>>>> remove
>>>>>>> that temp function.
>>>>>>>
>>>>>>> I agree Hive doesn't handle it consistently, and we are not copying
>>>>> Hive.
>>>>>>> IMHO, allowing registering temp functions in nonexistent catalog/db
>>> is
>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
>>> system
>>>>>>> functions and functions in the current catalog/db, since users cannot
>>>>>>> designate a nonexistent catalog/db as current ones, how can they list
>>>>>>> functions in nonexistent catalog/db? They may end up never knowing
>>> what
>>>>>>> temp functions they've created unless trying out with queries or we
>>>>>>> introducing some more nonstandard SQL statements. The same applies to
>>>>>> other
>>>>>>> temp objects like temp tables.
>>>>>>>
>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>>>>>>>
>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>>>>>>> wysakowicz.dawid@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ad. 1
>>>>>>>> I wouldn't say it is hacky.
>>>>>>>> Moreover, how do you want ensure that the dB always exists when a
>>>>>>> temporary
>>>>>>>> object is used?( in this particular case function). Do you want to
>>>>>> query
>>>>>>>> for the database existence whenever e.g a temporary function is
>>>>> used? I
>>>>>>>> think important aspect here is that the database can be dropped from
>>>>>>>> external system, not just flink or a different flink session.
>>>>>>>>
>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
>>>>> database
>>>>>>> that
>>>>>>>> does not exist, that's true. But if you create a temporary table in
>>> a
>>>>>>>> database and drop that database from a different session, you can
>>>>> still
>>>>>>>> query the previously created temporary table from the original
>>>>> session.
>>>>>>> It
>>>>>>>> does not sound like a consistent behaviour to me. Why don't we make
>>>>>> this
>>>>>>>> behaviour of not binding a temporary objects to the lifetime of a
>>>>>>> database
>>>>>>>> explicit as part of the temporary objects contract? In the end they
>>>>>> exist
>>>>>>>> in different layers. Permanent objects & databases in a catalog (in
>>>>>> case
>>>>>>> of
>>>>>>>> hive megastore) whereas temporary objects in flink sessions. That's
>>>>>> also
>>>>>>>> true for the original hive client. The temporary objects live in the
>>>>>> hive
>>>>>>>> client whereas databases are created in the metastore.
>>>>>>>>
>>>>>>>> Ad.2
>>>>>>>> I'm open for suggestions here. The one thing I wanted to achieve
>>> here
>>>>>> is
>>>>>>> so
>>>>>>>> that we do not change the contract of ObjectIdentifier. One
>>> important
>>>>>>> thing
>>>>>>>> to remember here is that we need the function identifier to be part
>>>>> of
>>>>>>> the
>>>>>>>> FunctionDefinition object and not only as the result of the function
>>>>>>>> lookup. At some point we want to be able to store QueryOperations in
>>>>>> the
>>>>>>>> catalogs. They can contain function calls within which we need to
>>>>> have
>>>>>>> the
>>>>>>>> identifier.
>>>>>>>>
>>>>>>>> I agree my initial suggestion is over complicated. How about we have
>>>>>> just
>>>>>>>> the FunctionIdentifier as top level class without making the
>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the same
>>>>> what
>>>>>>> you
>>>>>>>> suggested. The only difference is that it would be a top level class
>>>>>>> with a
>>>>>>>> more descriptive name.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <bo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sorry, I missed some parts of the solution. The complete
>>>>> alternative
>>>>>> is
>>>>>>>> the
>>>>>>>>> following, basically having separate APIs in FunctionLookup for
>>>>>>> ambiguous
>>>>>>>>> and precise function lookup since planner is able to tell which API
>>>>>> to
>>>>>>>> call
>>>>>>>>> with parsed queries, and have a unified result:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> class FunctionLookup {
>>>>>>>>>
>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> class Result {
>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>>>>>>>>>      String getName() { ... }
>>>>>>>>>      // ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Bowen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <bo...@gmail.com>
>>>>>> wrote:
>>>>>>>>>> Hi Dawid,
>>>>>>>>>>
>>>>>>>>>> Re 1): I agree making it easy for users to run experiments is
>>>>>>>> important.
>>>>>>>>>> However, I'm not sure allowing users to register temp functions
>>>>> in
>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
>>>>>> and
>>>>>>>>> breaks
>>>>>>>>>> the current contract between Flink and users that catalog/db must
>>>>>> be
>>>>>>>>> valid
>>>>>>>>>> in order to operate on.
>>>>>>>>>>
>>>>>>>>>> How about we instead focus on making it convenient to create
>>>>>>> catalogs?
>>>>>>>>>> Users actually can already do it with ease via program or SQL CLI
>>>>>>> yaml
>>>>>>>>> file
>>>>>>>>>> for an in-memory catalog which has neither extra dependency nor
>>>>>>>> external
>>>>>>>>>> connections. What we can further improve is DDL for catalogs,
>>>>> and I
>>>>>>>>> raised
>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
>>>>> by
>>>>>>>> Terry
>>>>>>>>>> now.
>>>>>>>>>>
>>>>>>>>>> In that case, if users would like to experiment via SQL, they can
>>>>>>>> easily
>>>>>>>>>> create an in memory catalog/database using DDL, then play with
>>>>> temp
>>>>>>>>>> functions.
>>>>>>>>>>
>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
>>>>> not
>>>>>>>> been
>>>>>>>>>> resolved when stack call reaches
>>>>> FunctionCatalog#lookupFunction(),
>>>>>>> but
>>>>>>>>> only
>>>>>>>>>> been parsed?
>>>>>>>>>>
>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with
>>>>>> the
>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass of
>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>>>>>>>>>>
>>>>>>>>>> Another potentially simpler way is:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> // in class FunctionLookup
>>>>>>>>>> class Result {
>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>>>>>>>>>>      String getName() { ... }
>>>>>>>>>>      ...
>>>>>>>>>> }
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>>>>>>>>>> wysakowicz.dawid@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I really like the flip and think it clarifies important aspects
>>>>> of
>>>>>>> the
>>>>>>>>>>> system.
>>>>>>>>>>>
>>>>>>>>>>> I have two, I hope small suggestions, which will not take much
>>>>>> time
>>>>>>> to
>>>>>>>>>>> agree on.
>>>>>>>>>>>
>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>>>>> existence
>>>>>> of
>>>>>>>>>>> cat/db
>>>>>>>>>>> for temporary functions? That means not to check it, so e.g.
>>>>> it's
>>>>>>>>> possible
>>>>>>>>>>> to create a temporary function in a database that does not
>>>>> exist.
>>>>>> I
>>>>>>>>> think
>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
>>>>>>> experiments
>>>>>>>>> but
>>>>>>>>>>> does not have access to the db yet or temporarily does not have
>>>>>>>>> connection
>>>>>>>>>>> to a catalog.
>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen
>>>>>> the
>>>>>>>>>>> requirements for all catalog objects such as tables, views,
>>>>> types
>>>>>>> just
>>>>>>>>> for
>>>>>>>>>>> the functions? It's really important later on from e.g the
>>>>>>>>> serializability
>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier is
>>>>> that
>>>>>> we
>>>>>>>>> know
>>>>>>>>>>> that it has been resolved. The suggested changes break that
>>>>>>>> assumption.
>>>>>>>>>>> What do you think about adding an interface FunctionIdentifier {
>>>>>>>>>>>
>>>>>>>>>>> String getName();
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>>>>> function.
>>>>>>>>>>> */
>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>>>>>>>>>>   return Optional.of(this);
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
>>>>> {...}
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <us...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1. LGTM
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>>>>> zjuwangg@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Terry Wang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <yk...@gmail.com> 写道:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>>>>>> bowenli86@gmail.com
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
>>>>>>> we've
>>>>>>>>>>> reached
>>>>>>>>>>>>>>> consensus in [2].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>>>>>> UTC,
>>>>>>>> Sep
>>>>>>>>>>> 26.
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>>>>>>>>>>> --
>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>
>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>>
>>>