Posted to dev@flink.apache.org by Samrat Deb <de...@gmail.com> on 2022/12/03 04:29:16 UTC

[DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Hi everyone,

I would like to open a discussion[1] on providing GlueCatalog support
in Flink.
Currently, Flink offers three major types of catalog[2], of which only
HiveCatalog is a persistent catalog, backed by the Hive Metastore. We would
like to introduce GlueCatalog in Flink, offering users another option that
is persistent in nature. The AWS Glue Data Catalog is a centralized data
catalog in the AWS cloud that provides integrations with many different
connectors[3]. A Flink GlueCatalog can use the features provided by Glue and
integrate closely with other services in the cloud.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink

[2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/

[3] https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro

[4] https://issues.apache.org/jira/browse/FLINK-29549

Bests
Samrat

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Samrat.
I have seen some users asking for GlueCatalog support[1], so it's really exciting that you're driving it.
After a quick look at this FLIP, I have some comments:

1: I noticed there's a YAML part in the section of "Using the Catalog"; what do you mean by that? Do you mean how to use the Glue catalog in the SQL client? If so, just for your information, using a YAML environment file is no longer supported in the SQL client[2].

2: Seems there's a typo in the "Design#views" part: it contains "listTables", which I think shouldn't be there. Also, I'm curious about how to list views using the Glue API. Is there an existing API to list views directly, or do we need to list the tables and then filter the views using the table kind?

3: In the "Flink Glue DataType Mapping" part, CharType is mapped to String. It seems the char's length will be lost; is it possible to have a better mapping that keeps the length of the char type?

4: About the "Flink CatalogFunction mapping with Glue Function" part: how do we map the function language of Flink's CatalogFunction?



[1] https://lists.apache.org/thread/pdd780wl4f26p447fohvm9osky2r9fhh
[2] https://issues.apache.org/jira/browse/FLINK-22540

Best regards,
Yuxia

----- Original Message -----
From: "Samrat Deb" <de...@gmail.com>
To: "dev" <de...@flink.apache.org>
Cc: "prabhujose gates" <pr...@gmail.com>
Sent: Saturday, December 3, 2022, 12:29:16 PM
Subject: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink


Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi all,

Thank you for all your valuable suggestions and questions regarding the
proposal.

In case there are more queries or questions from the community, I will
keep this discussion thread open for a couple more days before proceeding
with the next steps.

Bests
Samrat

On Wed, Dec 14, 2022 at 9:41 PM Samrat Deb <de...@gmail.com> wrote:

> Thank you, Danny, for the additional insights on flink-connector-aws-base[1].
>
>> It looks like localstack supports glue [2], we already use localstack for
>> integration tests so we can follow suit here.
>
> Since GlueCatalog will be a part of flink-connector-aws-base, we will, as
> suggested, reuse code and resources as much as possible and add anything
> extra that is required in an extensible manner.
>
> Bests,
> Samrat
>
> [1] https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>
> On Tue, Dec 13, 2022 at 9:32 PM Danny Cranmer <da...@apache.org> wrote:
>
>> Hello Samrat,
>>
>> Sorry for the late response.
>>
>> +1 for a native Glue Data Catalog integration. We have internally
>> developed a Glue Data Catalog catalog implementation that shims Hive. We
>> have been meaning to contribute it, but this solution can replace our
>> internal one.
>>
>> +1 for putting this in flink-connector-aws. With regard to configuration,
>> we have a flink-connector-aws-base [1] module where all the common
>> configurations should go. Please use anything common from there, such as
>> the authentication providers. Additionally, please consider putting any
>> new configurations you need into aws-base if they might be reusable by
>> other AWS integrations.
>>
>> > We will create e2e integration test cases capturing all the
>> > implementation in a mock environment.
>>
>> It looks like localstack supports glue [2]; we already use localstack for
>> integration tests, so we can follow suit here.
>>
>> Thanks,
>> Danny
>>
>> [1] https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
>> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>>
>> On Mon, Dec 12, 2022 at 12:18 PM Samrat Deb <de...@gmail.com> wrote:
>>
>>> Hi Konstantin Knauf,
>>>
>>>> Can you explain how users are expected to authenticate with AWS Glue? I
>>>> don't see any catalog options regarding authx. So I assume the
>>>> credentials are taken from the environment?
>>>
>>> We are planning to put GlueCatalog in flink-connector-aws[1].
>>> flink-connector-aws already provides a base module with ready-made AWS
>>> configs (AWSConfigConstants)[2], which can be reused for the catalog as
>>> well. I will update the Configuration section of FLIP-277[3] with the
>>> auth-related configs.
>>>
>>> Users can pass these values as part of the config at catalog creation;
>>> if they are not provided, the catalog will try to fetch them from the
>>> environment. This will allow users to create multiple catalog instances
>>> in the same session pointing to different accounts. (I haven't tested
>>> multi-account Glue catalog instances during the POC.)
>>>
>>> [1] https://github.com/apache/flink-connector-aws
>>> [2] https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
>>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>
>>> Bests,
>>> Samrat
>>>
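Samrat's description of authentication (credentials passed as catalog options, with a fallback to the environment) can be illustrated with a small sketch. It assumes the option keys mirror flink-connector-aws-base's AWSConfigConstants and covers only static keys plus the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables; the real AWS SDK default provider chain also checks profiles, instance roles and more. The class and method names are hypothetical, not the FLIP's API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical sketch: resolve static AWS credentials for one GlueCatalog
 * instance. Catalog options win; otherwise fall back to the environment.
 * The option keys are assumed to mirror AWSConfigConstants.
 */
final class GlueCredentialResolver {
    static final String ACCESS_KEY_OPTION = "aws.credentials.provider.basic.accesskeyid";
    static final String SECRET_KEY_OPTION = "aws.credentials.provider.basic.secretkey";

    /** Immutable access-key / secret-key pair. */
    static final class Credentials {
        final String accessKeyId;
        final String secretAccessKey;

        Credentials(String accessKeyId, String secretAccessKey) {
            this.accessKeyId = Objects.requireNonNull(accessKeyId);
            this.secretAccessKey = Objects.requireNonNull(secretAccessKey);
        }
    }

    private GlueCredentialResolver() {}

    /**
     * @param catalogOptions options given when the catalog is created
     * @param environment    process environment, e.g. System.getenv()
     */
    static Optional<Credentials> resolve(Map<String, String> catalogOptions, Map<String, String> environment) {
        Optional<Credentials> fromOptions = pair(catalogOptions, ACCESS_KEY_OPTION, SECRET_KEY_OPTION);
        if (fromOptions.isPresent()) {
            // Per-catalog credentials: each catalog may target a different account.
            return fromOptions;
        }
        return pair(environment, "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY");
    }

    /** Returns credentials only if both keys are present in the given map. */
    private static Optional<Credentials> pair(Map<String, String> source, String accessKey, String secretKey) {
        String access = source.get(accessKey);
        String secret = source.get(secretKey);
        if (access == null || secret == null) {
            return Optional.empty();
        }
        return Optional.of(new Credentials(access, secret));
    }
}
```

Because each catalog resolves its own credentials, two catalogs created in the same session with different option values would point at different accounts, which is the multi-account case mentioned above as not yet tested.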
>>> On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb <de...@gmail.com> wrote:
>>>
>>>> Hi Jark,
>>>> Apologies for the late reply, and thank you for your valuable input.
>>>>
>>>>> Besides, I have a question about Glue Namespace. Could you share the
>>>>> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
>>>>> According to the "Flink Glue Metaspace Mapping" section, if there is a
>>>>> database "mydb" under namespace "ns1", does that mean the database name
>>>>> in Flink is "ns1.mydb"?
>>>>
>>>> There is no concept of a namespace in the Glue Data Catalog. It has
>>>> three levels:
>>>> - catalog
>>>> - database
>>>> - table
>>>>
>>>> I have updated the mapping in FLIP-277[1]: a database name in Flink
>>>> maps directly to a database name in Glue. Please ignore the typo left
>>>> over in the earlier version of the doc.
>>>>
>>>> Best,
>>>> Samrat
>>>>
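The direct mapping in Samrat's reply (no namespace level; a Flink database is a Glue database and a Flink object is a Glue table) is small enough to sketch in full. This is a hypothetical helper: in Flink the same role is played by ObjectPath, the catalog ID field is an assumption, and this simplified parser does not handle escaped or quoted dots.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the one-to-one name mapping: a Flink path
 * "<database>.<object>" maps onto a Glue database and table inside a
 * single Glue catalog. There is no namespace level in between.
 */
final class GlueLocation {
    final String catalogId;    // Glue catalog ID (by default the AWS account ID)
    final String databaseName; // Flink database == Glue database
    final String tableName;    // Flink table or view == Glue table

    GlueLocation(String catalogId, String databaseName, String tableName) {
        this.catalogId = Objects.requireNonNull(catalogId);
        this.databaseName = Objects.requireNonNull(databaseName);
        this.tableName = Objects.requireNonNull(tableName);
    }

    /** Splits "db.table" into its two parts; any other shape is rejected. */
    static GlueLocation fromFlinkPath(String catalogId, String fullPath) {
        int dot = fullPath.indexOf('.');
        boolean valid = dot > 0
                && dot < fullPath.length() - 1
                && fullPath.indexOf('.', dot + 1) < 0; // exactly one separator
        if (!valid) {
            throw new IllegalArgumentException("Expected <database>.<object>, got: " + fullPath);
        }
        return new GlueLocation(catalogId, fullPath.substring(0, dot), fullPath.substring(dot + 1));
    }
}
```

Under this mapping, a path like "ns1.mydb.orders" has no meaning and is rejected rather than being read as a namespaced database.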
>>>> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <im...@gmail.com> wrote:
>>>>
>>>>> Hi Samrat,
>>>>>
>>>>> Thanks a lot for driving the new catalog, and sorry for jumping into the
>>>>> discussion late.
>>>>>
>>>>> As Flink SQL is becoming a first-class citizen of the Flink API, we are
>>>>> planning to make Catalog, rather than Source & Sink, the first-class
>>>>> citizen of connectors. For Flink SQL users, working with a Catalog is as
>>>>> natural and user-friendly as working with databases, rather than having
>>>>> to define DDL and schemas over and over again. This is also how
>>>>> Trino/Presto work.
>>>>>
>>>>> Regarding the repo for the Glue catalog, I think we can add it to
>>>>> flink-connector-aws. We don't need separate repos for catalogs, because
>>>>> a Catalog is a kind of connector (the others being sources & sinks).
>>>>> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
>>>>> flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This
>>>>> reduces repository maintenance, and some common AWS utils could be
>>>>> shared there. cc @Danny Cranmer <dannycranmer@apache.org>, what do you
>>>>> think about this?
>>>>>
>>>>> Besides, I have a question about Glue Namespace. Could you share the
>>>>> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
>>>>> According to the "Flink Glue Metaspace Mapping" section, if there is a
>>>>> database "mydb" under namespace "ns1", does that mean the database name
>>>>> in Flink is "ns1.mydb"?
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
>>>>> [2] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>>>>>
>>>>> On Fri, 9 Dec 2022 at 08:51, Dong Lin <li...@gmail.com> wrote:
>>>>>
>>>>> > Hi Samrat,
>>>>> >
>>>>> > Sorry for the late reply. Yeah, I am referring to creating a similar
>>>>> > external repo, such as flink-catalog-glue. flink-connector-aws is
>>>>> > already named with `connector`, so it seems a bit weird to put a
>>>>> > catalog there.
>>>>> >
>>>>> > Thanks!
>>>>> > Dong
>>>>> >
>>>>> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <de...@gmail.com> wrote:
>>>>> >
>>>>> > > Hi Dong Lin,
>>>>> > >
>>>>> > > > Since this is the first proposal for adding a vendor-specific catalog
>>>>> > > > library in Flink, I think maybe we should also externalize those
>>>>> > > > catalog libraries similar to how we are externalizing connector
>>>>> > > > libraries. It is likely that we might want to add catalogs for other
>>>>> > > > vendors in the future. Externalizing those catalogs can make Flink
>>>>> > > > development more scalable in the long term.
>>>>> > >
>>>>> > > Initially I misinterpreted "externalizing the catalogs": there already
>>>>> > > exists an externalized connector repo for AWS [1].
>>>>> > > Are you referring to creating a similar external repo for catalogs, or
>>>>> > > would it be better to add it to flink-connector-aws[1]?
>>>>> > >
>>>>> > > [1] https://github.com/apache/flink-connector-aws
>>>>> > >
>>>>> > > Samrat
>>>>> > >
>>>>> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <de...@gmail.com> wrote:
>>>>> > >
>>>>> > > > Hi Dong Lin,
>>>>> > > >
>>>>> > > > The AWS Glue Data Catalog is vendor-specific, and in the future we
>>>>> > > > will get similar implementations from other providers. We should
>>>>> > > > definitely externalize these catalog libraries, similar to the Flink
>>>>> > > > connectors. I am thinking of creating flink-catalog, similar to
>>>>> > > > flink-connector, under the root (flink); the Glue catalog can be one
>>>>> > > > of the modules under flink-catalog. Please suggest if there is a
>>>>> > > > better structure we can create for catalogs.
>>>>> > > >
>>>>> > > >> It is mentioned in the FLIP that there will be two types of
>>>>> > > >> SdkHttpClient supported based on the catalog option
>>>>> > > >> http-client.type. Is http-client.type a public config for the
>>>>> > > >> GlueCatalog? If yes, can we add this config to the "Configurations"
>>>>> > > >> section and explain how users should choose the client type?
>>>>> > > >
>>>>> > > > Yes, http-client.type is a public config for the GlueCatalog. If the
>>>>> > > > user doesn't specify a client type, it defaults to `urlconnection`.
>>>>> > > > I have updated the Configuration section of FLIP-277[1] with all the
>>>>> > > > configs. Please review it again.
>>>>> > > >
>>>>> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>>> > > >
>>>>> > > > Samrat
>>>>> > > >
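A minimal sketch of how the http-client.type option and its `urlconnection` default might be resolved when the catalog is created. The second accepted value, `apache`, is an assumption (the AWS SDK for Java v2 ships both a URL-connection-based and an Apache-based SdkHttpClient); the FLIP's Configuration section is the authority on the actual values, and the class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: pick the SdkHttpClient kind from catalog options. */
final class GlueHttpClientOption {
    static final String KEY = "http-client.type";
    static final String DEFAULT = "urlconnection";

    /** The two client kinds; "apache" is an assumed second value. */
    enum HttpClientType { URLCONNECTION, APACHE }

    private GlueHttpClientOption() {}

    static HttpClientType resolve(Map<String, String> catalogOptions) {
        // Unset option falls back to the documented default.
        String raw = catalogOptions.getOrDefault(KEY, DEFAULT);
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "urlconnection":
                return HttpClientType.URLCONNECTION; // would build a UrlConnectionHttpClient
            case "apache":
                return HttpClientType.APACHE; // would build an ApacheHttpClient
            default:
                throw new IllegalArgumentException(
                        "Unsupported " + KEY + " '" + raw + "'; expected 'urlconnection' or 'apache'");
        }
    }
}
```

Rejecting unknown values, rather than silently falling back to the default, makes a typo in the catalog options visible when the catalog is created.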
>>>>> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordeapex@gmail.com> wrote:
>>>>> > > >
>>>>> > > >> Hi Yuxia,
>>>>> > > >>
>>>>> > > >> Thank you for reviewing the FLIP and putting forward your
>>>>> > > >> observations and comments.
>>>>> > > >>
>>>>> > > >>> 1: I noticed there's a YAML part in the section of "Using the
>>>>> > > >>> Catalog"; what do you mean by that? Do you mean how to use the Glue
>>>>> > > >>> catalog in the SQL client? If so, just for your information, using
>>>>> > > >>> a YAML environment file is no longer supported in the SQL client[2].
>>>>> > > >>
>>>>> > > >> Thank you for attaching the Jira ticket [1]; I had missed those
>>>>> > > >> changes. A catalog can instead be registered directly through
>>>>> > > >> factory resources:
>>>>> > > >> - GenericInMemoryCatalog is defined through
>>>>> > > >>   `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>>>>> > > >> - HiveCatalog is defined through the path
>>>>> > > >>   `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>>>>> > > >> We can define it the same way in the vendor-specific module for AWS
>>>>> > > >> Glue.
>>>>> > > >>
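The factory-based registration in the reply above relies on Java's ServiceLoader: each module lists its factories in META-INF/services/org.apache.flink.table.factories.Factory, and Flink picks the one whose identifier matches the catalog's `type` option. The following is a simplified simulation of that lookup, not Flink's FactoryUtil; the interface and class names are hypothetical, and `glue` as the identifier is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for Flink's CatalogFactory contract (identifier only). */
interface SketchCatalogFactory {
    String factoryIdentifier();
}

/** Hypothetical Glue factory; "glue" is an assumed identifier. */
final class SketchGlueCatalogFactory implements SketchCatalogFactory {
    @Override
    public String factoryIdentifier() {
        return "glue";
    }
}

/** Uses the identifier of Flink's GenericInMemoryCatalogFactory. */
final class SketchInMemoryCatalogFactory implements SketchCatalogFactory {
    @Override
    public String factoryIdentifier() {
        return "generic_in_memory";
    }
}

final class SketchFactoryDiscovery {
    private SketchFactoryDiscovery() {}

    /**
     * Picks the factory whose identifier equals the 'type' option. In Flink
     * the candidate list comes from java.util.ServiceLoader reading the
     * META-INF/services file of every module on the classpath.
     */
    static SketchCatalogFactory find(List<SketchCatalogFactory> discovered, Map<String, String> options) {
        String type = options.get("type");
        if (type == null) {
            throw new IllegalArgumentException("Missing required option 'type'");
        }
        List<SketchCatalogFactory> matches = new ArrayList<>();
        for (SketchCatalogFactory factory : discovered) {
            if (factory.factoryIdentifier().equals(type)) {
                matches.add(factory);
            }
        }
        // Zero matches means the module is missing; several means an ambiguous classpath.
        if (matches.size() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one factory for type '" + type + "', found " + matches.size());
        }
        return matches.get(0);
    }
}
```

With a factory registered this way, an SQL client user would create the catalog with a DDL statement such as CREATE CATALOG my_glue WITH ('type' = 'glue') instead of a YAML environment file.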
>>>>> > > >>> 2: Seems there's a typo in the "Design#views" part; it contains
>>>>> > > >>> "listTables", which I think shouldn't be there.
>>>>> > > >>
>>>>> > > >> Oh yes 😅! Fixed it now, thanks for pointing it out.
>>>>> > > >>
>>>>> > > >>> Also, I'm curious about how to list views using the Glue API. Is
>>>>> > > >>> there an existing API to list views directly, or do we need to list
>>>>> > > >>> the tables and then filter the views using the table kind?
>>>>> > > >>
>>>>> > > >> Yes, there is no API to list views directly. We need to list all
>>>>> > > >> tables and then filter out the views based on the attribute
>>>>> > > >> tableKind, which is part of the table object in the API response.
>>>>> > > >>
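Since Glue has no call that lists only views, the approach in the answer above is to fetch every table and filter on the tableKind attribute. A hypothetical sketch, assuming all pages of the paginated GetTables response (which uses a NextToken) have already been collected, and that the kind is stored as the strings TABLE and VIEW, mirroring Flink's CatalogBaseTable.TableKind. Where the kind actually lives in the Glue table object is left to the FLIP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: derive listViews/listTables from one GetTables result. */
final class GlueViewListing {
    /** Minimal stand-in for one table entry from a fully paginated GetTables response. */
    static final class TableEntry {
        final String name;
        final String tableKind; // assumed attribute recording the Flink table kind

        TableEntry(String name, String tableKind) {
            this.name = Objects.requireNonNull(name);
            this.tableKind = Objects.requireNonNull(tableKind);
        }
    }

    private GlueViewListing() {}

    /** Keeps only the entries whose kind marks them as views. */
    static List<String> listViews(List<TableEntry> allTables) {
        return namesOfKind(allTables, "VIEW");
    }

    /** The complementary filter over the same response. */
    static List<String> listTables(List<TableEntry> allTables) {
        return namesOfKind(allTables, "TABLE");
    }

    private static List<String> namesOfKind(List<TableEntry> entries, String kind) {
        List<String> names = new ArrayList<>();
        for (TableEntry entry : entries) {
            if (kind.equalsIgnoreCase(entry.tableKind)) {
                names.add(entry.name);
            }
        }
        return names;
    }
}
```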
>>>>> > > >>> 3: In the "Flink Glue DataType Mapping" part, CharType is mapped to
>>>>> > > >>> String. It seems the char's length will be lost; is it possible to
>>>>> > > >>> have a better mapping that keeps the length of the char type?
>>>>> > > >>
>>>>> > > >> Thanks for pointing this out! I have updated the FLIP with the
>>>>> > > >> correct type. Initially I mapped CharType and VarCharType to String,
>>>>> > > >> but I have updated them to map directly to the same types.
>>>>> > > >>
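Mapping CHAR and VARCHAR to the same Glue types keeps the declared length. A sketch, assuming Glue column types are written as Hive-style type strings such as char(10) and varchar(255); the helper class is hypothetical and covers only character types.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: keep CHAR/VARCHAR lengths when writing column types to Glue. */
final class GlueCharTypeMapping {
    private static final Pattern SIZED = Pattern.compile("(CHAR|VARCHAR)\\((\\d+)\\)");

    private GlueCharTypeMapping() {}

    /** Maps a Flink character type string to an assumed Hive-style Glue type string. */
    static String toGlueType(String flinkType) {
        String normalized = flinkType.trim().toUpperCase(Locale.ROOT);
        if (normalized.equals("STRING")) {
            return "string"; // Flink STRING is VARCHAR(2147483647)
        }
        Matcher m = SIZED.matcher(normalized);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sized character type: " + flinkType);
        }
        String base = m.group(1);
        long length = Long.parseLong(m.group(2));
        if (base.equals("VARCHAR") && length == Integer.MAX_VALUE) {
            return "string"; // the unbounded case, same as STRING
        }
        return base.toLowerCase(Locale.ROOT) + "(" + length + ")"; // length preserved
    }
}
```

Because Flink's STRING is VARCHAR(2147483647), it maps to Glue's string rather than to a sized varchar. Hive itself caps VARCHAR at 65535 characters, so very long declared lengths would need a separate decision that this sketch leaves out.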
>>>>> > > >>> 4: About the "Flink CatalogFunction mapping with Glue Function"
>>>>> > > >>> part: how do we map the function language of Flink's
>>>>> > > >>> CatalogFunction?
>>>>> > > >>
>>>>> > > >> The Glue API (UserDefinedFunctionInput) has no dedicated attribute
>>>>> > > >> for the function language. Here is how the AWS Glue Data Catalog
>>>>> > > >> client for the Hive metastore maps a Hive function to a Glue
>>>>> > > >> function[2]. We will add a language prefix to the function name
>>>>> > > >> itself to indicate the language. I see this has already been done
>>>>> > > >> for the Hive Catalog [3], and we are thinking of implementing it the
>>>>> > > >> same way.
>>>>> > > >>
>>>>> > > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
>>>>> > > >> [2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>>>>> > > >> [3] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>>>>> > > >>
>>>>> > > >> Samrat
>>>>> > > >>
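Because UserDefinedFunctionInput has no language field, the reply proposes encoding the language as a name prefix. A sketch of such an encode/decode pair follows; the `flink:<language>:` format and the fallback to JAVA for unprefixed names are assumptions, not necessarily the format the FLIP or HiveCatalog uses, and the class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: store a function's language as a prefix of its stored name. */
final class GlueFunctionLanguageCodec {
    /** Same constants as Flink's FunctionLanguage enum. */
    enum FunctionLanguage { JAVA, SCALA, PYTHON }

    // Assumed prefix format: "flink:<language>:<name>"
    private static final String PREFIX = "flink:";

    private GlueFunctionLanguageCodec() {}

    /** Prepends the language marker before the name is written to Glue. */
    static String encode(FunctionLanguage language, String name) {
        return PREFIX + language.name().toLowerCase(Locale.ROOT) + ":" + name;
    }

    /** Recovers the language; names without a known prefix are treated as JAVA. */
    static FunctionLanguage decodeLanguage(String stored) {
        int end = prefixEnd(stored);
        if (end < 0) {
            return FunctionLanguage.JAVA;
        }
        return FunctionLanguage.valueOf(stored.substring(PREFIX.length(), end - 1).toUpperCase(Locale.ROOT));
    }

    /** Strips a recognised prefix and leaves other names untouched. */
    static String decodeName(String stored) {
        int end = prefixEnd(stored);
        return end < 0 ? stored : stored.substring(end);
    }

    /** Index just past "flink:<language>:", or -1 if no known language prefix is present. */
    private static int prefixEnd(String stored) {
        if (!stored.startsWith(PREFIX)) {
            return -1;
        }
        int colon = stored.indexOf(':', PREFIX.length());
        if (colon < 0) {
            return -1;
        }
        String language = stored.substring(PREFIX.length(), colon).toUpperCase(Locale.ROOT);
        for (FunctionLanguage candidate : FunctionLanguage.values()) {
            if (candidate.name().equals(language)) {
                return colon + 1;
            }
        }
        return -1;
    }
}
```

Falling back to JAVA keeps functions created by other tools, which carry no prefix, loadable as JVM functions.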
>>>>> > > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <li...@gmail.com> wrote:
>>>>> > > >>
>>>>> > > >>> Hi Samrat,
>>>>> > > >>>
>>>>> > > >>> Thanks for the FLIP!
>>>>> > > >>>
>>>>> > > >>> Since this is the first proposal for adding a vendor-specific catalog
>>>>> > > >>> library in Flink, I think maybe we should also externalize those
>>>>> > > >>> catalog libraries similar to how we are externalizing connector
>>>>> > > >>> libraries. It is likely that we might want to add catalogs for other
>>>>> > > >>> vendors in the future. Externalizing those catalogs can make Flink
>>>>> > > >>> development more scalable in the long term.
>>>>> > > >>>
>>>>> > > >>> It is mentioned in the FLIP that there will be two types of
>>>>> > > >>> SdkHttpClient supported based on the catalog option
>>>>> > > >>> http-client.type. Is http-client.type a public config for the
>>>>> > > >>> GlueCatalog? If yes, can we add this config to the "Configurations"
>>>>> > > >>> section and explain how users should choose the client type?
>>>>> > > >>>
>>>>> > > >>> Regards,
>>>>> > > >>> Dong

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Thank you, Danny, for the additional insights on flink-connector-aws-base[1].

> It looks like localstack supports glue [2], we already use localstack for
> integration tests so we can follow suit here.

Since GlueCatalog will be a part of flink-connector-aws-base, we will, as
suggested, reuse code and resources as much as possible and add anything
extra that is required in an extensible manner.

Bests,
Samrat


[1]
https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
[2] https://docs.localstack.cloud/user-guide/aws/glue/




>>>> scalable
>>>> > > in
>>>> > > >>> the long term.
>>>> > > >>>
>>>> > > >>> It is mentioned in the FLIP that there will be two types of
>>>> > > SdkHttpClient
>>>> > > >>> supported based on the catalog option http-client.type. Is
>>>> > > >>> http-client.type
>>>> > > >>> a public config for the GlueCatalog? If yes, can we add this
>>>> config
>>>> > to
>>>> > > >>> the
>>>> > > >>> "Configurations" section and explain how users should choose the
>>>> > client
>>>> > > >>> type?
>>>> > > >>>
>>>> > > >>> Regards,
>>>> > > >>> Dong
>>>> > > >>>
>>>> > > >>>
>>>> > > >>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <
>>>> decordeapex@gmail.com>
>>>> > > >>> wrote:
>>>> > > >>>
>>>> > > >>> > Hi everyone,
>>>> > > >>> >
>>>> > > >>> > I would like to open a discussion[1] on providing GlueCatalog
>>>> > support
>>>> > > >>> > in Flink.
>>>> > > >>> > Currently, Flink offers 3 major types of catalog[2]. Out of
>>>> which
>>>> > > only
>>>> > > >>> > HiveCatalog is a persistent catalog backed by Hive Metastore.
>>>> We
>>>> > > would
>>>> > > >>> like
>>>> > > >>> > to introduce GlueCatalog in Flink offering another option for
>>>> users
>>>> > > >>> which
>>>> > > >>> > will be persistent in nature. Aws Glue data catalog is a
>>>> > centralized
>>>> > > >>> data
>>>> > > >>> > catalog in AWS cloud that provides integrations with many
>>>> different
>>>> > > >>> > connectors[3]. Flink GlueCatalog can use the features
>>>> provided by
>>>> > > glue
>>>> > > >>> and
>>>> > > >>> > create strong integration with other services in the cloud.
>>>> > > >>> >
>>>> > > >>> > [1]
>>>> > > >>> >
>>>> > > >>> >
>>>> > > >>>
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>> > > >>> >
>>>> > > >>> > [2]
>>>> > > >>> >
>>>> > > >>> >
>>>> > > >>>
>>>> > >
>>>> >
>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>>>> > > >>> >
>>>> > > >>> > [3]
>>>> > > >>> >
>>>> > > >>> >
>>>> > > >>>
>>>> > >
>>>> >
>>>> https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>>>> > > >>> >
>>>> > > >>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
>>>> > > >>> >
>>>> > > >>> > Bests
>>>> > > >>> > Samrat
>>>> > > >>> >
>>>> > > >>>
>>>> > > >>
>>>> > >
>>>> >
>>>>
>>>
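The reply quoted above says Glue has no dedicated call for listing views, so the catalog has to list every table and keep the ones whose table kind marks them as views. The sketch below shows that approach, assuming a paginated listing call. GlueTableEntry, TablePage, GlueViewListing and the "VIEW" marker value are hypothetical stand-ins, not AWS SDK or Flink types. A real implementation would page through the Glue table listing with the SDK and read the kind from wherever the catalog stores it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for one table entry in a Glue listing response.
// "tableKind" follows the attribute named in the thread; it is not an SDK field name.
final class GlueTableEntry {
    final String name;
    final String tableKind;

    GlueTableEntry(String name, String tableKind) {
        this.name = name;
        this.tableKind = tableKind;
    }
}

// One page of results plus the token for the next page (null on the last page).
final class TablePage {
    final List<GlueTableEntry> tables;
    final String nextToken;

    TablePage(List<GlueTableEntry> tables, String nextToken) {
        this.tables = tables;
        this.nextToken = nextToken;
    }
}

final class GlueViewListing {
    // Assumed marker value the catalog writes when it stores a view.
    static final String VIEW_KIND = "VIEW";

    // There is no "list views" call, so walk every page of tables and keep
    // only the entries whose kind marks them as views.
    static List<String> listViews(Function<String, TablePage> fetchPage) {
        List<String> views = new ArrayList<>();
        String token = null; // null requests the first page
        do {
            TablePage page = fetchPage.apply(token);
            for (GlueTableEntry table : page.tables) {
                if (VIEW_KIND.equals(table.tableKind)) {
                    views.add(table.name);
                }
            }
            token = page.nextToken;
        } while (token != null);
        return views;
    }
}
```

Filtering on the client means listing views costs as many requests as listing tables; that is the price of not having a dedicated views call.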

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Danny Cranmer <da...@apache.org>.
Hello Samrat,

Sorry for the late response.

+1 for a native Glue Data Catalog integration. We have internally
developed a catalog implementation for the Glue Data Catalog that shims
Hive. We have been meaning to contribute it, but this solution can replace
our internal one.

+1 for putting this in flink-connector-aws. With regard to configuration,
we have a flink-connector-aws-base [1] module where all the common
configurations should go. Please reuse anything common from it, such as
authentication providers. Additionally, please consider putting any new
configurations you need into aws-base if they might be reusable for other
AWS integrations.

> We will create an e2e integration test cases capturing all the
> implementation in a mock environment.

It looks like localstack supports Glue [2]. We already use localstack for
integration tests, so we can follow suit here.

Thanks,
Danny

[1]
https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
[2] https://docs.localstack.cloud/user-guide/aws/glue/


Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi Konstantin Knauf,

> Can you explain how users are expected to authenticate with AWS Glue? I
> don't see any catalog options regarding authx. So I assume the credentials
> are taken from the environment?


We are planning to put GlueCatalog in flink-connector-aws [1].
flink-connector-aws already provides a base module with ready-made AWS
configs [2], and these configs can be reused for the catalog as well.
I will update FLIP-277 [3] with the auth-related configs in the
Configuration section.

Users can pass these values as part of the config when creating the
catalog; if they are not provided, the catalog will try to fetch them from
the environment. This will allow users to create multiple catalog instances
in the same session pointing to different accounts. (I haven't tested
multi-account Glue catalog instances during the POC.)
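A sketch of the fallback order described above: credentials given in the catalog options win, otherwise the environment is used. The option keys are modeled on AWSConfigConstants in flink-connector-aws-base and should be checked against that class; StaticCredentials and GlueCredentialResolver are hypothetical names, and a real catalog would delegate to an AWS SDK credentials provider rather than hold raw keys. Because resolution works per options map, two catalog instances created with different keys resolve to different accounts.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical holder for resolved static credentials.
final class StaticCredentials {
    final String accessKeyId;
    final String secretAccessKey;
    final String source; // where the credentials came from, useful for debugging

    StaticCredentials(String accessKeyId, String secretAccessKey, String source) {
        this.accessKeyId = accessKeyId;
        this.secretAccessKey = secretAccessKey;
        this.source = source;
    }
}

final class GlueCredentialResolver {
    // Catalog option keys modeled on AWSConfigConstants (flink-connector-aws-base);
    // verify the exact strings against that class before relying on them.
    static final String ACCESS_KEY_OPTION = "aws.credentials.provider.basic.accesskeyid";
    static final String SECRET_KEY_OPTION = "aws.credentials.provider.basic.secretkey";

    // Standard AWS environment variable names.
    static final String ACCESS_KEY_ENV = "AWS_ACCESS_KEY_ID";
    static final String SECRET_KEY_ENV = "AWS_SECRET_ACCESS_KEY";

    // Catalog options take precedence; the environment is the fallback.
    // Returns empty when neither source provides a complete key pair.
    static Optional<StaticCredentials> resolve(Map<String, String> catalogOptions,
                                               Map<String, String> env) {
        Optional<StaticCredentials> fromOptions = pair(
                catalogOptions.get(ACCESS_KEY_OPTION),
                catalogOptions.get(SECRET_KEY_OPTION),
                "catalog-options");
        if (fromOptions.isPresent()) {
            return fromOptions;
        }
        return pair(env.get(ACCESS_KEY_ENV), env.get(SECRET_KEY_ENV), "environment");
    }

    private static Optional<StaticCredentials> pair(String id, String secret, String source) {
        if (id == null || secret == null) {
            return Optional.empty();
        }
        return Optional.of(new StaticCredentials(id, secret, source));
    }
}
```

Inside a catalog, the environment map would simply be `System.getenv()`.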

[1] https://github.com/apache/flink-connector-aws
[2]
https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink

Bests,
Samrat

On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb <de...@gmail.com> wrote:

> Hi Jark,
> Apologies for late reply.
> Thank you for your valuable input.
>
> Besides, I have a question about Glue Namespace. Could you share the
>> documentation of the Glue
>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>> Metaspace Mapping" section,
>> if there is a database "mydb" under namespace "ns1", is that mean the
>> database name in Flink is "ns1.mydb"?
>
> There is no concept of namespace in glue data catalog.
> There are 3 levels in glue data catalog
> - catalog
> - database
> - table
>
> I have added the mapping in FLIP-277[1]. and updated it .
> it is directly database name from flink to database name in glue
> Please ignore the typo leftover in doc previously.
>
> Best,
> Samrat
>
>
> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <im...@gmail.com> wrote:
>
>> Hi Samrat,
>>
>> Thanks a lot for driving the new catalog, and sorry for jumping into the
>> discussion late.
>>
>> As Flink SQL is becoming the first-class citizen of the Flink API, we are
>> planning to push Catalog
>> to become the first-class citizen of the connector instead of Source &
>> Sink. For Flink SQL users,
>> using Catalog is as natural and user-friendly as working with databases,
>> rather than having to define
>> DDL and schemas over and over again. This is also how Trino/Presto does.
>>
>> Regarding the repo for the Glue catalog, I think we can add it to
>> flink-connector-aws. We don't need
>> separate repos for Catalogs because Catalog is a kind of connector (others
>> are sources & sinks).
>> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
>> flink-connector-jdbc, and HiveCatalog is
>> in flink-connector-hive. This can reduce repository maintenance, and I
>> think maybe some common
>> AWS utils can be shared there.  cc @Danny Cranmer <
>> dannycranmer@apache.org>
>> what do you think about this?
>>
>> Besides, I have a question about Glue Namespace. Could you share the
>> documentation of the Glue
>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>> Metaspace Mapping" section,
>> if there is a database "mydb" under namespace "ns1", is that mean the
>> database name in Flink is "ns1.mydb"?
>>
>> Best,
>> Jark
>>
>>
>> [1]:
>>
>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
>> [2]:
>>
>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>>
>> On Fri, 9 Dec 2022 at 08:51, Dong Lin <li...@gmail.com> wrote:
>>
>> > Hi Samrat,
>> >
>> > Sorry for the late reply. Yeah I am referring to creating a similar
>> > external repo such as flink-catalog-glue. flink-connector-aws is already
>> > named with `connector` so it seems a bit weird to put a catalog there.
>> >
>> > Thanks!
>> > Dong
>> >
>> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <de...@gmail.com>
>> wrote:
>> >
>> > > Hi Dong Lin,
>> > >
>> > > Since this is the first proposal for adding a vendor-specific catalog
>> > > > library in Flink, I think maybe we should also externalize those
>> > catalog
>> > > > libraries similar to how we are externalizing connector libraries.
>> It
>> > is
>> > > > likely that we might want to add catalogs for other vectors in the
>> > > future.
>> > > > Externalizing those catalogs can make Flink development more
>> scalable
>> > in
>> > > > the long term.
>> > >
>> > > Initially i mis-interpretted externalising the catalogs, There already
>> > > exists an externalised connector for aws [1].
>> > > Are you referring to creating a similar external repo for catalogs or
>> > will
>> > > it be better to add it in flink-connector-aws[1] ?
>> > >
>> > > [1] https://github.com/apache/flink-connector-aws
>> > >
>> > > Samrat
>> > >
>> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <de...@gmail.com>
>> wrote:
>> > >
>> > > > Hi Dong Lin,
>> > > >
>> > > > Aws Glue Data catalog is vendor specific and in future we will get
>> such
>> > > > type of implementation from different providers. We should
>> > > > definitely externalize these catalog libraries similar to flink
>> > > connectors.
>> > > > I am thinking of creating
>> > > > flink-catalog similar to flink-connector under the root (flink).
>> glue
>> > > > catalog can be one of modules under the flink-catalog . Please
>> suggest
>> > if
>> > > > there is a better structure we can create for catalogs.
>> > > >
>> > > >
>> > > > It is mentioned in the FLIP that there will be two types of
>> > SdkHttpClient
>> > > >> supported based on the catalog option http-client.type. Is
>> > > >> http-client.type
>> > > >> a public config for the GlueCatalog? If yes, can we add this
>> config to
>> > > the
>> > > >> "Configurations" section and explain how users should choose the
>> > client
>> > > >> type?
>> > > >
>> > > >
>> > > > yes http-client.type is public config for the GlueCatalog. By
>> default
>> > > > client-type will be `urlconnection` , if user don't specify any
>> > > connection
>> > > > type.
>> > > > I have updated the FLIP-277[1] #configuration section with all the
>> > > configs
>> > > > . Please review it again .
>> > > >
>> > > > [1]
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>> > > >
>> > > > Samrat
>> > > >
>> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <de...@gmail.com>
>> > wrote:
>> > > >
>> > > >> Hi Yuxia,
>> > > >>
>> > > >> Thank you for reviewing the flip and putting forward your
>> observations
>> > > >> and comments.
>> > > >>
>> > > >> 1: I noticed there's a YAML part in the section of "Using the
>> > Catalog",
>> > > >>> what do you mean by that? Do you mean how to use glue catalog in
>> sql
>> > > >>> client? If so, just for your information, it's not supported to
>> use
>> > > yaml
>> > > >>> envrioment file in sql client[2].
>> > > >>
>> > > >>
>> > > >> Thank you for attaching the jira ticket [1] . I missed the changes.
>> > > >> There is a provision to register catalog directly through factory
>> > > resources
>> > > >> .
>> > > >> - GenericInMemoryCatalog is defined through
>> > > >>
>> > >
>> >
>> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> > > >> - HiveCatalog is defined through
>> > > >> path
>> > >
>> >
>> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> > > >> Similarly on the vendor specific module for Aws Glue we can define
>> it.
>> > > >>
>> > > >> 2: Seems there's a typo in "Design#views" part, it contains
>> > "listTables"
>> > > >>> which I think shouldn't be contained.
>> > > >>
>> > > >>
>> > > >> oh yes 😅 ! fixed it now thanks for pointing it out.
>> > > >>
>> > > >>
>> > > >> Also, I'm curious about how to list views using Glue API. Is there
>> an
>> > > >>> on-hand api to list views directly or we need to list the tables
>> and
>> > > then
>> > > >>> filter the views using the table-kind?
>> > > >>
>> > > >>
>> > > >> yes there is no in-hand api for list views directly , we need to
>> list
>> > > all
>> > > >> tables and then filter the views based on attribute tableKind which
>> > is a
>> > > >> part of table object in api response.
>> > > >>
>> > > >>
>> > > >> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to
>> > String.
>> > > >>> It seems the char's size will lose, is it possible to have a
>> better
>> > > mapping
>> > > >>> which won't loss the size of char type?
>> > > >>
>> > > >>
>> > > >> Thanks for pointing this out ! I have updated the flip with the
>> > correct
>> > > >> type. Initilially i mapped chartype , varchar type to string but
>> > > updated it
>> > > >> to directly map to the same type .
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 4: About the "Flink CatalogFunction mapping with Glue Function"
>> part,
>> > > >>> how do we map the function language in Flink's CatalogFunction.
>> > > >>
>> > > >>
>> > > >> Glue Api (UserDefinedFunctionInput) doesn't support specific
>> attribute
>> > > >> for function language. Here is how aws hive compatible metastore is
>> > > mapping
>> > > >> hive function to glue function[2]. We will append a prefix of
>> Language
>> > > in
>> > > >> the function name itself indicating the language. I see this has
>> been
>> > > >> already done for the Hive Catalog [3]. We are thinking of
>> implementing
>> > > it
>> > > >> in the same way.
>> > > >>
>> > > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
>> > > >> [2]
>> > > >>
>> > >
>> >
>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>> > > >> [3]
>> > > >>
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>> > > >>
>> > > >> Samrat
>> > > >>
>> > > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <li...@gmail.com>
>> wrote:
>> > > >>
>> > > >>> Hi Samrat,
>> > > >>>
>> > > >>> Thanks for the FLIP!
>> > > >>>
>> > > >>> Since this is the first proposal for adding a vendor-specific
>> catalog

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi Jark,
Apologies for the late reply, and thank you for your valuable input.

> Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue Namespaces? (Sorry, I didn't find it.) According
> to the "Flink Glue Metaspace Mapping" section, if there is a database "mydb"
> under namespace "ns1", does that mean the database name in Flink is
> "ns1.mydb"?

There is no concept of a namespace in the Glue Data Catalog. There are three
levels in the Glue Data Catalog:
- catalog
- database
- table

I have added the mapping to FLIP-277[1] and updated it. A Flink database
name maps directly to a database name in Glue. Please ignore the typo left
over in the doc previously.

Best,
Samrat
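To make the mapping answer above concrete, here is a minimal Java sketch, assuming the Glue catalog is addressed by a catalog ID (which the Glue API defaults to the AWS account ID) and that a Flink database/table pair is passed through unchanged, with no namespace level in between. `GlueNameMapping` and `GlueTableId` are hypothetical names, not part of Flink or the AWS SDK.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: Glue has three levels (catalog, database, table) and
 * no namespace, so a Flink database/table pair is passed through unchanged.
 */
final class GlueNameMapping {

    /** Hypothetical identifier of a table in the Glue Data Catalog. */
    record GlueTableId(String catalogId, String databaseName, String tableName) {
        GlueTableId {
            Objects.requireNonNull(catalogId, "catalogId");
            Objects.requireNonNull(databaseName, "databaseName");
            Objects.requireNonNull(tableName, "tableName");
        }
    }

    private GlueNameMapping() {}

    /** Maps a Flink database/table onto Glue without adding or splitting a namespace. */
    static GlueTableId toGlue(String catalogId, String flinkDatabase, String flinkTable) {
        return new GlueTableId(catalogId, flinkDatabase, flinkTable);
    }

    public static void main(String[] args) {
        // "123456789012" is a placeholder catalog ID (assumed to be an AWS account ID).
        GlueTableId id = toGlue("123456789012", "mydb", "orders");
        System.out.println(id.databaseName()); // prints mydb, not ns1.mydb
    }
}
```

The point of the sketch is what it does not do: there is no prefixing or splitting step, so a Flink database "mydb" is looked up as Glue database "mydb".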


On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <im...@gmail.com> wrote:

> Hi Samrat,
>
> Thanks a lot for driving the new catalog, and sorry for jumping into the
> discussion late.
>
> As Flink SQL is becoming the first-class citizen of the Flink API, we are
> planning to push Catalog to become the first-class citizen of the connector
> instead of Source & Sink. For Flink SQL users, using Catalog is as natural
> and user-friendly as working with databases, rather than having to define
> DDL and schemas over and over again. This is also how Trino/Presto do it.
>
> Regarding the repo for the Glue catalog, I think we can add it to
> flink-connector-aws. We don't need separate repos for Catalogs because a
> Catalog is a kind of connector (the others are sources & sinks).
> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
> flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This can
> reduce repository maintenance, and I think maybe some common AWS utils can
> be shared there. cc @Danny Cranmer <dannycranmer@apache.org>, what do you
> think about this?
>
> Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue Namespaces? (Sorry, I didn't find it.) According
> to the "Flink Glue Metaspace Mapping" section, if there is a database "mydb"
> under namespace "ns1", does that mean the database name in Flink is
> "ns1.mydb"?
>
> Best,
> Jark
>
>
> [1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
> [2]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Konstantin Knauf <kn...@apache.org>.
Hi Samrat,

+1 to the effort and +1 to adding it to flink-connector-aws.

Can you explain how users are expected to authenticate with AWS Glue? I
don't see any catalog options regarding authentication/authorization, so I
assume the credentials are taken from the environment?

Best,

Konstantin





-- 
https://twitter.com/snntrable
https://github.com/knaufk

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Jark Wu <im...@gmail.com>.
Hi Samrat,

Thanks a lot for driving the new catalog, and sorry for jumping into the
discussion late.

As Flink SQL is becoming the first-class citizen of the Flink API, we are
planning to push Catalog to become the first-class citizen of the connector
instead of Source & Sink. For Flink SQL users, using Catalog is as natural
and user-friendly as working with databases, rather than having to define
DDL and schemas over and over again. This is also how Trino/Presto do it.

Regarding the repo for the Glue catalog, I think we can add it to
flink-connector-aws. We don't need separate repos for Catalogs because a
Catalog is a kind of connector (the others are sources & sinks).
For example, MySqlCatalog[1] and PostgresCatalog[2] are in
flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This can
reduce repository maintenance, and I think maybe some common AWS utils can
be shared there. cc @Danny Cranmer <da...@apache.org>, what do you think
about this?

Besides, I have a question about Glue Namespace. Could you share the
documentation of the Glue Namespaces? (Sorry, I didn't find it.) According
to the "Flink Glue Metaspace Mapping" section, if there is a database "mydb"
under namespace "ns1", does that mean the database name in Flink is
"ns1.mydb"?

Best,
Jark


[1]:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
[2]:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java

On Fri, 9 Dec 2022 at 08:51, Dong Lin <li...@gmail.com> wrote:

> Hi Samrat,
>
> Sorry for the late reply. Yeah, I am referring to creating a similar
> external repo, such as flink-catalog-glue. flink-connector-aws is already
> named with `connector`, so it seems a bit weird to put a catalog there.
>
> Thanks!
> Dong
>
> On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <de...@gmail.com> wrote:
>
> > Hi Dong Lin,
> >
> > > Since this is the first proposal for adding a vendor-specific catalog
> > > library in Flink, I think maybe we should also externalize those catalog
> > > libraries similar to how we are externalizing connector libraries. It is
> > > likely that we might want to add catalogs for other vendors in the future.
> > > Externalizing those catalogs can make Flink development more scalable in
> > > the long term.
> >
> > Initially I misinterpreted what you meant by externalizing the catalogs.
> > There is already an externalized connector repository for AWS [1].
> > Are you referring to creating a similar external repo for catalogs, or
> > would it be better to add it to flink-connector-aws [1]?
> >
> > [1] https://github.com/apache/flink-connector-aws
> >
> > Samrat
> >
> > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <de...@gmail.com> wrote:
> >
> > > Hi Dong Lin,
> > >
> > > The AWS Glue Data Catalog is vendor-specific, and in the future we will
> > > get similar implementations from other providers. We should definitely
> > > externalize these catalog libraries, similar to the Flink connectors.
> > > I am thinking of creating a flink-catalog module, similar to
> > > flink-connector, under the root (flink); the Glue catalog could be one of
> > > the modules under flink-catalog. Please suggest if there is a better
> > > structure we can create for catalogs.
> > >
> > >
> > >> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> > >> supported based on the catalog option http-client.type. Is
> > >> http-client.type a public config for the GlueCatalog? If yes, can we add
> > >> this config to the "Configurations" section and explain how users should
> > >> choose the client type?
> > >
> > >
> > > Yes, http-client.type is a public config for the GlueCatalog. If the user
> > > doesn't specify a client type, it defaults to `urlconnection`.
> > > I have updated the Configurations section of FLIP-277 [1] with all the
> > > configs. Please review it again.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> > >
> > > Samrat
> > >
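A minimal Java sketch of how the `http-client.type` option might be resolved, using the `urlconnection` default stated in the reply above. The AWS SDK for Java v2 provides `UrlConnectionHttpClient` and `ApacheHttpClient` implementations of `SdkHttpClient`; that the FLIP's second accepted value is `apache` is an assumption here, as are the class and enum names.

```java
import java.util.Locale;
import java.util.Map;

/** Sketch of resolving the "http-client.type" catalog option (names are hypothetical). */
final class HttpClientTypeOption {

    static final String KEY = "http-client.type";
    static final String DEFAULT_VALUE = "urlconnection"; // default stated in the thread

    /** Assumed set of supported values; "apache" is an assumption. */
    enum HttpClientType { URLCONNECTION, APACHE }

    private HttpClientTypeOption() {}

    /** Returns the configured client type, falling back to urlconnection when unset. */
    static HttpClientType resolve(Map<String, String> catalogOptions) {
        String raw = catalogOptions.getOrDefault(KEY, DEFAULT_VALUE);
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "urlconnection":
                return HttpClientType.URLCONNECTION;
            case "apache":
                return HttpClientType.APACHE;
            default:
                // Fail fast on typos instead of silently falling back to the default.
                throw new IllegalArgumentException("Unsupported " + KEY + ": '" + raw + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));              // prints URLCONNECTION
        System.out.println(resolve(Map.of(KEY, "apache"))); // prints APACHE
    }
}
```

Rejecting unknown values, rather than quietly using the default, is a design choice in this sketch; it makes a misspelled option visible at catalog creation time.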
> > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <de...@gmail.com> wrote:
> > >
> > >> Hi Yuxia,
> > >>
> > >> Thank you for reviewing the FLIP and putting forward your observations
> > >> and comments.
> > >>
> > >>> 1: I noticed there's a YAML part in the section of "Using the Catalog",
> > >>> what do you mean by that? Do you mean how to use glue catalog in sql
> > >>> client? If so, just for your information, it's not supported to use a
> > >>> yaml environment file in sql client[2].
> > >>
> > >>
> > >> Thank you for attaching the Jira ticket [1]. I missed those changes.
> > >> There is a provision to register a catalog directly through factory
> > >> resources:
> > >> - GenericInMemoryCatalog is defined through
> > >>   `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > >> - HiveCatalog is defined through
> > >>   `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > >> We can define it the same way in the vendor-specific module for AWS Glue.
> > >>
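The factory-based registration described above relies on Java's `ServiceLoader`: Flink reads the factory class names listed in `META-INF/services/org.apache.flink.table.factories.Factory` and picks the factory whose `factoryIdentifier()` matches the `type` option of `CREATE CATALOG`. The sketch below mimics only that matching step with a hypothetical `CatalogFactoryStub` interface, so it runs without Flink on the classpath; the `glue` identifier is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of identifier-based catalog factory lookup (stand-in types are hypothetical). */
final class CatalogFactoryLookup {

    /** Hypothetical stand-in for the identifier part of Flink's Factory contract. */
    interface CatalogFactoryStub {
        String factoryIdentifier();
    }

    private CatalogFactoryLookup() {}

    /** Picks exactly one factory whose identifier matches the requested catalog type. */
    static CatalogFactoryStub discover(List<CatalogFactoryStub> discovered, String type) {
        List<CatalogFactoryStub> matches = discovered.stream()
                .filter(f -> f.factoryIdentifier().equals(type))
                .collect(Collectors.toList());
        if (matches.isEmpty()) {
            throw new IllegalStateException("No catalog factory for type '" + type + "'");
        }
        if (matches.size() > 1) {
            // Two jars registering the same identifier would be ambiguous.
            throw new IllegalStateException("Multiple catalog factories for type '" + type + "'");
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        // In Flink these would come from ServiceLoader; here they are inline lambdas.
        CatalogFactoryStub inMemory = () -> "generic_in_memory";
        CatalogFactoryStub hive = () -> "hive";
        CatalogFactoryStub glue = () -> "glue"; // hypothetical identifier
        System.out.println(discover(List.of(inMemory, hive, glue), "glue").factoryIdentifier()); // prints glue
    }
}
```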
> > >>> 2: Seems there's a typo in "Design#views" part, it contains "listTables"
> > >>> which I think shouldn't be contained.
> > >>
> > >>
> > >> Oh yes 😅! Fixed it now, thanks for pointing it out.
> > >>
> > >>
> > >>> Also, I'm curious about how to list views using Glue API. Is there an
> > >>> on-hand API to list views directly, or do we need to list the tables
> > >>> and then filter the views using the table-kind?
> > >>
> > >>
> > >> Yes, there is no ready-made API to list views directly. We need to list
> > >> all tables and then filter the views based on the tableKind attribute,
> > >> which is part of the table object in the API response.
> > >>
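A minimal sketch of the list-then-filter approach for views described above, under stated assumptions: `GlueTableSummary` is a hypothetical stand-in for the table objects Glue returns, and the literal kind value `VIEW` is assumed rather than taken from the FLIP.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of listViews() built on top of a full table listing (names are hypothetical). */
final class GlueViewFilter {

    /** Hypothetical stand-in for a table object returned by Glue. */
    record GlueTableSummary(String name, String tableKind) {}

    /** Assumed marker value for views; the real attribute value may differ. */
    static final String VIEW_KIND = "VIEW";

    private GlueViewFilter() {}

    /** Keeps only the entries whose kind marks them as views, preserving order. */
    static List<String> listViews(List<GlueTableSummary> allTables) {
        return allTables.stream()
                .filter(t -> VIEW_KIND.equalsIgnoreCase(t.tableKind())) // null kind -> false
                .map(GlueTableSummary::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<GlueTableSummary> tables = List.of(
                new GlueTableSummary("orders", "TABLE"),
                new GlueTableSummary("daily_orders", "VIEW"),
                new GlueTableSummary("legacy", null)); // missing kind: not treated as a view
        System.out.println(listViews(tables)); // prints [daily_orders]
    }
}
```

A real implementation would first have to page through Glue's `GetTables` results (which are returned with a `NextToken`) before filtering, since the filter only sees what has been fetched.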
> > >>
> > >>> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to String.
> > >>> It seems the char's size will be lost; is it possible to have a better
> > >>> mapping which won't lose the size of the char type?
> > >>
> > >>
> > >> Thanks for pointing this out! I have updated the FLIP with the correct
> > >> type. Initially I mapped CharType and VarCharType to string, but I have
> > >> updated them to map directly to the same types.
> > >>
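A sketch of a length-preserving mapping for character types, assuming Glue column types are Hive-style type strings such as `char(10)` and `varchar(255)`. Flink's `STRING` is `VARCHAR(2147483647)`; mapping that unbounded case to `string` is an assumption about how the FLIP would treat it. The class, enum, and record names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of CHAR/VARCHAR <-> Glue type strings that keeps the declared length. */
final class CharTypeMapping {

    enum FlinkCharKind { CHAR, VARCHAR }

    /** Hypothetical pair of character-type kind and declared length. */
    record CharSpec(FlinkCharKind kind, int length) {}

    private static final Pattern GLUE_CHAR_TYPE = Pattern.compile("(char|varchar)\\((\\d+)\\)");

    private CharTypeMapping() {}

    /** Flink type -> Glue type string, keeping the declared length. */
    static String toGlueType(FlinkCharKind kind, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be >= 1, got " + length);
        }
        if (kind == FlinkCharKind.VARCHAR && length == Integer.MAX_VALUE) {
            return "string"; // Flink STRING is VARCHAR(2147483647); assumed to map to string
        }
        return kind.name().toLowerCase(Locale.ROOT) + "(" + length + ")";
    }

    /** Glue type string -> Flink kind and length, so the size survives a round trip. */
    static CharSpec fromGlueType(String glueType) {
        Matcher m = GLUE_CHAR_TYPE.matcher(glueType.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a char/varchar type: " + glueType);
        }
        FlinkCharKind kind = m.group(1).equals("char") ? FlinkCharKind.CHAR : FlinkCharKind.VARCHAR;
        return new CharSpec(kind, Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(toGlueType(FlinkCharKind.CHAR, 10)); // prints char(10)
        System.out.println(fromGlueType("varchar(255)"));        // prints CharSpec[kind=VARCHAR, length=255]
    }
}
```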
> > >>
> > >>
> > >>> 4: About the "Flink CatalogFunction mapping with Glue Function" part,
> > >>> how do we map the function language in Flink's CatalogFunction?
> > >>
> > >>
> > >> The Glue API (UserDefinedFunctionInput) doesn't have a specific attribute
> > >> for the function language. Here is how the AWS Hive-compatible metastore
> > >> client maps a Hive function to a Glue function [2]. We will add a prefix
> > >> to the function name itself indicating the language. I see this has
> > >> already been done for the HiveCatalog [3], and we are thinking of
> > >> implementing it in the same way.
> > >>
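A minimal sketch of the language-prefix idea above, assuming a `flink:<language>:` prefix format (hypothetical) and treating stored names without a recognized prefix as Java functions. The local `FunctionLanguage` enum mirrors the constants of Flink's `org.apache.flink.table.catalog.FunctionLanguage` (`JAVA`, `SCALA`, `PYTHON`) so the example runs without Flink; which Glue field ends up carrying the prefixed string is left open here.

```java
import java.util.Locale;

/** Sketch of encoding a function's language into the name stored in Glue. */
final class FunctionLanguageCodec {

    /** Mirrors the constants of Flink's FunctionLanguage enum. */
    enum FunctionLanguage { JAVA, SCALA, PYTHON }

    /** Decoded form of a stored function name. */
    record Decoded(FunctionLanguage language, String name) {}

    private FunctionLanguageCodec() {}

    /** Hypothetical prefix format, e.g. "flink:python:". */
    static String prefixOf(FunctionLanguage language) {
        return "flink:" + language.name().toLowerCase(Locale.ROOT) + ":";
    }

    /** Prepends the language prefix to the name that will be stored in Glue. */
    static String encode(FunctionLanguage language, String name) {
        return prefixOf(language) + name;
    }

    /** Strips a recognized prefix; names without one are assumed to be Java functions. */
    static Decoded decode(String stored) {
        for (FunctionLanguage language : FunctionLanguage.values()) {
            String prefix = prefixOf(language);
            if (stored.startsWith(prefix)) {
                return new Decoded(language, stored.substring(prefix.length()));
            }
        }
        return new Decoded(FunctionLanguage.JAVA, stored);
    }

    public static void main(String[] args) {
        String stored = encode(FunctionLanguage.PYTHON, "my_udf");
        System.out.println(stored);         // prints flink:python:my_udf
        System.out.println(decode(stored)); // prints Decoded[language=PYTHON, name=my_udf]
    }
}
```

Prefixing every language, including Java, keeps decoding unambiguous; the unprefixed fallback only exists for entries written by other tools, which is itself an assumption.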
> > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
> > >> [2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
> > >> [3] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
> > >>
> > >> Samrat
> > >>
> > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <li...@gmail.com> wrote:
> > >>
> > >>> Hi Samrat,
> > >>>
> > >>> Thanks for the FLIP!
> > >>>
> > >>> Since this is the first proposal for adding a vendor-specific catalog
> > >>> library in Flink, I think maybe we should also externalize those catalog
> > >>> libraries similar to how we are externalizing connector libraries. It is
> > >>> likely that we might want to add catalogs for other vendors in the
> > >>> future. Externalizing those catalogs can make Flink development more
> > >>> scalable in the long term.
> > >>>
> > >>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> > >>> supported based on the catalog option http-client.type. Is
> > >>> http-client.type a public config for the GlueCatalog? If yes, can we add
> > >>> this config to the "Configurations" section and explain how users should
> > >>> choose the client type?
> > >>>
> > >>> Regards,
> > >>> Dong
> > >>
> >
>

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Dong Lin <li...@gmail.com>.
Hi Samrat,

Sorry for the late reply. Yes, I am referring to creating a similar
external repo, such as flink-catalog-glue. flink-connector-aws is already
named with `connector`, so it seems a bit odd to put a catalog there.

Thanks!
Dong

On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <de...@gmail.com> wrote:

> Hi Dong Lin,
>
> > Since this is the first proposal for adding a vendor-specific catalog
> > library in Flink, I think maybe we should also externalize those catalog
> > libraries similar to how we are externalizing connector libraries. It is
> > likely that we might want to add catalogs for other vendors in the
> > future. Externalizing those catalogs can make Flink development more
> > scalable in the long term.
>
> Initially I misinterpreted what you meant by externalizing the catalogs.
> There is already an externalized connector for AWS [1].
> Are you referring to creating a similar external repo for catalogs, or
> would it be better to add it to flink-connector-aws [1]?
>
> [1] https://github.com/apache/flink-connector-aws
>
> Samrat

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi Dong Lin,

> Since this is the first proposal for adding a vendor-specific catalog
> library in Flink, I think maybe we should also externalize those catalog
> libraries similar to how we are externalizing connector libraries. It is
> likely that we might want to add catalogs for other vendors in the future.
> Externalizing those catalogs can make Flink development more scalable in
> the long term.

Initially I misinterpreted what you meant by externalizing the catalogs.
There is already an externalized connector for AWS [1].
Are you referring to creating a similar external repo for catalogs, or
would it be better to add it to flink-connector-aws [1]?

[1] https://github.com/apache/flink-connector-aws

Samrat

On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <de...@gmail.com> wrote:

> Hi Dong Lin,
>
> The AWS Glue Data Catalog is vendor-specific, and in the future we will
> get similar implementations from other providers. We should definitely
> externalize these catalog libraries, similar to the Flink connectors.
> I am thinking of creating flink-catalog, similar to flink-connector, under
> the root (flink). The Glue catalog can be one of the modules under
> flink-catalog. Please suggest if there is a better structure we can create
> for catalogs.
>
>
>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
>> supported based on the catalog option http-client.type. Is http-client.type
>> a public config for the GlueCatalog? If yes, can we add this config to the
>> "Configurations" section and explain how users should choose the client
>> type?
>
>
> Yes, http-client.type is a public config for the GlueCatalog. If the user
> doesn't specify a client type, it defaults to `urlconnection`.
> I have updated the Configurations section of FLIP-277 [1] with all the
> configs. Please review it again.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>
> Samrat

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi Dong Lin,

The AWS Glue Data Catalog is vendor-specific, and in the future we will get
similar implementations from other providers. We should definitely
externalize these catalog libraries, similar to the Flink connectors.
I am thinking of creating flink-catalog, similar to flink-connector, under
the root (flink). The Glue catalog can be one of the modules under
flink-catalog. Please suggest if there is a better structure we can create
for catalogs.


> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> supported based on the catalog option http-client.type. Is http-client.type
> a public config for the GlueCatalog? If yes, can we add this config to the
> "Configurations" section and explain how users should choose the client
> type?


Yes, http-client.type is a public config for the GlueCatalog. If the user
doesn't specify a client type, it defaults to `urlconnection`.
I have updated the Configurations section of FLIP-277 [1] with all the
configs. Please review it again.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink

Samrat

On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <de...@gmail.com> wrote:

> Hi Yuxia,
>
> Thank you for reviewing the FLIP and sharing your observations and
> comments.
>
>> 1: I noticed there's a YAML part in the section of "Using the Catalog",
>> what do you mean by that? Do you mean how to use glue catalog in sql
>> client? If so, just for your information, it's not supported to use yaml
>> environment file in sql client[2].
>
>
> Thank you for attaching the JIRA ticket [1]; I had missed those changes.
> A catalog can instead be registered directly through the factory service
> resources:
> - the GenericInMemoryCatalog factory is registered through
> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> - the HiveCatalog factory is registered through
> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> Similarly, we can register the Glue catalog factory in the vendor-specific
> module for AWS Glue.
>
>> 2: Seems there's a typo in "Design#views" part, it contains "listTables"
>> which I think shouldn't be contained.
>
>
> Oh yes 😅! Fixed it now, thanks for pointing it out.
>
>
>> Also, I'm curious about how to list views using Glue API. Is there an
>> on-hand api to list views directly or we need to list the tables and then
>> filter the views using the table-kind?
>
>
> Yes, there is no ready-made API to list views directly. We need to list
> all tables and then pick out the views based on the tableKind attribute,
> which is part of the table object in the API response.
>
>
>> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to String. It
>> seems the char's size will be lost, is it possible to have a better
>> mapping which won't lose the size of char type?
>
>
> Thanks for pointing this out! I have updated the FLIP with the correct
> types. Initially I mapped CharType and VarCharType to String, but I have
> updated the mapping so that each maps directly to the same type.
>
>
>
>> 4: About the "Flink CatalogFunction mapping with Glue Function" part, how
>> do we map the function language in Flink's CatalogFunction.
>
>
> The Glue API (UserDefinedFunctionInput) doesn't have a dedicated attribute
> for the function language. Here is how the AWS Glue Data Catalog client for
> the Apache Hive metastore maps a Hive function to a Glue function [2]. We
> will prepend a language prefix to the function name itself to indicate the
> language. I see this has already been done for the HiveCatalog [3], and we
> are thinking of implementing it the same way.
>
> [1] https://issues.apache.org/jira/browse/FLINK-22540
> [2]
> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
> [3]
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>
> Samrat

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Samrat Deb <de...@gmail.com>.
Hi Yuxia,

Thank you for reviewing the FLIP and sharing your observations and
comments.

> 1: I noticed there's a YAML part in the section of "Using the Catalog",
> what do you mean by that? Do you mean how to use glue catalog in sql
> client? If so, just for your information, it's not supported to use yaml
> environment file in sql client[2].


Thank you for attaching the JIRA ticket [1]; I had missed those changes.
A catalog can instead be registered directly through the factory service
resources:
- the GenericInMemoryCatalog factory is registered through
`flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
- the HiveCatalog factory is registered through
`flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
Similarly, we can register the Glue catalog factory in the vendor-specific
module for AWS Glue.
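As a rough illustration of why a META-INF/services entry is enough, here is a simplified Java sketch of the lookup. In Flink, factories listed in that file are loaded with `java.util.ServiceLoader` and matched on `factoryIdentifier()` against the catalog's `type` option (see `FactoryUtil.discoverFactory`). To run without Flink on the classpath, the sketch takes the candidates as a plain list; `CatalogFactorySketch`, `FactoryLookup`, and the identifier `glue` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for org.apache.flink.table.factories.Factory.
interface CatalogFactorySketch {
    String factoryIdentifier();
}

// Simplified version of the identifier matching applied to factories
// listed in META-INF/services/org.apache.flink.table.factories.Factory.
final class FactoryLookup {
    private FactoryLookup() {}

    static CatalogFactorySketch discover(List<CatalogFactorySketch> candidates, String type) {
        List<CatalogFactorySketch> matches = candidates.stream()
                .filter(f -> f.factoryIdentifier().equals(type))
                .collect(Collectors.toList());
        if (matches.isEmpty()) {
            throw new IllegalStateException("No catalog factory found for type '" + type + "'");
        }
        if (matches.size() > 1) {
            throw new IllegalStateException("Ambiguous catalog factories for type '" + type + "'");
        }
        return matches.get(0);
    }
}
```

With such an entry in the Glue module, a SQL client user could write something like `CREATE CATALOG my_glue WITH ('type' = 'glue')`, assuming `glue` is the identifier the factory reports.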

> 2: Seems there's a typo in "Design#views" part, it contains "listTables"
> which I think shouldn't be contained.


Oh yes 😅! Fixed it now, thanks for pointing it out.


> Also, I'm curious about how to list views using Glue API. Is there an
> on-hand api to list views directly or we need to list the tables and then
> filter the views using the table-kind?


Yes, there is no ready-made API to list views directly. We need to list all
tables and then pick out the views based on the tableKind attribute, which
is part of the table object in the API response.
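A minimal Java sketch of that approach, under stated assumptions: `GlueTableSummary` is a hypothetical, trimmed-down table entry carrying the table kind as a plain string, and `TablePageFetcher` stands in for a paginated Glue `GetTables` call that returns one page of tables plus a continuation token. Neither is the real AWS SDK model; the sketch only shows that views are found by walking every page of tables and keeping those whose kind is `VIEW`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified shape of a table entry returned by the Glue API.
record GlueTableSummary(String name, String tableKind) {}

// Hypothetical page of results; nextToken == null means there are no more pages.
record TablePage(List<GlueTableSummary> tables, String nextToken) {}

// Stand-in for a paginated GetTables call against one Glue database.
interface TablePageFetcher {
    TablePage fetch(String databaseName, String nextToken);
}

final class GlueViewLister {
    private GlueViewLister() {}

    /** Lists all tables page by page and keeps only the names of views. */
    static List<String> listViews(TablePageFetcher fetcher, String databaseName) {
        List<String> views = new ArrayList<>();
        String token = null; // null requests the first page
        do {
            TablePage page = fetcher.fetch(databaseName, token);
            for (GlueTableSummary table : page.tables()) {
                if ("VIEW".equals(table.tableKind())) {
                    views.add(table.name());
                }
            }
            token = page.nextToken();
        } while (token != null);
        return views;
    }
}
```

Because every table has to be fetched, listing views costs as many API calls as listing all tables in the database.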


> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to String. It
> seems the char's size will be lost, is it possible to have a better
> mapping which won't lose the size of char type?


Thanks for pointing this out! I have updated the FLIP with the correct
types. Initially I mapped CharType and VarCharType to String, but I have
updated the mapping so that each maps directly to the same type.
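A length-preserving mapping for the character types could look like the Java sketch below. It is an illustration only: it assumes Glue column types are written as Hive-style type strings such as `char(10)` and `varchar(255)`, and it maps Flink's `STRING` (which is `VARCHAR` with the maximum length, `Integer.MAX_VALUE`) to Glue `string`. `GlueTypeMapper` is a hypothetical class, not the mapping defined in the FLIP.

```java
// Hypothetical helper showing a length-preserving mapping for character types.
final class GlueTypeMapper {
    // Flink's STRING is VARCHAR(Integer.MAX_VALUE), i.e. VarCharType.MAX_LENGTH.
    static final int FLINK_STRING_LENGTH = Integer.MAX_VALUE;

    enum CharKind { CHAR, VARCHAR }

    private GlueTypeMapper() {}

    /** Maps a Flink CHAR(n)/VARCHAR(n) to a Hive-style Glue type string, keeping n. */
    static String toGlueType(CharKind kind, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("Length must be positive: " + length);
        }
        switch (kind) {
            case CHAR:
                return "char(" + length + ")";
            case VARCHAR:
                // An unbounded VARCHAR (Flink STRING) has no length worth preserving.
                return length == FLINK_STRING_LENGTH ? "string" : "varchar(" + length + ")";
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }
}
```

Hive itself caps `char` at 255 and `varchar` at 65535, so if Hive-compatible engines read these tables, a real mapping would also need a policy for larger lengths.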



> 4: About the "Flink CatalogFunction mapping with Glue Function" part, how
> do we map the function language in Flink's CatalogFunction.


The Glue API (UserDefinedFunctionInput) doesn't have a dedicated attribute
for the function language. Here is how the AWS Glue Data Catalog client for
the Apache Hive metastore maps a Hive function to a Glue function [2]. We
will prepend a language prefix to the function name itself to indicate the
language. I see this has already been done for the HiveCatalog [3], and we
are thinking of implementing it the same way.
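The prefix idea can be sketched in Java as a small encode/decode pair. The prefix format `flink:<language>:` and the class `FunctionNameCodec` are hypothetical choices for illustration; the actual encoding would be whatever the FLIP specifies. `FunctionLang` mirrors the values of Flink's `FunctionLanguage` enum (JAVA, SCALA, PYTHON) but is declared locally so the sketch compiles on its own, and stored names without a prefix are assumed to be Java functions.

```java
import java.util.Locale;

// Local copy of the languages in Flink's FunctionLanguage enum.
enum FunctionLang { JAVA, SCALA, PYTHON }

// Hypothetical codec that stores the language as a prefix on the stored name,
// because the Glue function input has no dedicated language attribute.
final class FunctionNameCodec {
    private static final String PREFIX = "flink:";

    private FunctionNameCodec() {}

    record Decoded(FunctionLang language, String name) {}

    static String encode(FunctionLang language, String name) {
        return PREFIX + language.name().toLowerCase(Locale.ROOT) + ":" + name;
    }

    static Decoded decode(String stored) {
        if (stored.startsWith(PREFIX)) {
            int sep = stored.indexOf(':', PREFIX.length());
            if (sep > 0) {
                // valueOf throws IllegalArgumentException for an unknown language.
                String lang = stored.substring(PREFIX.length(), sep).toUpperCase(Locale.ROOT);
                return new Decoded(FunctionLang.valueOf(lang), stored.substring(sep + 1));
            }
        }
        // No recognizable prefix: assume a plain Java function.
        return new Decoded(FunctionLang.JAVA, stored);
    }
}
```

Treating unprefixed names as Java keeps functions written by other Glue clients readable, at the cost of never detecting their real language.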

[1] https://issues.apache.org/jira/browse/FLINK-22540
[2]
https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
[3]
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415

Samrat

On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <li...@gmail.com> wrote:

> Hi Samrat,
>
> Thanks for the FLIP!
>
> Since this is the first proposal for adding a vendor-specific catalog
> library in Flink, I think maybe we should also externalize those catalog
> libraries similar to how we are externalizing connector libraries. It is
> likely that we might want to add catalogs for other vendors in the future.
> Externalizing those catalogs can make Flink development more scalable in
> the long term.
>
> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> supported based on the catalog option http-client.type. Is http-client.type
> a public config for the GlueCatalog? If yes, can we add this config to the
> "Configurations" section and explain how users should choose the client
> type?
>
> Regards,
> Dong

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Posted by Dong Lin <li...@gmail.com>.
Hi Samrat,

Thanks for the FLIP!

Since this is the first proposal for adding a vendor-specific catalog
library in Flink, I think maybe we should also externalize those catalog
libraries similar to how we are externalizing connector libraries. It is
likely that we might want to add catalogs for other vendors in the future.
Externalizing those catalogs can make Flink development more scalable in
the long term.

It is mentioned in the FLIP that there will be two types of SdkHttpClient
supported based on the catalog option http-client.type. Is http-client.type
a public config for the GlueCatalog? If yes, can we add this config to the
"Configurations" section and explain how users should choose the client
type?

Regards,
Dong


On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <de...@gmail.com> wrote:

> Hi everyone,
>
> I would like to open a discussion[1] on providing GlueCatalog support
> in Flink.
> Currently, Flink offers 3 major types of catalog [2], of which only
> HiveCatalog is a persistent catalog, backed by the Hive Metastore. We would
> like to introduce GlueCatalog in Flink as another persistent option for
> users. The AWS Glue Data Catalog is a centralized data catalog in the AWS
> cloud that provides integrations with many different connectors [3]. Flink
> GlueCatalog can use the features provided by Glue and create strong
> integration with other services in the cloud.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>
> [3]
>
> https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>
> [4] https://issues.apache.org/jira/browse/FLINK-29549
>
> Bests
> Samrat
>