Posted to issues@flink.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2019/06/13 22:12:00 UTC

[jira] [Commented] (FLINK-12771) Support ConnectorCatalogTable in HiveCatalog

    [ https://issues.apache.org/jira/browse/FLINK-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863503#comment-16863503 ] 

Xuefu Zhang commented on FLINK-12771:
-------------------------------------

> [~bdaw] commented on PR #8720:
> I don't think this feature makes any sense. This makes the HiveCatalog also an instance of InMemoryCatalog.
>
> I think if a table cannot be converted to properties (as is the case for ConnectorCatalogTable), an exception should be thrown. We
> should allow storing those kinds of tables only in an in-memory catalog.

Hi [~bdaw], thanks for sharing your concern. Other than the fact that there is some overlap with an in-memory catalog, could you elaborate on why the feature doesn't make sense?

In my humble opinion, this feature makes sense for several reasons:

1. The Catalog API is defined such that an implementation is expected to handle this type of table, and each implementation can choose how to handle it. Throwing an exception is one option, but it's equally reasonable for another implementation to handle the table differently.

2. {{ConnectorCatalogTable}} is temporary in nature, and Hive catalog supports temporary objects (tables, functions). Supporting {{ConnectorCatalogTable}} in {{HiveCatalog}} is a natural fit.

3. The inability to serialize {{ConnectorCatalogTable}} shouldn't be a reason for {{HiveCatalog}} to reject it. Serialization would be possible if we tried hard enough, but the nature of the table means no serialization is required.

4. Supporting {{ConnectorCatalogTable}} improves usability because users don't have to register two catalogs and juggle between them. The JIRA description stresses this point in particular.

5. A catalog is not limited to supporting just one type of table. {{HiveCatalog}} already supports both generic tables and native Hive tables. With this feature, it also supports inline tables.

The only downside I see is the added complexity. Again, this is just an implementation detail that users shouldn't have to care about.

> Support ConnectorCatalogTable in HiveCatalog
> --------------------------------------------
>
>                 Key: FLINK-12771
>                 URL: https://issues.apache.org/jira/browse/FLINK-12771
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently {{HiveCatalog}} does not support {{ConnectorCatalogTable}}. This is a major drawback in real use cases: when Table API users set a {{HiveCatalog}} as their default catalog (which is very likely), they can no longer create or use any inline table sources/sinks with their default catalog. This makes it really inconvenient for Table API users to use Flink for exploration, experimentation, and production.
> There are several workarounds in this case. For example, users can switch their default catalog, but that defeats our original intention of having a default {{HiveCatalog}}; or users can register their inline sources/sinks in Flink's default catalog, which is an in-memory catalog, but that not only requires users to type the full path of a table but also requires them to be aware of Flink's default catalog, default db, and their names. In short, none of the workarounds seems reasonable or user friendly.
> From another perspective, Hive has the concept of temporary tables that are stored in the memory of the Hive metastore client and are removed when the client is shut down. In Flink, {{ConnectorCatalogTable}} can be seen as a type of session-based temporary table, and {{HiveCatalog}} (and potentially any catalog implementation) can store it in memory. By introducing the concept of temp tables, we could greatly reduce friction for users and improve their experience and productivity.
> Thus, we propose adding a simple in-memory map for {{ConnectorCatalogTable}} in {{HiveCatalog}} to allow users to create and use inline sources/sinks when their default catalog is a {{HiveCatalog}}.
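
For illustration only, the proposal in the description (a persistent catalog that also keeps an in-memory map for session-scoped tables) might be sketched roughly as follows. This is a minimal sketch, not the code from the pull request: {{HybridCatalogSketch}}, {{Table}}, {{InlineTable}}, {{PropertiesTable}}, and {{HybridCatalog}} are hypothetical stand-ins rather than Flink or Hive classes, and a plain map stands in for the metastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these types are actual Flink or Hive classes. */
public class HybridCatalogSketch {

    /** Stand-in for a catalog table. */
    public interface Table {
        /** True if the table can be converted to properties and persisted. */
        boolean isPersistable();
    }

    /** Stand-in for an inline table that wraps a live source/sink object. */
    public static final class InlineTable implements Table {
        private final Object connector;
        public InlineTable(Object connector) { this.connector = connector; }
        @Override public boolean isPersistable() { return false; }
    }

    /** Stand-in for a table that is fully described by string properties. */
    public static final class PropertiesTable implements Table {
        private final Map<String, String> properties;
        public PropertiesTable(Map<String, String> properties) {
            this.properties = new HashMap<>(properties);
        }
        @Override public boolean isPersistable() { return true; }
    }

    /** Catalog that persists what it can and keeps the rest for the session only. */
    public static final class HybridCatalog {
        // Assumption: this map stands in for an external, persistent metastore.
        private final Map<String, Table> persistentStore;
        // Session-scoped tables live only in this in-memory map.
        private final Map<String, Table> sessionTables = new HashMap<>();

        public HybridCatalog(Map<String, Table> persistentStore) {
            this.persistentStore = persistentStore;
        }

        public void createTable(String name, Table table) {
            if (sessionTables.containsKey(name) || persistentStore.containsKey(name)) {
                throw new IllegalStateException("Table already exists: " + name);
            }
            if (table.isPersistable()) {
                persistentStore.put(name, table);
            } else {
                sessionTables.put(name, table);
            }
        }

        public Optional<Table> getTable(String name) {
            Table sessionTable = sessionTables.get(name);
            return Optional.ofNullable(
                    sessionTable != null ? sessionTable : persistentStore.get(name));
        }

        /** Ends the session: temporary tables are dropped, persisted ones remain. */
        public void close() {
            sessionTables.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, Table> metastore = new HashMap<>();
        HybridCatalog catalog = new HybridCatalog(metastore);
        catalog.createTable("inline_src", new InlineTable(new Object()));
        catalog.createTable("orders", new PropertiesTable(new HashMap<>()));
        catalog.close();
        System.out.println("inline_src present: " + catalog.getTable("inline_src").isPresent());
        System.out.println("orders present: " + catalog.getTable("orders").isPresent());
    }
}
```

In this sketch, a user works against a single catalog and a single namespace; whether a table is persisted or kept only for the session is decided inside {{createTable}}, which is the usability argument made in the description. The alternative raised in the quoted PR comment would replace the {{else}} branch with a thrown exception.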



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)