Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/05 09:02:00 UTC

[jira] [Commented] (HUDI-1194) Reorganize HoodieHiveClient and make it fully support Hive Metastore API

    [ https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393730#comment-17393730 ] 

ASF GitHub Bot commented on HUDI-1194:
--------------------------------------

zhedoubushishi edited a comment on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-893290649


   > Sorry for the delay in responding. A similar PR was in progress and has been merged as #2879.
   > We can close this PR.
   
   Yes, since this is duplicated work, I have closed this PR.




> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1194
>                 URL: https://issues.apache.org/jira/browse/HUDI-1194
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>
> Currently there are three ways in HoodieHiveClient to perform Hive operations: through Hive JDBC, through the Hive Metastore API, and through the Hive Driver.
>  
>  There is a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ that controls whether Hive JDBC is used. However, this parameter does not accurately describe the situation.
>  When +*use_jdbc*+ is set to true, most of the methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient use the Hive Driver, and a few use the Hive Metastore API.
> The following table shows what is actually used when use_jdbc is set to true or false.
> ||Method||use_jdbc=true||use_jdbc=false||
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed several Metastore API implementations for {{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and {{updateTableDefinition}}, which will help with several issues, e.g. resolving the null partition Hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.
> However, it seems hard to organize three implementations within the current config, so we plan to separate HoodieHiveClient into three classes:
>  # HoodieHiveClient, which implements all the APIs through the Metastore API.
>  # HoodieHiveJDBCClient, which extends HoodieHiveClient and overrides several of the APIs through Hive JDBC.
>  # HoodieHiveDriverClient, which extends HoodieHiveClient and overrides several of the APIs through the Hive Driver.
> We also introduce a new parameter, +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose which Hive client class to use.
>  
>  
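The three-class split described in the issue can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not the code from the pull request. The classes only return the name of the backend they would use instead of talking to Hive. Which methods each subclass overrides is an assumption (the issue only says "several"). Loading the class by reflection, the fromConfig helper, and the Metastore-based client as the default are assumptions as well.

```java
import java.util.Properties;

final class HiveClientSketch {
    // Parameter name taken from the issue; how it is read here is an assumption.
    static final String CLIENT_CLASS_KEY = "hoodie.datasource.hive_sync.hive_client_class";

    /** Base client: every operation goes through the Metastore API. */
    static class HoodieHiveClient {
        public HoodieHiveClient() {}
        String createTable(String table) { return "Metastore API"; }
        String addPartitionsToTable(String table) { return "Metastore API"; }
        String scanTablePartitions(String table) { return "Metastore API"; }
    }

    /** Overrides some operations with JDBC (the choice of methods is illustrative). */
    static class HoodieHiveJDBCClient extends HoodieHiveClient {
        public HoodieHiveJDBCClient() {}
        @Override String createTable(String table) { return "JDBC"; }
        @Override String addPartitionsToTable(String table) { return "JDBC"; }
    }

    /** Overrides some operations with the Hive Driver (again illustrative). */
    static class HoodieHiveDriverClient extends HoodieHiveClient {
        public HoodieHiveDriverClient() {}
        @Override String createTable(String table) { return "Hive Driver"; }
        @Override String addPartitionsToTable(String table) { return "Hive Driver"; }
    }

    /** Hypothetical helper: instantiate the client class named in the config. */
    static HoodieHiveClient fromConfig(Properties props) {
        // Assumed default: fall back to the Metastore-based client.
        String className = props.getProperty(CLIENT_CLASS_KEY, HoodieHiveClient.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(HoodieHiveClient.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create Hive client: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(CLIENT_CLASS_KEY, HoodieHiveJDBCClient.class.getName());
        HoodieHiveClient client = fromConfig(props);
        System.out.println(client.createTable("t"));         // prints "JDBC"
        System.out.println(client.scanTablePartitions("t")); // prints "Metastore API"
    }
}
```

With this shape, any operation a subclass does not override falls back to the Metastore API implementation, so a single class name in the config determines the backend for every method, rather than a boolean whose effect differs from method to method.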


