Posted to issues-all@impala.apache.org by "Yuanbin Cheng (JIRA)" <ji...@apache.org> on 2019/08/02 00:51:00 UTC

[jira] [Commented] (IMPALA-8778) Support read/write Apache Hudi tables

    [ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898457#comment-16898457 ] 

Yuanbin Cheng commented on IMPALA-8778:
---------------------------------------

[~vinoth]

I have read the code in Apache Impala that relates to HdfsTable.

Because Hudi partitioning is compatible with Hive partitioning, my current thought is to change the partition-loading part of the code in Apache Impala, namely the loadFileMetadataForPartitions method in the HdfsTable class.

This method groups the partitions by path, creates a `FileMetadataLoader` for each path, and then calls the load method of each loader in parallel.

Here is the load method in FileMetadataLoader:

[https://github.com/apache/impala/blob/9ee4a5e1940afa47227a92e0f6fba6d4c9909f63/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L129]
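
To make the shape of that flow concrete, here is a rough Java sketch of "group partitions by path, then run one loader per path in parallel". It is not Impala's code: the class `PartitionLoadSketch`, its methods, and the `lister` function are hypothetical stand-ins for loadFileMetadataForPartitions and `FileMetadataLoader`, and the real loader talks to the filesystem instead of calling a function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; names do not come from the Impala code base.
class PartitionLoadSketch {

  // Groups partition names by location, since several partitions may share a path.
  static Map<String, List<String>> groupByPath(Map<String, String> partitionToPath) {
    Map<String, List<String>> byPath = new TreeMap<>();
    for (Map.Entry<String, String> e : partitionToPath.entrySet()) {
      byPath.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
    }
    return byPath;
  }

  // Runs one "loader" per distinct path on a thread pool and collects the
  // listed file names, keyed by path.
  static Map<String, List<String>> loadAll(
      Collection<String> paths, Function<String, List<String>> lister, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      Map<String, Future<List<String>>> futures = new LinkedHashMap<>();
      for (String path : paths) {
        Callable<List<String>> task = () -> lister.apply(path);
        futures.put(path, pool.submit(task));
      }
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<List<String>>> e : futures.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } catch (InterruptedException | ExecutionException ex) {
      throw new IllegalStateException("Loading file metadata failed", ex);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Map<String, String> partitions = new HashMap<>();
    partitions.put("year=2019/month=07", "/warehouse/t/2019/07");
    partitions.put("year=2019/month=08", "/warehouse/t/2019/08");
    Map<String, List<String>> byPath = groupByPath(partitions);
    // The lister here is a fake that returns a single made-up file per path.
    System.out.println(loadAll(byPath.keySet(), p -> Arrays.asList(p + "/data.parquet"), 2));
  }
}
```

Any Hudi-specific handling would then live inside the per-path loader, which is why the load method linked above looks like the natural place to change.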

Since Impala does not use the InputFormat classes the way Hive does, I think I need to modify this partition-loading method to teach Impala how to load a Hudi table.

Do you have any ideas on how to load the latest version of a Hudi dataset without using the InputFormat the way Hive does? Pointers to any related code in Hudi that deals with the Hive metadata would also help a lot.
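
To illustrate what I mean by "latest version", here is a rough Java sketch of one possible direction, not a confirmed approach. It assumes that a Hudi data file name starts with a file id, ends with a fixed-width commit timestamp before the `.parquet` extension, and separates the parts with underscores. It also ignores the commit timeline, so it cannot tell committed files from uncommitted ones. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the file-name layout is an assumption, see above.
class LatestFileSliceSketch {
  private static final String SUFFIX = ".parquet";

  // Returns, for each file id, the name of the file with the greatest commit time.
  static Map<String, String> latestPerFileGroup(List<String> fileNames) {
    Map<String, String> latestName = new TreeMap<>();
    Map<String, String> latestCommit = new HashMap<>();
    for (String name : fileNames) {
      if (!name.endsWith(SUFFIX)) continue;  // skip metadata and other files
      String stem = name.substring(0, name.length() - SUFFIX.length());
      int first = stem.indexOf('_');
      int last = stem.lastIndexOf('_');
      // Need a non-empty file id, a middle part, and a non-empty commit time.
      if (first <= 0 || last <= first || last == stem.length() - 1) continue;
      String fileId = stem.substring(0, first);
      String commit = stem.substring(last + 1);
      String previous = latestCommit.get(fileId);
      // Fixed-width timestamps compare correctly as plain strings.
      if (previous == null || commit.compareTo(previous) > 0) {
        latestCommit.put(fileId, commit);
        latestName.put(fileId, name);
      }
    }
    return latestName;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "f1_0_20190801000000.parquet",
        "f1_0_20190802000000.parquet",
        "f2_0_20190801000000.parquet");
    System.out.println(latestPerFileGroup(files));
  }
}
```

If something along these lines is valid, the filter could run on the file listing of each partition before Impala records the file descriptors.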

Another thing: I have created a draft change in Impala's Gerrit.

[https://gerrit.cloudera.org/#/c/13948/]

Currently I just add `HoodieInputFormat` as a VALID_INPUT_FORMAT, which makes Impala read the Hudi table as a regular Parquet table.
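
For readers who have not opened the Gerrit change, the idea is roughly the following. This is an illustrative Java sketch, not the actual patch: the class, the enum, and the lookup method are hypothetical, and the fully qualified Hudi class name is my assumption, since only the simple name `HoodieInputFormat` is given above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an input-format allowlist; not Impala's real code.
class InputFormatAllowlistSketch {
  enum FileFormat { TEXT, PARQUET }

  // Maps the input format class recorded in the Hive metastore to a file format.
  private static final Map<String, FileFormat> VALID_INPUT_FORMATS;
  static {
    Map<String, FileFormat> m = new HashMap<>();
    m.put("org.apache.hadoop.mapred.TextInputFormat", FileFormat.TEXT);
    m.put("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        FileFormat.PARQUET);
    // Assumed class name: treat a Hudi table as plain Parquet.
    m.put("com.uber.hoodie.hadoop.HoodieInputFormat", FileFormat.PARQUET);
    VALID_INPUT_FORMATS = Collections.unmodifiableMap(m);
  }

  // Resolves a class name to a file format, rejecting unknown input formats.
  static FileFormat fromInputFormatClass(String className) {
    FileFormat format = VALID_INPUT_FORMATS.get(className);
    if (format == null) {
      throw new IllegalArgumentException("Unsupported input format: " + className);
    }
    return format;
  }

  public static void main(String[] args) {
    System.out.println(fromInputFormatClass("com.uber.hoodie.hadoop.HoodieInputFormat"));
  }
}
```

With only this mapping in place, every Parquet file in a partition directory is read, including older versions of the same data, which is why the partition-loading change discussed above is still needed.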

I am struggling to add tests in Impala to verify that this change actually lets Impala read Hudi data successfully. It seems that I need to add Hudi dependencies to the test setup and prepare some data for testing.

> Support read/write Apache Hudi tables
> -------------------------------------
>
>                 Key: IMPALA-8778
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8778
>             Project: IMPALA
>          Issue Type: New Feature
>            Reporter: Yuanbin Cheng
>            Assignee: Yuanbin Cheng
>            Priority: Major
>
> Apache Impala currently does not support Apache Hudi and cannot even pull metadata from Hive.
> Related issues:
> [https://github.com/apache/incubator-hudi/issues/179] 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org