Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/04/10 17:56:00 UTC

[jira] [Commented] (HUDI-2524) Certify Hive sync on cloud platforms

    [ https://issues.apache.org/jira/browse/HUDI-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520210#comment-17520210 ] 

Raymond Xu commented on HUDI-2524:
----------------------------------

The previous failure was caused by not selecting the "Use for Spark table metadata" option in the EMR setup.

Tested master commit 15c264535ffbe73eac4dee6f15e0d2e65a8d4c78 on EMR 6.5 (Hadoop 3.2.1, Hive 3.1.2, Spark 3.1.2); Hive sync worked both with and without the metadata table (MDT).
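
For anyone repeating the check, here is a rough sketch of the two write configurations (with and without the metadata table). It is not the exact setup used in the test above: the helper name `hive_sync_options`, the database/table/partition values, and the choice of `hms` sync mode are all assumptions made for illustration. Only the `hoodie.*` option keys are taken from Hudi's configuration.

```python
def hive_sync_options(database, table, partition_field, use_metadata_table):
    """Build Hudi write options that enable Hive sync.

    Hypothetical helper: the argument values are placeholders, and the
    sync mode is assumed to be "hms" (Hive metastore).
    """
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Turn on Hive sync after each write.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.mode": "hms",  # assumed mode
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": partition_field,
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        # Toggle the metadata table (MDT) for the two test variants.
        "hoodie.metadata.enable": str(use_metadata_table).lower(),
    }


# The two variants: with and without MDT (placeholder names).
with_mdt = hive_sync_options("default", "hudi_trips", "region", True)
without_mdt = hive_sync_options("default", "hudi_trips", "region", False)

# In a Spark session, these would be passed to the writer, e.g.:
#   df.write.format("hudi").options(**with_mdt).mode("append").save(base_path)
```

Record key and precombine fields are left out for brevity and would need to be set for a real table.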


> Certify Hive sync on cloud platforms
> ------------------------------------
>
>                 Key: HUDI-2524
>                 URL: https://issues.apache.org/jira/browse/HUDI-2524
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Raymond Xu
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> For instance, hive sync should work seamlessly not just with Apache Hive but also EMR Hive.
> EMR 6.x has Hive 3.1.2, and the later versions of EMR 5.x have Hive 2.3.x. The HiveSyncTool is known to work with Hive 2.3.x.
> The scope of this ticket is to verify that Hive sync through Hudi works with EMR Hive 3.1.x as well.
> We can refer to [https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html] for hive sync properties.
> This verification is needed because hudi-hive-sync has Hive 2.3.1 as a compile-time dependency, so we need to check whether the Hive APIs used by the sync tool are compatible with Hive 3.1.x.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)