Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/03/28 18:07:00 UTC

[jira] [Commented] (HUDI-2524) Certify Hive sync on cloud platforms

    [ https://issues.apache.org/jira/browse/HUDI-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513545#comment-17513545 ] 

Raymond Xu commented on HUDI-2524:
----------------------------------

{code:java}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:42)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncMeta(DeltaSync.java:704)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:623)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:327)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:193)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:191)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:530)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:863)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:938)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:947)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing stocks20220328t175931
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:40)
    ... 19 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table stocks20220328t175931
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:412)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:230)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:150)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:138)
    ... 20 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to get all partitions for table rxusandbox.stocks20220328t175931
    at org.apache.hudi.hive.HoodieHiveClient.getAllPartitions(HoodieHiveClient.java:160)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:388)
    ... 23 more
Caused by: NoSuchObjectException(message:rxusandbox.stocks20220328t175931 table not found)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.read(ThriftHiveMetastore.java:64527)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.read(ThriftHiveMetastore.java:64494)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.read(ThriftHiveMetastore.java:64428)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions(ThriftHiveMetastore.java:1998)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions(ThriftHiveMetastore.java:1983)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:1058)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:184)
    at com.sun.proxy.$Proxy89.listPartitions(Unknown Source)
    at org.apache.hudi.hive.HoodieHiveClient.getAllPartitions(HoodieHiveClient.java:155)
    ... 24 more{code}
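
For context on the root cause: the sync fails inside HoodieHiveClient.getAllPartitions because the Thrift get_partitions call returns NoSuchObjectException, i.e. the table rxusandbox.stocks20220328t175931 is not visible in the Hive metastore at the time partitions are listed. Below is a minimal standalone sketch (not Hudi code) of the same metastore call seen in the trace, which can help check metastore connectivity and table visibility outside of DeltaStreamer. The Thrift URI is a placeholder; the database and table names are taken from the trace.

{code:java}
import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class MetastorePartitionCheck {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    // Placeholder metastore URI; point this at the actual HMS endpoint.
    conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://localhost:9083");
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      String db = "rxusandbox";               // database from the trace
      String table = "stocks20220328t175931"; // table from the trace
      if (!client.tableExists(db, table)) {
        // This is the state the failed sync hit: partitions were requested
        // for a table the metastore does not know about.
        System.err.println("Table " + db + "." + table + " not found in metastore");
        return;
      }
      // Same listPartitions call the sync tool ultimately issues via the Thrift client.
      List<Partition> partitions = client.listPartitions(db, table, (short) -1);
      System.out.println("Found " + partitions.size() + " partitions");
    } finally {
      client.close();
    }
  }
}
{code}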

> Certify Hive sync on cloud platforms
> ------------------------------------
>
>                 Key: HUDI-2524
>                 URL: https://issues.apache.org/jira/browse/HUDI-2524
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Raymond Xu
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> For instance, Hive sync should work seamlessly not just with Apache Hive but also with EMR Hive.
> EMR 6.x ships Hive 3.1.2, while the later EMR 5.x releases ship Hive 2.3.x, which is the version the HiveSyncTool is already known to work with.
> The scope of this ticket is to verify that Hive sync through Hudi works with EMR Hive 3.1.x as well.
> We can refer to [https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html] for the Hive sync properties (see the property sketch below).
> This verification is needed because hudi-hive-sync has Hive 2.3.1 as a compile-time dependency, so we need to check whether the Hive APIs used by the sync tool are compatible with Hive 3.1.x.
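
On the configuration side, a minimal sketch of the Hive sync properties typically passed to HoodieDeltaStreamer (via --props) is below. The database, table, partition field, and JDBC URL values are placeholders, and the exact set needed on EMR should follow the AWS doc linked in the description above.

{code:java}
import java.util.Properties;

public class HiveSyncPropsSketch {
  public static Properties build() {
    Properties props = new Properties();
    props.setProperty("hoodie.datasource.hive_sync.enable", "true");
    props.setProperty("hoodie.datasource.hive_sync.database", "rxusandbox");   // placeholder
    props.setProperty("hoodie.datasource.hive_sync.table", "stocks");          // placeholder
    props.setProperty("hoodie.datasource.hive_sync.partition_fields", "date"); // placeholder
    props.setProperty("hoodie.datasource.hive_sync.partition_extractor_class",
        "org.apache.hudi.hive.MultiPartKeysValueExtractor");
    // "hms" talks to the metastore directly; "jdbc" goes through HiveServer2.
    props.setProperty("hoodie.datasource.hive_sync.mode", "hms");
    // Only used in jdbc mode; placeholder endpoint.
    props.setProperty("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000");
    return props;
  }
}
{code}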



--
This message was sent by Atlassian Jira
(v8.20.1#820001)