You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/07 05:33:12 UTC
[GitHub] [hudi] wosow edited a comment on issue #2409: [SUPPORT] Spark structured Streaming writes to Hudi and synchronizes Hive to create only read-optimized tables without creating real-time tables
wosow edited a comment on issue #2409:
URL: https://github.com/apache/hudi/issues/2409#issuecomment-755894869
> It is indeed a MOR table.Can you check your driver logs. You might find some exceptions around registering _rt table. You can look for logs around the log message
>
> "Trying to sync hoodie table "
error as follows:
there is no sql about creating _rt table , only _ro table
----------------------------------------------------------------------------------------------------------------------------------------------
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 371
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 337
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 404
21/01/07 13:23:05 INFO BlockManagerInfo: Removed broadcast_16_piece0 on bigdatadev03:18850 in memory (size: 73.0 KB, free: 2.5 GB)
21/01/07 13:23:05 INFO BlockManagerInfo: Removed broadcast_16_piece0 on bigdatadev02:6815 in memory (size: 73.0 KB, free: 3.5 GB)
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 333
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 418
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 385
21/01/07 13:23:05 INFO ContextCleaner: Cleaned accumulator 410
21/01/07 13:23:08 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
21/01/07 13:23:08 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
21/01/07 13:23:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
21/01/07 13:23:10 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
21/01/07 13:23:10 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
21/01/07 13:23:10 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
21/01/07 13:23:10 INFO ObjectStore: Initialized ObjectStore
21/01/07 13:23:11 INFO HiveMetaStore: Added admin role in metastore
21/01/07 13:23:11 INFO HiveMetaStore: Added public role in metastore
21/01/07 13:23:11 INFO HiveMetaStore: No user is added in admin role, since config is empty
21/01/07 13:23:11 INFO HiveMetaStore: 0: get_all_databases
21/01/07 13:23:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
21/01/07 13:23:11 INFO HiveMetaStore: 0: get_functions: db=default pat=*
21/01/07 13:23:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
21/01/07 13:23:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
21/01/07 13:23:11 INFO HiveMetaStore: 0: get_functions: db=dw pat=*
21/01/07 13:23:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=dw pat=*
21/01/07 13:23:11 INFO HiveSyncTool: Trying to sync hoodie table api_trade_ro with base path /data/stream/mor/api_trade of type MERGE_ON_READ
21/01/07 13:23:11 INFO HiveMetaStore: 0: get_table : db=ads tbl=api_trade_ro
21/01/07 13:23:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=ads tbl=api_trade_ro
21/01/07 13:23:11 INFO HoodieHiveClient: Found the last compaction commit as Option{val=null}
21/01/07 13:23:11 INFO HoodieHiveClient: Found the last delta commit Option{val=[20210107132154__deltacommit__COMPLETED]}
21/01/07 13:23:12 INFO HoodieHiveClient: Reading schema from /data/stream/mor/api_trade/dt=2021-01/350a9a01-538c-4a7e-8c17-09d2cdc85073-0_0-20-85_20210107132154.parquet
21/01/07 13:23:12 INFO HiveSyncTool: Hive table api_trade_ro is not found. Creating it
21/01/07 13:23:12 INFO HoodieHiveClient: Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `ads`.`api_trade_ro`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `topic` string, `kafka_partition` string, `kafka_timestamp` string, `kafka_offset` string, `current_time` string, `kafka_key` string, `kafka_value` string, `modified` string, `created` string, `batch_time` string) PARTITIONED BY (`dt` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/data/stream/mor/api_trade'
21/01/07 13:23:12 INFO HoodieHiveClient: Executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS `ads`.`api_trade_ro`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `topic` string, `kafka_partition` string, `kafka_timestamp` string, `kafka_offset` string, `current_time` string, `kafka_key` string, `kafka_value` string, `modified` string, `created` string, `batch_time` string) PARTITIONED BY (`dt` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/data/stream/mor/api_trade'
21/01/07 13:24:09 INFO HiveSyncTool: Schema sync complete. Syncing partitions for api_trade_ro
21/01/07 13:24:09 INFO HiveSyncTool: Last commit time synced was found to be null
21/01/07 13:24:09 INFO HoodieHiveClient: Last commit time synced is not known, listing all partitions in /data/stream/mor/api_trade,FS :DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-398057260_1, ugi=root (auth:SIMPLE)]]
21/01/07 13:24:09 INFO HiveSyncTool: Storage partitions scan complete. Found 1
21/01/07 13:24:09 INFO HiveMetaStore: 0: get_partitions : db=ads tbl=api_trade_ro
21/01/07 13:24:09 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_partitions : db=ads tbl=api_trade_ro
21/01/07 13:24:10 INFO HiveSyncTool: New Partitions [dt=2021-01]
21/01/07 13:24:10 INFO HoodieHiveClient: Adding partitions 1 to table api_trade_ro
21/01/07 13:24:10 INFO HoodieHiveClient: Executing SQL ALTER TABLE `ads`.`api_trade_ro` ADD IF NOT EXISTS PARTITION (`dt`='2021-01') LOCATION '/data/stream/mor/api_trade/dt=2021-01'
21/01/07 13:24:33 INFO HiveSyncTool: Changed Partitions []
21/01/07 13:24:33 INFO HoodieHiveClient: No partitions to change for api_trade_ro
21/01/07 13:24:33 INFO HiveMetaStore: 0: get_table : db=ads tbl=api_trade_ro
21/01/07 13:24:33 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=ads tbl=api_trade_ro
21/01/07 13:24:33 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210107132154
at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:658)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:128)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:229)
at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:279)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:184)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at com.chin.dmp.stream.mor.ApiTradeStream$$anonfun$1.apply(ApiTradeStream.scala:196)
at com.chin.dmp.stream.mor.ApiTradeStream$$anonfun$1.apply(ApiTradeStream.scala:163)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: NoSuchObjectException(message:ads.api_trade_ro table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1808)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1778)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy41.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1208)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy42.getTable(Unknown Source)
at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:654)
... 48 more
21/01/07 13:24:33 INFO HiveMetaStore: 0: Shutting down the object store...
21/01/07 13:24:33 INFO audit: ugi=root ip=unknown-ip-addr cmd=Shutting down the object store...
21/01/07 13:24:33 INFO HiveMetaStore: 0: Metastore shutdown complete.
21/01/07 13:24:33 INFO audit: ugi=root ip=unknown-ip-addr cmd=Metastore shutdown complete.
21/01/07 13:24:33 INFO DefaultSource: Constructing hoodie (as parquet) data source with options :Map(hoodie.datasource.write.insert.drop.duplicates -> false, hoodie.datasource.hive_sync.database -> ads, hoodie.insert.shuffle.parallelism -> 10, path -> /data/stream/mor/api_trade, hoodie.datasource.write.precombine.field -> modified, hoodie.datasource.hive_sync.partition_fields -> dt, hoodie.datasource.write.payload.class -> org.apache.hudi.common.model.OverwriteWithLatestAvroPayload, hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.MultiPartKeysValueExtractor, hoodie.datasource.write.streaming.retry.interval.ms -> 2000, hoodie.datasource.hive_sync.table -> api_trade, hoodie.index.type -> GLOBAL_BLOOM, hoodie.datasource.write.streaming.ignore.failed.batch -> true, hoodie.datasource.write.operation -> upsert, hoodie.datasource.hive_sync.enable -> true, hoodie.datasource.write.recordkey.field -> id, hoodie.table.name -> api_trade, hoodie.datasource.hive_sy
nc.jdbcurl -> jdbc:hive2://0.0.0.0:10000, hoodie.datasource.write.table.type -> MERGE_ON_READ, hoodie.datasource.write.hive_style_partitioning -> true, hoodie.datasource.query.type -> snapshot, hoodie.bloom.index.update.partition.path -
----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org