You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/06 06:43:43 UTC
[GitHub] [hudi] xqy179 opened a new issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found
xqy179 opened a new issue #2234:
URL: https://github.com/apache/hudi/issues/2234
**Describe the problem you faced**
I write a hudi table with spark datasource api! The table contains the flowing three fields "year,month,day", I use "year,month,day" as partition keys, and partition value extractor uses "org.apache.hudi.hive.MultiPartKeysValueExtractor", config HIVE_SYNC_ENABLED_OPT_KEY to be "true" ! After I finish writing data to storage[hdfs/s3], Hudi will sync the table partitions to hive table automatically! The problem is happened in the HiveSync processing! By the way, my hudi version is 0.6.1!
**Additional context**
After I trace the code! I think this is a bug in the follow **'syncPartitions'** code. eg, the hive/hudi table have contain partition 'year=2020/month=01/day=05', when you write a new partition 'year=2020/month=05/day=01' to the table, it will throw error! Because in the **getPartitionEvents** method logic, 'year=2020/month=01/day=05' will transfer to be "01, 05, 2020" and 'year=2020/month=05/day=01' will transfer to be "01, 05, 2020" too, So the new partition 'year=2020/month=05/day=01' is treated as a update event actually it is a new partition, and the flowing processing will update the table partitions using **"ALTER TABLE `orders` PARTITION (`year`='2020',`month`='11',`day`='01') SET LOCATION ..."**! The update a not existed partition will throw error!
I think I can comment the flowing two line codes and not affect other features to fix the error, Can I ?
```
List<PartitionEvent> getPartitionEvents()
...
//Collections.sort(hivePartitionValues);
...
//Collections.sort(storagePartitionValues);
...
```
```
/**
* Iterate over the storage partitions and find if there are any new partitions that need to be added or updated.
* Generate a list of PartitionEvent based on the changes required.
*/
List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions) {
Map<String, String> paths = new HashMap<>();
for (Partition tablePartition : tablePartitions) {
List<String> hivePartitionValues = tablePartition.getValues();
Collections.sort(hivePartitionValues); //**Maybe there is a bug Here!**
String fullTablePartitionPath =
Path.getPathWithoutSchemeAndAuthority(new Path(tablePartition.getSd().getLocation())).toUri().getPath();
paths.put(String.join(", ", hivePartitionValues), fullTablePartitionPath);
}
List<PartitionEvent> events = new ArrayList<>();
for (String storagePartition : partitionStoragePartitions) {
Path storagePartitionPath = FSUtils.getPartitionPath(syncConfig.basePath, storagePartition);
String fullStoragePartitionPath = Path.getPathWithoutSchemeAndAuthority(storagePartitionPath).toUri().getPath();
// Check if the partition values or if hdfs path is the same
List<String> storagePartitionValues = partitionValueExtractor.extractPartitionValuesInPath(storagePartition);
Collections.sort(storagePartitionValues);//**Maybe there is a bug Here!**
if (!storagePartitionValues.isEmpty()) {
String storageValue = String.join(", ", storagePartitionValues);
if (!paths.containsKey(storageValue)) {
events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
} else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
}
}
}
return events;
}
```
**Stacktrace**
```20/11/05 17:56:17 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table xtable
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:228)
at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:278)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:183)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at com.pupu.bigdata.wrangling.utils.HudiUtils$.write_hudi_table(HudiUtils.scala:89)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.writeDataToHudiTbl(StageToOdsHudi.scala:356)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.execute_cur_hour_etl(StageToOdsHudi.scala:264)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:96)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:84)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.main(StageToOdsHudi.scala:84)
at com.pupu.bigdata.wrangling.ods.StageToOdsHudi.main(StageToOdsHudi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE ` xtable` PARTITION (`year`='2020',`month`='11',`day`='01') SET LOCATION 's3://*/year=2020/month=11/day=01'
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)
at org.apache.hudi.hive.HoodieHiveClient.updatePartitionsToTable(HoodieHiveClient.java:160)
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:185)
... 41 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:486)
... 43 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy37.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found {year=2020, month=11, day=01}
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getPartition(BaseSemanticAnalyzer.java:1736)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1515)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1479)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableLocation(DDLSemanticAnalyzer.java:1567)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:303)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
... 26 more```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] bvaradar closed issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found
Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #2234:
URL: https://github.com/apache/hudi/issues/2234
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found
Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2234:
URL: https://github.com/apache/hudi/issues/2234#issuecomment-724304103
@xqy179 : This looks like a bug. Good catch !! I have opened a jira : https://issues.apache.org/jira/browse/HUDI-1383 . Can you open the PR with that Jira ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found
Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2234:
URL: https://github.com/apache/hudi/issues/2234#issuecomment-744702342
Closing this as we have a jira and PR for this.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org