Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/06 06:43:43 UTC

[GitHub] [hudi] xqy179 opened a new issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found

xqy179 opened a new issue #2234:
URL: https://github.com/apache/hudi/issues/2234


   **Describe the problem you faced**
   
   I write a Hudi table with the Spark datasource API. The table contains the following three fields, "year,month,day", which I use as partition keys. The partition value extractor is "org.apache.hudi.hive.MultiPartKeysValueExtractor", and HIVE_SYNC_ENABLED_OPT_KEY is set to "true". After the data is written to storage (HDFS/S3), Hudi syncs the table partitions to the Hive table automatically. The problem happens during this Hive sync step. By the way, my Hudi version is 0.6.1.
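   
   For reference, the Hive sync options described above might look roughly like the following. This is a sketch, not my exact job: the class name `HudiWriteOptions` is made up for illustration, and the option keys are the string values behind the `*_OPT_KEY` constants as I understand them, so please check them against your Hudi version.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   // Hypothetical helper: collects the Hive sync options mentioned in this report.
   class HudiWriteOptions {
   
     static Map<String, String> hiveSyncOptions() {
       Map<String, String> opts = new LinkedHashMap<>();
       // HIVE_SYNC_ENABLED_OPT_KEY set to "true"
       opts.put("hoodie.datasource.hive_sync.enable", "true");
       // The three partition keys, in partition-column order
       opts.put("hoodie.datasource.hive_sync.partition_fields", "year,month,day");
       // Extractor that splits a path like year=2020/month=11/day=01 into its values
       opts.put("hoodie.datasource.hive_sync.partition_extractor_class",
           "org.apache.hudi.hive.MultiPartKeysValueExtractor");
       return opts;
     }
   
     public static void main(String[] args) {
       // Print each option as key=value
       hiveSyncOptions().forEach((k, v) -> System.out.println(k + "=" + v));
     }
   }
   ```
   
   A map like this would be passed to the DataFrame writer via `.options(...)` together with the usual record key, precombine, and table name settings, which are omitted here.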
   
   
   
   **Additional context**
   After tracing the code, I think this is a bug in the **'syncPartitions'** code path. For example, suppose the Hive/Hudi table already contains the partition 'year=2020/month=01/day=05'. Writing a new partition 'year=2020/month=05/day=01' to the table then throws an error, because the **getPartitionEvents** method sorts the partition values before building its lookup key: 'year=2020/month=01/day=05' becomes "01, 05, 2020", and 'year=2020/month=05/day=01' becomes "01, 05, 2020" too. The new partition 'year=2020/month=05/day=01' is therefore treated as an update event when it is actually a new partition, and the subsequent processing tries to update the table partitions with a statement like **"ALTER TABLE `orders` PARTITION (`year`='2020',`month`='11',`day`='01') SET LOCATION ..."**. Updating a partition that does not exist throws the error.
   
   
   I think I can fix the error by commenting out the following two lines, without affecting other features. Can I?
   ```
     List<PartitionEvent>  getPartitionEvents()
   ...
   //Collections.sort(hivePartitionValues);
   ...
   //Collections.sort(storagePartitionValues);
   ...
   ```
   
   ```
   /**
      * Iterate over the storage partitions and find if there are any new partitions that need to be added or updated.
      * Generate a list of PartitionEvent based on the changes required.
      */
     List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions) {
       Map<String, String> paths = new HashMap<>();
       for (Partition tablePartition : tablePartitions) {
         List<String> hivePartitionValues = tablePartition.getValues();
         Collections.sort(hivePartitionValues);  //**Maybe there is a bug Here!**
         String fullTablePartitionPath =
             Path.getPathWithoutSchemeAndAuthority(new Path(tablePartition.getSd().getLocation())).toUri().getPath();
         paths.put(String.join(", ", hivePartitionValues), fullTablePartitionPath);
       }
   
       List<PartitionEvent> events = new ArrayList<>();
       for (String storagePartition : partitionStoragePartitions) {
         Path storagePartitionPath = FSUtils.getPartitionPath(syncConfig.basePath, storagePartition);
         String fullStoragePartitionPath = Path.getPathWithoutSchemeAndAuthority(storagePartitionPath).toUri().getPath();
         // Check if the partition values or if hdfs path is the same
         List<String> storagePartitionValues = partitionValueExtractor.extractPartitionValuesInPath(storagePartition);
         Collections.sort(storagePartitionValues);//**Maybe there is a bug Here!**
   
         if (!storagePartitionValues.isEmpty()) {
           String storageValue = String.join(", ", storagePartitionValues);
           if (!paths.containsKey(storageValue)) {
             events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
           } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
             events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
           }
         }
       }
       return events;
     }
   ```
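   
   To make the collision concrete, here is a small standalone sketch that reproduces only the key-building step, outside of Hudi. The class `PartitionKeyDemo` and its two methods are hypothetical names for illustration; `sortedKey` mirrors the sort-then-join logic above, and `orderedKey` shows what the key would look like if the two `Collections.sort` calls were dropped. It is not a tested patch for Hudi itself.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   
   // Hypothetical demo class, not part of Hudi.
   class PartitionKeyDemo {
   
     // Mirrors the current behavior: sort the partition values, then join them.
     static String sortedKey(List<String> partitionValues) {
       List<String> copy = new ArrayList<>(partitionValues); // do not mutate the caller's list
       Collections.sort(copy);
       return String.join(", ", copy);
     }
   
     // Assumed alternative: keep the values in partition-column order.
     static String orderedKey(List<String> partitionValues) {
       return String.join(", ", partitionValues);
     }
   
     public static void main(String[] args) {
       List<String> existing = Arrays.asList("2020", "01", "05"); // year=2020/month=01/day=05
       List<String> incoming = Arrays.asList("2020", "05", "01"); // year=2020/month=05/day=01
   
       // Both partitions collapse to the same sorted key, so the new one looks "known".
       System.out.println(sortedKey(existing).equals(sortedKey(incoming)));
       // In column order the keys stay distinct.
       System.out.println(orderedKey(existing).equals(orderedKey(incoming)));
     }
   }
   ```
   
   With sorted keys the two partitions are indistinguishable, so the second one falls into the update branch; with order-preserving keys it would be reported as an add event, assuming Hive and the value extractor return the values in the same column order.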
   
   
   
   **Stacktrace**
   
   ```
   20/11/05 17:56:17 ERROR HiveSyncTool: Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table  xtable
   	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
   	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:228)
   	at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:278)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:183)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at com.pupu.bigdata.wrangling.utils.HudiUtils$.write_hudi_table(HudiUtils.scala:89)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.writeDataToHudiTbl(StageToOdsHudi.scala:356)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.execute_cur_hour_etl(StageToOdsHudi.scala:264)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:96)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:84)
   	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.main(StageToOdsHudi.scala:84)
   	at com.pupu.bigdata.wrangling.ods.StageToOdsHudi.main(StageToOdsHudi.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE ` xtable` PARTITION (`year`='2020',`month`='11',`day`='01') SET LOCATION 's3://*/year=2020/month=11/day=01'
   	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)
   	at org.apache.hudi.hive.HoodieHiveClient.updatePartitionsToTable(HoodieHiveClient.java:160)
   	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:185)
   	... 41 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
   	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
   	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
   	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
   	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:486)
   	... 43 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
   	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
   	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
   	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
   	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
   	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
   	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
   	at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
   	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
   	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
   	at com.sun.proxy.$Proxy37.executeStatementAsync(Unknown Source)
   	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
   	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
   	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
   	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
   	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
   	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found {year=2020, month=11, day=01}
   	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getPartition(BaseSemanticAnalyzer.java:1736)
   	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1515)
   	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1479)
   	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableLocation(DDLSemanticAnalyzer.java:1567)
   	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:303)
   	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
   	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
   	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
   	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
   	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
   	... 26 more
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar closed issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #2234:
URL: https://github.com/apache/hudi/issues/2234


   





[GitHub] [hudi] bvaradar commented on issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2234:
URL: https://github.com/apache/hudi/issues/2234#issuecomment-724304103


   @xqy179: This looks like a bug. Good catch! I have opened a Jira: https://issues.apache.org/jira/browse/HUDI-1383 . Can you open a PR against that Jira?





[GitHub] [hudi] bvaradar commented on issue #2234: HiveSyncTable error: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2234:
URL: https://github.com/apache/hudi/issues/2234#issuecomment-744702342


   Closing this as we have a jira and PR for this.

