Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/25 10:03:30 UTC

[GitHub] [hudi] KnightChess opened a new issue, #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

KnightChess opened a new issue, #7057:
URL: https://github.com/apache/hudi/issues/7057

   Env:
   Hudi 0.11.0
   Spark 3.2.0
   Action:
   Spark SQL insert overwrite
   
   Suppose we have the following timeline, with multiple writer jobs running under OCC:
   00:01  001.replacecommit.inflight
   00:01  001.replacecommit.requested
   00:10  002.replacecommit.inflight
   00:10  002.replacecommit.requested
   00:10  003.replacecommit.inflight
   00:10  003.replacecommit.requested
   
   Task execution:
   The `001 instant` was produced by a previously failed task. Now we have two running jobs, which produce the `002` and `003` instants
   at the same time:
   1. `002's job` has already finished `CleanerUtils.rollbackFailedWrites(xxx)` in postCommit using the auto-commit clean. Since the default heartbeat expiry time is 2 min, the `001 instant` files will be deleted by this job.
   
   2. At the same time, `003's job` executes `BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded` and throws `HoodieException: Error getting all file groups in pending clustering`, because it reads the instant details of `001` from its cached active timeline, but that file has already been deleted by `002's job`.
   
   It seems we need to reload the active timeline when calling `setPartitionToReplaceFileIds`, but that alone cannot resolve the problem, because we cannot guarantee the reloaded timeline is the latest.
   I think `updateIndexAndCommitIfNeeded` needs to be moved so that the commit metadata is committed after `beginTransaction`. I don't know whether it is appropriate to modify it this way, and it may lengthen the time the lock is held.
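   
   For reference, a minimal sketch of the multi-writer OCC write setup this scenario assumes (table name, key fields and base path are illustrative placeholders; the filesystem-based lock provider here mirrors the reproduction later in this thread; the heartbeat options are shown at their defaults, which is where the ~2 min expiry mentioned above comes from):
   
   ```scala
   // Illustrative multi-writer OCC insert-overwrite; placeholders: table name, key fields, basePath.
   import org.apache.spark.sql.{DataFrame, SaveMode}
   
   def insertOverwriteWithOcc(df: DataFrame, basePath: String): Unit = {
     df.write.format("hudi")
       .option("hoodie.table.name", "my_table")                      // placeholder
       .option("hoodie.datasource.write.recordkey.field", "id")      // placeholder
       .option("hoodie.datasource.write.precombine.field", "ts")     // placeholder
       .option("hoodie.datasource.write.operation", "insert_overwrite")
       // OCC multi-writer: optimistic concurrency + lazy cleaning of failed writes.
       .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
       .option("hoodie.cleaner.policy.failed.writes", "LAZY")
       .option("hoodie.write.lock.provider",
         "org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider")
       // Defaults: 60s interval x 2 tolerable misses = the ~2 min heartbeat expiry above.
       .option("hoodie.client.heartbeat.interval_in_ms", "60000")
       .option("hoodie.client.heartbeat.tolerable.misses", "2")
       .mode(SaveMode.Append)
       .save(basePath)
   }
   ```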
   
   ```shell
   org.apache.hudi.exception.HoodieException: Error getting all file groups in pending clustering
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:113)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:93)
   	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.<init>(HoodieMetadataFileSystemView.java:44)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:166)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:259)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:111)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:110)
   	at org.apache.hudi.table.HoodieTable.getHoodieView(HoodieTable.java:310)
   	at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:547)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:246)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:653)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:316)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:165)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
   	at org.apache.spark.sql.hudi.catalog.HoodieV1WriteBuilder$$anon$1$$anon$2.insert(HoodieInternalV2Table.scala:116)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1(V1FallbackWriters.scala:79)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1$(V1FallbackWriters.scala:78)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.writeWithV1(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run(V1FallbackWriters.scala:66)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run$(V1FallbackWriters.scala:65)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.run(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:113)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:431)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:569)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:563)
   	at scala.collection.Iterator.foreach(Iterator.scala:943)
   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:563)
   	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
   	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
   	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:229)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:748)
   Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from hdfs://xxx/.hoodie/20221024110608265.replacecommit.requested
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:761)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:266)
   	at org.apache.hudi.common.util.ClusteringUtils.getRequestedReplaceMetadata(ClusteringUtils.java:90)
   	at org.apache.hudi.common.util.ClusteringUtils.getClusteringPlan(ClusteringUtils.java:106)
   	at org.apache.hudi.common.util.ClusteringUtils.lambda$getAllPendingClusteringPlans$0(ClusteringUtils.java:69)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:129)
   	... 105 more
   Caused by: java.io.FileNotFoundException: File does not exist: /user/prod_dd_cdm/dd_cdm_dev/hive/dd_cdm_dev/dwd_trip_mkt_soul_carbon_coin_get_di/.hoodie/20221024110608265.replacecommit.requested
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2045)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:760)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:432)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
   
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1316)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1301)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1289)
   	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1614)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:359)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:354)
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:367)
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:837)
   	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:460)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:758)
   	... 117 more
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] weimingdiit commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1554337442

   @nsivabalan @KnightChess ok, I'll add some more detailed info to reproduce the issue




[GitHub] [hudi] KnightChess commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1296640318

   `when init table`
   ```shell
   org.apache.hudi.exception.HoodieException: Error getting all file groups in pending clustering
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:113)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:93)
   	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.<init>(HoodieMetadataFileSystemView.java:44)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:166)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:259)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:111)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:110)
   	at org.apache.hudi.table.HoodieTable.getHoodieView(HoodieTable.java:310)
   	at org.apache.hudi.table.HoodieSparkTable.create(HoodieSparkTable.java:92)
   	at org.apache.hudi.client.SparkRDDWriteClient.doInitTable(SparkRDDWriteClient.java:437)
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1463)
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1497)
   	at org.apache.hudi.client.SparkRDDWriteClient.insertOverwrite(SparkRDDWriteClient.java:206)
   	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:215)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:309)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:165)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
   	at org.apache.spark.sql.hudi.catalog.HoodieV1WriteBuilder$$anon$1$$anon$2.insert(HoodieInternalV2Table.scala:116)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1(V1FallbackWriters.scala:79)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1$(V1FallbackWriters.scala:78)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.writeWithV1(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run(V1FallbackWriters.scala:66)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run$(V1FallbackWriters.scala:65)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.run(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:113)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:431)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:569)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:563)
   	at scala.collection.Iterator.foreach(Iterator.scala:943)
   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:563)
   	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
   	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
   	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:229)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:748)
   Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /xxxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:761)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:266)
   	at org.apache.hudi.common.util.ClusteringUtils.getRequestedReplaceMetadata(ClusteringUtils.java:90)
   	at org.apache.hudi.common.util.ClusteringUtils.getClusteringPlan(ClusteringUtils.java:106)
   	at org.apache.hudi.common.util.ClusteringUtils.lambda$getAllPendingClusteringPlans$0(ClusteringUtils.java:69)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:129)
   	... 107 more
   Caused by: java.io.FileNotFoundException: File does not exist: /xxxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2045)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:760)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:432)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
   
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1316)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1301)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1289)
   	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1614)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:359)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:354)
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:367)
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:837)
   	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:460)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:758)
   	... 119 more
   Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /xxxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2045)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:760)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:432)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
   
   	at org.apache.hadoop.ipc.Client.call(Client.java:1512)
   	at org.apache.hadoop.ipc.Client.call(Client.java:1450)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   	at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259)
   	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:275)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:122)
   	at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1314)
   	... 129 more
   ```




[GitHub] [hudi] KnightChess commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1296617289

   `error in runTableServicesInline`
   ```shell
   org.apache.hudi.exception.HoodieException: Error getting all file groups in pending clustering
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:113)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:108)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:93)
   	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.<init>(HoodieMetadataFileSystemView.java:44)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.createInMemoryFileSystemView(FileSystemViewManager.java:166)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$createViewManager$5fcdabfe$1(FileSystemViewManager.java:259)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.lambda$getFileSystemView$1(FileSystemViewManager.java:111)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	at org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:110)
   	at org.apache.hudi.table.HoodieTable.getHoodieView(HoodieTable.java:310)
   	at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:547)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:246)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:655)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:318)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:165)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
   	at org.apache.spark.sql.hudi.catalog.HoodieV1WriteBuilder$$anon$1$$anon$2.insert(HoodieInternalV2Table.scala:116)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1(V1FallbackWriters.scala:79)
   	at org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1$(V1FallbackWriters.scala:78)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.writeWithV1(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run(V1FallbackWriters.scala:66)
   	at org.apache.spark.sql.execution.datasources.v2.V1FallbackWriters.run$(V1FallbackWriters.scala:65)
   	at org.apache.spark.sql.execution.datasources.v2.OverwriteByExpressionExecV1.run(V1FallbackWriters.scala:51)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:113)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:431)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:569)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:563)
   	at scala.collection.Iterator.foreach(Iterator.scala:943)
   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:563)
   	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
   	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
   	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:229)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:748)
   Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /xxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:761)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:266)
   	at org.apache.hudi.common.util.ClusteringUtils.getRequestedReplaceMetadata(ClusteringUtils.java:90)
   	at org.apache.hudi.common.util.ClusteringUtils.getClusteringPlan(ClusteringUtils.java:106)
   	at org.apache.hudi.common.util.ClusteringUtils.lambda$getAllPendingClusteringPlans$0(ClusteringUtils.java:69)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
   	at org.apache.hudi.common.util.ClusteringUtils.getAllFileGroupsInPendingClusteringPlans(ClusteringUtils.java:129)
   	... 105 more
   Caused by: java.io.FileNotFoundException: File does not exist: /xxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2045)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:760)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:432)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
   
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1316)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1301)
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1289)
   	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1614)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:359)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:354)
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:367)
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:837)
   	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:460)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:758)
   	... 117 more
   Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: xxx/.hoodie/20221029151452396.replacecommit.requested
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2045)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:760)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:432)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
   
   	at org.apache.hadoop.ipc.Client.call(Client.java:1512)
   	at org.apache.hadoop.ipc.Client.call(Client.java:1450)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   	at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259)
   	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:275)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:122)
   	at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1314)
   	... 127 more
   ```




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "jjtjiang (via GitHub)" <gi...@apache.org>.
jjtjiang commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1846470553

   @ad1happy2go 
   I also face this problem.
   version: hudi 0.12.3
   How to reproduce the issue: just use the insert overwrite SQL when inserting a big table.
   Here is my case:
   rows: 1148000000 (if the row count is smaller, e.g. 1000000, the issue cannot be reproduced)
   ddl:
   ```sql
   create table temp_db.ods_cis_corp_history_profile_hudi_t1_20231208(
     `_hoodie_is_deleted` BOOLEAN,
     `t_pre_combine_field` long,
       order_type int , 
       order_no int , 
       profile_no int , 
       profile_type string , 
       profile_cat string , 
       u_version string , 
       order_line_no int , 
       profile_c string , 
       profile_i int , 
       profile_f decimal(20,8) , 
       profile_d timestamp , 
       active string , 
       entry_datetime timestamp , 
       entry_id int , 
       h_version int )
   USING hudi
   TBLPROPERTIES (
     'hoodie.write.concurrency.mode'='optimistic_concurrency_control' ,
     'hoodie.cleaner.policy.failed.writes'='LAZY',
     'hoodie.write.lock.provider'='org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',
     'hoodie.write.lock.filesystem.expire'= 5,
     'primaryKey' = 'order_no,profile_type,profile_no,order_type,profile_cat',
     'type' = 'cow',
     'preCombineField' = 't_pre_combine_field')
   CLUSTERED BY ( 
     order_no,profile_type,profile_no,order_type,profile_cat) 
   INTO 2 BUCKETS;
   ```
   
   sql:
   ```sql
   insert
   	overwrite table temp_db.ods_cis_corp_history_profile_hudi_t1_20231208
   select
   	false ,
   	1,
   	order_type ,
   	order_no ,
   	profile_no ,
   	profile_type ,
   	profile_cat ,
   	u_version ,
   	order_line_no ,
   	profile_c ,
   	profile_i ,
   	profile_f ,
   	profile_d ,
   	active ,
   	entry_datetime ,
   	entry_id ,
   	h_version
   from
   	 temp_db.ods_cis_dbo_history_profile_tmp ;
   insert
   	overwrite table temp_db.ods_cis_corp_history_profile_hudi_t1_20231208
   select
   	false ,
   	1,
   	order_type ,
   	order_no ,
   	profile_no ,
   	profile_type ,
   	profile_cat ,
   	u_version ,
   	order_line_no ,
   	profile_c ,
   	profile_i ,
   	profile_f ,
   	profile_d ,
   	active ,
   	entry_datetime ,
   	entry_id ,
   	h_version
   from
   	 temp_db.ods_cis_dbo_history_profile_tmp ; 
   .hoodie dir file list:
   .hoodie/.aux
   .hoodie/.heartbeat
   .hoodie/.schema
   .hoodie/.temp
   .hoodie/20231207055239027.replacecommit
   .hoodie/20231207055239027.replacecommit.inflight
   .hoodie/20231207055239027.replacecommit.requested
   .hoodie/20231207084620796.replacecommit
   .hoodie/20231207084620796.replacecommit.inflight
   .hoodie/20231207084620796.replacecommit.requested
   .hoodie/20231207100918624.rollback
   .hoodie/20231207100918624.rollback.inflight
   .hoodie/20231207100918624.rollback.requested
   .hoodie/20231207100923823.rollback
   .hoodie/20231207100923823.rollback.inflight
   .hoodie/20231207100923823.rollback.requested
   .hoodie/20231207102003686.replacecommit
   .hoodie/20231207102003686.replacecommit.inflight
   .hoodie/20231207102003686.replacecommit.requested
   .hoodie/archived
   .hoodie/hoodie.properties
   .hoodie/metadata
   
   
   As we can see, there is no 20231207071610343.replacecommit.requested file in the listing above, but the program needs to read that file, so it failed. This is what puzzles me.
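
   For anyone trying to confirm the same mismatch, a minimal sketch (the base path is a placeholder) that prints which replacecommit instants the active timeline still considers pending; those are the instants whose .requested files the writer will try to read:
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;

   public class ListPendingReplaceCommits {
     public static void main(String[] args) {
       // Placeholder base path; point this at the table that throws the exception.
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath("/path/to/ods_cis_corp_history_profile_hudi_t1_20231208")
           .build();

       // Pending replacecommits still present in the cached active timeline.
       HoodieTimeline pendingReplace = metaClient.getActiveTimeline().filterPendingReplaceTimeline();
       pendingReplace.getInstants().forEach(instant ->
           System.out.println(instant.getTimestamp() + " " + instant.getState()));
     }
   }
   ```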
   
   hoodie.properties:
   hoodie.table.precombine.field=t_pre_combine_field
   hoodie.datasource.write.drop.partition.columns=false
   hoodie.table.type=COPY_ON_WRITE
   hoodie.archivelog.folder=archived
   hoodie.timeline.layout.version=1
   hoodie.table.version=5
   hoodie.table.metadata.partitions=files
   hoodie.table.recordkey.fields=order_no,profile_type,profile_no,order_type,profile_cat
   hoodie.database.name=temp_db
   hoodie.datasource.write.partitionpath.urlencode=false
   hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.table.name=ods_cis_corp_history_profile_hudi_t1_20231207
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.table.checksum=2702244832
   hoodie.table.create.schema={"type"\:"record","name"\:"ods_cis_corp_history_profile_hudi_t1_20231207_record","namespace"\:"hoodie.ods_cis_corp_history_profile_hudi_t1_20231207","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"_hoodie_is_deleted","type"\:["boolean","null"]},{"name"\:"t_pre_combine_field","type"\:["long","null"]},{"name"\:"order_type","type"\:["int","null"]},{"name"\:"order_no","type"\:["int","null"]},{"name"\:"profile_no","type"\:["int","null"]},{"name"\:"profile_type","type"\:["string","null"]},{"name"\:"profile_cat","type"\:["string","null"]},{"name"\:"u_version","type"\:["string","null"]},{"name"\:"order_line_no","type"\:["int","null"]},{"name"\:"profile_c","type"\:["string","null"]},{"name"\:"profile_i",
 "type"\:["int","null"]},{"name"\:"profile_f","type"\:[{"type"\:"fixed","name"\:"fixed","namespace"\:"hoodie.ods_cis_corp_history_profile_hudi_t1_20231207.ods_cis_corp_history_profile_hudi_t1_20231207_record.profile_f","size"\:9,"logicalType"\:"decimal","precision"\:20,"scale"\:8},"null"]},{"name"\:"profile_d","type"\:[{"type"\:"long","logicalType"\:"timestamp-micros"},"null"]},{"name"\:"active","type"\:["string","null"]},{"name"\:"entry_datetime","type"\:[{"type"\:"long","logicalType"\:"timestamp-micros"},"null"]},{"name"\:"entry_id","type"\:["int","null"]},{"name"\:"h_version","type"\:["int","null"]}]}
   
   More logs: see the attached [hudi.log](https://github.com/apache/hudi/files/13608380/hudi.log).




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "cbomgit (via GitHub)" <gi...@apache.org>.
cbomgit commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1973287877

   We are currently using 0.11.0. We've had an upgrade on our plate for a bit; it may be wise for us to prioritize it, then.




[GitHub] [hudi] KnightChess commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1296651817

   Note: updated the error stack in the issue description.




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1892033786

   The internal method we use to fix the issue is quite aggressive; we directly catch and handle the exceptions during the reading process. Our version is now 0.13.1.
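
   To make the pattern concrete, here is a rough sketch of what "catch and handle during the reading process" could look like. It is not the actual internal patch; it assumes the plan lookup surfaces the missing .replacecommit.requested file as a HoodieIOException, as in the stack traces earlier in this thread.
   ```java
   import org.apache.hudi.avro.model.HoodieClusteringPlan;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieInstant;
   import org.apache.hudi.common.util.ClusteringUtils;
   import org.apache.hudi.common.util.Option;
   import org.apache.hudi.common.util.collection.Pair;
   import org.apache.hudi.exception.HoodieIOException;

   public class PendingClusteringLookup {

     /**
      * Defensive wrapper around the pending-clustering plan lookup: if the
      * .replacecommit.requested file was deleted by a concurrent rollback between
      * listing the timeline and reading the instant, skip the instant instead of
      * failing the whole write.
      */
     static Option<Pair<HoodieInstant, HoodieClusteringPlan>> getClusteringPlanSafely(
         HoodieTableMetaClient metaClient, HoodieInstant pendingReplaceInstant) {
       try {
         return ClusteringUtils.getClusteringPlan(metaClient, pendingReplaceInstant);
       } catch (HoodieIOException e) {
         // The instant vanished underneath us; treat it as already rolled back.
         return Option.empty();
       }
     }
   }
   ```
   Whether silently skipping the vanished instant is safe depends on the rollback semantics, which is why the approach is described above as aggressive.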




[GitHub] [hudi] KnightChess commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1554322383

   @nsivabalan yes, we still hit this issue in 0.13.x. @weimingdiit can you confirm that?




[GitHub] [hudi] ad1happy2go commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1607380704

   @weimingdiit Did you get a chance to provide more information here to help reproduce the issue?




[GitHub] [hudi] nsivabalan commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1542762155

   hey @KnightChess: sorry, we did not get to triage this. Are you still facing this issue?
   




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "subash-metica (via GitHub)" <gi...@apache.org>.
subash-metica commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1859967411

   Hi @KnightChess, @xushiyan and @jjtjiang - have you found a fix or a workaround for this issue?




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "cbomgit (via GitHub)" <gi...@apache.org>.
cbomgit commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1973206032

   Any update on a root cause/fix? We are facing a similar issue suddenly. We have multi-writer with OCC. Each writer writes distinct partitions and uses insert_overwrite.




[GitHub] [hudi] KnightChess commented on issue #7057: [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1296615528

   Sorry, I gave the wrong analysis log. There are three scenarios that can cause this error. I cannot find the log for the first scenario I analyzed (setting the replaced files into the result); I will update this issue if I find it.
   I am attaching logs from the other scenarios.
   




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "subash-metica (via GitHub)" <gi...@apache.org>.
subash-metica commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1973269182

   @cbomgit - What version are you using?
   
   Unfortunately, I had to stop using multi-writer for now to circumvent this problem. I am not sure whether the issue exists in 0.14 as well; I might upgrade and test it later.




Re: [I] [SUPPORT] [OCC] HoodieException: Error getting all file groups in pending clustering [hudi]

Posted by "subash-metica (via GitHub)" <gi...@apache.org>.
subash-metica commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1878374530

   Hi,
   I am facing the issue again. It happens for random instants, and I cannot see a common pattern.
   
   Hudi version: 0.13.1
   
   The error stack trace,
   
   ```
   org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://<dataset-location>/.hoodie/20240103002413315.replacecommit.requested
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:310) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.util.ClusteringUtils.getRequestedReplaceMetadata(ClusteringUtils.java:93) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.util.ClusteringUtils.getClusteringPlan(ClusteringUtils.java:109) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$getInflightTimelineExcludeCompactionAndClustering$7(BaseHoodieTableServiceClient.java:595) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178) ~[?:?]
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
   	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
   	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
   	at org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.<init>(HoodieDefaultTimeline.java:58) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.filter(HoodieDefaultTimeline.java:236) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieTableServiceClient.getInflightTimelineExcludeCompactionAndClustering(BaseHoodieTableServiceClient.java:593) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieTableServiceClient.getInstantsToRollback(BaseHoodieTableServiceClient.java:737) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:706) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommitWithTime$97cdbdca$1(BaseHoodieWriteClient.java:844) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:156) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieWriteClient.startCommitWithTime(BaseHoodieWriteClient.java:843) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.client.BaseHoodieWriteClient.startCommitWithTime(BaseHoodieWriteClient.java:836) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:371) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:151) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530) ~[spark-catalyst_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) ~[spark-sql_2.12-3.4.1-amzn-0.jar:3.4.1-amzn-0]
   
   Caused by: java.io.FileNotFoundException: No such file or directory 's3:<dataset-location>/.hoodie/20240103002413315.replacecommit.requested'
   	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:556) ~[emrfs-hadoop-assembly-2.58.0.jar:?]
   	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:988) ~[emrfs-hadoop-assembly-2.58.0.jar:?]
   	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:980) ~[emrfs-hadoop-assembly-2.58.0.jar:?]
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:983) ~[hadoop-client-api-3.3.3-amzn-5.jar:?]
   	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:197) ~[emrfs-hadoop-assembly-2.58.0.jar:?]
   	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:476) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:821) ~[hudi-spark3-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
   	... 94 more
   ```
   
   Looking at `timeline show incomplete`, below is the view:
   ![Screenshot 2024-01-05 at 14 41 40](https://github.com/apache/hudi/assets/142811976/21099107-4570-408a-b5aa-958efcf2f31d)
   
   
   This instant failure happened randomly; nothing changed on our side - we have a job that writes the data as Hudi every hour. Suddenly, in one execution, the replacecommit.inflight and replacecommit.requested timestamps were different (not sure how that could happen).
   As a workaround, I renamed the replacecommit.requested file so its timestamp matches the .inflight file and reran the job for that partition to fix the data. It worked fine, but unfortunately I am not sure how to reproduce it.
   
   Could it be a race condition? Is there any alternative way to handle this issue?
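
   For reference, a minimal sketch of the manual rename workaround described above, assuming a Hadoop-compatible FileSystem. The base path and the mismatched requested timestamp are placeholders, and the timeline should only be touched once the writers are stopped:
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class RenameRequestedInstant {
     public static void main(String[] args) throws Exception {
       // Placeholder base path; in the report this is the table's .hoodie directory on S3.
       Path hoodieDir = new Path("s3://<dataset-location>/.hoodie");
       FileSystem fs = hoodieDir.getFileSystem(new Configuration());

       // Rename the orphaned requested file so its timestamp matches the inflight instant
       // that the writer is looking for (20240103002413315 in the stack trace above).
       Path from = new Path(hoodieDir, "<mismatched-ts>.replacecommit.requested"); // placeholder
       Path to = new Path(hoodieDir, "20240103002413315.replacecommit.requested");
       if (!fs.rename(from, to)) {
         throw new IllegalStateException("rename failed: " + from + " -> " + to);
       }
     }
   }
   ```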
   

