
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/13 13:48:15 UTC

[GitHub] [hudi] rubenssoto opened a new issue #2944: [SUPPORT] Failed to archive commits

rubenssoto opened a new issue #2944:
URL: https://github.com/apache/hudi/issues/2944


   Hello guys,
   
   I have a lot of Hudi jobs. One of them failed yesterday, and a few hours later it started showing this error:
   
   Apache Hudi 0.8
   EMR 6.2
   Apache Spark 3.0
   
   
   Hudi Options:
   
   Map(hoodie.datasource.hive_sync.database -> raw_freshchat,
   hoodie.parquet.small.file.limit -> 402653184,
   hoodie.copyonwrite.record.size.estimate -> 1024,
   hoodie.datasource.hive_sync.support_timestamp -> true,
   hoodie.datasource.write.precombine.field -> linecreatedtimestamp,
   hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.NonPartitionedExtractor,
   hoodie.parquet.max.file.size -> 419430400,
   hoodie.parquet.block.size -> 402653184,
   hoodie.datasource.hive_sync.table -> conversation_created,
   hoodie.datasource.write.operation -> upsert,
   hoodie.datasource.hive_sync.enable -> false,
   hoodie.datasource.write.recordkey.field -> conversation_id,created_at,
   hoodie.table.name -> conversation_created,
   hoodie.datasource.hive_sync.jdbcurl -> jdbc:hive2://ip-10-0-92-222.us-west-2.compute.internal:10000,
   hoodie.datasource.write.hive_style_partitioning -> false,
   hoodie.datasource.write.table.name -> conversation_created,
   hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.NonpartitionedKeyGenerator,
   hoodie.keep.max.commits -> 38,
   hoodie.upsert.shuffle.parallelism -> 100,
   hoodie.cleaner.commits.retained -> 36,
   hoodie.keep.min.commits -> 37,
   hoodie.clean.async -> true)
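   The cleaning and archival settings above are tightly spaced (36 retained, min 37, max 38). As we understand Hudi's write-config validation, it requires `hoodie.keep.max.commits` > `hoodie.keep.min.commits` > `hoodie.cleaner.commits.retained`. The Java sketch below re-checks that ordering on an option map; it is a standalone illustration under that assumption, not Hudi code, and the class `ArchivalConfigCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical standalone check (not part of Hudi). It re-states, as we understand it,
 * the ordering expected between the cleaner and archival settings:
 *   hoodie.keep.max.commits > hoodie.keep.min.commits > hoodie.cleaner.commits.retained
 */
final class ArchivalConfigCheck {
  static final String RETAINED = "hoodie.cleaner.commits.retained";
  static final String MIN_KEEP = "hoodie.keep.min.commits";
  static final String MAX_KEEP = "hoodie.keep.max.commits";

  private ArchivalConfigCheck() {}

  /** Reads a required integer option; this sketch deliberately does not apply Hudi's defaults. */
  private static int requiredInt(Map<String, String> opts, String key) {
    String value = opts.get(key);
    if (value == null) {
      throw new IllegalArgumentException("missing option: " + key);
    }
    return Integer.parseInt(value.trim());
  }

  /** Returns human-readable violations; an empty list means the ordering holds. */
  static List<String> violations(Map<String, String> opts) {
    int retained = requiredInt(opts, RETAINED);
    int minKeep = requiredInt(opts, MIN_KEEP);
    int maxKeep = requiredInt(opts, MAX_KEEP);

    List<String> problems = new ArrayList<>();
    if (maxKeep <= minKeep) {
      problems.add(MAX_KEEP + "=" + maxKeep + " must be greater than " + MIN_KEEP + "=" + minKeep);
    }
    if (minKeep <= retained) {
      problems.add(MIN_KEEP + "=" + minKeep + " must be greater than " + RETAINED + "=" + retained);
    }
    return problems;
  }
}
```

   With the values from this issue (36, 37, 38) the check reports no violations, so this ordering presumably does not explain the archiving failure by itself.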
   
   ```
   21/05/13 13:21:00 ERROR HoodieTimelineArchiveLog: Failed to archive commits, .commit file: 20210504033941.rollback.inflight
   java.io.IOException: Not an Avro data file
   	at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
   	at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
   	at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:223)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
   	at hudiwriter.HudiWriter.merge(HudiWriter.scala:98)
   	at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:39)
   	at jobs.TableProcessor.start(TableProcessor.scala:88)
   	at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
   	at scala.util.Success.$anonfun$map$1(Try.scala:255)
   	at scala.util.Success.map(Try.scala:213)
   	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
   	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
   	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
   	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
   	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
   	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
   21/05/13 13:21:00 INFO S3NativeFileSystem: Opening 's3://bucket/freshchat/response_time/.hoodie/20210512023309.clean.inflight' for reading
   21/05/13 13:21:00 INFO S3NativeFileSystem: Opening 's3://bucket/freshchat/response_time/.hoodie/20210512023309.clean' for reading
   ```
   
   
   
   
   Could you help me?
   
   Thank you
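   The archiver fails inside Avro's `DataFileReader.openReader` with "Not an Avro data file" while reading `20210504033941.rollback.inflight`. Per the Avro specification, object-container files begin with the four bytes `'O'`, `'b'`, `'j'`, `1`, and in the Avro versions we are aware of `openReader` rejects input whose header is missing or unrecognised, so an empty or non-Avro inflight file would produce exactly this error. The sketch below checks a local copy of an instant file for that header using plain `java.nio`; the class `AvroMagicCheck` and method `looksLikeAvroContainer` are hypothetical, and it assumes the file has first been copied off S3.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical helper: reports whether a file starts with the Avro
 * object-container magic bytes 'O', 'b', 'j', 1 from the Avro specification.
 * It does not recognise the legacy Avro 1.2 header.
 */
final class AvroMagicCheck {
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};

  private AvroMagicCheck() {}

  static boolean looksLikeAvroContainer(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      byte[] header = new byte[MAGIC.length];
      int read = 0;
      while (read < header.length) {
        int n = in.read(header, read, header.length - read);
        if (n < 0) {
          return false; // shorter than the 4-byte header, e.g. an empty inflight file
        }
        read += n;
      }
      return Arrays.equals(header, MAGIC);
    }
  }
}
```

   A `false` result for the instant file named in the log means Avro's container reader would very likely reject it in the same way.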


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] garyli1019 commented on issue #2944: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.

garyli1019 commented on issue #2944:
URL: https://github.com/apache/hudi/issues/2944#issuecomment-840988053


   This issue should be fixed by https://github.com/apache/hudi/pull/2677, which will be released with 0.9.0.



[GitHub] [hudi] garyli1019 closed issue #2944: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.

garyli1019 closed issue #2944:
URL: https://github.com/apache/hudi/issues/2944


   



[GitHub] [hudi] garyli1019 commented on issue #2944: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.

garyli1019 commented on issue #2944:
URL: https://github.com/apache/hudi/issues/2944#issuecomment-840618239


   Hello @rubenssoto, for now you can resume your pipeline by manually removing the file 20210504033941.rollback.inflight, or you can set `hoodie.fail.on.timeline.archiving=false` as a temporary workaround. I created a ticket to track this and will resolve it before the next release: https://issues.apache.org/jira/browse/HUDI-1901
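   If you follow the manual workaround, moving the file aside instead of deleting it keeps a copy for later inspection. The Java sketch below does that for a timeline on a local or mounted filesystem; the class `InstantQuarantine` and its `quarantine` method are hypothetical, and for a table on S3 (as in this issue) the same step would be done with the object store's own copy and delete tooling. It is presumably safest to run only while no writer is active on the table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: moves one rollback-inflight instant file out of a local
 * .hoodie directory into a backup directory instead of deleting it.
 */
final class InstantQuarantine {
  private static final String SUFFIX = ".rollback.inflight";

  private InstantQuarantine() {}

  static Path quarantine(Path hoodieDir, String instantFileName, Path backupDir) throws IOException {
    // Accept only a bare file name (no directory parts) with the expected suffix.
    if (!instantFileName.endsWith(SUFFIX)
        || !Paths.get(instantFileName).getFileName().toString().equals(instantFileName)) {
      throw new IllegalArgumentException("expected a bare *" + SUFFIX + " file name, got: " + instantFileName);
    }
    Path source = hoodieDir.resolve(instantFileName);
    if (!Files.isRegularFile(source)) {
      throw new IllegalArgumentException("instant file not found: " + source);
    }
    Files.createDirectories(backupDir);
    // Without REPLACE_EXISTING this throws FileAlreadyExistsException instead of overwriting a backup.
    return Files.move(source, backupDir.resolve(instantFileName));
  }
}
```

   Restricting the helper to `*.rollback.inflight` names is a deliberate guard so that completed instants such as `20210512023309.clean` cannot be moved by mistake.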

