Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/29 13:45:05 UTC

[GitHub] [hudi] codejoyan opened a new issue #3559: [SUPPORT] Failed to archive commits

codejoyan opened a new issue #3559:
URL: https://github.com/apache/hudi/issues/3559


   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : 2.4.7
   
   * Hive version : 2.3.7
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   **Issue Description**
   I am trying to bulk_insert into a COW table in batches. The 1st batch of bulk_insert succeeds, but when I run the 2nd batch I run into the error "Failed to archive commits" (stack trace below). The configurations used are listed below. As discussed in https://github.com/apache/hudi/issues/2515, I tried setting hoodie.datasource.write.row.writer.enable=false, but it does not help.
   I also see the following in the older .commit metadata files, and that seems to cause the problem. How do I get around this?
   
     "extraMetadata" : {
       "schema" : null
     },
   
   **Configurations used**
               option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col1,col2,col3").
               option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "part_col_1,part_col_2").
               option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col7").
               option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator").
               option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, "COPY_ON_WRITE").
               option(DataSourceWriteOptions.OPERATION_OPT_KEY, "bulk_insert").
               option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, "true").
               option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true").
               option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, srcSchema).
               option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, targetTableName).
               option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "part_col_1,part_col_2").
               option(DataSourceWriteOptions.HIVE_USE_JDBC_OPT_KEY, "false").
               option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "thrift://host1:9083,thrift://host2:9083,thrift://host3:9083").
               option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName).
               option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, "true").
               option(HoodieWriteConfig.TABLE_NAME, targetTableName).
               mode(SaveMode.Append).
               save(targetBasePath)
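   For context, the option chain above presumably hangs off a Hudi `DataFrameWriter`; a minimal sketch of how such a batch write is typically invoked (the `inputDf` DataFrame is an assumption, and `srcSchema`, `targetTableName`, `targetBasePath` are the reporter's own variables, not values taken from the report):

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

// Sketch only: a trimmed version of the reporter's option chain,
// attached to a DataFrame write against the Hudi data source.
inputDf.write.format("hudi").
  option(DataSourceWriteOptions.OPERATION_OPT_KEY, "bulk_insert").
  option(DataSourceWriteOptions.ENABLE_ROW_WRITER_OPT_KEY, "true").
  option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, "COPY_ON_WRITE").
  option(HoodieWriteConfig.TABLE_NAME, targetTableName).
  mode(SaveMode.Append).
  save(targetBasePath)
```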
   
   **Stack Trace**
   21/08/29 12:34:25 ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: User class threw exception: java.lang.Exception: Query execution failed in loadBaseTable
   	at com.walmart.finwb.salesbaseload.LoadTargetTableUtil$.loadBaseTable(LoadTargetTableUtil.scala:78)
   	at com.walmart.finwb.salesbaseload.LoadTargetTableUtil$.main(LoadTargetTableUtil.scala:37)
   	at com.walmart.finwb.salesbaseload.LoadTargetTableUtil.main(LoadTargetTableUtil.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:686)
   Caused by: org.apache.hudi.exception.HoodieException: Failed to archive commits
   	at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:83)
   	at org.apache.hudi.internal.HoodieDataSourceInternalWriter.commit(HoodieDataSourceInternalWriter.java:83)
   	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:696)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:281)
   	at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:295)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:127)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:696)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:310)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
   	at com.walmart.finwb.salesbaseload.LoadTargetTableUtil$.$anonfun$loadBaseTable$1(LoadTargetTableUtil.scala:72)
   	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
   	at com.walmart.finwb.salesbaseload.LoadTargetTableUtil$.loadBaseTable(LoadTargetTableUtil.scala:49)
   	... 7 more
   Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:322)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:138)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:426)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:188)
   	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:171)
   	at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:80)
   	... 47 more
   Caused by: java.lang.NullPointerException: null of string of map of union in field extraMetadata of org.apache.hudi.avro.model.HoodieCommitMetadata of union in field hoodieCommitMetadata of org.apache.hudi.avro.model.HoodieArchivedMetaEntry
   	at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:145)
   	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
   	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
   	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
   	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:106)
   	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:97)
   	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:164)
   	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:142)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.writeToFile(HoodieTimelineArchiveLog.java:361)
   	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
   	... 52 more
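   The innermost NullPointerException above points at the root cause: the Avro schema for `HoodieCommitMetadata` declares `extraMetadata` as a map of non-nullable string values, so the `"schema" : null` entry seen in the older .commit files cannot be serialized when the archiver rewrites those commits into the archive log. A rough plain-Scala illustration of the constraint (not Hudi code; `serializeExtraMetadata` is a made-up name):

```scala
// Illustration only: Avro's GenericDatumWriter rejects a null value
// inside a map<string, string> field such as extraMetadata.
def serializeExtraMetadata(extraMetadata: Map[String, String]): Unit = {
  extraMetadata.foreach { case (key, value) =>
    if (value == null)
      // Mirrors the "null of string of map ... in field extraMetadata"
      // NullPointerException raised at GenericDatumWriter.npe above.
      throw new NullPointerException(
        s"null of string of map of union in field extraMetadata (key=$key)")
  }
}

// The commit metadata from the report would fail here:
// serializeExtraMetadata(Map("schema" -> null))
```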
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #3559: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3559:
URL: https://github.com/apache/hudi/issues/3559


   





[GitHub] [hudi] vinothchandar commented on issue #3559: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #3559:
URL: https://github.com/apache/hudi/issues/3559#issuecomment-926268715


   0.9.0 should fix this. @codejoyan let me know if that helps!
   





[GitHub] [hudi] nsivabalan commented on issue #3559: [SUPPORT] Failed to archive commits

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3559:
URL: https://github.com/apache/hudi/issues/3559#issuecomment-946367293


   This was fixed in 0.9.0. Closing it out. If you run into any issues, do reach out to us.

