Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/07 22:55:29 UTC

[GitHub] [hudi] parisni opened a new issue, #5785: [SUPPORT] S3 flush broken: S3A streams are not Syncable See HADOOP-17597

parisni opened a new issue, #5785:
URL: https://github.com/apache/hudi/issues/5785

   Hi, 
   
   hudi 0.11.0
   spark 3.2.1
   hadoop 3.3.1
   
   I suddenly encountered the blocking error below, which prevents writing anything to Hudi:
   
   Apparently hsync is not supported by S3A; an object store only persists data when the stream is closed, so there is nothing to sync mid-stream. Hudi calls it here:
   
   https://github.com/apache/hudi/blob/a68e1dc2dba475b9a63779f3afa0b5c558a7cd3b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java#L259-L261
   
   ```java
       // NOTE : the following API call makes sure that the data is flushed to disk on DataNodes (akin to POSIX fsync())
       // See more details here : https://issues.apache.org/jira/browse/HDFS-744
       output.hsync();
   ```
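   As an aside, a defensive variant could probe the stream's advertised capabilities before forcing a sync. This is only a minimal sketch, not Hudi's actual code (the helper name is made up), using Hadoop's `StreamCapabilities` probe:

   ```java
   import java.io.IOException;

   import org.apache.hadoop.fs.FSDataOutputStream;
   import org.apache.hadoop.fs.StreamCapabilities;

   class SafeFlushSketch {
     // Sync durably where the stream supports it (e.g. HDFS); otherwise fall
     // back to a best-effort flush() on object stores like S3A.
     static void flushOrSync(FSDataOutputStream output) throws IOException {
       if (output.hasCapability(StreamCapabilities.HSYNC)) {
         output.hsync();
       } else {
         output.flush();
       }
     }
   }
   ```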
   The Hadoop S3A troubleshooting guide https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html#UnsupportedOperationException_.E2.80.9CS3A_streams_are_not_Syncable._See_HADOOP-17597..E2.80.9D
   says to set `fs.s3a.downgrade.syncable.exceptions` to `true`, but this has no effect.
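   For reference, a sketch of how that flag can be set through Spark's Hadoop configuration passthrough (the app name is illustrative):

   ```java
   import org.apache.spark.sql.SparkSession;

   public class DowngradeSyncableExample {
     public static void main(String[] args) {
       // Keys prefixed with "spark.hadoop." are copied into the Hadoop
       // Configuration that the S3A filesystem reads.
       SparkSession spark = SparkSession.builder()
           .appName("hudi-s3a-syncable-downgrade") // illustrative name
           .config("spark.hadoop.fs.s3a.downgrade.syncable.exceptions", "true")
           .getOrCreate();

       // ... Hudi writes would follow here ...

       spark.stop();
     }
   }
   ```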
   
   ```
   - S3A streams are not Syncable. See HADOOP-17597.
   java.lang.UnsupportedOperationException: S3A streams are not Syncable. See HADOOP-17597.
   	at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.handleSyncableInvocation(S3ABlockOutputStream.java:656)
   	at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.hsync(S3ABlockOutputStream.java:649)
   	at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:145)
   	at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:145)
   	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.flush(HoodieLogFormatWriter.java:261)
   	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.closeStream(HoodieLogFormatWriter.java:247)
   	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.close(HoodieLogFormatWriter.java:242)
   	at org.apache.hudi.client.HoodieTimelineArchiver.close(HoodieTimelineArchiver.java:143)
   	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:181)
   	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:907)
   	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:629)
   	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:534)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:650)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:163)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:303)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] parisni commented on issue #5785: [SUPPORT] S3 flush broken: S3A streams are not Syncable See HADOOP-17597

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #5785:
URL: https://github.com/apache/hudi/issues/5785#issuecomment-1149279091

   My bad: I was creating an `AbstractHiveSyncHoodieClient`, and things went wrong there.




[GitHub] [hudi] parisni closed issue #5785: [SUPPORT] S3 flush broken: S3A streams are not Syncable See HADOOP-17597

Posted by GitBox <gi...@apache.org>.
parisni closed issue #5785: [SUPPORT] S3 flush broken: S3A streams are not Syncable See HADOOP-17597
URL: https://github.com/apache/hudi/issues/5785

