Posted to issues@spark.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2018/01/08 14:27:00 UTC

[jira] [Resolved] (SPARK-18883) FileNotFoundException on _temporary directory

     [ https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved SPARK-18883.
------------------------------------
    Resolution: Won't Fix

I'm going to close this as a WONTFIX, because the solution is "don't use FileOutputCommitter to write to eventually consistent object stores". The committer expects the files and directories it has just created to show up when it lists them, but eventually consistent listings break that expectation.
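
To make the failure mode concrete, here is a rough Scala sketch (my simplification, not the actual committer source) of the listing that job commit depends on; the listStatus() call is the one that surfaces as the FileNotFoundException in stack traces like the one below:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object CommitListingSketch {
  // Mirrors the getAllCommittedTaskPaths() step of FileOutputCommitter's
  // job commit: enumerate $output/_temporary/0 to find the task output
  // that still has to be renamed into place.
  def committedTaskPaths(outputDir: Path, conf: Configuration): Array[Path] = {
    val fs = outputDir.getFileSystem(conf)
    val pendingJobDir = new Path(outputDir, "_temporary/0")
    // On an eventually consistent store, this listing can miss entries
    // or fail with FileNotFoundException even though the tasks did
    // create them.
    fs.listStatus(pendingJobDir).map(_.getPath)
  }
}
{code}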

With HADOOP-13345/S3Guard turned on you don't get the listing inconsistency, but you still get awful commit times and weak failure recovery: the assumption that rename() is fast and atomic is also broken.
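
For anyone who does want to try S3Guard, enabling it from Spark looks roughly like this. This is a sketch, assuming a Hadoop build that actually contains HADOOP-13345 (2.9.0+/3.0.0+); the table and region values are placeholders:

{code}
import org.apache.spark.sql.SparkSession

// Back S3A listings with a DynamoDB metadata store so newly created
// files and directories show up consistently in listStatus().
val spark = SparkSession.builder()
  .appName("s3guard-sketch")
  .config("spark.hadoop.fs.s3a.metadatastore.impl",
    "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore")
  // Placeholder table/region; create the table out of band or set
  // fs.s3a.s3guard.ddb.table.create=true.
  .config("spark.hadoop.fs.s3a.s3guard.ddb.table", "example-s3guard-table")
  .config("spark.hadoop.fs.s3a.s3guard.ddb.region", "eu-west-1")
  .getOrCreate()
{code}

Note that this only restores listing consistency: job commit still renames every file, and rename against S3 is a copy, so the commit-time and atomicity problems remain.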

Best to wait for a version of the Hadoop JARs containing the HADOOP-13786 committer, which is explicitly designed to work with S3.
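
Once those JARs are out, the Spark-side wiring should look something like the sketch below. The committer name and binding classes are taken from the in-progress committer/cloud-integration work, so they may change before release; the org.apache.spark.internal.io.cloud classes need Spark's hadoop-cloud module on the classpath:

{code}
import org.apache.spark.sql.SparkSession

// Switch Spark SQL output from FileOutputCommitter to an S3A committer:
// "directory" is the staging committer; "partitioned" and "magic" are
// the other variants.
val spark = SparkSession.builder()
  .appName("s3a-committer-sketch")
  .config("spark.hadoop.fs.s3a.committer.name", "directory")
  // Route Spark SQL's commit protocol through the PathOutputCommitter
  // machinery instead of FileOutputCommitter.
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .getOrCreate()
{code}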

> FileNotFoundException on _temporary directory 
> ----------------------------------------------
>
>                 Key: SPARK-18883
>                 URL: https://issues.apache.org/jira/browse/SPARK-18883
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: We're on CDH 5.7, Hadoop 2.6.
>            Reporter: Mathieu DESPRIEE
>
> I'm experiencing the following exception, usually after some time under heavy load:
> {code}
> 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> java.io.FileNotFoundException: File hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
>         at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
>         at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>         at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>         at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>         at com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
>         at com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
>         at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
>         at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13)
>         at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
>         at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>         at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> Looks similar to [SPARK-18512], although it's not the same environment: no streaming, no S3 here. The final path in the stack trace is different.


