Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/05/05 06:06:00 UTC

[jira] [Commented] (SPARK-39103) SparkContext.addFile trigger backend exception if it tries to add empty Hadoop directory

    [ https://issues.apache.org/jira/browse/SPARK-39103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532072#comment-17532072 ] 

Apache Spark commented on SPARK-39103:
--------------------------------------

User 'daijyc' has created a pull request for this issue:
https://github.com/apache/spark/pull/36453

> SparkContext.addFile trigger backend exception if it tries to add empty Hadoop directory
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-39103
>                 URL: https://issues.apache.org/jira/browse/SPARK-39103
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Daniel Dai
>            Priority: Minor
>
> _spark.sparkContext.addFile("hdfs://xxxx/empty_dir", true)_ will result in a backend failure with a vague error message:
> {code:java}
> java.nio.file.NoSuchFileException: /data/nvme1n1/nm-local-dir/usercache/jdai/appcache/application_1650999286466_24833/spark-c9e4de1a-5932-4c9b-bcac-138c8770960d/9368705081651698621617_cache
> 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 	at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
> 	at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
> 	at java.nio.file.Files.copy(Files.java:1274)
> 	at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:734)
> 	at org.apache.spark.util.Utils$.copyFile(Utils.scala:705)
> 	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:542)
> 	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$4(Executor.scala:935)
> 	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$4$adapted(Executor.scala:932)
> 	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
> 	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> 	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> 	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
> 	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:932)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens when the executor localizes the resource: it invokes org.apache.spark.util.Utils.fetchFile, which first calls *doFetchFile* to download the file from HDFS into a staging location and then calls *copyFile* to copy the staging file to the final destination. For an empty HDFS directory, *doFetchFile* delegates to *fetchHcfsFile*, which never creates the empty staging folder. *copyFile* then assumes the staging folder exists and throws the exception above.
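> Below is a minimal, self-contained sketch of that localization step (the helper name copyHcfsToLocal and its signature are hypothetical, not Spark's actual internals, and the real fix in the linked pull request may differ); the key point is that the local directory is created before iterating over children, so an empty HDFS directory still leaves an empty staging directory behind for the later copy:
> {code:scala}
> import java.io.{File, IOException}
> import org.apache.hadoop.fs.{FileSystem, Path}
> 
> // Hypothetical helper, not the actual Spark implementation: fetch a Hadoop
> // path into targetDir. Creating the local directory before listing children
> // is what keeps an empty HDFS directory from vanishing in staging.
> def copyHcfsToLocal(fs: FileSystem, source: Path, targetDir: File): Unit = {
>   val dest = new File(targetDir, source.getName)
>   if (fs.getFileStatus(source).isDirectory) {
>     if (!dest.exists() && !dest.mkdirs()) {
>       throw new IOException(s"Failed to create directory ${dest.getPath}")
>     }
>     // For an empty directory this loop is a no-op, but dest already exists,
>     // so the later copy from staging to the final destination can succeed.
>     fs.listStatus(source).foreach { status =>
>       copyHcfsToLocal(fs, status.getPath, dest)
>     }
>   } else {
>     fs.copyToLocalFile(source, new Path(dest.toURI))
>   }
> }
> {code}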



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org