Posted to issues@spark.apache.org by "Pratik Malani (Jira)" <ji...@apache.org> on 2023/07/11 13:33:00 UTC

[jira] [Commented] (SPARK-33782) Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

    [ https://issues.apache.org/jira/browse/SPARK-33782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742006#comment-17742006 ] 

Pratik Malani commented on SPARK-33782:
---------------------------------------

Hi [~pralabhkumar] 

The latest update in SparkSubmit.scala is causing a java.nio.file.NoSuchFileException.
The jar mentioned below is present at the stated location, but the Files.copy call in SparkSubmit.scala still fails when copying it into the current working directory.
Could you please help check what the possible cause might be?
{code:java}
Files  local:///opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar from /opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar to /opt/spark/work-dir/./database-scripts-1.1-SNAPSHOT.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
        at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
        at java.nio.file.Files.copy(Files.java:1274)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$14(SparkSubmit.scala:437)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.deploy.SparkSubmit.downloadResourcesToCurrentDirectory$1(SparkSubmit.scala:424)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$17(SparkSubmit.scala:449)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:449)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
 {code}
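
To help narrow this down, here is a minimal standalone sketch (my own repro code, not Spark's: the object name CopyRepro is made up, and REPLACE_EXISTING is an assumption about the options SparkSubmit passes) that replays the same java.nio copy from the stack trace inside the driver image and first prints what the JVM actually sees:
{code:scala}
import java.nio.file.{Files, Paths, StandardCopyOption}

object CopyRepro {
  def main(args: Array[String]): Unit = {
    // The source path is taken verbatim from the log line above.
    val source = Paths.get("/opt/spark/work-dir/database-scripts-1.1-SNAPSHOT.jar")
    // SparkSubmit resolves the destination against the current working directory.
    val target = Paths.get(".").resolve(source.getFileName)

    // Print what the JVM actually sees before copying; a mismatch here
    // (unexpected working directory, dangling symlink, volume not yet
    // mounted) could explain a NoSuchFileException even though `ls`
    // shows the jar in the container.
    println(s"cwd           = ${Paths.get("").toAbsolutePath}")
    println(s"source exists = ${Files.exists(source)}")
    println(s"target        = ${target.toAbsolutePath.normalize}")

    // Same java.nio call as the failing frame in the stack trace
    // (the copy options are my assumption, as noted above).
    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
    println("copy succeeded")
  }
}
{code}
If this prints "source exists = false" even though ls shows the jar, the path is probably being resolved differently from the JVM's point of view (working directory, a symlinked work-dir, or the volume not yet mounted when the driver starts); if it prints true and the copy still fails, that would point back at the copy step itself.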

> Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33782
>                 URL: https://issues.apache.org/jira/browse/SPARK-33782
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Hyukjin Kwon
>            Assignee: Pralabh Kumar
>            Priority: Major
>             Fix For: 3.4.0
>
>
> In YARN cluster mode, the passed files can be accessed in the current working directory. Looks like this is not the case in Kubernetes cluster mode.
> By doing this, users can, for example, leverage PEX to manage Python dependencies in Apache Spark:
> {code}
> pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
> PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
> {code}
> See also https://github.com/apache/spark/pull/30735/files#r540935585.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org