Posted to issues@spark.apache.org by "Furcy Pin (JIRA)" <ji...@apache.org> on 2018/08/24 16:28:00 UTC
[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

    [ https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591868#comment-16591868 ] 

Furcy Pin commented on SPARK-10795:
-----------------------------------

Hi, I came across this ticket with the same issue: my YARN job was failing with the error {code:java}java.io.FileNotFoundException: File does not exist{code} for a file called *__spark_conf__.zip* or *pyspark.zip* on HDFS, in the staging directory.

For me too, the files were uploaded correctly to HDFS, and the error happened at shutdown, because something was trying to read them after the staging directory had been wiped.

Thanks to Carlos Bribiescas's comment, I found out that I had left a 
{code:java}
SparkSession.builder.master("local[4]"){code}
in my code. After removing it, everything worked like a charm.
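
For readers hitting the same problem, here is a minimal sketch of one way to keep the master out of the job code, so that the value passed on the command line (for example {{spark-submit --master yarn --deploy-mode cluster}}) is the one that applies. Properties set in code take precedence over spark-submit flags, which is why a leftover {{master("local[4]")}} wins. The environment variable {{MY_APP_LOCAL_MASTER}}, the app name and both function names are hypothetical, introduced only for illustration.

```python
import os

# Hypothetical environment variable, meant to be set only on a developer machine.
LOCAL_MASTER_ENV = "MY_APP_LOCAL_MASTER"


def builder_options(env=None):
    """Return the options to apply to SparkSession.builder.

    A master is included only when the hypothetical variable is set, so a
    job launched through spark-submit keeps the master given on the command line.
    """
    env = os.environ if env is None else env
    options = {"appName": "example-job"}  # hypothetical app name
    local_master = env.get(LOCAL_MASTER_ENV)
    if local_master:
        options["master"] = local_master
    return options


def create_session(env=None):
    """Build a SparkSession from builder_options()."""
    # Imported lazily so builder_options() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    options = builder_options(env)
    builder = SparkSession.builder.appName(options["appName"])
    if "master" in options:
        builder = builder.master(options["master"])
    return builder.getOrCreate()
```

On a laptop one would export {{MY_APP_LOCAL_MASTER=local[4]}} before running the script; on the cluster the variable stays unset and nothing in the code overrides the submitted master.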

I suggest creating a new ticket to add a check with a clear error message for when users make this kind of mistake, and closing this ticket once that is done.
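
Until such a check exists in Spark itself, a rough user-side approximation is to scan the job source for a hard-coded local master before submitting. The sketch below is not how Spark would implement the check; the function name and the regular expression are assumptions for illustration, and the pattern only catches literal strings passed to {{master(...)}} or {{setMaster(...)}}.

```python
import re

# Matches .master("local"), .master("local[4]"), .master('local[*]') and the
# SparkConf equivalent .setMaster("local[...]") when written as string literals.
_LOCAL_MASTER = re.compile(
    r"""\.(?:master|setMaster)\(\s*["']local(?:\[[^\]]*\])?["']\s*\)"""
)


def find_hardcoded_local_master(source):
    """Return (line_number, stripped_line) pairs that hard-code a local master."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if _LOCAL_MASTER.search(line):
            hits.append((number, line.strip()))
    return hits
```

A pre-submit script could call this on each job file and refuse to submit to YARN when the returned list is not empty.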


> FileNotFoundException while deploying pyspark job on cluster
> ------------------------------------------------------------
>
>                 Key: SPARK-10795
>                 URL: https://issues.apache.org/jira/browse/SPARK-10795
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>         Environment: EMR 
>            Reporter: Harshit
>            Priority: Major
>
> I am trying to run a simple Spark job using pyspark. It works standalone, but it fails when I deploy it on the cluster.
> Events:
> 2015-09-24 10:38:49,602 INFO  [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-xxxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> The resource upload above is successful, and I manually checked that the file is present at the path specified above, but after a while I get the following error:
> Diagnostics: File does not exist: hdfs://ip-xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip
> java.io.FileNotFoundException: File does not exist: hdfs://ip-1xxx.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1439967440341_0461/pyspark.zip


