Posted to issues@spark.apache.org by "Juliet Hougland (JIRA)" <ji...@apache.org> on 2015/07/11 00:41:05 UTC

[jira] [Comment Edited] (SPARK-8646) PySpark does not run on YARN

    [ https://issues.apache.org/jira/browse/SPARK-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623008#comment-14623008 ] 

Juliet Hougland edited comment on SPARK-8646 at 7/10/15 10:40 PM:
------------------------------------------------------------------

[~lianhuiwang] I just uploaded the log files from using the verbose flag. I think I may have important clues as to where the problem lies. Instead of passing '--master yarn-client' as part of my spark-submit command, I parse my own CLI arg in my main class to get the Spark master and initialize a configuration with it. If I add --master yarn-client in addition to my normal master specification, the job runs fine.
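The in-code master selection described above can be sketched as follows. This is a minimal, hypothetical reconstruction; the actual contents of data_transform.py are not shown in this thread, only the positional argument order used by the commands below (input path, output path, master):

```python
import sys

def parse_args(argv):
    # Hypothetical positional layout matching the spark-submit
    # invocations in this comment: input path, output path, master.
    in_path, out_path, master = argv[:3]
    return in_path, out_path, master

def main():
    in_path, out_path, master = parse_args(sys.argv[1:])
    # In the real script, the parsed master would be handed to the
    # SparkConf rather than to spark-submit, e.g.:
    #   conf = SparkConf().setAppName("data_transform").setMaster(master)
    #   sc = SparkContext(conf=conf)
    print(master)

if __name__ == "__main__":
    main()
```

With this pattern, spark-submit itself never sees --master, which may matter because spark-submit decides how to stage the application based on the master it is given at submit time.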

The following command works in Spark 1.3 but not in 1.4:
    $SPARK_HOME/bin/spark-submit --verbose outofstock/data_transform.py \
    hdfs://foe-dev/DEMO_DATA/FACT_POS     hdfs:/user/juliet/ex4/ yarn-client

If I add the --master yarn-client parameter to the command it works. Specifically:
    $SPARK_HOME/bin/spark-submit --verbose --master yarn-client outofstock/data_transform.py \
    hdfs://foe-dev/DEMO_DATA/FACT_POS     hdfs:/user/juliet/ex4/ yarn-client


was (Author: juliet):
[~lianhuiwang] I just uploaded the log files from using --verbose. I think I may have important clues as to where the problem lies. Instead of passing '--master yarn-client' as part of my spark-submit command, I parse my own CLI arg in my main class to get the Spark master and initialize a configuration with it. If I add --master yarn-client in addition to my normal master specification, the job runs fine.

The following command works in Spark 1.3 but not in 1.4:
    $SPARK_HOME/bin/spark-submit --verbose outofstock/data_transform.py \
    hdfs://foe-dev/DEMO_DATA/FACT_POS     hdfs:/user/juliet/ex4/ yarn-client

If I add the --master yarn-client parameter to the command it works. Specifically:
    $SPARK_HOME/bin/spark-submit --verbose --master yarn-client outofstock/data_transform.py \
    hdfs://foe-dev/DEMO_DATA/FACT_POS     hdfs:/user/juliet/ex4/ yarn-client

> PySpark does not run on YARN
> ----------------------------
>
>                 Key: SPARK-8646
>                 URL: https://issues.apache.org/jira/browse/SPARK-8646
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 1.4.0
>         Environment: SPARK_HOME=local/path/to/spark1.4install/dir
> also with
> SPARK_HOME=local/path/to/spark1.4install/dir
> PYTHONPATH=$SPARK_HOME/python/lib
> Spark apps are submitted with the command:
> $SPARK_HOME/bin/spark-submit outofstock/data_transform.py hdfs://foe-dev/DEMO_DATA/FACT_POS hdfs:/user/juliet/ex/ yarn-client
> data_transform contains a main method, and the rest of the args are parsed in my own code.
>            Reporter: Juliet Hougland
>         Attachments: executor.log, pi-test.log, spark1.4-SPARK_HOME-set-PYTHONPATH-set.log, spark1.4-SPARK_HOME-set-inline-HADOOP_CONF_DIR.log, spark1.4-SPARK_HOME-set.log, spark1.4-verbose.log, verbose-executor.log
>
>
> Running PySpark jobs results in a "no module named pyspark" error when run in yarn-client mode in Spark 1.4.
> [I believe this JIRA represents the change that introduced this error.|https://issues.apache.org/jira/browse/SPARK-6869]
> This does not represent a binary-compatible change to Spark. Scripts that worked on previous Spark versions (i.e. commands that use spark-submit) should continue to work without modification between minor versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org