Posted to issues@spark.apache.org by "Ángel Álvarez (JIRA)" <ji...@apache.org> on 2014/11/20 16:34:33 UTC

[jira] [Commented] (SPARK-1825) Windows Spark fails to work with Linux YARN

    [ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219490#comment-14219490 ] 

Ángel Álvarez commented on SPARK-1825:
--------------------------------------

I've had the following problems getting Windows + PySpark + YARN to work properly:

1. net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py

FIX? Comment out the "net.topology.script.file.name" property in core-site.xml.
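For reference, a sketch of what that looks like in core-site.xml (the script path is taken from the error above; the surrounding file is omitted, and your core-site.xml may differ):

  <!-- commented out so the Windows client does not try to run the
       Linux-only topology script -->
  <!--
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
  </property>
  -->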

2. Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /yarn/nm/usercache/bigdata/filecache/63/spark-assembly-1.1.0-hadoop2.3.0-cdh5.0.1.jar

FIX? Set the environment variable SPARK_YARN_USER_ENV in my client (Eclipse) launch configuration, with the following value:

PYTHONPATH=/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip
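For anyone launching from a shell instead of Eclipse, the equivalent should be something like this (untested sketch, paths as above):

  # untested: export before launching the PySpark client; the YARN client
  # forwards the entries in SPARK_YARN_USER_ENV to the executor environment
  export SPARK_YARN_USER_ENV="PYTHONPATH=/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip"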


Is there a simpler way to do this? Am I doing something wrong?

> Windows Spark fails to work with Linux YARN
> -------------------------------------------
>
>                 Key: SPARK-1825
>                 URL: https://issues.apache.org/jira/browse/SPARK-1825
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Taeyun Kim
>             Fix For: 1.2.0
>
>         Attachments: SPARK-1825.patch
>
>
> Windows Spark fails to work with Linux YARN.
> This is a cross-platform problem.
> This error occurs when 'yarn-client' mode is used.
> (yarn-cluster/yarn-standalone mode was not tested.)
> On the YARN side, Hadoop 2.4.0 resolved the issue as follows:
> https://issues.apache.org/jira/browse/YARN-1824
> But the Spark YARN module does not incorporate the new YARN API yet, so the problem persists for Spark.
> First, the following source files should be changed:
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
> - /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
> The changes are as follows:
> - Replace .$() with .$$()
> - Replace File.pathSeparator with ApplicationConstants.CLASS_PATH_SEPARATOR when building the value for Environment.CLASSPATH.name (this requires importing org.apache.hadoop.yarn.api.ApplicationConstants); see the sketch after this list
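> A minimal sketch of the idea (the variable names are mine, not the actual ClientBase code; it assumes the Hadoop 2.4+ API introduced by YARN-1824):
>
>   import java.io.File
>   import org.apache.hadoop.yarn.api.ApplicationConstants
>   import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
>
>   // Before: both the expansion syntax and the separator come from the
>   // client JVM, so a Windows client emits %PWD% and ';' into the
>   // launch script of a Linux container.
>   val oldClasspath = Environment.PWD.$() + File.pathSeparator + "*"
>
>   // After: $$() and CLASS_PATH_SEPARATOR are placeholders that the
>   // node manager expands for its own OS at container-launch time.
>   val newClasspath = Environment.PWD.$$() +
>     ApplicationConstants.CLASS_PATH_SEPARATOR + "*"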
> Unless the above changes are applied, launch_container.sh will contain invalid shell script statements (since they will contain Windows-specific separators), and the job will fail.
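> For illustration only (a hypothetical fragment, not taken from a real run), such a script would contain lines like:
>
>   # invalid under bash: '%PWD%' is Windows expansion syntax, and ';'
>   # terminates the command instead of separating classpath entries
>   export CLASSPATH="%PWD%;%PWD%/*"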
> Also, the following symptoms should be fixed (I could not find the relevant source code):
> - The SPARK_HOME environment variable is copied straight into launch_container.sh. It should be converted to the path format of the server OS, or, better, a separate environment variable or configuration variable should be created.
> - The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh after the above change is applied. Maybe I missed a few lines.
> I'm not sure whether this is all, since I'm new to both Spark and YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org