Posted to issues@spark.apache.org by "Sebastian YEPES FERNANDEZ (JIRA)" <ji...@apache.org> on 2015/12/09 11:06:10 UTC

[jira] [Created] (SPARK-12239) SparkR - Not distributing SparkR module in YARN

Sebastian YEPES FERNANDEZ created SPARK-12239:
-------------------------------------------------

             Summary: SparkR - Not distributing SparkR module in YARN
                 Key: SPARK-12239
                 URL: https://issues.apache.org/jira/browse/SPARK-12239
             Project: Spark
          Issue Type: Bug
          Components: SparkR, YARN
    Affects Versions: 1.5.2, 1.5.3
            Reporter: Sebastian YEPES FERNANDEZ
            Priority: Critical


Hello,

I am trying to use SparkR in a YARN environment and have run into the following problem:

Everything works correctly when using bin/sparkR, but if I run the same jobs by loading SparkR directly from an R session, it does not work.

I have managed to track down the cause: when SparkR is launched through R, the "SparkR" module is not distributed to the worker nodes.
I tried to work around this with the "spark.yarn.dist.archives" setting, but it does not help: the archive is deployed (and the extracted folder named) with its ".zip" extension, while the workers look for a folder named "sparkr".


Is there currently any way to make this work?



{code}
# spark-defaults.conf
spark.yarn.dist.archives                     /opt/apps/spark/R/lib/sparkr.zip

# R
library(SparkR, lib.loc="/opt/apps/spark/R/lib/")
sc <- sparkR.init(appName="SparkR", master="yarn-client", sparkEnvir=list(spark.executor.instances="1"))
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)

15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
{code}
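
A possible workaround (untested in this environment) might be YARN's {{#alias}} fragment syntax, which Spark on YARN accepts for distributed files and archives and which sets the name of the symlink created in the container directory. Assuming "spark.yarn.dist.archives" honors the fragment, aliasing the archive to "sparkr" should produce the folder name the workers expect:

{code}
# spark-defaults.conf
# The part after '#' names the symlink in the container working directory,
# so the archive should appear as "sparkr" instead of "sparkr.zip"
spark.yarn.dist.archives                     /opt/apps/spark/R/lib/sparkr.zip#sparkr
{code}

This only addresses the folder-name mismatch; it does not explain why the SparkR module is not distributed automatically when launching from plain R.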


Container stderr:
{code}
15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 530.0 MB)
15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file '/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002/sparkr/SparkR/worker/daemon.R': No such file or directory
15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
{code}


Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002
total 71M
drwx--x--- 3 yarn hadoop 4.0K Dec  9 09:04 .
drwx--x--- 7 yarn hadoop 4.0K Dec  9 09:04 ..
-rw-r--r-- 1 yarn hadoop  110 Dec  9 09:03 container_tokens
-rw-r--r-- 1 yarn hadoop   12 Dec  9 09:03 .container_tokens.crc
-rwx------ 1 yarn hadoop  736 Dec  9 09:03 default_container_executor_session.sh
-rw-r--r-- 1 yarn hadoop   16 Dec  9 09:03 .default_container_executor_session.sh.crc
-rwx------ 1 yarn hadoop  790 Dec  9 09:03 default_container_executor.sh
-rw-r--r-- 1 yarn hadoop   16 Dec  9 09:03 .default_container_executor.sh.crc
-rwxr-xr-x 1 yarn hadoop  61K Dec  9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
-rwxr-xr-x 1 yarn hadoop 317K Dec  9 09:04 kafka-clients-0.8.2.2.jar
-rwx------ 1 yarn hadoop 6.0K Dec  9 09:03 launch_container.sh
-rw-r--r-- 1 yarn hadoop   56 Dec  9 09:03 .launch_container.sh.crc
-rwxr-xr-x 1 yarn hadoop 2.2M Dec  9 09:04 spark-cassandra-connector_2.10-1.5.0-M3.jar
-rwxr-xr-x 1 yarn hadoop 7.1M Dec  9 09:04 spark-csv-assembly-1.3.0.jar
lrwxrwxrwx 1 yarn hadoop  119 Dec  9 09:03 __spark__.jar -> /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
lrwxrwxrwx 1 yarn hadoop   84 Dec  9 09:03 sparkr.zip -> /hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
-rwxr-xr-x 1 yarn hadoop 1.8M Dec  9 09:04 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  11M Dec  9 09:04 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  48M Dec  9 09:04 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
drwx--x--- 2 yarn hadoop   46 Dec  9 09:04 tmp
{code}


*Working case:*
{code}
# sparkR --master yarn-client --num-executors 1

df <- createDataFrame(sqlContext, faithful)
head(df)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
{code}

Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1170/container_e44_1445706872927_1170_01_000002/
total 71M
drwx--x--- 3 yarn hadoop 4.0K Dec  9 09:14 .
drwx--x--- 6 yarn hadoop 4.0K Dec  9 09:14 ..
-rw-r--r-- 1 yarn hadoop  110 Dec  9 09:14 container_tokens
-rw-r--r-- 1 yarn hadoop   12 Dec  9 09:14 .container_tokens.crc
-rwx------ 1 yarn hadoop  736 Dec  9 09:14 default_container_executor_session.sh
-rw-r--r-- 1 yarn hadoop   16 Dec  9 09:14 .default_container_executor_session.sh.crc
-rwx------ 1 yarn hadoop  790 Dec  9 09:14 default_container_executor.sh
-rw-r--r-- 1 yarn hadoop   16 Dec  9 09:14 .default_container_executor.sh.crc
-rwxr-xr-x 1 yarn hadoop  61K Dec  9 09:14 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
-rwxr-xr-x 1 yarn hadoop 317K Dec  9 09:14 kafka-clients-0.8.2.2.jar
-rwx------ 1 yarn hadoop 6.3K Dec  9 09:14 launch_container.sh
-rw-r--r-- 1 yarn hadoop   60 Dec  9 09:14 .launch_container.sh.crc
-rwxr-xr-x 1 yarn hadoop 2.2M Dec  9 09:14 spark-cassandra-connector_2.10-1.5.0-M3.jar
-rwxr-xr-x 1 yarn hadoop 7.1M Dec  9 09:14 spark-csv-assembly-1.3.0.jar
lrwxrwxrwx 1 yarn hadoop  119 Dec  9 09:14 __spark__.jar -> /hadoop/hdfs/disk05/hadoop/yarn/local/usercache/spark/filecache/368/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
lrwxrwxrwx 1 yarn hadoop   84 Dec  9 09:14 sparkr -> /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/filecache/367/sparkr.zip
-rwxr-xr-x 1 yarn hadoop 1.8M Dec  9 09:14 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  11M Dec  9 09:14 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  48M Dec  9 09:14 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
drwx--x--- 2 yarn hadoop   46 Dec  9 09:14 tmp
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
