Posted to user@spark.apache.org by _soumya_ <so...@gmail.com> on 2014/07/19 00:44:22 UTC

Running Spark/YARN on AWS EMR - Issues finding file on hdfs?

I'm stumped with this one. I'm using YARN on EMR to distribute my Spark job.
While the job initially seems to start up fine, the Spark executor nodes are
having trouble pulling the jars from the location on HDFS where the master
just put the files.

[hadoop@ip-172-16-2-167 ~]$ SPARK_JAR=./spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar \
    ./spark/bin/spark-class org.apache.spark.deploy.yarn.Client \
    --jar /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar \
    --class com.evocalize.rickshaw.spark.applications.GenerateAssetContent \
    --args yarn-standalone \
    --num-workers 3 --master-memory 2g --worker-memory 2g --worker-cores 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hadoop/.versions/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: This client is deprecated and will be removed in a future version
of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--num-workers is deprecated. Use --num-executors instead.
--master-memory is deprecated. Use --driver-memory instead.
--worker-memory is deprecated. Use --executor-memory instead.
--worker-cores is deprecated. Use --executor-cores instead.
14/07/18 22:27:50 INFO client.RMProxy: Connecting to ResourceManager at
/172.16.2.167:9022
14/07/18 22:27:51 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 2
14/07/18 22:27:51 INFO yarn.Client: Queue info ... queueName: default,
queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
      queueApplicationCount = 0, queueChildQueueCount = 0
14/07/18 22:27:51 INFO yarn.Client: Max mem capabililty of a single resource
in this cluster 3072
14/07/18 22:27:51 INFO yarn.Client: Preparing Local resources
14/07/18 22:27:53 INFO yarn.Client: Uploading
file:/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar to
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/rickshaw-spark-0.0.1-SNAPSHOT.jar
14/07/18 22:27:57 INFO yarn.Client: Uploading
file:/home/hadoop/spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar to
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar
14/07/18 22:27:59 INFO yarn.Client: Setting up the launch environment
14/07/18 22:27:59 INFO yarn.Client: Setting up container launch context
14/07/18 22:27:59 INFO yarn.Client: Command for starting the Spark
ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m,
-Djava.io.tmpdir=$PWD/tmp, 
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ApplicationMaster, --class,
com.evocalize.rickshaw.spark.applications.GenerateAssetContent, --jar ,
/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar,  --args 
'yarn-standalone' , --executor-memory, 2048, --executor-cores, 1,
--num-executors , 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/07/18 22:27:59 INFO yarn.Client: Submitting application to ASM
14/07/18 22:27:59 INFO impl.YarnClientImpl: Submitted application
application_1405713259773_0014
...
14/07/18 22:28:23 INFO yarn.Client: Application report from ASM:
	 application identifier: application_1405713259773_0014
	 appId: 14
	 clientToAMToken: null
	 appDiagnostics: Application application_1405713259773_0014 failed 2 times
due to AM Container for appattempt_1405713259773_0014_000002 exited with 
exitCode: -1000 due to: File does not exist:
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
	 appMasterHost: N/A
	 appQueue: default
	 appMasterRpcPort: -1
	 appStartTime: 1405722479547
	 yarnAppState: FAILED
	 distributedFinalState: FAILED
	 appTrackingUrl:
ip-172-16-2-167.us-west-1.compute.internal:9026/cluster/app/application_1405713259773_0014
	 appUser: hadoop

-----
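
Side note: the client itself warns that it is deprecated in favor of
spark-submit. A roughly equivalent invocation would look like the sketch
below - untested on this AMI, and assuming "yarn-cluster" is the right
master value for the old yarn-standalone mode; with spark-submit the master
comes from --master, so the 'yarn-standalone' application argument should no
longer be needed:

./spark/bin/spark-submit \
  --master yarn-cluster \
  --class com.evocalize.rickshaw.spark.applications.GenerateAssetContent \
  --num-executors 3 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar

If I remember correctly, SPARK_JAR can also be pointed at an hdfs:// copy of
the assembly so it doesn't have to be re-uploaded for every application, but
I haven't verified that on EMR.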

I tried to ls the file at that location - it doesn't exist either, although
Spark could have cleaned it up before exiting. I verified that on EMR all
ports are open between the nodes, so this can't be a port issue. What am I
missing?
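
For reference, this is roughly what I ran to check the staging directory,
using the NameNode address and application ID from the log above (note that
.sparkStaging/<appId> is normally cleaned up once the application finishes,
so it has to be checked while the attempt is still running):

hdfs dfs -ls hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/
hdfs dfs -ls hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/

In my case the assembly jar was not there, though as noted above Spark may
simply have removed it on exit.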




RE: Running Spark/YARN on AWS EMR - Issues finding file on hdfs?

Posted by jaredtims <ja...@yahoo.com>.
Any resolution to this? I'm having the same problem.





RE: Running Spark/YARN on AWS EMR - Issues finding file on hdfs?

Posted by neeraj <ne...@infosys.com>.
I'm trying to find a workaround for this issue.

Thanks and Regards,
Neeraj Garg

From: H4ml3t [via Apache Spark User List] [mailto:ml-node+s1001560n16379h10@n3.nabble.com]
Sent: Tuesday, October 14, 2014 6:53 PM
To: Neeraj Garg02
Subject: Re: Running Spark/YARN on AWS EMR - Issues finding file on hdfs?

I'm having the same problem; it seems we are the only two in the world. Did you manage to resolve it?




