Posted to user@spark.apache.org by Philip Ogren <ph...@oracle.com> on 2014/01/30 20:21:03 UTC
various questions about yarn-standalone vs. yarn-client
I have a few questions about the yarn-standalone and yarn-client
deployment modes described on the Launching Spark on YARN
<http://spark.incubator.apache.org/docs/latest/running-on-yarn.html> page.
1) Can someone give me a basic conceptual overview? I am struggling
to understand the difference between the yarn-standalone and
yarn-client deployment modes. I understand that yarn-standalone runs on
the name node and that yarn-client can be run from a remote machine,
but otherwise I don't understand how they differ. yarn-client seems
like the obviously better approach because it can run from anywhere,
but presumably there is some advantage to yarn-standalone (otherwise,
why not just run yarn-client on the name node or from a remote
machine?). I'm also curious to know what "standalone" refers to here.
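To make my first question concrete, here are the two invocation styles I am comparing, as I read them off that docs page (the jar paths are placeholders for my local build, and I may well be misreading something):

```shell
# yarn-standalone: submit the app through the YARN Client class;
# the assembly jar is pointed at via SPARK_JAR.
SPARK_JAR=assembly/spark-assembly.jar ./spark-class \
  org.apache.spark.deploy.yarn.Client \
  --jar examples/spark-examples-assembly.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --worker-memory 2g

# yarn-client: set MASTER=yarn-client plus SPARK_YARN_APP_JAR and
# launch the app directly from the client machine.
SPARK_JAR=assembly/spark-assembly.jar \
SPARK_YARN_APP_JAR=examples/spark-examples-assembly.jar \
MASTER=yarn-client ./run-example org.apache.spark.examples.SparkPi yarn-client
```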
2) I was able to run the SparkPi in yarn-client mode from a simple scala
main method by providing only SPARK_JAR and SPARK_YARN_APP_JAR
environment variables and by putting the various *-site.xml files on my
classpath. That is, I didn't call run-example; I just called my Scala
app directly. We've had trouble duplicating this success with our own
app and are in the process of applying the patch detailed here:
https://github.com/apache/incubator-spark/pull/371
However, one thing I think I learned is that Spark doesn't have to
be installed on the name node. Is that correct? Do I need to have
Spark installed at all, either on my remote machine or on the name node?
It would be great if all that were needed were the SPARK_JAR and the
SPARK_YARN_APP_JAR.
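In case it helps anyone reproduce this, here is roughly the pre-flight check I run before launching our app in yarn-client mode. SPARK_JAR and SPARK_YARN_APP_JAR are the real variable names; the function name, messages, and the commented-out launch line are just my own scaffolding:

```shell
#!/usr/bin/env bash
# Sanity-check the environment described above before launching
# a Scala app in yarn-client mode.
check_yarn_env() {
  local missing=0
  for v in SPARK_JAR SPARK_YARN_APP_JAR; do
    # ${!v} is bash indirect expansion: the value of the variable named $v.
    if [ -z "${!v:-}" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}

if check_yarn_env; then
  echo "env ok"
  # scala -cp "$CLASSPATH" com.example.MyApp  # hypothetical launch line;
  # the directory holding the Hadoop *-site.xml files must be on CLASSPATH
else
  echo "set the variables above before launching"
fi
```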
3) Finally, is it possible to pre-stage the assembly jar files so they
don't need to be copied over every time I start a new Spark job in
yarn-client mode? Any advice here is appreciated.
Thanks!
Philip