Posted to user@spark.apache.org by Florian Kaspar <fl...@onelogic.de> on 2015/10/14 19:01:54 UTC

Programmatically connect to remote YARN in yarn-client mode

Hey everyone,

We are working on a project running on Spark. Currently we connect to a 
remote Spark cluster in standalone mode and obtain the SparkContext using

new JavaSparkContext(new SparkConf().setAppName("<AppName>").setMaster("spark://<remoteClusterAddress>:7077"));

We are now trying to connect to a remote (!) YARN cluster instead, and this 
should also happen programmatically. We use the Spark context for the 
whole lifetime of a web application.
spark-submit is not really an option for us because we want a local Spark 
driver that connects to a remote cluster and interacts with other locally 
deployed modules.
Can anyone tell me how to programmatically create a Spark context that 
connects to a remote YARN cluster?
The tutorials we found online seem to assume that the application runs 
inside the cluster rather than remotely.
Thank you in advance for your support!

Kind regards
Florian

-- 
Florian Kaspar

ONE LOGIC

Dr. Hans-Kapfinger-Str. 3, DE 94032 Passau
T +49 851 22590 25
florian.kaspar@onelogic.de
www.onelogic.de


ONE LOGIC GmbH, HRB 7780 Amtsgericht Passau
Geschäftsführung Andreas Böhm, Prof. Dr. Andreas Pfeifer

Re: Programmatically connect to remote YARN in yarn-client mode

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Wed, Oct 14, 2015 at 10:29 AM, Florian Kaspar <florian.kaspar@onelogic.de> wrote:

> So is it possible to simply copy the YARN configuration from the remote
> cluster to the local machine (assuming the local machine can resolve the
> YARN host etc.) and just let Spark do the rest?
>

Yes, that should be all.
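
[Editorial sketch, not part of the original reply.] The condition mentioned in the question, that the local machine can resolve the YARN host, can be checked before the Spark context is created. The following is a minimal sketch that assumes the copied configuration contains a yarn-site.xml in the usual Hadoop <property><name>/<value> layout and that the cluster sets yarn.resourcemanager.hostname; some clusters set only yarn.resourcemanager.address or use HA-specific keys, in which case the key would have to be adjusted. The class and method names are hypothetical.

```java
import java.io.File;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class YarnHostCheck {

    /** Returns the value of the first <property> whose <name> equals key, if any. */
    public static Optional<String> readProperty(File siteXml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip malformed entries
            }
            if (key.equals(names.item(0).getTextContent().trim())) {
                return Optional.of(values.item(0).getTextContent().trim());
            }
        }
        return Optional.empty();
    }

    /** True if the local resolver can map the host name to an address. */
    public static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0] is the path of the yarn-site.xml copied from the cluster.
        File siteXml = new File(args[0]);
        Optional<String> host = readProperty(siteXml, "yarn.resourcemanager.hostname");
        if (host.isPresent()) {
            System.out.println(host.get() + " resolvable: " + canResolve(host.get()));
        } else {
            System.out.println("yarn.resourcemanager.hostname is not set in " + siteXml);
        }
    }
}
```

A successful lookup only shows that the name resolves. It does not prove that the ResourceManager ports are reachable from the driver machine.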

-- 
Marcelo

Re: Programmatically connect to remote YARN in yarn-client mode

Posted by Florian Kaspar <fl...@onelogic.de>.

Thank you, Marcelo,

So is it possible to simply copy the YARN configuration from the remote 
cluster to the local machine (assuming the local machine can resolve 
the YARN host etc.) and just let Spark do the rest?
This would actually be great!
Our "local" machine is just another virtual machine running in the same 
environment, connected to the cluster via a virtual network.

Cheers
Florian

On 14.10.2015 at 19:13, Marcelo Vanzin wrote:
> On Wed, Oct 14, 2015 at 10:01 AM, Florian Kaspar
> <fl...@onelogic.de> wrote:
>> we are working on a project running on Spark. Currently we connect to a remote Spark-Cluster in Standalone mode to obtain the SparkContext using
>>
>> new JavaSparkContext(new SparkConf().setAppName("<AppName>").setMaster("spark://<remoteClusterAddress>:7077"));
>> Can anyone tell me how to create a Spark context programmatically connecting to a remote YARN cluster?
> You should be able to replace the standalone URL with "yarn-client",
> and it should work, assuming you have the HADOOP_CONF_DIR (or
> YARN_CONF_DIR) env variable pointing at a valid YARN configuration.
>
> Note that if the machine running this code is far from the cluster
> performance might not be that great.
>


Re: Programmatically connect to remote YARN in yarn-client mode

Posted by Marcelo Vanzin <va...@cloudera.com>.

On Wed, Oct 14, 2015 at 10:01 AM, Florian Kaspar
<fl...@onelogic.de> wrote:
> we are working on a project running on Spark. Currently we connect to a remote Spark-Cluster in Standalone mode to obtain the SparkContext using
>
> new JavaSparkContext(new SparkConf().setAppName("<AppName>").setMaster("spark://<remoteClusterAddress>:7077"));

> Can anyone tell me how to create a Spark context programmatically connecting to a remote YARN cluster?

You should be able to replace the standalone URL with "yarn-client",
and it should work, assuming you have the HADOOP_CONF_DIR (or
YARN_CONF_DIR) env variable pointing at a valid YARN configuration.

Note that if the machine running this code is far from the cluster,
performance might not be that great.
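
[Editorial sketch, not part of the original reply.] The advice above amounts to two things: HADOOP_CONF_DIR or YARN_CONF_DIR must point at a usable YARN configuration in the environment of the driver JVM, and the master URL changes from the standalone URL to "yarn-client". The following is a minimal sketch of a preflight check for the first part, using only the JDK. It assumes that "valid" at least means the directory contains a yarn-site.xml; the class and method names are hypothetical, and the Spark call is left as a comment because it needs the Spark and YARN jars on the classpath.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

public class YarnConfPreflight {

    /**
     * Returns the first directory named by HADOOP_CONF_DIR or YARN_CONF_DIR
     * that contains a yarn-site.xml, or empty if neither qualifies.
     */
    public static Optional<Path> findYarnConfDir(Map<String, String> env) {
        for (String name : new String[] {"HADOOP_CONF_DIR", "YARN_CONF_DIR"}) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                continue; // variable not set
            }
            Path dir = Paths.get(value);
            if (Files.isRegularFile(dir.resolve("yarn-site.xml"))) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> confDir = findYarnConfDir(System.getenv());
        if (!confDir.isPresent()) {
            System.err.println("No yarn-site.xml found via HADOOP_CONF_DIR or YARN_CONF_DIR");
            return;
        }
        System.out.println("Using YARN configuration in " + confDir.get());
        // With the Spark and YARN dependencies available, the context from the
        // original question would then be created with the new master URL:
        // new JavaSparkContext(new SparkConf().setAppName("<AppName>").setMaster("yarn-client"));
    }
}
```

Because the variables are read from the process environment, they would have to be set for the web application's JVM (for example in the service's start script) rather than only in an interactive shell.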

-- 
Marcelo
