Posted to user@spark.apache.org by Andreas Fritzler <an...@gmail.com> on 2015/08/17 09:34:17 UTC

Programmatically create SparkContext on YARN

Hi all,

when running the Spark cluster in standalone mode, I am able to create the
Spark context from Java via the following code snippet:

SparkConf conf = new SparkConf()
    .setAppName("MySparkApp")
    .setMaster("spark://SPARK_MASTER:7077")
    .setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);


As soon as I'm done with my processing, I can just close it via

sc.stop();

Now my question: Is the same also possible when running Spark on YARN? I
currently don't see how this should be possible without submitting your
application as a packaged jar file. Is there a way to get this kind of
interactivity from within your Scala/Java code?

Regards,
Andreas

Re: Programmatically create SparkContext on YARN

Posted by Andreas Fritzler <an...@gmail.com>.
Hi Andrew,

Thanks a lot for your response. I am aware of the '--master' flag in the
spark-submit command. However, I would like to create the SparkContext
from within my own code.

Maybe I should elaborate a little further: I would like to use the result
of a Spark computation directly inside my own code.

Here is the SparkPi example:

String[] jars = new String[1];
jars[0] = System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar";

SparkConf conf = new SparkConf()
    .setAppName("JavaSparkPi")
    .setMaster("spark://SPARK_HOST:7077")
    .setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);

int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
int n = 1000000 * slices;
List<Integer> l = new ArrayList<Integer>(n);
for (int i = 0; i < n; i++) {
  l.add(i);
}

JavaRDD<Integer> dataSet = sc.parallelize(l, slices);

int count = dataSet.map(new Function<Integer, Integer>() {
  @Override
  public Integer call(Integer integer) {
    double x = Math.random() * 2 - 1;
    double y = Math.random() * 2 - 1;
    return (x * x + y * y < 1) ? 1 : 0;
  }
}).reduce(new Function2<Integer, Integer, Integer>() {
  @Override
  public Integer call(Integer integer, Integer integer2) {
    return integer + integer2;
  }
});

System.out.println("Pi is roughly " + 4.0 * count / n);

sc.stop();

As you can see, I can use the result (count) directly in my code.

So my goal would be to reuse this kind of implementation in YARN mode
(client or cluster mode). However, I haven't found a way to do that, since
I always have to submit my Spark code via spark-submit.

What if I want to run this code as part of a web application that renders
the result as a web page?
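
Concretely, what I am hoping for is something along the lines of the sketch
below. It is untested and only meant to illustrate the question: I am assuming
that HADOOP_CONF_DIR (or YARN_CONF_DIR) points at the cluster configuration,
that the Spark YARN classes are on my application's classpath, and that
"yarn-client" is accepted as a master value when set programmatically.

// Untested sketch: same pattern as above, but against YARN instead of a
// standalone master. Assumes HADOOP_CONF_DIR/YARN_CONF_DIR are set so that
// Spark can locate the ResourceManager, and that the Spark YARN jars are
// visible to this JVM.
SparkConf conf = new SparkConf()
    .setAppName("MySparkApp")
    .setMaster("yarn-client")   // instead of spark://SPARK_MASTER:7077
    .setJars(jars);             // ship my application jar(s) to the executors
JavaSparkContext sc = new JavaSparkContext(conf);

// ... run jobs and use the results directly, e.g. inside the web application ...

sc.stop();

That is roughly the degree of interactivity I am after.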

-- Andreas

Re: Programmatically create SparkContext on YARN

Posted by Andrew Or <an...@databricks.com>.
Hi Andreas,

I believe the distinction is not between standalone and YARN mode, but
between client and cluster mode.

In client mode, your spark-submit JVM runs your driver code. In cluster
mode, one of the workers (or NodeManagers, if you're using YARN) in the
cluster runs your driver code. In the latter case it doesn't really make
sense to call `setMaster` in your driver: by the time your driver code runs,
the application has already been submitted, so Spark needs to know which
cluster you're submitting the application to before your code ever executes.

Instead, the recommended way is to set the master through the `--master`
flag in the command line, e.g.

$ bin/spark-submit \
    --master spark://1.2.3.4:7077 \
    --class some.user.Clazz \
    --name "My app name" \
    --jars lib1.jar,lib2.jar \
    --deploy-mode cluster \
    app.jar

Both YARN and standalone modes support client and cluster modes, and the
spark-submit script is the common interface through which you can launch
your application. In other words, you shouldn't have to do anything more
than providing a different value to `--master` to use YARN.
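
If you need to trigger that submission from Java code rather than from a
shell (for example from a web application), you may also want to look at the
SparkLauncher class in the org.apache.spark.launcher package, added in Spark
1.4 as a programmatic wrapper around spark-submit. Here is a rough, untested
sketch; the Spark home, jar paths and class name are placeholders you would
replace with your own:

import org.apache.spark.launcher.SparkLauncher;

public class SubmitToYarn {
  public static void main(String[] args) throws Exception {
    // Placeholder paths and class names -- adjust to your build.
    Process spark = new SparkLauncher()
        .setSparkHome("/opt/spark")              // local Spark installation
        .setAppResource("/path/to/app.jar")      // your packaged application jar
        .setMainClass("some.user.Clazz")
        .setMaster("yarn-cluster")               // or "yarn-client"
        .setAppName("My app name")
        .addJar("/path/to/lib1.jar")
        .launch();                               // spawns spark-submit as a child process

    // launch() returns a plain java.lang.Process; wait for spark-submit to exit.
    int exitCode = spark.waitFor();
    System.out.println("spark-submit exited with code " + exitCode);
  }
}

Keep in mind that this still goes through spark-submit, so in cluster mode the
driver (and therefore your results) lives on the cluster, not in your web
application's JVM; for the "render the result in a web page" use case, client
mode or a long-running job server is the direction to look at.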

-Andrew
