Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/08/06 08:44:04 UTC
[jira] [Resolved] (SPARK-9515) Creating JavaSparkContext with yarn-cluster mode throws NPE

     [ https://issues.apache.org/jira/browse/SPARK-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-9515.
------------------------------
    Resolution: Not A Problem

[~tenstriker] I gave a pretty clear reason why this is considered a problem in your app and not Spark. You're merely arguing that you don't want to use spark-submit for your own reasons, but this does not make it a Spark issue. Please don't reopen issues unless you argue that the question is different than the one already answered. I am going to close this.

> Creating JavaSparkContext with yarn-cluster mode throws NPE
> -----------------------------------------------------------
>
>                 Key: SPARK-9515
>                 URL: https://issues.apache.org/jira/browse/SPARK-9515
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.3.1
>            Reporter: nirav patel
>
> I have a Spark application that runs against a YARN cluster. I run the Spark application as part of my web application, so I can't use the spark-submit script. I run it with `java -cp myApp.jar com.myapp.Application`, which in turn initiates a JavaSparkContext. It used to work with Spark 1.0.2 and a standalone cluster, but now with 1.3.1 and YARN it's failing.
> Caused by: java.lang.NullPointerException
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
> 	at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
> 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
> 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
> EDIT:
> I got it working with yarn-client mode; however, I want to test it out with yarn-cluster mode as well.
> The application design is as follows: we create a singleton SparkContext object and preload a few RDDs in memory when our spring-boot application (Tomcat container) starts. That allows us to submit subsequent Spark jobs without the overhead of creating a new SparkContext and RDDs. It performs excellently for our SLA; we are serving real-time GLM in milliseconds with that. I hope this is reason enough why we can't use the spark-submit script to submit a job.
> The code is pretty simple. This is how we create the SparkContext:
> SparkConf conf = new SparkConf().setAppName(appName.toString()).setMaster("yarn-client");
> conf.set("spark.eventLog.enabled", "true");
> conf.set("spark.executor.extraClassPath", "/opt/mapr/hbase/hbase-0.98.12/lib/*");
> conf.set("spark.cores.max", sparkCoreMax);
> conf.set("spark.executor.memory", sparkExecMem);
> conf.set("spark.executor.extraJavaOptions", executorJavaOPts);
> conf.set("spark.akka.threads", sparkDriverThreads);
> JavaSparkContext sparkContext = new JavaSparkContext(conf);
> This is how we actually run the spring-boot app:
> java -Dloader.path=myspringbootapp.jar,/spark/spark-1.3.1/lib,/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn -XX:PermSize=512m -XX:MaxPermSize=512m -Xms1024m -jar myspringbootapp.jar
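
For reference, the resolution above points to spark-submit as the supported entry point for yarn-cluster mode, where the driver runs inside the YARN ApplicationMaster rather than in the caller's JVM. The following is only a minimal sketch of how a host application might assemble such a spark-submit command line. The class name SparkSubmitCommand, the build helper, and the "2g" memory value are hypothetical; the Spark home, main class, and jar name are taken from the report. Note that this would start a separate driver in the cluster, so it does not by itself give the long-lived in-process SparkContext described in the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a spark-submit argument list for yarn-cluster mode.
class SparkSubmitCommand {

    // sparkHome, mainClass, appJar and executorMemory are supplied by the caller;
    // nothing here is read from a real Spark installation.
    static List<String> build(String sparkHome, String mainClass,
                              String appJar, String executorMemory) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sparkHome + "/bin/spark-submit");
        cmd.add("--master");
        cmd.add("yarn-cluster");          // driver runs in the YARN ApplicationMaster
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--executor-memory");
        cmd.add(executorMemory);
        cmd.add(appJar);                  // application jar comes after the options
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("/spark/spark-1.3.1", "com.myapp.Application",
                                 "myApp.jar", "2g");
        // Print the command instead of running it; to launch for real, one could use
        // new ProcessBuilder(cmd).inheritIO().start();
        System.out.println(String.join(" ", cmd));
    }
}
```

Printing the command rather than executing it keeps the sketch runnable on a machine without Spark installed; an application that keeps its SparkContext in-process would stay on yarn-client mode, as the report's EDIT describes.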


