Posted to user@spark.apache.org by 肥肥 <19...@qq.com> on 2014/04/23 12:05:26 UTC
SparkException: env SPARK_YARN_APP_JAR is not set
I have a small program that I can launch successfully through the YARN client in yarn-standalone mode.
The commands look like this:
(javac -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest.java)
(jar cvf loadtest.jar LoadTest.class)
SPARK_JAR=assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar ./bin/spark-class org.apache.spark.deploy.yarn.Client --jar /opt/mytest/loadtest.jar --class LoadTest --args yarn-standalone --num-workers 2 --master-memory 2g --worker-memory 2g --worker-cores 1
The program, LoadTest.java:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LoadTest {
    static final String USER = "root";

    public static void main(String[] args) {
        System.setProperty("user.name", USER);
        System.setProperty("HADOOP_USER_NAME", USER);
        System.setProperty("spark.executor.memory", "7g");
        // args[0] is the master URL; here it is "yarn-standalone", passed via --args
        JavaSparkContext sc = new JavaSparkContext(args[0], "LoadTest",
                System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(LoadTest.class));
        String file = "file:/opt/mytest/123.data";
        JavaRDD<String> data1 = sc.textFile(file, 2);
        long c1 = data1.count();
        System.out.println("1============" + c1);
    }
}
However, another program of mine needs to launch it directly with the "java" command, so I added the "environment" parameter to the JavaSparkContext() constructor. This is the error I get:
Exception in thread "main" org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:49)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:125)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:200)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:93)
at LoadTest.main(LoadTest.java:37)
The modified program, LoadTest.java:
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LoadTest {
    static final String USER = "root";

    public static void main(String[] args) {
        System.setProperty("user.name", USER);
        System.setProperty("HADOOP_USER_NAME", USER);
        System.setProperty("spark.executor.memory", "7g");

        // Settings passed as the "environment" argument of JavaSparkContext
        Map<String, String> env = new HashMap<String, String>();
        env.put("SPARK_YARN_APP_JAR", "file:/opt/mytest/loadtest.jar");
        env.put("SPARK_WORKER_INSTANCES", "2");
        env.put("SPARK_WORKER_CORES", "1");
        env.put("SPARK_WORKER_MEMORY", "2G");
        env.put("SPARK_MASTER_MEMORY", "2G");
        env.put("SPARK_YARN_APP_NAME", "LoadTest");
        env.put("SPARK_YARN_DIST_ARCHIVES", "file:/opt/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar");

        JavaSparkContext sc = new JavaSparkContext("yarn-client", "LoadTest",
                System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(LoadTest.class), env);
        String file = "file:/opt/mytest/123.dna";
        JavaRDD<String> data1 = sc.textFile(file, 2); //.cache();
        long c1 = data1.count();
        System.out.println("1============" + c1);
    }
}
The commands:
javac -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest.java
jar cvf loadtest.jar LoadTest.class
nohup java -classpath .:jars/spark-assembly-0.9.1-hadoop2.2.0.jar LoadTest >> loadTest.log 2>&1 &
What did I miss? Or am I doing this the wrong way?
Re: SparkException: env SPARK_YARN_APP_JAR is not set
Posted by phoenix bai <mi...@gmail.com>.
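According to the code, SPARK_YARN_APP_JAR is read from the system
environment, and the key-value pairs you pass to JavaSparkContext are
isolated from the system environment of the driver process.
So you could try setting it through System.setProperty(). If the code reads
it with System.getenv(), though, a JVM property will not be seen, and the
variable has to be set in the environment that launches java.
Thanks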
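The distinction above can be illustrated without Spark. This is a minimal sketch, not the Spark code itself: it assumes the failing lookup goes through System.getenv(), which the "env ... is not set" message suggests but which should be checked against the Spark 0.9.1 source. The class name EnvVsProperty and its helper methods are hypothetical names made up for the illustration. It shows that System.setProperty() does not change what System.getenv() returns, and that a launching program can put the variable into a child process's environment with ProcessBuilder.

```java
import java.util.Map;

// Hypothetical illustration class; none of these names come from Spark.
class EnvVsProperty {
    static final String KEY = "SPARK_YARN_APP_JAR";

    /** What a System.getenv-based lookup sees (the OS process environment). */
    static String fromEnvironment(String key) {
        return System.getenv(key);
    }

    /** What a System.getProperty-based lookup sees (JVM system properties). */
    static String fromProperty(String key) {
        return System.getProperty(key);
    }

    /** Prepares, but does not start, a child process whose environment carries KEY. */
    static ProcessBuilder launcherWithEnv(String jarPath, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> childEnv = pb.environment(); // copy of this process's environment
        childEnv.put(KEY, jarPath);                      // visible to the child via System.getenv
        pb.inheritIO();                                  // child shares our stdout/stderr
        return pb;
    }

    public static void main(String[] args) {
        System.setProperty(KEY, "file:/opt/mytest/loadtest.jar");
        System.out.println("property:    " + fromProperty(KEY));
        // The environment of the running JVM is not changed by setProperty.
        System.out.println("environment: " + fromEnvironment(KEY));

        ProcessBuilder pb = launcherWithEnv("file:/opt/mytest/loadtest.jar",
                "java", "-classpath", ".:jars/spark-assembly-0.9.1-hadoop2.2.0.jar", "LoadTest");
        System.out.println("child env:   " + pb.environment().get(KEY));
        // pb.start() would launch LoadTest with SPARK_YARN_APP_JAR set.
    }
}
```

If the launch happens from a shell rather than from another Java program, setting the variable in that shell before the java command should have the same effect as the ProcessBuilder approach sketched here.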