Posted to user@spark.apache.org by Yeikel Santana <em...@yeikel.com> on 2018/04/01 06:45:11 UTC
What is the purpose of CoarseGrainedScheduler and how can I disable it?
Hi,
This is probably not a Spark issue, but rather a configuration that I am
missing. Any help would be appreciated.
I am running Spark from a Docker Compose template with the following configuration:
version: '2'
services:
  master:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7006
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
  worker:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 7016
      - 8881
    ports:
      - 8081:8081
And I have the following simple Java program:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

SparkConf conf = new SparkConf()
        .setMaster("spark://localhost:7077")
        .setAppName("Word Count Sample App");
conf.set("spark.dynamicAllocation.enabled", "false");

String file = "test.txt";
JavaSparkContext sc = new JavaSparkContext(conf);

// Split each line on spaces/commas, pair each word with 1, and sum the counts.
JavaRDD<String> textFile = sc.textFile("src/main/resources/" + file);
JavaPairRDD<String, Integer> counts = textFile
        .flatMap(s -> Arrays.asList(s.split("[ ,]")).iterator())
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey((a, b) -> a + b);

counts.foreach(p -> System.out.println(p));
System.out.println("Total words: " + counts.count());
counts.saveAsTextFile(file + "out.txt");
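(As a sanity check, my understanding is that the same pipeline would run
entirely inside the driver JVM in local mode, with no separate executor
process, and therefore no CoarseGrainedExecutorBackend, being launched:

// Sketch of a local-mode sanity check. As I understand it, in local mode the
// executors live in the driver JVM, so no callback URL to the driver is needed.
SparkConf localConf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("Word Count Sample App (local)");
try (JavaSparkContext localSc = new JavaSparkContext(localConf)) {
    // Same input file as above; count() forces the job to actually run.
    System.out.println(localSc.textFile("src/main/resources/test.txt").count());
}

But I need this to run against the cluster, not locally.)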
The problem that I am having is that at runtime, the worker launches the
executor with the following command:
Spark Executor Command: "/usr/jdk1.8.0_131/bin/java" "-cp"
  "/conf:/usr/spark-2.3.0/jars/*:/usr/hadoop-2.8.3/etc/hadoop/:/usr/hadoop-2.8.3/etc/hadoop/*:/usr/hadoop-2.8.3/share/hadoop/common/lib/*:/usr/hadoop-2.8.3/share/hadoop/common/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/*:/usr/hadoop-2.8.3/share/hadoop/tools/lib/*"
  "-Xmx1024M" "-Dspark.driver.port=59906"
  "org.apache.spark.executor.CoarseGrainedExecutorBackend"
  "--driver-url" "spark://CoarseGrainedScheduler@yeikel-pc:59906"
  "--executor-id" "6" "--hostname" "172.19.0.3" "--cores" "2"
  "--app-id" "app-20180401005243-0000"
  "--worker-url" "spark://Worker@172.19.0.3:8881"
This results in:
Caused by: java.io.IOException: Failed to connect to yeikel-pc:59906
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: yeikel-pc
Can I override the "--driver-url" from Java? Or how can I disable the
CoarseGrainedScheduler?
I tried setting spark.dynamicAllocation.enabled to false, but that did not
work.
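For the first question, my understanding is that the host name in
"--driver-url" comes from the driver side, so I was considering pinning it
explicitly along these lines. This is only a sketch: spark.driver.host and
spark.driver.port are my assumption about what feeds that URL, and
192.168.1.10 is just a placeholder for an address the containers can actually
resolve:

SparkConf conf = new SparkConf()
        .setMaster("spark://localhost:7077")
        .setAppName("Word Count Sample App")
        // Assumption: this is the address advertised to executors, i.e. what
        // would appear in --driver-url instead of the unresolvable "yeikel-pc".
        .set("spark.driver.host", "192.168.1.10") // placeholder: host address reachable from the containers
        // Pinning the port would also let me publish it in docker-compose.
        .set("spark.driver.port", "55000");       // placeholder port

Would that be the supported way to do it, or is there a better option?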