Posted to issues@spark.apache.org by "Attila Zsolt Piros (Jira)" <ji...@apache.org> on 2021/03/23 08:42:00 UTC

[jira] [Commented] (SPARK-34684) Hadoop config could not be successfully serialized from driver pods to executor pods

    [ https://issues.apache.org/jira/browse/SPARK-34684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306895#comment-17306895 ] 

Attila Zsolt Piros commented on SPARK-34684:
--------------------------------------------

> the SparkPi example job keeps failing because the executors do not know how to talk to HDFS

In your example the executors do not need to be able to access HDFS, but the driver does.

Have you tried specifying the HDFS URL with a hostname, i.e. changing hdfs:///tmp/spark-examples_2.12-3.0.125067.jar to hdfs://<hostname>/tmp/spark-examples_2.12-3.0.125067.jar?
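
For example (a sketch only: <namenode-host>:8020 stands in for your actual NameNode host and port, which I do not know from this report):

  /opt/spark-3.0/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --deploy-mode cluster \
    --master k8s://https://10.***.18.96:6443 \
    ...same --conf options as in the report... \
    hdfs://<namenode-host>:8020/tmp/spark-examples_2.12-3.0.125067.jar 1000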

Have you tried creating a pod from a simple Linux image with the Hadoop client tools and accessing HDFS from the command line?
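
A minimal sketch of such a check (the apache/hadoop:3 image is an assumption; any image shipping the hdfs CLI works, and the pod still needs a usable Hadoop client config, e.g. the mounted ConfigMap):

  # start a throwaway pod in the same namespace as the Spark pods
  kubectl run hdfs-client --rm -it --namespace test --image=apache/hadoop:3 -- bash

  # inside the pod: try the listing both with and without an explicit host
  hdfs dfs -ls hdfs://<namenode-host>:8020/tmp/
  hdfs dfs -ls hdfs:///tmp/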



> Hadoop config could not be successfully serialized from driver pods to executor pods
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-34684
>                 URL: https://issues.apache.org/jira/browse/SPARK-34684
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1, 3.0.2
>            Reporter: Yue Peng
>            Priority: Major
>
> I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop configs are stored in a ConfigMap and mounted to the driver. However, the SparkPi example job keeps failing because the executors do not know how to talk to HDFS. I highly suspect a bug is causing this: manually creating a ConfigMap storing the Hadoop configs and mounting it to the executor in the template file fixes the error (a sketch of this workaround follows the quoted log below).
>  
> Spark submit command:
> /opt/spark-3.0/bin/spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master k8s://https://10.***.18.96:6443 --num-executors 1 --conf spark.kubernetes.namespace=test --conf spark.kubernetes.container.image=**** --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template  --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000
>  
>  
> Error log:
>  
> 21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
> 21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
> 21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
> 21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
> 21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
> 21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
> 21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
> 21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
> 21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
> 21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
> 21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
> 21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
> 21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
> 21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
> 21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
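
For reference, a sketch of the manual workaround mentioned in the description, assuming HADOOP_CONF_DIR is set on the machine running kubectl; the ConfigMap name hadoop-conf and the mount path /etc/hadoop/conf are illustrative, not taken from the report:

  # create a ConfigMap from the local Hadoop client configuration
  kubectl create configmap hadoop-conf --namespace test --from-file="$HADOOP_CONF_DIR"

  # then, in spark-executor.template, mount it on the executor container:
  #   - add a configMap volume referencing hadoop-conf
  #   - add a volumeMount with mountPath /etc/hadoop/conf
  #   - set the environment variable HADOOP_CONF_DIR=/etc/hadoop/conf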


