Posted to issues@spark.apache.org by "Edwin Biemond (JIRA)" <ji...@apache.org> on 2019/06/03 10:55:00 UTC

[jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubernetes

    [ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854470#comment-16854470 ] 

Edwin Biemond commented on SPARK-27927:
---------------------------------------

The 2.4.3 output (driver process list, stdout, and stderr):
{noformat}

root 1 0 0 09:03 ? 00:00:00 /usr/local/bin/tini -s -- /opt/run.sh /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.38.18 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner oci://code-assets@paasdevsss/pyspark_min.py

root 17 1 0 09:03 ? 00:00:00 /bin/bash /opt/run.sh /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.38.18 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner oci://code-assets@paasdevsss/pyspark_min.py

root 20 17 2 09:03 ? 00:00:30 /usr/local/sparta-server-jre/jdk1.8.0_162/bin/java -cp local:///livy/jars/kryo-2.22.jar:/opt/spark/conf/:/opt/spark/jars/*:/etc/hadoop/conf/ -Xmx15G -Dlog4j.configuration=file:///etc/spark/conf/log4j.properties org.apache.spark.deploy.SparkSubmit --deploy-mode client --conf spark.driver.bindAddress=10.244.38.18 --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner oci://code-assets@paasdevsss/pyspark_min.py


bash-4.2# cat stdout.log
Our Spark version is 2.4.3
Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6


bash-4.2# cat stderr.log
19/06/03 09:33:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/03 09:33:20 INFO SparkContext: Running Spark version 2.4.3
19/06/03 09:33:20 INFO SparkContext: Submitted application: hello_world
19/06/03 09:33:20 INFO SecurityManager: Changing view acls to: root
19/06/03 09:33:20 INFO SecurityManager: Changing modify acls to: root
19/06/03 09:33:20 INFO SecurityManager: Changing view acls groups to:
19/06/03 09:33:20 INFO SecurityManager: Changing modify acls groups to:
19/06/03 09:33:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/06/03 09:33:20 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/06/03 09:33:20 INFO SparkEnv: Registering MapOutputTracker
19/06/03 09:33:20 INFO SparkEnv: Registering BlockManagerMaster
19/06/03 09:33:20 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/06/03 09:33:20 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/06/03 09:33:20 INFO DiskBlockManager: Created local directory at /var/data/spark-799160e5-a3a5-4df5-ba9e-a35664ba7d8f/blockmgr-caceaedb-3edc-4dc9-8792-f871f3328f27
19/06/03 09:33:20 INFO MemoryStore: MemoryStore started with capacity 7.8 GB
19/06/03 09:33:20 INFO SparkEnv: Registering OutputCommitCoordinator
19/06/03 09:33:21 INFO log: Logging initialized @9307ms
19/06/03 09:33:21 INFO Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2017-11-21T21:27:37Z, git hash: 82b8fb23f757335bb3329d540ce37a2a2615f0a8
19/06/03 09:33:21 INFO Server: Started @9392ms
19/06/03 09:33:21 INFO AbstractConnector: Started ServerConnector@58e9e261{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/06/03 09:33:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@697da11e{/jobs,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@717b2b90{/jobs/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@c969346{/jobs/job,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@37e8a0b1{/jobs/job/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1f672a77{/stages,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@27d3967e{/stages/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@a1cfb17{/stages/stage,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@65ca8be2{/stages/stage/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@8e3000f{/stages/pool,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69ff3b89{/stages/pool/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3b0dd0d8{/storage,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@aa9c9c0{/storage/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1ada43e2{/storage/rdd,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@ae91043{/storage/rdd/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5fb78ad6{/environment,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1b94bf29{/environment/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@53992aea{/executors,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@11b053d2{/executors/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@72329a08{/executors/threadDump,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4ce11e90{/executors/threadDump/json,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5635a39c{/static,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@23ebdc35{/,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4a320d90{/api,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2687d6b{/jobs/job/kill,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@35c4ddc5{/stages/stage/kill,null,AVAILABLE,@Spark}
19/06/03 09:33:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc:4040
19/06/03 09:33:21 INFO SparkContext: Added file oci://code-assets@paasdevsss/pyspark_min.py at oci://code-assets@paasdevsss/pyspark_min.py with timestamp 1559554401257
19/06/03 09:33:21 INFO Utils: Fetching oci://code-assets@paasdevsss/pyspark_min.py to /var/data/spark-799160e5-a3a5-4df5-ba9e-a35664ba7d8f/spark-66e2cb9e-8f42-476e-b2c5-05665a25bf5c/userFiles-33e3d7c2-8b35-4963-98fc-4b55e9cea2b1/fetchFileTemp3752927275413095055.tmp
19/06/03 09:33:22 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/06/03 09:33:22 INFO Version: HV000001: Hibernate Validator 5.2.4.Final
19/06/03 09:33:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
19/06/03 09:33:22 INFO NettyBlockTransferService: Server created on spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc:7079
19/06/03 09:33:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/06/03 09:33:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:33:22 INFO BlockManagerMasterEndpoint: Registering block manager spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc:7079 with 7.8 GB RAM, BlockManagerId(driver, spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:33:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:33:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:33:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@c60c92{/metrics/json,null,AVAILABLE,@Spark}
19/06/03 09:33:22 INFO SparkContext: Registered listener oracle.dfcs.spark.listener.JobListener
19/06/03 09:33:22 INFO JobListener: Thread 64 called onApplicationStart...
19/06/03 09:33:22 INFO SparkUIIngressServiceBuilder: Intialize SparkUIIngressService using SparkConf...
19/06/03 09:33:22 INFO SparkUIIngressServiceBuilder: masterURL - https://kubernetes.default.svc:443, nameSpace - 24f2k7cztfza, backendServiceName - spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc, ingressServiceName - spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-ingress, runId - c4c30022-a677-46d0-b993-33887b44f7e1
19/06/03 09:33:22 INFO SparkUIIngressServiceBuilder: Building SparkUIIngressService...
19/06/03 09:33:22 INFO SparkUIIngressServiceBuilder: ---
apiVersion: "extensions/v1beta1"
kind: "Ingress"
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/"
    nginx.ingress.kubernetes.io/configuration-snippet: "rewrite /sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/(.*)$\
      \ /$1 break;\nproxy_set_header Accept-Encoding \"\";\nsub_filter_types text/html\
      \ application/javascript;\nsub_filter \"/static/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/static/\"\
      ;\nsub_filter \"/jobs/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/jobs/\"\
      ;\nsub_filter \"/stages/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/stages/\"\
      ;\nsub_filter \"/storage/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/storage/\"\
      ;\nsub_filter \"/environment/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/environment/\"\
      ;\nsub_filter \"/executors/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/executors/\"\
      ;\nsub_filter \"/streaming/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/streaming/\"\
      ;\nsub_filter \"/SQL/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/SQL/\"\
      ;\nsub_filter \"/api/\" \"/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/api/\"\
      ;\nsub_filter \"</head>\" \"<script src='https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/3.6.5/iframeResizer.contentWindow.js'></script></head>\"\
      ;\nsub_filter_once off;\n"
    nginx.ingress.kubernetes.io/proxy-redirect-from: "http://$host/"
    nginx.ingress.kubernetes.io/proxy-redirect-to: "$scheme://$host/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1/"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
  labels:
    app: "spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-ingress"
  name: "spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-ingress"
  namespace: "24f2k7cztfza"
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: "spark-c4c30022a67746d0b99333887b44f7e1-1559554387686-driver-svc"
          servicePort: 4040
        path: "/sparkui/c4c30022-a677-46d0-b993-33887b44f7e1"

19/06/03 09:33:22 WARN VersionUsageUtils: The client is using resource type 'ingresses' with unstable version 'v1beta1'
19/06/03 09:33:23 INFO SparkUIIngressServiceBuilder: Creating Ingress Service.
19/06/03 09:33:23 INFO SparkUIIngressServiceBuilder: Created Ingress Service.
19/06/03 09:33:52 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
19/06/03 09:33:52 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
19/06/03 09:33:52 INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
19/06/03 09:33:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7d0fde73{/SQL,null,AVAILABLE,@Spark}
19/06/03 09:33:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1d2515b7{/SQL/json,null,AVAILABLE,@Spark}
19/06/03 09:33:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@621b8047{/SQL/execution,null,AVAILABLE,@Spark}
19/06/03 09:33:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5801cf32{/SQL/execution/json,null,AVAILABLE,@Spark}
19/06/03 09:33:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6c86149c{/static/sql,null,AVAILABLE,@Spark}
19/06/03 09:33:53 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/06/03 09:34:54 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.244.43.3:46308) with ID 1
19/06/03 09:34:55 INFO BlockManagerMasterEndpoint: Registering block manager 10.244.43.3:60844 with 8.4 GB RAM, BlockManagerId(1, 10.244.43.3, 60844, None)
{noformat}
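The log above stops right after the executor registers, which suggests something is still keeping the driver process alive after the script body finishes. One way to narrow that down on the Python side is to dump the threads that are still running at the end of the script. This is only a diagnostic sketch using the standard library; the `report_live_threads` helper is mine, not part of the repro:

```python
import threading

def report_live_threads():
    """Return one description line per thread still alive in this process.

    A non-daemon thread (other than MainThread) in this list can keep the
    Python interpreter from exiting, which looks like a hung driver pod.
    """
    lines = []
    for t in threading.enumerate():
        lines.append('{} daemon={} alive={}'.format(t.name, t.daemon, t.is_alive()))
    return lines

# Append this at the end of the pyspark script to see what is left running.
for line in report_live_threads():
    print(line)
```

This only covers the Python process; for the JVM half of the driver, a thread dump of the SparkSubmit process (e.g. with jstack against PID 20 from the process list above) gives the equivalent view.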

> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> ------------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 2.4.3
>         Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
>            Reporter: Edwin Biemond
>            Priority: Major
>
> When we run a simple pyspark script on spark 2.4.3 or 3.0.0, the driver pod hangs and never calls the shutdown hook. 
> {code:python}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor just hang, even though we do see the output of this Python script. 
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6{noformat}
> What works:
>  * a simple Python script with a print works fine on 2.4.3 and 3.0.0
>  * the same setup on 2.4.0
>  * 2.4.3 spark-submit with the above pyspark script



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org