Posted to issues@spark.apache.org by "test2022123 (Jira)" <ji...@apache.org> on 2022/10/03 02:39:00 UTC

[jira] [Created] (SPARK-40638) RpcOutboxMessage: Ask terminated before connecting successfully

test2022123 created SPARK-40638:
-----------------------------------

             Summary: RpcOutboxMessage: Ask terminated before connecting successfully
                 Key: SPARK-40638
                 URL: https://issues.apache.org/jira/browse/SPARK-40638
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0
         Environment: mac 12.6

Python 3.8.13

spark-3.3.0-bin-hadoop3

docker-compose.yml:
{code:java}
version: '3'

services:
  spark-master:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-master  
    ports:
      - '8080:8080'
      - '7077:7077'
    networks:
      - spark-network
      
      
  spark-worker-1:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-1
    depends_on: 
      - spark-master
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6061
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-1
    ports:
      - '14040:4040'
      - '18081:8081'
      - '16061:6061'
    networks:
      - spark-network
      
      
  spark-worker-2:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-2
    depends_on: 
      - spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6062
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-2
    ports:
      - '24040:4040'
      - '28081:8081'    
      - '26062:6062'
    networks:
      - spark-network

networks:
  spark-network: {code}
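
A note on the topology above: the master and the two workers share an isolated bridge network, while the driver (the pyspark shell shown below) runs directly on the macOS host, so every executor has to dial back out of its container to the host's address. Purely as a diagnostic sketch (the host IP and driver port are taken from the logs further down; this is not a verified fix), a stdlib-Python probe like the following, run inside a worker container, can confirm whether that dial-back path works:
{code:python}
# Diagnostic sketch only: run inside a worker container, e.g.
#   docker compose exec spark-worker-1 python3 probe.py
# to check whether executors can reach the driver on the host.
# Host IP and driver port below are taken from the logs in this report.
import socket

targets = [
    ("192.168.31.31", 13333),  # driver RPC endpoint the executors try to reach
    ("spark-master", 7077),    # master endpoint, for comparison; expected to succeed
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"OK   {host}:{port} is reachable")
    except OSError as exc:
        print(f"FAIL {host}:{port} -> {exc}")
{code}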
 

 
            Reporter: test2022123


{color:#FF0000}*PySpark job submission gets stuck and retries indefinitely.*{color}

*PySpark session started with:*
{code:java}
$ PYSPARK_PYTHON=python SPARK_HOME="/Users/mike/Tools/spark-3.3.0-bin-hadoop3" pyspark --master spark://spark-master:7077
Python 3.8.13 (default, Mar 28 2022, 06:16:26)
[Clang 12.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
22/10/03 10:23:32 WARN Utils: Your hostname, codecan.local resolves to a loopback address: 127.0.0.1; using 192.168.31.31 instead (on interface en5)
22/10/03 10:23:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/03 10:23:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/
Using Python version 3.8.13 (default, Mar 28 2022 06:16:26)
Spark context Web UI available at http://192.168.31.31:4040
Spark context available as 'sc' (master = spark://spark-master:7077, app id = app-20221003022333-0000).
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import col
>>> spark.range(0,5).select(col("id").cast("double")).agg({'id': 'sum'}).show()
22/10/03 10:24:24 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:24:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:24:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:25:09 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:25:24 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code}
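
Worth noting for triage: this repeated TaskSchedulerImpl warning is consistent with executors being launched but dying before they ever register with the driver, which matches the executor-side timeouts in the stderr and stdout logs below.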
*spark-defaults.conf*
{code:java}
spark.driver.port 13333
spark.executor.memory 512m
spark.executor.cores 1
spark.executor.instances 2
spark.cores.max 1
spark.shuffle.service.enabled false
spark.dynamicAllocation.enabled false {code}
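
The conf above pins spark.driver.port but leaves spark.driver.host and spark.driver.bindAddress unset, so the driver advertises the host's LAN address (192.168.31.31 in the logs), which the containers evidently cannot reach. A possible workaround sketch, untested in this exact setup: advertise an address the containers can resolve (host.docker.internal is Docker Desktop's alias for the macOS host; the block manager port value here is an arbitrary choice) and pin every port the executors must reach:
{code:python}
# Workaround sketch, untested in this exact setup: make the driver advertise
# an address the containers can resolve. "host.docker.internal" is Docker
# Desktop's alias for the macOS host; the driver port mirrors spark-defaults.conf.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .config("spark.driver.host", "host.docker.internal")  # address executors dial back to
    .config("spark.driver.bindAddress", "0.0.0.0")        # bind locally on all interfaces
    .config("spark.driver.port", "13333")                 # as in spark-defaults.conf
    .config("spark.blockManager.port", "13334")           # also reached from executors
    .getOrCreate()
)
{code}
The same keys can equivalently be passed to the pyspark launcher via repeated --conf flags.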
h1. stderr log page for app-20221003022333-0000/0
{code:java}
Spark Executor Command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx512M" "-Dspark.driver.port=13333" "-XX:+IgnoreUnrecognizedVMOptions" "--add-opens=java.base/java.lang=ALL-UNNAMED" "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED" "--add-opens=java.base/java.lang.reflect=ALL-UNNAMED" "--add-opens=java.base/java.io=ALL-UNNAMED" "--add-opens=java.base/java.net=ALL-UNNAMED" "--add-opens=java.base/java.nio=ALL-UNNAMED" "--add-opens=java.base/java.util=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED" "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" "--add-opens=java.base/sun.nio.cs=ALL-UNNAMED" "--add-opens=java.base/sun.security.action=ALL-UNNAMED" "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED" "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.31.31:13333" "--executor-id" "0" "--hostname" "spark-worker-1" "--cores" "1" "--app-id" "app-20221003022333-0000" "--worker-url" "spark://Worker@spark-worker-1:6061"
========================================

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:424)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:413)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:444)
	at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:293)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	... 16 more {code}
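
Reading the trace: the RpcTimeoutException is thrown from RpcEnv.setupEndpointRefByURI, i.e. the freshly launched executor spends the full spark.rpc.lookupTimeout (120 seconds by default) trying to look up spark://CoarseGrainedScheduler@192.168.31.31:13333 and never connects; the standalone master then launches a replacement executor, which would explain the apparently infinite retry loop.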
h1. stdout log page for app-20221003022333-0000/0 ([http://spark-worker-1:18081/])

{code:java}
22/10/03 02:23:35 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 107@spark-worker-1
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for TERM
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for HUP
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for INT
22/10/03 02:23:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/03 02:23:35 INFO SecurityManager: Changing view acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing view acls groups to: 
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls groups to: 
22/10/03 02:23:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(spark, mike); groups with view permissions: Set(); users  with modify permissions: Set(spark, mike); groups with modify permissions: Set()
22/10/03 02:25:35 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/10/03 02:25:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /192.168.31.31:13333 timed out (120000 ms)
22/10/03 02:27:35 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/10/03 02:27:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /192.168.31.31:13333 timed out (120000 ms) {code}
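
The "Connecting to /192.168.31.31:13333 timed out" lines confirm the failure is on the executor-to-driver connection rather than on worker registration (the workers registered fine, otherwise no executor would have been launched at all); the probe sketch after the compose file above should reproduce the same timeout from inside a worker container.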
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
