Posted to issues@spark.apache.org by "test2022123 (Jira)" <ji...@apache.org> on 2022/10/03 02:39:00 UTC
[jira] [Created] (SPARK-40638) RpcOutboxMessage: Ask terminated before connecting successfully
test2022123 created SPARK-40638:
-----------------------------------
Summary: RpcOutboxMessage: Ask terminated before connecting successfully
Key: SPARK-40638
URL: https://issues.apache.org/jira/browse/SPARK-40638
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.3.0
Environment: mac 12.6
Python 3.8.13
spark-3.3.0-bin-hadoop3
docker-compose.yml:
{code:yaml}
version: '3'

services:
  spark-master:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-master
    ports:
      - '8080:8080'
      - '7077:7077'
    networks:
      - spark-network
  spark-worker-1:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-1
    depends_on:
      - spark-master
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6061
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-1
    ports:
      - '14040:4040'
      - '18081:8081'
      - '16061:6061'
    networks:
      - spark-network
  spark-worker-2:
    image: docker.io/bitnami/spark:3.3
    hostname: spark-worker-2
    depends_on:
      - spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=4g
      - SPARK_WORKER_CORES=8
      - SPARK_WORKER_PORT=6062
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_LOCAL_HOSTNAME=spark-worker-2
    ports:
      - '24040:4040'
      - '28081:8081'
      - '26062:6062'
    networks:
      - spark-network

networks:
  spark-network:
{code}
Reporter: test2022123
{color:#FF0000}*A PySpark job submitted to the cluster gets stuck and retries indefinitely.*{color}
*PySpark shell launched with:*
{code:java}
$ PYSPARK_PYTHON=python SPARK_HOME="/Users/mike/Tools/spark-3.3.0-bin-hadoop3" pyspark --master spark://spark-master:7077 [10:20:25]
Python 3.8.13 (default, Mar 28 2022, 06:16:26)
[Clang 12.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
22/10/03 10:23:32 WARN Utils: Your hostname, codecan.local resolves to a loopback address: 127.0.0.1; using 192.168.31.31 instead (on interface en5)
22/10/03 10:23:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/03 10:23:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Python version 3.8.13 (default, Mar 28 2022 06:16:26)
Spark context Web UI available at http://192.168.31.31:4040
Spark context available as 'sc' (master = spark://spark-master:7077, app id = app-20221003022333-0000).
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import col
>>> spark.range(0,5).select(col("id").cast("double")).agg({'id': 'sum'}).show()
22/10/03 10:24:24 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:24:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:24:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:25:09 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
22/10/03 10:25:24 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources {code}
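The warning says to check that workers are registered and have sufficient resources. A minimal sketch of that check from Python, under the assumption that the compose file above publishes the master web UI on port 8080 and that the standalone master also serves a JSON view of its state at /json on the same port:
{code:python}
# Sketch: query the standalone master's JSON status view (assumed to be
# at /json on the web UI port published as 8080 in the compose file)
# to confirm both workers registered and report free cores.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/json") as resp:
    status = json.load(resp)

for worker in status["workers"]:
    print(worker["id"], worker["state"],
          f'{worker["coresused"]}/{worker["cores"]} cores used')
{code}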
*spark-defaults.conf on the driver side:*
{code:java}
spark.driver.port 13333
spark.executor.memory 512m
spark.executor.cores 1
spark.executor.instances 2
spark.cores.max 1
spark.shuffle.service.enabled false
spark.dynamicAllocation.enabled false {code}
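Note that with spark.cores.max at 1 and spark.executor.cores at 1, the application can be granted at most one core in total, i.e. a single one-core executor, regardless of spark.executor.instances. A small sketch for double-checking what the running session actually sees (spark.conf.get is the standard PySpark accessor; the keys are the ones set above):
{code:python}
# Sketch: print the effective resource settings from the live 'spark'
# session created by the pyspark shell, to confirm what the scheduler
# is working with.
for key in ("spark.driver.port", "spark.cores.max",
            "spark.executor.cores", "spark.executor.memory"):
    print(key, "=", spark.conf.get(key, "<unset>"))
{code}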
h1. stderr log page for app-20221003022333-0000/0
{code:java}
Spark Executor Command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx512M" "-Dspark.driver.port=13333" "-XX:+IgnoreUnrecognizedVMOptions" "--add-opens=java.base/java.lang=ALL-UNNAMED" "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED" "--add-opens=java.base/java.lang.reflect=ALL-UNNAMED" "--add-opens=java.base/java.io=ALL-UNNAMED" "--add-opens=java.base/java.net=ALL-UNNAMED" "--add-opens=java.base/java.nio=ALL-UNNAMED" "--add-opens=java.base/java.util=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED" "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" "--add-opens=java.base/sun.nio.cs=ALL-UNNAMED" "--add-opens=java.base/sun.security.action=ALL-UNNAMED" "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED" "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.31.31:13333" "--executor-id" "0" "--hostname" "spark-worker-1" "--cores" "1" "--app-id" "app-20221003022333-0000" "--worker-url" "spark://Worker@spark-worker-1:6061"
========================================
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:424)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:413)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:444)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:293)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 16 more {code}
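The trace shows the executor timing out in RpcEnv.setupEndpointRefByURI while resolving the driver endpoint at spark://CoarseGrainedScheduler@192.168.31.31:13333. The 120-second limit comes from spark.rpc.lookupTimeout, which falls back to spark.network.timeout. A minimal sketch of raising it (standard Spark conf keys; this only helps with slow links, it cannot fix an address the containers cannot reach at all):
{code:python}
# Sketch: RPC timeouts must be set before the SparkContext starts, so
# they go on the builder rather than on a running session.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .config("spark.rpc.lookupTimeout", "240s")   # default falls back to spark.network.timeout
    .config("spark.network.timeout", "240s")
    .getOrCreate()
)
{code}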
h1. stdout log page for app-20221003022333-0000/0 ([http://spark-worker-1:18081/])
{code:java}
22/10/03 02:23:35 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 107@spark-worker-1
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for TERM
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for HUP
22/10/03 02:23:35 INFO SignalUtils: Registering signal handler for INT
22/10/03 02:23:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/03 02:23:35 INFO SecurityManager: Changing view acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls to: spark,mike
22/10/03 02:23:35 INFO SecurityManager: Changing view acls groups to:
22/10/03 02:23:35 INFO SecurityManager: Changing modify acls groups to:
22/10/03 02:23:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, mike); groups with view permissions: Set(); users with modify permissions: Set(spark, mike); groups with modify permissions: Set()
22/10/03 02:25:35 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/10/03 02:25:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /192.168.31.31:13333 timed out (120000 ms)
22/10/03 02:27:35 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/10/03 02:27:35 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /192.168.31.31:13333 timed out (120000 ms) {code}
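The executors keep dialing 192.168.31.31:13333, the Mac's LAN address, which is apparently not reachable from inside the containers here. A sketch of a possible workaround, not a confirmed fix, assuming Docker Desktop on macOS where containers resolve host.docker.internal to the host (spark.driver.host, spark.driver.bindAddress, and spark.driver.port are standard Spark conf keys):
{code:python}
# Sketch of a possible workaround: advertise an address the containers
# can reach, while still binding locally on the Mac.
# "host.docker.internal" is a Docker Desktop convention (an assumption
# here); on Linux a routable host IP would be needed instead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .config("spark.driver.host", "host.docker.internal")  # address executors dial back to
    .config("spark.driver.bindAddress", "0.0.0.0")        # local bind address on the Mac
    .config("spark.driver.port", "13333")                 # keep the fixed port from spark-defaults.conf
    .getOrCreate()
)
{code}
For this to work, the driver's fixed RPC port (13333) and its block manager port would also have to be open to connections from the containers.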