Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/12/07 01:15:00 UTC

[jira] [Resolved] (SPARK-33681) Increase K8s IT timeout to 3 minutes

     [ https://issues.apache.org/jira/browse/SPARK-33681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33681.
----------------------------------
    Fix Version/s: 2.4.8
                   3.0.2
       Resolution: Fixed

Issue resolved by pull request 30632
[https://github.com/apache/spark/pull/30632]

> Increase K8s IT timeout to 3 minutes
> ------------------------------------
>
>                 Key: SPARK-33681
>                 URL: https://issues.apache.org/jira/browse/SPARK-33681
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 2.4.7, 3.0.1
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.0.2, 2.4.8
>
>
> We are already using 3 minutes in master/branch-3.1. This issue only happens in branch-3.0/branch-2.4:
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36905/console
> {code}
> - Run PySpark with memory customization *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 70 times over 2.018373577433333 minutes. Last failure message: "++ id -u
>   + myuid=0
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 0
>   + uidentry=root:x:0:0:root:/root:/bin/bash
>   + set -e
>   + '[' -z root:x:0:0:root:/root:/bin/bash ']'
>   + SPARK_K8S_CMD=driver-py
>   + case "$SPARK_K8S_CMD" in
>   + shift 1
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + grep SPARK_JAVA_OPT_
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' -n /opt/spark/tests/py_container_checks.py ']'
>   + PYTHONPATH='/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:/opt/spark/tests/py_container_checks.py'
>   + PYSPARK_ARGS=
>   + '[' -n 209715200 ']'
>   + PYSPARK_ARGS=209715200
>   + R_ARGS=
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + case "$SPARK_K8S_CMD" in
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/tests/worker_memory_check.py 209715200
>   20/12/07 00:09:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>   20/12/07 00:09:33 INFO SparkContext: Running Spark version 2.4.8-SNAPSHOT
>   20/12/07 00:09:33 INFO SparkContext: Submitted application: PyMemoryTest
>   20/12/07 00:09:33 INFO SecurityManager: Changing view acls to: root
>   20/12/07 00:09:33 INFO SecurityManager: Changing modify acls to: root
>   20/12/07 00:09:33 INFO SecurityManager: Changing view acls groups to: 
>   20/12/07 00:09:33 INFO SecurityManager: Changing modify acls groups to: 
>   20/12/07 00:09:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
>   20/12/07 00:09:34 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
>   20/12/07 00:09:34 INFO SparkEnv: Registering MapOutputTracker
>   20/12/07 00:09:34 INFO SparkEnv: Registering BlockManagerMaster
>   20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
>   20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
>   20/12/07 00:09:34 INFO DiskBlockManager: Created local directory at /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/blockmgr-9f6bcf4d-ff41-4b27-8312-0fb23bf4ed1b
>   20/12/07 00:09:34 INFO MemoryStore: MemoryStore started with capacity 546.3 MB
>   20/12/07 00:09:34 INFO SparkEnv: Registering OutputCommitCoordinator
>   20/12/07 00:09:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>   20/12/07 00:09:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:4040
>   20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/worker_memory_check.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/worker_memory_check.py with timestamp 1607299774831
>   20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/worker_memory_check.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/worker_memory_check.py
>   20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/py_container_checks.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/py_container_checks.py with timestamp 1607299774847
>   20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/py_container_checks.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/py_container_checks.py
>   20/12/07 00:09:36 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
>   20/12/07 00:09:36 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
>   20/12/07 00:09:36 INFO NettyBlockTransferService: Server created on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079
>   20/12/07 00:09:36 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
>   20/12/07 00:09:36 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
>   20/12/07 00:09:36 INFO BlockManagerMasterEndpoint: Registering block manager spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 with 546.3 MB RAM, BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
>   20/12/07 00:09:36 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
>   20/12/07 00:09:36 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
>   20/12/07 00:10:06 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>   20/12/07 00:10:06 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse').
>   20/12/07 00:10:06 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'.
>   20/12/07 00:10:07 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
>   20/12/07 00:10:07 INFO SparkContext: Starting job: collect at /opt/spark/tests/worker_memory_check.py:43
>   20/12/07 00:10:07 INFO DAGScheduler: Got job 0 (collect at /opt/spark/tests/worker_memory_check.py:43) with 2 output partitions
>   20/12/07 00:10:07 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /opt/spark/tests/worker_memory_check.py:43)
>   20/12/07 00:10:07 INFO DAGScheduler: Parents of final stage: List()
>   20/12/07 00:10:07 INFO DAGScheduler: Missing parents: List()
>   20/12/07 00:10:07 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43), which has no missing parents
>   20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.5 KB, free 546.3 MB)
>   20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.1 KB, free 546.3 MB)
>   20/12/07 00:10:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 (size: 3.1 KB, free: 546.3 MB)
>   20/12/07 00:10:07 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
>   20/12/07 00:10:08 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43) (first 15 tasks are for partitions Vector(0, 1))
>   20/12/07 00:10:08 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
>   20/12/07 00:10:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   20/12/07 00:10:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   20/12/07 00:10:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   20/12/07 00:11:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   20/12/07 00:11:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   20/12/07 00:11:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>   " did not contain "PySpark Worker Memory Check is: True" The application did not complete.. (KubernetesSuite.scala:249)
> {code}
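
For context, the failure above is ScalaTest's eventually block giving up after roughly two minutes of polling ("Attempted 70 times over 2.018373577433333 minutes") before the executor pod was scheduled and the expected marker line appeared. Below is a minimal sketch of the kind of change involved, assuming the suite polls the driver log through ScalaTest's Eventually; the object name and the readDriverLog helper are illustrative only, not the actual test code, and pull request 30632 has the real change.

{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Minutes, Seconds, Span}

object K8sTimeoutSketch {
  // Poll a driver-log reader until the expected marker line appears,
  // allowing up to 3 minutes (the value master/branch-3.1 already use)
  // instead of the ~2 minutes that expires in the log above.
  def waitForWorkerMemoryCheck(readDriverLog: () => String): Unit = {
    eventually(timeout(Span(3, Minutes)), interval(Span(1, Seconds))) {
      // ScalaTest re-runs this block until the assertion passes
      // or the 3-minute deadline is reached.
      assert(readDriverLog().contains("PySpark Worker Memory Check is: True"))
    }
  }
}
{code}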



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org