Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/12/07 00:34:00 UTC

[jira] [Created] (SPARK-33681) Increase K8s IT timeout

Dongjoon Hyun created SPARK-33681:
-------------------------------------

             Summary: Increase K8s IT timeout
                 Key: SPARK-33681
                 URL: https://issues.apache.org/jira/browse/SPARK-33681
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes, Tests
    Affects Versions: 2.4.7
            Reporter: Dongjoon Hyun


The K8s integration test "Run PySpark with memory customization" is flaky on the pull request builder: the eventually loop gives up after 70 attempts over roughly two minutes while the executor pod is still waiting for resources ("Initial job has not accepted any resources"), so the IT timeout should be increased. Example failure (a sketch of one possible fix follows the log):
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36905/console
{code}
- Run PySpark with memory customization *** FAILED ***
  The code passed to eventually never returned normally. Attempted 70 times over 2.018373577433333 minutes. Last failure message: "++ id -u
  + myuid=0
  ++ id -g
  + mygid=0
  + set +e
  ++ getent passwd 0
  + uidentry=root:x:0:0:root:/root:/bin/bash
  + set -e
  + '[' -z root:x:0:0:root:/root:/bin/bash ']'
  + SPARK_K8S_CMD=driver-py
  + case "$SPARK_K8S_CMD" in
  + shift 1
  + SPARK_CLASSPATH=':/opt/spark/jars/*'
  + env
  + sort -t_ -k4 -n
  + sed 's/[^=]*=\(.*\)/\1/g'
  + grep SPARK_JAVA_OPT_
  + readarray -t SPARK_EXECUTOR_JAVA_OPTS
  + '[' -n '' ']'
  + '[' -n /opt/spark/tests/py_container_checks.py ']'
  + PYTHONPATH='/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:/opt/spark/tests/py_container_checks.py'
  + PYSPARK_ARGS=
  + '[' -n 209715200 ']'
  + PYSPARK_ARGS=209715200
  + R_ARGS=
  + '[' -n '' ']'
  + '[' 3 == 2 ']'
  + '[' 3 == 3 ']'
  ++ python3 -V
  + pyv3='Python 3.7.3'
  + export PYTHON_VERSION=3.7.3
  + PYTHON_VERSION=3.7.3
  + export PYSPARK_PYTHON=python3
  + PYSPARK_PYTHON=python3
  + export PYSPARK_DRIVER_PYTHON=python3
  + PYSPARK_DRIVER_PYTHON=python3
  + '[' -n '' ']'
  + '[' -z ']'
  + case "$SPARK_K8S_CMD" in
  + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
  + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/tests/worker_memory_check.py 209715200
  20/12/07 00:09:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  20/12/07 00:09:33 INFO SparkContext: Running Spark version 2.4.8-SNAPSHOT
  20/12/07 00:09:33 INFO SparkContext: Submitted application: PyMemoryTest
  20/12/07 00:09:33 INFO SecurityManager: Changing view acls to: root
  20/12/07 00:09:33 INFO SecurityManager: Changing modify acls to: root
  20/12/07 00:09:33 INFO SecurityManager: Changing view acls groups to: 
  20/12/07 00:09:33 INFO SecurityManager: Changing modify acls groups to: 
  20/12/07 00:09:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
  20/12/07 00:09:34 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
  20/12/07 00:09:34 INFO SparkEnv: Registering MapOutputTracker
  20/12/07 00:09:34 INFO SparkEnv: Registering BlockManagerMaster
  20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
  20/12/07 00:09:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
  20/12/07 00:09:34 INFO DiskBlockManager: Created local directory at /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/blockmgr-9f6bcf4d-ff41-4b27-8312-0fb23bf4ed1b
  20/12/07 00:09:34 INFO MemoryStore: MemoryStore started with capacity 546.3 MB
  20/12/07 00:09:34 INFO SparkEnv: Registering OutputCommitCoordinator
  20/12/07 00:09:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
  20/12/07 00:09:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:4040
  20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/worker_memory_check.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/worker_memory_check.py with timestamp 1607299774831
  20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/worker_memory_check.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/worker_memory_check.py
  20/12/07 00:09:34 INFO SparkContext: Added file file:///opt/spark/tests/py_container_checks.py at spark://spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7078/files/py_container_checks.py with timestamp 1607299774847
  20/12/07 00:09:34 INFO Utils: Copying /opt/spark/tests/py_container_checks.py to /var/data/spark-9950d0f1-8753-441f-97cf-1aa6defd1d0e/spark-8ae1ff4f-2989-43d3-adfe-f26e8ff71ed2/userFiles-cfe3880e-6803-4809-9c01-6f1f582e4481/py_container_checks.py
  20/12/07 00:09:36 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
  20/12/07 00:09:36 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
  20/12/07 00:09:36 INFO NettyBlockTransferService: Server created on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079
  20/12/07 00:09:36 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
  20/12/07 00:09:36 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManagerMasterEndpoint: Registering block manager spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 with 546.3 MB RAM, BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:09:36 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc, 7079, None)
  20/12/07 00:10:06 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
  20/12/07 00:10:06 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse').
  20/12/07 00:10:06 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'.
  20/12/07 00:10:07 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
  20/12/07 00:10:07 INFO SparkContext: Starting job: collect at /opt/spark/tests/worker_memory_check.py:43
  20/12/07 00:10:07 INFO DAGScheduler: Got job 0 (collect at /opt/spark/tests/worker_memory_check.py:43) with 2 output partitions
  20/12/07 00:10:07 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /opt/spark/tests/worker_memory_check.py:43)
  20/12/07 00:10:07 INFO DAGScheduler: Parents of final stage: List()
  20/12/07 00:10:07 INFO DAGScheduler: Missing parents: List()
  20/12/07 00:10:07 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43), which has no missing parents
  20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.5 KB, free 546.3 MB)
  20/12/07 00:10:07 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.1 KB, free 546.3 MB)
  20/12/07 00:10:07 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-test-app-1607299769587-driver-svc.7c85102d1c1d4c8fb5a453963ab5535a.svc:7079 (size: 3.1 KB, free: 546.3 MB)
  20/12/07 00:10:07 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
  20/12/07 00:10:08 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at collect at /opt/spark/tests/worker_memory_check.py:43) (first 15 tasks are for partitions Vector(0, 1))
  20/12/07 00:10:08 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
  20/12/07 00:10:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:10:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:10:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  20/12/07 00:11:38 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
  " did not contain "PySpark Worker Memory Check is: True" The application did not complete.. (KubernetesSuite.scala:249)
{code}
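
The timeout being hit is the ScalaTest "eventually" patience in the integration suite: the loop polled 70 times over about two minutes and gave up before the executor pod was scheduled. Below is a minimal sketch of what increasing that patience could look like, assuming the suite polls through ScalaTest's Eventually/PatienceConfiguration; the constant names, the 3-minute value, and the fetchDriverLog helper are illustrative assumptions, not the actual code at KubernetesSuite.scala:249.

{code}
import org.scalatest.concurrent.{Eventually, PatienceConfiguration}
import org.scalatest.time.{Minutes, Seconds, Span}

object K8sPatienceSketch {
  // Widen the patience window: the failing run gave up after ~2 minutes
  // (70 attempts), so allow 3 minutes before failing the test.
  val TIMEOUT = PatienceConfiguration.Timeout(Span(3, Minutes))
  val INTERVAL = PatienceConfiguration.Interval(Span(1, Seconds))

  // Hypothetical stand-in for however the suite reads the driver pod's log;
  // not a real Spark test helper.
  def fetchDriverLog(): String = sys.error("stub: read driver pod log here")

  def waitForWorkerMemoryCheck(): Unit = {
    // Re-run the assertion every second until it passes or 3 minutes elapse.
    Eventually.eventually(TIMEOUT, INTERVAL) {
      assert(fetchDriverLog().contains("PySpark Worker Memory Check is: True"))
    }
  }
}
{code}

A longer patience only gives the pod scheduler more headroom on a busy CI host; the assertion itself is unchanged, so a genuinely broken memory check still fails, just later.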



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
