Posted to issues@spark.apache.org by "neenu (JIRA)" <ji...@apache.org> on 2018/10/16 11:13:00 UTC
[jira] [Created] (SPARK-25743) New executors are not launched for kubernetes spark thrift on deleting existing executors
neenu created SPARK-25743:
-----------------------------
Summary: New executors are not launched for kubernetes spark thrift on deleting existing executors
Key: SPARK-25743
URL: https://issues.apache.org/jira/browse/SPARK-25743
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.2.0
Environment: Physical lab configurations.
8 baremetal servers,
Each 56 Cores, 384GB RAM, RHEL 7.4
Kernel : 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7
Kubernetes info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Reporter: neenu
Launched spark thrift in kubernetes cluster with dynamic allocation enabled.
Configurations set :
spark.executor.memory=35g
spark.executor.cores=8
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=10
spark.dynamicAllocation.cachedExecutorIdleTimeout=15
spark.driver.memory=10g
spark.driver.cores=4
spark.sql.crossJoin.enabled=true
spark.sql.starJoinOptimization=true
spark.sql.codegen=true
spark.rpc.numRetries=5
spark.rpc.retry.wait=5
spark.sql.broadcastTimeout=1200
spark.network.timeout=1800
spark.dynamicAllocation.maxExecutors=15
spark.kubernetes.allocation.batch.size=2
spark.kubernetes.allocation.batch.delay=9
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kubernetes.node.selector.is_control=false
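For reference, settings like the ones above would typically be passed to the Thrift server launch script as --conf flags. A minimal sketch follows; the API server URL, container image, and script path are placeholders and not taken from this report:

```shell
# Hedged sketch: launching the Spark Thrift server on Kubernetes with the
# dynamic-allocation settings from this report. The master URL and
# container image below are illustrative placeholders.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.executor.memory=35g \
  --conf spark.executor.cores=8 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=10 \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=15 \
  --conf spark.dynamicAllocation.maxExecutors=15 \
  --conf spark.kubernetes.allocation.batch.size=2 \
  --conf spark.kubernetes.allocation.batch.delay=9 \
  --conf spark.kubernetes.node.selector.is_control=false
```

Note that dynamic allocation on Kubernetes in this era of Spark generally also required an external shuffle service; whether one was configured is not stated in the report.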
Tried to run TPC-DS queries on 1 TB of Snappy-compressed Parquet data.
Found that as execution progressed, all tasks were handled by a single executor (executor 53) and no new executors were spawned, even though enough resources were available to launch more.
Manually deleted executor pod 53 and observed that no new executor was spawned to replace it.
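The manual-deletion step can be sketched with kubectl as below; the pod label and name pattern are assumptions based on Spark's default executor pod labeling and naming, not details from this report:

```shell
# Hedged sketch of the reproduction step: delete one executor pod and
# watch whether the driver requests a replacement. Label selector and
# pod name are illustrative; actual names depend on the deployment.
kubectl get pods -l spark-role=executor            # list executor pods
kubectl delete pod <thrift-server-pod-prefix>-exec-53   # delete executor 53
kubectl get pods -l spark-role=executor --watch    # expected: replacement pod appears; observed: none
```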
Attached the
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)