Posted to issues@spark.apache.org by "neenu (JIRA)" <ji...@apache.org> on 2018/10/16 11:13:00 UTC
[jira] [Created] (SPARK-25743) New executors are not launched for kubernetes spark thrift on deleting existing executors
neenu created SPARK-25743:
-----------------------------
Summary: New executors are not launched for kubernetes spark thrift on deleting existing executors
Key: SPARK-25743
URL: https://issues.apache.org/jira/browse/SPARK-25743
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.2.0
Environment: Physical lab configurations.
8 baremetal servers,
Each 56 Cores, 384GB RAM, RHEL 7.4
Kernel : 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7
Kubernetes info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Reporter: neenu
Launched spark thrift in kubernetes cluster with dynamic allocation enabled.
Configurations set :
spark.executor.memory=35g
spark.executor.cores=8
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=10
spark.dynamicAllocation.cachedExecutorIdleTimeout=15
spark.driver.memory=10g
spark.driver.cores=4
spark.sql.crossJoin.enabled=true
spark.sql.starJoinOptimization=true
spark.sql.codegen=true
spark.rpc.numRetries=5
spark.rpc.retry.wait=5
spark.sql.broadcastTimeout=1200
spark.network.timeout=1800
spark.dynamicAllocation.maxExecutors=15
spark.kubernetes.allocation.batch.size=2
spark.kubernetes.allocation.batch.delay=9
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kubernetes.node.selector.is_control=false
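For reference, settings like the ones above would typically be passed to the Thrift server launch script as --conf flags. A minimal sketch follows; the API server URL, container image, and script path are placeholders and not taken from this report:

```shell
# Hedged sketch: launching the Spark Thrift server on Kubernetes with the
# dynamic-allocation settings from this report. The master URL and
# container image below are illustrative placeholders.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.executor.memory=35g \
  --conf spark.executor.cores=8 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=10 \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=15 \
  --conf spark.dynamicAllocation.maxExecutors=15 \
  --conf spark.kubernetes.allocation.batch.size=2 \
  --conf spark.kubernetes.allocation.batch.delay=9 \
  --conf spark.kubernetes.node.selector.is_control=false
```

Note that dynamic allocation on Kubernetes in this era of Spark generally also required an external shuffle service; whether one was configured is not stated in the report.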
Tried to run TPC-DS queries on 1 TB of Snappy-compressed Parquet data.
Found that as execution progressed, all tasks were handled by a single executor (executor 53) and no new executors were spawned, even though enough resources were available to launch more.
Manually deleted executor pod 53 and observed that no new executor was spawned to replace it.
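The manual-deletion step can be sketched with kubectl as below; the pod label and name pattern are assumptions based on Spark's default executor pod labeling and naming, not details from this report:

```shell
# Hedged sketch of the reproduction step: delete one executor pod and
# watch whether the driver requests a replacement. Label selector and
# pod name are illustrative; actual names depend on the deployment.
kubectl get pods -l spark-role=executor            # list executor pods
kubectl delete pod <thrift-server-pod-prefix>-exec-53   # delete executor 53
kubectl get pods -l spark-role=executor --watch    # expected: replacement pod appears; observed: none
```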
Attached the
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)