Posted to issues@spark.apache.org by "neenu (JIRA)" <ji...@apache.org> on 2018/10/16 11:14:00 UTC

[jira] [Updated] (SPARK-25743) New executors are not launched for kubernetes spark thrift on deleting existing executors

     [ https://issues.apache.org/jira/browse/SPARK-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

neenu updated SPARK-25743:
--------------------------
    Attachment: driver.log

> New executors are not launched for kubernetes spark thrift on deleting existing executors 
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25743
>                 URL: https://issues.apache.org/jira/browse/SPARK-25743
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.2.0
>         Environment: Physical lab configurations.
> 8 baremetal servers, 
> Each 56 Cores, 384GB RAM, RHEL 7.4
> Kernel : 3.10.0-862.9.1.el7.x86_64
> redhat-release-server.x86_64 7.4-18.el7
>  
>  
> Kubernetes info:
> Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
>            Reporter: neenu
>            Priority: Major
>         Attachments: driver.log, query_0_correct.sql
>
>
> Launched spark thrift in kubernetes cluster with dynamic allocation enabled.
> Configurations set : 
> spark.executor.memory=35g
> spark.executor.cores=8
> spark.dynamicAllocation.enabled=true
> spark.dynamicAllocation.executorIdleTimeout=10
> spark.dynamicAllocation.cachedExecutorIdleTimeout=15
> spark.driver.memory=10g
> spark.driver.cores=4
> spark.sql.crossJoin.enabled=true
> spark.sql.starJoinOptimization=true
> spark.sql.codegen=true
> spark.rpc.numRetries=5
> spark.rpc.retry.wait=5
> spark.sql.broadcastTimeout=1200
> spark.network.timeout=1800
> spark.dynamicAllocation.maxExecutors=15
> spark.kubernetes.allocation.batch.size=2
> spark.kubernetes.allocation.batch.delay=9
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> spark.kubernetes.node.selector.is_control=false
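> For context, the configuration above would typically be passed to spark-submit along these lines (a sketch only; the master URL, image handling, and jar path are placeholders, and exact Kubernetes flags differ between the spark-on-k8s 2.2 fork and upstream releases):

```shell
# Hypothetical Thrift server launch on Kubernetes with dynamic allocation.
# The API server address and jar path below are placeholders, not values
# taken from this report; only the --conf pairs come from the issue.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --conf spark.executor.memory=35g \
  --conf spark.executor.cores=8 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=15 \
  --conf spark.kubernetes.allocation.batch.size=2 \
  --conf spark.kubernetes.allocation.batch.delay=9 \
  local:///opt/spark/jars/<thrift-server-jar>
```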
> Ran TPC-DS queries against 1 TB of Snappy-compressed Parquet data.
> Observed that, as execution progressed, all tasks were handled by a single executor (executor 53) and no new executors were spawned, even though there were enough resources to spawn more.
>  
> Manually deleted executor pod 53 and saw that no new executor was spawned to replace the one that had been running.
> Attached the driver log (driver.log).
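> The manual-deletion check can be reproduced with kubectl along these lines (a sketch; the namespace and pod name are placeholders, and the `spark-role=executor` label selector is an assumption that varies by Spark-on-Kubernetes version):

```shell
# List the executor pods backing the Thrift server.
# The label selector is an assumption; executor pod labels differ
# between the spark-on-k8s fork and upstream Spark releases.
kubectl get pods -n default -l spark-role=executor

# Delete one executor pod and watch whether the driver requests
# a replacement (the bug reported here is that it does not).
kubectl delete pod <executor-pod-name> -n default
kubectl get pods -n default -l spark-role=executor --watch
```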



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org