Posted to issues@spark.apache.org by "neenu (JIRA)" <ji...@apache.org> on 2018/10/08 13:29:00 UTC

[jira] [Created] (SPARK-25679) OOM Killed observed for spark thrift executors with dynamic allocation enabled

neenu created SPARK-25679:
-----------------------------

             Summary: OOM Killed observed for spark thrift executors with dynamic allocation enabled 
                 Key: SPARK-25679
                 URL: https://issues.apache.org/jira/browse/SPARK-25679
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.2.0
         Environment: Physical lab configuration:

8 bare-metal servers, each with 56 cores and 384 GB RAM, running RHEL 7.4
Kernel : 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7

 

Spark Thrift Server configuration:

driver memory: 10 GB
driver cores: 4
executor memory: 35 GB
executor cores: 8
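
For readers who want to reproduce this setup, the settings above can be written as Spark properties. This is only a sketch: the report does not say how the values were passed, so the mapping to `spark.driver.*` and `spark.executor.*` property names, the `g` unit suffix, and the helper `to_conf_args` are assumptions made for illustration.

```python
# Hypothetical mapping of the reported settings onto standard Spark
# property names. The report does not show the actual submit command.
reported_conf = {
    "spark.driver.memory": "10g",
    "spark.driver.cores": "4",
    "spark.executor.memory": "35g",
    "spark.executor.cores": "8",
}


def to_conf_args(conf):
    """Render a dict of Spark properties as a flat list of --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments on one line so they can be pasted after a
    # launcher command of the reader's choice.
    print(" ".join(to_conf_args(reported_conf)))
```

The resulting arguments could be appended to whichever launcher was used to start the Thrift server.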

 

Kubernetes info:


Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
            Reporter: neenu


Spark Thrift Server executors are being OOM-killed when dynamic allocation is enabled.

We ran TPC-DS queries against 1 TB of Snappy-compressed Parquet data, with executor memory set to 35 GB and executor cores set to 8. The maximum number of executors was set to 100, and around 30 executors were running at any one time.

Since dynamic allocation is enabled and Spark decides how many executors to spawn, should OOM errors occur at all? Couldn't Spark launch more executors to avoid them?

Note: There were enough cluster resources available to launch more executors if needed.
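
As a rough sanity check of the capacity note, the per-executor memory request can be compared with the total RAM of the lab. This sketch assumes the default overhead rule documented for upstream Spark (10% of executor memory, with a 384 MiB minimum). The report does not state which overhead setting was in effect, and the function name `executor_pod_memory_mib` is made up for illustration. Memory used by the driver, the operating system, and Kubernetes itself is ignored.

```python
MIB_PER_GIB = 1024


def executor_pod_memory_mib(executor_memory_mib,
                            overhead_factor=0.10,
                            min_overhead_mib=384):
    """Estimate the memory requested per executor pod, in MiB.

    Assumes the documented upstream default: overhead is 10% of the
    executor memory, but never less than 384 MiB. The cluster in the
    report may have used a different setting.
    """
    overhead = max(min_overhead_mib, int(overhead_factor * executor_memory_mib))
    return executor_memory_mib + overhead


# Figures taken from the report: 35 GB executors, about 30 running,
# 8 servers with 384 GB RAM each (GB treated as GiB here).
per_pod_mib = executor_pod_memory_mib(35 * MIB_PER_GIB)
observed_mib = 30 * per_pod_mib
cluster_mib = 8 * 384 * MIB_PER_GIB

if __name__ == "__main__":
    print(f"per executor pod: {per_pod_mib} MiB")
    print(f"30 executors:     {observed_mib / MIB_PER_GIB:.0f} GiB")
    print(f"cluster RAM:      {cluster_mib / MIB_PER_GIB:.0f} GiB")
```

Under these assumptions, the roughly 30 executors observed would request well under half of the cluster's RAM, which is consistent with the note above.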


