Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2021/04/23 18:22:00 UTC

[jira] [Resolved] (SPARK-35046) Wrong memory allocation on standalone mode cluster

     [ https://issues.apache.org/jira/browse/SPARK-35046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-35046.
----------------------------------
    Resolution: Invalid

> Wrong memory allocation on standalone mode cluster
> --------------------------------------------------
>
>                 Key: SPARK-35046
>                 URL: https://issues.apache.org/jira/browse/SPARK-35046
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Mohamadreza Rostami
>            Priority: Major
>
> I see a bug in executor memory allocation in the standalone cluster, but I can't find which part of the Spark code causes this problem. That's why I decided to raise this issue here.
> Assume you have 3 workers, each with 10 CPU cores and 10 GB of memory. Assume also that 2 Spark jobs run on this cluster of workers, and that their configs are set as below:
> ------------------
> job-1:
> executor-memory: 5g
> executor-CPU: 4
> max-cores: 8
> ------------------
> job-2:
> executor-memory: 6g
> executor-CPU: 4
> max-cores: 8
> ------------------
> In this situation, we expect that when both jobs are submitted, the first job gets 2 executors, each with 4 CPU cores and 5g of memory, and the second job gets only one executor, with 4 CPU cores and 6g of memory, on the third worker, because worker 1 and worker 2 don't have enough free memory to accept an executor for the second job. Surprisingly, however, we see that either the first or the second worker creates an executor for job-2, so that worker's memory consumption exceeds what is allocated to it and it takes 11g of memory from the operating system.
> Is this behavior normal? I think it can cause undefined behavior in the cluster.
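
The expectation described in the report can be checked with a small toy model. The sketch below is not Spark's scheduler code: the `Worker` class and the `place` function are hypothetical names introduced only for illustration, and the round-robin, one-executor-per-visit placement is an assumption about how executors are spread across workers. It only does the arithmetic on the numbers given above (3 workers with 10 cores and 10 GB each, job-1 at 5g/4 cores, job-2 at 6g/4 cores, both capped at 8 cores).

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical model of a standalone worker (10 cores, 10 GB, as in the report)."""
    name: str
    free_cores: int = 10
    free_mem_gb: int = 10
    executors: list = field(default_factory=list)


def place(workers, job, executor_cores, executor_mem_gb, max_cores):
    """Toy placement: visit workers round-robin and start at most one executor
    per visit, only where both the cores and the memory still fit.
    Returns the number of executors started for the job."""
    granted_cores = 0
    progress = True
    while progress and granted_cores + executor_cores <= max_cores:
        progress = False
        for w in workers:
            if granted_cores + executor_cores > max_cores:
                break  # the job has reached its max-cores cap
            if w.free_cores >= executor_cores and w.free_mem_gb >= executor_mem_gb:
                w.free_cores -= executor_cores
                w.free_mem_gb -= executor_mem_gb
                w.executors.append(job)
                granted_cores += executor_cores
                progress = True
    return granted_cores // executor_cores


workers = [Worker("worker-1"), Worker("worker-2"), Worker("worker-3")]
job1_executors = place(workers, "job-1", executor_cores=4, executor_mem_gb=5, max_cores=8)
job2_executors = place(workers, "job-2", executor_cores=4, executor_mem_gb=6, max_cores=8)

for w in workers:
    used_gb = 10 - w.free_mem_gb
    print(f"{w.name}: executors={w.executors}, memory used={used_gb} GB of 10 GB")
```

Under these assumptions, job-1 gets two executors (on worker-1 and worker-2) and job-2 gets a single executor on worker-3, so no worker is asked for more than its 10 GB. The 11g figure in the report corresponds to one 5g and one 6g executor landing on the same worker, which this model never allows.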


