Posted to issues@spark.apache.org by "Anbang Hu (JIRA)" <ji...@apache.org> on 2018/09/12 00:59:00 UTC

[jira] [Updated] (SPARK-25410) Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable

     [ https://issues.apache.org/jira/browse/SPARK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anbang Hu updated SPARK-25410:
------------------------------
    Description: 
When deploying on YARN, only {{executorMemory}} is used to launch executors in [YarnAllocator.scala#L529|https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L529]:
{code}
              try {
                new ExecutorRunnable(
                  Some(container),
                  conf,
                  sparkConf,
                  driverUrl,
                  executorId,
                  executorHostname,
                  executorMemory,
                  executorCores,
                  appAttemptId.getApplicationId.toString,
                  securityMgr,
                  localResources
                ).run()
                updateInternalState()
              } catch {
{code}
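Why the missing overhead matters: inside {{ExecutorRunnable}}, that memory argument becomes the executor JVM's heap ceiling. A paraphrased sketch of that step (not verbatim Spark code; the corresponding logic lives in {{ExecutorRunnable.prepareCommand}}):

{code:scala}
// Sketch: how ExecutorRunnable turns its memory argument into the JVM heap flag.
// `executorMemory` is the same value (in MiB) passed to the constructor above.
def heapFlag(executorMemory: Int): String = {
  val executorMemoryString = executorMemory + "m"
  "-Xmx" + executorMemoryString // e.g. "-Xmx4096m"; memoryOverhead never appears here
}
{code}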
However, the resource capability requested for each executor is {{executorMemory + memoryOverhead}} in [YarnAllocator.scala#L142|https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L142]:

{code:scala}
  // Resource capability requested for each executors
  private[yarn] val resource = Resource.newInstance(executorMemory + memoryOverhead, executorCores)
{code}
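For a concrete sense of the gap, assume the defaults at this version: with {{spark.yarn.executor.memoryOverhead}} unset, the overhead is max(0.10 * executorMemory, 384 MB). A self-contained sketch (the constant names are illustrative, not Spark's; the factor and floor are assumptions based on the documented defaults):

{code:scala}
object OverheadGap {
  // Assumed YARN defaults: 10% of executor memory, with a 384 MiB floor.
  val OverheadFactor = 0.10
  val OverheadMinMiB = 384L

  def main(args: Array[String]): Unit = {
    val executorMemoryMiB = 4096L
    val overheadMiB = math.max((OverheadFactor * executorMemoryMiB).toLong, OverheadMinMiB)
    val containerMiB = executorMemoryMiB + overheadMiB
    println(s"requested from YARN per container: $containerMiB MiB")      // 4505 MiB
    println(s"passed to ExecutorRunnable:        $executorMemoryMiB MiB") // 4096 MiB
  }
}
{code}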

This means that the {{memoryOverhead}} portion of each container is reserved from YARN but never made available to the executor that runs the job, so it is effectively wasted.

Checking both k8s and Mesos, it looks like they both include the overhead memory in what the executor is actually given.
For k8s, in [ExecutorPodFactory.scala#L179|https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L179]:

{code}
    val executorContainer = new ContainerBuilder()
      .withName("executor")
      .withImage(executorContainerImage)
      .withImagePullPolicy(imagePullPolicy)
      .withNewResources()
        .addToRequests("memory", executorMemoryQuantity)
        .addToLimits("memory", executorMemoryLimitQuantity)
        .addToRequests("cpu", executorCpuQuantity)
        .endResources()
      .addAllToEnv(executorEnv.asJava)
      .withPorts(requiredPorts.asJava)
      .addToArgs("executor")
      .build()
{code}
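(The variable names above suggest the pod's memory request uses the plain executor memory while the limit, {{executorMemoryLimitQuantity}}, is built from executor memory plus overhead, so the full budget is enforced on the pod; this reading is inferred from the names rather than verified against that commit.)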

For Mesos, in [MesosSchedulerUtils.scala#L374|https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L374]:

{code}
  /**
   * Return the amount of memory to allocate to each executor, taking into account
   * container overheads.
   *
   * @param sc SparkContext to use to get `spark.mesos.executor.memoryOverhead` value
   * @return memory requirement as (0.1 * memoryOverhead) or MEMORY_OVERHEAD_MINIMUM
   *         (whichever is larger)
   */
  def executorMemory(sc: SparkContext): Int = {
    sc.conf.getInt("spark.mesos.executor.memoryOverhead",
      math.max(MEMORY_OVERHEAD_FRACTION * sc.executorMemory, MEMORY_OVERHEAD_MINIMUM).toInt) +
      sc.executorMemory
  }
{code}
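Taken together, a minimal sketch of the change this report implies on the YARN side would be to hand {{ExecutorRunnable}} the full container budget, as Mesos effectively does. This is hypothetical, not a proposed patch; only the memory argument differs from the call quoted above:

{code:scala}
new ExecutorRunnable(
  Some(container),
  conf,
  sparkConf,
  driverUrl,
  executorId,
  executorHostname,
  executorMemory + memoryOverhead, // hypothetical: was executorMemory
  executorCores,
  appAttemptId.getApplicationId.toString,
  securityMgr,
  localResources
).run()
{code}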


  was:
When deploying on YARN, only `executorMemory` is used to launch executors: https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L529

However, resource capability requested for each executor is `executorMemory + memoryOverhead`: https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L142

This means that the amount of `memoryOverhead` will not be used in running the job, hence wasted.

 Checking both k8s and Mesos, it looks like they both include overhead memory: https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L179
https://github.com/apache/spark/blob/18688d370399dcf92f4228db6c7e3cb186804c18/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L374

> Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-25410
>                 URL: https://issues.apache.org/jira/browse/SPARK-25410
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.3.1
>            Reporter: Anbang Hu
>            Priority: Major
>