You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by XuTingjun <gi...@git.apache.org> on 2014/12/26 03:34:44 UTC

[GitHub] spark pull request: specify AM core in yarn-client and yarn-cluste...

GitHub user XuTingjun opened a pull request:

    https://github.com/apache/spark/pull/3806

    specify AM core in yarn-client and yarn-cluster mode

    I add some configurations below.
    spark.yarn.am.cores/SPARK_MASTER_CORES/SPARK_DRIVER_CORES for yarn-client mode;
    spark.driver.cores for yarn-cluster mode.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/XuTingjun/spark SPARK1507-3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3806.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3806
    
----
commit 972fa20ff66f24078c98aca0c29ff56b911718ca
Author: Xutingjun <10...@qq.com>
Date:   2014-12-26T02:31:35Z

    specify AM core in yarn-client and yarn-cluster mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68666657
  
    @andrewor14


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68672836
  
    @sryza, do you mean "spark.driver.memory" works in yarn-client and yarn-cluster mode, so we should use one configuration maybe named "spark.driver.cores" to set am cores?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by WangTaoTheTonic <gi...@git.apache.org>.
Github user WangTaoTheTonic commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68822983
  
    Sorry but I already file a PR that splits spark.driver.memory in https://github.com/apache/spark/pull/3607.
    
    Could you please check if anyone already did the same work first?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3806#discussion_r22447819
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
    @@ -120,6 +121,13 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
               amMemory = value
               args = tail
     
    +        case ("--master-cores" | "--driver-cores") :: IntParam(value) :: tail =>
    --- End diff --
    
    no need to add in a deprecated "--master-cores" property


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68673436
  
    Yearh, I agree with you. Later I will fix this. Thanks @sryza


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68125091
  
    @sryza, I am not agree with you. I only add the below code into cluster mode. So the  "--driver-cores" will not work in client mode. 
    OptionAssigner(args.driverCores, YARN, CLUSTER, clOption = "--driver-cores"),



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68673313
  
    I think the best thing would be to split spark.driver.memory into spark.driver.memory and spark.yarn.am.memory, and to have the latter only work for the yarn-client AM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3806#discussion_r22447830
  
    --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala ---
    @@ -70,6 +70,8 @@ private[spark] class YarnClientSchedulerBackend(
           List(
             ("--driver-memory", "SPARK_MASTER_MEMORY", "spark.master.memory"),
             ("--driver-memory", "SPARK_DRIVER_MEMORY", "spark.driver.memory"),
    +        ("--driver-cores", "SPARK_MASTER_CORES", "spark.yarn.am.cores"),
    --- End diff --
    
    No need to add in a deprecated SPARK_MASTER_CORES env var.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun closed the pull request at:

    https://github.com/apache/spark/pull/3806


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-69095103
  
    Hi @XuTingjun there seems to be a certain degree of overlap with work in #3607. Also, my concern with this PR is that it conflates "driver" and "AM" in a few places. For instance, `--driver-cores` should never configure the number of cores used by the AM in client mode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3806#discussion_r22616963
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
    @@ -43,8 +44,13 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
     
       // Additional memory to allocate to containers
       // For now, use driver's memory overhead as our AM container's memory overhead
    -  val amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
    -    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
    +  val amMemoryOverhead = if (userClass != null) {
    +    sparkConf.getInt("spark.yarn.driver.memoryOverhead",
    +      math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
    +  } else {
    --- End diff --
    
    indent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68119304
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68669501
  
    @XuTingjun ah, that's correct.  Looking more closely, my confusion was stemming from some existing weirdness, which is that setting the "spark.driver.memory" property will set the application master memory even in client mode.  Also, that passing "--driver-memory" as part of the YARN code's ClientArguments (but not to `spark-submit`) will do this as well.
    
    My opinion is that we should fix those, though I suppose it could be argued that it breaks backwards compatibility?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68122279
  
    SPARK_MASTER_CORES uses "master" incorrectly.  The only reason we have a SPARK_MASTER_MEMORY was to preserve backwards capability.
    
    This patch also still appears to use "--driver-cores" and SPARK_DRIVER_CORES 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/3806#issuecomment-68676616
  
    @sryza, I have splited spark.driver.memory into spark.driver.memory and spark.yarn.am.memory. Please have a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org