Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2015/10/13 10:41:01 UTC

[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/9095

    [SPARK-11082][YARN] Fix wrong core number when response vcore is less than requested vcore

    This should be guarded against by using the vcore number from the response; it happens when the capacity scheduler uses the `DefaultResourceCalculator`, which is the default.
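
A minimal, self-contained sketch of the guard this description proposes; it is illustrative only (the hypothetical helper `coresToUse` is not part of the patch, whose actual change to `YarnAllocator` is quoted in the diffs later in this thread):

    // Prefer the vcore count YARN actually granted when it is lower than the
    // number requested. With the CapacityScheduler's DefaultResourceCalculator,
    // which accounts only for memory, a container is reported with 1 vcore.
    def coresToUse(requestedCores: Int, grantedCores: Int): Int =
      if (grantedCores < requestedCores) grantedCores else requestedCores

    // Example: coresToUse(requestedCores = 4, grantedCores = 1) == 1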

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-11082

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9095
    
----
commit 5fb7413b503a97141776a76413a8d7020f97e027
Author: jerryshao <ss...@hortonworks.com>
Date:   2015-10-13T08:06:20Z

    fix wrong vcore number

----




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147737702
  
    Yeah, I get it, thanks a lot for your explanation. Still, from a user's point of view it is easy to get confused; maybe we should document this difference.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jdesmet <gi...@git.apache.org>.
Github user jdesmet commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-216370906
  
    However, the memory reported for the containers in the YARN UI seems to largely match what I declared for the Spark executors. Also, the capacity scheduler does have the option to use a resource calculator capable of accounting for CPU utilization. That makes me (wrongly?) assume that the capacity scheduler can take (measured?) memory and CPU utilization into account.
    
    Sent from my iPhone
    
    > On May 2, 2016, at 10:39 AM, Marcelo Vanzin <no...@github.com> wrote:
    > 
    > why we can't report the correct vCores
    > 
    > @jdesmet Spark is not reporting anything, and that's the part you are confused about. YARN does all its accounting correctly. If Spark were able to influence YARN's accounting, that would be a huge bug in YARN.
    > 





[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147667700
  
    @srowen , not sure exactly what you mean?
    
    From what I know, `CoarseGrainedSchedulerBackend` manages executors by the number of cores they have available; this number is reported by each executor when it is launched and registers with the driver. The executor in turn gets its core count from an argument in the launch command, so if we pass the wrong number of cores, the driver also sees the wrong number, which will differ from what the cluster manager reports.
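
A toy, self-contained sketch of the flow described above, with hypothetical names (the real logic lives in `CoarseGrainedSchedulerBackend` and the executor launch path): whatever core count the executor is launched with is the number the driver schedules tasks against, independent of what YARN granted.

    object CoreReportingSketch {
      // What an executor reports when it registers with the driver.
      final case class RegisterExecutor(executorId: String, cores: Int)

      // Driver-side bookkeeping: tasks are scheduled against the reported cores.
      final class DriverBookkeeping {
        private var freeCores = Map.empty[String, Int]
        def register(msg: RegisterExecutor): Unit =
          freeCores += (msg.executorId -> msg.cores)
        def maxConcurrentTasks(executorId: String, cpusPerTask: Int = 1): Int =
          freeCores.getOrElse(executorId, 0) / cpusPerTask
      }

      def main(args: Array[String]): Unit = {
        val driver = new DriverBookkeeping
        // Executor launched with "--cores 4" even though YARN's
        // DefaultResourceCalculator only accounted for 1 vcore.
        driver.register(RegisterExecutor(executorId = "1", cores = 4))
        println(driver.maxConcurrentTasks("1")) // prints 4: the driver trusts the launch argument
      }
    }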




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41866472
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    Memory should never be less than requested. vcores support was added later, though, and if it is configured off or the scheduler doesn't support it then it is possible to get back less. As mentioned, the `DefaultResourceCalculator` just always returns 1.
    
    There is already a comment on this at `matchContainerToRequest`. Is this actually failing, or were you just surprised at what you got?





[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41858120
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    Not sure whether, from the YARN side, the granted memory can possibly be less than requested; I haven't run into such a problem yet.





[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147769172
  
    Yes, it's really more a YARN problem than a Spark problem. Ideally the YARN side wouldn't show cores at all if you aren't using a scheduler that handles cores, but that is kind of hard because you can write your own scheduler that does anything.
    
    I'm fine with documenting it, but if you look at the running-on-YARN page it already has the following under important notes:
    
    Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
    
    If you have ideas on making that documentation better, I'm fine with it.





[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147767229
  
    There's related discussion about this in https://issues.apache.org/jira/browse/SPARK-6050 and the respective PR (#4818).




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147649014
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147734023
  
    Actually, YARN doesn't allocate any. The only reason it reports 1 is that CPU scheduling is disabled and it is trying to return something reasonable. YARN does not limit you to 1 core.
    Before the CPU scheduler was available this was the only way to get more than 1 core for your application, and if you are on an older version of Hadoop you didn't have the CPU scheduler as an option. Basically, if YARN isn't managing the resource then it is up to the user to do something reasonable with it.





[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147651353
  
      [Test build #43639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43639/consoleFull) for   PR 9095 at commit [`5fb7413`](https://github.com/apache/spark/commit/5fb7413b503a97141776a76413a8d7020f97e027).




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147656497
  
      [Test build #43639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43639/console) for   PR 9095 at commit [`5fb7413`](https://github.com/apache/spark/commit/5fb7413b503a97141776a76413a8d7020f97e027).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41856996
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    @srowen , we already have such defensive code for memory.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147898407
  
    Thanks a lot @tgravescs and @vanzin , looks like this behavior is intentional. Greatly appreciate your explanation, I will close this.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147665208
  
    CC @sryza @vanzin: seems reasonable to make sure it's actually allocating what YARN said it could?
    
    Is this really the extent of the assumption, though? It seems like Spark is otherwise assuming, elsewhere, that the number of cores it wanted was the number of cores it got.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao closed the pull request at:

    https://github.com/apache/spark/pull/9095




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147721566
  
    So I'm actually against this change. It breaks backwards compatibility, and I think the current behavior is what we want.
    
    @jerryshao  why do you think this is a problem?
    
    If YARN doesn't schedule for cores, then the options are to limit Spark to what YARN gives you (which is 1 simply as a default, since YARN isn't managing cores) or to allow Spark to go ahead and use what the user asked for. The way it is now (without this patch) it allows Spark to use more than 1, since the scheduler can't schedule them. It's up to the user to do something reasonable. Otherwise there is no way to let Spark use more than 1 core with the `DefaultResourceCalculator`, which I think would be a limitation.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147668124
  
    Gotcha. This is probably my ignorance/misunderstanding then. As long as this is the only place where the requested amount can differ from the granted amount.
    
    Does the same thing happen with memory?




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147649000
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jdesmet <gi...@git.apache.org>.
Github user jdesmet commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-215998052
  
    From a user's point of view, closing this issue as-is is unacceptable. I cannot understand why one would allow wrong job accounting for the executors as reported in YARN. This could affect the integrity of an entire cluster due to over-scheduling.
    
    Part of the discussion is about explaining how to fix it with a different resource scheduler - of which I do not understand the details - but there was no documentation to be found.
    
    I am looking at a pretty big cluster for a pretty big company with a lot of YARN-scheduled jobs running on it - this worries me. It is pretty common to have executors running with 32 vcores or more, and when running with that much on one node I have to be sure that YARN does not schedule anything else in.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147727971
  
    If a user wants to set executor cores to more than 1, they should choose the dominant resource calculator; that keeps the Spark and YARN sides consistent.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147727046
  
    But on YARN's side only 1 vcore is actually allocated, whereas on the driver side more than 1 core is reported when the executor registers. This is not consistent and breaks the semantics of "resource": the driver will schedule more than 1 task on this executor simultaneously, but the actual parallelism is only 1.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41872358
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    It will not fail; it just made me quite confused to see that the number of cores I set is different from what is displayed on the YARN side.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147656903
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41866396
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -414,7 +418,7 @@ private[yarn] class YarnAllocator(
             executorId,
             executorHostname,
             executorMemory,
    -        executorCores,
    +        container.getResource.getVirtualCores,
    --- End diff --
    
    does this actually fail when using executorCores? 




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-216305430
  
    > why we can't report the correct vCores
    
    @jdesmet Spark is not reporting anything, and that's the part you are confused about. YARN does all its accounting correctly. If Spark were able to influence YARN's accounting, that would be a huge bug in YARN.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jdesmet <gi...@git.apache.org>.
Github user jdesmet commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-216016788
  
    Humbly, I think I understood what this PR was about. I probably (still) do not understand some of the reasoning as to why we can't report the correct vCores even when the default resource calculator does not support them and vCores are not used. The thread seemed to suggest it is possible, and it was actually attempted in some modifications that were later undone. Don't take this as me saying it's wrong; it is probably just that you have a better understanding of it. However, nothing against documenting it further?
    
    Also, to confirm that I am not misunderstanding anything, as per the threads and documentation, to get scheduling to work based on vCore resource allocation the following steps need to be accomplished (a configuration sketch follows below):
    
    1.  Use the CapacityScheduler: in `conf/yarn-site.xml`, set `yarn.resourcemanager.scheduler.class` to `org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler`.
    2.  Switch the resource calculator to one that supports vCores: set `yarn.scheduler.capacity.resource-calculator` to `org.apache.hadoop.yarn.util.resource.DominantResourceCalculator`.
    
    Probably we need to file a bug to get the Hadoop documentation fixed from `DefaultResourseCalculator` to `DefaultResourceCalculator`.
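
A sketch of the configuration corresponding to the two steps above; the file locations are assumptions based on common Hadoop setups (`conf/yarn-site.xml` for step 1, and the resource calculator typically lives in `capacity-scheduler.xml`), so verify property names and files against your Hadoop version:

    <!-- yarn-site.xml: use the CapacityScheduler (step 1) -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

    <!-- capacity-scheduler.xml: account for vcores as well as memory (step 2) -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>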




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-216407536
  
    @jdesmet , by default, if CPU scheduling is not enabled in YARN, what you see on YARN's web UI about vcore usage (1 per container) is actually meaningless. I think that is what confuses you: you specified a different number, but YARN only shows 1 core there.
    
    This is only a YARN UI issue, which is quite misleading if CPU scheduling is not enabled; internally, in YARN's scheduling, all the resource accounting is correct.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147734624
  
    Sometimes it's not up to the user what scheduler they use. In our case, for example, cluster admins choose what runs and users just use it. They have to use whatever scheduler is provided. If the cluster admins want to enforce CPU usage then they need to enable CPU scheduling. If CPU scheduling isn't on then they have to go smack users that abuse it.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-216000549
  
    @jdesmet you did not understand what this PR was about. Nothing you're saying is affected by this PR. Accounting of core usage in YARN is not changed. Please read the whole discussion and linked PRs to understand why this doesn't affect any accounting at all.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9095#issuecomment-147656906
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43639/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by jdesmet <gi...@git.apache.org>.
Github user jdesmet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r61675971
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    Note that wrong vcore accounting in YARN can affect system integrity due to over-scheduling the CPU. It is mandatory to have this working correctly if Spark is to be a good citizen on YARN (alongside other scheduled apps, or other instances of itself).




[GitHub] spark pull request: [SPARK-11082][YARN] Fix wrong core number when...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9095#discussion_r41857332
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -395,6 +395,10 @@ private[yarn] class YarnAllocator(
           val executorId = executorIdCounter.toString
     
           assert(container.getResource.getMemory >= resource.getMemory)
    --- End diff --
    
    True. I mean, should we expect a case where the granted memory is less than requested as well, and allow or handle it? Right now it's rejected, so I expect it can't happen. But then again the code seemed to assume that (sort of) about vcores too.

