Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2017/08/24 21:37:42 UTC

[GitHub] spark pull request #19046: [SPARK-18769][yarn] Limit resource requests based...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/19046

    [SPARK-18769][yarn] Limit resource requests based on RM's available resources.

    This change limits the number of resource requests Spark will make to the
    YARN RM based on the number of available resources the RM advertises. This
    lets Spark "play nice" by reducing load on the RM when making requests,
    and also slightly speeding up the allocation code when the driver thinks
    it needs a lot of containers but the RM doesn't have enough free resources.
    
    Tested with added unit test, and by monitoring the allocator behavior with
    dynamic allocation on a spark-shell session.
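The capping logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the `Resource` tuple and names like `requests_to_send` are assumptions made for the example, and the real allocator works in terms of YARN's `AMRMClient` headroom.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    memory_mb: int
    vcores: int

def requests_to_send(target: int, running: int, pending: int,
                     headroom: Resource, per_executor: Resource) -> int:
    """How many new container requests to issue, capped by the RM's
    advertised free resources (illustrative sketch only)."""
    missing = target - running - pending
    if missing <= 0:
        return 0
    # How many executors the advertised headroom could currently hold.
    fit_by_mem = headroom.memory_mb // per_executor.memory_mb
    fit_by_cpu = headroom.vcores // per_executor.vcores
    fits = min(fit_by_mem, fit_by_cpu)
    # Never ask for more than the RM says it can satisfy right now,
    # beyond what we already have pending.
    return max(0, min(missing, fits - pending))
```

For example, with a target of 100 executors (10 running, 5 pending) and 64 GB / 32 vcores of advertised headroom, 4 GB / 2-core executors would be requested 11 at a time rather than 85 at once.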

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-18769

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19046.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19046
    
----
commit 35d57a2d2989024884f7e63906e94e826a2c3801
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2017-08-23T22:56:17Z

    [SPARK-18769][yarn] Limit resource requests based on RM's available resources.
    
    This change limits the number of resource requests Spark will make to the
    YARN RM based on the number of available resources the RM advertises. This
    lets Spark "play nice" by reducing load on the RM when making requests,
    and also slightly speeding up the allocation code when the driver thinks
    it needs a lot of containers but the RM doesn't have enough free resources.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Unfortunately that isn't clear to me as to the cause or what the issue is on the YARN side.
    I'm not sure what he means by "AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs".
    
    Also, do you know if "Due to the sustained backlog trigger doubling the request size each time it floods the scheduler with requests." is talking about the exponential increase in the container requests that Spark does? i.e., if we did all the requests at once up front, would it be better?
    
    I would also like to know what issue this causes in YARN.
    
    His last statement is also not always true. The MR AM only takes headroom into account with slow start and when preempting reduces to run maps. It may very well be that you are using slow start, and if you have it configured very aggressively (meaning start reduces very early) it could be spread out quite a bit, but you are also possibly wasting resources for the tradeoff of maybe finishing sooner, depending on the job.



---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    I would like to clarify why we are doing this. The jira has some discussion on it, but I would like to know exactly what we are improving/fixing.
    
    If we do this change, I definitely want a config for it, and I'd prefer it off by default. As I mentioned in the jira, if you don't ask for containers, then when containers are freed up you might miss them, because YARN gives them to someone else already asking. This can be an issue on a very busy cluster, and I could see many apps getting worse performance.
    
    The reasons I've seen mentioned in the jira are memory in the AM and events in the Spark driver. This change isn't going to help anything if your cluster is actually large enough to get, say, 50,000 containers. Memory is still going to be used. Hence I want to define what the problem is.
    
    I'm also not sure why we don't just ask for all the containers up front. I was going to take a look at this to see if there was any info originally, but never got around to it. I think it was just being cautious in the first implementation. Asking up front all at once would spare us having to ask for containers on every heartbeat, and I assume it would help the event processing issues they mention in the driver (but I'm not sure, since there were no details). The only exception is if you are cancelling requests.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81108/testReport)** for PR 19046 at commit [`03477f5`](https://github.com/apache/spark/commit/03477f5282ba00b40915bae32dc5bd48c946e27c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    > This could just be an ad hoc queue but the spark users would lose out to tez/mapreduce users. I'm pretty positive this will hurt spark users on some of our clusters so would want performance numbers to prove it doesn't. Otherwise would want the **config** to turn it off. Another way to possibly help this problem would be to ask for a certain percentage over the actual limit.
    
    I think this is the only concern that @tgravescs has? @vanzin, would it be possible to make this new logic configurable?
    
    We have seen Spark's aggressive reservation requests cause certain problems, like YARN preemption not kicking in. It would be great to have this fix in.
    
    Thank you!



---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81104/testReport)** for PR 19046 at commit [`35d57a2`](https://github.com/apache/spark/commit/35d57a2d2989024884f7e63906e94e826a2c3801).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    @tgravescs, here's a quote from Wilfred Spiegelenburg; hope it answers both of your questions.
    
    > The behaviour I discussed earlier around the Spark AM reservations is not optimal. It turns out that the AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs. This seems to be triggered by getting a container assigned to the application by the scheduler. Due to the sustained backlog trigger doubling the request size each time it floods the scheduler with requests. This issue will be logged as an internal jira at first. The next steps will be to discuss that behaviour with the Spark team with the goal of making it behave better on the cluster. The MR AM does behave better in this respect as it takes into account the available resources for the application via what is called "headroom". The Spark AM does not do this. 
    
    thanks.


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    It might help if you can give an exact scenario where you see the issue, and perhaps configs if those matter. Meaning, do you have some minimum number of containers, a small idle timeout, etc.?


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #19046: [SPARK-18769][yarn] Limit resource requests based...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin closed the pull request at:

    https://github.com/apache/spark/pull/19046


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    I'm going to close this; when I find some free time I might take a closer look at the issue described in Wilfred's message.


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    I'm in general not a fan of adding more and more config options; the end result is that most people won't enable it until they run into problems, and to me that's too late.
    
    I'm still waiting for feedback from our YARN team, to understand why it can't be fixed in YARN instead.


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    > The MR AM does something similar.
    
    Can you be more specific here, which exact code are you referring to? The MR AM does do some headroom calculation for things like slow start and preempting reducers to run maps. I don't think it does it otherwise.
    
    From the comment:
    Maps are scheduled as soon as their requests are received. Reduces are 
      added to the pending and are ramped up (added to scheduled) based 
      on completed maps and current availability in the cluster.
    
    Spark doesn't have slow start (it doesn't start next-stage tasks before the previous stage has finished), so if you look at the code you see:
    //if all maps are assigned, then ramp up all reduces irrespective of the headroom
    
    > Why? The only downside you mention is that if the cluster is completely full, then another app may be making requests and may get a container sooner than Spark. That sounds like an edge case where you really should be isolating those apps into separate queues if such resource contention is a problem.
    
    I disagree. We run some clusters very busy, and I think if you don't ask for the containers up front, you will rarely get the available headroom to ask for more; whereas if you have some container requests in there when other containers free up, your application will get them since you are next in line. Users should not have to use different dedicated queues for this to work. This could just be an ad hoc queue, but the Spark users would lose out to Tez/MapReduce users. I'm pretty positive this will hurt Spark users on some of our clusters, so I would want performance numbers to prove it doesn't. Otherwise I would want the config to turn it off. Another way to possibly help this problem would be to ask for a certain percentage over the actual limit.
    
    > If all of those containers are actually allocated to apps, or your application's queue actually limits the number of containers your app can get to a much smaller number, then yes it can help.
    
    If the problem is the RM memory issue, then how does this make it better? I still do 50,000 container requests; sure, they do eventually get allocated if the cluster is large enough, but I still had to make those requests and the RM had to handle them. Yes, when they get allocated they do go away, so I guess it could be a bit better, but either way the RM has to handle the worst case, so it has to handle 50,000 container requests from possibly more than one app. I don't see this helping anything for large clusters.
    
    > Wouldn't that mean that YARN would actually allocate resources you don't need? 
    
    No, you ask immediately for all the containers you are going to need for that stage. You would start them, unless of course by the time you get them you don't need them anymore. If you happen to have tasks that finish fast, then you would have to cancel the container requests or give them back, but if they don't, you aren't wasting anything. I'm not sure how much it matters, since we ramp up the ask pretty quickly, but if that is causing a lot of overhead in the Spark driver then perhaps we should switch to doing that, or do some analysis on it.
    
    Again, that comes down to the actual problem we are trying to solve here. Do you know whether the problem was purely the total # of container requests, the rate at which it was making requests, or something else? What was it causing on the RM? Memory usage? Scheduler slowdown?
    
    This feels to me like a hack for small clusters. If you have a small cluster, limit the max # of containers to ask for in the first place. I do that even on our large clusters, and personally I would be ok with changing the default in Spark. If you are still seeing a lot of YARN issues with it, then I would be ok putting this in, perhaps with the change to ask for a few over your limit, but I would still want a config for it unless we had some pretty positive performance numbers on a large cluster.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    There are other options, like changing the default for max executors to something reasonable.
    
    I'm not sure if it's related or not, but we are also looking at adding a config to limit the # of tasks run in parallel. When that gets implemented, it also limits the # of containers needed. https://github.com/apache/spark/pull/18950
    
    Oh, I think the other reason for the exponential increase in dynamic allocation was for really short tasks, when you wouldn't need all the containers because they finish so quickly.
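The exponential increase under discussion works roughly like this: while the task backlog is sustained, each trigger doubles the number of executors added on top of the previous target. This is a simplified sketch of that ramp-up; the function name and the `initial` parameter are assumptions for illustration, not Spark's actual config keys.

```python
def ramp_up_targets(needed: int, initial: int = 1) -> list[int]:
    """Successive executor-request targets while the backlog is
    sustained: the increment doubles on each trigger until the
    total need is covered (illustrative sketch only)."""
    targets = []
    total, step = 0, initial
    while total < needed:
        total = min(needed, total + step)
        targets.append(total)
        step *= 2  # sustained backlog doubles the next ask
    return targets
```

So for 20 needed executors the targets grow 1, 3, 7, 15, 20: a handful of heartbeats reach the full ask, but each heartbeat sends a fresh, larger request to the scheduler, which is the flooding behavior Wilfred describes.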


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    I was told that the preemption issue was fixed in YARN (that's YARN-6210), so I don't think there's a need for this code currently (just use a fixed YARN if you want reliable preemption).
    
    Wilfred is on vacation so I can ask him details, but:
    
    > It turns out that the AM is releasing and then acquiring the reservations again and again until it has enough to run all tasks that it needs.
    
    That could be an issue (I don't remember the details of the Spark allocator code), but at the same time it sounds like a different issue than the one this was trying to fix.


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    @Tagar can you be more specific about the problems you are seeing? How does this affect preemption? Why don't you see the same issues on MapReduce/Tez?


---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81104/
    Test PASSed.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81105/
    Test PASSed.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Yeah, if we are releasing and then reacquiring right away, over and over again, that would be bad, but I don't know when we would do that, so more details would be great if possible.



---



[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    This has been a long-time request from our YARN team; they've had users run into issues with YARN and traced them back to Spark making a large number of container requests. You can argue that it's an issue in YARN, and you'd be correct, but that doesn't mean Spark cannot try to help.
    
    The MR AM does [something similar](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java).
    
    >  looking at adding a config to limit # of tasks in parallel.
    
    That could help because it would indirectly limit the number of containers Spark would request, but it hardly feels like a substitute for this, since I expect very few people to use that.
    
    > I definitely want a config for it and prefer off by default
    
    Why? The only downside you mention is that if the cluster is completely full, then another app may be making requests and may get a container sooner than Spark. That sounds like an edge case where you really should be isolating those apps into separate queues if such resource contention is a problem.
    
    One thing that would probably be ok is to use `spark.dynamicAllocation.maxExecutors` as the upper limit if it's explicitly set, instead of this. But the default for that config is `Int.MaxValue`, so by default I think it makes sense to have a way to automatically limit these requests without users having to make guesses about what values they have to set.
    
    > This change isn't going to help anything if your cluster is actually large enough to get say 50000 containers. 
    
    If all of those containers are actually allocated to apps, or your application's queue actually limits the number of containers your app can get to a much smaller number, then yes, it can help.
    
    > I'm also not sure why we just don't ask for all the containers up front.
    
    Wouldn't that mean that YARN would actually allocate resources you don't need? That sounds to me like a middle ground between dynamic allocation and fixed allocation, where you allocate the containers but don't actually start executors, and I'm not sure what that would help with.



---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    BTW, in spite of the above, if you really feel strongly about it I might just drop this and tell the YARN folks to make their code scale better. But I really don't see the downsides you seem worried about here.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81105/testReport)** for PR 19046 at commit [`c356734`](https://github.com/apache/spark/commit/c356734be82146e258b275d15926da82cd4c359f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81108/testReport)** for PR 19046 at commit [`03477f5`](https://github.com/apache/spark/commit/03477f5282ba00b40915bae32dc5bd48c946e27c).


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81105/testReport)** for PR 19046 at commit [`c356734`](https://github.com/apache/spark/commit/c356734be82146e258b275d15926da82cd4c359f).


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    As I said, I do believe this is first and foremost a YARN problem, and this was just trying to make Spark not trigger it. The things you're worried about can be adjusted (e.g. instead of using the current headroom, keep track of the maximum headroom Spark has seen as an approximation of cluster size), but ultimately it's still a YARN problem.
    
    I've asked the YARN guys who requested this to take a look and clarify what the actual problem they want fixed is.


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    **[Test build #81104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81104/testReport)** for PR 19046 at commit [`35d57a2`](https://github.com/apache/spark/commit/35d57a2d2989024884f7e63906e94e826a2c3801).


---


[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81108/
    Test PASSed.


---