Posted to reviews@spark.apache.org by nishkamravi2 <gi...@git.apache.org> on 2014/07/13 07:42:07 UTC

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

GitHub user nishkamravi2 opened a pull request:

    https://github.com/apache/spark/pull/1391

    Modify default YARN memory_overhead-- from an additive constant to a multiplier

    Related to https://github.com/apache/spark/pull/894/ and https://issues.apache.org/jira/browse/SPARK-2398. Experiments show that memory_overhead grows with container size. The multiplier has been experimentally obtained and can potentially be improved over time.
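    In rough terms, the change replaces the flat additive default with a
    multiplier of the executor/driver memory, floored at a minimum value. A
    minimal sketch of the idea in Scala (the 0.07 factor is the value discussed
    below; the 384 MB floor is an assumed stand-in for the old additive
    constant, not necessarily the exact value in the patch):

        object MemoryOverheadSketch {
          // Assumed values for illustration: the multiplier discussed in this
          // thread and a floor standing in for the old additive default (MB).
          val MEMORY_OVERHEAD_FACTOR = 0.07
          val MEMORY_OVERHEAD_MIN = 384

          // executorMemory is in MB; an explicitly configured
          // spark.yarn.executor.memoryOverhead still takes precedence, as in
          // the diff quoted later in this thread.
          def defaultOverhead(executorMemory: Int): Int =
            math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
              MEMORY_OVERHEAD_MIN)
        }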

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nishkamravi2/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1391
    
----
commit 681b36f5fb63e14dc89e17813894227be9e2324f
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-05-08T07:05:33Z

    Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
    
    The prefix "file:" is missing in the string inserted as key in HashMap

commit 5108700230fd70b995e76598f49bdf328c971e77
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-03T22:25:22Z

    Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)

commit 6b840f017870207d23e75de224710971ada0b3d0
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-03T22:34:02Z

    Undo the fix for SPARK-1758 (the problem is fixed)

commit df2aeb179fca4fc893803c72a657317f5b5539d7
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-09T19:02:59Z

    Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)

commit eb663ca20c73f9c467192c95fc528c6f55f202be
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-09T19:04:39Z

    Merge branch 'master' of https://github.com/apache/spark

commit 5423a03ddf4d747db7261d08a64e32f44e8be95e
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-10T20:06:07Z

    Merge branch 'master' of https://github.com/apache/spark

commit 3bf8fad85813037504189cf1323d381fefb6dfbe
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-16T05:47:00Z

    Merge branch 'master' of https://github.com/apache/spark

commit 2b630f94079b82df3ebae2b26a3743112afcd526
Author: nravi <nr...@c1704.halxg.cloudera.com>
Date:   2014-06-16T06:00:31Z

    Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark

commit efd688a4e15b79e92d162073035b03362fcf66f0
Author: Nishkam Ravi <nr...@cloudera.com>
Date:   2014-07-13T00:04:17Z

    Merge branch 'master' of https://github.com/apache/spark

commit 715201e21a963c1d435a167d79e25c5b55b73885
Author: Nishkam Ravi <nr...@cloudera.com>
Date:   2014-07-13T05:33:12Z

    Modify default YARN memory_overhead: from an additive constant to a multiplier

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836123
  
    You are missing my point, I think ... To give an unscientific anecdotal
    example: our GBDT experiments, which run on about 22 nodes, need no tuning,
    while our collaborative filtering experiments, running on 300 nodes, require
    much higher overhead.
    But QR factorization on the same 300 nodes needs much lower overhead.
    The values are all over the place and very app specific.
    
    In an effort to ensure jobs always run to completion, setting overhead to a
    high fraction of executor memory might ensure successful completion, but at
    a high performance loss and with substandard scaling.
    
    I would like a good default estimate of overhead ... But that is not
    fraction of executor memory.
    Instead of trying to model the overhead using executor memory, it would be
    better to look at the actual parameters which influence it (as in, look at
    the code and figure it out, followed by validation and tuning of course)
    and use that as the estimate.
     On 13-Jul-2014 2:58 pm, "nishkamravi2" <no...@github.com> wrote:
    
    > That's why the parameter is configurable. If you have jobs that cause
    > 20-25% memory_overhead, default values will not help.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835881>.
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835881
  
    That's why the parameter is configurable. If you have jobs that cause 20-25% memory_overhead, default values will not help. 



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835312
  
    We have gone over this in the past ... it is suboptimal to make it a linear
    function of executor/driver memory.
    Overhead is a function of the number of executors, the number of open files,
    shuffle vm pressure, etc.
    It is NOT a function of executor memory, which is why it is separately
    configured.
     On 13-Jul-2014 11:16 am, "UCB AMPLab" <no...@github.com> wrote:
    
    > Can one of the admins verify this patch?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48832590>.
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 closed the pull request at:

    https://github.com/apache/spark/pull/1391



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835447
  
    That makes sense, but then it doesn't explain why a constant amount works for a given job when executor memory is low, and then doesn't work when it is high. This has also been my experience and I don't have a great grasp on why it would be. More threads and open files in a busy executor? It goes indirectly with how big you need your executor to be, but not directly.
    
    Nishkam do you have a sense of how much extra memory you had to configure to get it to work when executor memory increased? is it pretty marginal, or quite substantial?



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56342496
  
    If #2485 is the replacement, can we close this one out?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-49480064
  
    I'll let mridul comment on this, but I think adding a comment explaining where 0.06 came from would be useful.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-49348179
  
    Bringing the discussion back online. Thanks for all the input so far. 
    
    Ran a few experiments yesterday and today. The number of executors (which was the other main handle we wanted to factor in) doesn't seem to have any noticeable impact. Tried a few other parameters such as num_partitions and default_parallelism, but nothing sticks. Confirmed the proportionality with container size. Have also been trying to tune the multiplier to minimize potential waste, and I think 6% (as opposed to 7% as we currently have) is the lowest we should go. Modifying the PR accordingly.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56120931
  
    Updated as per @andrewor14 's comments





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835769
  
    You are lucky :-) for some of our jobs, in an 8gb container, the overhead is
    1.8gb!
    On 13-Jul-2014 2:41 pm, "nishkamravi2" <no...@github.com> wrote:
    
    > Sean, the memory_overhead is fairly substantial. More than 2GB for a 30GB
    > executor. Less than 400MB for a 2GB executor.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835560>.
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836619
  
    Hmm, looks like some of my responses to Sean via mail reply have not shown up here ... Maybe mail gateway delays ?



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56346539
  
    Shall we let this linger on for just a bit until the other one gets merged?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56119506
  
    What is the current state of this PR? @tgravescs @mridulm any more thoughts about the current approach? This is a related PR for mesos and I'm wondering if we can use the same approach in both places.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48832590
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835618
  
    That would be a function of your jobs.
    Other apps would have drastically different characteristics ... which is
    why we can't generalize to a simple fraction of executor memory.
    It actually buys us nothing in the general case ... jobs will continue to
    fail when it is incorrect, while wasting a lot of memory.
    On 13-Jul-2014 2:38 pm, "nishkamravi2" <no...@github.com> wrote:
    
    > Yes, I'm aware of the discussion on this issue in the past. Experiments
    > confirm that overhead is a function of executor memory. Why and how can be
    > figured out with due diligence and analysis. It may be a function of other
    > parameters and the function may be fairly complex. However, the
    > proportionality is undeniable. Besides, we are only adjusting the default
    > value and making it a bit more resilient. The memory_overhead parameter can
    > still be configured by the developer separately. The constant additive
    > factor makes little sense (empirically).
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835500>.
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835566
  
    The default constant is actually a lower bound to account for other
    overheads (since yarn will aggressively kill tasks)... Unfortunately we
    have not sized this properly and don't have a good recommendation on how
    to set it.
    
    This is compounded by magic constants in spark for various IO ops,
    non-deterministic network behaviour (we should be able to estimate an
    upper bound here = 2x the number of workers), vm memory use (shuffle
    output is mmap'ed whole ... running foul of yarn virtual mem limits)
    and so on.
    
    Hence sizing this is, unfortunately, app specific.
     On 13-Jul-2014 2:34 pm, "Sean Owen" <no...@github.com> wrote:
    
    > That makes sense, but then it doesn't explain why a constant amount works
    > for a given job when executor memory is low, and then doesn't work when it
    > is high. This has also been my experience and I don't have a great grasp on
    > why it would be. More threads and open files in a busy executor? It goes
    > indirectly with how big you need your executor to be, but not directly.
    >
    > Nishkam do you have a sense of how much extra memory you had to configure
    > to get it to work when executor memory increased? is it pretty marginal, or
    > quite substantial?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835447>.
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836527
  
    correction: (and that is NOT very high as a fraction).
    
    Typing on phones can suck :-)
    
    
    To add to Sean's point: we definitely need to estimate this better.
    I want to ensure we do that on the right parameters, to minimize memory waste while giving good out-of-the-box behaviour.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836408
  
    On Jul 13, 2014 3:16 PM, "nishkamravi2" <no...@github.com> wrote:
    >
    > Mridul, I think you are missing the point. We understand that this
    parameter will in a lot of cases have to be specified by the developer,
    since there is no easy way to model it (that's why we are retaining it as a
    configurable parameter). However, the question is what would be a good
    default value be.
    >
    
    It does not help to estimate using the wrong variable.
    Any correlation which exists is incidental and app specific, as I
    elaborated before.
    
    The only actual correlation between executor memory and overhead is java vm
    overheads in managing very large heaps (and that is very high as a
    fraction). Other factors in spark have far higher impact than this.
    
    > "I would like a good default estimate of overhead ... But that is not
    > fraction of executor memory. "
    >
    > You are mistaken. It may not be a directly correlated variable, but it is
    most certainly indirectly correlated. And it is probably correlated to
    other app-specific parameters as well.
    
    Please see above.
    
    >
    > "Until the magic explanatory variable is found, which one is less
    problematic for end users -- a flat constant that frequently has to be
    tuned, or an imperfect model that could get it right in more cases?"
    >
    > This is the right point of view.
    
    Which has been our view even in previous discussions :-)
    It is unfortunate that we did not approximate this better from the start
    and went with the constant from the prototype impl.
    
    Note that this estimation would be very volatile to spark internals
    
    >
    > —
    > Reply to this email directly or view it on GitHub.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835727
  
    Yes of course, lots of settings' best or even usable values are ultimately app-specific. Ideally, defaults work for lots of cases. A flat value is the simplest of models, and anecdotally, the current default value does not work in medium- to large-memory YARN jobs. You can increase the default, but then the overhead gets silly for small jobs -- 1GB? And all of these are not-uncommon use cases.
    
    None of that implies the overhead logically scales with container memory. Empirically, it may do, and that's useful. Until the magic explanatory variable is found, which one is less problematic for end users -- a flat constant that frequently has to be tuned, or an imperfect model that could get it right in more cases? 
    
    That said, it is kind of a developer API change and feels like something to not keep reimagining.
    
    Nishkam, can you share any anecdotal evidence about how the overhead changes? If executor memory is the only variable changing, that seems to be evidence against it being driven by other factors, but I don't know if that's what we know.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-49483642
  
    6% was experimentally obtained (with the goal of keeping the bound as tight as possible without the containers crashing). Three workloads were experimented with: PageRank, WordCount and KMeans over moderate to large input datasets and configured such that the containers are optimally utilized (neither under-utilized nor over-subscribed). Based on my observations, less than 5% is a no-no. If someone would like to tune this parameter more and make a case for a higher value (keeping in mind that this is a default value that will not cover all workloads), that would be helpful. 
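    For scale, here is what a 6% multiplier works out to for a few of the
    executor sizes mentioned in this thread (illustrative arithmetic only;
    sizes in MB):

        // Illustrative only: a 0.06 multiplier applied to 2 GB, 8 GB and 30 GB executors.
        Seq(2048, 8192, 30720).foreach { execMem =>
          val overhead = (0.06 * execMem).toInt
          println(s"executor ${execMem} MB -> default overhead ~${overhead} MB")
        }
        // roughly 122 MB, 491 MB and 1843 MB respectively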



Re: [GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by Mridul Muralidharan <mr...@gmail.com>.
You are lucky :-) for some of our jobs, in an 8gb container, the overhead is
1.8gb!
On 13-Jul-2014 2:40 pm, "nishkamravi2" <gi...@git.apache.org> wrote:

> Github user nishkamravi2 commented on the pull request:
>
>     https://github.com/apache/spark/pull/1391#issuecomment-48835560
>
>     Sean, the memory_overhead is fairly substantial. More than 2GB for a
> 30GB executor. Less than 400MB for a 2GB executor.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835560
  
    Sean, the memory_overhead is fairly substantial. More than 2GB for a 30GB executor. Less than 400MB for a 2GB executor. 



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56347057
  
    Noticed that we have a reference to this one in #2485, closing it out.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48832556
  
    Also see: https://issues.apache.org/jira/browse/SPARK-2444



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56132524
  
    @nishkamravi2 mind resolving the merge conflicts?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835656
  
    The basic issue is that you are trying to model overhead using the wrong
    variable... It actually has no correlation with executor memory (other than
    vm overheads as the heap increases).
    On 13-Jul-2014 2:44 pm, "Mridul Muralidharan" <mr...@gmail.com> wrote:
    
    > That would be a function of your jobs.
    > Other apps would have a drastically different characteristics ... Which is
    > why we can't generalize to a simple fraction of executor memory.
    > It actually buys us nothing in general case ... Jobs will continue to fail
    > when it is incorrect : while wasting a lot of memory
    > On 13-Jul-2014 2:38 pm, "nishkamravi2" <no...@github.com> wrote:
    >
    >> Yes, I'm aware of the discussion on this issue in the past. Experiments
    >> confirm that overhead is a function of executor memory. Why and how can be
    >> figured out with due diligence and analysis. It may be a function of other
    >> parameters and the function may be fairly complex. However, the
    >> proportionality is undeniable. Besides, we are only adjusting the default
    >> value and making it a bit more resilient. The memory_overhead parameter can
    >> still be configured by the developer separately. The constant additive
    >> factor makes little sense (empirically).
    >>
    >> —
    >> Reply to this email directly or view it on GitHub
    >> <https://github.com/apache/spark/pull/1391#issuecomment-48835500>.
    >>
    >



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56174217
  
    @mridulm  any comments?
    
    I'm ok with it if it's a consistent problem for users.  One thing we definitely need to do is document it and possibly look at including better log and error messages. We should at least log the size of the overhead it calculates.  It would also be nice to log what it is when we fail to get a container large enough, or when it fails because the cluster max allocation limit was hit.
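    A minimal sketch of the kind of log message suggested here (names and
    wording are illustrative, not the actual patch; in the real allocator this
    would presumably go through the Logging trait rather than println):

        // Hypothetical helper: report the computed overhead alongside the
        // requested container size (both in MB).
        def logContainerRequest(executorMemory: Int, memoryOverhead: Int): Unit =
          println(s"Will allocate container of ${executorMemory + memoryOverhead} MB " +
            s"($executorMemory MB executor memory + $memoryOverhead MB overhead)")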





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835852
  
    Experimented with three different workloads and noticed common patterns of proportionality.
    Other parameters were left unchanged and only the executor size was increased. The memory overhead ranges between 0.05 and 0.08 * executor_memory.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836220
  
    Mridul, I think you are missing the point. We understand that this parameter will in a lot of cases have to be specified by the developer, since there is no easy way to model it (that's why we are retaining it as a configurable parameter). However, the question is what a good default value would be.
    
    "I would like a good default estimate of overhead ... But that is not
    fraction of executor memory. "
    
    You are mistaken. It may not be a directly correlated variable, but it is most certainly indirectly correlated. And it is probably correlated to other app-specific parameters as well. 
    
    "Until the magic explanatory variable is found, which one is less problematic for end users -- a flat constant that frequently has to be tuned, or an imperfect model that could get it right in more cases?"
    
    This is the right point of view.



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835500
  
    Yes, I'm aware of the discussion on this issue in the past. Experiments confirm that overhead is a function of executor memory. Why and how can be figured out with due diligence and analysis. It may be a function of other parameters and the function may be fairly complex. However, the proportionality is undeniable. Besides, we are only adjusting the default value and making it a bit more resilient. The memory_overhead parameter can still be configured by the developer separately. The constant additive factor makes little sense (empirically).



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56132497
  
    These changes look good to me.  This addresses what continues to be the #1 issue that we see in Cloudera customer YARN deployments.  It's worth considering boosting this when using PySpark, but that's probably work for another JIRA.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56334843
  
    Have redone the PR against the recent master branch, which has undergone significant structural changes for Yarn. Addressed review comments and changed the multiplier back to 0.07 (to err on the conservative side, since customers are running into this issue).
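    For reference, the explicit override both sides of the discussion rely on
    remains available; setting it bypasses the multiplier-based default (the
    1024 below is just an example value in MB, not a recommendation):

        import org.apache.spark.SparkConf

        // Explicitly pinning the overhead; the default (flat constant or
        // multiplier) is only used when this key is unset.
        val conf = new SparkConf()
          .set("spark.yarn.executor.memoryOverhead", "1024")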





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1391#discussion_r17762675
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala ---
    @@ -92,7 +92,7 @@ private[yarn] class YarnAllocationHandler(
     
       // Additional memory overhead - in mb.
       private def memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
    -    YarnAllocationHandler.MEMORY_OVERHEAD)
    +    math.max((YarnAllocationHandler.MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, YarnAllocationHandler.MEMORY_OVERHEAD_MIN))
    --- End diff --
    
    line too long, here and other places
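    One possible wrapping that keeps the same behaviour (identifiers taken from
    the quoted diff; a sketch, not the committed fix):

        private def memoryOverhead: Int = sparkConf.getInt(
          "spark.yarn.executor.memoryOverhead",
          math.max(
            (YarnAllocationHandler.MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
            YarnAllocationHandler.MEMORY_OVERHEAD_MIN))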





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by nishkamravi2 <gi...@git.apache.org>.
Github user nishkamravi2 commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-56142506
  
    @sryza Thanks Sandy.  Will do.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-54694595
  
    Can one of the admins verify this patch?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836879
  
    Since this is a recurring nightmare for our users, let me try to list down
    the factors which influence overhead given the current spark codebase state
    in the jira when I am back at my desk ... And we can add to that and model
    from there (I won't be able to lead the effort though unfortunately, so it
    would be great if you or Sean can).
    
    If it so happens that at the end of the exercise it is a linear function of
    memory, I am fine with it, as long as we decide based on actual data :-)
    On 13-Jul-2014 3:26 pm, "Mridul Muralidharan" <mr...@gmail.com> wrote:
    
    >
    > On Jul 13, 2014 3:16 PM, "nishkamravi2" <no...@github.com> wrote:
    > >
    > > Mridul, I think you are missing the point. We understand that this
    > parameter will in a lot of cases have to be specified by the developer,
    > since there is no easy way to model it (that's why we are retaining it as a
    > configurable parameter). However, the question is what would be a good
    > default value be.
    > >
    >
    > It does not help to estimate using the wrong variable.
    > Any correlation which exists are incidental and app specific, as I
    > elaborated before.
    >
    > The only actual correlation between executor memory and overhead is java
    > vm overheads in managing very large heaps (and that is very high as a
    > fraction). Other factors in spark have far higher impact than this.
    >
    > > "I would like a good default estimate of overhead ... But that is not
    > > fraction of executor memory. "
    > >
    > > You are mistaken. It may not be a directly correlated variable, but it
    > is most certainly indirectly correlated. And it is probably correlated to
    > other app-specific parameters as well.
    >
    > Please see above.
    >
    > >
    > > "Until the magic explanatory variable is found, which one is less
    > problematic for end users -- a flat constant that frequently has to be
    > tuned, or an imperfect model that could get it right in more cases?"
    > >
    > > This is the right point of view.
    >
    > Which has been our view even in previous discussions :-)
    > It is unfortunate that we did not approximate this better from the start
    > and went with the constant from the prototype.l impl.
    >
    > Note that this estimation would be very volatile to spark internals
    >
    > >
    > > —
    > > Reply to this email directly or view it on GitHub.
    >

