Posted to reviews@spark.apache.org by MartinWeindel <gi...@git.apache.org> on 2014/08/08 23:01:04 UTC

[GitHub] spark pull request: work around for problem with Mesos offering se...

GitHub user MartinWeindel opened a pull request:

    https://github.com/apache/spark/pull/1860

    work around for problem with Mesos offering semantic

    When using Mesos in fine-grained mode, a Spark job can run into a deadlock when little memory is allocatable on the Mesos slaves. As a workaround, 32 MB (= Mesos MIN_MEM) is allocated for each task to ensure that Mesos makes new offers after a task completes.
    From my perspective, it would be better to fix this problem in Mesos by dropping the memory constraint for offers, but as a temporary solu
    See [[MESOS-1688] No offers if no memory is allocatable](https://issues.apache.org/jira/browse/MESOS-1688) for details on this problem.
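The deadlock described above can be illustrated with a small simulation. The following is a minimal Python sketch, not Spark or Mesos code. It assumes a simplified offer rule taken from this description: a slave's free resources are offered only when some CPU and at least MIN_MEM = 32 MB of memory are free. The function names are hypothetical. The sketch checks whether a slave becomes offerable again after a task finishes, with and without the 32 MB per-task reservation.

```python
# Simplified model (assumption): before the MESOS-1688 fix, a slave's free
# resources are offered only if some CPU and at least MIN_MEM_MB are free.
MIN_MEM_MB = 32  # Mesos MIN_MEM, as stated in the PR description


def is_offerable(free_cpus: float, free_mem_mb: int) -> bool:
    """True if the (simplified) allocator would offer these free resources."""
    return free_cpus > 0 and free_mem_mb >= MIN_MEM_MB


def offerable_after_task_finishes(free_mem_mb: int, task_mem_mb: int,
                                  task_cpus: float = 1.0) -> bool:
    """A task finishes on a slave whose CPUs were all in use.

    free_mem_mb is the memory left unallocated while the task was running.
    The finished task hands back its CPUs and its memory reservation.
    """
    return is_offerable(task_cpus, free_mem_mb + task_mem_mb)


if __name__ == "__main__":
    # Executors leave only 16 MB unallocated on the slave.
    print(offerable_after_task_finishes(16, task_mem_mb=0))   # False: no new offers
    print(offerable_after_task_finishes(16, task_mem_mb=32))  # True: offers resume
```

With no per-task memory, a finishing task returns only CPUs, so a slave stuck below 32 MB of free memory is never offered again. With the reservation, every finishing task hands back at least 32 MB. The sketch does not cover the case raised later in the thread, where executors leave less than 32 MB free before any task has started.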

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MartinWeindel/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1860
    
----
commit d9d2ca61ee35eedda23e15182f5b2e19aaf62e23
Author: Martin Weindel <ma...@gmail.com>
Date:   2014-08-08T20:44:44Z

    work around for problem with Mesos offering semantic (see [https://issues.apache.org/jira/browse/MESOS-1688])

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by MartinWeindel <gi...@git.apache.org>.
Github user MartinWeindel commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53203084
  
    Hey Patrick,
    
    first of all, let me emphasize again that this is only a workaround. The
    real problem is that Mesos only makes offers if at least 32 MB of memory
    is available, which conflicts with allocating memory only for Spark
    worker executors and none for tasks.
    You seem to be right: this workaround does not help if executors
    already consume all memory (up to a remainder of at most 31 MB).
    So I don't know whether it will avoid deadlocks in all cases.
    
    I can only argue from an experimental point of view: I have not
    seen the deadlock in my cluster anymore after applying this patch (I
    have tested under a very heavy workload).
    I suspect the chance is very small that another executor starts before
    at least one task of the first executor has started.
    In any case, after a task has finished, at least 32 MB of memory is
    allocatable, so Mesos will always make offers and the deadlock is
    avoided.
    
    BTW, I have also experimented with changing the executor memory so that
    some Mesos slave memory is always left over, but to my surprise this
    did not reliably avoid the deadlocks.
    
    So I'm not sure whether this patch should be integrated into the Spark
    source code.
    But I hope it helps to understand the issue. And maybe it makes the
    fine-grained mode usable for setups like mine until a better
    solution has been found.
    
    If I can help in any way, just tell me.
    
    Best regards,
    Martin
    
    
    On 24.08.2014 19:16, Patrick Wendell wrote:
    >
    > Hey Martin,
    >
    > I'm having a bit of trouble seeing how this works around the issue. 
    > From what I can tell the issue is that if someone creates Executors 
    > that consume all memory, Mesos will refuse to make offers for the 
    > tasks. However, this fix just adds 32MB of memory as a requirement for 
    > the task... but it seems like if the offer is never made in the first 
    > place, this will make no difference. Can you describe a sequence of 
    > offers where this change alters the execution? Thanks for looking into 
    > this!
    >
    >   * Patrick
    >
    > —
    > Reply to this email directly or view it on GitHub 
    > <https://github.com/apache/spark/pull/1860#issuecomment-53200124>.
    >


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53200124
  
    Hey Martin,
    
    I'm having a bit of trouble seeing how this works around the issue. From what I can tell the issue is that if someone creates Executors that consume all memory, Mesos will refuse to make offers for the tasks. However, this fix just adds 32MB of memory as a requirement for the task... but it seems like if the offer is never made in the first place, this will make no difference. Can you describe a sequence of offers where this change alters the execution? Thanks for looking into this!
    
    - Patrick


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53226981
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19133/consoleFull) for   PR 1860 at commit [`d9d2ca6`](https://github.com/apache/spark/commit/d9d2ca61ee35eedda23e15182f5b2e19aaf62e23).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`
      * `    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "$`


[GitHub] spark pull request: Adding known issue for MESOS-1688

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53515952
  
    Cool, thanks, that looks great.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53226740
  
    BTW @MartinWeindel, one small request: can you update the docs/running-on-mesos.md page to explain that each task will consume 32 MB? Otherwise people might set Spark's executor memory to all of the memory on the Mesos worker, which would mean that no tasks get launched.


[GitHub] spark pull request: Adding known issue for MESOS-1688

Posted by MartinWeindel <gi...@git.apache.org>.
Github user MartinWeindel commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53333888
  
    OK, so I have reverted the work-around patch and added a known issue paragraph to the running-on-mesos documentation.


[GitHub] spark pull request: Adding known issue for MESOS-1688

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1860


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53227103
  
    BTW, this failure is due to a style check: you can run sbt scalastyle locally to find all style issues (the Jenkins log also lists the problem).


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53229242
  
    That's true: now that we take an extra 32 MB per task, you need to change the logic that decides how many tasks we can allocate. That will make it trickier.
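As a rough illustration of the allocation logic that would need to change, here is a minimal Python sketch of how many tasks a single offer could hold once each task also reserves 32 MB. It uses hypothetical names and is not the actual Spark Mesos scheduler backend. It assumes each task needs task_cpus CPUs and task_mem_mb of memory, and that an executor not yet running on that slave must be launched from the same offer. executor_cpus defaults to 0 purely as a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical per-task memory reservation introduced by the workaround.
TASK_MEM_MB = 32


@dataclass
class Offer:
    cpus: float
    mem_mb: int


def tasks_that_fit(offer: Offer, executor_running: bool, executor_mem_mb: int,
                   executor_cpus: float = 0.0, task_cpus: float = 1.0,
                   task_mem_mb: int = TASK_MEM_MB) -> int:
    """How many tasks one offer can hold once tasks also reserve memory."""
    cpus, mem = offer.cpus, offer.mem_mb
    if not executor_running:
        # The executor starts together with the first task on this slave,
        # so its resources must be carved out of the same offer.
        cpus -= executor_cpus
        mem -= executor_mem_mb
        if cpus < 0 or mem < 0:
            return 0
    by_cpu = int(cpus // task_cpus)
    # Without a per-task memory reservation, memory does not limit the count.
    by_mem = mem // task_mem_mb if task_mem_mb > 0 else by_cpu
    return max(0, min(by_cpu, by_mem))
```

For example, tasks_that_fit(Offer(4, 4096), executor_running=False, executor_mem_mb=4096) returns 0. An executor sized to the whole worker leaves no room for any 32 MB task, which is the documentation concern raised above. A per-offer count like this still cannot predict how many tasks will end up running in parallel on the node.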


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by MartinWeindel <gi...@git.apache.org>.
Github user MartinWeindel commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53260324
  
    Yes, this becomes tricky. And I don't see a satisfying solution, as I would
    have to predict how many tasks will run in parallel to ensure that there is
    enough memory for each task.
    
    This patch solves one problem but will introduce new ones, because it
    only deals with the symptoms, not with the cause.
    I think it is better not to integrate it.
    
    I've already created a pull request to get the cause fixed in Mesos:
    https://github.com/apache/mesos/pull/24
    
    
    
    On Mon, Aug 25, 2014 at 7:58 AM, Matei Zaharia <no...@github.com>
    wrote:
    
    > That's true, now that we take 32 MB extra you need to change the logic
    > about how many tasks we can allocate. That will make it trickier.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1860#issuecomment-53229242>.
    >


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by iven <gi...@git.apache.org>.
Github user iven commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53228456
  
    @MartinWeindel I think you should check if there's enough memory in the offer first.


[GitHub] spark pull request: Adding known issue for MESOS-1688

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-56283933
  
    Great! I'll create a JIRA to update Spark to it when that comes out.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53229289
  
    Hey @MartinWeindel - I'm curious, which of the following cases are you in:
    
    Case 1. You have individual executors that attempt to acquire all the memory on the node.
    
    Case 2. You have multiple executors per node, but their total memory adds up to the total amount of memory on the node.
    
    I could see how this would help with Case 2 because it could prevent a second executor from being launched in a way that acquires all of the host memory. But I'm still wondering whether it affects Case 1.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53226637
  
    From my knowledge of Mesos, this seems like a good fix. I think we should do this until MESOS-1688 is fixed.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53226713
  
    Jenkins, test this please


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53226938
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19133/consoleFull) for   PR 1860 at commit [`d9d2ca6`](https://github.com/apache/spark/commit/d9d2ca61ee35eedda23e15182f5b2e19aaf62e23).
     * This patch merges cleanly.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-51657125
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: Adding known issue for MESOS-1688

Posted by timothysc <gi...@git.apache.org>.
Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-56064543
  
    Just for cross-reference: MESOS-1688 has been committed and will be part of the 0.21.0 release cycle.


[GitHub] spark pull request: work around for problem with Mesos offering se...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1860#issuecomment-53313799
  
    After thinking about this more, it seems that another workaround is to make sure your executors always leave 32 MB free on each node (even if you launch multiple executors, make sure their sizes don't add up to quite the full memory). Would that work? If so, we can just add that to the docs.
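The sizing rule proposed here amounts to a simple check. Below is a minimal Python sketch under the following assumptions: MIN_MEM is 32 MB as quoted in this thread, executor sizes are the memory each executor actually reserves from Mesos on one node, and the helper names are hypothetical. Martin reported earlier that leaving some slave memory over did not reliably avoid the deadlocks in his tests, so this is a sizing aid, not a guarantee.

```python
from typing import Sequence

MIN_MEM_MB = 32  # Mesos MIN_MEM, as quoted in this thread


def leaves_offer_headroom(slave_mem_mb: int, executor_mems_mb: Sequence[int]) -> bool:
    """True if the executors on one node leave at least MIN_MEM_MB unallocated."""
    return slave_mem_mb - sum(executor_mems_mb) >= MIN_MEM_MB


def max_equal_executor_mem_mb(slave_mem_mb: int, executors_per_node: int) -> int:
    """Largest equal per-executor size (MB) that still leaves MIN_MEM_MB free."""
    if executors_per_node <= 0:
        raise ValueError("executors_per_node must be positive")
    return (slave_mem_mb - MIN_MEM_MB) // executors_per_node


if __name__ == "__main__":
    print(leaves_offer_headroom(8192, [4096, 4096]))  # False: nothing left over
    print(max_equal_executor_mem_mb(8192, 2))         # 4080
    print(leaves_offer_headroom(8192, [4080, 4080]))  # True: exactly 32 MB left
```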

