Posted to dev@spark.apache.org by kayousterhout <gi...@git.apache.org> on 2014/02/27 09:49:50 UTC

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/27

    [SPARK-979] Randomize order of offers.

    This commit randomizes the order of resource offers to avoid scheduling
    all tasks on the same small set of machines.
    
    This is a much simpler solution to SPARK-979 than #7.
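Below is a minimal Scala sketch of why offer order matters. It assumes a simplified scheduler in which each offer takes at most one task and tasks are placed in the order the offers arrive. `Offer` and `OfferOrdering` are hypothetical names for illustration. They are not Spark's `TaskSchedulerImpl` API, and the real scheduler's handling of cores and locality is not reproduced here.

```scala
import scala.util.Random

// Hypothetical, simplified resource offer: an executor with at least one free slot.
final case class Offer(executorId: String)

object OfferOrdering {
  // Place up to `numTasks` tasks, at most one per offer, walking the offers in
  // the order given. Returns the executor chosen for each task.
  // With a fixed offer order, small jobs always land on the first few executors.
  def assignInOrder(offers: Seq[Offer], numTasks: Int): Seq[String] =
    offers.take(numTasks).map(_.executorId)

  // Same placement, but the offers are shuffled first (the idea in this PR),
  // so repeated small jobs are spread over different executors.
  def assignShuffled(offers: Seq[Offer], numTasks: Int, rng: Random): Seq[String] =
    assignInOrder(rng.shuffle(offers), numTasks)
}
```

For example, with ten offers `exec-0` through `exec-9` and a three-task job, `assignInOrder` returns `exec-0`, `exec-1` and `exec-2` on every call, while `assignShuffled` picks a different random subset on each call.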

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 randomize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/27.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #27
    
----
commit 435d817424d3c9f3d900c65164ee5ed49b037b26
Author: Kay Ousterhout <ka...@gmail.com>
Date:   2014-02-27T08:45:17Z

    [SPARK-979] Randomize order of offers.
    
    This commit randomizes the order of resource offers to avoid scheduling
    all tasks on the same small set of machines.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36271766
  
    IMHO it would be better to try the simpler solution first and see how it works -- with a large enough number of executors, I think the probability of seeing the same executors repeatedly should be pretty small.
    
    P.S. (It looks like an instance of the balls-into-bins problem to me, so the imbalance should be at worst log N?)
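The balls-into-bins analogy can be checked with a small simulation. This sketch throws `balls` tasks onto `bins` executors uniformly at random and reports the load of the busiest executor. `BallsIntoBins` is a hypothetical helper, and uniform random placement is an assumption that idealizes the shuffled-offer scheduler. For `n` balls in `n` bins, the classical result is that the maximum load grows roughly like log n / log log n with high probability, which is within the "at worst log N" estimate above.

```scala
import scala.util.Random

object BallsIntoBins {
  // Throw `balls` balls into `bins` bins uniformly at random and return the
  // number of balls that landed in each bin.
  def loads(balls: Int, bins: Int, rng: Random): Array[Int] = {
    val counts = new Array[Int](bins)
    for (_ <- 0 until balls) counts(rng.nextInt(bins)) += 1
    counts
  }

  // Largest number of balls in any single bin (the most loaded executor).
  def maxLoad(balls: Int, bins: Int, rng: Random): Int =
    loads(balls, bins, rng).max
}
```

Running `BallsIntoBins.maxLoad(n, n, new Random())` for growing `n` and comparing the result against `math.log(n)` is a quick way to see how slowly the imbalance grows.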

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36275312
  
    @shivaram I understand your caution, and I agree with Kay that we should be careful about adding complexity to the already-complex code base. So I don't mind closing my PR if we decide to use randomization to resolve the issue...

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36221897
  
    Merged build triggered.

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36433891
  
    I've merged this into master

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36224189
  
    Merged build finished.

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36221889
  
    Merged build triggered.

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36224190
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12912/

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36238278
  
    Hmm... it's much simpler... but doesn't randomization only mitigate the issue probabilistically?

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/27

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/27#issuecomment-36277558
  
    I see two issues: 1) The deterministic nature of the current scheduler places tasks on the same small set of machines while leaving others largely unused; 2) There is no rebalancing of partitions across worker nodes when new nodes are added to the cluster.  Neither LRU nor randomization really addresses the rebalancing issue, and LRU is only a little better than randomization in addressing the unused workers issue, so I think the additional complexity of LRU weighs against it -- at least until such time as we have evidence that random isn't adequate.
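For comparison, here is a hypothetical sketch of an LRU-style ordering: executors that have gone longest without receiving a task are offered first. This is not the code from #7. `LruOfferOrdering`, its logical clock and the `lastUsed` map are assumptions made for illustration. The sketch shows the extra per-executor state that must be updated on every launch, which is the added complexity weighed against a one-line shuffle in the discussion above.

```scala
import scala.collection.mutable

// Hypothetical LRU-style ordering of executors (not part of Spark).
// Executors that have never received a task sort first.
final class LruOfferOrdering {
  // Logical clock, incremented every time a task is launched.
  private var clock = 0L
  // Last logical time each executor received a task.
  private val lastUsed = mutable.Map.empty[String, Long]

  // Executor IDs ordered from least to most recently used.
  // sortBy is stable, so ties keep their input order.
  def order(executorIds: Seq[String]): Seq[String] =
    executorIds.sortBy(id => lastUsed.getOrElse(id, -1L))

  // Record that a task was launched on `executorId`.
  def recordLaunch(executorId: String): Unit = {
    clock += 1
    lastUsed(executorId) = clock
  }

  // Place `numTasks` tasks, at most one per executor, in LRU order.
  def assign(executorIds: Seq[String], numTasks: Int): Seq[String] = {
    val chosen = order(executorIds).take(numTasks)
    chosen.foreach(recordLaunch)
    chosen
  }
}
```

With executors `a` to `d` and repeated two-task jobs, this ordering rotates deterministically through `a, b`, then `c, d`, then `a, b` again. Like randomization, it does nothing about rebalancing partitions when new nodes join the cluster.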