You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2015/12/17 06:49:58 UTC

[GitHub] spark pull request: [SPARK-12392] Optimize a location order of bro...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/10346

    [SPARK-12392] Optimize a location order of broadcast blocks by considering preferred local hosts

    When multiple workers exist in a host, we can bypass unnecessary remote access for broadcasts; block managers fetch broadcast blocks from the same host instead of remote hosts.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark OptimizeBlockLocationOrder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10346
    
----
commit c39664085ac8ae89754dda653770314308b1c2fa
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2015-12-15T07:15:08Z

    Optimize a location order of blocks by considering preferred local host

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165652320
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165366369
  
    **[Test build #47902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47902/consoleFull)** for PR 10346 at commit [`c396640`](https://github.com/apache/spark/commit/c39664085ac8ae89754dda653770314308b1c2fa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165672690
  
    **[Test build #47975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47975/consoleFull)** for PR 10346 at commit [`c396640`](https://github.com/apache/spark/commit/c39664085ac8ae89754dda653770314308b1c2fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165366598
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392] Optimize a location order of bro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165349326
  
    **[Test build #47902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47902/consoleFull)** for PR 10346 at commit [`c396640`](https://github.com/apache/spark/commit/c39664085ac8ae89754dda653770314308b1c2fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165935707
  
    Looks great @maropu. Thanks for doing the benchmarks. I left two minor comments but other than that this is good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10346#discussion_r48177414
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -578,9 +578,19 @@ private[spark] class BlockManager(
         doGetRemote(blockId, asBlockResult = false).asInstanceOf[Option[ByteBuffer]]
       }
     
    +  private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
    +    /**
    +     * Return a list of locations for the given block, prioritizing the local machine since
    +     * multiple block managers can share the same host.
    +     */
    --- End diff --
    
    Java docs should live outside the `def`, not inside. I'll fix this myself when I merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165672757
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47975/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10346#discussion_r48087373
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -578,9 +578,18 @@ private[spark] class BlockManager(
         doGetRemote(blockId, asBlockResult = false).asInstanceOf[Option[ByteBuffer]]
       }
     
    +  private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
    +    // Since block managers can share an identical host, we put these preferred
    +    // locations first.
    --- End diff --
    
    I would move this to the java doc and rephrase it
    ```
    /**
     * Return a list of locations for the given block, prioritizing the local machine since multiple
     * block managers can share the same host.
     */
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165705964
  
    @andrewor14 Could you review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10346


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166199376
  
    **[Test build #48093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48093/consoleFull)** for PR 10346 at commit [`d962f15`](https://github.com/apache/spark/commit/d962f15e186bfe77d3fb3e5e4ec44d10b5523c0f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166179434
  
    @andrewor14 Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166436124
  
    LGTM merging into master. Thanks @maropu 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392] Optimize a location order of bro...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165349453
  
    I did quick benchmarks for large broadcasts;
    - aws m4.x4large x 4, 4 works in a host
    - elapsed time:
    -- w/opt.: 6.887943434s, w/o opt.: 11.738593435s
    - received network bytes in a host:
    -- w/opt.: 14782.93kB/s, w/o opt.: 35394.53kB/s
    ```
    val bv = sc.broadcast((0 until 50000000).toArray)
    val parts = sc.parallelize((1 to 16), 16).cache
    
    def timer[R](block: => R): R = {
      val t0 = System.nanoTime()
      val result = block
      val t1 = System.nanoTime()
      println("Elapsed time: " + ((t1 - t0 + 0.0) / 1000000000.0)+ "s")
      result
    }
    
    timer {
      parts.map(_ => bv.value.size).foreach(_ => {})
    }
    
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165654168
  
    **[Test build #47975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47975/consoleFull)** for PR 10346 at commit [`c396640`](https://github.com/apache/spark/commit/c39664085ac8ae89754dda653770314308b1c2fa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166181392
  
    **[Test build #48093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48093/consoleFull)** for PR 10346 at commit [`d962f15`](https://github.com/apache/spark/commit/d962f15e186bfe77d3fb3e5e4ec44d10b5523c0f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165366602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47902/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10346#discussion_r48087346
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -578,9 +578,18 @@ private[spark] class BlockManager(
         doGetRemote(blockId, asBlockResult = false).asInstanceOf[Option[ByteBuffer]]
       }
     
    +  private def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
    +    // Since block managers can share an identical host, we put these preferred
    +    // locations first.
    +    val locs = Random.shuffle(master.getLocations(blockId))
    +    val preferredLocs = locs.filter(loc => blockManagerId.host == loc.host)
    +    val otherLocs = locs.filter(loc => blockManagerId.host != loc.host)
    --- End diff --
    
    you could do
    ```
    val (preferredLocs, otherLocs) = locs.partition { loc => blockManagerId.host == loc.host }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166199432
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48093/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166199431
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-165672756
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org