You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by brkyvz <gi...@git.apache.org> on 2015/09/14 23:36:11 UTC

[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/8757

    [SPARK-10599] Lower communication for block matrix multiplication

    This PR aims to decrease communication costs in BlockMatrix multiplication in two ways:
     - Simulate the multiplication on the driver, and figure out which blocks actually need to be shuffled
     - Send the block once to a partition, and join inside the partition rather than sending multiple copies to the same partition
    
    Initial benchmarking showed promising results (look below), however I did hit some `FileNotFound` exceptions with the new implementation after the shuffle.
    
    Size A: 1e5 x 1e5
    Size B: 1e5 x 1e5
    Block Sizes: 1024 x 1024
    Sparsity: 0.01
    Old implementation: 1m 13s
    New implementation: 9s
    
    cc @avulanov Would you be interested in helping me benchmark this? I used your code from the mailing list (which you sent about 3 months ago?), and the old implementation didn't even run, but the new implementation completed in 268s in a 120 GB / 16 core cluster

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark opt-bmm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8757.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8757
    
----
commit 8dac58fc9a70fbb92984297f27550c631dfe8b1f
Author: Burak Yavuz <br...@gmail.com>
Date:   2015-09-14T21:17:56Z

    lower communication for block matrix multiplication

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148139413
  
      [Test build #43726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43726/console) for   PR 8757 at commit [`19c4b13`](https://github.com/apache/spark/commit/19c4b13f9ac5220e939e89d4a4be3ffca0cc3205).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8757


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148145666
  
      [Test build #43727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43727/consoleFull) for   PR 8757 at commit [`19c4b13`](https://github.com/apache/spark/commit/19c4b13f9ac5220e939e89d4a4be3ffca0cc3205).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148131106
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8757#discussion_r41586254
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
    @@ -352,6 +353,30 @@ class BlockMatrix @Since("1.3.0") (
         }
       }
     
    +  private type BlockDestinations = Map[(Int, Int), Set[Int]]
    --- End diff --
    
    Document
    ```
    /** Block (i,j) --> Set of destination partitions */
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140470928
  
    More results. Tests performed on 4 executors each with 30 GB RAM, and 4 cores each:
    
     - Size of A: 1e5 x 1e5
     - Size of B: 1e5 x 1e5
     - Block Size: 1024
     - Number of partitions: 128
    
    *Note that for the old implementation, the cluster ran out of disk space for sparsity=0.2*
    ![vs_sparsity](https://cloud.githubusercontent.com/assets/5243515/9884092/c94775b8-5b92-11e5-99aa-34acd5962bff.png)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-146724888
  
      [Test build #1864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1864/console) for   PR 8757 at commit [`ae98edc`](https://github.com/apache/spark/commit/ae98edcb34cd2cb9b3b713d7f0c8717de972d0df).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by avulanov <gi...@git.apache.org>.
Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140554085
  
    Thank you for the update. Indeed, the tests take finite time to finish now. Let's add @mengxr to the discussion.
    
    Distributed matrix multiplication makes sense when it is faster than doing it on a single node. Lets assume that we have squared blocks, and `block*block` takes time `Tblock` on a single machine. I prepared two tests: 
      - Block-diagonal matrix multiplication `(M * M)`, where `M` is `NxN`. Single machine multiplication time will be `N*Tblock`. The optimal distributed time would be `Tblock` if the number of nodes <= `N`. This seems to be embarrassingly parallel.
      - Columnar and row matrix multiplication, (M * M^T), where M has `1` column and `N` row blocks. Single machine multiplication time will be `N*N*Tblock`
    
    I've done a benchmark for single node multiplication, for example it take 0.04s to multiply matrix 1000x1000 and 16.55s for 10000x10000 with OpenBLAS and 2x Xeon X5650  @ 2.67GHz. More results are here https://github.com/avulanov/scala-blas. 
    
    For the following distributed experiment, I am using 6 node with the same CPU, 5 workers and 1 master.
    #### Block-diagonal matrix multiplication:
    
    Size | Block | Time | Est. single node time
    ------------ | ------------- | ---------| ------|
    1000x1000 | block:1000 | 0.539322901 | 0.04
    2000x2000 | block:1000 | 0.594227124 | 0.08
    3000x3000 | block:1000 | 0.541293169 | 0.12
    4000x4000 | block:1000 | 0.520753395 | 0.16
    5000x5000 | block:1000 | 0.702532957 | 0.2  
    
    Size | Block | Time | Est. single node time
    ------------ | ------------- | ---------| ------|
    10000x10000 | block:10000 | 27.565218631 | 16.55
    20000x20000 | block:10000  | 28.363953039 | 33.1
    30000x30000 | block:10000 | 114.133834717 | 49.65
    40000x40000 | block:10000 | 117.701914787 | 66.2
    50000x50000 | block:10000 | 141.827804904 | 82.75
    
    For some reason, distributed operations are slower than the estimation on single node, though they can be well parallalized. Do you know the reason for that?
    
    #### Column and row matrix multiplication
    
    Size | Block | Time | Est. single node time
    ------------ | ------------- | ---------| ------|
    1000x1000 | block:1000 |  0.281162649 | 0.04
    2000x1000 | block:1000 |  0.461582522 | 0.16
    3000x1000 | block:1000 |  0.520122422 | 0.36
    4000x1000 | block:1000 |  0.560923767 | 0.64
    5000x1000 | block:1000 |  0.887406721 | 1
    
    Distributed operations become faster than single node with bigger columnar matrix. The test did not finish for the block size of 10000 because of Out of free space exception, though I used tempfs of 18GB as both spark.local.dir and java tmp. It seems that shuffling is really huge. Should it be so big?
    
    Link to the tests: https://github.com/avulanov/blockmatrix-benchmark/blob/master/src/blockmatrix.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140215666
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8757#discussion_r41586262
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
    @@ -352,6 +353,30 @@ class BlockMatrix @Since("1.3.0") (
         }
       }
     
    +  private type BlockDestinations = Map[(Int, Int), Set[Int]]
    +
    +  /**
    +   * Simulate the multiplication with just indices in order to cut costs on communication, when
    +   * we are actually shuffling the matrices.
    +   */
    +  private def simulateMultiply(
    --- End diff --
    
    It'd be nice to add a unit test for this, though the logic looks fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140214626
  
      [Test build #42448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42448/console) for   PR 8757 at commit [`8dac58f`](https://github.com/apache/spark/commit/8dac58fc9a70fbb92984297f27550c631dfe8b1f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148139530
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-146720466
  
    Making a pass now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148162325
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148145399
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148162329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43727/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140214200
  
      [Test build #42448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42448/consoleFull) for   PR 8757 at commit [`8dac58f`](https://github.com/apache/spark/commit/8dac58fc9a70fbb92984297f27550c631dfe8b1f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148139536
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43726/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148851400
  
    This LGTM.  I'll merge it with master.  Thanks for the PR!
    
    @avulanov I looked at your code, but the results seem strange to me.  We'll have to look into it more, I guess.  As far as utility of block-diagonal matrices, I've mainly seen them in the context of specialized applications with very structured feature interactions, but my experience there is from research, not industry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-146717274
  
      [Test build #1864 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1864/consoleFull) for   PR 8757 at commit [`ae98edc`](https://github.com/apache/spark/commit/ae98edcb34cd2cb9b3b713d7f0c8717de972d0df).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8757#discussion_r41586259
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
    @@ -352,6 +353,30 @@ class BlockMatrix @Since("1.3.0") (
         }
       }
     
    +  private type BlockDestinations = Map[(Int, Int), Set[Int]]
    +
    +  /**
    +   * Simulate the multiplication with just indices in order to cut costs on communication, when
    --- End diff --
    
    "just indices" --> "just block indices"
    Document return value.
    Copy doc: ```The `colsPerBlock` of this matrix must equal the `rowsPerBlock` of `other`.```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8757#discussion_r41586252
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
    @@ -60,6 +60,7 @@ private[mllib] class GridPartitioner(
        */
       override def getPartition(key: Any): Int = {
         key match {
    +      case i: Int => i
    --- End diff --
    
    Update documentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-146723162
  
    Minor comments only.  Other than that, it looks fine to me.
    
    @avulanov In your "Block-diagonal matrix multiplication" tests, do you know if data were shuffled during the multiplications?  I'm wondering if Spark/BlockMatrix properly avoided shuffling the data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140218846
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148145424
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by avulanov <gi...@git.apache.org>.
Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140234877
  
    @brkyvz Thank you for notifying me. I would be interested to benchmark this PR. Should I use the same code from the mailing list? It can be found here as well https://github.com/avulanov/blockmatrix-benchmark


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140235259
  
    @avulanov Feel free to benchmark it in anyway. The same code is also useful. I'm interested in how it would scale, and how it would perform if the matrix is fully dense. I'm doing some benchmarks of my own, it would be nice to have some sanity checks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-146733572
  
    @brkyvz Can you please add "[MLLIB]" to the PR title?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140214630
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140212640
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148131064
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by avulanov <gi...@git.apache.org>.
Github user avulanov commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148817255
  
    @jkbradley According to the time taken it actually did the shuffle. However, I am not sure how useful in practice these block-diagonal matrices.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140215646
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140218848
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42451/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148133941
  
      [Test build #43726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43726/consoleFull) for   PR 8757 at commit [`19c4b13`](https://github.com/apache/spark/commit/19c4b13f9ac5220e939e89d4a4be3ffca0cc3205).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148161067
  
      [Test build #43727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43727/console) for   PR 8757 at commit [`19c4b13`](https://github.com/apache/spark/commit/19c4b13f9ac5220e939e89d4a4be3ffca0cc3205).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148130199
  
    @jkbradley Thank you for the review. Addressed your comments 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140212664
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599][MLLIB] Lower communication for b...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-148144514
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10599] Lower communication for block ma...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8757#issuecomment-140214633
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42448/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org