Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2016/07/05 00:06:25 UTC

[GitHub] spark pull request #14051: [SPARK-16372][MLlib] RowMatrix constructor should...

GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/14051

    [SPARK-16372][MLlib] RowMatrix constructor should use retag for Java compatibility

    ## What changes were proposed in this pull request?
    
    The following Java code fails because of type erasure:
    
    ```Java
    JavaRDD<Vector> rows = jsc.parallelize(...);
    RowMatrix mat = new RowMatrix(rows.rdd());
    QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
    ```
    
    We should use retag to restore the type to prevent the following exception:
    
    ```Java
    java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
    ```
    
    
    ## How was this patch tested?
    
    Java unit test
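
The exception quoted in the description can be reproduced in plain Java, without Spark. The sketch below is only an illustration under stated assumptions: `ErasureDemo` and `toArrayErased` are hypothetical names, not Spark code, and the generic helper stands in for any code path that has to build an array after the element type has been erased.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Spark.
final class ErasureDemo {

    // T is erased at runtime, so the only array this method can allocate is Object[].
    // The unchecked cast to T[] is not verified inside the method.
    @SuppressWarnings("unchecked")
    static <T> T[] toArrayErased(List<T> items) {
        Object[] out = new Object[items.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = items.get(i);
        }
        return (T[]) out;
    }

    // Returns true if assigning the erased array to String[] throws ClassCastException.
    static boolean failsWithClassCast() {
        try {
            // The compiler inserts a cast to String[] here; the value is really an Object[].
            String[] strings = toArrayErased(Arrays.asList("a", "b"));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClassCast()); // prints "true"
    }
}
```

The cast fails at the call site rather than inside the helper, which is why the stack trace can point at code that looks unrelated to where the array was created.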
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-16372

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14051
    
----
commit 82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-07-05T00:03:44Z

    add retag

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14051#discussion_r69521248
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -537,7 +537,7 @@ class RowMatrix @Since("1.0.0") (
       def tallSkinnyQR(computeQ: Boolean = false): QRDecomposition[RowMatrix, Matrix] = {
         val col = numCols().toInt
         // split rows horizontally into smaller matrices, and compute QR for each of them
    -    val blockQRs = rows.glom().map { partRows =>
    +    val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
    --- End diff --
    
    Where does the exception actually occur? I guess I'm surprised if this is the only place this is needed. 




[GitHub] spark issue #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMat...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    Merged to master/2.0/1.6. I think it's a reasonably important bug fix.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61737/
    Test FAILed.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    **[Test build #61739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61739/consoleFull)** for PR 14051 at commit [`0acd1e0`](https://github.com/apache/spark/commit/0acd1e0c2f3a517bda064c889d3f7ee9db2d5c39).




[GitHub] spark issue #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMat...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    This one broke branch 1.6. I just reverted it. Please resubmit a backport for branch 1.6.




[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14051




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    **[Test build #61737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61737/consoleFull)** for PR 14051 at commit [`82b4edd`](https://github.com/apache/spark/commit/82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1).




[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14051#discussion_r69636913
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -537,7 +537,7 @@ class RowMatrix @Since("1.0.0") (
       def tallSkinnyQR(computeQ: Boolean = false): QRDecomposition[RowMatrix, Matrix] = {
         val col = numCols().toInt
         // split rows horizontally into smaller matrices, and compute QR for each of them
    -    val blockQRs = rows.glom().map { partRows =>
    +    val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
    --- End diff --
    
    I agree with fixing it, just wonder exactly where the exception arises (not the nature of the problem; I get that) to verify this is the right place to retag. It seemed a little surprising but I assume you're right.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61739/
    Test PASSed.




[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14051#discussion_r69655115
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -537,7 +537,7 @@ class RowMatrix @Since("1.0.0") (
       def tallSkinnyQR(computeQ: Boolean = false): QRDecomposition[RowMatrix, Matrix] = {
         val col = numCols().toInt
         // split rows horizontally into smaller matrices, and compute QR for each of them
    -    val blockQRs = rows.glom().map { partRows =>
    +    val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
    --- End diff --
    
    From my log, I can see that it arises in the `glom()` function. Like `collect()`, it performs a similar operation, `(iter: Iterator[T]) => iter.toArray`. So I think this may be the best place to call `retag`.
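
The comment above points at `iter.toArray` as the step that needs the element type. A rough plain-Java analogue of supplying that type, assuming nothing about Spark's internals, is to pass a class token and allocate the array reflectively. `ClassTokenDemo` and its `toArray` helper are hypothetical names used only for this sketch; in Spark the equivalent information is carried by the RDD's class tag, which `retag(classOf[Vector])` replaces.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Spark.
final class ClassTokenDemo {

    // The Class<T> token plays the role of the restored element type:
    // it lets the method allocate a real T[] instead of an Object[].
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<T> items, Class<T> elementType) {
        T[] out = (T[]) Array.newInstance(elementType, items.size());
        for (int i = 0; i < out.length; i++) {
            out[i] = items.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // The runtime class of the result is String[], so this assignment is safe.
        String[] strings = toArray(Arrays.asList("a", "b"), String.class);
        System.out.println(strings.getClass().getSimpleName() + " " + Arrays.toString(strings));
        // prints "String[] [a, b]"
    }
}
```

Supplying the type at the point where the array is created, rather than at every call site, mirrors the choice discussed in this thread of retagging immediately before `glom()`.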




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    **[Test build #61737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61737/consoleFull)** for PR 14051 at commit [`82b4edd`](https://github.com/apache/spark/commit/82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14051#discussion_r69612547
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -537,7 +537,7 @@ class RowMatrix @Since("1.0.0") (
       def tallSkinnyQR(computeQ: Boolean = false): QRDecomposition[RowMatrix, Matrix] = {
         val col = numCols().toInt
         // split rows horizontally into smaller matrices, and compute QR for each of them
    -    val blockQRs = rows.glom().map { partRows =>
    +    val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
    --- End diff --
    
    Since it's a known Java type-erasure issue (https://issues.apache.org/jira/browse/SPARK-2737), I am not sure whether to fix it or not. If we leave it as is, then Java users should be aware of it and retag the JavaRDD themselves. Otherwise, we fix the constructors by either retagging `rows` or adding a new JavaRDD constructor. However, this may not be the only instance.




[GitHub] spark pull request #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14051#discussion_r69656311
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -537,7 +537,7 @@ class RowMatrix @Since("1.0.0") (
       def tallSkinnyQR(computeQ: Boolean = false): QRDecomposition[RowMatrix, Matrix] = {
         val col = numCols().toInt
         // split rows horizontally into smaller matrices, and compute QR for each of them
    -    val blockQRs = rows.glom().map { partRows =>
    +    val blockQRs = rows.retag(classOf[Vector]).glom().map { partRows =>
    --- End diff --
    
    I also tried the other interfaces of RowMatrix; they all work well:
    
    ```Java
    JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(v1, v2, v3), 1);
    Matrix dm = Matrices.dense(3, 2, new double[] {1.0, 3.0, 5.0, 2.0, 4.0, 6.0});
    RowMatrix mat = new RowMatrix(rows.rdd());
    
    mat.computeGramianMatrix();
    mat.columnSimilarities();
    mat.columnSimilarities(0.5);
    mat.computeColumnSummaryStatistics();
    mat.computeCovariance();
    mat.computePrincipalComponents(1);
    mat.computeSVD(1, false, 1e-9);
    mat.toBreeze();
    mat.rows();
    mat.numCols();
    mat.numRows();
    mat.multiply(dm);
    ```




[GitHub] spark issue #14051: [SPARK-16372][MLlib] Retag RDD to tallSkinnyQR of RowMat...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    @zsxwing crumbs, thanks for that. It seems reasonably certain that it's related, though I still can't quite figure out how it would cause this failure:
    
    ```
    [error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:24: error: cannot find symbol
    [error] import org.apache.spark.SharedSparkSession;
    [error]                        ^
    [error]   symbol:   class SharedSparkSession
    [error]   location: package org.apache.spark
    [error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:31: error: cannot find symbol
    [error] public class JavaRowMatrixSuite extends SharedSparkSession {
    [error]                                         ^
    [error]   symbol: class SharedSparkSession
    [error] /home/jenkins/workspace/spark-branch-1.6-compile-maven-scala-2.11/mllib/src/test/java/org/apache/spark/mllib/linalg/distributed/JavaRowMatrixSuite.java:39: error: cannot find symbol
    [error]     JavaRDD<Vector> rows = jsc.parallelize(Arrays.asList(v1, v2, v3), 1);
    [error]                            ^
    ```
    
    Well, maybe it's safest to just leave this out of 1.6 in any event.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14051
  
    **[Test build #61739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61739/consoleFull)** for PR 14051 at commit [`0acd1e0`](https://github.com/apache/spark/commit/0acd1e0c2f3a517bda064c889d3f7ee9db2d5c39).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

