You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ghoto <gi...@git.apache.org> on 2017/05/10 18:24:29 UTC

[GitHub] spark pull request #17940: Bug fix/spark 20687

GitHub user ghoto opened a pull request:

    https://github.com/apache/spark/pull/17940

    Bug fix/spark 20687

    ## What changes were proposed in this pull request?
    
    Bugfix for https://issues.apache.org/jira/browse/SPARK-20687
    
    Before converting a CSCMatrix to a Matrix, the trailing buffer 0s added by Breeze to rowIndices and data, are removed to avoid inconsistencies with colPtrs. Notice that this trailing buffers are often generated after operations between matrices such summation or subtraction, and this code causes therefore exceptions on valid BlockMatrix.add, and BlockMatrix.substract operations, because blocks are stored as SparseMatrix, converted to breeze and back to sparse.
    
    http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add
    
    ## How was this patch tested?
    
    Added a test to MatricesSuite that verifies that the conversions are valid and that code doesn't crash. Originally the same code would crash on Spark.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ghoto/spark bug-fix/SPARK-20687

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17940
    
----
commit 62d78a241c95d09896b731776e29a8cb883dfc49
Author: Ignacio Bermudez <ig...@gmail.com>
Date:   2017-05-10T04:31:03Z

    Reproducing SPARK-20687

commit dbbd39121f3210f6edd7a74bb21853fbda20c0cb
Author: Ignacio Bermudez <ig...@gmail.com>
Date:   2017-05-10T18:03:14Z

    [SPARK-20687] mllib.Matrices.fromBreeze may cause crash when converting breeze CSCMatrix
    
    In an operation of two A, B CSCMatrices the resulting C matrix may have some extra 0s
    in rowIndices and data which are created for performance improvement by Breeze.
    This causes problems on converting back to mllib.Matrix because it relies on
    rowIndices and data being coherent with colPtrs. Therefore it is necessary to truncate
    rowIndices and data to the active number of elements hold by the C matrix, before
    creating a Spark's SparseMatrix.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3746/testReport)** for PR 17940 at commit [`18ce388`](https://github.com/apache/spark/commit/18ce388d81c54df8d17bb9690f489999a0d19858).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by ghoto <gi...@git.apache.org>.
Github user ghoto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r116155652
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,20 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    +        val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
    +        val truncData = sm.data.slice(0, sm.activeSize)
    --- End diff --
    
    I'm implementing both suggestions, however, wouldn't be the sm.copy more expensive than just doing those 2 slices?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3727 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3727/testReport)** for PR 17940 at commit [`b40a706`](https://github.com/apache/spark/commit/b40a706e4fa5733d8205c31efd48ff67b9c75575).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3710 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3710/testReport)** for PR 17940 at commit [`dbbd391`](https://github.com/apache/spark/commit/dbbd39121f3210f6edd7a74bb21853fbda20c0cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    Merged to master/2.2/2.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r116139174
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,20 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    +        val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
    +        val truncData = sm.data.slice(0, sm.activeSize)
    --- End diff --
    
    This is the same as calling compact(). But the good thing is that it won't impact the original matrix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3714 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3714/testReport)** for PR 17940 at commit [`b40a706`](https://github.com/apache/spark/commit/b40a706e4fa5733d8205c31efd48ff67b9c75575).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by ghoto <gi...@git.apache.org>.
Github user ghoto commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    Sorry about that. I added more context in the description and updated the title.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r116139610
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
    @@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite {
         }
       }
     
    +  test("Test Breeze Conversion Bug - SPARK-20687") {
    --- End diff --
    
    specific name: Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros.
    
    And move the test after another unit test "fromBreeze with sparse matrix"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3733/testReport)** for PR 17940 at commit [`b40a706`](https://github.com/apache/spark/commit/b40a706e4fa5733d8205c31efd48ff67b9c75575).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    CC @hhbyyh for https://github.com/apache/spark/pull/9520
    Looks credible to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: Bug fix/spark 20687

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    Please fix up the title and description per http://spark.apache.org/contributing.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17940


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3746 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3746/testReport)** for PR 17940 at commit [`18ce388`](https://github.com/apache/spark/commit/18ce388d81c54df8d17bb9690f489999a0d19858).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by ghoto <gi...@git.apache.org>.
Github user ghoto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r117619161
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,16 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +        val nsm = if (sm.rowIndices.length > sm.activeSize) {
    +          // This sparse matrix has trainling zeros.
    --- End diff --
    
    ups.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r117480337
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,24 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    +        if (sm.rowIndices.length > sm.activeSize) {
    +          val smC = sm.copy
    +          smC.compact()
    +          new SparseMatrix(smC.rows, smC.cols, smC.colPtrs, smC.rowIndices, smC.data)
    +        } else {
    +          new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    --- End diff --
    
    @hhbyyh what do you think of the current state? I wasn't clear if you were requesting a specific change to the comment.
    
    Here, if you're going to change it again, you could simplify by not repeating the `new SparseMatrix(...)` call. Just pick which sparse matrix you're copying in the if statement (original or compacted) and then return the result of converting that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3733 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3733/testReport)** for PR 17940 at commit [`b40a706`](https://github.com/apache/spark/commit/b40a706e4fa5733d8205c31efd48ff67b9c75575).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r116351996
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,24 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    --- End diff --
    
    minor: make this comment more impact in fewer lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: Bug fix/spark 20687

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r116139038
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,20 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    --- End diff --
    
    Maybe we can add some if else here, since slice will copy the array and often that's not needed. 
    Please refer to https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/CSCMatrix.scala#L130


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by ghoto <gi...@git.apache.org>.
Github user ghoto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r117535893
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,24 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    +        if (sm.rowIndices.length > sm.activeSize) {
    +          val smC = sm.copy
    +          smC.compact()
    +          new SparseMatrix(smC.rows, smC.cols, smC.colPtrs, smC.rowIndices, smC.data)
    +        } else {
    +          new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    --- End diff --
    
    Hey, I shortened the comment and I removed repeating the new SparseMatrix. 
    
    Give it a look now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17940#discussion_r117592116
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,16 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +        val nsm = if (sm.rowIndices.length > sm.activeSize) {
    +          // This sparse matrix has trainling zeros.
    --- End diff --
    
    trailing


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by ghoto <gi...@git.apache.org>.
Github user ghoto commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    Need to fix line in the test because it's too long.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17940
  
    **[Test build #3710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3710/testReport)** for PR 17940 at commit [`dbbd391`](https://github.com/apache/spark/commit/dbbd39121f3210f6edd7a74bb21853fbda20c0cb).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org