You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zjffdu <gi...@git.apache.org> on 2016/08/09 05:32:46 UTC

[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

GitHub user zjffdu opened a pull request:

    https://github.com/apache/spark/pull/14555

    [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector.

    ## What changes were proposed in this pull request?
    
    1. In scala, add negative low bound checking and put all the low/upper bound checking in one place
    2. In python, add low/upper bound checking of indices.
    
    
    ## How was this patch tested?
    
    unit test added
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/spark SPARK-16965

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14555
    
----
commit 57b9c84a727f6bbc6bdc28148696c8c985408f80
Author: Jeff Zhang <zj...@apache.org>
Date:   2016-08-09T05:30:34Z

    [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74208842
  
    --- Diff: python/pyspark/ml/linalg/__init__.py ---
    @@ -511,6 +519,12 @@ def __init__(self, size, *args):
                             "Indices %s and %s are not strictly increasing"
                             % (self.indices[i], self.indices[i + 1]))
     
    +        assert np.max(self.indices) < self.size, \
    --- End diff --
    
    Will this disallow a size 0 vector? not sure what np.max does here.
    PS I like its error message better for "not strictly increasing" and we should probably make the messages generally consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74395330
  
    --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
    @@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
         }
       }
     
    +  test("sparse vector construction with negative indices") {
    +    intercept[IllegalArgumentException] {
    +      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
    +    }
    +  }
    +
    --- End diff --
    
    Lastly, I believe, do we have a test that exercises the check for missorted indices? I don't see one; I suppose one should have been added before.
    
    Same for Python.
    
    I'm OK with the error messages, but now seeing that Python has parallel checks, it'd be nice to make the error messages match.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    OK, one last over-arching comment. I know the arg validation logic was split over two methods, and I think it was on purpose, so that calls to the `SparseVector` constructor didn't pay the full overhead of checking all the values. (CC @jkbradley did we talk about this a long time ago?) I think that may be something desirable to leave in place, even if it won't matter to callers of `Vectors.sparse`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63530/consoleFull)** for PR 14555 at commit [`6ebd4da`](https://github.com/apache/spark/commit/6ebd4da8c447436d0d7a96676cfad6c5bc08d50f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74021332
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    +
    +  private def validate(): Unit = {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    --- End diff --
    
    This allows a size 0 vector now. I guess that's good, because `DenseVector` allows this (a 0 length array).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74021092
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    They wouldn't become fields unless used outside the constructor. You can also use a simple scope `{...}` to guard against this. I understand the argument and don't feel strongly either way, but we don't do this in other code in general.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74208246
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,26 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  // validate the data
    +  {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    +    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    +      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    +      s" ${values.length} values.")
    +    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    +      s"which exceeds the specified vector size ${size}.")
    +
    +    var prev = -1
    +    if (indices.nonEmpty) {
    +      require(indices(0) >= 0, s"Found negative index: ${indices(0)}.")
    +    }
    +    indices.foreach { i =>
    +      require(prev < i, s"Found duplicate index: $i.")
    --- End diff --
    
    These are just nits now, but `prev` should be declared just before the loop where it's used. This error message could be more general, since it will catch the case that the indices aren't sorted. It could say `s"Index $i follows $prev and is not strictly increasing"`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74208355
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,26 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  // validate the data
    +  {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    +    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    +      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    +      s" ${values.length} values.")
    +    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    +      s"which exceeds the specified vector size ${size}.")
    +
    +    var prev = -1
    +    if (indices.nonEmpty) {
    +      require(indices(0) >= 0, s"Found negative index: ${indices(0)}.")
    +    }
    +    indices.foreach { i =>
    +      require(prev < i, s"Found duplicate index: $i.")
    +      prev = i
    +    }
    +    require(prev < size, s"You may not write an element to index $prev because the declared " +
    --- End diff --
    
    Maybe more direct as `s"Index $prev out of bounds for vector of size $size"`? the way it starts it seems like the problem is something being read-only.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by adamzjw <gi...@git.apache.org>.
Github user adamzjw commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Hey, all.  One of my colleagues run into a bug related to the discussion above.
    
    He called the Vectors.sparse(size: Int, indices: Array[Int], values: Array[Double]) without noticing that the indices are assumed to be ordered. And the vector he created has all value of 0, if we try to get value via apply method. However, SparseVector.toArray simply generate a array using a for loop that is order insensitive. Hence, you will get a all-0 vector when you call apply method, while you can get correct result using toArray or toDense method.
    
    The lack of indices validation in a public method could be quite problematic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63603/consoleFull)** for PR 14555 at commit [`b291423`](https://github.com/apache/spark/commit/b291423eedb0d29e9006e2db93f39a1a4af9dfb5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74396667
  
    --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
    @@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
         }
       }
     
    +  test("sparse vector construction with negative indices") {
    +    intercept[IllegalArgumentException] {
    +      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
    +    }
    +  }
    +
    --- End diff --
    
    For now, it is assumed the indices are in order. https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L554


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74005184
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    What's the value in this method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63530/consoleFull)** for PR 14555 at commit [`6ebd4da`](https://github.com/apache/spark/commit/6ebd4da8c447436d0d7a96676cfad6c5bc08d50f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63511/consoleFull)** for PR 14555 at commit [`111cb28`](https://github.com/apache/spark/commit/111cb28af34d2b701905cc46696e4f8da373c2bd).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74006057
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    What do you mean ? This method would be called when SparseVector is created, it can refer any variable in SparseVector. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63603/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74005178
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    +
    +  private def validate(): Unit = {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    +    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    +      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    +      s" ${values.length} values.")
    +    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    +      s"which exceeds the specified vector size ${size}.")
    +
    +    var prev = -1
    +    indices.foreach { i =>
    +      require(i >= 0, s"Found negative indice: $i.")
    --- End diff --
    
    index, not indice


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #64059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64059/consoleFull)** for PR 14555 at commit [`b291423`](https://github.com/apache/spark/commit/b291423eedb0d29e9006e2db93f39a1a4af9dfb5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74022257
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    +
    +  private def validate(): Unit = {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    --- End diff --
    
    Yes, I do see test for 0 length vector. 
    https://github.com/apache/spark/blob/master/mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala#L81


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Ping @zjffdu 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74021856
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    I also thought about `{...}`,  just feel putting into one method is better. Anyway I can do that way if this is not proper for spark code style. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63416/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #64059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64059/consoleFull)** for PR 14555 at commit [`b291423`](https://github.com/apache/spark/commit/b291423eedb0d29e9006e2db93f39a1a4af9dfb5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63518/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63416/consoleFull)** for PR 14555 at commit [`57b9c84`](https://github.com/apache/spark/commit/57b9c84a727f6bbc6bdc28148696c8c985408f80).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63511/consoleFull)** for PR 14555 at commit [`111cb28`](https://github.com/apache/spark/commit/111cb28af34d2b701905cc46696e4f8da373c2bd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    That's a good point, it's public now. I think I agree with you then, it's fairly important to retain the validation there. It would also be inconsistent to move it back, yes, because I see the other factory methods don't do the same validation.
    
    I suppose that if it were a big issue we could find a way to expose a private alternative constructor that skips validation where internal code knows the inputs are valid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74009150
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    2 reasons
    * group the validation code together
    * I may define some temp variable for validation, without method it would become variable of SparseVector


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    @srowen Do you mean putting the validation back to method Vectors.sparse() ?  The constructor of SparseVector is public, so I think we should keep the validation in SparseVector rather than its factory Vectors. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63530/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    I think this is about ready, though to be conservative, I think we need to put the validation back where it was, so that the more expensive check only happens in Vectors.sparse and not the constructor. And then add tests accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74006226
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    --- End diff --
    
    Sure, but why bother writing a method? it's invoked once directly above. This is just constructor code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63511/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63416/consoleFull)** for PR 14555 at commit [`57b9c84`](https://github.com/apache/spark/commit/57b9c84a727f6bbc6bdc28148696c8c985408f80).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63518/consoleFull)** for PR 14555 at commit [`7c01da8`](https://github.com/apache/spark/commit/7c01da8085416673e7abf77b9e45107daf80fcf1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64059/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74397043
  
    --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
    @@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
         }
       }
     
    +  test("sparse vector construction with negative indices") {
    +    intercept[IllegalArgumentException] {
    +      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
    +    }
    +  }
    +
    --- End diff --
    
    ... but https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L210 does not assume this and that's the method to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Last call for @mengxr or @jkbradley -- just want to make sure one's OK. Certainly good for correctness.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63603/consoleFull)** for PR 14555 at commit [`b291423`](https://github.com/apache/spark/commit/b291423eedb0d29e9006e2db93f39a1a4af9dfb5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14555


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74005289
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
         @Since("2.0.0") val indices: Array[Int],
         @Since("2.0.0") val values: Array[Double]) extends Vector {
     
    -  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    -    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    -    s" ${values.length} values.")
    -  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    -    s"which exceeds the specified vector size ${size}.")
    +  validate()
    +
    +  private def validate(): Unit = {
    +    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
    +    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
    +      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
    +      s" ${values.length} values.")
    +    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
    +      s"which exceeds the specified vector size ${size}.")
    +
    +    var prev = -1
    +    indices.foreach { i =>
    +      require(i >= 0, s"Found negative indice: $i.")
    +      require(prev < i, s"Found duplicate indices: $i.")
    +      prev = i
    +    }
    +    require(prev < size, s"You may not write an element to index $prev because the declared " +
    --- End diff --
    
    If you're doing it this way, then just check if the first index is >= 0. If any of them is, it will be.
    This message could be a little more straightforward: "found index that exceeds size" or something


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14555#discussion_r74208436
  
    --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
    @@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
         }
       }
     
    +  test("sparse vector construction with negative indices") {
    +    intercept[IllegalArgumentException] {
    +      Vectors.sparse(3, Array(1, -2), Array(3.0, 5.0))
    --- End diff --
    
    This actually won't test the check for negative indices, because the indices are missorted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    **[Test build #63518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63518/consoleFull)** for PR 14555 at commit [`7c01da8`](https://github.com/apache/spark/commit/7c01da8085416673e7abf77b9e45107daf80fcf1).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14555
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org