You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by hzlyx <gi...@git.apache.org> on 2014/12/24 16:07:28 UTC

[GitHub] spark pull request: Vectors.sparse() add support to unsorted indic...

GitHub user hzlyx opened a pull request:

    https://github.com/apache/spark/pull/3791

    Vectors.sparse() add support to unsorted indices

    For original method, when the indices is not strictly increasing, the sparse vector can be created without any warning or error. But when use apply() method, only zero will be returned.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hzlyx/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3791
    
----
commit fa3df2406cea3b73e3058510d0a6c5e4098dc22a
Author: Yuxi Liao <li...@huawei.com>
Date:   2014-12-24T14:54:15Z

    Vectors.sparse() add support to unsorted indices
    
    For original method, when the indices is not strictly increasing, the sparse vector can be created without any warning or error. But when use apply() method, only zero will be returned.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3791#discussion_r22257875
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -173,11 +173,13 @@ object Vectors {
        * Creates a sparse vector providing its index array and value array.
        *
        * @param size vector size.
    -   * @param indices index array, must be strictly increasing.
    -   * @param values value array, must have the same length as indices.
    +   * @param indices index array.
    +   * @param values value array.
        */
    -  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector =
    -    new SparseVector(size, indices, values)
    +  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector = {
    +    val (newIndices, newValues) = indices.zip(values).sortBy(_._1).unzip
    --- End diff --
    
    This is non-trivial overhead to introduce every time a vector is made, when the common case is that the indices are sorted, and the other cases are really caller error. I'd still suggest merely checking the sorting. There are lots of one-liners but this may be among the most efficient `require((1 until indices.length).forall(i => indices(i-1) <= indices(i)))`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3791


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3791#issuecomment-68057633
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

Posted by hzlyx <gi...@git.apache.org>.
Github user hzlyx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3791#discussion_r22268406
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -173,11 +173,13 @@ object Vectors {
        * Creates a sparse vector providing its index array and value array.
        *
        * @param size vector size.
    -   * @param indices index array, must be strictly increasing.
    -   * @param values value array, must have the same length as indices.
    +   * @param indices index array.
    +   * @param values value array.
        */
    -  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector =
    -    new SparseVector(size, indices, values)
    +  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector = {
    +    val (newIndices, newValues) = indices.zip(values).sortBy(_._1).unzip
    --- End diff --
    
    I agree with your method. At least a error should be reported, otherwise it's hard to locate problems in applications.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [MLlib]Vectors.sparse() add support to unsorte...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/3791#issuecomment-68279740
  
    @hzlyx As @srowen mentioned, this is a contract to avoid the expensive check. You can use 
    
    https://github.com/hzlyx/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L191
    
    if the indices are not ordered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org