You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by bhargav <gi...@git.apache.org> on 2015/10/12 02:18:25 UTC

[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

GitHub user bhargav opened a pull request:

    https://github.com/apache/spark/pull/9069

    [SPARK-11050][MLLIB] PySpark SparseVector can return wrong index in e…

    …rror message
    
    For negative indices in the SparseVector, we update the index value. If we have an incorrect index
    at this point, the error message has the incorrect *updated* index instead of the original one. This
    change contains the fix for the same.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bhargav/spark spark-10759

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9069
    
----
commit 316acac64b31894f0e15caa6d612d57aa17402e4
Author: Bhargav Mangipudi <bh...@gmail.com>
Date:   2015-10-11T23:10:21Z

    [SPARK-11050][MLLIB] PySpark SparseVector can return wrong index in error message
    
    For negative indices in the SparseVector, we update the index value. If we have an incorrect index
    at this point, the error message has the incorrect *updated* index instead of the original one. This
    change contains the fix for the same.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by bhargav <gi...@git.apache.org>.
Github user bhargav commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147488311
  
    Updated the PR. Though I now see that the branch name is different from the JIRA issue number.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9069


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-148842155
  
    Merging with master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147281748
  
    It'd be _really_ fun to try using Hypothesis to write some property-based tests for this: https://hypothesis.readthedocs.org/en/master/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-148788943
  
      [Test build #1916 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1916/consoleFull) for   PR 9069 at commit [`19bc764`](https://github.com/apache/spark/commit/19bc764bf62837764dc9b97f5b28c5d9535570d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147467958
  
      [Test build #1882 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1882/consoleFull) for   PR 9069 at commit [`316acac`](https://github.com/apache/spark/commit/316acac64b31894f0e15caa6d612d57aa17402e4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-148794016
  
    Btw, it doesn't matter what you call your branch name; that's not a problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147268489
  
    I've been seeing a lot of bugfix patches for the PySpark SparseVector recently. This suggests to me that we need to write significantly more tests for this component.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-148796105
  
      [Test build #1916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1916/console) for   PR 9069 at commit [`19bc764`](https://github.com/apache/spark/commit/19bc764bf62837764dc9b97f5b28c5d9535570d2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147472922
  
      [Test build #1882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1882/console) for   PR 9069 at commit [`316acac`](https://github.com/apache/spark/commit/316acac64b31894f0e15caa6d612d57aa17402e4).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ChildProcAppHandle implements SparkAppHandle `
      * `abstract class LauncherConnection implements Closeable, Runnable `
      * `final class LauncherProtocol `
      * `  static class Message implements Serializable `
      * `  static class Hello extends Message `
      * `  static class SetAppId extends Message `
      * `  static class SetState extends Message `
      * `  static class Stop extends Message `
      * `class LauncherServer implements Closeable `
      * `class NamedThreadFactory implements ThreadFactory `
      * `class OutputRedirector `
      * `public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable `
      * `  /** Run a function within Hive state (SessionState, HiveConf, Hive client and class loader) */`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by bhargav <gi...@git.apache.org>.
Github user bhargav commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147821728
  
    @jkbradley Gentle ping. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-148788590
  
    @bhargav Thanks for the update!  LGTM pending tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by bhargav <gi...@git.apache.org>.
Github user bhargav commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147281230
  
    @jkbradley for this.
    
    If there is a task for adding more tests, I can take that up as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147467355
  
    @JoshRosen I think it's that PySpark has less coverage than Scala, and our linear algebra code could use some more coverage too.  And PySpark SparseVector is in the intersection of these problems.   (Though 4 of those bug fix patches may have been patch + 3 backports.)
    
    Hypothesis looks cool...future work?  : )
    
    @bhargav There is not a specific task for adding more tests, but since you're interested, it'd be awesome if you could check through some of the PySpark Vector and Matrix APIs and unit tests and see if you can find missing coverage.  Note also that the tests are split between doc tests (in each .py file) and unit tests (in tests.py files).  If you find missing items, can you please make one or more JIRAs?  If you're unsure about any, I'd recommend making a single JIRA and listing there; we can then create subtasks for each major issue.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9069#discussion_r41779732
  
    --- Diff: python/pyspark/mllib/linalg/__init__.py ---
    @@ -764,10 +764,12 @@ def __getitem__(self, index):
             if not isinstance(index, int):
                 raise TypeError(
                     "Indices must be of type integer, got type %s" % type(index))
    +
    +        original_index = index
             if index < 0:
                 index += self.size
             if index >= self.size or index < 0:
    -            raise ValueError("Index %d out of bounds." % index)
    +            raise ValueError("Index %d out of bounds." % original_index)
    --- End diff --
    
    It might be better to move this check to line 768, before index gets modified.  That way, you don't have to even create a copy of index (though you'll need to adjust the check's index range).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11050][MLLIB] PySpark SparseVector can ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9069#issuecomment-147265465
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org