Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2015/10/07 17:18:17 UTC

[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/9009

    [SPARK-10973] __gettitem__ method throws IndexError exception when we…

    The __getitem__ method throws an IndexError exception when we try to access an index after the last non-zero entry:
    
        from pyspark.mllib.linalg import Vectors
        sv = Vectors.sparse(5, {1: 3})
        sv[0]
        ## 0.0
        sv[1]
        ## 3.0
        sv[2]
        ## Traceback (most recent call last):
        ##   File "<stdin>", line 1, in <module>
        ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
        ##     row_ind = inds[insert_index]
        ## IndexError: index out of bounds
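
A likely reading of the traceback, sketched with plain NumPy rather than the actual pyspark class: `np.searchsorted` returns the array length when the requested index is larger than every stored index, and that position is then used to subscript `inds`. The arrays below are assumed stand-ins for the internal indices and values of `Vectors.sparse(5, {1: 3})`.

```python
import numpy as np

# Assumed stand-ins for the sparse vector's internal arrays.
inds = np.array([1])      # positions of the non-zero entries
vals = np.array([3.0])    # the non-zero values

# Looking up index 2: every stored index is smaller, so the insertion
# point equals inds.size, which is one past the last valid position.
insert_index = int(np.searchsorted(inds, 2))

try:
    inds[insert_index]
    raised = False
except IndexError:
    raised = True

print(insert_index, inds.size, raised)
```

Indices 0 and 1 do not hit this path because their insertion point is 0, which is still a valid position in `inds`.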

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark sparse_vector_index_error

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9009
    
----
commit d28a644f6f65ddc48549766f6037dec2f1f1dc8d
Author: zero323 <ma...@gmail.com>
Date:   2015-10-07T11:18:15Z

    [SPARK-10973] __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by zero323 <gi...@git.apache.org>.
GitHub user zero323 commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147108493
  
    @jkbradley https://github.com/apache/spark/pull/9062, https://github.com/apache/spark/pull/9063, https://github.com/apache/spark/pull/9064




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9009




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146714417
  
    Thanks!  I'll merge this with master once tests pass.
    
    Would you be able to send PRs against branch-1.3, branch-1.4, branch-1.5 in order to backport this to previous Spark versions?  They can use the same JIRA number.




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146228854
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146643732
  
    @zero323 Can you please add tags "[ML] [PYTHON]" to the title of this PR?




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147117437
  
    @zero323 Good point, that's better for sure.  Do you mind preparing a patch for 1.6 for that?  (I don't think it's necessary to backport it everywhere.)




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147308172
  
    Yes, please.  Thanks!




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by zero323 <gi...@git.apache.org>.
GitHub user zero323 commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147114654
  
    It should be possible to push this check before binary search: 8a695fe2c3344acd19279fcd539177426d436a02 
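
The referenced commit is not reproduced in this thread, so the following is only a sketch of what moving the check ahead of the binary search could look like. `sparse_get` is a hypothetical standalone helper, not the pyspark method, and the specific early-exit condition (empty `inds`, or `index` greater than the last stored index) is an assumption.

```python
import numpy as np

def sparse_get(size, inds, vals, index):
    """Hypothetical helper: value at `index` of a sparse vector of length `size`."""
    if index < 0 or index >= size:
        raise ValueError("Index %d out of bounds." % index)
    # Early exit (assumed form): nothing stored, or index lies past the
    # last stored index, so the binary search can be skipped entirely.
    if inds.size == 0 or index > inds[-1]:
        return 0.0
    insert_index = np.searchsorted(inds, index)
    # index <= inds[-1] here, so insert_index < inds.size and the
    # subscript below cannot go out of range.
    if inds[insert_index] == index:
        return float(vals[insert_index])
    return 0.0

inds = np.array([1])
vals = np.array([3.0])
print([sparse_get(5, inds, vals, i) for i in range(5)])
```

Placing the comparison first avoids the search for any index beyond the last stored entry, at the cost of one extra branch on every lookup.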




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146376151
  
    LGTM
    
    Ping @mengxr FYI.  Also, which Spark versions are we patching?




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146732821
  
    merging with master




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146644210
  
      [Test build #1859 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1859/consoleFull) for   PR 9009 at commit [`d28a644`](https://github.com/apache/spark/commit/d28a644f6f65ddc48549766f6037dec2f1f1dc8d).




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146374074
  
    test this please




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146649020
  
      [Test build #1859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1859/console) for   PR 9009 at commit [`d28a644`](https://github.com/apache/spark/commit/d28a644f6f65ddc48549766f6037dec2f1f1dc8d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by zero323 <gi...@git.apache.org>.
GitHub user zero323 commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147753463
  
    @jkbradley It's my pleasure. I've created a [JIRA](https://issues.apache.org/jira/browse/SPARK-11084) and opened a [PR](https://github.com/apache/spark/pull/9098).




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by zero323 <gi...@git.apache.org>.
GitHub user zero323 commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146689536
  
    @jkbradley Done.




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146716786
  
      [Test build #1862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1862/console) for   PR 9009 at commit [`a1898ee`](https://github.com/apache/spark/commit/a1898ee172d1b3b4e8f69650edb2ecbc507f13d7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146714531
  
      [Test build #1862 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1862/consoleFull) for   PR 9009 at commit [`a1898ee`](https://github.com/apache/spark/commit/a1898ee172d1b3b4e8f69650edb2ecbc507f13d7).




[GitHub] spark pull request: [SPARK-10973][ML][PYTHON] __gettitem__ method ...

Posted by zero323 <gi...@git.apache.org>.
GitHub user zero323 commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-147244101
  
    @jkbradley Sure, I can do it later this week. Should I open a new JIRA for that? 




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/9009#issuecomment-146373504
  
    ok to test




[GitHub] spark pull request: [SPARK-10973] __gettitem__ method throws Index...

Posted by jkbradley <gi...@git.apache.org>.
GitHub user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9009#discussion_r41464516
  
    --- Diff: python/pyspark/mllib/linalg/__init__.py ---
    @@ -770,6 +770,9 @@ def __getitem__(self, index):
                 raise ValueError("Index %d out of bounds." % index)
     
             insert_index = np.searchsorted(inds, index)
    +        if insert_index >= self.indices.size:
    --- End diff --
    
    May as well use ```inds``` here for clarity since that's what is used elsewhere
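
With the suggested rename applied, the guarded lookup might look like the sketch below. It is written as a hypothetical free function `lookup` instead of the real `__getitem__` method. The diff excerpt ends at the `if` line, so returning `0.0` inside the guard and everything after it are assumptions.

```python
import numpy as np

def lookup(size, inds, vals, index):
    """Hypothetical stand-in for the patched lookup."""
    if index >= size or index < 0:
        raise ValueError("Index %d out of bounds." % index)

    insert_index = np.searchsorted(inds, index)
    # Guard from the diff, using the local name `inds` as suggested.
    if insert_index >= inds.size:
        return 0.0  # assumed: past the last stored entry means zero

    # Assumed remainder: a hit only if the stored index matches exactly.
    if inds[insert_index] == index:
        return float(vals[insert_index])
    return 0.0

# Index 2 lies after the only stored entry (index 1) and now yields zero.
result = lookup(5, np.array([1]), np.array([3.0]), 2)
```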

