Posted to commits@spark.apache.org by jk...@apache.org on 2015/10/09 03:34:21 UTC

spark git commit: [SPARK-10973] [ML] [PYTHON] __getitem__ method throws IndexError exception when we…

Repository: spark
Updated Branches:
  refs/heads/master 3390b400d -> 8e67882b9


[SPARK-10973] [ML] [PYTHON] __getitem__ method throws IndexError exception when we…

The __getitem__ method of SparseVector throws an IndexError when we try to access an index after the last non-zero entry:

    from pyspark.mllib.linalg import Vectors
    sv = Vectors.sparse(5, {1: 3})
    sv[0]
    ## 0.0
    sv[1]
    ## 3.0
    sv[2]
    ## Traceback (most recent call last):
    ##   File "<stdin>", line 1, in <module>
    ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
    ##     row_ind = inds[insert_index]
    ## IndexError: index out of bounds
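
For readers without a Spark checkout, the failure and the fix can be sketched in plain Python. This is only an illustration under stated assumptions, not the PySpark implementation: `sparse_getitem` is a hypothetical helper, and the standard library's `bisect_left` stands in for `np.searchsorted`, since both return the leftmost insertion position in a sorted sequence. When the requested index is larger than every stored index, that position equals the number of stored entries, so it cannot be used to index the stored indices directly.

```python
from bisect import bisect_left


def sparse_getitem(size, indices, values, index):
    """Hypothetical stand-in for SparseVector.__getitem__ with the guard applied.

    `indices` is the sorted list of non-zero positions and `values` holds
    the matching entries; both names are assumptions for this sketch.
    """
    if not isinstance(index, int):
        raise TypeError("indices must be integers")
    if index < 0:
        index += size  # support negative indexing, as Python sequences do
    if index >= size or index < 0:
        raise ValueError("Index %d out of bounds." % index)

    # bisect_left plays the role of np.searchsorted in this sketch.
    insert_index = bisect_left(indices, index)
    # The guard: past the last non-zero entry there is nothing to compare
    # against, so the element is an implicit zero.
    if insert_index >= len(indices):
        return 0.0
    if indices[insert_index] == index:
        return values[insert_index]
    return 0.0


# Mirrors the reproduction above: a vector of size 5 with a single entry at 1.
print(sparse_getitem(5, [1], [3.0], 0))  # 0.0
print(sparse_getitem(5, [1], [3.0], 1))  # 3.0
print(sparse_getitem(5, [1], [3.0], 2))  # 0.0 instead of IndexError
```

Without the `insert_index >= len(indices)` check, the third call would attempt `indices[1]` on a one-element list, which is the out-of-bounds access shown in the traceback.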

Author: zero323 <ma...@gmail.com>

Closes #9009 from zero323/sparse_vector_index_error.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e67882b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e67882b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e67882b

Branch: refs/heads/master
Commit: 8e67882b905683a1f151679214ef0b575e77c7e1
Parents: 3390b40
Author: zero323 <ma...@gmail.com>
Authored: Thu Oct 8 18:34:15 2015 -0700
Committer: Joseph K. Bradley <jo...@databricks.com>
Committed: Thu Oct 8 18:34:15 2015 -0700

----------------------------------------------------------------------
 python/pyspark/mllib/linalg/__init__.py |  3 +++
 python/pyspark/mllib/tests.py           | 12 +++++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8e67882b/python/pyspark/mllib/linalg/__init__.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/linalg/__init__.py b/python/pyspark/mllib/linalg/__init__.py
index ea42127..d903b90 100644
--- a/python/pyspark/mllib/linalg/__init__.py
+++ b/python/pyspark/mllib/linalg/__init__.py
@@ -770,6 +770,9 @@ class SparseVector(Vector):
             raise ValueError("Index %d out of bounds." % index)
 
         insert_index = np.searchsorted(inds, index)
+        if insert_index >= inds.size:
+            return 0.
+
         row_ind = inds[insert_index]
         if row_ind == index:
             return vals[insert_index]

http://git-wip-us.apache.org/repos/asf/spark/blob/8e67882b/python/pyspark/mllib/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/tests.py b/python/pyspark/mllib/tests.py
index 96cf134..2a6a5cd 100644
--- a/python/pyspark/mllib/tests.py
+++ b/python/pyspark/mllib/tests.py
@@ -237,15 +237,17 @@ class VectorTests(MLlibTestCase):
         self.assertTrue(dv.array.dtype == 'float64')
 
     def test_sparse_vector_indexing(self):
-        sv = SparseVector(4, {1: 1, 3: 2})
+        sv = SparseVector(5, {1: 1, 3: 2})
         self.assertEqual(sv[0], 0.)
         self.assertEqual(sv[3], 2.)
         self.assertEqual(sv[1], 1.)
         self.assertEqual(sv[2], 0.)
-        self.assertEqual(sv[-1], 2)
-        self.assertEqual(sv[-2], 0)
-        self.assertEqual(sv[-4], 0)
-        for ind in [4, -5]:
+        self.assertEqual(sv[4], 0.)
+        self.assertEqual(sv[-1], 0.)
+        self.assertEqual(sv[-2], 2.)
+        self.assertEqual(sv[-3], 0.)
+        self.assertEqual(sv[-5], 0.)
+        for ind in [5, -6]:
             self.assertRaises(ValueError, sv.__getitem__, ind)
         for ind in [7.8, '1']:
             self.assertRaises(TypeError, sv.__getitem__, ind)

