Posted to commits@spark.apache.org by gu...@apache.org on 2018/07/07 03:39:37 UTC
spark git commit: [SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy 1.14+
Repository: spark
Updated Branches:
refs/heads/master 74f6a92fc -> 044b33b2e
[SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy 1.14+
## What changes were proposed in this pull request?
This PR proposes to make PySpark's tests compatible with NumPy 1.14+.
NumPy 1.14.x introduced rather radical changes to its string representation.
For example, the tests below fail:
```
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 895, in __main__.DenseMatrix.__str__
Failed example:
print(dm)
Expected:
DenseMatrix([[ 0., 2.],
[ 1., 3.]])
Got:
DenseMatrix([[0., 2.],
[1., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 899, in __main__.DenseMatrix.__str__
Failed example:
print(dm)
Expected:
DenseMatrix([[ 0., 1.],
[ 2., 3.]])
Got:
DenseMatrix([[0., 1.],
[2., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 939, in __main__.DenseMatrix.toArray
Failed example:
m.toArray()
Expected:
array([[ 0., 2.],
[ 1., 3.]])
Got:
array([[0., 2.],
[1., 3.]])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 324, in __main__.DenseVector.dot
Failed example:
dense.dot(np.reshape([1., 2., 3., 4.], (2, 2), order='F'))
Expected:
array([  5.,  11.])
Got:
array([ 5., 11.])
**********************************************************************
File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 567, in __main__.SparseVector.dot
Failed example:
a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
Expected:
array([ 22.,  22.])
Got:
array([22., 22.])
```
See [release note](https://docs.scipy.org/doc/numpy-1.14.0/release.html#compatibility-notes).
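The fix applied in each test entry point can be sketched as a small helper. This is only an illustration: the helper name `enable_legacy_numpy_repr` and the `demo` function are hypothetical and do not appear in the patch, which inlines the same `try`/`except` directly before running the doctests.

```python
def enable_legacy_numpy_repr(np_module):
    """Ask NumPy to print arrays in its pre-1.14 style, if it can.

    `np_module` is expected to be the imported `numpy` module.
    Returns True if legacy printing was switched on, and False if this
    NumPy does not know the `legacy` keyword (NumPy older than 1.14,
    which already prints in the old style).
    """
    try:
        np_module.set_printoptions(legacy='1.13')
    except TypeError:
        # Older NumPy rejects the unknown `legacy` keyword argument.
        return False
    return True


def demo():
    # Hypothetical usage; requires NumPy to be installed.
    import numpy
    enable_legacy_numpy_repr(numpy)
    # With legacy printing enabled, the repr keeps the 1.13-style padding
    # that the existing doctests expect.
    print(repr(numpy.array([[0., 2.], [1., 3.]])))
```

Catching `TypeError` rather than checking the NumPy version keeps the doctests working on both old and new NumPy without parsing version strings.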
## How was this patch tested?
Manually tested:
```
$ ./run-tests --python-executables=python3.6,python2.7 --modules=pyspark-ml,pyspark-mllib
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'python2.7']
Will test the following Python modules: ['pyspark-ml', 'pyspark-mllib']
Starting test(python2.7): pyspark.mllib.tests
Starting test(python2.7): pyspark.ml.classification
Starting test(python3.6): pyspark.mllib.tests
Starting test(python2.7): pyspark.ml.clustering
Finished test(python2.7): pyspark.ml.clustering (54s)
Starting test(python2.7): pyspark.ml.evaluation
Finished test(python2.7): pyspark.ml.classification (74s)
Starting test(python2.7): pyspark.ml.feature
Finished test(python2.7): pyspark.ml.evaluation (27s)
Starting test(python2.7): pyspark.ml.fpm
Finished test(python2.7): pyspark.ml.fpm (0s)
Starting test(python2.7): pyspark.ml.image
Finished test(python2.7): pyspark.ml.image (17s)
Starting test(python2.7): pyspark.ml.linalg.__init__
Finished test(python2.7): pyspark.ml.linalg.__init__ (1s)
Starting test(python2.7): pyspark.ml.recommendation
Finished test(python2.7): pyspark.ml.feature (76s)
Starting test(python2.7): pyspark.ml.regression
Finished test(python2.7): pyspark.ml.recommendation (69s)
Starting test(python2.7): pyspark.ml.stat
Finished test(python2.7): pyspark.ml.regression (45s)
Starting test(python2.7): pyspark.ml.tests
Finished test(python2.7): pyspark.ml.stat (28s)
Starting test(python2.7): pyspark.ml.tuning
Finished test(python2.7): pyspark.ml.tuning (20s)
Starting test(python2.7): pyspark.mllib.classification
Finished test(python2.7): pyspark.mllib.classification (31s)
Starting test(python2.7): pyspark.mllib.clustering
Finished test(python2.7): pyspark.mllib.tests (260s)
Starting test(python2.7): pyspark.mllib.evaluation
Finished test(python3.6): pyspark.mllib.tests (266s)
Starting test(python2.7): pyspark.mllib.feature
Finished test(python2.7): pyspark.mllib.evaluation (21s)
Starting test(python2.7): pyspark.mllib.fpm
Finished test(python2.7): pyspark.mllib.feature (38s)
Starting test(python2.7): pyspark.mllib.linalg.__init__
Finished test(python2.7): pyspark.mllib.linalg.__init__ (1s)
Starting test(python2.7): pyspark.mllib.linalg.distributed
Finished test(python2.7): pyspark.mllib.fpm (34s)
Starting test(python2.7): pyspark.mllib.random
Finished test(python2.7): pyspark.mllib.clustering (64s)
Starting test(python2.7): pyspark.mllib.recommendation
Finished test(python2.7): pyspark.mllib.random (15s)
Starting test(python2.7): pyspark.mllib.regression
Finished test(python2.7): pyspark.mllib.linalg.distributed (47s)
Starting test(python2.7): pyspark.mllib.stat.KernelDensity
Finished test(python2.7): pyspark.mllib.stat.KernelDensity (0s)
Starting test(python2.7): pyspark.mllib.stat._statistics
Finished test(python2.7): pyspark.mllib.recommendation (40s)
Starting test(python2.7): pyspark.mllib.tree
Finished test(python2.7): pyspark.mllib.regression (38s)
Starting test(python2.7): pyspark.mllib.util
Finished test(python2.7): pyspark.mllib.stat._statistics (19s)
Starting test(python3.6): pyspark.ml.classification
Finished test(python2.7): pyspark.mllib.tree (26s)
Starting test(python3.6): pyspark.ml.clustering
Finished test(python2.7): pyspark.mllib.util (27s)
Starting test(python3.6): pyspark.ml.evaluation
Finished test(python3.6): pyspark.ml.evaluation (30s)
Starting test(python3.6): pyspark.ml.feature
Finished test(python2.7): pyspark.ml.tests (234s)
Starting test(python3.6): pyspark.ml.fpm
Finished test(python3.6): pyspark.ml.fpm (1s)
Starting test(python3.6): pyspark.ml.image
Finished test(python3.6): pyspark.ml.clustering (55s)
Starting test(python3.6): pyspark.ml.linalg.__init__
Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
Starting test(python3.6): pyspark.ml.recommendation
Finished test(python3.6): pyspark.ml.classification (71s)
Starting test(python3.6): pyspark.ml.regression
Finished test(python3.6): pyspark.ml.image (18s)
Starting test(python3.6): pyspark.ml.stat
Finished test(python3.6): pyspark.ml.stat (37s)
Starting test(python3.6): pyspark.ml.tests
Finished test(python3.6): pyspark.ml.regression (59s)
Starting test(python3.6): pyspark.ml.tuning
Finished test(python3.6): pyspark.ml.feature (93s)
Starting test(python3.6): pyspark.mllib.classification
Finished test(python3.6): pyspark.ml.recommendation (83s)
Starting test(python3.6): pyspark.mllib.clustering
Finished test(python3.6): pyspark.ml.tuning (29s)
Starting test(python3.6): pyspark.mllib.evaluation
Finished test(python3.6): pyspark.mllib.evaluation (26s)
Starting test(python3.6): pyspark.mllib.feature
Finished test(python3.6): pyspark.mllib.classification (43s)
Starting test(python3.6): pyspark.mllib.fpm
Finished test(python3.6): pyspark.mllib.clustering (81s)
Starting test(python3.6): pyspark.mllib.linalg.__init__
Finished test(python3.6): pyspark.mllib.linalg.__init__ (2s)
Starting test(python3.6): pyspark.mllib.linalg.distributed
Finished test(python3.6): pyspark.mllib.fpm (48s)
Starting test(python3.6): pyspark.mllib.random
Finished test(python3.6): pyspark.mllib.feature (54s)
Starting test(python3.6): pyspark.mllib.recommendation
Finished test(python3.6): pyspark.mllib.random (18s)
Starting test(python3.6): pyspark.mllib.regression
Finished test(python3.6): pyspark.mllib.linalg.distributed (55s)
Starting test(python3.6): pyspark.mllib.stat.KernelDensity
Finished test(python3.6): pyspark.mllib.stat.KernelDensity (1s)
Starting test(python3.6): pyspark.mllib.stat._statistics
Finished test(python3.6): pyspark.mllib.recommendation (51s)
Starting test(python3.6): pyspark.mllib.tree
Finished test(python3.6): pyspark.mllib.regression (45s)
Starting test(python3.6): pyspark.mllib.util
Finished test(python3.6): pyspark.mllib.stat._statistics (21s)
Finished test(python3.6): pyspark.mllib.tree (27s)
Finished test(python3.6): pyspark.mllib.util (27s)
Finished test(python3.6): pyspark.ml.tests (264s)
```
Author: hyukjinkwon <gu...@apache.org>
Closes #21715 from HyukjinKwon/SPARK-24740.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/044b33b2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/044b33b2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/044b33b2
Branch: refs/heads/master
Commit: 044b33b2ed2d423d798f2a632fab110c46f41567
Parents: 74f6a92
Author: hyukjinkwon <gu...@apache.org>
Authored: Sat Jul 7 11:39:29 2018 +0800
Committer: hyukjinkwon <gu...@apache.org>
Committed: Sat Jul 7 11:39:29 2018 +0800
----------------------------------------------------------------------
python/pyspark/ml/clustering.py | 6 ++++++
python/pyspark/ml/linalg/__init__.py | 5 +++++
python/pyspark/ml/stat.py | 6 ++++++
python/pyspark/mllib/clustering.py | 6 ++++++
python/pyspark/mllib/evaluation.py | 6 ++++++
python/pyspark/mllib/linalg/__init__.py | 6 ++++++
python/pyspark/mllib/linalg/distributed.py | 6 ++++++
python/pyspark/mllib/stat/_statistics.py | 6 ++++++
8 files changed, 47 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/ml/clustering.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/clustering.py b/python/pyspark/ml/clustering.py
index 6d77baf..2f06600 100644
--- a/python/pyspark/ml/clustering.py
+++ b/python/pyspark/ml/clustering.py
@@ -1345,8 +1345,14 @@ class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReada
if __name__ == "__main__":
import doctest
+ import numpy
import pyspark.ml.clustering
from pyspark.sql import SparkSession
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = pyspark.ml.clustering.__dict__.copy()
# The small batch size here ensures that we see multiple batches,
# even in these small test examples:
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/ml/linalg/__init__.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/linalg/__init__.py b/python/pyspark/ml/linalg/__init__.py
index 6a611a2..2548fd0 100644
--- a/python/pyspark/ml/linalg/__init__.py
+++ b/python/pyspark/ml/linalg/__init__.py
@@ -1156,6 +1156,11 @@ class Matrices(object):
def _test():
import doctest
+ try:
+ # Numpy 1.14+ changed it's string format.
+ np.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
(failure_count, test_count) = doctest.testmod(optionflags=doctest.ELLIPSIS)
if failure_count:
sys.exit(-1)
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/ml/stat.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/stat.py b/python/pyspark/ml/stat.py
index a06ab31..370154f 100644
--- a/python/pyspark/ml/stat.py
+++ b/python/pyspark/ml/stat.py
@@ -388,8 +388,14 @@ class SummaryBuilder(JavaWrapper):
if __name__ == "__main__":
import doctest
+ import numpy
import pyspark.ml.stat
from pyspark.sql import SparkSession
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = pyspark.ml.stat.__dict__.copy()
# The small batch size here ensures that we see multiple batches,
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/mllib/clustering.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index 0cbabab..b09469b 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -1042,7 +1042,13 @@ class LDA(object):
def _test():
import doctest
+ import numpy
import pyspark.mllib.clustering
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = pyspark.mllib.clustering.__dict__.copy()
globs['sc'] = SparkContext('local[4]', 'PythonTest', batchSize=2)
(failure_count, test_count) = doctest.testmod(globs=globs, optionflags=doctest.ELLIPSIS)
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/mllib/evaluation.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/evaluation.py b/python/pyspark/mllib/evaluation.py
index 36cb033..6c65da5 100644
--- a/python/pyspark/mllib/evaluation.py
+++ b/python/pyspark/mllib/evaluation.py
@@ -532,8 +532,14 @@ class MultilabelMetrics(JavaModelWrapper):
def _test():
import doctest
+ import numpy
from pyspark.sql import SparkSession
import pyspark.mllib.evaluation
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = pyspark.mllib.evaluation.__dict__.copy()
spark = SparkSession.builder\
.master("local[4]")\
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/mllib/linalg/__init__.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/linalg/__init__.py b/python/pyspark/mllib/linalg/__init__.py
index 60d96d8..4afd666 100644
--- a/python/pyspark/mllib/linalg/__init__.py
+++ b/python/pyspark/mllib/linalg/__init__.py
@@ -1368,6 +1368,12 @@ class QRDecomposition(object):
def _test():
import doctest
+ import numpy
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
(failure_count, test_count) = doctest.testmod(optionflags=doctest.ELLIPSIS)
if failure_count:
sys.exit(-1)
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/mllib/linalg/distributed.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/linalg/distributed.py b/python/pyspark/mllib/linalg/distributed.py
index bba8854..7e8b150 100644
--- a/python/pyspark/mllib/linalg/distributed.py
+++ b/python/pyspark/mllib/linalg/distributed.py
@@ -1364,9 +1364,15 @@ class BlockMatrix(DistributedMatrix):
def _test():
import doctest
+ import numpy
from pyspark.sql import SparkSession
from pyspark.mllib.linalg import Matrices
import pyspark.mllib.linalg.distributed
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = pyspark.mllib.linalg.distributed.__dict__.copy()
spark = SparkSession.builder\
.master("local[2]")\
http://git-wip-us.apache.org/repos/asf/spark/blob/044b33b2/python/pyspark/mllib/stat/_statistics.py
----------------------------------------------------------------------
diff --git a/python/pyspark/mllib/stat/_statistics.py b/python/pyspark/mllib/stat/_statistics.py
index 3c75b13..937bb15 100644
--- a/python/pyspark/mllib/stat/_statistics.py
+++ b/python/pyspark/mllib/stat/_statistics.py
@@ -303,7 +303,13 @@ class Statistics(object):
def _test():
import doctest
+ import numpy
from pyspark.sql import SparkSession
+ try:
+ # Numpy 1.14+ changed it's string format.
+ numpy.set_printoptions(legacy='1.13')
+ except TypeError:
+ pass
globs = globals().copy()
spark = SparkSession.builder\
.master("local[4]")\