Posted to commits@spark.apache.org by gu...@apache.org on 2019/01/07 10:38:02 UTC
[spark] branch branch-2.4 updated: [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new cb1aad6 [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
cb1aad6 is described below
commit cb1aad69b781bf9612b9b14f5338b338344365f4
Author: Liang-Chi Hsieh <vi...@gmail.com>
AuthorDate: Mon Jan 7 18:36:52 2019 +0800
[SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
## What changes were proposed in this pull request?
Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in NumPy 1.9, the PySpark image module doesn't work with NumPy versions prior to 1.9.
When running the image tests with a NumPy version prior to 1.9, we see this error:
```
test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageReaderTest2) ... ok
======================================================================
ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", line 36, in test_read_images
self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", line 193, in toImage
data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'
----------------------------------------------------------------------
Ran 2 tests in 29.040s
FAILED (errors=1)
```
## How was this patch tested?
Manually tested with NumPy versions before and after 1.9.
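The version-gated conversion from the patch can be exercised standalone. This is an illustrative sketch (the sample array values are made up; note that `distutils.version.LooseVersion` is deprecated in recent Python releases, where `packaging.version` is the usual replacement):

```python
import numpy as np
from distutils.version import LooseVersion  # deprecated in newer Python; fine on 2.7/3.x of this era

# Hypothetical 2x2 single-channel image data.
array = np.array([[0, 128], [255, 64]])

if LooseVersion(np.__version__) >= LooseVersion('1.9'):
    # NumPy >= 1.9: `tobytes` is available.
    data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
else:
    # NumPy < 1.9 has no `tobytes`; bytearray() over the flattened
    # uint8 array produces the same bytes.
    data = bytearray(array.astype(dtype=np.uint8).ravel())

print(len(data))  # 4 bytes, one per pixel
```

Both branches yield identical `bytearray` contents (row-major pixel order), which is why the fallback is safe on older NumPy.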
Closes #23484 from viirya/fix-pyspark-image.
Authored-by: Liang-Chi Hsieh <vi...@gmail.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
(cherry picked from commit a927c764c1eee066efc1c2c713dfee411de79245)
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/ml/image.py | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
import warnings
import numpy as np
+from distutils.version import LooseVersion
from pyspark import SparkContext
from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
# Running `bytearray(numpy.array([1]))` fails in specific Python versions
# with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
# Here, it avoids it by converting it to bytes.
- data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+ if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+ data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+ else:
+ # NumPy prior to 1.9 doesn't have the `tobytes` method.
+ data = bytearray(array.astype(dtype=np.uint8).ravel())
# Creating new Row with _create_row(), because Row(name = value, ... )
# orders fields by name, which conflicts with expected schema order