Posted to commits@spark.apache.org by gu...@apache.org on 2018/02/28 15:44:18 UTC
spark git commit: [SPARK-23517][PYTHON] Make `pyspark.util._exception_message` produce the trace from Java side by Py4JJavaError
Repository: spark
Updated Branches:
refs/heads/master 6a8abe29e -> fab563b9b
[SPARK-23517][PYTHON] Make `pyspark.util._exception_message` produce the trace from Java side by Py4JJavaError
## What changes were proposed in this pull request?
This PR proposes that `pyspark.util._exception_message` produce the Java-side stack trace carried by `Py4JJavaError`.
Currently, in Python 2, it uses the `message` attribute, which is empty for `Py4JJavaError`:
```python
>>> from pyspark.util import _exception_message
>>> try:
...     sc._jvm.java.lang.String(None)
... except Exception as e:
...     pass
...
>>> e.message
''
```
It seems we should use `str` instead for now:
https://github.com/bartdag/py4j/blob/aa6c53b59027925a426eb09b58c453de02c21b7c/py4j-python/src/py4j/protocol.py#L412
but this doesn't address the problem with non-ASCII strings from the Java side:
`https://github.com/bartdag/py4j/issues/306`
So, we could call `__str__()` directly:
```python
>>> e.__str__()
u'An error occurred while calling None.java.lang.String.\n: java.lang.NullPointerException\n\tat java.lang.String.<init>(String.java:588)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:422)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:238)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:214)\n\tat java.lang.Thread.run(Thread.java:745)\n'
```
which doesn't coerce `unicode` to `str` in Python 2.
The empty `message` can actually be a problem:
```python
from pyspark.sql.functions import udf
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.range(1).select(udf(lambda x: [[]])()).toPandas()
```
**Before**
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
    raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
RuntimeError:
Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
```
**After**
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
    raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
RuntimeError: An error occurred while calling o47.collectAsArrowToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 1 times, most recent failure: Lost task 7.0 in stage 0.0 (TID 7, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/.../spark/python/pyspark/worker.py", line 245, in main
    process()
  File "/.../spark/python/pyspark/worker.py", line 240, in process
...
Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
```
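The Before/After difference comes down to the order of checks inside `_exception_message`. The following is a minimal, self-contained sketch of that dispatch order, not the actual PySpark code: `FakeJavaError` is a hypothetical stand-in for `py4j.protocol.Py4JJavaError` (so the snippet runs without a JVM), and `exception_message_sketch` is an illustrative name. The stand-in sets an empty `message` attribute by hand to mimic the Python 2 behavior shown earlier.

```python
class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JJavaError (not the real class)."""

    def __init__(self, msg, java_trace):
        super().__init__(msg)
        # Assumption: mimic the empty `message` attribute observed in Python 2.
        self.message = ""
        self.java_trace = java_trace

    def __str__(self):
        # The Java-side trace is only reachable through __str__().
        return "%s\n: %s" % (self.args[0], self.java_trace)


def exception_message_sketch(excp):
    """Illustrative copy of the dispatch order; the Java-error check comes first."""
    if isinstance(excp, FakeJavaError):
        # Call __str__() directly instead of reading the empty `message`.
        return excp.__str__()
    if hasattr(excp, "message"):
        return excp.message
    return str(excp)


err = FakeJavaError(
    "An error occurred while calling None.java.lang.String.",
    "java.lang.NullPointerException",
)
print(repr(err.message))              # the attribute alone carries no detail
print(exception_message_sketch(err))  # includes the Java-side exception name
```

If the `isinstance` check were placed after the `hasattr(excp, "message")` branch, the sketch would return the empty string for `err`, which corresponds to the blank `RuntimeError:` line in the Before output.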
## How was this patch tested?
Manually tested and unit tests were added.
Author: hyukjinkwon <gu...@gmail.com>
Closes #20680 from HyukjinKwon/SPARK-23517.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fab563b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fab563b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fab563b9
Branch: refs/heads/master
Commit: fab563b9bd1581112462c0fc0b299ad6510b6564
Parents: 6a8abe2
Author: hyukjinkwon <gu...@gmail.com>
Authored: Thu Mar 1 00:44:13 2018 +0900
Committer: hyukjinkwon <gu...@gmail.com>
Committed: Thu Mar 1 00:44:13 2018 +0900
----------------------------------------------------------------------
python/pyspark/tests.py | 11 +++++++++++
python/pyspark/util.py | 7 +++++++
2 files changed, 18 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/fab563b9/python/pyspark/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 5115857..9111dbb 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -2293,6 +2293,17 @@ class KeywordOnlyTests(unittest.TestCase):
         self.assertEqual(b._x, 2)
+class UtilTests(PySparkTestCase):
+    def test_py4j_exception_message(self):
+        from pyspark.util import _exception_message
+
+        with self.assertRaises(Py4JJavaError) as context:
+            # This attempts java.lang.String(null) which throws an NPE.
+            self.sc._jvm.java.lang.String(None)
+
+        self.assertTrue('NullPointerException' in _exception_message(context.exception))
+
+
 @unittest.skipIf(not _have_scipy, "SciPy not installed")
 class SciPyTests(PySparkTestCase):
http://git-wip-us.apache.org/repos/asf/spark/blob/fab563b9/python/pyspark/util.py
----------------------------------------------------------------------
diff --git a/python/pyspark/util.py b/python/pyspark/util.py
index e5d332c..ad4a0bc 100644
--- a/python/pyspark/util.py
+++ b/python/pyspark/util.py
@@ -15,6 +15,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
+from py4j.protocol import Py4JJavaError
__all__ = []
@@ -33,6 +34,12 @@ def _exception_message(excp):
     >>> msg == _exception_message(excp)
     True
     """
+    if isinstance(excp, Py4JJavaError):
+        # 'Py4JJavaError' doesn't contain the stack trace available on the Java side in 'message'
+        # attribute in Python 2. We should call 'str' function on this exception in general but
+        # 'Py4JJavaError' has an issue about addressing non-ascii strings. So, here we work
+        # around by the direct call, '__str__()'. Please see SPARK-23517.
+        return excp.__str__()
     if hasattr(excp, "message"):
         return excp.message
     return str(excp)