You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2019/04/23 18:24:00 UTC
[jira] [Created] (SPARK-27548) PySpark toLocalIterator does not
raise errors from worker
Bryan Cutler created SPARK-27548:
------------------------------------
Summary: PySpark toLocalIterator does not raise errors from worker
Key: SPARK-27548
URL: https://issues.apache.org/jira/browse/SPARK-27548
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.1
Reporter: Bryan Cutler
When using a PySpark RDD local iterator and an error occurs on the worker, it is not picked up by Py4J and raised in the Python driver process. So unless looking at logs, there is no way for the application to know the worker had an error. This is a test that should pass if the error is raised in the driver:
{code}
def test_to_local_iterator_error(self):
def fail(_):
raise RuntimeError("local iterator error")
rdd = self.sc.parallelize(range(10)).map(fail)
with self.assertRaisesRegexp(Exception, "local iterator error"):
for _ in rdd.toLocalIterator():
pass{code}
but it does not raise an exception:
{noformat}
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 428, in main
process()
File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 423, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 505, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/home/bryan/git/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper
return f(*args, **kwargs)
File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 742, in fail
raise RuntimeError("local iterator error")
RuntimeError: local iterator error
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:453)
...
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
FAIL
======================================================================
FAIL: test_to_local_iterator_error (pyspark.tests.test_rdd.RDDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/bryan/git/spark/python/pyspark/tests/test_rdd.py", line 748, in test_to_local_iterator_error
pass
AssertionError: Exception not raised{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org