Posted to commits@spark.apache.org by do...@apache.org on 2020/11/13 02:57:49 UTC

[spark] branch branch-2.4 updated: [SPARK-33339][PYTHON][2.4] Pyspark application will hang due to non Exception error

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 0297d54  [SPARK-33339][PYTHON][2.4] Pyspark application will hang due to non Exception error
0297d54 is described below

commit 0297d5405dcc49609c0ac15742bcf8844c0a1194
Author: lrz <lr...@lrzdeMacBook-Pro.local>
AuthorDate: Thu Nov 12 18:54:54 2020 -0800

    [SPARK-33339][PYTHON][2.4] Pyspark application will hang due to non Exception error
    
    ### What changes were proposed in this pull request?
    Backport the [SPARK-33339](https://github.com/apache/spark/pull/30248) fix to branch-2.4.
    
    When a SystemExit exception is raised while the worker is processing a task, the Python worker exits abnormally, but the executor task keeps waiting to read from the worker's socket, so the application hangs.
    The SystemExit may well come from a bug in the user's code, but Spark should at least surface an error to alert the user instead of getting stuck.
    A simple test reproduces the problem:
    
    ```
    from pyspark.sql import SparkSession
    def err(line):
      raise SystemExit
    spark = SparkSession.builder.appName("test").getOrCreate()
    spark.sparkContext.parallelize(range(1,2), 2).map(err).collect()
    spark.stop()
    ```
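    Without the fix this job hangs: `SystemExit` (like `KeyboardInterrupt`) derives from `BaseException` but not from `Exception`, so the worker's old `except Exception:` handler never catches it and never reports the failure back to the executor. A small standalone check, separate from the patch, makes the hierarchy explicit:
    
    ```
    print(issubclass(SystemExit, Exception))      # False
    print(issubclass(SystemExit, BaseException))  # True
    ```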
    
    ### Why are the changes needed?
    To make sure a PySpark application does not hang when a non-Exception error (such as SystemExit) is raised in the Python worker.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Added a new test and manually verified the reproduction case above.
    
    Closes #30361 from li36909/branch-2.4.
    
    Authored-by: lrz <lr...@lrzdeMacBook-Pro.local>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 python/pyspark/tests.py  | 9 +++++++++
 python/pyspark/worker.py | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 26c9126..ceaa8a7 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -2027,6 +2027,15 @@ class WorkerTests(ReusedPySparkTestCase):
             self.assertRaises(Exception, lambda: rdd.foreach(raise_exception))
         self.assertEqual(100, rdd.map(str).count())
 
+    def test_after_non_exception_error(self):
+        # SPARK-33339: Pyspark application will hang due to non Exception
+        def raise_system_exit(_):
+            raise SystemExit()
+        rdd = self.sc.parallelize(range(100), 1)
+        with QuietTest(self.sc):
+            self.assertRaises(Exception, lambda: rdd.foreach(raise_system_exit))
+        self.assertEqual(100, rdd.map(str).count())
+
     def test_after_jvm_exception(self):
         tempFile = tempfile.NamedTemporaryFile(delete=False)
         tempFile.write(b"Hello World!")
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 4b51046..6c7ad27 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -375,14 +375,14 @@ def main(infile, outfile):
             profiler.profile(process)
         else:
             process()
-    except Exception:
+    except BaseException:
         try:
             write_int(SpecialLengths.PYTHON_EXCEPTION_THROWN, outfile)
             write_with_length(traceback.format_exc().encode("utf-8"), outfile)
         except IOError:
             # JVM close the socket
             pass
-        except Exception:
+        except BaseException:
             # Write the error to stderr if it happened while serializing
             print("PySpark worker failed with exception:", file=sys.stderr)
             print(traceback.format_exc(), file=sys.stderr)
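
The effect of the change can be sketched in isolation. The snippet below is a simplified stand-in for the logic in python/pyspark/worker.py above (the real code first notifies the JVM over the socket and only falls back to stderr if that fails); the function and task names are illustrative:

```
import sys
import traceback

def run_task(task):
    try:
        task()
    except BaseException:
        # Catching BaseException means a SystemExit raised by user code is
        # reported instead of silently terminating the worker.
        print("PySpark worker failed with exception:", file=sys.stderr)
        print(traceback.format_exc(), file=sys.stderr)
        raise

def user_task():
    raise SystemExit  # e.g. user code calls sys.exit()

run_task(user_task)  # traceback is printed to stderr, then the process exits
```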


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org