Posted to commits@spark.apache.org by ka...@apache.org on 2014/06/18 22:16:32 UTC

git commit: [SPARK-1466] Raise exception if pyspark Gateway process doesn't start.

Repository: spark
Updated Branches:
  refs/heads/master dd96fcda0 -> 387024874


[SPARK-1466] Raise exception if pyspark Gateway process doesn't start.

If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly or there's no Spark jar), pyspark currently fails with a very difficult-to-understand error: it tries to parse the gateway's stdout to get the port where Spark started, and there's nothing there. This commit catches that error and raises an exception that includes the gateway's stderr output, which makes debugging much easier.
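
The pattern described above can be sketched outside of pyspark as follows. This is a minimal, standalone illustration and not the code in java_gateway.py: the function name read_port_or_raise and the child commands are hypothetical, and it assumes Python 3 and a child that either prints a port number on its first stdout line or exits. Unlike the patch, the sketch waits for the child to exit before formatting the message, so the exit code is always available.

```python
import subprocess
import sys


def read_port_or_raise(command):
    """Start `command` and parse a port number from its first stdout line.

    Hypothetical helper for illustration; it is not part of pyspark.
    Returns (proc, port) on success and raises RuntimeError otherwise.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # capture stderr so it can be reported
    )
    line = proc.stdout.readline()
    try:
        return proc, int(line)
    except ValueError:
        # No parseable port on stdout. Wait for the child to exit and
        # collect its stderr. Assumption: the child exits on its own;
        # a child that keeps running would block here.
        _, stderr = proc.communicate()
        raise RuntimeError(
            "Launching child failed with exit code %s: %s"
            % (proc.returncode, stderr.decode(errors="replace"))
        )


if __name__ == "__main__":
    # A stand-in child that prints a port, as a gateway would.
    proc, port = read_port_or_raise([sys.executable, "-c", "print(12345)"])
    proc.communicate()  # let the stand-in child finish cleanly
    print("child reported port", port)
```

A child that writes to stderr and exits without printing a port makes the helper raise a RuntimeError whose message carries both the exit code and the stderr text.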

Thanks to @shivaram and @stogers for helping to fix this issue!

Author: Kay Ousterhout <ka...@gmail.com>

Closes #383 from kayousterhout/pyspark and squashes the following commits:

36dd54b [Kay Ousterhout] [SPARK-1466] Raise exception if Gateway process doesn't start.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38702487
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38702487
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38702487

Branch: refs/heads/master
Commit: 3870248740d83b0292ccca88a494ce19783847f0
Parents: dd96fcd
Author: Kay Ousterhout <ka...@gmail.com>
Authored: Wed Jun 18 13:16:26 2014 -0700
Committer: Kay Ousterhout <ka...@gmail.com>
Committed: Wed Jun 18 13:16:26 2014 -0700

----------------------------------------------------------------------
 python/pyspark/java_gateway.py | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/38702487/python/pyspark/java_gateway.py
----------------------------------------------------------------------
diff --git a/python/pyspark/java_gateway.py b/python/pyspark/java_gateway.py
index 91ae826..19235d5 100644
--- a/python/pyspark/java_gateway.py
+++ b/python/pyspark/java_gateway.py
@@ -43,12 +43,19 @@ def launch_gateway():
             # Don't send ctrl-c / SIGINT to the Java gateway:
             def preexec_func():
                 signal.signal(signal.SIGINT, signal.SIG_IGN)
-            proc = Popen(command, stdout=PIPE, stdin=PIPE, preexec_fn=preexec_func)
+            proc = Popen(command, stdout=PIPE, stdin=PIPE, stderr=PIPE, preexec_fn=preexec_func)
         else:
             # preexec_fn not supported on Windows
-            proc = Popen(command, stdout=PIPE, stdin=PIPE)
-        # Determine which ephemeral port the server started on:
-        gateway_port = int(proc.stdout.readline())
+            proc = Popen(command, stdout=PIPE, stdin=PIPE, stderr=PIPE)
+        
+        try:
+            # Determine which ephemeral port the server started on:
+            gateway_port = int(proc.stdout.readline())
+        except:
+            error_code = proc.poll()
+            raise Exception("Launching GatewayServer failed with exit code %d: %s" %
+                (error_code, "".join(proc.stderr.readlines())))
+
         # Create a thread to echo output from the GatewayServer, which is required
         # for Java log output to show up:
         class EchoOutputThread(Thread):