You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/08/01 01:01:00 UTC

[jira] [Updated] (SPARK-39920) Interrupt on running spark job doesn't kill underneath spark job

     [ https://issues.apache.org/jira/browse/SPARK-39920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-39920:
---------------------------------
    Priority: Major  (was: Blocker)

> Interrupt on running spark job doesn't kill underneath spark job
> ----------------------------------------------------------------
>
>                 Key: SPARK-39920
>                 URL: https://issues.apache.org/jira/browse/SPARK-39920
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.2
>            Reporter: Neha Singla
>            Priority: Major
>         Attachments: interrupt_spark_job.ipynb
>
>
> If I start a long running spark job in jupyter notebook with pyspark kernel and hit interrupt while spark job is running. I get keyboard interrupt message but underneath spark job keeps running. Attaching my notebook file.
> I tried to create my own launcher and register a different signal handler for Interrupt, but that gives py4j error.
> {quote}from signal import signal, SIGINT, SIG_IGN
> from ipykernel.ipkernel import IPythonKernel
> from functools import partial
> from pyspark.sql import SparkSession
> import os
> class PyKernel(IPythonKernel):
> def pre_handler_hook(self):
> """Hook to execute before calling message handler"""
> self.log.info("PyKernel: Registering PreHandler Hook")
> self.saved_sigint_handler = signal(SIGINT, self.signal_handler)
> def post_handler_hook(self):
> """Hook to execute after calling message handler"""
> self.log.info("PyKernel: Registering PostHandler Hook")
> signal(SIGINT, self.saved_sigint_handler)
> def signal_handler(signal, frame):
> self.log.info("PyKernel: Registering Interrupt handler for PyKernel")
> SparkSession.builder.getOrCreate().sparkContext.cancelAllJobs()
> raise KeyboardInterrupt()
> {quote}
> Here is the error message:
> {quote}ERROR:root:Exception while sending command. Traceback (most recent call last): File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 511, in send_command answer = smart_decode(self.stream.readline()[:-1]) RuntimeError: reentrant call inside <_io.BufferedReader name=71> During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1038, in send_command response = connection.send_command(command) File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 540, in send_command "Error while sending or receiving", e, proto.ERROR_ON_RECEIVE) py4j.protocol.Py4JNetworkError: Error while sending or receiving ERROR:root:Exception while sending command. Traceback (most recent call last): File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 511, in send_command answer = smart_decode(self.stream.readline()[:-1]) File "/app/.runtimes/python/lib/python3.7/socket.py", line 589, in readinto return self._sock.recv_into(b) File "/app/.runtimes/python/lib/python3.7/site-packages/kernel/pykernel.py", line 32, in signal_handler SparkSession.builder.getOrCreate().sparkContext.cancelAllJobs() File "/mnt/aci-spark/app/apache-spark/python/lib/pyspark.zip/pyspark/context.py", line 1195, in cancelAllJobs self._jsc.sc().cancelAllJobs() File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__ answer, self.gateway_client, self.target_id, self.name) File "/mnt/aci-spark/app/apache-spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco return f(*a, **kw) File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 336, in get_return_value format(target_id, ".", name)) py4j.protocol.Py4JError: An error occurred while calling o134.sc During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1038, in send_command response = connection.send_command(command) File "/mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 540, in send_command "Error while sending or receiving", e, proto.ERROR_ON_RECEIVE) py4j.protocol.Py4JNetworkError: Error while sending or receiving{quote}
> {quote} 
> --------------------------------------------------------------------------- Py4JError Traceback (most recent call last) /tmp/ipykernel_177/3554383046.py in <module> ----> 1 df1.join(df2, df1.id1 == df2.id2, 'inner').show(5, False) /mnt/aci-spark/app/apache-spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)  500 "Parameter 'truncate={}' should be either bool or int.".format(truncate))  501 --> 502 print(self._jdf.showString(n, int_truncate, vertical))  503  504 def __repr__(self): /mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)  1320 answer = self.gateway_client.send_command(command)  1321 return_value = get_return_value( -> 1322 answer, self.gateway_client, self.target_id, self.name)  1323  1324 for temp_arg in temp_args: /mnt/aci-spark/app/apache-spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)  109 def deco(*a, **kw):  110 try: --> 111 return f(*a, **kw)  112 except py4j.protocol.Py4JJavaError as e:  113 converted = convert_exception(e.java_exception) /mnt/aci-spark/app/apache-spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)  334 raise Py4JError(  335 "An error occurred while calling \{0}{1}\{2}". --> 336 format(target_id, ".", name))  337 else:  338 type = answer[1] Py4JError: An error occurred while calling o247.showString{quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org