Posted to issues@arrow.apache.org by "Jan (Jira)" <ji...@apache.org> on 2019/08/29 08:37:00 UTC
[jira] [Created] (ARROW-6382) Unable to catch Python UDF exceptions when using PyArrow
Jan created ARROW-6382:
--------------------------
Summary: Unable to catch Python UDF exceptions when using PyArrow
Key: ARROW-6382
URL: https://issues.apache.org/jira/browse/ARROW-6382
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Environment: Ubuntu 18.04
Reporter: Jan
When PyArrow is enabled, Pandas UDF exceptions raised by the Executor become impossible to catch: see example below. Is this expected behavior?
If so, what is the rationale? If not, how do I fix this?
Confirmed with PyArrow 0.11 and 0.14.1 (latest), PySpark 2.4.0 and 2.4.3, and Python 3.6.5.
To reproduce:
{{import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")
print('end')}}
I would hope there's a way to catch the exception with the general except clause.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)