Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2021/02/20 04:03:00 UTC
[jira] [Closed] (ARROW-6382) [Python] Unable to catch Spark Python UDF exceptions
[ https://issues.apache.org/jira/browse/ARROW-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-6382.
-------------------------------
Resolution: Won't Fix
It's unclear whether this is a pyarrow problem. If this is still occurring in 2021, please provide more information about how pyarrow might be misbehaving.
> [Python] Unable to catch Spark Python UDF exceptions
> ----------------------------------------------------
>
> Key: ARROW-6382
> URL: https://issues.apache.org/jira/browse/ARROW-6382
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Environment: Ubuntu 18.04
> Reporter: Jan
> Priority: Minor
>
> When PyArrow is enabled, Pandas UDF exceptions raised by the executor become impossible to catch; see the example below. Is this expected behavior?
> If so, what is the rationale? If not, how do I fix this?
> Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) under PySpark 2.4.0 and 2.4.3, on Python 3.6.5.
> To reproduce:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
>
> spark = SparkSession.builder.getOrCreate()
>
> # setting this to false will allow the exception to be caught
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>
> @udf
> def disrupt(A):
>     raise Exception("Test EXCEPTION")
>
> data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
>
> try:
>     test = data.withColumn("test", disrupt("A")).toPandas()
> except:
>     print("exception caught")
>
> print('end'){code}
> I would hope there's a way to catch the exception with the general except clause.
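> In the meantime, here is a minimal workaround sketch (an assumption on my part, not a confirmed fix; the safe_disrupt name is hypothetical): trap the error inside the UDF itself and return it as data, so no exception has to cross the Arrow serialization boundary at all.
> {code:python}
> from pyspark.sql.functions import udf
>
> # Hypothetical sketch: catch inside the UDF so the failure comes back
> # as a string column value instead of escaping the executor.
> @udf
> def safe_disrupt(A):
>     try:
>         raise Exception("Test EXCEPTION")
>     except Exception as e:
>         return "ERROR: " + str(e)
> {code}
> The driver can then filter the result column for rows starting with "ERROR:" rather than relying on a try/except around toPandas().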
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)