Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2021/02/20 04:03:00 UTC
[jira] [Closed] (ARROW-6382) [Python] Unable to catch Spark Python UDF exceptions
[ https://issues.apache.org/jira/browse/ARROW-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-6382.
-------------------------------
Resolution: Won't Fix
It's unclear whether this is a pyarrow problem. If this is still occurring in 2021, please provide more information about how pyarrow might be misbehaving.
> [Python] Unable to catch Spark Python UDF exceptions
> ----------------------------------------------------
>
> Key: ARROW-6382
> URL: https://issues.apache.org/jira/browse/ARROW-6382
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Environment: Ubuntu 18.04
> Reporter: Jan
> Priority: Minor
>
> When PyArrow is enabled, Pandas UDF exceptions raised by the executor become impossible to catch; see the example below. Is this expected behavior?
> If so, what is the rationale? If not, how do I fix this?
> Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) under PySpark 2.4.0 and 2.4.3, on Python 3.6.5.
> To reproduce:
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
>
> spark = SparkSession.builder.getOrCreate()
>
> # setting this to false will allow the exception to be caught
> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>
> @udf
> def disrupt(A):
>     raise Exception("Test EXCEPTION")
>
> data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
>
> try:
>     test = data.withColumn("test", disrupt("A")).toPandas()
> except:
>     print("exception caught")
>
> print('end'){code}
> I would hope there's a way to catch the exception with the general except clause.
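> In the meantime, here is a minimal workaround sketch (an assumption on my part, not a confirmed fix; the safe_disrupt name is hypothetical): trap the error inside the UDF itself and return it as data, so no exception has to cross the Arrow serialization boundary at all.
> {code:python}
> from pyspark.sql.functions import udf
>
> # Hypothetical sketch: catch inside the UDF so the failure comes back
> # as a string column value instead of escaping the executor.
> @udf
> def safe_disrupt(A):
>     try:
>         raise Exception("Test EXCEPTION")
>     except Exception as e:
>         return "ERROR: " + str(e)
> {code}
> The driver can then filter the result column for rows starting with "ERROR:" rather than relying on a try/except around toPandas().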
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)