Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/10/06 07:08:00 UTC

[jira] [Commented] (SPARK-33073) Improve error handling on Pandas to Arrow conversion failures

    [ https://issues.apache.org/jira/browse/SPARK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208523#comment-17208523 ] 

Apache Spark commented on SPARK-33073:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/29951

> Improve error handling on Pandas to Arrow conversion failures
> -------------------------------------------------------------
>
>                 Key: SPARK-33073
>                 URL: https://issues.apache.org/jira/browse/SPARK-33073
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Bryan Cutler
>            Priority: Major
>
> Currently, when converting from Pandas to Arrow for Pandas UDF return values or in createDataFrame(), PySpark catches all ArrowExceptions and displays a message on how to disable the safe conversion config. The message is shown together with the original error as a tuple:
> {noformat}
> ('Exception thrown when converting pandas.Series (object) to Arrow Array (int32). It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled by using SQL config `spark.sql.execution.pandas.convertToArrowArraySafely`.', ArrowInvalid('Could not convert a with type str: tried to convert to int'))
> {noformat}
> The problem is that this message is meant mainly for things like float truncation or overflow, but it is also shown when the user has an invalid schema with incompatible types. In that case the extra information is confusing and the real error is buried.
> This could be improved by printing the extra info on how to disable safe checking only if the config is actually set, and by using exception chaining to better show the original error. Also, any safe-check failure would be a ValueError, of which ArrowInvalid is a subclass, so the catch could be made narrower.
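> A rough, untested sketch of what the narrowed handling could look like (this is not the actual PySpark conversion code; the helper name and the `safecheck` flag standing in for the SQL config value are placeholders):
> {noformat}
> import pyarrow as pa
>
> def _series_to_arrow(series, arrow_type, safecheck):
>     try:
>         # pyarrow.Array.from_pandas raises pa.lib.ArrowInvalid (a ValueError
>         # subclass) on unsafe or impossible conversions, so catching ValueError
>         # is enough and avoids swallowing unrelated ArrowExceptions.
>         return pa.Array.from_pandas(series, type=arrow_type, safe=safecheck)
>     except ValueError as e:
>         msg = ("Exception thrown when converting pandas.Series (%s) to "
>                "Arrow Array (%s)." % (series.dtype, arrow_type))
>         if safecheck:
>             # Only mention the config when the safe check is actually enabled.
>             msg += (" It can be caused by overflows or other unsafe conversions "
>                     "warned by Arrow. Arrow safe type check can be disabled by "
>                     "using SQL config "
>                     "`spark.sql.execution.pandas.convertToArrowArraySafely`.")
>         # Chain the exception so the original Arrow error is not buried.
>         raise ValueError(msg) from e
> {noformat}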


