Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/10/28 15:36:00 UTC

[jira] [Resolved] (SPARK-33268) Fix bugs for casting data from/to PythonUserDefinedType

     [ https://issues.apache.org/jira/browse/SPARK-33268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-33268.
-----------------------------------
    Fix Version/s: 3.1.0
         Assignee: Takeshi Yamamuro
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/30169

> Fix bugs for casting data from/to PythonUserDefinedType
> -------------------------------------------------------
>
>                 Key: SPARK-33268
>                 URL: https://issues.apache.org/jira/browse/SPARK-33268
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.8, 3.0.2, 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 3.1.0
>
>
> This PR intends to fix bugs for casting data from/to PythonUserDefinedType. A sequence of queries to reproduce this issue is as follows:
> {code} 
> >>> from pyspark.sql import Row
> >>> from pyspark.sql.functions import col
> >>> from pyspark.sql.types import *
> >>> from pyspark.testing.sqlutils import *
> >>> 
> >>> row = Row(point=ExamplePoint(1.0, 2.0))
> >>> df = spark.createDataFrame([row])
> >>> df.select(col("point").cast(PythonOnlyUDT()))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
>     jdf = self._jdf.select(self._jcols(*cols))
>   File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
>     return f(*a, **kw)
>   File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
> : java.lang.NullPointerException
> 	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
> 	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
> 	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
> 	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
> 	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
> {code} 
> A root cause of this issue is that, since {{PythonUserDefinedType#userClass}} is always null, the {{isAssignableFrom}} call in {{UserDefinedType#acceptsType}} throws a NullPointerException. To fix it, this PR defines {{acceptsType}} in {{PythonUserDefinedType}} and filters out the null case in {{UserDefinedType#acceptsType}}.
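> The snippet below is only a rough Scala sketch of that idea (method fragments shown out of their enclosing classes, not the exact patch; see the PR linked above for the real diff). It assumes the existing {{userClass}}, {{pyUDT}}, and {{sqlType}} members of the UDT classes:
> {code}
> // Sketch only -- see https://github.com/apache/spark/pull/30169 for the actual change.
> // In UserDefinedType#acceptsType: only call isAssignableFrom when both user classes
> // are non-null, instead of dereferencing a possibly-null userClass.
> override private[sql] def acceptsType(dataType: DataType): Boolean = dataType match {
>   case other: UserDefinedType[_] if this.userClass != null && other.userClass != null =>
>     this.getClass == other.getClass ||
>       this.userClass.isAssignableFrom(other.userClass)
>   case _ => false
> }
>
> // In PythonUserDefinedType#acceptsType: userClass is always null on the JVM side,
> // so compare the Python-side UDT identity and the underlying sqlType instead.
> override private[sql] def acceptsType(dataType: DataType): Boolean = dataType match {
>   case other: PythonUserDefinedType =>
>     pyUDT == other.pyUDT && sqlType == other.sqlType
>   case _ => false
> }
> {code}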



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org