Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/10/28 15:36:00 UTC
[jira] [Resolved] (SPARK-33268) Fix bugs for casting data from/to PythonUserDefinedType
[ https://issues.apache.org/jira/browse/SPARK-33268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-33268.
-----------------------------------
Fix Version/s: 3.1.0
Assignee: Takeshi Yamamuro
Resolution: Fixed
This is resolved via https://github.com/apache/spark/pull/30169
> Fix bugs for casting data from/to PythonUserDefinedType
> -------------------------------------------------------
>
> Key: SPARK-33268
> URL: https://issues.apache.org/jira/browse/SPARK-33268
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.4.8, 3.0.2, 3.1.0
> Reporter: Takeshi Yamamuro
> Assignee: Takeshi Yamamuro
> Priority: Major
> Fix For: 3.1.0
>
>
> This PR intends to fix bugs in casting data from/to PythonUserDefinedType. The following sequence of queries reproduces the issue:
> {code}
> >>> from pyspark.sql import Row
> >>> from pyspark.sql.functions import col
> >>> from pyspark.sql.types import *
> >>> from pyspark.testing.sqlutils import *
> >>>
> >>> row = Row(point=ExamplePoint(1.0, 2.0))
> >>> df = spark.createDataFrame([row])
> >>> df.select(col("point").cast(PythonOnlyUDT()))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
> jdf = self._jdf.select(self._jcols(*cols))
> File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
> File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
> return f(*a, **kw)
> File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
> : java.lang.NullPointerException
> at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
> at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
> at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
> at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
> at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
> {code}
> The root cause is that {{PythonUserDefinedType#userClass}} is always null, so the {{isAssignableFrom}} call in {{UserDefinedType#acceptsType}} throws a {{NullPointerException}}. To fix it, this PR defines {{acceptsType}} in {{PythonUserDefinedType}} and filters out the null case in {{UserDefinedType#acceptsType}}.
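
The two-part fix described above can be illustrated with a small pure-Python model. This is only a sketch of the idea and not Spark's Scala code: the classes ModelUDT and ModelPythonUDT, the fields user_class and py_udt, and the comparison rule used for Python-only types are hypothetical stand-ins chosen for illustration. Python's issubclass plays the role of Java's isAssignableFrom here.

```python
class ModelUDT:
    """Hypothetical stand-in for a UDT backed by a user class."""

    def __init__(self, user_class=None):
        self.user_class = user_class

    def accepts_type(self, other):
        if not isinstance(other, ModelUDT):
            return False
        # Filter out the null case first: without this guard,
        # issubclass(..., None) would raise instead of returning False.
        if self.user_class is None or other.user_class is None:
            return False
        return issubclass(other.user_class, self.user_class)


class ModelPythonUDT(ModelUDT):
    """Hypothetical stand-in for a Python-only UDT, whose user class is always None."""

    def __init__(self, py_udt):
        super().__init__(user_class=None)
        self.py_udt = py_udt  # assumed identifier of the Python-side UDT

    def accepts_type(self, other):
        # Own override: compare Python-side identity, never the user class.
        return isinstance(other, ModelPythonUDT) and self.py_udt == other.py_udt


class Point:
    pass


class Point3D(Point):
    pass


jvm_point = ModelUDT(Point)
jvm_point3d = ModelUDT(Point3D)
py_point = ModelPythonUDT("example.PythonOnlyUDT")

print(jvm_point.accepts_type(jvm_point3d))  # subclass is accepted
print(jvm_point.accepts_type(py_point))     # null case returns False, no exception
print(py_point.accepts_type(ModelPythonUDT("example.PythonOnlyUDT")))
```

In this model, a cast check that reaches accepts_type with a Python-only type on either side gets a plain boolean answer rather than an exception, which is the behavior the description asks for.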