You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Takeshi Yamamuro (Jira)" <ji...@apache.org> on 2020/10/28 12:23:00 UTC

[jira] [Created] (SPARK-33268) Fix bugs for casting data from/to PythonUserDefinedType

Takeshi Yamamuro created SPARK-33268:
----------------------------------------

             Summary: Fix bugs for casting data from/to PythonUserDefinedType
                 Key: SPARK-33268
                 URL: https://issues.apache.org/jira/browse/SPARK-33268
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 2.4.8, 3.0.2, 3.1.0
            Reporter: Takeshi Yamamuro


This PR intends to fix bus for casting data from/to PythonUserDefinedType. A sequence of queries to reproduce this issue is as follows;

{code} 
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.types import *
>>> from pyspark.testing.sqlutils import *
>>> 
>>> row = Row(point=ExamplePoint(1.0, 2.0))
>>> df = spark.createDataFrame([row])
>>> df.select(col("point").cast(PythonOnlyUDT()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
: java.lang.NullPointerException
	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)}}
{code} 

A root cause of this issue is that, since {{PythonUserDefinedType#userClassis}} always null, {{isAssignableFrom}} in {{UserDefinedType#acceptsType}} throws a null exception. To fix it, this PR defines {{acceptsType}} in {{PythonUserDefinedType}} and filters out the null case in {{UserDefinedType#acceptsType}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org