Posted to dev@spark.apache.org by Darcy Shen <sa...@zoho.com.cn.INVALID> on 2021/04/25 04:13:06 UTC

Correctness Issue for UDT Support in PySpark

There is a correctness issue in the following code snippet (https://issues.apache.org/jira/browse/SPARK-35211):

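```
import pandas as pd
from pyspark.testing.sqlutils import ExamplePoint

# Assumes an active SparkSession named `spark`.
# Disable Arrow so the non-Arrow pandas conversion path is used.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
# The schema is inferred, and verifySchema=False skips schema verification.
df = spark.createDataFrame(pdf, verifySchema=False)
df.show()
```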



I created two PRs to resolve it:



PR 1 of 2: for inferred schema, also perform schema verification
https://github.com/apache/spark/pull/32320

PR 2 of 2: with schema verification disabled, do number conversion properly
https://github.com/apache/spark/pull/32327
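
To illustrate the idea behind PR 2, here is a minimal sketch in plain Python. It is not the code in the PR. It assumes the UDT serializes to a list whose SQL type is an array of doubles, so integer inputs such as ExamplePoint(1, 1) would need to become floats before serialization. The names `coerce_to_double` and `serialize_point` are hypothetical and exist only for this example.

```python
from numbers import Real


def coerce_to_double(value):
    """Return `value` as a Python float if it is a real number, else raise."""
    # bool is a subclass of int; reject it so True does not silently become 1.0.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"expected a number, got {type(value).__name__}")
    return float(value)


def serialize_point(x, y):
    """Hypothetical stand-in for a UDT serialize step that emits doubles only."""
    return [coerce_to_double(x), coerce_to_double(y)]


if __name__ == "__main__":
    # Integer coordinates are converted instead of being passed through as ints.
    print(serialize_point(1, 1))
```

The point of the sketch is only that the conversion happens explicitly when no verification step is there to catch a type mismatch.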



Hope to get them reviewed.





BTW, besides the correctness issue, Arrow support is also missing for UDTs in PySpark (https://issues.apache.org/jira/browse/SPARK-34771). I've created a PR to address it.