Posted to dev@spark.apache.org by Darcy Shen <sa...@zoho.com.cn.INVALID> on 2021/04/25 04:13:06 UTC
Correctness Issue for UDT Support in PySpark
There is a correctness issue in the following code snippet (https://issues.apache.org/jira/browse/SPARK-35211):
```
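# Run in the pyspark shell, where `spark` is predefined
import pandas as pd
from pyspark.testing.sqlutils import ExamplePoint  # test helper class backed by a UDT

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")  # Arrow disabled
pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
df = spark.createDataFrame(pdf, verifySchema=False)  # schema inferred, verification off
df.show()  # prints incorrect values for the `point` column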
```
I have created two PRs to resolve it:
PR 1 of 2: perform schema verification for inferred schemas as well
https://github.com/apache/spark/pull/32320
PR 2 of 2: convert numbers properly when schema verification is disabled
https://github.com/apache/spark/pull/32327
I hope to get them reviewed.
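To make the two fixes concrete, here is a minimal, hypothetical pure-Python model of them. It is not PySpark's implementation: types are written as plain values (`"double"`, `("array", elem)`), and `verify` and `convert` are illustrative names. It assumes that a point's UDT stores it as an array of doubles while the Python object holds ints, so `[1, 1]` arrives where `array<double>` is declared.

```python
# Hypothetical, simplified model of the two fixes; NOT PySpark's actual code.
# Types are modelled as plain values instead of PySpark DataType classes:
#   "double"          -> a double value
#   ("array", elem)   -> an array whose elements have type `elem`


def verify(value, sql_type):
    """Strictly check `value` against `sql_type`; ints are not accepted as doubles."""
    if sql_type == "double":
        if not isinstance(value, float):
            raise TypeError(
                f"double can not accept {value!r} of type {type(value).__name__}"
            )
    elif isinstance(sql_type, tuple) and sql_type[0] == "array":
        if not isinstance(value, (list, tuple)):
            raise TypeError(f"array can not accept {value!r}")
        for item in value:
            verify(item, sql_type[1])  # check every element recursively
    else:
        raise ValueError(f"unsupported type {sql_type!r}")


def convert(value, sql_type):
    """Coerce numbers to the declared type instead of passing them through as-is."""
    if sql_type == "double":
        return float(value)  # e.g. int 1 -> 1.0
    if isinstance(sql_type, tuple) and sql_type[0] == "array":
        return [convert(item, sql_type[1]) for item in value]
    raise ValueError(f"unsupported type {sql_type!r}")


point_type = ("array", "double")  # assumed storage type of a point
serialized = [1, 1]               # ints, as they might come from the Python object

try:
    verify(serialized, point_type)      # idea of PR 1: the mismatch is detected
except TypeError as exc:
    print("verification failed:", exc)

print(convert(serialized, point_type))  # idea of PR 2: [1.0, 1.0]
```

Strict verification surfaces the int/double mismatch as an error, while conversion makes the data match the declared type when verification is skipped; either way the ints are no longer silently misread.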
BTW, besides the correctness issue, Arrow support for UDTs in PySpark is also missing (https://issues.apache.org/jira/browse/SPARK-34771). I have created a PR to solve it.