Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/05/27 21:10:17 UTC
[jira] [Created] (SPARK-7902) SQL UDF doesn't support UDT in PySpark
Xiangrui Meng created SPARK-7902:
------------------------------------
Summary: SQL UDF doesn't support UDT in PySpark
Key: SPARK-7902
URL: https://issues.apache.org/jira/browse/SPARK-7902
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
We don't convert Python SQL internal types to Python types during SQL UDF execution. This causes problems if the input arguments contain UDTs (user-defined types) or if the return type is a UDT. Right now, the raw SQL types are passed into the Python UDF as-is, and the return value is not converted back to Python SQL types.
The following code reproduces the bug. (Right now, it actually triggers another bug first.)
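{code}
# Run in the PySpark shell, where sqlContext is predefined.
from pyspark.mllib.linalg import SparseVector
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
# A single-row DataFrame whose "features" column holds a UDT value (SparseVector).
df = sqlContext.createDataFrame([(SparseVector(2, {0: 0.0}),)], ["features"])
# The UDF expects a SparseVector, but receives the raw SQL representation instead.
sz = udf(lambda s: s.size, IntegerType())
df.select(sz(df.features).alias("sz")).collect()
{code}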
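For illustration, the missing conversion step can be sketched in plain Python, without Spark. This is only a sketch of the idea and not Spark's implementation: ToyVector, ToyVectorUDT and wrap_udf are hypothetical names, and the tuple layout used as the "raw SQL" form is an assumption. The only thing borrowed from PySpark is the convention that a UDT exposes serialize and deserialize methods.

```python
# Plain-Python sketch (no Spark required) of the conversion the report says is missing.
# ToyVector, ToyVectorUDT and wrap_udf are hypothetical names used for illustration only.


class ToyVector(object):
    """Stand-in for a user-defined class such as SparseVector."""

    def __init__(self, size, values):
        self.size = size
        self.values = dict(values)


class ToyVectorUDT(object):
    """Stand-in for a UDT: maps between the Python object and a raw tuple."""

    def serialize(self, obj):
        indices = sorted(obj.values)
        return (obj.size, indices, [obj.values[i] for i in indices])

    def deserialize(self, datum):
        size, indices, values = datum
        return ToyVector(size, zip(indices, values))


def wrap_udf(func, arg_udts, return_udt=None):
    """Deserialize UDT arguments before calling func; serialize a UDT result."""

    def wrapped(*raw_args):
        # A None entry in arg_udts means "this argument is not a UDT".
        args = [udt.deserialize(arg) if udt is not None else arg
                for udt, arg in zip(arg_udts, raw_args)]
        result = func(*args)
        return return_udt.serialize(result) if return_udt is not None else result

    return wrapped


udt = ToyVectorUDT()
raw = udt.serialize(ToyVector(2, {0: 0.0}))  # the raw form a UDF would receive


def size_of(v):
    return v.size


# Without conversion, the function sees a plain tuple and fails.
try:
    size_of(raw)
    failed_on_raw = False
except AttributeError:
    failed_on_raw = True

# With conversion, the function sees a ToyVector again.
converted_size = wrap_udf(size_of, [udt])(raw)

# A UDT return value is converted back to the raw form on the way out.
doubled_raw = wrap_udf(lambda v: ToyVector(v.size * 2, v.values), [udt], udt)(raw)
```

Here size_of fails on the raw tuple but works once wrap_udf deserializes its argument, which mirrors the behavior the reproduction above is meant to exercise.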