You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/05/27 21:10:17 UTC

[jira] [Created] (SPARK-7902) SQL UDF doesn't support UDT in PySpark

Xiangrui Meng created SPARK-7902:
------------------------------------

             Summary: SQL UDF doesn't support UDT in PySpark
                 Key: SPARK-7902
                 URL: https://issues.apache.org/jira/browse/SPARK-7902
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.4.0
            Reporter: Xiangrui Meng


We don't convert Python SQL internal types to Python types in SQL UDF execution. This causes problems if the input arguments contain UDTs or the return type is a UDT. Right now, the raw SQL types are passed into the Python UDF and the return value is not converted to Python SQL types.

This is the code to produce this bug. (Actually, it triggers another bug first right now.)
{code}
from pyspark.mllib.linalg import SparseVector
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

df = sqlContext.createDataFrame([(SparseVector(2, {0: 0.0}),)], ["features"])
sz = udf(lambda s: s.size, IntegerType())
df.select(sz(df.features).alias("sz")).collect()
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org