Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:13:42 UTC

[jira] [Resolved] (SPARK-16205) dict -> StructType conversion is undocumented

     [ https://issues.apache.org/jira/browse/SPARK-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-16205.
----------------------------------
    Resolution: Incomplete

> dict -> StructType conversion is undocumented
> ---------------------------------------------
>
>                 Key: SPARK-16205
>                 URL: https://issues.apache.org/jira/browse/SPARK-16205
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>              Labels: bulk-closed
>
> According to the docs, StructType corresponds only to a Python list or tuple. I accidentally returned a dict from a UDF whose declared return type is a StructType.
> Expected behavior: either (1) an exception is raised (if the type is checked strictly); or (2) the dict is treated as an iterable, so the struct is built from the dict's keys in arbitrary order (horribly dangerous, but I'd understand).
> Actual behavior: the struct was created "properly", in the sense that the dict's keys were matched to the struct's field names and the dict's values became the field values.
> This is wonderful, but completely undocumented as far as I can tell.
> {code}
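> import pyspark.sql.functions as F
> import pyspark.sql.types as T
> # sqlContext is assumed to already exist, e.g. the one the PySpark shell provides
> df = sqlContext.createDataFrame([(1,), (2,)], ['value'])
> fields = 'abcdefgh'
> # decorator factory: wraps a function as a UDF with the given return type
> def udf(type_):
>   def to_udf(func):
>     return F.udf(func, type_)
>   return to_udf
> # struct with one string field per letter in `fields`
> struct = T.StructType()
> for c in fields:
>   struct.add(c, T.StringType())
> @udf(struct)
> def f(row):
>   # return a dict (not a list or tuple) keyed by field name
>   d = dict(zip(fields, fields.upper()))
>   return d
> df.select(f('value')).show()
> # output is unexpectedly meaningful, with uppercase letters as values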
> {code}
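
The difference between expected behavior (2) and the actual behavior reported above can be illustrated without Spark. The following is a plain-Python sketch only: it does not call PySpark, and the helper names by_position and by_name are hypothetical, introduced here for illustration. It assumes the observed conversion amounts to looking up each struct field name in the dict, which matches the report but is not confirmed by it as PySpark's actual implementation.

```python
# Plain-Python sketch; it does not import or call PySpark.
fields = "abcdefgh"  # field names, as in the report


def by_position(value, names):
    """Expected behavior (2): treat the value as a plain iterable.

    Iterating a dict yields its keys, so the dict's values are lost
    and the result depends on the dict's key order.
    """
    items = tuple(value)
    if len(items) != len(names):
        raise ValueError("value length does not match number of fields")
    return items


def by_name(value, names):
    """Reported behavior: look each field name up in the dict.

    A missing key yields None (an assumption of this sketch).
    """
    return tuple(value.get(name) for name in names)


# Build the dict in reverse key order to show that by_name ignores dict order.
d = {c: c.upper() for c in reversed(fields)}

print(by_position(d, fields))  # ('h', 'g', 'f', 'e', 'd', 'c', 'b', 'a')
print(by_name(d, fields))      # ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
```

Under the positional reading the struct would hold the dict's keys in whatever order the dict iterates them, which is why the report calls it dangerous; the name-based reading yields the same struct regardless of key order.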



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
