Posted to issues@spark.apache.org by "Yuriy Davygora (JIRA)" <ji...@apache.org> on 2018/08/27 08:13:00 UTC

[jira] [Commented] (SPARK-25227) Extend functionality of to_json to support arrays of differently-typed elements

    [ https://issues.apache.org/jira/browse/SPARK-25227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593286#comment-16593286 ] 

Yuriy Davygora commented on SPARK-25227:
----------------------------------------

[~hyukjin.kwon] I only know that in the upcoming release the from_json function will support arrays of primitive types. I don't know about to_json. Maybe [~maxgekk] can comment more on that.

As for the code, I cannot provide you with a snippet, because I cannot even reach the stage where I would use the to_json function. Let's say I have a dataframe with two columns: "string", which is of string type, and "int", which is of integer type. I cannot even do:

{noformat}
from pyspark.sql import functions as F

df = df.withColumn("new_column", F.array("string", "int"))
{noformat}

because Spark's ArrayType does not support elements of different types. If this were otherwise, the next step for me would have been something like

{noformat}
df.select(F.to_json("new_column"))
{noformat}
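
For reference, here is a minimal, self-contained sketch of the closest thing that works today (assuming PySpark 2.3; the SparkSession setup, column names and sample data are made up purely for illustration): to_json on a STRUCT of the two columns is already supported, but it produces a JSON object rather than the mixed-type JSON array this ticket asks for.

{noformat}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data only: one string column and one integer column.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["string", "int"])

# Works today: to_json on a STRUCT, giving e.g. {"string":"a","int":1}.
# It does NOT give the mixed-type JSON array ["a", 1] requested here.
df.select(F.to_json(F.struct("string", "int")).alias("json")).show(truncate=False)
{noformat}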

See also [[SPARK-25226]]

> Extend functionality of to_json to support arrays of differently-typed elements
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-25227
>                 URL: https://issues.apache.org/jira/browse/SPARK-25227
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Yuriy Davygora
>            Priority: Minor
>
> At the moment, the 'to_json' function only supports a STRUCT or an ARRAY of STRUCTs as input. Support for an ARRAY of primitives is, apparently, coming with Spark 2.4, but it will only support arrays of elements of the same data type. It will not, for example, support JSON arrays like
> {noformat}
> ["string_value", 0, true, null]
> {noformat}
> which is valid JSON with the schema
> {noformat}
> {"containsNull":true,"elementType":["string","integer","boolean"],"type":"array"}
> {noformat}
> We would like to kindly ask you to add support for arrays of differently-typed elements in the 'to_json' function. This will necessitate extending the functionality of ArrayType or maybe adding a new type (refer to [[SPARK-25225]]).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org