Posted to issues@spark.apache.org by "Ruifeng Zheng (Jira)" <ji...@apache.org> on 2023/01/21 02:32:00 UTC
[jira] [Resolved] (SPARK-41987) createDataFrame supports column with map type.
[ https://issues.apache.org/jira/browse/SPARK-41987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruifeng Zheng resolved SPARK-41987.
-----------------------------------
Resolution: Resolved
> createDataFrame supports column with map type.
> ----------------------------------------------
>
> Key: SPARK-41987
> URL: https://issues.apache.org/jira/browse/SPARK-41987
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> Currently, the Connect API createDataFrame does not support creating a dataframe with a map-type column.
> For example,
> {code:java}
> >>> df = spark.createDataFrame(
> ... [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)],
> ... ("id", "an_array", "a_map")
> ... )
> {code}
> The above code intends to create a dataframe whose column 'a_map' has map type.
> However, pyarrow infers {"x": 1.0} as a struct, not a map;
> pyarrow represents map data in the format [('x', 1.0)].
> Because the dataframe's schema is incorrect, subsequent operators are also impacted.
> For example:
> {code:java}
> df.select("id", "a_map", posexplode_outer("an_array")).show()
> {code}
> Expected:
> {code:java}
> +---+----------+----+----+
> | id| a_map| pos| col|
> +---+----------+----+----+
> | 1|{x -> 1.0}| 0| foo|
> | 1|{x -> 1.0}| 1| bar|
> | 2| {}|null|null|
> | 3| null|null|null|
> +---+----------+----+----+
> {code}
> Got:
> {code:java}
> +---+------+----+----+
> | id| a_map| pos| col|
> +---+------+----+----+
> | 1| {1.0}| 0| foo|
> | 1| {1.0}| 1| bar|
> | 2|{null}|null|null|
> | 3| null|null|null|
> +---+------+----+----+
> <BLANKLINE>
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org