Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/09/08 05:19:00 UTC

[jira] [Resolved] (SPARK-25225) Add support for "List"-Type columns

     [ https://issues.apache.org/jira/browse/SPARK-25225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25225.
----------------------------------
    Resolution: Won't Fix

Let me leave this resolved for now. I don't think we will add this type.

> Add support for "List"-Type columns
> -----------------------------------
>
>                 Key: SPARK-25225
>                 URL: https://issues.apache.org/jira/browse/SPARK-25225
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Yuriy Davygora
>            Priority: Minor
>
> At the moment, Spark Dataframe ArrayType-columns only support all elements of the array being of same data type.
> At our company, we are currently rewriting old MapReduce code with Spark. One of the frequent use-cases is aggregating data into timeseries:
> Example input:
> {noformat}
> ID	date		data
> 1	2017-01-01	data_1_1
> 1	2018-02-02	data_1_2
> 2	2017-03-03	data_2_1
> 2	2018-04-04	data_2_2
> ...
> {noformat}
> Expected output:
> {noformat}
> ID	timeseries
> 1	[[2017-01-01, data_1_1],[2018-02-02, data_1_2]]
> 2	[[2017-03-03, data_2_1],[2018-04-04, data_2_2]]
> ...
> {noformat}
> Here, the values in the data column of the input are usually not primitive but are, for example, lists, dicts, or nested lists. Spark, however, does not support building an array column from a string column and a non-string column, because all elements of an ArrayType column must share one data type.
> We would therefore like to kindly ask you to implement one of the following:
> 1. Extend ArrayType to support elements of different data type
> 2. Introduce a new container type (ListType?) which would support elements of different type
> UPDATE: The background here is that I want to be able to parse JSON arrays of differently-typed elements into Spark DataFrame columns, as well as create JSON arrays from such columns. See also [[SPARK-25226]] and [[SPARK-25227]]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org