Posted to issues@spark.apache.org by "Maciej Szymkiewicz (JIRA)" <ji...@apache.org> on 2015/10/23 15:48:27 UTC
[jira] [Created] (SPARK-11283) List column gets additional level of
nesting when converted to Spark DataFrame
Maciej Szymkiewicz created SPARK-11283:
------------------------------------------
Summary: List column gets additional level of nesting when converted to Spark DataFrame
Key: SPARK-11283
URL: https://issues.apache.org/jira/browse/SPARK-11283
Project: Spark
Issue Type: Bug
Components: SparkR
Affects Versions: 1.6.0
Environment: R 3.2.2, Spark built from master 487d409e71767c76399217a07af8de1bb0da7aa8
Reporter: Maciej Szymkiewicz
When the input data frame contains a list column, the resulting Spark DataFrame has an additional level of nesting, so the collected data is no longer identical to the input:
{code}
ldf <- data.frame(row.names=1:2)
ldf$x <- list(list(1), list(2))
sdf <- createDataFrame(sqlContext, ldf)
printSchema(sdf)
## root
## |-- x: array (nullable = true)
## | |-- element: array (containsNull = true)
## | | |-- element: double (containsNull = true)
identical(ldf, collect(sdf))
## [1] FALSE
{code}
Comparing the structure of the two objects:
Local data frame:
{code}
unclass(ldf)
## $x
## $x[[1]]
## $x[[1]][[1]]
## [1] 1
##
## $x[[2]]
## $x[[2]][[1]]
## [1] 2
##
## attr(,"row.names")
## [1] 1 2
{code}
Collected from Spark:
{code}
unclass(collect(sdf))
## $x
## $x[[1]]
## $x[[1]][[1]]
## $x[[1]][[1]][[1]]
## [1] 1
##
## $x[[2]]
## $x[[2]][[1]]
## $x[[2]][[1]][[1]]
## [1] 2
##
## attr(,"row.names")
## [1] 1 2
{code}
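The difference between the two dumps above can also be checked numerically. The following is a minimal base R sketch that does not call Spark at all: list_depth is a hypothetical helper (not part of SparkR), and collected_like is built by hand to mimic the structure shown for collect(sdf), on the assumption that each cell is wrapped in exactly one extra list.
{code}
# Hypothetical helper (not part of SparkR): maximum depth of list nesting.
list_depth <- function(x) {
  if (!is.list(x)) return(0L)
  if (length(x) == 0L) return(1L)
  1L + max(vapply(x, list_depth, integer(1)))
}

ldf <- data.frame(row.names = 1:2)
ldf$x <- list(list(1), list(2))

# Assumption: mimic the collected result by wrapping each cell once more.
collected_like <- ldf
collected_like$x <- lapply(ldf$x, list)

list_depth(ldf$x)
## [1] 2
list_depth(collected_like$x)
## [1] 3

# Stripping one level from each cell recovers the original column.
unnested <- lapply(collected_like$x, function(cell) cell[[1]])
identical(unnested, ldf$x)
## [1] TRUE
{code}
Unwrapping each cell like this only illustrates the size of the discrepancy (exactly one level per cell); it is not meant as a fix for the conversion itself.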