Posted to issues@spark.apache.org by "Maciej Szymkiewicz (JIRA)" <ji...@apache.org> on 2015/10/23 15:48:27 UTC

[jira] [Created] (SPARK-11283) List column gets additional level of nesting when converted to Spark DataFrame

Maciej Szymkiewicz created SPARK-11283:
------------------------------------------

             Summary: List column gets additional level of nesting when converted to Spark DataFrame
                 Key: SPARK-11283
                 URL: https://issues.apache.org/jira/browse/SPARK-11283
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 1.6.0
         Environment: R 3.2.2, Spark build from master 487d409e71767c76399217a07af8de1bb0da7aa8
            Reporter: Maciej Szymkiewicz


When the input data frame contains a list column, the resulting Spark DataFrame has an additional level of nesting, so the collected data is no longer identical to the input:

{code}
ldf <- data.frame(row.names=1:2)
ldf$x <- list(list(1), list(2))
sdf <- createDataFrame(sqlContext, ldf)

printSchema(sdf)
## root
##  |-- x: array (nullable = true)
##  |    |-- element: array (containsNull = true)
##  |    |    |-- element: double (containsNull = true)

identical(ldf, collect(sdf))
## [1] FALSE
{code}

Comparing the structures:

Local data frame:

{code}
unclass(ldf)
## $x
## $x[[1]]
## $x[[1]][[1]]
## [1] 1
##
## $x[[2]]
## $x[[2]][[1]]
## [1] 2
##
## attr(,"row.names")
## [1] 1 2
{code}
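
The nesting level of the local column can also be checked programmatically. The following is a minimal base-R sketch that needs no Spark session; `list_depth` is a hypothetical helper introduced here for illustration and is not part of SparkR.

```r
# Hypothetical helper (not part of SparkR): counts how many list
# levels wrap the innermost value.
list_depth <- function(x) {
  if (!is.list(x)) return(0L)
  if (length(x) == 0L) return(1L)
  1L + max(vapply(x, list_depth, integer(1)))
}

# Local data frame, built exactly as in the report.
ldf <- data.frame(row.names = 1:2)
ldf$x <- list(list(1), list(2))

# Each cell of the list column is a list holding one double: depth 1.
vapply(ldf$x, list_depth, integer(1))
## [1] 1 1
```

Applying the same helper to the cells of a collected column would be expected to return 2 if the extra level described in this report is present.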

Collected from Spark:

{code}
unclass(collect(sdf))
## $x
## $x[[1]]
## $x[[1]][[1]]
## $x[[1]][[1]][[1]]
## [1] 1
##
## $x[[2]]
## $x[[2]][[1]]
## $x[[2]][[1]][[1]]
## [1] 2
##
## attr(,"row.names")
## [1] 1 2
{code}
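
Until the conversion is fixed, the extra level could be stripped on the R side after collecting. The sketch below is only an illustration under stated assumptions: `collected` is a hand-built stand-in that mirrors the collected structure printed above (it is not produced by Spark), and `unwrap_once` is a hypothetical helper, not a SparkR function and not a proposed patch.

```r
# Local input, as in the report.
ldf <- data.frame(row.names = 1:2)
ldf$x <- list(list(1), list(2))

# Assumption: hand-built stand-in for collect(sdf), with one extra
# list level around each cell, as shown in the output above.
collected <- data.frame(row.names = 1:2)
collected$x <- list(list(list(1)), list(list(2)))

identical(ldf, collected)
## [1] FALSE

# Hypothetical workaround: drop the outer wrapper from every cell.
unwrap_once <- function(col) lapply(col, function(cell) cell[[1]])

fixed <- collected
fixed$x <- unwrap_once(collected$x)

identical(ldf, fixed)
## [1] TRUE
```

This only works when every cell carries exactly one spurious wrapper; it does not address the underlying conversion in createDataFrame or collect.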


