Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/07/18 06:36:00 UTC
[jira] [Comment Edited] (SPARK-21450) List of NA is flattened inside a SparkR struct type
[ https://issues.apache.org/jira/browse/SPARK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091180#comment-16091180 ]
Hyukjin Kwon edited comment on SPARK-21450 at 7/18/17 6:35 AM:
---------------------------------------------------------------
[~falaki], so, if you mean that this output
{code}
> str(collect(sql("SELECT NULL as `jsontostructs(col)`")))
'data.frame': 1 obs. of 1 variable:
$ jsontostructs(col):List of 1
..$ : logi NA
{code}
should be
{code}
> str(collect(sql("SELECT named_struct('date', NULL) as `jsontostructs(col)`")))
'data.frame': 1 obs. of 1 variable:
$ jsontostructs(col):List of 1
..$ :List of 1
.. ..$ date: logi NA
.. ..- attr(*, "class")= chr "struct"
{code}
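I think this is not an issue: a NULL struct and a struct whose only field is NULL are different values, and SparkR collects them differently. I assume we documented this behaviour.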
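To make the distinction between the two outputs concrete without a Spark session, here is a minimal plain-R sketch. It assumes the collected cells have exactly the shapes `str()` prints above: a bare logical NA for a NULL struct, and a named list with class "struct" for a struct whose `date` field is NULL. The objects are hand-built and the helper `is_null_struct` is hypothetical, not part of SparkR.

{code}
# Plain R stand-ins (no Spark session) for two collected cells, shaped like
# the str() output above. These objects are hand-built assumptions.
null_struct_cell <- NA                                         # SELECT NULL
na_field_cell <- structure(list(date = NA), class = "struct")  # named_struct('date', NULL)

# Hypothetical helper, not a SparkR API: TRUE when a collected cell is a NULL
# struct (a bare logical NA) rather than a struct whose fields are NA.
is_null_struct <- function(cell) {
  !inherits(cell, "struct") && length(cell) == 1 && is.na(cell)
}

# Apply the check to each cell of a list column.
cells <- list(null_struct_cell, na_field_cell)
vapply(cells, is_null_struct, logical(1))  # TRUE FALSE
{code}

The check looks at the "struct" class first, so a struct whose fields are all NA is never mistaken for a NULL struct.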
was (Author: hyukjin.kwon):
[~falaki], so, if you mean that this output
{code}
> str(collect(sql("SELECT NULL as `jsontostructs(col)`")))
'data.frame': 1 obs. of 1 variable:
$ jsontostructs(col):List of 1
..$ : logi NA
{code}
should be
{code}
> str(collect(sql("SELECT named_struct('date', NULL) as `jsontostructs(col)`")))
'data.frame': 1 obs. of 1 variable:
$ jsontostructs(col):List of 1
..$ :List of 1
.. ..$ date: logi NA
.. ..- attr(*, "class")= chr "struct"
{code}
I think this is not an R-specific issue.
> List of NA is flattened inside a SparkR struct type
> ---------------------------------------------------
>
> Key: SPARK-21450
> URL: https://issues.apache.org/jira/browse/SPARK-21450
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.2.0
> Reporter: Hossein Falaki
>
> Consider the following two cases copied from {{test_sparkSQL.R}}:
> {code}
> df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
> schema <- structType(structField("date", "date"))
> s1 <- collect(select(df, from_json(df$col, schema)))
> s2 <- collect(select(df, from_json(df$col, schema, dateFormat = "dd/MM/yyyy")))
> {code}
> If you inspect s1 using {{str(s1)}}, you will find:
> {code}
> 'data.frame': 2 obs. of 1 variable:
> $ jsontostructs(col):List of 2
> ..$ : logi NA
> {code}
> But for s2, running {{str(s2)}} results in:
> {code}
> 'data.frame': 2 obs. of 1 variable:
> $ jsontostructs(col):List of 2
> ..$ :List of 1
> .. ..$ date: Date, format: "2014-10-21"
> .. ..- attr(*, "class")= chr "struct"
> {code}
> I assume this is not intentional and is just a subtle bug. [~shivaram], [~felixcheung], do you think otherwise?
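A possible user-side workaround, sketched in plain R and not part of the original report, is to normalize the collected column so that s1 and s2 end up with the same shape. The helper `struct_dates` is hypothetical (not a SparkR function). The sketch assumes each cell is either a bare NA (a NULL struct) or a "struct"-classed list with a Date-valued `date` field, matching the str() output above.

{code}
# Hypothetical helper, not a SparkR API: turn a collected struct column into a
# plain Date vector. A NULL struct (bare NA), a struct whose field is NA, and a
# struct missing the field all become NA, so every row has the same shape.
struct_dates <- function(column, field = "date") {
  days <- vapply(column, function(cell) {
    value <- if (inherits(cell, "struct")) cell[[field]] else NULL
    if (length(value) == 1) as.numeric(value) else NA_real_  # days since epoch
  }, numeric(1))
  as.Date(days, origin = "1970-01-01")
}

# Hand-built stand-in for a collected column: a failed parse, then a success.
col <- list(NA, structure(list(date = as.Date("2014-10-21")), class = "struct"))
struct_dates(col)  # a Date vector: NA, then 2014-10-21
{code}

Mapping both cases to NA discards the difference between a failed parse and a NULL field. If that difference matters, check the raw column first.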
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)