Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/07/18 06:36:00 UTC

[jira] [Comment Edited] (SPARK-21450) List of NA is flattened inside a SparkR struct type

    [ https://issues.apache.org/jira/browse/SPARK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091180#comment-16091180 ] 

Hyukjin Kwon edited comment on SPARK-21450 at 7/18/17 6:35 AM:
---------------------------------------------------------------

[~falaki], so if this issue is saying that

{code}
> str(collect(sql("SELECT NULL as `jsontostructs(col)`")))
'data.frame':	1 obs. of  1 variable:
 $ jsontostructs(col):List of 1
  ..$ : logi NA
{code}

should be

{code}
> str(collect(sql("SELECT named_struct('date', NULL) as `jsontostructs(col)`")))
'data.frame':	1 obs. of  1 variable:
 $ jsontostructs(col):List of 1
  ..$ :List of 1
  .. ..$ date: logi NA
  .. ..- attr(*, "class")= chr "struct"
{code}

I think this is not an issue. I assume we documented this behaviour.
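
For illustration only, here is a minimal base-R sketch of the difference between the two shapes shown above. It does not use SparkR or a Spark session: the two lists are built by hand to mimic the {{str()}} outputs, and {{is_struct}} and {{get_date}} are hypothetical helper names, not part of any SparkR API.

```r
# Hand-built stand-ins for the two collected cells (assumption: these only
# mimic the str() outputs above; they are not produced by SparkR).
null_cell   <- list(NA)                                           # SELECT NULL
struct_cell <- list(structure(list(date = NA), class = "struct")) # named_struct('date', NULL)

# Hypothetical helper: TRUE when a cell carries the "struct" class attribute.
is_struct <- function(x) inherits(x, "struct")

null_is_struct   <- vapply(null_cell, is_struct, logical(1))
struct_is_struct <- vapply(struct_cell, is_struct, logical(1))

# Hypothetical helper: read the 'date' field, treating a bare NA cell
# (a NULL struct) as a missing date.
get_date <- function(x) if (is_struct(x)) x[["date"]] else NA

null_date   <- get_date(null_cell[[1]])
struct_date <- get_date(struct_cell[[1]])
```

Under these assumptions, a caller that needs to handle both a NULL struct and a struct with a NULL field could branch on the class attribute as {{get_date}} does, rather than assuming every cell is a list.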


was (Author: hyukjin.kwon):
[~falaki], so, if this describes 

{code}
> str(collect(sql("SELECT NULL as `jsontostructs(col)`")))
'data.frame':	1 obs. of  1 variable:
 $ jsontostructs(col):List of 1
  ..$ : logi NA
{code}

should be

{code}
> str(collect(sql("SELECT named_struct('date', NULL) as `jsontostructs(col)`")))
'data.frame':	1 obs. of  1 variable:
 $ jsontostructs(col):List of 1
  ..$ :List of 1
  .. ..$ date: logi NA
  .. ..- attr(*, "class")= chr "struct"
{code}

I think this is not an R-specific issue.

> List of NA is flattened inside a SparkR struct type
> ---------------------------------------------------
>
>                 Key: SPARK-21450
>                 URL: https://issues.apache.org/jira/browse/SPARK-21450
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hossein Falaki
>
> Consider the following two cases copied from {{test_sparkSQL.R}}:
> {code}
> df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
> schema <- structType(structField("date", "date"))
> s1 <- collect(select(df, from_json(df$col, schema)))
> s2 <- collect(select(df, from_json(df$col, schema, dateFormat = "dd/MM/yyyy")))
> {code}
> If you inspect s1 using {{str(s1)}} you will find:
> {code}
> 'data.frame':	2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ : logi NA
> {code}
> But for s2, running {{str(s2)}} results in:
> {code}
> 'data.frame':	2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ :List of 1
>   .. ..$ date: Date, format: "2014-10-21"
>   .. ..- attr(*, "class")= chr "struct"
> {code}
> I assume this is not intentional and is just a subtle bug. Do you think otherwise? [~shivaram] and [~felixcheung]


