Posted to issues@spark.apache.org by "Henri DF (JIRA)" <ji...@apache.org> on 2015/12/01 03:15:11 UTC
[jira] [Comment Edited] (SPARK-11941) JSON representation of nested StructTypes could be more uniform

    [ https://issues.apache.org/jira/browse/SPARK-11941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032931#comment-15032931 ] 

Henri DF edited comment on SPARK-11941 at 12/1/15 2:14 AM:
-----------------------------------------------------------

I think "might be nicer if it was flat" is a bit of an understatement.

The current representation isn't of much use with nested structs. If it's hard to fix, wouldn't it be better to make this private rather than leave it exposed in its current state?


was (Author: henridf):
I think "might be nicer if it was flat" is a bit of an understatement.

The current representation isn't of much use with nested structs. If it's hard to fix, would it be better to remove this than leave it in its current state? 

> JSON representation of nested StructTypes could be more uniform
> ---------------------------------------------------------------
>
>                 Key: SPARK-11941
>                 URL: https://issues.apache.org/jira/browse/SPARK-11941
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Henri DF
>
> I have a JSON file with a single row: {code}{"a":1, "b": 1.0, "c": "asdfasd", "d":[1, 2, 4]}{code} After reading that file in, the schema is correctly inferred:
> {code}
> scala> df.printSchema
> root
>  |-- a: long (nullable = true)
>  |-- b: double (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: array (nullable = true)
>  |    |-- element: long (containsNull = true)
> {code}
> However, the JSON representation has a strange nesting under "type" for column "d":
> {code}
> scala> df.collect()(0).schema.prettyJson
> res60: String = 
> {
>   "type" : "struct",
>   "fields" : [ {
>     "name" : "a",
>     "type" : "long",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "b",
>     "type" : "double",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "c",
>     "type" : "string",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "d",
>     "type" : {
>       "type" : "array",
>       "elementType" : "long",
>       "containsNull" : true
>     },
>     "nullable" : true,
>     "metadata" : { }
>   }]
> }
> {code}
> Specifically, in the last element, "type" is an object instead of being a string. I would expect the last element to be:
> {code}
>       {
>          "name":"d",
>          "type":"array",
>          "elementType":"long",
>          "containsNull":true,
>          "nullable":true,
>          "metadata":{}
>       }
> {code}
> There's a similar issue for nested structs.
> (I ran into this while writing Node.js bindings: I wanted to recurse down this representation, which would be simpler if it were uniform.)
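
For illustration, here is a minimal sketch of what recursing over the current representation might look like from the Node.js side. It is not taken from the issue: the names SparkType, SparkField and typeName are hypothetical, and only the two nested shapes visible in the prettyJson output above (struct and array) are modelled. The point is that every step has to check first whether "type" is a string or an object, which is the non-uniformity described.

```typescript
// Hypothetical types modelled on the prettyJson output quoted above.
// Only "struct" and "array" are covered; other complex types are not handled.
type SparkType =
  | string // primitive type name, e.g. "long", "double", "string"
  | { type: "struct"; fields: SparkField[] }
  | { type: "array"; elementType: SparkType; containsNull: boolean };

interface SparkField {
  name: string;
  type: SparkType; // a string for primitives, an object for nested types
  nullable: boolean;
  metadata: Record<string, unknown>;
}

// Recursively render a type as a compact string.
function typeName(t: SparkType): string {
  if (typeof t === "string") {
    return t; // primitive: "type" is already the name
  }
  if (t.type === "array") {
    return `array<${typeName(t.elementType)}>`;
  }
  // Remaining case is a struct: recurse into each field's type.
  const parts = t.fields.map((f) => `${f.name}:${typeName(f.type)}`);
  return `struct<${parts.join(",")}>`;
}

// Abbreviated version of the schema from the issue (columns "a" and "d" only).
const schemaJson = `{
  "type": "struct",
  "fields": [
    {"name": "a", "type": "long", "nullable": true, "metadata": {}},
    {"name": "d",
     "type": {"type": "array", "elementType": "long", "containsNull": true},
     "nullable": true, "metadata": {}}
  ]
}`;

const schema = JSON.parse(schemaJson) as SparkType;
const summary = typeName(schema);
```

With the flat form proposed in the issue, the typeof check on "type" would no longer be needed, because "type" would always be a string and the extra keys would sit next to it.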



