Posted to issues@spark.apache.org by "William Benton (JIRA)" <ji...@apache.org> on 2014/11/01 22:27:33 UTC

[jira] [Commented] (SPARK-4185) JSON schema inference failed when dealing with type conflicts in arrays

    [ https://issues.apache.org/jira/browse/SPARK-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193491#comment-14193491 ] 

William Benton commented on SPARK-4185:
---------------------------------------

I'm actually not sure this is a bug! My main concern is that whatever type gets inferred for this collection of objects, it will be very difficult to write meaningful queries against it. In the fedmsg case, the problem was that the source data overloaded the meaning of a field name, so I was able to preprocess the records and rename the overloaded fields. A good solution might be to have Spark SQL automatically rename fields that have conflicting types in different records (e.g. to “branches_1” and “branches_2” in this case).
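To make the renaming idea concrete, here is a minimal sketch in plain Scala, with no Spark dependency. It is not how Spark SQL behaves and not the preprocessing used for the fedmsg data; the JValue model, the RenameConflicts object, and the shape function are all hypothetical names made up for illustration. It only looks at top-level fields and compares a coarse description of each value's type, numbering conflicting types in order of first appearance.

```scala
// Hypothetical minimal JSON model, only for this sketch;
// real code would use a JSON library instead.
sealed trait JValue
final case class JString(value: String) extends JValue
final case class JNumber(value: Double) extends JValue
final case class JArray(items: List[JValue]) extends JValue
final case class JObject(fields: Map[String, JValue]) extends JValue

object RenameConflicts {
  // Coarse description of a value's type. All objects count as "object",
  // so conflicts nested inside objects are not detected by this sketch.
  def shape(v: JValue): String = v match {
    case JString(_)    => "string"
    case JNumber(_)    => "number"
    case JObject(_)    => "object"
    case JArray(items) => items.map(shape).distinct.sorted.mkString("array<", "|", ">")
  }

  // Maps (field name, shape) to a new field name, but only for fields
  // that occur with more than one shape across the records.
  def renames(records: Seq[JObject]): Map[(String, String), String] = {
    val shapesByField: Map[String, Seq[String]] =
      records
        .flatMap(_.fields.toSeq)
        .map { case (k, v) => (k, shape(v)) }
        .distinct // keeps first-occurrence order
        .groupBy(_._1)
        .map { case (k, pairs) => k -> pairs.map(_._2) }

    shapesByField.toSeq.flatMap { case (field, shapes) =>
      if (shapes.size > 1)
        shapes.zipWithIndex.map { case (s, i) => (field, s) -> s"${field}_${i + 1}" }
      else
        Seq.empty
    }.toMap
  }

  // Rewrites every record, renaming only the conflicting top-level fields.
  def apply(records: Seq[JObject]): Seq[JObject] = {
    val table = renames(records)
    records.map { r =>
      JObject(r.fields.map { case (k, v) => table.getOrElse((k, shape(v)), k) -> v })
    }
  }
}

object Demo {
  // The two records from the issue description, in the hypothetical model.
  val diverging: Seq[JObject] = Seq(
    JObject(Map("branches" -> JArray(List(JString("foo"))))),
    JObject(Map("branches" -> JArray(List(JObject(Map("foo" -> JNumber(42)))))))
  )

  val renamed: Seq[JObject] = RenameConflicts(diverging)

  def main(args: Array[String]): Unit =
    renamed.foreach(r => println(r.fields.keys.mkString(", ")))
}
```

With the two records from the issue, the first ends up with a field named branches_1 (array of strings) and the second with branches_2 (array of objects), so each renamed field has a single consistent type. Fields that never conflict keep their original names.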

> JSON schema inference failed when dealing with type conflicts in arrays
> -----------------------------------------------------------------------
>
>                 Key: SPARK-4185
>                 URL: https://issues.apache.org/jira/browse/SPARK-4185
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> {code}
> val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
> val diverging = sparkContext.parallelize(List("""{"branches": ["foo"]}""", """{"branches": [{"foo":42}]}"""))
> sqlContext.jsonRDD(diverging)  // throws a MatchError
> {code}
> The case is from http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
