Posted to issues@spark.apache.org by "David Crossland (JIRA)" <ji...@apache.org> on 2015/05/01 14:21:05 UTC

[jira] [Created] (SPARK-7301) Issue with duplicated fields in interpreted json schemas

David Crossland created SPARK-7301:
--------------------------------------

             Summary: Issue with duplicated fields in interpreted json schemas
                 Key: SPARK-7301
                 URL: https://issues.apache.org/jira/browse/SPARK-7301
             Project: Spark
          Issue Type: Bug
            Reporter: David Crossland


I have a large JSON dataset that has evolved over time; some fields have been renamed slightly or capitalised differently along the way. This means there are certain fields that Spark considers ambiguous when I attempt to access them.

When I access one of those fields I get the following error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(Currency,StringType,true), StructField(currency,StringType,true);
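
For context, a minimal sketch of the inference side (assuming a SQLContext named sqlContext, the Spark 1.4 DataFrameReader API and a hypothetical file name; the duplicated fields in my real data may be nested, but the same duplication shows up at the top level of an inferred schema):

val inferred = sqlContext.read.json("events.json").schema
inferred.fields.filter(_.name.equalsIgnoreCase("currency"))
// Array(StructField(Currency,StringType,true), StructField(currency,StringType,true))
// Both spellings survive inference, and I have not found a way to merge them
// afterwards short of rebuilding the schema by hand.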

There appears to be no way to resolve an ambiguous field after it has been inferred by Spark SQL, other than manually constructing the schema using StructType/StructField, which is rather heavy handed as the schema is quite large. Is there some way to resolve an ambiguous reference, or to alter the schema after inference? It seems like something of a bug that I can't tell Spark to treat both fields as though they were the same. I've created a test where I manually define a schema as

import org.apache.spark.sql.types._
val schema = StructType(Seq(StructField("A", StringType, true)))

and it returns 2 rows when I perform a count on the following dataset:

{"A":"test1"}
{"a":"test2"}

If I could modify the inferred schema to remove the duplicate entries, then I could work around this issue.
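
Something along these lines would probably be enough for my case; it is only a rough sketch (not tested across Spark versions) that keeps the first field per case-insensitive name at each struct level and re-reads the data with the pruned schema:

import org.apache.spark.sql.types._

// Keep only the first field for each case-insensitive name, recursing into
// nested structs (arrays of structs are not handled here).
def dedupeFields(struct: StructType): StructType = {
  val seen = scala.collection.mutable.HashSet[String]()
  val kept = struct.fields.filter(f => seen.add(f.name.toLowerCase)).map { f =>
    f.dataType match {
      case s: StructType => f.copy(dataType = dedupeFields(s))
      case _             => f
    }
  }
  StructType(kept)
}

val inferred = sqlContext.read.json("data.json").schema
val pruned   = dedupeFields(inferred)
val cleaned  = sqlContext.read.schema(pruned).json("data.json")

Whether records that used the dropped spelling still populate the surviving column would need checking; this only removes the ambiguity from the schema itself.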




