
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/10/08 10:55:21 UTC

[jira] [Closed] (SPARK-7301) Issue with duplicated fields in interpreted json schemas

     [ https://issues.apache.org/jira/browse/SPARK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon closed SPARK-7301.
-------------------------------
    Resolution: Duplicate

This seems to be a duplicate of SPARK-13493. Since that JIRA has a PR open, I am closing this.

> Issue with duplicated fields in interpreted json schemas
> --------------------------------------------------------
>
>                 Key: SPARK-7301
>                 URL: https://issues.apache.org/jira/browse/SPARK-7301
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: David Crossland
>
> I have a large JSON dataset that has evolved over time; as a result, some fields have been slightly renamed or capitalised differently. This means there are certain fields that Spark considers ambiguous. When I attempt to access them, I get the following error:
> org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(Currency,StringType,true), StructField(currency,StringType,true);
> There appears to be no way to resolve an ambiguous field after it has been inferred by Spark SQL, other than to manually construct the schema using StructType/StructField, which is a bit heavy-handed as the schema is quite large. Is there some way to resolve an ambiguous reference, or to alter the schema after inference? It seems like something of a bug that I can't tell Spark to treat both fields as though they were the same. I've created a test where I manually defined a schema as:
> val schema = StructType(Seq(StructField("A", StringType, true)))
> It returns 2 rows when I perform a count on the following dataset:
> {"A":"test1"}
> {"a":"test2"}
> If I could modify the schema to remove the duplicate entries, then I could work around this issue.
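
As an illustration of the ambiguity described in the report, the following is a minimal sketch that finds field names differing only by case. The helper name caseCollisions and the field name "amount" are hypothetical and not part of Spark or the report. The sketch works on plain strings so that it runs without a Spark dependency, for example as a script or in the spark-shell; with a DataFrame, the input could presumably be taken from df.schema.fieldNames. It only detects collisions and does not resolve them.

```scala
// Hypothetical helper (not part of Spark): group field names by their
// lower-cased form and keep only the groups with more than one spelling.
def caseCollisions(fieldNames: Seq[String]): Map[String, Seq[String]] =
  fieldNames
    .groupBy(_.toLowerCase)
    .filter { case (_, spellings) => spellings.size > 1 }

// "Currency" and "currency" come from the error message in the report;
// "amount" is an illustrative, unambiguous field name.
val collisions = caseCollisions(Seq("Currency", "currency", "amount"))
```

A non-empty result lists the names that would need to be renamed or merged before they can be referenced unambiguously.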


