Posted to issues@spark.apache.org by "David Crossland (JIRA)" <ji...@apache.org> on 2015/05/01 14:21:05 UTC
[jira] [Created] (SPARK-7301) Issue with duplicated fields in interpreted JSON schemas
David Crossland created SPARK-7301:
--------------------------------------
Summary: Issue with duplicated fields in interpreted JSON schemas
Key: SPARK-7301
URL: https://issues.apache.org/jira/browse/SPARK-7301
Project: Spark
Issue Type: Bug
Reporter: David Crossland
I have a large JSON dataset that has evolved over time; some fields have been slightly renamed or capitalised differently along the way. As a result, Spark considers certain fields ambiguous when I attempt to access them, and I get an
org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(Currency,StringType,true), StructField(currency,StringType,true);
error.
There appears to be no way to resolve an ambiguous field after it has been inferred by Spark SQL, other than to manually construct the schema using StructType/StructField, which is rather heavy handed as the schema is quite large. Is there some way to resolve an ambiguous reference, or to modify the schema post-inference? It seems like something of a bug that I can't tell Spark to treat both fields as though they were the same. I've created a test where I manually defined a schema as
val schema = StructType(Seq(StructField("A", StringType, true)))
And it returns 2 rows when I perform a count on the following dataset:
{"A":"test1"}
{"a":"test2"}
If I could modify the schema to remove the duplicate entries, then I could work around this issue.
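One possible workaround, outside the Spark API entirely, would be to normalize the JSON keys before Spark ever sees the data, so that case-variant duplicates collapse into a single field at inference time. The sketch below (the helper name normalize_keys is my own, not a Spark function) shows the idea using only the Python standard library; the normalized lines could then be fed to Spark's JSON reader as usual:

```python
import json

def normalize_keys(obj):
    """Recursively lowercase dict keys so case-variant duplicates
    (e.g. "Currency" vs "currency") collapse into one field.
    If two keys in the same record collide, the later one wins."""
    if isinstance(obj, dict):
        return {k.lower(): normalize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_keys(v) for v in obj]
    return obj

# One JSON record per line, as in the test dataset above.
lines = ['{"A":"test1"}', '{"a":"test2"}']
normalized = [json.dumps(normalize_keys(json.loads(line))) for line in lines]
# Both records now expose a single lowercase "a" field, so schema
# inference should produce one StructField instead of two.
```

This sidesteps the ambiguity rather than resolving it inside Spark, and it does cost an extra pass over the data, but it avoids hand-writing the full StructType for a large schema.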
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org