Posted to issues@spark.apache.org by "Mukul Murthy (JIRA)" <ji...@apache.org> on 2019/06/13 23:39:00 UTC
[jira] [Created] (SPARK-28043) Reading json with duplicate columns drops the first column value
Mukul Murthy created SPARK-28043:
------------------------------------
Summary: Reading json with duplicate columns drops the first column value
Key: SPARK-28043
URL: https://issues.apache.org/jira/browse/SPARK-28043
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0
Reporter: Mukul Murthy
When reading a JSON blob with duplicate fields, Spark appears to ignore the value of the first one. The JSON spec recommends unique field names but does not require them; since JSON and Spark SQL both allow duplicate field names, we should fix the bug where the first column's value is dropped.
Repro (Python, 2.4):
>>> jsonRDD = spark.sparkContext.parallelize(['{"a": "blah", "a": "blah2"}'])
>>> df = spark.read.json(jsonRDD)
>>> df.show()
+----+-----+
| a| a|
+----+-----+
|null|blah2|
+----+-----+
The expected response would be:
+----+-----+
| a| a|
+----+-----+
|blah|blah2|
+----+-----+
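For comparison, here is a small sketch (not part of the original report) of how Python's standard json module handles the same input: by default it silently keeps the last value for a duplicate key, but object_pairs_hook lets you observe every pair, which is one way to see that both values are present in the raw JSON that Spark parses.

```python
import json

blob = '{"a": "blah", "a": "blah2"}'

# Default behavior: the last duplicate key wins.
parsed = json.loads(blob)
print(parsed)  # {'a': 'blah2'}

# object_pairs_hook receives every (key, value) pair, duplicates included.
pairs = json.loads(blob, object_pairs_hook=lambda kv: kv)
print(pairs)  # [('a', 'blah'), ('a', 'blah2')]
```

This illustrates the point above: both values exist in the input, so a reader that exposes both columns should be able to populate both rather than returning null for the first.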
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org