Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/17 09:01:00 UTC
[jira] [Resolved] (SPARK-28986) from_json cannot handle data with
the same name but different case in the json string.
[ https://issues.apache.org/jira/browse/SPARK-28986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-28986.
-----------------------------------
Resolution: Invalid
Hi, [~iduanyingjie]. JSON key names are case-sensitive.
{code}
When the names within an object are not
unique, the behavior of software that receives such an object is
unpredictable. Many implementations report the last name/value pair
only. Other implementations report an error or fail to parse the
object, and some implementations report all of the name/value pairs,
including duplicates.
{code}
> from_json cannot handle data with the same name but different case in the json string.
> --------------------------------------------------------------------------------------
>
> Key: SPARK-28986
> URL: https://issues.apache.org/jira/browse/SPARK-28986
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: iduanyingjie
> Priority: Major
>
> In Spark SQL, when from_json is used to parse the JSON string in a field, the field names defined in the schema are matched case-sensitively. With the schema [a String] in this example, the record with id 2 is not parsed. I also cannot define an [A String] column in the schema, because schema field names in a DataFrame are not case-sensitive. I feel this is a bit contradictory.
>
> Code:
> {code:java}
> import org.apache.spark.sql.SparkSession
>
> object JsonCaseInsensitive {
>   case class User(id: String, fields: String)
>
>   val users = Seq(User("1", "{\"a\": \"b\"}"), User("2", "{\"A\": \"B\"}"))
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local").appName("JsonCaseInsensitive").getOrCreate()
>     import spark.implicits._
>     spark.createDataset(users)
>       .selectExpr("id", "from_json(fields, 'a String')")
>       .show()
>   }
> }
> {code}
> Output:
> {code:java}
> +---+---------------------+
> | id|jsontostructs(fields)|
> +---+---------------------+
> |  1|                  [b]|
> |  2|                   []|
> +---+---------------------+
> {code}
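Because from_json matches the schema field name against the JSON key case-sensitively, one possible way to read both spellings in the reporter's example is to look up each key explicitly and take the first one that is present. The following is only a minimal sketch and not something proposed in this issue. It assumes Spark's built-in get_json_object and coalesce SQL functions; the object name JsonCaseWorkaround and the helper extractA are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; reuses the data from the reporter's example.
object JsonCaseWorkaround {
  case class User(id: String, fields: String)

  val users = Seq(User("1", "{\"a\": \"b\"}"), User("2", "{\"A\": \"B\"}"))

  // Hypothetical helper: returns id -> value, trying the key "a" first and
  // falling back to "A". get_json_object paths are case-sensitive, so each
  // spelling has to be listed explicitly.
  def extractA(spark: SparkSession): Map[String, String] = {
    import spark.implicits._
    spark.createDataset(users)
      .selectExpr(
        "id",
        "coalesce(get_json_object(fields, '$.a'), get_json_object(fields, '$.A')) AS a")
      .as[(String, String)] // tuple columns are mapped by position
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("JsonCaseWorkaround").getOrCreate()
    try println(extractA(spark)) finally spark.stop()
  }
}
```

This only covers the spellings that are listed. If a single object contained both "a" and "A", the sketch would silently prefer "a", which may or may not be what the caller wants.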