Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/17 09:01:00 UTC

[jira] [Resolved] (SPARK-28986) from_json cannot handle data with the same name but different case in the json string.

     [ https://issues.apache.org/jira/browse/SPARK-28986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-28986.
-----------------------------------
    Resolution: Invalid

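Hi, [~iduanyingjie]. JSON key names are case sensitive, so "a" and "A" are different keys. For reference, RFC 8259 (Section 4) says the following about objects whose names are not unique: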

{code}
   When the names within an object are not
   unique, the behavior of software that receives such an object is
   unpredictable.  Many implementations report the last name/value pair
   only.  Other implementations report an error or fail to parse the
   object, and some implementations report all of the name/value pairs,
   including duplicates.
{code}
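
If both spellings of the key need to be read, one possible workaround is to extract each key explicitly and take the first non-null value. The following is only a minimal sketch, not something proposed in this ticket: the object name JsonKeyCaseWorkaround and the helper extractA are hypothetical, and it assumes get_json_object matches key names exactly as written in the path.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, get_json_object}

// Hypothetical example object; names are illustrative only.
object JsonKeyCaseWorkaround {
  case class User(id: String, fields: String)

  // Reads the value stored under "a" or, if that key is absent, under "A".
  // get_json_object returns null when the path does not match, so
  // coalesce picks whichever spelling is present in the row.
  def extractA(users: Dataset[User]): DataFrame =
    users.select(
      col("id"),
      coalesce(
        get_json_object(col("fields"), "$.a"),
        get_json_object(col("fields"), "$.A")
      ).as("a")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("JsonKeyCaseWorkaround")
      .getOrCreate()
    import spark.implicits._

    // Same two rows as in the report: one lower-case key, one upper-case key.
    val users = Seq(
      User("1", """{"a": "b"}"""),
      User("2", """{"A": "B"}""")
    ).toDS()

    extractA(users).show()
    spark.stop()
  }
}
```

With the two rows from the report, this should give b for id 1 and B for id 2. It only covers spellings that are listed explicitly, so it does not make the lookup case insensitive in general.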

> from_json cannot handle data with the same name but different case in the json string.
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-28986
>                 URL: https://issues.apache.org/jira/browse/SPARK-28986
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: iduanyingjie
>            Priority: Major
>
> In Spark SQL, when from_json is used to parse the JSON string in a field, the field names defined in the schema are matched case-sensitively. In this example the schema is [a String], so the row with id 2, whose key is "A", is not parsed. I also cannot add a column [A String] to the schema, because schema field names in a DataFrame are not case sensitive. This seems contradictory to me.
>  
> Code:
> {code:scala}
> import org.apache.spark.sql.SparkSession
> object JsonCaseInsensitive {  
>   case class User(id: String, fields: String)
>   val users = Seq(User("1", "{\"a\": \"b\"}"), User("2", "{\"A\": \"B\"}"))
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local").appName("JsonCaseInsensitive").getOrCreate()
>     import spark.implicits._
>     spark.createDataset(users)
>          .selectExpr("id", "from_json(fields, 'a String')")
>          .show()
>   }
> }
> {code}
> Output:
> {code}
> +---+---------------------+
> | id|jsontostructs(fields)|
> +---+---------------------+
> |  1|                  [b]|
> |  2|                   []|
> +---+---------------------+
> {code}


