Posted to issues@spark.apache.org by "TANG ZHAO (Jira)" <ji...@apache.org> on 2022/03/18 11:01:00 UTC

[jira] [Commented] (SPARK-38599) support load json file in case-insensitive way

    [ https://issues.apache.org/jira/browse/SPARK-38599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508719#comment-17508719 ] 

TANG ZHAO commented on SPARK-38599:
-----------------------------------

I'd like to contribute to this issue.

> support load json file in case-insensitive way
> ----------------------------------------------
>
>                 Key: SPARK-38599
>                 URL: https://issues.apache.org/jira/browse/SPARK-38599
>             Project: Spark
>          Issue Type: New Feature
>          Components: Input/Output, SQL
>    Affects Versions: 3.1.1
>            Reporter: TANG ZHAO
>            Priority: Major
>
> The task is to load JSON files into a DataFrame.
>  
> Currently we use this method:
> // textfile is an RDD[String] read from the JSON files
> val table = spark.table(hiveTableName)
> val hiveSchema = table.schema
> // parse each line against the Hive table's schema, dropping malformed records
> val df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(textfile)
>  
> The problem is that the fields in hiveSchema are all lower-case, whereas the fields in the JSON strings contain upper-case characters.
> For example:
> Hive schema:
> (id bigint, name string)
>  
> JSON string:
> {"Id":123, "Name":"Tom"}
>  
> In this case, the JSON string will not be loaded into the DataFrame.
> Due to a business requirement I have to use the schema of the Hive table; that is a pre-condition.
> Currently I have to transform the keys in the JSON strings to lower case, e.g. {"id":123, "name":"Tom"}.
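> A rough, untested sketch of that key-rewriting step (this assumes Jackson, which Spark already bundles, and only lower-cases top-level keys; nested objects would need recursion, and keys that collide after lower-casing overwrite each other):
>  
> import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
> import scala.collection.JavaConverters._
>  
> // lower-case the top-level keys of every JSON line before parsing
> val lowered = textfile.mapPartitions { lines =>
>   val mapper = new ObjectMapper()  // one mapper per partition, not per record
>   lines.map { line =>
>     val root = mapper.readTree(line)  // a syntactically broken line throws here
>     val out = mapper.createObjectNode()
>     root.fields().asScala.foreach { e =>
>       out.set[JsonNode](e.getKey.toLowerCase, e.getValue)
>     }
>     mapper.writeValueAsString(out)
>   }
> }
> val df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(lowered)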
>  
> But I was wondering: is there a better solution for this issue?
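> One alternative I have sketched myself (rough and unverified: it assumes every Hive column actually appears somewhere in the JSON data, and that no two keys collide after lower-casing) is to let Spark infer the schema and normalize the column names afterwards:
>  
> import org.apache.spark.sql.functions.col
>  
> // infer the schema from the data, lower-case the inferred column names,
> // then select/cast the columns into the shape of the Hive schema
> val raw = spark.read.option("mode", "DROPMALFORMED").json(textfile)
> val renamed = raw.toDF(raw.columns.map(_.toLowerCase): _*)
> val df = renamed.select(hiveSchema.map(f => col(f.name).cast(f.dataType)): _*)
>  
> But schema inference costs an extra pass over the data, so I am not sure this is actually better than rewriting the keys.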



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org