Posted to issues@spark.apache.org by "TANG ZHAO (Jira)" <ji...@apache.org> on 2022/03/18 11:01:00 UTC
[jira] [Commented] (SPARK-38599) support load json file in case-insensitive way
[ https://issues.apache.org/jira/browse/SPARK-38599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508719#comment-17508719 ]
TANG ZHAO commented on SPARK-38599:
-----------------------------------
I'd like to contribute to this issue.
> support load json file in case-insensitive way
> ----------------------------------------------
>
> Key: SPARK-38599
> URL: https://issues.apache.org/jira/browse/SPARK-38599
> Project: Spark
> Issue Type: New Feature
> Components: Input/Output, SQL
> Affects Versions: 3.1.1
> Reporter: TANG ZHAO
> Priority: Major
>
> The task is to load JSON files into a DataFrame.
>
> Currently we use this method:
> // textfile is an RDD[String] read from the JSON files
> val table = spark.table(hiveTableName)
> val hiveSchema = table.schema
> var df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(textfile)
>
> The problem is that the field names in hiveSchema are all lower-case, whereas the keys in the JSON strings contain upper-case letters.
> For example:
> Hive schema:
> (id bigint, name string)
>
> JSON string:
> {"Id":123, "Name":"Tom"}
>
> In this case, the JSON string will not be loaded into the DataFrame.
> I have to use the schema of the Hive table because of a business requirement; that is the precondition.
> Currently I have to transform the keys in the JSON string to lower case, like {"id":123, "name":"Tom"}
>
> I was wondering whether there is a better solution for this issue.
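
One possible direction, as a sketch only and not an existing Spark feature: let Spark infer the JSON keys as written, then pair each Hive field with the key that matches it when case is ignored. The FieldMatcher object and its matchIgnoreCase method below are hypothetical names introduced for illustration. The Spark usage in the trailing comments is untested and assumes the spark, textfile and hiveSchema values from the quoted snippet.

```scala
import java.util.Locale

// Hypothetical helper (not a Spark API): pairs each target schema field with
// the JSON key that equals it when case is ignored.
object FieldMatcher {
  def matchIgnoreCase(schemaFields: Seq[String],
                      jsonKeys: Seq[String]): Seq[(String, Option[String])] = {
    // Group keys by lower-cased name so clashes such as "Id" and "ID" are visible.
    val byLower = jsonKeys.groupBy(_.toLowerCase(Locale.ROOT))
    schemaFields.map { field =>
      byLower.getOrElse(field.toLowerCase(Locale.ROOT), Seq.empty) match {
        case Seq(single) => field -> Some(single) // exactly one matching key
        case Seq()       => field -> None         // key absent from the JSON
        case several =>
          val found = several.mkString(", ")
          throw new IllegalArgumentException("Ambiguous keys for " + field + ": " + found)
      }
    }
  }
}

// Untested sketch of how this could be combined with Spark
// (needs: import org.apache.spark.sql.functions.lit):
//   val raw = spark.read.json(textfile)  // keys inferred as written, e.g. Id, Name
//   val cols = FieldMatcher.matchIgnoreCase(hiveSchema.fieldNames, raw.columns)
//     .zip(hiveSchema.fields).map {
//       case ((name, Some(key)), f) => raw(key).cast(f.dataType).as(name)
//       case ((name, None), f)      => lit(null).cast(f.dataType).as(name)
//     }
//   val df = raw.select(cols: _*)
```

This trades the per-record key rewriting for a schema inference pass over the data, so it may not be cheaper on large inputs. It also only handles top-level keys.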
--
This message was sent by Atlassian Jira
(v8.20.1#820001)