Posted to issues@spark.apache.org by "Cazen Lee (JIRA)" <ji...@apache.org> on 2016/01/02 14:03:39 UTC
[jira] [Commented] (SPARK-12537) Add option to accept quoting of all character backslash quoting mechanism
[ https://issues.apache.org/jira/browse/SPARK-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076510#comment-15076510 ]
Cazen Lee commented on SPARK-12537:
-----------------------------------
Happy New Year!
The situation seems to require further discussion.
Please tell me what I can do to help with this issue.
Thank you.
> Add option to accept quoting of all character backslash quoting mechanism
> -------------------------------------------------------------------------
>
> Key: SPARK-12537
> URL: https://issues.apache.org/jira/browse/SPARK-12537
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Cazen Lee
> Assignee: Apache Spark
>
> We can provide an option that controls whether the JSON parser accepts backslash quoting of any character.
> For example, if a JSON file contains a backslash escape that is not listed in the JSON specification (here, \$), that record is returned as _corrupt_record:
> {code:title=JSON File|borderStyle=solid}
> {"name": "Cazen Lee", "price": "$10"}
> {"name": "John Doe", "price": "\$20"}
> {"name": "Tracy", "price": "$10"}
> {code}
> Result: the second record ends up in _corrupt_record, and its name and price are null:
> {code}
> scala> df.show
> +--------------------+---------+-----+
> |     _corrupt_record|     name|price|
> +--------------------+---------+-----+
> |                null|Cazen Lee|  $10|
> |{"name": "John Do...|     null| null|
> |                null|    Tracy|  $10|
> +--------------------+---------+-----+
> {code}
> After applying this patch, we can enable the allowBackslashEscapingAnyCharacter option as shown below:
> {code}
> scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
> df: org.apache.spark.sql.DataFrame = [name: string, price: string]
> scala> df.show
> +---------+-----+
> |     name|price|
> +---------+-----+
> |Cazen Lee|  $10|
> | John Doe|  $20|
> |    Tracy|  $10|
> +---------+-----+
> {code}
> This issue is similar to HIVE-11825 and HIVE-12717.
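To illustrate why "\$20" is rejected in the first read and accepted in the second, here is a minimal, self-contained Scala sketch that decodes the body of a JSON string literal in a strict mode (only the escapes listed in the JSON specification: \" \\ \/ \b \f \n \r \t and \uXXXX) and a lenient mode (a backslash before any other character yields that character). The names BackslashEscapes, unescape and allowAnyChar are hypothetical. This is not Spark's code: Spark delegates parsing to Jackson, and the option presumably maps onto Jackson's ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER parser feature.

```scala
// Hypothetical sketch (not Spark's or Jackson's code): decode the body of a
// JSON string literal, i.e. the text between the surrounding quotes.
object BackslashEscapes {
  // Single-character escapes listed in the JSON specification.
  private val standard: Map[Char, Char] = Map(
    '"' -> '"', '\\' -> '\\', '/' -> '/',
    'b' -> '\b', 'f' -> '\f', 'n' -> '\n', 'r' -> '\r', 't' -> '\t'
  )

  /** Returns Left(error) for an invalid escape, Right(decoded) otherwise.
    * With allowAnyChar = true, a backslash before any other character
    * simply yields that character (lenient mode). */
  def unescape(body: String, allowAnyChar: Boolean): Either[String, String] = {
    val out = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c != '\\') {
        out.append(c)
        i += 1
      } else if (i + 1 >= body.length) {
        return Left(s"dangling backslash at index $i")
      } else {
        val next = body.charAt(i + 1)
        if (next == 'u') {
          // Unicode escape: 'u' followed by exactly four hex digits.
          val hex = body.slice(i + 2, i + 6)
          if (hex.length != 4 || !hex.forall(ch => Character.digit(ch, 16) >= 0))
            return Left(s"bad unicode escape at index $i")
          out.append(Integer.parseInt(hex, 16).toChar)
          i += 6
        } else {
          standard.get(next) match {
            case Some(decoded) =>
              out.append(decoded)
              i += 2
            case None if allowAnyChar =>
              out.append(next) // lenient: backslash-dollar decodes to "$"
              i += 2
            case None =>
              return Left(s"unrecognized escape '\\$next' at index $i")
          }
        }
      }
    }
    Right(out.toString)
  }

  def main(args: Array[String]): Unit = {
    val price = "\\$20" // the body of "\$20" from the example file
    println(unescape(price, allowAnyChar = false)) // Left(unrecognized escape '\$' at index 0)
    println(unescape(price, allowAnyChar = true))  // Right($20)
  }
}
```

In strict mode the unknown escape is an error, which corresponds to the whole line landing in _corrupt_record; in lenient mode the backslash is dropped and the value decodes to "$20", matching the second df.show output.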
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)