Posted to reviews@spark.apache.org by Cazen <gi...@git.apache.org> on 2015/12/28 14:07:23 UTC
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
GitHub user Cazen opened a pull request:
https://github.com/apache/spark/pull/10496
[SPARK-12537] [SQL] Add option to accept quoting of all character backslash quoting mechanism
This adds an option that controls whether the JSON parser accepts backslash quoting of any character.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Cazen/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10496.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10496
----
commit 93a52b5916f12772f758efae368f87ff3730312e
Author: cazen.lee <ca...@samsung.com>
Date: 2015-12-28T07:42:50Z
Add json property(ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER)
commit 8d0f52ab75fe41789ef6cea0ad3e80cb96631bd9
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T07:56:52Z
chz ignore to test
commit 16997c794766befb57f277f3f379800624ec2c6f
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T07:59:22Z
chz test1
commit cc4ad3b018bab6304f05e0bc4c98fa213e2083e3
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T08:01:13Z
chz test2
commit 9b5bf958a944fca55c1978aa951c08cb81dd325e
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T08:02:19Z
chz test3
commit a8abb179e73e49b6e09a46d6ac023d7e2cd6dd13
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T08:03:25Z
chz test4
commit c757fe88275285efe3650e33de2f1340ea8581ef
Author: Cazen <ca...@korea.com>
Date: 2015-12-28T08:05:44Z
chz test5
commit 41b9231cf68fb7a1cfc72fc07f66eb9f16d8194e
Author: Cazen Lee <ca...@samsung.com>
Date: 2015-12-28T13:05:07Z
Merge pull request #1 from Cazen/testCazen
Test cazen
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10496#discussion_r48586694
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala ---
@@ -111,4 +111,21 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {
     assert(df.schema.head.name == "age")
     assert(df.first().getDouble(0).isNaN)
   }
+
+  test("allowBackslashEscapingAnyCharacter off") {
+    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
+    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "false").json(rdd)
+
+    assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowBackslashEscapingAnyCharacter on") {
+    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
+    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json(rdd)
+
+    assert(df.schema.head.name == "name")
+    assert(df.first().getString(0) == "Cazen Lee")
--- End diff --
Should we also test the price field?
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on the pull request:
https://github.com/apache/spark/pull/10496#issuecomment-167938272
Hi Xin, thank you for the review.
I created this PR (10496), but it was not linked to the JIRA issue (SPARK-12537), so I closed it.
I then opened a new PR (10497), but JIRA automatically linked 10496 instead of 10497.
Should I reopen this PR and close the new one (10497)?
Sorry for the confusion.
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on the pull request:
https://github.com/apache/spark/pull/10496#issuecomment-167571045
For example, if a JSON file contains a backslash escape that is not listed in the JSON specification, the affected line is returned as a _corrupt_record.
JSON file:
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
Result (the escaped line lands in _corrupt_record, and its name and price are null):
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
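To make concrete which escapes the JSON specification lists, here is a minimal sketch in plain Scala. It is only an illustration and not part of the patch: the object name `JsonEscapeCheck` and the helper `firstNonStandardEscape` are hypothetical, and the check is deliberately simplified (it scans the whole line rather than only string literals, and it does not validate the four hex digits after `\u`).

```scala
// Hypothetical helper for illustration; not part of the patch.
object JsonEscapeCheck {
  // Characters that may follow a backslash inside a JSON string (RFC 8259, section 7).
  private val allowedAfterBackslash: Set[Char] =
    Set('"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u')

  /** Returns the first backslash escape in `text` that the JSON grammar does not list, if any. */
  def firstNonStandardEscape(text: String): Option[String] = {
    var i = 0
    // A lone backslash at the very end of the text is not examined by this sketch.
    while (i < text.length - 1) {
      if (text.charAt(i) == '\\') {
        val next = text.charAt(i + 1)
        if (!allowedAfterBackslash.contains(next)) return Some("\\" + next)
        i += 2 // skip the escaped character so an escaped backslash is consumed as one unit
      } else {
        i += 1
      }
    }
    None
  }

  def main(args: Array[String]): Unit = {
    // Triple-quoted strings keep the backslash literally, as in the sample file above.
    println(firstNonStandardEscape("""{"name": "John Doe", "price": "\$20"}"""))
    println(firstNonStandardEscape("""{"name": "Tracy", "price": "$10"}"""))
  }
}
```

Only the "John Doe" line contains an escape outside the listed set, which is why it is the one row that ends up in _corrupt_record.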
After applying this patch, the allowBackslashEscapingAnyCharacter option can be enabled as follows:
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]
scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
This issue is similar to HIVE-11825 and HIVE-12717.
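The property named in the first commit, ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, matches a constant of Jackson's `JsonParser.Feature`. As a rough sketch, the behaviour can be reproduced outside Spark. It assumes that Jackson core 2.x is on the classpath and that the Spark option is forwarded to this Jackson feature; the patch itself is not shown in this thread, so that forwarding is an assumption. The object `BackslashEscapeDemo` and the helper `readPrice` are hypothetical names.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParseException, JsonParser, JsonToken}

// Hypothetical demo object; not part of the patch discussed in this thread.
object BackslashEscapeDemo {
  /** Reads the "price" field of a one-line JSON object, or None if the line fails to parse. */
  def readPrice(line: String, allowAnyEscape: Boolean): Option[String] = {
    val factory = new JsonFactory()
    factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, allowAnyEscape)
    val parser = factory.createParser(line)
    try {
      var price: Option[String] = None
      var token = parser.nextToken()
      while (token != null) {
        if (token == JsonToken.FIELD_NAME && parser.getCurrentName == "price") {
          parser.nextToken()           // advance to the field's value
          price = Some(parser.getText) // getText returns the unescaped string value
        }
        token = parser.nextToken()
      }
      price
    } catch {
      case _: JsonParseException => None // for example, \$ while the feature is off
    } finally {
      parser.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val line = """{"name": "John Doe", "price": "\$20"}"""
    println(readPrice(line, allowAnyEscape = false))
    println(readPrice(line, allowAnyEscape = true))
  }
}
```

With the feature off, the line should fail to parse, mirroring the _corrupt_record row above; with it on, the backslash is dropped and the price should read as $20.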
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10496#issuecomment-167939054
It's fine to use https://github.com/apache/spark/pull/10497
Just update it there.
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10496#issuecomment-167566226
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10496#discussion_r48587091
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala ---
@@ -111,4 +111,21 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {
     assert(df.schema.head.name == "age")
     assert(df.first().getDouble(0).isNaN)
   }
+
+  test("allowBackslashEscapingAnyCharacter off") {
+    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
+    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "false").json(rdd)
+
+    assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowBackslashEscapingAnyCharacter on") {
+    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
+    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json(rdd)
+
+    assert(df.schema.head.name == "name")
+    assert(df.first().getString(0) == "Cazen Lee")
--- End diff --
You're right, it is needed.
I'll update the test code soon.
Thanks.
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10496#issuecomment-167936614
@Cazen how come you closed the pull request?
[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...
Posted by Cazen <gi...@git.apache.org>.
Github user Cazen closed the pull request at:
https://github.com/apache/spark/pull/10496