Posted to reviews@spark.apache.org by Cazen <gi...@git.apache.org> on 2015/12/28 14:07:23 UTC

[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

GitHub user Cazen opened a pull request:

    https://github.com/apache/spark/pull/10496

    [SPARK-12537] [SQL] Add option to accept quoting of all characters using the backslash quoting mechanism

    This adds an option that controls whether the JSON parser accepts backslash quoting of any character, not only the characters listed by the JSON specification.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Cazen/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10496
    
----
commit 93a52b5916f12772f758efae368f87ff3730312e
Author: cazen.lee <ca...@samsung.com>
Date:   2015-12-28T07:42:50Z

    Add json property(ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER)

commit 8d0f52ab75fe41789ef6cea0ad3e80cb96631bd9
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T07:56:52Z

    chz ignore to test

commit 16997c794766befb57f277f3f379800624ec2c6f
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T07:59:22Z

    chz test1

commit cc4ad3b018bab6304f05e0bc4c98fa213e2083e3
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T08:01:13Z

    chz test2

commit 9b5bf958a944fca55c1978aa951c08cb81dd325e
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T08:02:19Z

    chz test3

commit a8abb179e73e49b6e09a46d6ac023d7e2cd6dd13
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T08:03:25Z

    chz test4

commit c757fe88275285efe3650e33de2f1340ea8581ef
Author: Cazen <ca...@korea.com>
Date:   2015-12-28T08:05:44Z

    chz test5

commit 41b9231cf68fb7a1cfc72fc07f66eb9f16d8194e
Author: Cazen Lee <ca...@samsung.com>
Date:   2015-12-28T13:05:07Z

    Merge pull request #1 from Cazen/testCazen
    
    Test cazen

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10496#discussion_r48586694
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala ---
    @@ -111,4 +111,21 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {
         assert(df.schema.head.name == "age")
         assert(df.first().getDouble(0).isNaN)
       }
    +
    +  test("allowBackslashEscapingAnyCharacter off") {
    +    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
    +    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
    +    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "false").json(rdd)
    +
    +    assert(df.schema.head.name == "_corrupt_record")
    +  }
    +
    +  test("allowBackslashEscapingAnyCharacter on") {
    +    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
    +    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
    +    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json(rdd)
    +
    +    assert(df.schema.head.name == "name")
    +    assert(df.first().getString(0) == "Cazen Lee")
    --- End diff --
    
    should we also test the price field?
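
    As a rough illustration of the value a check on the price field would compare against, the sketch below decodes the body of a JSON string literal in a strict and a permissive mode. It is a standalone stand-in and not the parser Spark uses; the object name `LenientUnescape` and the `allowAny` parameter are hypothetical. It assumes the permissive mode simply keeps the character after the backslash, which is consistent with the `$20` shown in the example output elsewhere in this thread.

```scala
// Sketch only: a hand-written decoder for the body of a JSON string literal
// (the text between the quotes). Not the parser used by Spark.
object LenientUnescape {
  // Single-character escapes defined by the JSON specification.
  private val simple: Map[Char, Char] = Map(
    '"' -> '"', '\\' -> '\\', '/' -> '/',
    'b' -> '\b', 'f' -> '\f', 'n' -> '\n', 'r' -> '\r', 't' -> '\t')

  // Returns Some(decoded) on success, or None if the input is rejected.
  // allowAny = true keeps the character after a non-standard backslash escape;
  // allowAny = false rejects such input.
  def decode(body: String, allowAny: Boolean): Option[String] = {
    val out = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c != '\\') {
        out.append(c); i += 1
      } else if (i + 1 >= body.length) {
        return None // dangling backslash at end of input
      } else {
        val next = body.charAt(i + 1)
        if (simple.contains(next)) {
          out.append(simple(next)); i += 2
        } else if (next == 'u') {
          // Unicode escape: the letter u followed by exactly four hex digits.
          val hex = body.slice(i + 2, i + 6)
          if (hex.length == 4 && hex.forall(h => Character.digit(h, 16) >= 0)) {
            out.append(Integer.parseInt(hex, 16).toChar); i += 6
          } else {
            return None
          }
        } else if (allowAny) {
          out.append(next); i += 2 // permissive: keep the escaped character
        } else {
          return None // strict: reject the non-standard escape
        }
      }
    }
    Some(out.toString)
  }
}
```

    Under these assumptions, decoding the price body from the test string with `allowAny = true` gives `$10`, while the strict mode rejects it, mirroring the "on" and "off" tests in the diff.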





[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on the pull request:

    https://github.com/apache/spark/pull/10496#issuecomment-167938272
  
    Hi Xin, thank you for the review.
    
    I created this PR (10496), but it did not get linked to the JIRA issue (SPARK-12537), so I closed it.
    
    After that, I opened a new PR (10497), but 10496 was automatically linked in the JIRA issue instead of 10497.
    
    Should I reopen this PR and close the new one (10497)?
    
    Sorry for the confusion.




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on the pull request:

    https://github.com/apache/spark/pull/10496#issuecomment-167571045
  
    For example, if a JSON file contains a backslash escape that is not listed in the JSON specification, the record is returned as a corrupt record.
    JSON file:
    {"name": "Cazen Lee", "price": "$10"}
    {"name": "John Doe", "price": "\$20"}
    {"name": "Tracy", "price": "$10"}
    Current behavior (the escaped record goes to _corrupt_record; its name and price are null):
    scala> df.show
    +--------------------+---------+-----+
    |     _corrupt_record|     name|price|
    +--------------------+---------+-----+
    |                null|Cazen Lee|  $10|
    |{"name": "John Do...|     null| null|
    |                null|    Tracy|  $10|
    +--------------------+---------+-----+
    After applying this patch, the allowBackslashEscapingAnyCharacter option can be enabled as shown below:
    scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
    df: org.apache.spark.sql.DataFrame = [name: string, price: string]
    
    scala> df.show
    +---------+-----+
    |     name|price|
    +---------+-----+
    |Cazen Lee|  $10|
    | John Doe|  $20|
    |    Tracy|  $10|
    +---------+-----+
    This issue is similar to HIVE-11825 and HIVE-12717.
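
    To make concrete which of the three sample records is affected, here is a minimal sketch that flags backslash escapes outside the set the JSON specification defines (`"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, `u`). It is only an illustration and not part of the patch; the object name `EscapeCheck` and its members are hypothetical, and the scan is deliberately simplified.

```scala
// Sketch only: flags backslash escapes that the JSON specification does not define.
// It scans the raw line with a regex and does not check that the backslash sits
// inside a string literal, so it is not a full JSON validator.
object EscapeCheck {
  // Characters that may follow a backslash in a standard JSON string.
  val standardEscapes: Set[Char] = Set('"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u')

  // A backslash followed by any single character, matched left to right,
  // so an escaped backslash is consumed as one unit.
  private val escape = """\\(.)""".r

  // Returns every character that follows a backslash but is not a standard escape.
  def nonStandardEscapes(line: String): List[Char] =
    escape.findAllMatchIn(line).map(_.group(1).charAt(0)).filterNot(standardEscapes).toList

  // The three sample records from the comment above.
  val records: Seq[String] = Seq(
    """{"name": "Cazen Lee", "price": "$10"}""",
    """{"name": "John Doe", "price": "\$20"}""",
    """{"name": "Tracy", "price": "$10"}"""
  )

  // Records that a strict parser would be expected to reject.
  val flagged: Seq[String] = records.filter(r => nonStandardEscapes(r).nonEmpty)
}
```

    With the sample data, only the "John Doe" record ends up in `flagged`, which matches the single row reported under `_corrupt_record` above.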




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10496#issuecomment-167939054
  
    It's fine to use https://github.com/apache/spark/pull/10497
    
    Just update it there.




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10496#issuecomment-167566226
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by Cazen <gi...@git.apache.org>.
Github user Cazen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10496#discussion_r48587091
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala ---
    @@ -111,4 +111,21 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {
         assert(df.schema.head.name == "age")
         assert(df.first().getDouble(0).isNaN)
       }
    +
    +  test("allowBackslashEscapingAnyCharacter off") {
    +    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
    +    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
    +    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "false").json(rdd)
    +
    +    assert(df.schema.head.name == "_corrupt_record")
    +  }
    +
    +  test("allowBackslashEscapingAnyCharacter on") {
    +    val str = """{"name": "Cazen Lee", "price": "\$10"}"""
    +    val rdd = sqlContext.sparkContext.parallelize(Seq(str))
    +    val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json(rdd)
    +
    +    assert(df.schema.head.name == "name")
    +    assert(df.first().getString(0) == "Cazen Lee")
    --- End diff --
    
    You're right, it is needed.
    I'll update the test code soon.
    Thanks.




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10496#issuecomment-167936614
  
    @Cazen how come you closed the pull request?




[GitHub] spark pull request: [SPARK-12537] [SQL] Add option to accept quoti...

Posted by Cazen <gi...@git.apache.org>.
Github user Cazen closed the pull request at:

    https://github.com/apache/spark/pull/10496

